* [PATCH v5 00/21] Host1x/TegraDRM UAPI
@ 2021-01-11 12:59 ` Mikko Perttunen
  0 siblings, 0 replies; 195+ messages in thread
From: Mikko Perttunen @ 2021-01-11 12:59 UTC (permalink / raw)
  To: thierry.reding, jonathanh, digetx, airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman, Mikko Perttunen

Hi all,

here's the fifth revision of the Host1x/TegraDRM UAPI proposal,
containing primarily small bug fixes. It has also been
rebased on top of recent linux-next.

vaapi-tegra-driver has been updated to support the new UAPI
as well as Tegra186:

  https://github.com/cyndis/vaapi-tegra-driver

The `putsurface` program has been tested to work.

The test suite for the new UAPI is available at
https://github.com/cyndis/uapi-test

The series can also be found at
https://github.com/cyndis/linux/commits/work/host1x-uapi-v5.

Older versions:
v1: https://www.spinics.net/lists/linux-tegra/msg51000.html
v2: https://www.spinics.net/lists/linux-tegra/msg53061.html
v3: https://www.spinics.net/lists/linux-tegra/msg54370.html
v4: https://www.spinics.net/lists/dri-devel/msg279897.html

Thank you,
Mikko

Mikko Perttunen (21):
  gpu: host1x: Use different lock classes for each client
  gpu: host1x: Allow syncpoints without associated client
  gpu: host1x: Show number of pending waiters in debugfs
  gpu: host1x: Remove cancelled waiters immediately
  gpu: host1x: Use HW-equivalent syncpoint expiration check
  gpu: host1x: Cleanup and refcounting for syncpoints
  gpu: host1x: Introduce UAPI header
  gpu: host1x: Implement /dev/host1x device node
  gpu: host1x: DMA fences and userspace fence creation
  gpu: host1x: Add no-recovery mode
  gpu: host1x: Add job release callback
  gpu: host1x: Add support for syncpoint waits in CDMA pushbuffer
  gpu: host1x: Reset max value when freeing a syncpoint
  gpu: host1x: Reserve VBLANK syncpoints at initialization
  drm/tegra: Add new UAPI to header
  drm/tegra: Boot VIC during runtime PM resume
  drm/tegra: Set resv fields when importing/exporting GEMs
  drm/tegra: Allocate per-engine channel in core code
  drm/tegra: Implement new UAPI
  drm/tegra: Implement job submission part of new UAPI
  drm/tegra: Add job firewall

 drivers/gpu/drm/tegra/Makefile         |   4 +
 drivers/gpu/drm/tegra/dc.c             |  10 +-
 drivers/gpu/drm/tegra/drm.c            |  69 ++--
 drivers/gpu/drm/tegra/drm.h            |   9 +
 drivers/gpu/drm/tegra/gem.c            |   2 +
 drivers/gpu/drm/tegra/gr2d.c           |   4 +-
 drivers/gpu/drm/tegra/gr3d.c           |   4 +-
 drivers/gpu/drm/tegra/uapi.h           |  63 ++++
 drivers/gpu/drm/tegra/uapi/firewall.c  | 221 +++++++++++++
 drivers/gpu/drm/tegra/uapi/gather_bo.c |  86 +++++
 drivers/gpu/drm/tegra/uapi/gather_bo.h |  22 ++
 drivers/gpu/drm/tegra/uapi/submit.c    | 436 +++++++++++++++++++++++++
 drivers/gpu/drm/tegra/uapi/submit.h    |  21 ++
 drivers/gpu/drm/tegra/uapi/uapi.c      | 307 +++++++++++++++++
 drivers/gpu/drm/tegra/vic.c            | 118 ++++---
 drivers/gpu/host1x/Makefile            |   2 +
 drivers/gpu/host1x/bus.c               |   7 +-
 drivers/gpu/host1x/cdma.c              |  69 +++-
 drivers/gpu/host1x/debug.c             |  14 +-
 drivers/gpu/host1x/dev.c               |  15 +
 drivers/gpu/host1x/dev.h               |  16 +-
 drivers/gpu/host1x/fence.c             | 208 ++++++++++++
 drivers/gpu/host1x/fence.h             |  13 +
 drivers/gpu/host1x/hw/cdma_hw.c        |   2 +-
 drivers/gpu/host1x/hw/channel_hw.c     |  63 ++--
 drivers/gpu/host1x/hw/debug_hw.c       |  11 +-
 drivers/gpu/host1x/intr.c              |  32 +-
 drivers/gpu/host1x/intr.h              |   6 +-
 drivers/gpu/host1x/job.c               |  79 +++--
 drivers/gpu/host1x/job.h               |  14 +
 drivers/gpu/host1x/syncpt.c            | 187 ++++++-----
 drivers/gpu/host1x/syncpt.h            |  16 +-
 drivers/gpu/host1x/uapi.c              | 385 ++++++++++++++++++++++
 drivers/gpu/host1x/uapi.h              |  22 ++
 drivers/staging/media/tegra-video/vi.c |   4 +-
 include/linux/host1x.h                 |  47 ++-
 include/uapi/drm/tegra_drm.h           | 338 +++++++++++++++++--
 include/uapi/linux/host1x.h            | 134 ++++++++
 38 files changed, 2771 insertions(+), 289 deletions(-)
 create mode 100644 drivers/gpu/drm/tegra/uapi.h
 create mode 100644 drivers/gpu/drm/tegra/uapi/firewall.c
 create mode 100644 drivers/gpu/drm/tegra/uapi/gather_bo.c
 create mode 100644 drivers/gpu/drm/tegra/uapi/gather_bo.h
 create mode 100644 drivers/gpu/drm/tegra/uapi/submit.c
 create mode 100644 drivers/gpu/drm/tegra/uapi/submit.h
 create mode 100644 drivers/gpu/drm/tegra/uapi/uapi.c
 create mode 100644 drivers/gpu/host1x/fence.c
 create mode 100644 drivers/gpu/host1x/fence.h
 create mode 100644 drivers/gpu/host1x/uapi.c
 create mode 100644 drivers/gpu/host1x/uapi.h
 create mode 100644 include/uapi/linux/host1x.h

-- 
2.30.0

* [PATCH v5 01/21] gpu: host1x: Use different lock classes for each client
  2021-01-11 12:59 ` Mikko Perttunen
@ 2021-01-11 12:59   ` Mikko Perttunen
  -1 siblings, 0 replies; 195+ messages in thread
From: Mikko Perttunen @ 2021-01-11 12:59 UTC (permalink / raw)
  To: thierry.reding, jonathanh, digetx, airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman, Mikko Perttunen

To avoid false lockdep warnings, give each client lock a different
lock class, passed from the initialization site by macro.
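
The per-call-site trick relies on a GCC statement expression: every expansion of the macro defines its own static lock_class_key, so each registration site hands lockdep a distinct key. A minimal userspace sketch of the same pattern (the names `register_with_site_key`/`register_with_key` are illustrative stand-ins, not kernel API):

```c
/* Each expansion of this macro defines its own static object, so every
 * call site passes a distinct address -- the same trick the patch uses
 * to give each host1x client registration its own lock_class_key. */
#define register_with_site_key(name) \
	({ \
		static int __site_key; \
		register_with_key((name), &__site_key); \
	})

static const void *last_key;

/* Stand-in for __host1x_client_register(): records which key address
 * this call site supplied. */
static int register_with_key(const char *name, const void *key)
{
	(void)name;
	last_key = key;
	return 0;
}

static const void *site_a(void) { register_with_site_key("a"); return last_key; }
static const void *site_b(void) { register_with_site_key("b"); return last_key; }
```

Because the static key lives inside the macro expansion, two different callers of `host1x_client_register()` end up in different lockdep classes even though they run the same registration code.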

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
 drivers/gpu/host1x/bus.c | 7 ++++---
 include/linux/host1x.h   | 9 ++++++++-
 2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/host1x/bus.c b/drivers/gpu/host1x/bus.c
index 347fb962b6c9..8fc79e9cb652 100644
--- a/drivers/gpu/host1x/bus.c
+++ b/drivers/gpu/host1x/bus.c
@@ -715,13 +715,14 @@ EXPORT_SYMBOL(host1x_driver_unregister);
  * device and call host1x_device_init(), which will in turn call each client's
  * &host1x_client_ops.init implementation.
  */
-int host1x_client_register(struct host1x_client *client)
+int __host1x_client_register(struct host1x_client *client,
+			   struct lock_class_key *key)
 {
 	struct host1x *host1x;
 	int err;
 
 	INIT_LIST_HEAD(&client->list);
-	mutex_init(&client->lock);
+	__mutex_init(&client->lock, "host1x client lock", key);
 	client->usecount = 0;
 
 	mutex_lock(&devices_lock);
@@ -742,7 +743,7 @@ int host1x_client_register(struct host1x_client *client)
 
 	return 0;
 }
-EXPORT_SYMBOL(host1x_client_register);
+EXPORT_SYMBOL(__host1x_client_register);
 
 /**
  * host1x_client_unregister() - unregister a host1x client
diff --git a/include/linux/host1x.h b/include/linux/host1x.h
index ce59a6a6a008..9eb77c87a83b 100644
--- a/include/linux/host1x.h
+++ b/include/linux/host1x.h
@@ -320,7 +320,14 @@ static inline struct host1x_device *to_host1x_device(struct device *dev)
 int host1x_device_init(struct host1x_device *device);
 int host1x_device_exit(struct host1x_device *device);
 
-int host1x_client_register(struct host1x_client *client);
+int __host1x_client_register(struct host1x_client *client,
+			     struct lock_class_key *key);
+#define host1x_client_register(class) \
+	({ \
+		static struct lock_class_key __key; \
+		__host1x_client_register(class, &__key); \
+	})
+
 int host1x_client_unregister(struct host1x_client *client);
 
 int host1x_client_suspend(struct host1x_client *client);
-- 
2.30.0

* [PATCH v5 02/21] gpu: host1x: Allow syncpoints without associated client
  2021-01-11 12:59 ` Mikko Perttunen
@ 2021-01-11 13:00   ` Mikko Perttunen
  -1 siblings, 0 replies; 195+ messages in thread
From: Mikko Perttunen @ 2021-01-11 13:00 UTC (permalink / raw)
  To: thierry.reding, jonathanh, digetx, airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman, Mikko Perttunen

Syncpoints don't need to be associated with any client,
so remove the property, and expose host1x_syncpt_alloc.
This will allow allocating syncpoints without prior knowledge
of the engine they will be used with.
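
With the client removed from the signature, the caller supplies only a name prefix and host1x_syncpt_alloc() formats the final "<id>-<name>" string itself via kasprintf(). A userspace sketch of that formatting step (`syncpt_full_name` is an illustrative stand-in for the kasprintf() call in the patch):

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Userspace stand-in for kasprintf(GFP_KERNEL, "%u-%s", id, name):
 * returns a freshly allocated "<id>-<name>" string, or NULL on
 * allocation failure, matching the patch's error path. */
static char *syncpt_full_name(unsigned int id, const char *name)
{
	int len = snprintf(NULL, 0, "%u-%s", id, name);
	char *buf;

	if (len < 0)
		return NULL;

	buf = malloc(len + 1);
	if (!buf)
		return NULL;

	snprintf(buf, len + 1, "%u-%s", id, name);
	return buf;
}
```

So host1x_syncpt_request() can keep its old behaviour by passing `dev_name(client->dev)`, while client-less users such as the nop syncpoint pass a literal like "reserved-nop".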

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
v3:
* Clean up host1x_syncpt_alloc signature to allow specifying
  a name for the syncpoint.
* Export the function.
---
 drivers/gpu/host1x/syncpt.c | 22 ++++++++++------------
 drivers/gpu/host1x/syncpt.h |  1 -
 include/linux/host1x.h      |  3 +++
 3 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/host1x/syncpt.c b/drivers/gpu/host1x/syncpt.c
index fce7892d5137..5982fdf64e1c 100644
--- a/drivers/gpu/host1x/syncpt.c
+++ b/drivers/gpu/host1x/syncpt.c
@@ -42,13 +42,13 @@ static void host1x_syncpt_base_free(struct host1x_syncpt_base *base)
 		base->requested = false;
 }
 
-static struct host1x_syncpt *host1x_syncpt_alloc(struct host1x *host,
-						 struct host1x_client *client,
-						 unsigned long flags)
+struct host1x_syncpt *host1x_syncpt_alloc(struct host1x *host,
+					  unsigned long flags,
+					  const char *name)
 {
 	struct host1x_syncpt *sp = host->syncpt;
+	char *full_name;
 	unsigned int i;
-	char *name;
 
 	mutex_lock(&host->syncpt_mutex);
 
@@ -64,13 +64,11 @@ static struct host1x_syncpt *host1x_syncpt_alloc(struct host1x *host,
 			goto unlock;
 	}
 
-	name = kasprintf(GFP_KERNEL, "%02u-%s", sp->id,
-			 client ? dev_name(client->dev) : NULL);
-	if (!name)
+	full_name = kasprintf(GFP_KERNEL, "%u-%s", sp->id, name);
+	if (!full_name)
 		goto free_base;
 
-	sp->client = client;
-	sp->name = name;
+	sp->name = full_name;
 
 	if (flags & HOST1X_SYNCPT_CLIENT_MANAGED)
 		sp->client_managed = true;
@@ -87,6 +85,7 @@ static struct host1x_syncpt *host1x_syncpt_alloc(struct host1x *host,
 	mutex_unlock(&host->syncpt_mutex);
 	return NULL;
 }
+EXPORT_SYMBOL(host1x_syncpt_alloc);
 
 /**
  * host1x_syncpt_id() - retrieve syncpoint ID
@@ -401,7 +400,7 @@ int host1x_syncpt_init(struct host1x *host)
 	host1x_hw_syncpt_enable_protection(host);
 
 	/* Allocate sync point to use for clearing waits for expired fences */
-	host->nop_sp = host1x_syncpt_alloc(host, NULL, 0);
+	host->nop_sp = host1x_syncpt_alloc(host, 0, "reserved-nop");
 	if (!host->nop_sp)
 		return -ENOMEM;
 
@@ -423,7 +422,7 @@ struct host1x_syncpt *host1x_syncpt_request(struct host1x_client *client,
 {
 	struct host1x *host = dev_get_drvdata(client->host->parent);
 
-	return host1x_syncpt_alloc(host, client, flags);
+	return host1x_syncpt_alloc(host, flags, dev_name(client->dev));
 }
 EXPORT_SYMBOL(host1x_syncpt_request);
 
@@ -447,7 +446,6 @@ void host1x_syncpt_free(struct host1x_syncpt *sp)
 	host1x_syncpt_base_free(sp->base);
 	kfree(sp->name);
 	sp->base = NULL;
-	sp->client = NULL;
 	sp->name = NULL;
 	sp->client_managed = false;
 
diff --git a/drivers/gpu/host1x/syncpt.h b/drivers/gpu/host1x/syncpt.h
index 8e1d04dacaa0..3aa6b25b1b9c 100644
--- a/drivers/gpu/host1x/syncpt.h
+++ b/drivers/gpu/host1x/syncpt.h
@@ -33,7 +33,6 @@ struct host1x_syncpt {
 	const char *name;
 	bool client_managed;
 	struct host1x *host;
-	struct host1x_client *client;
 	struct host1x_syncpt_base *base;
 
 	/* interrupt data */
diff --git a/include/linux/host1x.h b/include/linux/host1x.h
index 9eb77c87a83b..7137ce0e35d4 100644
--- a/include/linux/host1x.h
+++ b/include/linux/host1x.h
@@ -154,6 +154,9 @@ int host1x_syncpt_wait(struct host1x_syncpt *sp, u32 thresh, long timeout,
 struct host1x_syncpt *host1x_syncpt_request(struct host1x_client *client,
 					    unsigned long flags);
 void host1x_syncpt_free(struct host1x_syncpt *sp);
+struct host1x_syncpt *host1x_syncpt_alloc(struct host1x *host,
+					  unsigned long flags,
+					  const char *name);
 
 struct host1x_syncpt_base *host1x_syncpt_get_base(struct host1x_syncpt *sp);
 u32 host1x_syncpt_base_id(struct host1x_syncpt_base *base);
-- 
2.30.0

* [PATCH v5 03/21] gpu: host1x: Show number of pending waiters in debugfs
  2021-01-11 12:59 ` Mikko Perttunen
@ 2021-01-11 13:00   ` Mikko Perttunen
  -1 siblings, 0 replies; 195+ messages in thread
From: Mikko Perttunen @ 2021-01-11 13:00 UTC (permalink / raw)
  To: thierry.reding, jonathanh, digetx, airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman, Mikko Perttunen

Show the number of pending waiters in the debugfs status file.
This is useful for testing to verify that waiters do not leak
or accumulate incorrectly.
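
The new debugfs code simply walks each syncpoint's wait list under its interrupt lock and counts entries. A self-contained sketch of that counting loop, using a minimal intrusive list in place of the kernel's `struct list_head` (the helpers below are simplified analogs, not the kernel implementations):

```c
#include <stddef.h>

/* Minimal intrusive circular list, mirroring the kernel's list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *head)
{
	head->next = head->prev = head;
}

static void list_add_tail(struct list_head *item, struct list_head *head)
{
	item->prev = head->prev;
	item->next = head;
	head->prev->next = item;
	head->prev = item;
}

/* Equivalent of the list_for_each() loop in show_syncpts(): walk the
 * wait list and count entries. The real code holds intr.lock around
 * this so waiters cannot be added or removed mid-walk. */
static unsigned int count_waiters(const struct list_head *head)
{
	const struct list_head *pos;
	unsigned int n = 0;

	for (pos = head->next; pos != head; pos = pos->next)
		n++;

	return n;
}
```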

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
 drivers/gpu/host1x/debug.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/host1x/debug.c b/drivers/gpu/host1x/debug.c
index 1b4997bda1c7..8a14880c61bb 100644
--- a/drivers/gpu/host1x/debug.c
+++ b/drivers/gpu/host1x/debug.c
@@ -69,6 +69,7 @@ static int show_channel(struct host1x_channel *ch, void *data, bool show_fifo)
 
 static void show_syncpts(struct host1x *m, struct output *o)
 {
+	struct list_head *pos;
 	unsigned int i;
 
 	host1x_debug_output(o, "---- syncpts ----\n");
@@ -76,12 +77,19 @@ static void show_syncpts(struct host1x *m, struct output *o)
 	for (i = 0; i < host1x_syncpt_nb_pts(m); i++) {
 		u32 max = host1x_syncpt_read_max(m->syncpt + i);
 		u32 min = host1x_syncpt_load(m->syncpt + i);
+		unsigned int waiters = 0;
 
-		if (!min && !max)
+		spin_lock(&m->syncpt[i].intr.lock);
+		list_for_each(pos, &m->syncpt[i].intr.wait_head)
+			waiters++;
+		spin_unlock(&m->syncpt[i].intr.lock);
+
+		if (!min && !max && !waiters)
 			continue;
 
-		host1x_debug_output(o, "id %u (%s) min %d max %d\n",
-				    i, m->syncpt[i].name, min, max);
+		host1x_debug_output(o,
+				    "id %u (%s) min %d max %d (%d waiters)\n",
+				    i, m->syncpt[i].name, min, max, waiters);
 	}
 
 	for (i = 0; i < host1x_syncpt_nb_bases(m); i++) {
-- 
2.30.0

* [PATCH v5 04/21] gpu: host1x: Remove cancelled waiters immediately
  2021-01-11 12:59 ` Mikko Perttunen
@ 2021-01-11 13:00   ` Mikko Perttunen
  -1 siblings, 0 replies; 195+ messages in thread
From: Mikko Perttunen @ 2021-01-11 13:00 UTC (permalink / raw)
  To: thierry.reding, jonathanh, digetx, airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman, Mikko Perttunen

Before this patch, cancelled waiters would only be cleaned up
once their threshold value was reached. Make host1x_intr_put_ref
process the cancellation immediately to fix this.
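
The waiter lifecycle here is a small atomic state machine: put_ref first attempts PENDING -> CANCELLED, then, under the syncpoint's interrupt lock, claims the waiter with CANCELLED -> HANDLED before unlinking it. A userspace sketch of those two transitions using C11 atomics (the enum values and helper names below are illustrative; they mirror but do not reproduce the WLS_* definitions in intr.c):

```c
#include <stdatomic.h>

/* Waiter states, named after the WLS_* values used in intr.c; the
 * numeric values here are illustrative. */
enum wls_state { WLS_PENDING, WLS_CANCELLED, WLS_HANDLED, WLS_REMOVED };

/* cmpxchg helper with kernel-style semantics: atomically set *v to
 * newval if it equals old, and return the value *v held beforehand. */
static int wls_cmpxchg(atomic_int *v, int old, int newval)
{
	int expected = old;

	atomic_compare_exchange_strong(v, &expected, newval);
	return expected;
}

/* Sketch of the put_ref path from the patch: mark a pending waiter
 * cancelled, then (under intr.lock in the real code) claim it by
 * moving it to WLS_HANDLED. Returns 1 if this caller won the race and
 * must unlink the waiter from the wait list, 0 if the interrupt path
 * already handled it. */
static int waiter_cancel(atomic_int *state)
{
	wls_cmpxchg(state, WLS_PENDING, WLS_CANCELLED);
	return wls_cmpxchg(state, WLS_CANCELLED, WLS_HANDLED) == WLS_CANCELLED;
}
```

Only one path can observe the CANCELLED -> HANDLED transition succeed, which is what makes it safe to do the list_del() immediately instead of waiting for the threshold to expire.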

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
v5:
* Add a 'flush' parameter, i.e. wait for all pending waiters to
  complete before returning. This cannot be done unconditionally,
  since the pending waiter might itself be the caller of put_ref.
---
 drivers/gpu/host1x/intr.c   | 23 +++++++++++++++++------
 drivers/gpu/host1x/intr.h   |  4 +++-
 drivers/gpu/host1x/syncpt.c |  2 +-
 3 files changed, 21 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/host1x/intr.c b/drivers/gpu/host1x/intr.c
index 9245add23b5d..70e1096a4fe9 100644
--- a/drivers/gpu/host1x/intr.c
+++ b/drivers/gpu/host1x/intr.c
@@ -242,18 +242,29 @@ int host1x_intr_add_action(struct host1x *host, struct host1x_syncpt *syncpt,
 	return 0;
 }
 
-void host1x_intr_put_ref(struct host1x *host, unsigned int id, void *ref)
+void host1x_intr_put_ref(struct host1x *host, unsigned int id, void *ref,
+			 bool flush)
 {
 	struct host1x_waitlist *waiter = ref;
 	struct host1x_syncpt *syncpt;
 
-	while (atomic_cmpxchg(&waiter->state, WLS_PENDING, WLS_CANCELLED) ==
-	       WLS_REMOVED)
-		schedule();
+	atomic_cmpxchg(&waiter->state, WLS_PENDING, WLS_CANCELLED);
 
 	syncpt = host->syncpt + id;
-	(void)process_wait_list(host, syncpt,
-				host1x_syncpt_load(host->syncpt + id));
+
+	spin_lock(&syncpt->intr.lock);
+	if (atomic_cmpxchg(&waiter->state, WLS_CANCELLED, WLS_HANDLED) ==
+	    WLS_CANCELLED) {
+		list_del(&waiter->list);
+		kref_put(&waiter->refcount, waiter_release);
+	}
+	spin_unlock(&syncpt->intr.lock);
+
+	if (flush) {
+		/* Wait until any concurrently executing handler has finished. */
+		while (atomic_read(&waiter->state) != WLS_HANDLED)
+			cpu_relax();
+	}
 
 	kref_put(&waiter->refcount, waiter_release);
 }
diff --git a/drivers/gpu/host1x/intr.h b/drivers/gpu/host1x/intr.h
index aac38194398f..6ea55e615e3a 100644
--- a/drivers/gpu/host1x/intr.h
+++ b/drivers/gpu/host1x/intr.h
@@ -74,8 +74,10 @@ int host1x_intr_add_action(struct host1x *host, struct host1x_syncpt *syncpt,
  * Unreference an action submitted to host1x_intr_add_action().
  * You must call this if you passed non-NULL as ref.
  * @ref the ref returned from host1x_intr_add_action()
+ * @flush wait until any pending handlers have completed before returning.
  */
-void host1x_intr_put_ref(struct host1x *host, unsigned int id, void *ref);
+void host1x_intr_put_ref(struct host1x *host, unsigned int id, void *ref,
+			 bool flush);
 
 /* Initialize host1x sync point interrupt */
 int host1x_intr_init(struct host1x *host, unsigned int irq_sync);
diff --git a/drivers/gpu/host1x/syncpt.c b/drivers/gpu/host1x/syncpt.c
index 5982fdf64e1c..e48b4595cf53 100644
--- a/drivers/gpu/host1x/syncpt.c
+++ b/drivers/gpu/host1x/syncpt.c
@@ -293,7 +293,7 @@ int host1x_syncpt_wait(struct host1x_syncpt *sp, u32 thresh, long timeout,
 		}
 	}
 
-	host1x_intr_put_ref(sp->host, sp->id, ref);
+	host1x_intr_put_ref(sp->host, sp->id, ref, true);
 
 done:
 	return err;
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 195+ messages in thread

* [PATCH v5 04/21] gpu: host1x: Remove cancelled waiters immediately
@ 2021-01-11 13:00   ` Mikko Perttunen
  0 siblings, 0 replies; 195+ messages in thread
From: Mikko Perttunen @ 2021-01-11 13:00 UTC (permalink / raw)
  To: thierry.reding, jonathanh, digetx, airlied, daniel
  Cc: linux-tegra, talho, bhuntsman, dri-devel, Mikko Perttunen

Before this patch, cancelled waiters would only be cleaned up
once their threshold value was reached. Make host1x_intr_put_ref
process the cancellation immediately to fix this.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
v5:
* Add a 'flush' parameter to wait for all pending waiter handlers
  to complete before returning. This cannot be done unconditionally,
  since a pending handler may itself be the caller of put_ref.
---
 drivers/gpu/host1x/intr.c   | 23 +++++++++++++++++------
 drivers/gpu/host1x/intr.h   |  4 +++-
 drivers/gpu/host1x/syncpt.c |  2 +-
 3 files changed, 21 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/host1x/intr.c b/drivers/gpu/host1x/intr.c
index 9245add23b5d..70e1096a4fe9 100644
--- a/drivers/gpu/host1x/intr.c
+++ b/drivers/gpu/host1x/intr.c
@@ -242,18 +242,29 @@ int host1x_intr_add_action(struct host1x *host, struct host1x_syncpt *syncpt,
 	return 0;
 }
 
-void host1x_intr_put_ref(struct host1x *host, unsigned int id, void *ref)
+void host1x_intr_put_ref(struct host1x *host, unsigned int id, void *ref,
+			 bool flush)
 {
 	struct host1x_waitlist *waiter = ref;
 	struct host1x_syncpt *syncpt;
 
-	while (atomic_cmpxchg(&waiter->state, WLS_PENDING, WLS_CANCELLED) ==
-	       WLS_REMOVED)
-		schedule();
+	atomic_cmpxchg(&waiter->state, WLS_PENDING, WLS_CANCELLED);
 
 	syncpt = host->syncpt + id;
-	(void)process_wait_list(host, syncpt,
-				host1x_syncpt_load(host->syncpt + id));
+
+	spin_lock(&syncpt->intr.lock);
+	if (atomic_cmpxchg(&waiter->state, WLS_CANCELLED, WLS_HANDLED) ==
+	    WLS_CANCELLED) {
+		list_del(&waiter->list);
+		kref_put(&waiter->refcount, waiter_release);
+	}
+	spin_unlock(&syncpt->intr.lock);
+
+	if (flush) {
+		/* Wait until any concurrently executing handler has finished. */
+		while (atomic_read(&waiter->state) != WLS_HANDLED)
+			cpu_relax();
+	}
 
 	kref_put(&waiter->refcount, waiter_release);
 }
diff --git a/drivers/gpu/host1x/intr.h b/drivers/gpu/host1x/intr.h
index aac38194398f..6ea55e615e3a 100644
--- a/drivers/gpu/host1x/intr.h
+++ b/drivers/gpu/host1x/intr.h
@@ -74,8 +74,10 @@ int host1x_intr_add_action(struct host1x *host, struct host1x_syncpt *syncpt,
  * Unreference an action submitted to host1x_intr_add_action().
  * You must call this if you passed non-NULL as ref.
  * @ref the ref returned from host1x_intr_add_action()
+ * @flush wait until any pending handlers have completed before returning.
  */
-void host1x_intr_put_ref(struct host1x *host, unsigned int id, void *ref);
+void host1x_intr_put_ref(struct host1x *host, unsigned int id, void *ref,
+			 bool flush);
 
 /* Initialize host1x sync point interrupt */
 int host1x_intr_init(struct host1x *host, unsigned int irq_sync);
diff --git a/drivers/gpu/host1x/syncpt.c b/drivers/gpu/host1x/syncpt.c
index 5982fdf64e1c..e48b4595cf53 100644
--- a/drivers/gpu/host1x/syncpt.c
+++ b/drivers/gpu/host1x/syncpt.c
@@ -293,7 +293,7 @@ int host1x_syncpt_wait(struct host1x_syncpt *sp, u32 thresh, long timeout,
 		}
 	}
 
-	host1x_intr_put_ref(sp->host, sp->id, ref);
+	host1x_intr_put_ref(sp->host, sp->id, ref, true);
 
 done:
 	return err;
-- 
2.30.0

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 195+ messages in thread
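The state transitions that host1x_intr_put_ref() relies on can be modelled outside the kernel. The sketch below is illustrative only: it substitutes C11 atomics for the kernel's atomic_cmpxchg(), and the helper names (mark_cancelled, remove_if_cancelled) are invented for the example, not part of the driver.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative model of the waiter states used by host1x_intr_put_ref().
 * WLS_REMOVED is what the interrupt handler sets once it starts acting
 * on a waiter, so put_ref must not free a waiter in that state. */
enum { WLS_PENDING, WLS_CANCELLED, WLS_HANDLED, WLS_REMOVED };

/* Step 1 of put_ref: mark a pending waiter cancelled. Does nothing if
 * the handler already picked it up (state is WLS_REMOVED). */
static void mark_cancelled(atomic_int *state)
{
	int expected = WLS_PENDING;

	atomic_compare_exchange_strong(state, &expected, WLS_CANCELLED);
}

/* Step 2 of put_ref (done under the syncpoint lock in the real code):
 * if the waiter is still only cancelled, take it off the list ourselves
 * and mark it handled. Returns true if we removed it, false if the
 * handler owns it and will finish with it. */
static bool remove_if_cancelled(atomic_int *state)
{
	int expected = WLS_CANCELLED;

	return atomic_compare_exchange_strong(state, &expected, WLS_HANDLED);
}
```

The second compare-and-swap failing corresponds to the interrupt handler having already moved the waiter to WLS_REMOVED; that is the case where the optional flush busy-waits for WLS_HANDLED, and why flushing cannot be unconditional when the caller is the handler itself.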

* [PATCH v5 05/21] gpu: host1x: Use HW-equivalent syncpoint expiration check
  2021-01-11 12:59 ` Mikko Perttunen
@ 2021-01-11 13:00   ` Mikko Perttunen
  -1 siblings, 0 replies; 195+ messages in thread
From: Mikko Perttunen @ 2021-01-11 13:00 UTC (permalink / raw)
  To: thierry.reding, jonathanh, digetx, airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman, Mikko Perttunen

Make syncpoint expiration checks always use the same logic the
hardware uses. This eliminates race conditions where the hardware
triggers a syncpoint interrupt but the driver disagrees that the
threshold has been reached.

One situation where this could occur is if a job incremented a
syncpoint too many times -- then the hardware would trigger an
interrupt, but the driver would assume that a syncpoint value
greater than the syncpoint's max value is in the future, and not
clean up the job.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
 drivers/gpu/host1x/syncpt.c | 51 ++-----------------------------------
 1 file changed, 2 insertions(+), 49 deletions(-)

diff --git a/drivers/gpu/host1x/syncpt.c b/drivers/gpu/host1x/syncpt.c
index e48b4595cf53..9ccdf7709946 100644
--- a/drivers/gpu/host1x/syncpt.c
+++ b/drivers/gpu/host1x/syncpt.c
@@ -306,59 +306,12 @@ EXPORT_SYMBOL(host1x_syncpt_wait);
 bool host1x_syncpt_is_expired(struct host1x_syncpt *sp, u32 thresh)
 {
 	u32 current_val;
-	u32 future_val;
 
 	smp_rmb();
 
 	current_val = (u32)atomic_read(&sp->min_val);
-	future_val = (u32)atomic_read(&sp->max_val);
-
-	/* Note the use of unsigned arithmetic here (mod 1<<32).
-	 *
-	 * c = current_val = min_val	= the current value of the syncpoint.
-	 * t = thresh			= the value we are checking
-	 * f = future_val  = max_val	= the value c will reach when all
-	 *				  outstanding increments have completed.
-	 *
-	 * Note that c always chases f until it reaches f.
-	 *
-	 * Dtf = (f - t)
-	 * Dtc = (c - t)
-	 *
-	 *  Consider all cases:
-	 *
-	 *	A) .....c..t..f.....	Dtf < Dtc	need to wait
-	 *	B) .....c.....f..t..	Dtf > Dtc	expired
-	 *	C) ..t..c.....f.....	Dtf > Dtc	expired	   (Dct very large)
-	 *
-	 *  Any case where f==c: always expired (for any t).	Dtf == Dcf
-	 *  Any case where t==c: always expired (for any f).	Dtf >= Dtc (because Dtc==0)
-	 *  Any case where t==f!=c: always wait.		Dtf <  Dtc (because Dtf==0,
-	 *							Dtc!=0)
-	 *
-	 *  Other cases:
-	 *
-	 *	A) .....t..f..c.....	Dtf < Dtc	need to wait
-	 *	A) .....f..c..t.....	Dtf < Dtc	need to wait
-	 *	A) .....f..t..c.....	Dtf > Dtc	expired
-	 *
-	 *   So:
-	 *	   Dtf >= Dtc implies EXPIRED	(return true)
-	 *	   Dtf <  Dtc implies WAIT	(return false)
-	 *
-	 * Note: If t is expired then we *cannot* wait on it. We would wait
-	 * forever (hang the system).
-	 *
-	 * Note: do NOT get clever and remove the -thresh from both sides. It
-	 * is NOT the same.
-	 *
-	 * If future valueis zero, we have a client managed sync point. In that
-	 * case we do a direct comparison.
-	 */
-	if (!host1x_syncpt_client_managed(sp))
-		return future_val - thresh >= current_val - thresh;
-	else
-		return (s32)(current_val - thresh) >= 0;
+
+	return ((current_val - thresh) & 0x80000000U) == 0U;
 }
 
 int host1x_syncpt_init(struct host1x *host)
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 195+ messages in thread
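The single expression this patch installs is a standard wraparound-safe window check: a threshold is treated as expired when it lies at most 2^31 - 1 increments behind the current value, modulo 2^32. A minimal userspace sketch of the same expression (illustrative; syncpt_is_expired here is a free function taking plain values, not the driver's):

```c
#include <stdint.h>

/* Wraparound-safe expiration check, same expression the patch installs:
 * thresh is expired when (current_val - thresh) mod 2^32 has its top
 * bit clear, i.e. thresh is no more than 2^31 - 1 increments behind
 * current_val. */
static int syncpt_is_expired(uint32_t current_val, uint32_t thresh)
{
	return ((current_val - thresh) & 0x80000000U) == 0U;
}
```

Note that the check works across the 32-bit wrap: a current value of 3 counts a threshold of 0xFFFFFFF0 as expired, because the unsigned difference is small.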

* [PATCH v5 06/21] gpu: host1x: Cleanup and refcounting for syncpoints
  2021-01-11 12:59 ` Mikko Perttunen
@ 2021-01-11 13:00   ` Mikko Perttunen
  -1 siblings, 0 replies; 195+ messages in thread
From: Mikko Perttunen @ 2021-01-11 13:00 UTC (permalink / raw)
  To: thierry.reding, jonathanh, digetx, airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman, Mikko Perttunen

Add reference counting for allocated syncpoints to allow keeping
them allocated while jobs are referencing them. Additionally,
clean up various places using syncpoint IDs to use host1x_syncpt
pointers instead.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
v5:
- Remove host1x_syncpt_put in submit code, as job_put already
  puts the syncpoint.
- Changes due to rebase in VI driver.
v4:
- Update from _free to _put in VI driver as well
---
 drivers/gpu/drm/tegra/dc.c             |  4 +-
 drivers/gpu/drm/tegra/drm.c            | 14 ++---
 drivers/gpu/drm/tegra/gr2d.c           |  4 +-
 drivers/gpu/drm/tegra/gr3d.c           |  4 +-
 drivers/gpu/drm/tegra/vic.c            |  4 +-
 drivers/gpu/host1x/cdma.c              | 11 ++--
 drivers/gpu/host1x/dev.h               |  7 ++-
 drivers/gpu/host1x/hw/cdma_hw.c        |  2 +-
 drivers/gpu/host1x/hw/channel_hw.c     | 10 ++--
 drivers/gpu/host1x/hw/debug_hw.c       |  2 +-
 drivers/gpu/host1x/job.c               |  5 +-
 drivers/gpu/host1x/syncpt.c            | 75 +++++++++++++++++++-------
 drivers/gpu/host1x/syncpt.h            |  3 ++
 drivers/staging/media/tegra-video/vi.c |  4 +-
 include/linux/host1x.h                 |  8 +--
 15 files changed, 98 insertions(+), 59 deletions(-)

diff --git a/drivers/gpu/drm/tegra/dc.c b/drivers/gpu/drm/tegra/dc.c
index 85dd7131553a..033032dfc4b9 100644
--- a/drivers/gpu/drm/tegra/dc.c
+++ b/drivers/gpu/drm/tegra/dc.c
@@ -2129,7 +2129,7 @@ static int tegra_dc_init(struct host1x_client *client)
 		drm_plane_cleanup(primary);
 
 	host1x_client_iommu_detach(client);
-	host1x_syncpt_free(dc->syncpt);
+	host1x_syncpt_put(dc->syncpt);
 
 	return err;
 }
@@ -2154,7 +2154,7 @@ static int tegra_dc_exit(struct host1x_client *client)
 	}
 
 	host1x_client_iommu_detach(client);
-	host1x_syncpt_free(dc->syncpt);
+	host1x_syncpt_put(dc->syncpt);
 
 	return 0;
 }
diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
index e45c8414e2a3..5a6037eff37f 100644
--- a/drivers/gpu/drm/tegra/drm.c
+++ b/drivers/gpu/drm/tegra/drm.c
@@ -171,7 +171,7 @@ int tegra_drm_submit(struct tegra_drm_context *context,
 	struct drm_tegra_syncpt syncpt;
 	struct host1x *host1x = dev_get_drvdata(drm->dev->parent);
 	struct drm_gem_object **refs;
-	struct host1x_syncpt *sp;
+	struct host1x_syncpt *sp = NULL;
 	struct host1x_job *job;
 	unsigned int num_refs;
 	int err;
@@ -298,8 +298,8 @@ int tegra_drm_submit(struct tegra_drm_context *context,
 		goto fail;
 	}
 
-	/* check whether syncpoint ID is valid */
-	sp = host1x_syncpt_get(host1x, syncpt.id);
+	/* Syncpoint ref will be dropped on job release. */
+	sp = host1x_syncpt_get_by_id(host1x, syncpt.id);
 	if (!sp) {
 		err = -ENOENT;
 		goto fail;
@@ -308,7 +308,7 @@ int tegra_drm_submit(struct tegra_drm_context *context,
 	job->is_addr_reg = context->client->ops->is_addr_reg;
 	job->is_valid_class = context->client->ops->is_valid_class;
 	job->syncpt_incrs = syncpt.incrs;
-	job->syncpt_id = syncpt.id;
+	job->syncpt = sp;
 	job->timeout = 10000;
 
 	if (args->timeout && args->timeout < 10000)
@@ -380,7 +380,7 @@ static int tegra_syncpt_read(struct drm_device *drm, void *data,
 	struct drm_tegra_syncpt_read *args = data;
 	struct host1x_syncpt *sp;
 
-	sp = host1x_syncpt_get(host, args->id);
+	sp = host1x_syncpt_get_by_id_noref(host, args->id);
 	if (!sp)
 		return -EINVAL;
 
@@ -395,7 +395,7 @@ static int tegra_syncpt_incr(struct drm_device *drm, void *data,
 	struct drm_tegra_syncpt_incr *args = data;
 	struct host1x_syncpt *sp;
 
-	sp = host1x_syncpt_get(host1x, args->id);
+	sp = host1x_syncpt_get_by_id_noref(host1x, args->id);
 	if (!sp)
 		return -EINVAL;
 
@@ -409,7 +409,7 @@ static int tegra_syncpt_wait(struct drm_device *drm, void *data,
 	struct drm_tegra_syncpt_wait *args = data;
 	struct host1x_syncpt *sp;
 
-	sp = host1x_syncpt_get(host1x, args->id);
+	sp = host1x_syncpt_get_by_id_noref(host1x, args->id);
 	if (!sp)
 		return -EINVAL;
 
diff --git a/drivers/gpu/drm/tegra/gr2d.c b/drivers/gpu/drm/tegra/gr2d.c
index 1a0d3ba6e525..d857a99b21a7 100644
--- a/drivers/gpu/drm/tegra/gr2d.c
+++ b/drivers/gpu/drm/tegra/gr2d.c
@@ -67,7 +67,7 @@ static int gr2d_init(struct host1x_client *client)
 detach:
 	host1x_client_iommu_detach(client);
 free:
-	host1x_syncpt_free(client->syncpts[0]);
+	host1x_syncpt_put(client->syncpts[0]);
 put:
 	host1x_channel_put(gr2d->channel);
 	return err;
@@ -86,7 +86,7 @@ static int gr2d_exit(struct host1x_client *client)
 		return err;
 
 	host1x_client_iommu_detach(client);
-	host1x_syncpt_free(client->syncpts[0]);
+	host1x_syncpt_put(client->syncpts[0]);
 	host1x_channel_put(gr2d->channel);
 
 	return 0;
diff --git a/drivers/gpu/drm/tegra/gr3d.c b/drivers/gpu/drm/tegra/gr3d.c
index b0b8154e8104..24442ade0da3 100644
--- a/drivers/gpu/drm/tegra/gr3d.c
+++ b/drivers/gpu/drm/tegra/gr3d.c
@@ -76,7 +76,7 @@ static int gr3d_init(struct host1x_client *client)
 detach:
 	host1x_client_iommu_detach(client);
 free:
-	host1x_syncpt_free(client->syncpts[0]);
+	host1x_syncpt_put(client->syncpts[0]);
 put:
 	host1x_channel_put(gr3d->channel);
 	return err;
@@ -94,7 +94,7 @@ static int gr3d_exit(struct host1x_client *client)
 		return err;
 
 	host1x_client_iommu_detach(client);
-	host1x_syncpt_free(client->syncpts[0]);
+	host1x_syncpt_put(client->syncpts[0]);
 	host1x_channel_put(gr3d->channel);
 
 	return 0;
diff --git a/drivers/gpu/drm/tegra/vic.c b/drivers/gpu/drm/tegra/vic.c
index ade56b860cf9..cb476da59adc 100644
--- a/drivers/gpu/drm/tegra/vic.c
+++ b/drivers/gpu/drm/tegra/vic.c
@@ -197,7 +197,7 @@ static int vic_init(struct host1x_client *client)
 	return 0;
 
 free_syncpt:
-	host1x_syncpt_free(client->syncpts[0]);
+	host1x_syncpt_put(client->syncpts[0]);
 free_channel:
 	host1x_channel_put(vic->channel);
 detach:
@@ -221,7 +221,7 @@ static int vic_exit(struct host1x_client *client)
 	if (err < 0)
 		return err;
 
-	host1x_syncpt_free(client->syncpts[0]);
+	host1x_syncpt_put(client->syncpts[0]);
 	host1x_channel_put(vic->channel);
 	host1x_client_iommu_detach(client);
 
diff --git a/drivers/gpu/host1x/cdma.c b/drivers/gpu/host1x/cdma.c
index e8d3fda91d8a..6e6ca774f68d 100644
--- a/drivers/gpu/host1x/cdma.c
+++ b/drivers/gpu/host1x/cdma.c
@@ -273,15 +273,13 @@ static int host1x_cdma_wait_pushbuffer_space(struct host1x *host1x,
 static void cdma_start_timer_locked(struct host1x_cdma *cdma,
 				    struct host1x_job *job)
 {
-	struct host1x *host = cdma_to_host1x(cdma);
-
 	if (cdma->timeout.client) {
 		/* timer already started */
 		return;
 	}
 
 	cdma->timeout.client = job->client;
-	cdma->timeout.syncpt = host1x_syncpt_get(host, job->syncpt_id);
+	cdma->timeout.syncpt = job->syncpt;
 	cdma->timeout.syncpt_val = job->syncpt_end;
 	cdma->timeout.start_ktime = ktime_get();
 
@@ -312,7 +310,6 @@ static void stop_cdma_timer_locked(struct host1x_cdma *cdma)
 static void update_cdma_locked(struct host1x_cdma *cdma)
 {
 	bool signal = false;
-	struct host1x *host1x = cdma_to_host1x(cdma);
 	struct host1x_job *job, *n;
 
 	/* If CDMA is stopped, queue is cleared and we can return */
@@ -324,8 +321,7 @@ static void update_cdma_locked(struct host1x_cdma *cdma)
 	 * to consume as many sync queue entries as possible without blocking
 	 */
 	list_for_each_entry_safe(job, n, &cdma->sync_queue, list) {
-		struct host1x_syncpt *sp =
-			host1x_syncpt_get(host1x, job->syncpt_id);
+		struct host1x_syncpt *sp = job->syncpt;
 
 		/* Check whether this syncpt has completed, and bail if not */
 		if (!host1x_syncpt_is_expired(sp, job->syncpt_end)) {
@@ -499,8 +495,7 @@ int host1x_cdma_begin(struct host1x_cdma *cdma, struct host1x_job *job)
 		if (!cdma->timeout.initialized) {
 			int err;
 
-			err = host1x_hw_cdma_timeout_init(host1x, cdma,
-							  job->syncpt_id);
+			err = host1x_hw_cdma_timeout_init(host1x, cdma);
 			if (err) {
 				mutex_unlock(&cdma->lock);
 				return err;
diff --git a/drivers/gpu/host1x/dev.h b/drivers/gpu/host1x/dev.h
index f781a9b0f39d..63010ae37a97 100644
--- a/drivers/gpu/host1x/dev.h
+++ b/drivers/gpu/host1x/dev.h
@@ -37,7 +37,7 @@ struct host1x_cdma_ops {
 	void (*start)(struct host1x_cdma *cdma);
 	void (*stop)(struct host1x_cdma *cdma);
 	void (*flush)(struct  host1x_cdma *cdma);
-	int (*timeout_init)(struct host1x_cdma *cdma, unsigned int syncpt);
+	int (*timeout_init)(struct host1x_cdma *cdma);
 	void (*timeout_destroy)(struct host1x_cdma *cdma);
 	void (*freeze)(struct host1x_cdma *cdma);
 	void (*resume)(struct host1x_cdma *cdma, u32 getptr);
@@ -261,10 +261,9 @@ static inline void host1x_hw_cdma_flush(struct host1x *host,
 }
 
 static inline int host1x_hw_cdma_timeout_init(struct host1x *host,
-					      struct host1x_cdma *cdma,
-					      unsigned int syncpt)
+					      struct host1x_cdma *cdma)
 {
-	return host->cdma_op->timeout_init(cdma, syncpt);
+	return host->cdma_op->timeout_init(cdma);
 }
 
 static inline void host1x_hw_cdma_timeout_destroy(struct host1x *host,
diff --git a/drivers/gpu/host1x/hw/cdma_hw.c b/drivers/gpu/host1x/hw/cdma_hw.c
index 2f3bf94cf365..e49cd5b8f735 100644
--- a/drivers/gpu/host1x/hw/cdma_hw.c
+++ b/drivers/gpu/host1x/hw/cdma_hw.c
@@ -295,7 +295,7 @@ static void cdma_timeout_handler(struct work_struct *work)
 /*
  * Init timeout resources
  */
-static int cdma_timeout_init(struct host1x_cdma *cdma, unsigned int syncpt)
+static int cdma_timeout_init(struct host1x_cdma *cdma)
 {
 	INIT_DELAYED_WORK(&cdma->timeout.wq, cdma_timeout_handler);
 	cdma->timeout.initialized = true;
diff --git a/drivers/gpu/host1x/hw/channel_hw.c b/drivers/gpu/host1x/hw/channel_hw.c
index 5eaa29d171c9..d4c28faf27d1 100644
--- a/drivers/gpu/host1x/hw/channel_hw.c
+++ b/drivers/gpu/host1x/hw/channel_hw.c
@@ -86,8 +86,7 @@ static void submit_gathers(struct host1x_job *job)
 
 static inline void synchronize_syncpt_base(struct host1x_job *job)
 {
-	struct host1x *host = dev_get_drvdata(job->channel->dev->parent);
-	struct host1x_syncpt *sp = host->syncpt + job->syncpt_id;
+	struct host1x_syncpt *sp = job->syncpt;
 	unsigned int id;
 	u32 value;
 
@@ -118,7 +117,7 @@ static void host1x_channel_set_streamid(struct host1x_channel *channel)
 static int channel_submit(struct host1x_job *job)
 {
 	struct host1x_channel *ch = job->channel;
-	struct host1x_syncpt *sp;
+	struct host1x_syncpt *sp = job->syncpt;
 	u32 user_syncpt_incrs = job->syncpt_incrs;
 	u32 prev_max = 0;
 	u32 syncval;
@@ -126,10 +125,9 @@ static int channel_submit(struct host1x_job *job)
 	struct host1x_waitlist *completed_waiter = NULL;
 	struct host1x *host = dev_get_drvdata(ch->dev->parent);
 
-	sp = host->syncpt + job->syncpt_id;
 	trace_host1x_channel_submit(dev_name(ch->dev),
 				    job->num_gathers, job->num_relocs,
-				    job->syncpt_id, job->syncpt_incrs);
+				    job->syncpt->id, job->syncpt_incrs);
 
 	/* before error checks, return current max */
 	prev_max = job->syncpt_end = host1x_syncpt_read_max(sp);
@@ -163,7 +161,7 @@ static int channel_submit(struct host1x_job *job)
 		host1x_cdma_push(&ch->cdma,
 				 host1x_opcode_setclass(HOST1X_CLASS_HOST1X,
 					host1x_uclass_wait_syncpt_r(), 1),
-				 host1x_class_host_wait_syncpt(job->syncpt_id,
+				 host1x_class_host_wait_syncpt(job->syncpt->id,
 					host1x_syncpt_read_max(sp)));
 	}
 
diff --git a/drivers/gpu/host1x/hw/debug_hw.c b/drivers/gpu/host1x/hw/debug_hw.c
index f31bcfa1b837..ceb48229d14b 100644
--- a/drivers/gpu/host1x/hw/debug_hw.c
+++ b/drivers/gpu/host1x/hw/debug_hw.c
@@ -204,7 +204,7 @@ static void show_channel_gathers(struct output *o, struct host1x_cdma *cdma)
 		unsigned int i;
 
 		host1x_debug_output(o, "\n%p: JOB, syncpt_id=%d, syncpt_val=%d, first_get=%08x, timeout=%d num_slots=%d, num_handles=%d\n",
-				    job, job->syncpt_id, job->syncpt_end,
+				    job, job->syncpt->id, job->syncpt_end,
 				    job->first_get, job->timeout,
 				    job->num_slots, job->num_unpins);
 
diff --git a/drivers/gpu/host1x/job.c b/drivers/gpu/host1x/job.c
index 82d0a60ba3f7..adbdc225de8d 100644
--- a/drivers/gpu/host1x/job.c
+++ b/drivers/gpu/host1x/job.c
@@ -79,6 +79,9 @@ static void job_free(struct kref *ref)
 {
 	struct host1x_job *job = container_of(ref, struct host1x_job, ref);
 
+	if (job->syncpt)
+		host1x_syncpt_put(job->syncpt);
+
 	kfree(job);
 }
 
@@ -674,7 +677,7 @@ EXPORT_SYMBOL(host1x_job_unpin);
  */
 void host1x_job_dump(struct device *dev, struct host1x_job *job)
 {
-	dev_dbg(dev, "    SYNCPT_ID   %d\n", job->syncpt_id);
+	dev_dbg(dev, "    SYNCPT_ID   %d\n", job->syncpt->id);
 	dev_dbg(dev, "    SYNCPT_VAL  %d\n", job->syncpt_end);
 	dev_dbg(dev, "    FIRST_GET   0x%x\n", job->first_get);
 	dev_dbg(dev, "    TIMEOUT     %d\n", job->timeout);
diff --git a/drivers/gpu/host1x/syncpt.c b/drivers/gpu/host1x/syncpt.c
index 9ccdf7709946..8a189a7c8d68 100644
--- a/drivers/gpu/host1x/syncpt.c
+++ b/drivers/gpu/host1x/syncpt.c
@@ -75,6 +75,8 @@ struct host1x_syncpt *host1x_syncpt_alloc(struct host1x *host,
 	else
 		sp->client_managed = false;
 
+	kref_init(&sp->ref);
+
 	mutex_unlock(&host->syncpt_mutex);
 	return sp;
 
@@ -368,7 +370,7 @@ int host1x_syncpt_init(struct host1x *host)
  * host1x client drivers can use this function to allocate a syncpoint for
  * subsequent use. A syncpoint returned by this function will be reserved for
  * use by the client exclusively. When no longer using a syncpoint, a host1x
- * client driver needs to release it using host1x_syncpt_free().
+ * client driver needs to release it using host1x_syncpt_put().
  */
 struct host1x_syncpt *host1x_syncpt_request(struct host1x_client *client,
 					    unsigned long flags)
@@ -379,20 +381,9 @@ struct host1x_syncpt *host1x_syncpt_request(struct host1x_client *client,
 }
 EXPORT_SYMBOL(host1x_syncpt_request);
 
-/**
- * host1x_syncpt_free() - free a requested syncpoint
- * @sp: host1x syncpoint
- *
- * Release a syncpoint previously allocated using host1x_syncpt_request(). A
- * host1x client driver should call this when the syncpoint is no longer in
- * use. Note that client drivers must ensure that the syncpoint doesn't remain
- * under the control of hardware after calling this function, otherwise two
- * clients may end up trying to access the same syncpoint concurrently.
- */
-void host1x_syncpt_free(struct host1x_syncpt *sp)
+static void syncpt_release(struct kref *ref)
 {
-	if (!sp)
-		return;
+	struct host1x_syncpt *sp = container_of(ref, struct host1x_syncpt, ref);
 
 	mutex_lock(&sp->host->syncpt_mutex);
 
@@ -404,7 +395,23 @@ void host1x_syncpt_free(struct host1x_syncpt *sp)
 
 	mutex_unlock(&sp->host->syncpt_mutex);
 }
-EXPORT_SYMBOL(host1x_syncpt_free);
+
+/**
+ * host1x_syncpt_put() - free a requested syncpoint
+ * @sp: host1x syncpoint
+ *
+ * Release a syncpoint previously allocated using host1x_syncpt_request(). A
+ * host1x client driver should call this when the syncpoint is no longer in
+ * use.
+ */
+void host1x_syncpt_put(struct host1x_syncpt *sp)
+{
+	if (!sp)
+		return;
+
+	kref_put(&sp->ref, syncpt_release);
+}
+EXPORT_SYMBOL(host1x_syncpt_put);
 
 void host1x_syncpt_deinit(struct host1x *host)
 {
@@ -471,16 +478,48 @@ unsigned int host1x_syncpt_nb_mlocks(struct host1x *host)
 }
 
 /**
- * host1x_syncpt_get() - obtain a syncpoint by ID
+ * host1x_syncpt_get_by_id() - obtain a syncpoint by ID
+ * @host: host1x controller
+ * @id: syncpoint ID
+ */
+struct host1x_syncpt *host1x_syncpt_get_by_id(struct host1x *host,
+					      unsigned int id)
+{
+	if (id >= host->info->nb_pts)
+		return NULL;
+
+	if (kref_get_unless_zero(&host->syncpt[id].ref))
+		return &host->syncpt[id];
+	else
+		return NULL;
+}
+EXPORT_SYMBOL(host1x_syncpt_get_by_id);
+
+/**
+ * host1x_syncpt_get_by_id_noref() - obtain a syncpoint by ID but don't
+ * 	increase the refcount.
  * @host: host1x controller
  * @id: syncpoint ID
  */
-struct host1x_syncpt *host1x_syncpt_get(struct host1x *host, unsigned int id)
+struct host1x_syncpt *host1x_syncpt_get_by_id_noref(struct host1x *host,
+						    unsigned int id)
 {
 	if (id >= host->info->nb_pts)
 		return NULL;
 
-	return host->syncpt + id;
+	return &host->syncpt[id];
+}
+EXPORT_SYMBOL(host1x_syncpt_get_by_id_noref);
+
+/**
+ * host1x_syncpt_get() - increment syncpoint refcount
+ * @sp: syncpoint
+ */
+struct host1x_syncpt *host1x_syncpt_get(struct host1x_syncpt *sp)
+{
+	kref_get(&sp->ref);
+
+	return sp;
 }
 EXPORT_SYMBOL(host1x_syncpt_get);
 
diff --git a/drivers/gpu/host1x/syncpt.h b/drivers/gpu/host1x/syncpt.h
index 3aa6b25b1b9c..a6766f8d55ee 100644
--- a/drivers/gpu/host1x/syncpt.h
+++ b/drivers/gpu/host1x/syncpt.h
@@ -11,6 +11,7 @@
 #include <linux/atomic.h>
 #include <linux/host1x.h>
 #include <linux/kernel.h>
+#include <linux/kref.h>
 #include <linux/sched.h>
 
 #include "intr.h"
@@ -26,6 +27,8 @@ struct host1x_syncpt_base {
 };
 
 struct host1x_syncpt {
+	struct kref ref;
+
 	unsigned int id;
 	atomic_t min_val;
 	atomic_t max_val;
diff --git a/drivers/staging/media/tegra-video/vi.c b/drivers/staging/media/tegra-video/vi.c
index 70e1e18644b2..e14be37cdb01 100644
--- a/drivers/staging/media/tegra-video/vi.c
+++ b/drivers/staging/media/tegra-video/vi.c
@@ -1131,8 +1131,8 @@ static void tegra_channel_host1x_syncpts_free(struct tegra_vi_channel *chan)
 	int i;
 
 	for (i = 0; i < chan->numgangports; i++) {
-		host1x_syncpt_free(chan->mw_ack_sp[i]);
-		host1x_syncpt_free(chan->frame_start_sp[i]);
+		host1x_syncpt_put(chan->mw_ack_sp[i]);
+		host1x_syncpt_put(chan->frame_start_sp[i]);
 	}
 }
 
diff --git a/include/linux/host1x.h b/include/linux/host1x.h
index 7137ce0e35d4..107aea29bccb 100644
--- a/include/linux/host1x.h
+++ b/include/linux/host1x.h
@@ -142,7 +142,9 @@ struct host1x_syncpt_base;
 struct host1x_syncpt;
 struct host1x;
 
-struct host1x_syncpt *host1x_syncpt_get(struct host1x *host, u32 id);
+struct host1x_syncpt *host1x_syncpt_get_by_id(struct host1x *host, u32 id);
+struct host1x_syncpt *host1x_syncpt_get_by_id_noref(struct host1x *host, u32 id);
+struct host1x_syncpt *host1x_syncpt_get(struct host1x_syncpt *sp);
 u32 host1x_syncpt_id(struct host1x_syncpt *sp);
 u32 host1x_syncpt_read_min(struct host1x_syncpt *sp);
 u32 host1x_syncpt_read_max(struct host1x_syncpt *sp);
@@ -153,7 +155,7 @@ int host1x_syncpt_wait(struct host1x_syncpt *sp, u32 thresh, long timeout,
 		       u32 *value);
 struct host1x_syncpt *host1x_syncpt_request(struct host1x_client *client,
 					    unsigned long flags);
-void host1x_syncpt_free(struct host1x_syncpt *sp);
+void host1x_syncpt_put(struct host1x_syncpt *sp);
 struct host1x_syncpt *host1x_syncpt_alloc(struct host1x *host,
 					  unsigned long flags,
 					  const char *name);
@@ -221,7 +223,7 @@ struct host1x_job {
 	dma_addr_t *reloc_addr_phys;
 
 	/* Sync point id, number of increments and end related to the submit */
-	u32 syncpt_id;
+	struct host1x_syncpt *syncpt;
 	u32 syncpt_incrs;
 	u32 syncpt_end;
 
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 195+ messages in thread
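host1x_syncpt_get_by_id() above depends on kref_get_unless_zero() semantics: a lookup must fail rather than revive a syncpoint whose last reference is concurrently being dropped. A userspace model of that rule (illustrative only; ref_get_unless_zero and ref_put are invented names standing in for the kref API):

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Minimal model of kref_get_unless_zero(): atomically take a reference
 * only while the object still holds at least one; a refcount that has
 * reached zero (object being released) must never be resurrected. */
static bool ref_get_unless_zero(atomic_int *refcount)
{
	int old = atomic_load(refcount);

	while (old != 0) {
		if (atomic_compare_exchange_weak(refcount, &old, old + 1))
			return true;
		/* old was reloaded by the failed CAS; retry */
	}

	return false;
}

/* Drop a reference; returns true when the caller must release the
 * object, mirroring kref_put() invoking its release callback. */
static bool ref_put(atomic_int *refcount)
{
	return atomic_fetch_sub(refcount, 1) == 1;
}
```

In the patch, the release path (syncpt_release) runs under syncpt_mutex, so a concurrent get_by_id either wins the race and gets a valid reference or sees zero and returns NULL.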

 free:
-	host1x_syncpt_free(client->syncpts[0]);
+	host1x_syncpt_put(client->syncpts[0]);
 put:
 	host1x_channel_put(gr2d->channel);
 	return err;
@@ -86,7 +86,7 @@ static int gr2d_exit(struct host1x_client *client)
 		return err;
 
 	host1x_client_iommu_detach(client);
-	host1x_syncpt_free(client->syncpts[0]);
+	host1x_syncpt_put(client->syncpts[0]);
 	host1x_channel_put(gr2d->channel);
 
 	return 0;
diff --git a/drivers/gpu/drm/tegra/gr3d.c b/drivers/gpu/drm/tegra/gr3d.c
index b0b8154e8104..24442ade0da3 100644
--- a/drivers/gpu/drm/tegra/gr3d.c
+++ b/drivers/gpu/drm/tegra/gr3d.c
@@ -76,7 +76,7 @@ static int gr3d_init(struct host1x_client *client)
 detach:
 	host1x_client_iommu_detach(client);
 free:
-	host1x_syncpt_free(client->syncpts[0]);
+	host1x_syncpt_put(client->syncpts[0]);
 put:
 	host1x_channel_put(gr3d->channel);
 	return err;
@@ -94,7 +94,7 @@ static int gr3d_exit(struct host1x_client *client)
 		return err;
 
 	host1x_client_iommu_detach(client);
-	host1x_syncpt_free(client->syncpts[0]);
+	host1x_syncpt_put(client->syncpts[0]);
 	host1x_channel_put(gr3d->channel);
 
 	return 0;
diff --git a/drivers/gpu/drm/tegra/vic.c b/drivers/gpu/drm/tegra/vic.c
index ade56b860cf9..cb476da59adc 100644
--- a/drivers/gpu/drm/tegra/vic.c
+++ b/drivers/gpu/drm/tegra/vic.c
@@ -197,7 +197,7 @@ static int vic_init(struct host1x_client *client)
 	return 0;
 
 free_syncpt:
-	host1x_syncpt_free(client->syncpts[0]);
+	host1x_syncpt_put(client->syncpts[0]);
 free_channel:
 	host1x_channel_put(vic->channel);
 detach:
@@ -221,7 +221,7 @@ static int vic_exit(struct host1x_client *client)
 	if (err < 0)
 		return err;
 
-	host1x_syncpt_free(client->syncpts[0]);
+	host1x_syncpt_put(client->syncpts[0]);
 	host1x_channel_put(vic->channel);
 	host1x_client_iommu_detach(client);
 
diff --git a/drivers/gpu/host1x/cdma.c b/drivers/gpu/host1x/cdma.c
index e8d3fda91d8a..6e6ca774f68d 100644
--- a/drivers/gpu/host1x/cdma.c
+++ b/drivers/gpu/host1x/cdma.c
@@ -273,15 +273,13 @@ static int host1x_cdma_wait_pushbuffer_space(struct host1x *host1x,
 static void cdma_start_timer_locked(struct host1x_cdma *cdma,
 				    struct host1x_job *job)
 {
-	struct host1x *host = cdma_to_host1x(cdma);
-
 	if (cdma->timeout.client) {
 		/* timer already started */
 		return;
 	}
 
 	cdma->timeout.client = job->client;
-	cdma->timeout.syncpt = host1x_syncpt_get(host, job->syncpt_id);
+	cdma->timeout.syncpt = job->syncpt;
 	cdma->timeout.syncpt_val = job->syncpt_end;
 	cdma->timeout.start_ktime = ktime_get();
 
@@ -312,7 +310,6 @@ static void stop_cdma_timer_locked(struct host1x_cdma *cdma)
 static void update_cdma_locked(struct host1x_cdma *cdma)
 {
 	bool signal = false;
-	struct host1x *host1x = cdma_to_host1x(cdma);
 	struct host1x_job *job, *n;
 
 	/* If CDMA is stopped, queue is cleared and we can return */
@@ -324,8 +321,7 @@ static void update_cdma_locked(struct host1x_cdma *cdma)
 	 * to consume as many sync queue entries as possible without blocking
 	 */
 	list_for_each_entry_safe(job, n, &cdma->sync_queue, list) {
-		struct host1x_syncpt *sp =
-			host1x_syncpt_get(host1x, job->syncpt_id);
+		struct host1x_syncpt *sp = job->syncpt;
 
 		/* Check whether this syncpt has completed, and bail if not */
 		if (!host1x_syncpt_is_expired(sp, job->syncpt_end)) {
@@ -499,8 +495,7 @@ int host1x_cdma_begin(struct host1x_cdma *cdma, struct host1x_job *job)
 		if (!cdma->timeout.initialized) {
 			int err;
 
-			err = host1x_hw_cdma_timeout_init(host1x, cdma,
-							  job->syncpt_id);
+			err = host1x_hw_cdma_timeout_init(host1x, cdma);
 			if (err) {
 				mutex_unlock(&cdma->lock);
 				return err;
diff --git a/drivers/gpu/host1x/dev.h b/drivers/gpu/host1x/dev.h
index f781a9b0f39d..63010ae37a97 100644
--- a/drivers/gpu/host1x/dev.h
+++ b/drivers/gpu/host1x/dev.h
@@ -37,7 +37,7 @@ struct host1x_cdma_ops {
 	void (*start)(struct host1x_cdma *cdma);
 	void (*stop)(struct host1x_cdma *cdma);
 	void (*flush)(struct  host1x_cdma *cdma);
-	int (*timeout_init)(struct host1x_cdma *cdma, unsigned int syncpt);
+	int (*timeout_init)(struct host1x_cdma *cdma);
 	void (*timeout_destroy)(struct host1x_cdma *cdma);
 	void (*freeze)(struct host1x_cdma *cdma);
 	void (*resume)(struct host1x_cdma *cdma, u32 getptr);
@@ -261,10 +261,9 @@ static inline void host1x_hw_cdma_flush(struct host1x *host,
 }
 
 static inline int host1x_hw_cdma_timeout_init(struct host1x *host,
-					      struct host1x_cdma *cdma,
-					      unsigned int syncpt)
+					      struct host1x_cdma *cdma)
 {
-	return host->cdma_op->timeout_init(cdma, syncpt);
+	return host->cdma_op->timeout_init(cdma);
 }
 
 static inline void host1x_hw_cdma_timeout_destroy(struct host1x *host,
diff --git a/drivers/gpu/host1x/hw/cdma_hw.c b/drivers/gpu/host1x/hw/cdma_hw.c
index 2f3bf94cf365..e49cd5b8f735 100644
--- a/drivers/gpu/host1x/hw/cdma_hw.c
+++ b/drivers/gpu/host1x/hw/cdma_hw.c
@@ -295,7 +295,7 @@ static void cdma_timeout_handler(struct work_struct *work)
 /*
  * Init timeout resources
  */
-static int cdma_timeout_init(struct host1x_cdma *cdma, unsigned int syncpt)
+static int cdma_timeout_init(struct host1x_cdma *cdma)
 {
 	INIT_DELAYED_WORK(&cdma->timeout.wq, cdma_timeout_handler);
 	cdma->timeout.initialized = true;
diff --git a/drivers/gpu/host1x/hw/channel_hw.c b/drivers/gpu/host1x/hw/channel_hw.c
index 5eaa29d171c9..d4c28faf27d1 100644
--- a/drivers/gpu/host1x/hw/channel_hw.c
+++ b/drivers/gpu/host1x/hw/channel_hw.c
@@ -86,8 +86,7 @@ static void submit_gathers(struct host1x_job *job)
 
 static inline void synchronize_syncpt_base(struct host1x_job *job)
 {
-	struct host1x *host = dev_get_drvdata(job->channel->dev->parent);
-	struct host1x_syncpt *sp = host->syncpt + job->syncpt_id;
+	struct host1x_syncpt *sp = job->syncpt;
 	unsigned int id;
 	u32 value;
 
@@ -118,7 +117,7 @@ static void host1x_channel_set_streamid(struct host1x_channel *channel)
 static int channel_submit(struct host1x_job *job)
 {
 	struct host1x_channel *ch = job->channel;
-	struct host1x_syncpt *sp;
+	struct host1x_syncpt *sp = job->syncpt;
 	u32 user_syncpt_incrs = job->syncpt_incrs;
 	u32 prev_max = 0;
 	u32 syncval;
@@ -126,10 +125,9 @@ static int channel_submit(struct host1x_job *job)
 	struct host1x_waitlist *completed_waiter = NULL;
 	struct host1x *host = dev_get_drvdata(ch->dev->parent);
 
-	sp = host->syncpt + job->syncpt_id;
 	trace_host1x_channel_submit(dev_name(ch->dev),
 				    job->num_gathers, job->num_relocs,
-				    job->syncpt_id, job->syncpt_incrs);
+				    job->syncpt->id, job->syncpt_incrs);
 
 	/* before error checks, return current max */
 	prev_max = job->syncpt_end = host1x_syncpt_read_max(sp);
@@ -163,7 +161,7 @@ static int channel_submit(struct host1x_job *job)
 		host1x_cdma_push(&ch->cdma,
 				 host1x_opcode_setclass(HOST1X_CLASS_HOST1X,
 					host1x_uclass_wait_syncpt_r(), 1),
-				 host1x_class_host_wait_syncpt(job->syncpt_id,
+				 host1x_class_host_wait_syncpt(job->syncpt->id,
 					host1x_syncpt_read_max(sp)));
 	}
 
diff --git a/drivers/gpu/host1x/hw/debug_hw.c b/drivers/gpu/host1x/hw/debug_hw.c
index f31bcfa1b837..ceb48229d14b 100644
--- a/drivers/gpu/host1x/hw/debug_hw.c
+++ b/drivers/gpu/host1x/hw/debug_hw.c
@@ -204,7 +204,7 @@ static void show_channel_gathers(struct output *o, struct host1x_cdma *cdma)
 		unsigned int i;
 
 		host1x_debug_output(o, "\n%p: JOB, syncpt_id=%d, syncpt_val=%d, first_get=%08x, timeout=%d num_slots=%d, num_handles=%d\n",
-				    job, job->syncpt_id, job->syncpt_end,
+				    job, job->syncpt->id, job->syncpt_end,
 				    job->first_get, job->timeout,
 				    job->num_slots, job->num_unpins);
 
diff --git a/drivers/gpu/host1x/job.c b/drivers/gpu/host1x/job.c
index 82d0a60ba3f7..adbdc225de8d 100644
--- a/drivers/gpu/host1x/job.c
+++ b/drivers/gpu/host1x/job.c
@@ -79,6 +79,9 @@ static void job_free(struct kref *ref)
 {
 	struct host1x_job *job = container_of(ref, struct host1x_job, ref);
 
+	if (job->syncpt)
+		host1x_syncpt_put(job->syncpt);
+
 	kfree(job);
 }
 
@@ -674,7 +677,7 @@ EXPORT_SYMBOL(host1x_job_unpin);
  */
 void host1x_job_dump(struct device *dev, struct host1x_job *job)
 {
-	dev_dbg(dev, "    SYNCPT_ID   %d\n", job->syncpt_id);
+	dev_dbg(dev, "    SYNCPT_ID   %d\n", job->syncpt->id);
 	dev_dbg(dev, "    SYNCPT_VAL  %d\n", job->syncpt_end);
 	dev_dbg(dev, "    FIRST_GET   0x%x\n", job->first_get);
 	dev_dbg(dev, "    TIMEOUT     %d\n", job->timeout);
diff --git a/drivers/gpu/host1x/syncpt.c b/drivers/gpu/host1x/syncpt.c
index 9ccdf7709946..8a189a7c8d68 100644
--- a/drivers/gpu/host1x/syncpt.c
+++ b/drivers/gpu/host1x/syncpt.c
@@ -75,6 +75,8 @@ struct host1x_syncpt *host1x_syncpt_alloc(struct host1x *host,
 	else
 		sp->client_managed = false;
 
+	kref_init(&sp->ref);
+
 	mutex_unlock(&host->syncpt_mutex);
 	return sp;
 
@@ -368,7 +370,7 @@ int host1x_syncpt_init(struct host1x *host)
  * host1x client drivers can use this function to allocate a syncpoint for
  * subsequent use. A syncpoint returned by this function will be reserved for
  * use by the client exclusively. When no longer using a syncpoint, a host1x
- * client driver needs to release it using host1x_syncpt_free().
+ * client driver needs to release it using host1x_syncpt_put().
  */
 struct host1x_syncpt *host1x_syncpt_request(struct host1x_client *client,
 					    unsigned long flags)
@@ -379,20 +381,9 @@ struct host1x_syncpt *host1x_syncpt_request(struct host1x_client *client,
 }
 EXPORT_SYMBOL(host1x_syncpt_request);
 
-/**
- * host1x_syncpt_free() - free a requested syncpoint
- * @sp: host1x syncpoint
- *
- * Release a syncpoint previously allocated using host1x_syncpt_request(). A
- * host1x client driver should call this when the syncpoint is no longer in
- * use. Note that client drivers must ensure that the syncpoint doesn't remain
- * under the control of hardware after calling this function, otherwise two
- * clients may end up trying to access the same syncpoint concurrently.
- */
-void host1x_syncpt_free(struct host1x_syncpt *sp)
+static void syncpt_release(struct kref *ref)
 {
-	if (!sp)
-		return;
+	struct host1x_syncpt *sp = container_of(ref, struct host1x_syncpt, ref);
 
 	mutex_lock(&sp->host->syncpt_mutex);
 
@@ -404,7 +395,23 @@ void host1x_syncpt_free(struct host1x_syncpt *sp)
 
 	mutex_unlock(&sp->host->syncpt_mutex);
 }
-EXPORT_SYMBOL(host1x_syncpt_free);
+
+/**
+ * host1x_syncpt_put() - free a requested syncpoint
+ * @sp: host1x syncpoint
+ *
+ * Release a syncpoint previously allocated using host1x_syncpt_request(). A
+ * host1x client driver should call this when the syncpoint is no longer in
+ * use.
+ */
+void host1x_syncpt_put(struct host1x_syncpt *sp)
+{
+	if (!sp)
+		return;
+
+	kref_put(&sp->ref, syncpt_release);
+}
+EXPORT_SYMBOL(host1x_syncpt_put);
 
 void host1x_syncpt_deinit(struct host1x *host)
 {
@@ -471,16 +478,48 @@ unsigned int host1x_syncpt_nb_mlocks(struct host1x *host)
 }
 
 /**
- * host1x_syncpt_get() - obtain a syncpoint by ID
+ * host1x_syncpt_get_by_id() - obtain a syncpoint by ID
+ * @host: host1x controller
+ * @id: syncpoint ID
+ */
+struct host1x_syncpt *host1x_syncpt_get_by_id(struct host1x *host,
+					      unsigned int id)
+{
+	if (id >= host->info->nb_pts)
+		return NULL;
+
+	if (kref_get_unless_zero(&host->syncpt[id].ref))
+		return &host->syncpt[id];
+	else
+		return NULL;
+}
+EXPORT_SYMBOL(host1x_syncpt_get_by_id);
+
+/**
+ * host1x_syncpt_get_by_id_noref() - obtain a syncpoint by ID but don't
+ * 	increase the refcount.
  * @host: host1x controller
  * @id: syncpoint ID
  */
-struct host1x_syncpt *host1x_syncpt_get(struct host1x *host, unsigned int id)
+struct host1x_syncpt *host1x_syncpt_get_by_id_noref(struct host1x *host,
+						    unsigned int id)
 {
 	if (id >= host->info->nb_pts)
 		return NULL;
 
-	return host->syncpt + id;
+	return &host->syncpt[id];
+}
+EXPORT_SYMBOL(host1x_syncpt_get_by_id_noref);
+
+/**
+ * host1x_syncpt_get() - increment syncpoint refcount
+ * @sp: syncpoint
+ */
+struct host1x_syncpt *host1x_syncpt_get(struct host1x_syncpt *sp)
+{
+	kref_get(&sp->ref);
+
+	return sp;
 }
 EXPORT_SYMBOL(host1x_syncpt_get);
 
diff --git a/drivers/gpu/host1x/syncpt.h b/drivers/gpu/host1x/syncpt.h
index 3aa6b25b1b9c..a6766f8d55ee 100644
--- a/drivers/gpu/host1x/syncpt.h
+++ b/drivers/gpu/host1x/syncpt.h
@@ -11,6 +11,7 @@
 #include <linux/atomic.h>
 #include <linux/host1x.h>
 #include <linux/kernel.h>
+#include <linux/kref.h>
 #include <linux/sched.h>
 
 #include "intr.h"
@@ -26,6 +27,8 @@ struct host1x_syncpt_base {
 };
 
 struct host1x_syncpt {
+	struct kref ref;
+
 	unsigned int id;
 	atomic_t min_val;
 	atomic_t max_val;
diff --git a/drivers/staging/media/tegra-video/vi.c b/drivers/staging/media/tegra-video/vi.c
index 70e1e18644b2..e14be37cdb01 100644
--- a/drivers/staging/media/tegra-video/vi.c
+++ b/drivers/staging/media/tegra-video/vi.c
@@ -1131,8 +1131,8 @@ static void tegra_channel_host1x_syncpts_free(struct tegra_vi_channel *chan)
 	int i;
 
 	for (i = 0; i < chan->numgangports; i++) {
-		host1x_syncpt_free(chan->mw_ack_sp[i]);
-		host1x_syncpt_free(chan->frame_start_sp[i]);
+		host1x_syncpt_put(chan->mw_ack_sp[i]);
+		host1x_syncpt_put(chan->frame_start_sp[i]);
 	}
 }
 
diff --git a/include/linux/host1x.h b/include/linux/host1x.h
index 7137ce0e35d4..107aea29bccb 100644
--- a/include/linux/host1x.h
+++ b/include/linux/host1x.h
@@ -142,7 +142,9 @@ struct host1x_syncpt_base;
 struct host1x_syncpt;
 struct host1x;
 
-struct host1x_syncpt *host1x_syncpt_get(struct host1x *host, u32 id);
+struct host1x_syncpt *host1x_syncpt_get_by_id(struct host1x *host, u32 id);
+struct host1x_syncpt *host1x_syncpt_get_by_id_noref(struct host1x *host, u32 id);
+struct host1x_syncpt *host1x_syncpt_get(struct host1x_syncpt *sp);
 u32 host1x_syncpt_id(struct host1x_syncpt *sp);
 u32 host1x_syncpt_read_min(struct host1x_syncpt *sp);
 u32 host1x_syncpt_read_max(struct host1x_syncpt *sp);
@@ -153,7 +155,7 @@ int host1x_syncpt_wait(struct host1x_syncpt *sp, u32 thresh, long timeout,
 		       u32 *value);
 struct host1x_syncpt *host1x_syncpt_request(struct host1x_client *client,
 					    unsigned long flags);
-void host1x_syncpt_free(struct host1x_syncpt *sp);
+void host1x_syncpt_put(struct host1x_syncpt *sp);
 struct host1x_syncpt *host1x_syncpt_alloc(struct host1x *host,
 					  unsigned long flags,
 					  const char *name);
@@ -221,7 +223,7 @@ struct host1x_job {
 	dma_addr_t *reloc_addr_phys;
 
 	/* Sync point id, number of increments and end related to the submit */
-	u32 syncpt_id;
+	struct host1x_syncpt *syncpt;
 	u32 syncpt_incrs;
 	u32 syncpt_end;
 
-- 
2.30.0

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

* [PATCH v5 07/21] gpu: host1x: Introduce UAPI header
  2021-01-11 12:59 ` Mikko Perttunen
@ 2021-01-11 13:00   ` Mikko Perttunen
  -1 siblings, 0 replies; 195+ messages in thread
From: Mikko Perttunen @ 2021-01-11 13:00 UTC (permalink / raw)
  To: thierry.reding, jonathanh, digetx, airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman, Mikko Perttunen

Add the userspace interface header, defining interfaces
for allocating and accessing syncpoints from userspace,
and for creating sync_file-based fences from syncpoint
thresholds.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
 include/uapi/linux/host1x.h | 134 ++++++++++++++++++++++++++++++++++++
 1 file changed, 134 insertions(+)
 create mode 100644 include/uapi/linux/host1x.h

diff --git a/include/uapi/linux/host1x.h b/include/uapi/linux/host1x.h
new file mode 100644
index 000000000000..9c8fb9425cb2
--- /dev/null
+++ b/include/uapi/linux/host1x.h
@@ -0,0 +1,134 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/* Copyright (c) 2020 NVIDIA Corporation */
+
+#ifndef _UAPI__LINUX_HOST1X_H
+#define _UAPI__LINUX_HOST1X_H
+
+#include <linux/ioctl.h>
+#include <linux/types.h>
+
+#if defined(__cplusplus)
+extern "C" {
+#endif
+
+struct host1x_allocate_syncpoint {
+	/**
+	 * @fd: [out]
+	 *
+	 * New file descriptor representing the allocated syncpoint.
+	 */
+	__s32 fd;
+
+	__u32 reserved[3];
+};
+
+struct host1x_syncpoint_info {
+	/**
+	 * @id: [out]
+	 *
+	 * System-global ID of the syncpoint.
+	 */
+	__u32 id;
+
+	__u32 reserved[3];
+};
+
+struct host1x_syncpoint_increment {
+	/**
+	 * @count: [in]
+	 *
+	 * Number of times to increment the syncpoint. The syncpoint can
+	 * be observed at in-between values, but each increment is atomic.
+	 */
+	__u32 count;
+};
+
+struct host1x_read_syncpoint {
+	/**
+	 * @id: [in]
+	 *
+	 * ID of the syncpoint to read.
+	 */
+	__u32 id;
+
+	/**
+	 * @value: [out]
+	 *
+	 * Current value of the syncpoint.
+	 */
+	__u32 value;
+};
+
+struct host1x_create_fence {
+	/**
+	 * @id: [in]
+	 *
+	 * ID of the syncpoint to create a fence for.
+	 */
+	__u32 id;
+
+	/**
+	 * @threshold: [in]
+	 *
+	 * When the syncpoint reaches this value, the fence will be signaled.
+	 * The syncpoint is considered to have reached the threshold when the
+	 * following condition is true:
+	 *
+	 * 	((value - threshold) & 0x80000000U) == 0U
+	 *
+	 */
+	__u32 threshold;
+
+	/**
+	 * @fence_fd: [out]
+	 *
+	 * New sync_file file descriptor containing the created fence.
+	 */
+	__s32 fence_fd;
+
+	__u32 reserved[1];
+};
+
+struct host1x_fence_extract_fence {
+	__u32 id;
+	__u32 threshold;
+};
+
+struct host1x_fence_extract {
+	/**
+	 * @fence_fd: [in]
+	 *
+	 * sync_file file descriptor
+	 */
+	__s32 fence_fd;
+
+	/**
+	 * @num_fences: [in,out]
+	 *
+	 * In: size of the `fences_ptr` array counted in elements.
+	 * Out: required size of the `fences_ptr` array counted in elements.
+	 */
+	__u32 num_fences;
+
+	/**
+	 * @fences_ptr: [in]
+	 *
+	 * Pointer to array of `struct host1x_fence_extract_fence`.
+	 */
+	__u64 fences_ptr;
+
+	__u32 reserved[2];
+};
+
+#define HOST1X_IOCTL_ALLOCATE_SYNCPOINT  _IOWR('X', 0x00, struct host1x_allocate_syncpoint)
+#define HOST1X_IOCTL_READ_SYNCPOINT      _IOR ('X', 0x01, struct host1x_read_syncpoint)
+#define HOST1X_IOCTL_CREATE_FENCE        _IOWR('X', 0x02, struct host1x_create_fence)
+#define HOST1X_IOCTL_SYNCPOINT_INFO      _IOWR('X', 0x03, struct host1x_syncpoint_info)
+#define HOST1X_IOCTL_SYNCPOINT_INCREMENT _IOWR('X', 0x04, struct host1x_syncpoint_increment)
+#define HOST1X_IOCTL_FENCE_EXTRACT       _IOWR('X', 0x05, struct host1x_fence_extract)
+
+#if defined(__cplusplus)
+}
+#endif
+
+#endif
-- 
2.30.0


* [PATCH v5 08/21] gpu: host1x: Implement /dev/host1x device node
  2021-01-11 12:59 ` Mikko Perttunen
@ 2021-01-11 13:00   ` Mikko Perttunen
  -1 siblings, 0 replies; 195+ messages in thread
From: Mikko Perttunen @ 2021-01-11 13:00 UTC (permalink / raw)
  To: thierry.reding, jonathanh, digetx, airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman, Mikko Perttunen

Add the /dev/host1x device node, implementing the following
functionality:

- Reading syncpoint values
- Allocating syncpoints (providing syncpoint FDs)
- Incrementing syncpoints (based on syncpoint FD)

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
v4:
* Put UAPI under CONFIG_DRM_TEGRA_STAGING
v3:
* Pass process name as syncpoint name when allocating
  syncpoint.
---
 drivers/gpu/host1x/Makefile |   1 +
 drivers/gpu/host1x/dev.c    |   9 ++
 drivers/gpu/host1x/dev.h    |   3 +
 drivers/gpu/host1x/uapi.c   | 282 ++++++++++++++++++++++++++++++++++++
 drivers/gpu/host1x/uapi.h   |  22 +++
 include/linux/host1x.h      |   2 +
 6 files changed, 319 insertions(+)
 create mode 100644 drivers/gpu/host1x/uapi.c
 create mode 100644 drivers/gpu/host1x/uapi.h

diff --git a/drivers/gpu/host1x/Makefile b/drivers/gpu/host1x/Makefile
index 096017b8789d..882f928d75e1 100644
--- a/drivers/gpu/host1x/Makefile
+++ b/drivers/gpu/host1x/Makefile
@@ -9,6 +9,7 @@ host1x-y = \
 	job.o \
 	debug.o \
 	mipi.o \
+	uapi.o \
 	hw/host1x01.o \
 	hw/host1x02.o \
 	hw/host1x04.o \
diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c
index d0ebb70e2fdd..641317d23828 100644
--- a/drivers/gpu/host1x/dev.c
+++ b/drivers/gpu/host1x/dev.c
@@ -461,6 +461,12 @@ static int host1x_probe(struct platform_device *pdev)
 		goto deinit_syncpt;
 	}
 
+	err = host1x_uapi_init(&host->uapi, host);
+	if (err) {
+		dev_err(&pdev->dev, "failed to initialize uapi\n");
+		goto deinit_intr;
+	}
+
 	host1x_debug_init(host);
 
 	if (host->info->has_hypervisor)
@@ -480,6 +486,8 @@ static int host1x_probe(struct platform_device *pdev)
 	host1x_unregister(host);
 deinit_debugfs:
 	host1x_debug_deinit(host);
+	host1x_uapi_deinit(&host->uapi);
+deinit_intr:
 	host1x_intr_deinit(host);
 deinit_syncpt:
 	host1x_syncpt_deinit(host);
@@ -501,6 +509,7 @@ static int host1x_remove(struct platform_device *pdev)
 
 	host1x_unregister(host);
 	host1x_debug_deinit(host);
+	host1x_uapi_deinit(&host->uapi);
 	host1x_intr_deinit(host);
 	host1x_syncpt_deinit(host);
 	reset_control_assert(host->rst);
diff --git a/drivers/gpu/host1x/dev.h b/drivers/gpu/host1x/dev.h
index 63010ae37a97..7b8b7e20e32b 100644
--- a/drivers/gpu/host1x/dev.h
+++ b/drivers/gpu/host1x/dev.h
@@ -17,6 +17,7 @@
 #include "intr.h"
 #include "job.h"
 #include "syncpt.h"
+#include "uapi.h"
 
 struct host1x_syncpt;
 struct host1x_syncpt_base;
@@ -143,6 +144,8 @@ struct host1x {
 	struct list_head list;
 
 	struct device_dma_parameters dma_parms;
+
+	struct host1x_uapi uapi;
 };
 
 void host1x_hypervisor_writel(struct host1x *host1x, u32 r, u32 v);
diff --git a/drivers/gpu/host1x/uapi.c b/drivers/gpu/host1x/uapi.c
new file mode 100644
index 000000000000..27b8761c3f35
--- /dev/null
+++ b/drivers/gpu/host1x/uapi.c
@@ -0,0 +1,282 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * /dev/host1x syncpoint interface
+ *
+ * Copyright (c) 2020, NVIDIA Corporation.
+ */
+
+#include <linux/anon_inodes.h>
+#include <linux/cdev.h>
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/host1x.h>
+#include <linux/nospec.h>
+
+#include "dev.h"
+#include "syncpt.h"
+#include "uapi.h"
+
+#include <uapi/linux/host1x.h>
+
+static int syncpt_file_release(struct inode *inode, struct file *file)
+{
+	struct host1x_syncpt *sp = file->private_data;
+
+	host1x_syncpt_put(sp);
+
+	return 0;
+}
+
+static int syncpt_file_ioctl_info(struct host1x_syncpt *sp, void __user *data)
+{
+	struct host1x_syncpoint_info args;
+	unsigned long copy_err;
+
+	copy_err = copy_from_user(&args, data, sizeof(args));
+	if (copy_err)
+		return -EFAULT;
+
+	if (args.reserved[0] || args.reserved[1] || args.reserved[2])
+		return -EINVAL;
+
+	args.id = sp->id;
+
+	copy_err = copy_to_user(data, &args, sizeof(args));
+	if (copy_err)
+		return -EFAULT;
+
+	return 0;
+}
+
+static int syncpt_file_ioctl_incr(struct host1x_syncpt *sp, void __user *data)
+{
+	struct host1x_syncpoint_increment args;
+	unsigned long copy_err;
+	u32 i;
+
+	copy_err = copy_from_user(&args, data, sizeof(args));
+	if (copy_err)
+		return -EFAULT;
+
+	for (i = 0; i < args.count; i++) {
+		host1x_syncpt_incr(sp);
+		if (signal_pending(current))
+			return -EINTR;
+	}
+
+	return 0;
+}
+
+static long syncpt_file_ioctl(struct file *file, unsigned int cmd,
+			      unsigned long arg)
+{
+	void __user *data = (void __user *)arg;
+	long err;
+
+	switch (cmd) {
+	case HOST1X_IOCTL_SYNCPOINT_INFO:
+		err = syncpt_file_ioctl_info(file->private_data, data);
+		break;
+
+	case HOST1X_IOCTL_SYNCPOINT_INCREMENT:
+		err = syncpt_file_ioctl_incr(file->private_data, data);
+		break;
+
+	default:
+		err = -ENOTTY;
+	}
+
+	return err;
+}
+
+static const struct file_operations syncpt_file_fops = {
+	.owner = THIS_MODULE,
+	.release = syncpt_file_release,
+	.unlocked_ioctl = syncpt_file_ioctl,
+	.compat_ioctl = syncpt_file_ioctl,
+};
+
+struct host1x_syncpt *host1x_syncpt_fd_get(int fd)
+{
+	struct host1x_syncpt *sp;
+	struct file *file = fget(fd);
+
+	if (!file)
+		return ERR_PTR(-EINVAL);
+
+	if (file->f_op != &syncpt_file_fops) {
+		fput(file);
+		return ERR_PTR(-EINVAL);
+	}
+
+	sp = file->private_data;
+
+	host1x_syncpt_get(sp);
+
+	fput(file);
+
+	return sp;
+}
+EXPORT_SYMBOL(host1x_syncpt_fd_get);
+
+static int dev_file_open(struct inode *inode, struct file *file)
+{
+	struct host1x_uapi *uapi =
+		container_of(inode->i_cdev, struct host1x_uapi, cdev);
+
+	file->private_data = container_of(uapi, struct host1x, uapi);
+
+	return 0;
+}
+
+static int dev_file_ioctl_read_syncpoint(struct host1x *host1x,
+					 void __user *data)
+{
+	struct host1x_read_syncpoint args;
+	unsigned long copy_err;
+
+	copy_err = copy_from_user(&args, data, sizeof(args));
+	if (copy_err)
+		return -EFAULT;
+
+	if (args.id >= host1x_syncpt_nb_pts(host1x))
+		return -EINVAL;
+
+	args.id = array_index_nospec(args.id, host1x_syncpt_nb_pts(host1x));
+	args.value = host1x_syncpt_read(&host1x->syncpt[args.id]);
+
+	copy_err = copy_to_user(data, &args, sizeof(args));
+	if (copy_err)
+		return -EFAULT;
+
+	return 0;
+}
+
+static int dev_file_ioctl_alloc_syncpoint(struct host1x *host1x,
+					  void __user *data)
+{
+	struct host1x_allocate_syncpoint args;
+	struct host1x_syncpt *sp;
+	unsigned long copy_err;
+	int err;
+
+	copy_err = copy_from_user(&args, data, sizeof(args));
+	if (copy_err)
+		return -EFAULT;
+
+	if (args.reserved[0] || args.reserved[1] || args.reserved[2])
+		return -EINVAL;
+
+	sp = host1x_syncpt_alloc(host1x, HOST1X_SYNCPT_CLIENT_MANAGED,
+				 current->comm);
+	if (!sp)
+		return -EBUSY;
+
+	err = anon_inode_getfd("host1x_syncpt", &syncpt_file_fops, sp,
+			       O_CLOEXEC);
+	if (err < 0)
+		goto free_syncpt;
+
+	args.fd = err;
+
+	copy_err = copy_to_user(data, &args, sizeof(args));
+	if (copy_err) {
+		err = -EFAULT;
+		goto put_fd;
+	}
+
+	return 0;
+
+put_fd:
+	put_unused_fd(args.fd);
+free_syncpt:
+	host1x_syncpt_put(sp);
+
+	return err;
+}
+
+static long dev_file_ioctl(struct file *file, unsigned int cmd,
+			   unsigned long arg)
+{
+	void __user *data = (void __user *)arg;
+	long err;
+
+	switch (cmd) {
+	case HOST1X_IOCTL_READ_SYNCPOINT:
+		err = dev_file_ioctl_read_syncpoint(file->private_data, data);
+		break;
+
+	case HOST1X_IOCTL_ALLOCATE_SYNCPOINT:
+		err = dev_file_ioctl_alloc_syncpoint(file->private_data, data);
+		break;
+
+	default:
+		err = -ENOTTY;
+	}
+
+	return err;
+}
+
+static const struct file_operations dev_file_fops = {
+	.owner = THIS_MODULE,
+	.open = dev_file_open,
+	.unlocked_ioctl = dev_file_ioctl,
+	.compat_ioctl = compat_ptr_ioctl,
+};
+
+int host1x_uapi_init(struct host1x_uapi *uapi, struct host1x *host1x)
+{
+	int err;
+	dev_t dev_num;
+
+	if (!IS_ENABLED(CONFIG_DRM_TEGRA_STAGING))
+		return 0;
+
+	err = alloc_chrdev_region(&dev_num, 0, 1, "host1x");
+	if (err)
+		return err;
+
+	uapi->class = class_create(THIS_MODULE, "host1x");
+	if (IS_ERR(uapi->class)) {
+		err = PTR_ERR(uapi->class);
+		goto unregister_chrdev_region;
+	}
+
+	cdev_init(&uapi->cdev, &dev_file_fops);
+	err = cdev_add(&uapi->cdev, dev_num, 1);
+	if (err)
+		goto destroy_class;
+
+	uapi->dev = device_create(uapi->class, host1x->dev,
+				  dev_num, NULL, "host1x");
+	if (IS_ERR(uapi->dev)) {
+		err = PTR_ERR(uapi->dev);
+		goto del_cdev;
+	}
+
+	uapi->dev_num = dev_num;
+
+	return 0;
+
+del_cdev:
+	cdev_del(&uapi->cdev);
+destroy_class:
+	class_destroy(uapi->class);
+unregister_chrdev_region:
+	unregister_chrdev_region(dev_num, 1);
+
+	return err;
+}
+
+void host1x_uapi_deinit(struct host1x_uapi *uapi)
+{
+	if (!IS_ENABLED(CONFIG_DRM_TEGRA_STAGING))
+		return;
+
+	device_destroy(uapi->class, uapi->dev_num);
+	cdev_del(&uapi->cdev);
+	class_destroy(uapi->class);
+	unregister_chrdev_region(uapi->dev_num, 1);
+}
diff --git a/drivers/gpu/host1x/uapi.h b/drivers/gpu/host1x/uapi.h
new file mode 100644
index 000000000000..7beb5e44c1b1
--- /dev/null
+++ b/drivers/gpu/host1x/uapi.h
@@ -0,0 +1,22 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (c) 2020, NVIDIA Corporation.
+ */
+
+#ifndef HOST1X_UAPI_H
+#define HOST1X_UAPI_H
+
+#include <linux/cdev.h>
+
+struct host1x_uapi {
+	struct class *class;
+
+	struct cdev cdev;
+	struct device *dev;
+	dev_t dev_num;
+};
+
+int host1x_uapi_init(struct host1x_uapi *uapi, struct host1x *host1x);
+void host1x_uapi_deinit(struct host1x_uapi *uapi);
+
+#endif
diff --git a/include/linux/host1x.h b/include/linux/host1x.h
index 107aea29bccb..b3178ae51cae 100644
--- a/include/linux/host1x.h
+++ b/include/linux/host1x.h
@@ -163,6 +163,8 @@ struct host1x_syncpt *host1x_syncpt_alloc(struct host1x *host,
 struct host1x_syncpt_base *host1x_syncpt_get_base(struct host1x_syncpt *sp);
 u32 host1x_syncpt_base_id(struct host1x_syncpt_base *base);
 
+struct host1x_syncpt *host1x_syncpt_fd_get(int fd);
+
 /*
  * host1x channel
  */
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 195+ messages in thread

* [PATCH v5 08/21] gpu: host1x: Implement /dev/host1x device node
@ 2021-01-11 13:00   ` Mikko Perttunen
  0 siblings, 0 replies; 195+ messages in thread
From: Mikko Perttunen @ 2021-01-11 13:00 UTC (permalink / raw)
  To: thierry.reding, jonathanh, digetx, airlied, daniel
  Cc: linux-tegra, talho, bhuntsman, dri-devel, Mikko Perttunen

Add the /dev/host1x device node, implementing the following
functionality:

- Reading syncpoint values
- Allocating syncpoints (providing syncpoint FDs)
- Incrementing syncpoints (based on syncpoint FD)

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
v4:
* Put UAPI under CONFIG_DRM_TEGRA_STAGING
v3:
* Pass process name as syncpoint name when allocating
  syncpoint.
---
 drivers/gpu/host1x/Makefile |   1 +
 drivers/gpu/host1x/dev.c    |   9 ++
 drivers/gpu/host1x/dev.h    |   3 +
 drivers/gpu/host1x/uapi.c   | 282 ++++++++++++++++++++++++++++++++++++
 drivers/gpu/host1x/uapi.h   |  22 +++
 include/linux/host1x.h      |   2 +
 6 files changed, 319 insertions(+)
 create mode 100644 drivers/gpu/host1x/uapi.c
 create mode 100644 drivers/gpu/host1x/uapi.h

diff --git a/drivers/gpu/host1x/Makefile b/drivers/gpu/host1x/Makefile
index 096017b8789d..882f928d75e1 100644
--- a/drivers/gpu/host1x/Makefile
+++ b/drivers/gpu/host1x/Makefile
@@ -9,6 +9,7 @@ host1x-y = \
 	job.o \
 	debug.o \
 	mipi.o \
+	uapi.o \
 	hw/host1x01.o \
 	hw/host1x02.o \
 	hw/host1x04.o \
diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c
index d0ebb70e2fdd..641317d23828 100644
--- a/drivers/gpu/host1x/dev.c
+++ b/drivers/gpu/host1x/dev.c
@@ -461,6 +461,12 @@ static int host1x_probe(struct platform_device *pdev)
 		goto deinit_syncpt;
 	}
 
+	err = host1x_uapi_init(&host->uapi, host);
+	if (err) {
+		dev_err(&pdev->dev, "failed to initialize uapi\n");
+		goto deinit_intr;
+	}
+
 	host1x_debug_init(host);
 
 	if (host->info->has_hypervisor)
@@ -480,6 +486,8 @@ static int host1x_probe(struct platform_device *pdev)
 	host1x_unregister(host);
 deinit_debugfs:
 	host1x_debug_deinit(host);
+	host1x_uapi_deinit(&host->uapi);
+deinit_intr:
 	host1x_intr_deinit(host);
 deinit_syncpt:
 	host1x_syncpt_deinit(host);
@@ -501,6 +509,7 @@ static int host1x_remove(struct platform_device *pdev)
 
 	host1x_unregister(host);
 	host1x_debug_deinit(host);
+	host1x_uapi_deinit(&host->uapi);
 	host1x_intr_deinit(host);
 	host1x_syncpt_deinit(host);
 	reset_control_assert(host->rst);
diff --git a/drivers/gpu/host1x/dev.h b/drivers/gpu/host1x/dev.h
index 63010ae37a97..7b8b7e20e32b 100644
--- a/drivers/gpu/host1x/dev.h
+++ b/drivers/gpu/host1x/dev.h
@@ -17,6 +17,7 @@
 #include "intr.h"
 #include "job.h"
 #include "syncpt.h"
+#include "uapi.h"
 
 struct host1x_syncpt;
 struct host1x_syncpt_base;
@@ -143,6 +144,8 @@ struct host1x {
 	struct list_head list;
 
 	struct device_dma_parameters dma_parms;
+
+	struct host1x_uapi uapi;
 };
 
 void host1x_hypervisor_writel(struct host1x *host1x, u32 r, u32 v);
diff --git a/drivers/gpu/host1x/uapi.c b/drivers/gpu/host1x/uapi.c
new file mode 100644
index 000000000000..27b8761c3f35
--- /dev/null
+++ b/drivers/gpu/host1x/uapi.c
@@ -0,0 +1,282 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * /dev/host1x syncpoint interface
+ *
+ * Copyright (c) 2020, NVIDIA Corporation.
+ */
+
+#include <linux/anon_inodes.h>
+#include <linux/cdev.h>
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/host1x.h>
+#include <linux/nospec.h>
+
+#include "dev.h"
+#include "syncpt.h"
+#include "uapi.h"
+
+#include <uapi/linux/host1x.h>
+
+static int syncpt_file_release(struct inode *inode, struct file *file)
+{
+	struct host1x_syncpt *sp = file->private_data;
+
+	host1x_syncpt_put(sp);
+
+	return 0;
+}
+
+static int syncpt_file_ioctl_info(struct host1x_syncpt *sp, void __user *data)
+{
+	struct host1x_syncpoint_info args;
+	unsigned long copy_err;
+
+	copy_err = copy_from_user(&args, data, sizeof(args));
+	if (copy_err)
+		return -EFAULT;
+
+	if (args.reserved[0] || args.reserved[1] || args.reserved[2])
+		return -EINVAL;
+
+	args.id = sp->id;
+
+	copy_err = copy_to_user(data, &args, sizeof(args));
+	if (copy_err)
+		return -EFAULT;
+
+	return 0;
+}
+
+static int syncpt_file_ioctl_incr(struct host1x_syncpt *sp, void __user *data)
+{
+	struct host1x_syncpoint_increment args;
+	unsigned long copy_err;
+	u32 i;
+
+	copy_err = copy_from_user(&args, data, sizeof(args));
+	if (copy_err)
+		return -EFAULT;
+
+	for (i = 0; i < args.count; i++) {
+		host1x_syncpt_incr(sp);
+		if (signal_pending(current))
+			return -EINTR;
+	}
+
+	return 0;
+}
+
+static long syncpt_file_ioctl(struct file *file, unsigned int cmd,
+			      unsigned long arg)
+{
+	void __user *data = (void __user *)arg;
+	long err;
+
+	switch (cmd) {
+	case HOST1X_IOCTL_SYNCPOINT_INFO:
+		err = syncpt_file_ioctl_info(file->private_data, data);
+		break;
+
+	case HOST1X_IOCTL_SYNCPOINT_INCREMENT:
+		err = syncpt_file_ioctl_incr(file->private_data, data);
+		break;
+
+	default:
+		err = -ENOTTY;
+	}
+
+	return err;
+}
+
+static const struct file_operations syncpt_file_fops = {
+	.owner = THIS_MODULE,
+	.release = syncpt_file_release,
+	.unlocked_ioctl = syncpt_file_ioctl,
+	.compat_ioctl = compat_ptr_ioctl,
+};
+
+struct host1x_syncpt *host1x_syncpt_fd_get(int fd)
+{
+	struct host1x_syncpt *sp;
+	struct file *file = fget(fd);
+
+	if (!file)
+		return ERR_PTR(-EINVAL);
+
+	if (file->f_op != &syncpt_file_fops) {
+		fput(file);
+		return ERR_PTR(-EINVAL);
+	}
+
+	sp = file->private_data;
+
+	host1x_syncpt_get(sp);
+
+	fput(file);
+
+	return sp;
+}
+EXPORT_SYMBOL(host1x_syncpt_fd_get);
+
+static int dev_file_open(struct inode *inode, struct file *file)
+{
+	struct host1x_uapi *uapi =
+		container_of(inode->i_cdev, struct host1x_uapi, cdev);
+
+	file->private_data = container_of(uapi, struct host1x, uapi);
+
+	return 0;
+}
+
+static int dev_file_ioctl_read_syncpoint(struct host1x *host1x,
+					 void __user *data)
+{
+	struct host1x_read_syncpoint args;
+	unsigned long copy_err;
+
+	copy_err = copy_from_user(&args, data, sizeof(args));
+	if (copy_err)
+		return -EFAULT;
+
+	if (args.id >= host1x_syncpt_nb_pts(host1x))
+		return -EINVAL;
+
+	args.id = array_index_nospec(args.id, host1x_syncpt_nb_pts(host1x));
+	args.value = host1x_syncpt_read(&host1x->syncpt[args.id]);
+
+	copy_err = copy_to_user(data, &args, sizeof(args));
+	if (copy_err)
+		return -EFAULT;
+
+	return 0;
+}
+
+static int dev_file_ioctl_alloc_syncpoint(struct host1x *host1x,
+					  void __user *data)
+{
+	struct host1x_allocate_syncpoint args;
+	struct host1x_syncpt *sp;
+	unsigned long copy_err;
+	struct file *file;
+	int err, fd;
+
+	copy_err = copy_from_user(&args, data, sizeof(args));
+	if (copy_err)
+		return -EFAULT;
+
+	if (args.reserved[0] || args.reserved[1] || args.reserved[2])
+		return -EINVAL;
+
+	sp = host1x_syncpt_alloc(host1x, HOST1X_SYNCPT_CLIENT_MANAGED,
+				 current->comm);
+	if (!sp)
+		return -EBUSY;
+
+	fd = get_unused_fd_flags(O_CLOEXEC);
+	if (fd < 0) {
+		err = fd;
+		goto free_syncpt;
+	}
+
+	file = anon_inode_getfile("host1x_syncpt", &syncpt_file_fops, sp,
+				  O_CLOEXEC);
+	if (IS_ERR(file)) {
+		err = PTR_ERR(file);
+		goto put_fd;
+	}
+
+	args.fd = fd;
+
+	copy_err = copy_to_user(data, &args, sizeof(args));
+	if (copy_err) {
+		err = -EFAULT;
+		goto put_file;
+	}
+
+	/* Publish the FD only once nothing can fail anymore. */
+	fd_install(fd, file);
+
+	return 0;
+
+put_file:
+	/* fput() drops the syncpoint reference through ->release() */
+	fput(file);
+	put_unused_fd(fd);
+	return err;
+
+put_fd:
+	put_unused_fd(fd);
+free_syncpt:
+	host1x_syncpt_put(sp);
+
+	return err;
+}
+
+static long dev_file_ioctl(struct file *file, unsigned int cmd,
+			   unsigned long arg)
+{
+	void __user *data = (void __user *)arg;
+	long err;
+
+	switch (cmd) {
+	case HOST1X_IOCTL_READ_SYNCPOINT:
+		err = dev_file_ioctl_read_syncpoint(file->private_data, data);
+		break;
+
+	case HOST1X_IOCTL_ALLOCATE_SYNCPOINT:
+		err = dev_file_ioctl_alloc_syncpoint(file->private_data, data);
+		break;
+
+	default:
+		err = -ENOTTY;
+	}
+
+	return err;
+}
+
+static const struct file_operations dev_file_fops = {
+	.owner = THIS_MODULE,
+	.open = dev_file_open,
+	.unlocked_ioctl = dev_file_ioctl,
+	.compat_ioctl = compat_ptr_ioctl,
+};
+
+int host1x_uapi_init(struct host1x_uapi *uapi, struct host1x *host1x)
+{
+	int err;
+	dev_t dev_num;
+
+	if (!IS_ENABLED(CONFIG_DRM_TEGRA_STAGING))
+		return 0;
+
+	err = alloc_chrdev_region(&dev_num, 0, 1, "host1x");
+	if (err)
+		return err;
+
+	uapi->class = class_create(THIS_MODULE, "host1x");
+	if (IS_ERR(uapi->class)) {
+		err = PTR_ERR(uapi->class);
+		goto unregister_chrdev_region;
+	}
+
+	cdev_init(&uapi->cdev, &dev_file_fops);
+	err = cdev_add(&uapi->cdev, dev_num, 1);
+	if (err)
+		goto destroy_class;
+
+	uapi->dev = device_create(uapi->class, host1x->dev,
+				  dev_num, NULL, "host1x");
+	if (IS_ERR(uapi->dev)) {
+		err = PTR_ERR(uapi->dev);
+		goto del_cdev;
+	}
+
+	uapi->dev_num = dev_num;
+
+	return 0;
+
+del_cdev:
+	cdev_del(&uapi->cdev);
+destroy_class:
+	class_destroy(uapi->class);
+unregister_chrdev_region:
+	unregister_chrdev_region(dev_num, 1);
+
+	return err;
+}
+
+void host1x_uapi_deinit(struct host1x_uapi *uapi)
+{
+	if (!IS_ENABLED(CONFIG_DRM_TEGRA_STAGING))
+		return;
+
+	device_destroy(uapi->class, uapi->dev_num);
+	cdev_del(&uapi->cdev);
+	class_destroy(uapi->class);
+	unregister_chrdev_region(uapi->dev_num, 1);
+}
diff --git a/drivers/gpu/host1x/uapi.h b/drivers/gpu/host1x/uapi.h
new file mode 100644
index 000000000000..7beb5e44c1b1
--- /dev/null
+++ b/drivers/gpu/host1x/uapi.h
@@ -0,0 +1,22 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (c) 2020, NVIDIA Corporation.
+ */
+
+#ifndef HOST1X_UAPI_H
+#define HOST1X_UAPI_H
+
+#include <linux/cdev.h>
+
+struct host1x_uapi {
+	struct class *class;
+
+	struct cdev cdev;
+	struct device *dev;
+	dev_t dev_num;
+};
+
+int host1x_uapi_init(struct host1x_uapi *uapi, struct host1x *host1x);
+void host1x_uapi_deinit(struct host1x_uapi *uapi);
+
+#endif
diff --git a/include/linux/host1x.h b/include/linux/host1x.h
index 107aea29bccb..b3178ae51cae 100644
--- a/include/linux/host1x.h
+++ b/include/linux/host1x.h
@@ -163,6 +163,8 @@ struct host1x_syncpt *host1x_syncpt_alloc(struct host1x *host,
 struct host1x_syncpt_base *host1x_syncpt_get_base(struct host1x_syncpt *sp);
 u32 host1x_syncpt_base_id(struct host1x_syncpt_base *base);
 
+struct host1x_syncpt *host1x_syncpt_fd_get(int fd);
+
 /*
  * host1x channel
  */
-- 
2.30.0

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 195+ messages in thread

* [PATCH v5 09/21] gpu: host1x: DMA fences and userspace fence creation
  2021-01-11 12:59 ` Mikko Perttunen
@ 2021-01-11 13:00   ` Mikko Perttunen
  -1 siblings, 0 replies; 195+ messages in thread
From: Mikko Perttunen @ 2021-01-11 13:00 UTC (permalink / raw)
  To: thierry.reding, jonathanh, digetx, airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman, Mikko Perttunen

Add an implementation of dma_fences based on syncpoints. Syncpoint
interrupts are used to signal fences. Additionally, after
software signaling has been enabled, a 30 second timeout is started.
If the syncpoint threshold is not reached within this period,
the fence is signalled with an -ETIMEDOUT error code. This is to
allow fences that would never reach their syncpoint threshold to
be cleaned up.

Additionally, add a new /dev/host1x IOCTL for creating sync_file
file descriptors backed by syncpoint fences.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
v5:
* Update for change in put_ref prototype.
v4:
* Fix _signal prototype and include it to avoid warning
* Remove use of unused local in error path
v3:
* Move declaration of host1x_fence_extract to public header
---
 drivers/gpu/host1x/Makefile |   1 +
 drivers/gpu/host1x/fence.c  | 208 ++++++++++++++++++++++++++++++++++++
 drivers/gpu/host1x/fence.h  |  13 +++
 drivers/gpu/host1x/intr.c   |   9 ++
 drivers/gpu/host1x/intr.h   |   2 +
 drivers/gpu/host1x/uapi.c   | 103 ++++++++++++++++++
 include/linux/host1x.h      |   4 +
 7 files changed, 340 insertions(+)
 create mode 100644 drivers/gpu/host1x/fence.c
 create mode 100644 drivers/gpu/host1x/fence.h

diff --git a/drivers/gpu/host1x/Makefile b/drivers/gpu/host1x/Makefile
index 882f928d75e1..a48af2cefae1 100644
--- a/drivers/gpu/host1x/Makefile
+++ b/drivers/gpu/host1x/Makefile
@@ -10,6 +10,7 @@ host1x-y = \
 	debug.o \
 	mipi.o \
 	uapi.o \
+	fence.o \
 	hw/host1x01.o \
 	hw/host1x02.o \
 	hw/host1x04.o \
diff --git a/drivers/gpu/host1x/fence.c b/drivers/gpu/host1x/fence.c
new file mode 100644
index 000000000000..e96ad93ff656
--- /dev/null
+++ b/drivers/gpu/host1x/fence.c
@@ -0,0 +1,208 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Syncpoint dma_fence implementation
+ *
+ * Copyright (c) 2020, NVIDIA Corporation.
+ */
+
+#include <linux/dma-fence.h>
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/slab.h>
+#include <linux/sync_file.h>
+
+#include "fence.h"
+#include "intr.h"
+#include "syncpt.h"
+
+static DEFINE_SPINLOCK(lock);
+
+struct host1x_syncpt_fence {
+	struct dma_fence base;
+
+	atomic_t signaling;
+
+	struct host1x_syncpt *sp;
+	u32 threshold;
+
+	struct host1x_waitlist *waiter;
+	void *waiter_ref;
+
+	struct delayed_work timeout_work;
+};
+
+static const char *syncpt_fence_get_driver_name(struct dma_fence *f)
+{
+	return "host1x";
+}
+
+static const char *syncpt_fence_get_timeline_name(struct dma_fence *f)
+{
+	return "syncpoint";
+}
+
+static bool syncpt_fence_enable_signaling(struct dma_fence *f)
+{
+	struct host1x_syncpt_fence *sf =
+		container_of(f, struct host1x_syncpt_fence, base);
+	int err;
+
+	if (host1x_syncpt_is_expired(sf->sp, sf->threshold))
+		return false;
+
+	dma_fence_get(f);
+
+	/*
+	 * The dma_fence framework requires the fence driver to keep a
+	 * reference to any fences for which 'enable_signaling' has been
+	 * called (and that have not been signalled).
+	 * 
+	 * We provide a userspace API to create arbitrary syncpoint fences,
+	 * so we cannot normally guarantee that all fences get signalled.
+	 * As such, setup a timeout, so that long-lasting fences will get
+	 * reaped eventually.
+	 */
+	schedule_delayed_work(&sf->timeout_work, msecs_to_jiffies(30000));
+
+	err = host1x_intr_add_action(sf->sp->host, sf->sp, sf->threshold,
+				     HOST1X_INTR_ACTION_SIGNAL_FENCE, f,
+				     sf->waiter, &sf->waiter_ref);
+	if (err) {
+		cancel_delayed_work_sync(&sf->timeout_work);
+		dma_fence_put(f);
+		return false;
+	}
+
+	/* intr framework takes ownership of waiter */
+	sf->waiter = NULL;
+
+	/*
+	 * The fence may get signalled at any time after the above call,
+	 * so we need to initialize all state used by signalling
+	 * before it.
+	 */
+
+	return true;
+}
+
+static void syncpt_fence_release(struct dma_fence *f)
+{
+	struct host1x_syncpt_fence *sf =
+		container_of(f, struct host1x_syncpt_fence, base);
+
+	kfree(sf->waiter);
+
+	dma_fence_free(f);
+}
+
+static const struct dma_fence_ops syncpt_fence_ops = {
+	.get_driver_name = syncpt_fence_get_driver_name,
+	.get_timeline_name = syncpt_fence_get_timeline_name,
+	.enable_signaling = syncpt_fence_enable_signaling,
+	.release = syncpt_fence_release,
+};
+
+void host1x_fence_signal(struct host1x_syncpt_fence *f)
+{
+	if (atomic_xchg(&f->signaling, 1))
+		return;
+
+	/*
+	 * Cancel pending timeout work - if it races, it will
+	 * not get 'f->signaling' and return.
+	 */
+	cancel_delayed_work_sync(&f->timeout_work);
+
+	host1x_intr_put_ref(f->sp->host, f->sp->id, f->waiter_ref, false);
+
+	dma_fence_signal(&f->base);
+	dma_fence_put(&f->base);
+}
+
+static void do_fence_timeout(struct work_struct *work)
+{
+	struct delayed_work *dwork = to_delayed_work(work);
+	struct host1x_syncpt_fence *f =
+		container_of(dwork, struct host1x_syncpt_fence, timeout_work);
+
+	if (atomic_xchg(&f->signaling, 1))
+		return;
+
+	/*
+	 * Remove the interrupt action - if the signal path races with
+	 * us, it will not get 'f->signaling' and return.
+	 */
+	host1x_intr_put_ref(f->sp->host, f->sp->id, f->waiter_ref, true);
+
+	dma_fence_set_error(&f->base, -ETIMEDOUT);
+	dma_fence_signal(&f->base);
+	dma_fence_put(&f->base);
+}
+
+struct dma_fence *host1x_fence_create(struct host1x_syncpt *sp, u32 threshold)
+{
+	struct host1x_syncpt_fence *fence;
+
+	fence = kzalloc(sizeof(*fence), GFP_KERNEL);
+	if (!fence)
+		return ERR_PTR(-ENOMEM);
+
+	fence->waiter = kzalloc(sizeof(*fence->waiter), GFP_KERNEL);
+	if (!fence->waiter) {
+		kfree(fence);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	fence->sp = sp;
+	fence->threshold = threshold;
+
+	dma_fence_init(&fence->base, &syncpt_fence_ops, &lock,
+		       dma_fence_context_alloc(1), 0);
+
+	INIT_DELAYED_WORK(&fence->timeout_work, do_fence_timeout);
+
+	return &fence->base;
+}
+EXPORT_SYMBOL(host1x_fence_create);
+
+int host1x_fence_create_fd(struct host1x_syncpt *sp, u32 threshold)
+{
+	struct sync_file *file;
+	struct dma_fence *f;
+	int fd;
+
+	f = host1x_fence_create(sp, threshold);
+	if (IS_ERR(f))
+		return PTR_ERR(f);
+
+	fd = get_unused_fd_flags(O_CLOEXEC);
+	if (fd < 0) {
+		dma_fence_put(f);
+		return fd;
+	}
+
+	file = sync_file_create(f);
+	dma_fence_put(f);
+	if (!file) {
+		put_unused_fd(fd);
+		return -ENOMEM;
+	}
+
+	fd_install(fd, file->file);
+
+	return fd;
+}
+EXPORT_SYMBOL(host1x_fence_create_fd);
+
+int host1x_fence_extract(struct dma_fence *fence, u32 *id, u32 *threshold)
+{
+	struct host1x_syncpt_fence *f;
+
+	if (fence->ops != &syncpt_fence_ops)
+		return -EINVAL;
+
+	f = container_of(fence, struct host1x_syncpt_fence, base);
+
+	*id = f->sp->id;
+	*threshold = f->threshold;
+
+	return 0;
+}
+EXPORT_SYMBOL(host1x_fence_extract);
diff --git a/drivers/gpu/host1x/fence.h b/drivers/gpu/host1x/fence.h
new file mode 100644
index 000000000000..70c91de82f14
--- /dev/null
+++ b/drivers/gpu/host1x/fence.h
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (c) 2020, NVIDIA Corporation.
+ */
+
+#ifndef HOST1X_FENCE_H
+#define HOST1X_FENCE_H
+
+struct host1x_syncpt_fence;
+
+void host1x_fence_signal(struct host1x_syncpt_fence *fence);
+
+#endif
diff --git a/drivers/gpu/host1x/intr.c b/drivers/gpu/host1x/intr.c
index 70e1096a4fe9..bcffc4d7879b 100644
--- a/drivers/gpu/host1x/intr.c
+++ b/drivers/gpu/host1x/intr.c
@@ -13,6 +13,7 @@
 #include <trace/events/host1x.h>
 #include "channel.h"
 #include "dev.h"
+#include "fence.h"
 #include "intr.h"
 
 /* Wait list management */
@@ -121,12 +122,20 @@ static void action_wakeup_interruptible(struct host1x_waitlist *waiter)
 	wake_up_interruptible(wq);
 }
 
+static void action_signal_fence(struct host1x_waitlist *waiter)
+{
+	struct host1x_syncpt_fence *f = waiter->data;
+
+	host1x_fence_signal(f);
+}
+
 typedef void (*action_handler)(struct host1x_waitlist *waiter);
 
 static const action_handler action_handlers[HOST1X_INTR_ACTION_COUNT] = {
 	action_submit_complete,
 	action_wakeup,
 	action_wakeup_interruptible,
+	action_signal_fence,
 };
 
 static void run_handlers(struct list_head completed[HOST1X_INTR_ACTION_COUNT])
diff --git a/drivers/gpu/host1x/intr.h b/drivers/gpu/host1x/intr.h
index 6ea55e615e3a..e4c346099273 100644
--- a/drivers/gpu/host1x/intr.h
+++ b/drivers/gpu/host1x/intr.h
@@ -33,6 +33,8 @@ enum host1x_intr_action {
 	 */
 	HOST1X_INTR_ACTION_WAKEUP_INTERRUPTIBLE,
 
+	HOST1X_INTR_ACTION_SIGNAL_FENCE,
+
 	HOST1X_INTR_ACTION_COUNT
 };
 
diff --git a/drivers/gpu/host1x/uapi.c b/drivers/gpu/host1x/uapi.c
index 27b8761c3f35..e931756a5e16 100644
--- a/drivers/gpu/host1x/uapi.c
+++ b/drivers/gpu/host1x/uapi.c
@@ -11,8 +11,10 @@
 #include <linux/fs.h>
 #include <linux/host1x.h>
 #include <linux/nospec.h>
+#include <linux/sync_file.h>
 
 #include "dev.h"
+#include "fence.h"
 #include "syncpt.h"
 #include "uapi.h"
 
@@ -195,6 +197,99 @@ static int dev_file_ioctl_alloc_syncpoint(struct host1x *host1x,
 	return err;
 }
 
+static int dev_file_ioctl_create_fence(struct host1x *host1x, void __user *data)
+{
+	struct host1x_create_fence args;
+	unsigned long copy_err;
+	int fd;
+
+	copy_err = copy_from_user(&args, data, sizeof(args));
+	if (copy_err)
+		return -EFAULT;
+
+	if (args.reserved[0])
+		return -EINVAL;
+
+	if (args.id >= host1x_syncpt_nb_pts(host1x))
+		return -EINVAL;
+
+	args.id = array_index_nospec(args.id, host1x_syncpt_nb_pts(host1x));
+
+	fd = host1x_fence_create_fd(&host1x->syncpt[args.id], args.threshold);
+	if (fd < 0)
+		return fd;
+
+	args.fence_fd = fd;
+
+	copy_err = copy_to_user(data, &args, sizeof(args));
+	if (copy_err)
+		return -EFAULT;
+
+	return 0;
+}
+
+static int dev_file_ioctl_fence_extract(struct host1x *host1x, void __user *data)
+{
+	struct host1x_fence_extract_fence __user *fences_user_ptr;
+	struct dma_fence *fence, **fences;
+	struct host1x_fence_extract args;
+	struct dma_fence_array *array;
+	unsigned int num_fences, i;
+	unsigned long copy_err;
+	int err;
+
+	copy_err = copy_from_user(&args, data, sizeof(args));
+	if (copy_err)
+		return -EFAULT;
+
+	fences_user_ptr = u64_to_user_ptr(args.fences_ptr);
+
+	if (args.reserved[0] || args.reserved[1])
+		return -EINVAL;
+
+	fence = sync_file_get_fence(args.fence_fd);
+	if (!fence)
+		return -EINVAL;
+
+	array = to_dma_fence_array(fence);
+	if (array) {
+		fences = array->fences;
+		num_fences = array->num_fences;
+	} else {
+		fences = &fence;
+		num_fences = 1;
+	}
+
+	for (i = 0; i < min(num_fences, args.num_fences); i++) {
+		struct host1x_fence_extract_fence f;
+
+		err = host1x_fence_extract(fences[i], &f.id, &f.threshold);
+		if (err)
+			goto put_fence;
+
+		copy_err = copy_to_user(fences_user_ptr + i, &f, sizeof(f));
+		if (copy_err) {
+			err = -EFAULT;
+			goto put_fence;
+		}
+	}
+
+	args.num_fences = i;
+
+	copy_err = copy_to_user(data, &args, sizeof(args));
+	if (copy_err) {
+		err = -EFAULT;
+		goto put_fence;
+	}
+
+	err = 0;
+
+put_fence:
+	/* Drop the sync_file_get_fence() reference on both paths. */
+	dma_fence_put(fence);
+
+	return err;
+}
+
 static long dev_file_ioctl(struct file *file, unsigned int cmd,
 			   unsigned long arg)
 {
@@ -210,6 +305,14 @@ static long dev_file_ioctl(struct file *file, unsigned int cmd,
 		err = dev_file_ioctl_alloc_syncpoint(file->private_data, data);
 		break;
 
+	case HOST1X_IOCTL_CREATE_FENCE:
+		err = dev_file_ioctl_create_fence(file->private_data, data);
+		break;
+
+	case HOST1X_IOCTL_FENCE_EXTRACT:
+		err = dev_file_ioctl_fence_extract(file->private_data, data);
+		break;
+
 	default:
 		err = -ENOTTY;
 	}
diff --git a/include/linux/host1x.h b/include/linux/host1x.h
index b3178ae51cae..080f9d3d29eb 100644
--- a/include/linux/host1x.h
+++ b/include/linux/host1x.h
@@ -165,6 +165,10 @@ u32 host1x_syncpt_base_id(struct host1x_syncpt_base *base);
 
 struct host1x_syncpt *host1x_syncpt_fd_get(int fd);
 
+struct dma_fence *host1x_fence_create(struct host1x_syncpt *sp, u32 threshold);
+int host1x_fence_create_fd(struct host1x_syncpt *sp, u32 threshold);
+int host1x_fence_extract(struct dma_fence *fence, u32 *id, u32 *threshold);
+
 /*
  * host1x channel
  */
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 195+ messages in thread

 
+	HOST1X_INTR_ACTION_SIGNAL_FENCE,
+
 	HOST1X_INTR_ACTION_COUNT
 };
 
diff --git a/drivers/gpu/host1x/uapi.c b/drivers/gpu/host1x/uapi.c
index 27b8761c3f35..e931756a5e16 100644
--- a/drivers/gpu/host1x/uapi.c
+++ b/drivers/gpu/host1x/uapi.c
@@ -11,8 +11,10 @@
 #include <linux/fs.h>
 #include <linux/host1x.h>
 #include <linux/nospec.h>
+#include <linux/sync_file.h>
 
 #include "dev.h"
+#include "fence.h"
 #include "syncpt.h"
 #include "uapi.h"
 
@@ -195,6 +197,99 @@ static int dev_file_ioctl_alloc_syncpoint(struct host1x *host1x,
 	return err;
 }
 
+static int dev_file_ioctl_create_fence(struct host1x *host1x, void __user *data)
+{
+	struct host1x_create_fence args;
+	unsigned long copy_err;
+	int fd;
+
+	copy_err = copy_from_user(&args, data, sizeof(args));
+	if (copy_err)
+		return -EFAULT;
+
+	if (args.reserved[0])
+		return -EINVAL;
+
+	if (args.id >= host1x_syncpt_nb_pts(host1x))
+		return -EINVAL;
+
+	args.id = array_index_nospec(args.id, host1x_syncpt_nb_pts(host1x));
+
+	fd = host1x_fence_create_fd(&host1x->syncpt[args.id], args.threshold);
+	if (fd < 0)
+		return fd;
+
+	args.fence_fd = fd;
+
+	copy_err = copy_to_user(data, &args, sizeof(args));
+	if (copy_err)
+		return -EFAULT;
+
+	return 0;
+}
+
+static int dev_file_ioctl_fence_extract(struct host1x *host1x, void __user *data)
+{
+	struct host1x_fence_extract_fence __user *fences_user_ptr;
+	struct dma_fence *fence, **fences;
+	struct host1x_fence_extract args;
+	struct dma_fence_array *array;
+	unsigned int num_fences, i;
+	unsigned long copy_err;
+	int err;
+
+	copy_err = copy_from_user(&args, data, sizeof(args));
+	if (copy_err)
+		return -EFAULT;
+
+	fences_user_ptr = u64_to_user_ptr(args.fences_ptr);
+
+	if (args.reserved[0] || args.reserved[1])
+		return -EINVAL;
+
+	fence = sync_file_get_fence(args.fence_fd);
+	if (!fence)
+		return -EINVAL;
+
+	array = to_dma_fence_array(fence);
+	if (array) {
+		fences = array->fences;
+		num_fences = array->num_fences;
+	} else {
+		fences = &fence;
+		num_fences = 1;
+	}
+
+	for (i = 0; i < min(num_fences, args.num_fences); i++) {
+		struct host1x_fence_extract_fence f;
+
+		err = host1x_fence_extract(fences[i], &f.id, &f.threshold);
+		if (err)
+			goto put_fence;
+
+		copy_err = copy_to_user(fences_user_ptr + i, &f, sizeof(f));
+		if (copy_err) {
+			err = -EFAULT;
+			goto put_fence;
+		}
+	}
+
+	args.num_fences = i;
+
+	copy_err = copy_to_user(data, &args, sizeof(args));
+	if (copy_err) {
+		err = -EFAULT;
+		goto put_fence;
+	}
+
+	err = 0;
+
+put_fence:
+	dma_fence_put(fence);
+
+	return err;
+}
+
 static long dev_file_ioctl(struct file *file, unsigned int cmd,
 			   unsigned long arg)
 {
@@ -210,6 +305,14 @@ static long dev_file_ioctl(struct file *file, unsigned int cmd,
 		err = dev_file_ioctl_alloc_syncpoint(file->private_data, data);
 		break;
 
+	case HOST1X_IOCTL_CREATE_FENCE:
+		err = dev_file_ioctl_create_fence(file->private_data, data);
+		break;
+
+	case HOST1X_IOCTL_FENCE_EXTRACT:
+		err = dev_file_ioctl_fence_extract(file->private_data, data);
+		break;
+
 	default:
 		err = -ENOTTY;
 	}
diff --git a/include/linux/host1x.h b/include/linux/host1x.h
index b3178ae51cae..080f9d3d29eb 100644
--- a/include/linux/host1x.h
+++ b/include/linux/host1x.h
@@ -165,6 +165,10 @@ u32 host1x_syncpt_base_id(struct host1x_syncpt_base *base);
 
 struct host1x_syncpt *host1x_syncpt_fd_get(int fd);
 
+struct dma_fence *host1x_fence_create(struct host1x_syncpt *sp, u32 threshold);
+int host1x_fence_create_fd(struct host1x_syncpt *sp, u32 threshold);
+int host1x_fence_extract(struct dma_fence *fence, u32 *id, u32 *threshold);
+
 /*
  * host1x channel
  */
-- 
2.30.0

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


* [PATCH v5 10/21] gpu: host1x: Add no-recovery mode
  2021-01-11 12:59 ` Mikko Perttunen
@ 2021-01-11 13:00   ` Mikko Perttunen
  -1 siblings, 0 replies; 195+ messages in thread
From: Mikko Perttunen @ 2021-01-11 13:00 UTC (permalink / raw)
  To: thierry.reding, jonathanh, digetx, airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman, Mikko Perttunen

Add a new property for jobs to enable or disable recovery, i.e.
CPU increments of syncpoints to the maximum value on job timeout.
This allows for a more solid model for hung jobs, where userspace
doesn't need to guess whether a syncpoint increment happened because
the job completed or because the job timeout was triggered.

On job timeout, we stop the channel, NOP all future jobs on the
channel using the same syncpoint, mark the syncpoint as locked
and resume the channel from the next job, if any.

The future jobs are NOPed because, since we don't do the CPU
increments, the value of the syncpoint is no longer synchronized,
and any waiters would become confused if a future job incremented
the syncpoint. The syncpoint is marked locked to ensure that any
future jobs cannot increment the syncpoint either, until the
application has recognized the situation and reallocated the
syncpoint.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
v5:
* Update for change in put_ref prototype.
* Fixed typo in comment.
v3:
* Move 'locked' check inside CDMA lock to prevent race
* Add clarifying comment to NOP-patching code
---
 drivers/gpu/drm/tegra/drm.c        |  1 +
 drivers/gpu/host1x/cdma.c          | 58 ++++++++++++++++++++++++++----
 drivers/gpu/host1x/hw/channel_hw.c |  2 +-
 drivers/gpu/host1x/job.c           |  4 +++
 drivers/gpu/host1x/syncpt.c        |  2 ++
 drivers/gpu/host1x/syncpt.h        | 12 +++++++
 include/linux/host1x.h             |  9 +++++
 7 files changed, 81 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
index 5a6037eff37f..cd81b52a9e06 100644
--- a/drivers/gpu/drm/tegra/drm.c
+++ b/drivers/gpu/drm/tegra/drm.c
@@ -197,6 +197,7 @@ int tegra_drm_submit(struct tegra_drm_context *context,
 	job->client = client;
 	job->class = client->class;
 	job->serialize = true;
+	job->syncpt_recovery = true;
 
 	/*
 	 * Track referenced BOs so that they can be unreferenced after the
diff --git a/drivers/gpu/host1x/cdma.c b/drivers/gpu/host1x/cdma.c
index 6e6ca774f68d..765e5aa64eb6 100644
--- a/drivers/gpu/host1x/cdma.c
+++ b/drivers/gpu/host1x/cdma.c
@@ -312,10 +312,6 @@ static void update_cdma_locked(struct host1x_cdma *cdma)
 	bool signal = false;
 	struct host1x_job *job, *n;
 
-	/* If CDMA is stopped, queue is cleared and we can return */
-	if (!cdma->running)
-		return;
-
 	/*
 	 * Walk the sync queue, reading the sync point registers as necessary,
 	 * to consume as many sync queue entries as possible without blocking
@@ -324,7 +320,8 @@ static void update_cdma_locked(struct host1x_cdma *cdma)
 		struct host1x_syncpt *sp = job->syncpt;
 
 		/* Check whether this syncpt has completed, and bail if not */
-		if (!host1x_syncpt_is_expired(sp, job->syncpt_end)) {
+		if (!host1x_syncpt_is_expired(sp, job->syncpt_end) &&
+		    !job->cancelled) {
 			/* Start timer on next pending syncpt */
 			if (job->timeout)
 				cdma_start_timer_locked(cdma, job);
@@ -413,8 +410,11 @@ void host1x_cdma_update_sync_queue(struct host1x_cdma *cdma,
 	else
 		restart_addr = cdma->last_pos;
 
+	if (!job)
+		goto resume;
+
 	/* do CPU increments for the remaining syncpts */
-	if (job) {
+	if (job->syncpt_recovery) {
 		dev_dbg(dev, "%s: perform CPU incr on pending buffers\n",
 			__func__);
 
@@ -433,8 +433,44 @@ void host1x_cdma_update_sync_queue(struct host1x_cdma *cdma,
 
 		dev_dbg(dev, "%s: finished sync_queue modification\n",
 			__func__);
+	} else {
+		struct host1x_job *failed_job = job;
+
+		host1x_job_dump(dev, job);
+
+		host1x_syncpt_set_locked(job->syncpt);
+		failed_job->cancelled = true;
+
+		list_for_each_entry_continue(job, &cdma->sync_queue, list) {
+			unsigned int i;
+
+			if (job->syncpt != failed_job->syncpt)
+				continue;
+
+			for (i = 0; i < job->num_slots; i++) {
+				unsigned int slot = (job->first_get/8 + i) %
+						    HOST1X_PUSHBUFFER_SLOTS;
+				u32 *mapped = cdma->push_buffer.mapped;
+
+				/*
+				 * Overwrite opcodes with 0 word writes
+				 * to offset 0xbad. This does nothing but
+				 * has an easily detected signature in debug
+				 * traces.
+				 */
+				mapped[2*slot+0] = 0x1bad0000;
+				mapped[2*slot+1] = 0x1bad0000;
+			}
+
+			job->cancelled = true;
+		}
+
+		wmb();
+
+		update_cdma_locked(cdma);
 	}
 
+resume:
 	/* roll back DMAGET and start up channel again */
 	host1x_hw_cdma_resume(host1x, cdma, restart_addr);
 }
@@ -490,6 +526,16 @@ int host1x_cdma_begin(struct host1x_cdma *cdma, struct host1x_job *job)
 
 	mutex_lock(&cdma->lock);
 
+	/*
+	 * Check if syncpoint was locked due to previous job timeout.
+	 * This needs to be done within the cdma lock to avoid a race
+	 * with the timeout handler.
+	 */
+	if (job->syncpt->locked) {
+		mutex_unlock(&cdma->lock);
+		return -EPERM;
+	}
+
 	if (job->timeout) {
 		/* init state on first submit with timeout value */
 		if (!cdma->timeout.initialized) {
diff --git a/drivers/gpu/host1x/hw/channel_hw.c b/drivers/gpu/host1x/hw/channel_hw.c
index d4c28faf27d1..bf21512e5078 100644
--- a/drivers/gpu/host1x/hw/channel_hw.c
+++ b/drivers/gpu/host1x/hw/channel_hw.c
@@ -191,7 +191,7 @@ static int channel_submit(struct host1x_job *job)
 	/* schedule a submit complete interrupt */
 	err = host1x_intr_add_action(host, sp, syncval,
 				     HOST1X_INTR_ACTION_SUBMIT_COMPLETE, ch,
-				     completed_waiter, NULL);
+				     completed_waiter, &job->waiter);
 	completed_waiter = NULL;
 	WARN(err, "Failed to set submit complete interrupt");
 
diff --git a/drivers/gpu/host1x/job.c b/drivers/gpu/host1x/job.c
index adbdc225de8d..8f59b34672c2 100644
--- a/drivers/gpu/host1x/job.c
+++ b/drivers/gpu/host1x/job.c
@@ -79,6 +79,10 @@ static void job_free(struct kref *ref)
 {
 	struct host1x_job *job = container_of(ref, struct host1x_job, ref);
 
+	if (job->waiter)
+		host1x_intr_put_ref(job->syncpt->host, job->syncpt->id,
+				    job->waiter, false);
+
 	if (job->syncpt)
 		host1x_syncpt_put(job->syncpt);
 
diff --git a/drivers/gpu/host1x/syncpt.c b/drivers/gpu/host1x/syncpt.c
index 8a189a7c8d68..81315cd1a3ed 100644
--- a/drivers/gpu/host1x/syncpt.c
+++ b/drivers/gpu/host1x/syncpt.c
@@ -385,6 +385,8 @@ static void syncpt_release(struct kref *ref)
 {
 	struct host1x_syncpt *sp = container_of(ref, struct host1x_syncpt, ref);
 
+	sp->locked = false;
+
 	mutex_lock(&sp->host->syncpt_mutex);
 
 	host1x_syncpt_base_free(sp->base);
diff --git a/drivers/gpu/host1x/syncpt.h b/drivers/gpu/host1x/syncpt.h
index a6766f8d55ee..93e894677d89 100644
--- a/drivers/gpu/host1x/syncpt.h
+++ b/drivers/gpu/host1x/syncpt.h
@@ -40,6 +40,13 @@ struct host1x_syncpt {
 
 	/* interrupt data */
 	struct host1x_syncpt_intr intr;
+
+	/*
+	 * If a submission incrementing this syncpoint fails, lock it so that
+	 * further submission cannot be made until application has handled the
+	 * failure.
+	 */
+	bool locked;
 };
 
 /* Initialize sync point array  */
@@ -115,4 +122,9 @@ static inline int host1x_syncpt_is_valid(struct host1x_syncpt *sp)
 	return sp->id < host1x_syncpt_nb_pts(sp->host);
 }
 
+static inline void host1x_syncpt_set_locked(struct host1x_syncpt *sp)
+{
+	sp->locked = true;
+}
+
 #endif
diff --git a/include/linux/host1x.h b/include/linux/host1x.h
index 080f9d3d29eb..81ca70066c76 100644
--- a/include/linux/host1x.h
+++ b/include/linux/host1x.h
@@ -233,9 +233,15 @@ struct host1x_job {
 	u32 syncpt_incrs;
 	u32 syncpt_end;
 
+	/* Completion waiter ref */
+	void *waiter;
+
 	/* Maximum time to wait for this job */
 	unsigned int timeout;
 
+	/* Job has timed out and should be released */
+	bool cancelled;
+
 	/* Index and number of slots used in the push buffer */
 	unsigned int first_get;
 	unsigned int num_slots;
@@ -256,6 +262,9 @@ struct host1x_job {
 
 	/* Add a channel wait for previous ops to complete */
 	bool serialize;
+
+	/* Fast-forward syncpoint increments on job timeout */
+	bool syncpt_recovery;
 };
 
 struct host1x_job *host1x_job_alloc(struct host1x_channel *ch,
-- 
2.30.0



* [PATCH v5 11/21] gpu: host1x: Add job release callback
  2021-01-11 12:59 ` Mikko Perttunen
@ 2021-01-11 13:00   ` Mikko Perttunen
  -1 siblings, 0 replies; 195+ messages in thread
From: Mikko Perttunen @ 2021-01-11 13:00 UTC (permalink / raw)
  To: thierry.reding, jonathanh, digetx, airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman, Mikko Perttunen

Add a callback field to the job structure, to be called just before
the job is to be freed. This allows the job's submitter to clean
up any of its own state, such as decrementing runtime PM refcounts.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
 drivers/gpu/host1x/job.c | 3 +++
 include/linux/host1x.h   | 4 ++++
 2 files changed, 7 insertions(+)

diff --git a/drivers/gpu/host1x/job.c b/drivers/gpu/host1x/job.c
index 8f59b34672c2..09097e19c0d0 100644
--- a/drivers/gpu/host1x/job.c
+++ b/drivers/gpu/host1x/job.c
@@ -79,6 +79,9 @@ static void job_free(struct kref *ref)
 {
 	struct host1x_job *job = container_of(ref, struct host1x_job, ref);
 
+	if (job->release)
+		job->release(job);
+
 	if (job->waiter)
 		host1x_intr_put_ref(job->syncpt->host, job->syncpt->id,
 				    job->waiter, false);
diff --git a/include/linux/host1x.h b/include/linux/host1x.h
index 81ca70066c76..d48cab563d5c 100644
--- a/include/linux/host1x.h
+++ b/include/linux/host1x.h
@@ -265,6 +265,10 @@ struct host1x_job {
 
 	/* Fast-forward syncpoint increments on job timeout */
 	bool syncpt_recovery;
+
+	/* Callback called when job is freed */
+	void (*release)(struct host1x_job *job);
+	void *user_data;
 };
 
 struct host1x_job *host1x_job_alloc(struct host1x_channel *ch,
-- 
2.30.0




* [PATCH v5 12/21] gpu: host1x: Add support for syncpoint waits in CDMA pushbuffer
  2021-01-11 12:59 ` Mikko Perttunen
@ 2021-01-11 13:00   ` Mikko Perttunen
  -1 siblings, 0 replies; 195+ messages in thread
From: Mikko Perttunen @ 2021-01-11 13:00 UTC (permalink / raw)
  To: thierry.reding, jonathanh, digetx, airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman, Mikko Perttunen

Add support for inserting syncpoint waits in the CDMA pushbuffer.
These waits need to be done in the HOST1X class, while gathers
submitted by the application execute in the engine class.

Support is added by converting the job's gather list into a command
list that can include both gathers and waits. When the job is
submitted, these commands are pushed as the appropriate opcodes
on the CDMA pushbuffer.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
 drivers/gpu/host1x/hw/channel_hw.c | 51 +++++++++++++++--------
 drivers/gpu/host1x/hw/debug_hw.c   |  9 +++-
 drivers/gpu/host1x/job.c           | 67 +++++++++++++++++++++---------
 drivers/gpu/host1x/job.h           | 14 +++++++
 include/linux/host1x.h             |  5 ++-
 5 files changed, 105 insertions(+), 41 deletions(-)

diff --git a/drivers/gpu/host1x/hw/channel_hw.c b/drivers/gpu/host1x/hw/channel_hw.c
index bf21512e5078..d88a32f73f5e 100644
--- a/drivers/gpu/host1x/hw/channel_hw.c
+++ b/drivers/gpu/host1x/hw/channel_hw.c
@@ -55,31 +55,46 @@ static void submit_gathers(struct host1x_job *job)
 #endif
 	unsigned int i;
 
-	for (i = 0; i < job->num_gathers; i++) {
-		struct host1x_job_gather *g = &job->gathers[i];
-		dma_addr_t addr = g->base + g->offset;
-		u32 op2, op3;
+	for (i = 0; i < job->num_cmds; i++) {
+		struct host1x_job_cmd *cmd = &job->cmds[i];
 
-		op2 = lower_32_bits(addr);
-		op3 = upper_32_bits(addr);
+		if (cmd->is_wait) {
+			/* TODO use modern wait */
+			host1x_cdma_push(cdma,
+				 host1x_opcode_setclass(HOST1X_CLASS_HOST1X,
+					host1x_uclass_wait_syncpt_r(), 1),
+				 host1x_class_host_wait_syncpt(cmd->wait.id,
+					cmd->wait.threshold));
+			host1x_cdma_push(
+				cdma, host1x_opcode_setclass(job->class, 0, 0),
+				HOST1X_OPCODE_NOP);
+		} else {
+			struct host1x_job_gather *g = &cmd->gather;
 
-		trace_write_gather(cdma, g->bo, g->offset, g->words);
+			dma_addr_t addr = g->base + g->offset;
+			u32 op2, op3;
 
-		if (op3 != 0) {
+			op2 = lower_32_bits(addr);
+			op3 = upper_32_bits(addr);
+
+			trace_write_gather(cdma, g->bo, g->offset, g->words);
+
+			if (op3 != 0) {
 #if HOST1X_HW >= 6
-			u32 op1 = host1x_opcode_gather_wide(g->words);
-			u32 op4 = HOST1X_OPCODE_NOP;
+				u32 op1 = host1x_opcode_gather_wide(g->words);
+				u32 op4 = HOST1X_OPCODE_NOP;
 
-			host1x_cdma_push_wide(cdma, op1, op2, op3, op4);
+				host1x_cdma_push_wide(cdma, op1, op2, op3, op4);
 #else
-			dev_err(dev, "invalid gather for push buffer %pad\n",
-				&addr);
-			continue;
+				dev_err(dev, "invalid gather for push buffer %pad\n",
+					&addr);
+				continue;
 #endif
-		} else {
-			u32 op1 = host1x_opcode_gather(g->words);
+			} else {
+				u32 op1 = host1x_opcode_gather(g->words);
 
-			host1x_cdma_push(cdma, op1, op2);
+				host1x_cdma_push(cdma, op1, op2);
+			}
 		}
 	}
 }
@@ -126,7 +141,7 @@ static int channel_submit(struct host1x_job *job)
 	struct host1x *host = dev_get_drvdata(ch->dev->parent);
 
 	trace_host1x_channel_submit(dev_name(ch->dev),
-				    job->num_gathers, job->num_relocs,
+				    job->num_cmds, job->num_relocs,
 				    job->syncpt->id, job->syncpt_incrs);
 
 	/* before error checks, return current max */
diff --git a/drivers/gpu/host1x/hw/debug_hw.c b/drivers/gpu/host1x/hw/debug_hw.c
index ceb48229d14b..35952fd5597e 100644
--- a/drivers/gpu/host1x/hw/debug_hw.c
* [PATCH v5 12/21] gpu: host1x: Add support for syncpoint waits in CDMA pushbuffer
@ 2021-01-11 13:00   ` Mikko Perttunen
  0 siblings, 0 replies; 195+ messages in thread
From: Mikko Perttunen @ 2021-01-11 13:00 UTC (permalink / raw)
  To: thierry.reding, jonathanh, digetx, airlied, daniel
  Cc: linux-tegra, talho, bhuntsman, dri-devel, Mikko Perttunen

Add support for inserting syncpoint waits in the CDMA pushbuffer.
These waits need to be executed in the HOST1X class, while gathers
submitted by the application execute in the engine class.

Support is added by converting the job's gather list into a command
list that can include both gathers and waits. When the job is
submitted, these commands are pushed as the appropriate opcodes
onto the CDMA pushbuffer.
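
As a rough userspace model (plain C, not the kernel API itself) of the tagged
command list described above: each entry is either a gather or a wait, and the
loops that only care about gathers skip wait entries. The `model_*` names are
illustrative stand-ins for the patch's `host1x_job_cmd`/`host1x_job_add_gather`/
`host1x_job_add_wait`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the patch's tagged command union: a job's
 * command stream is one array whose entries are either gathers
 * (engine class) or syncpoint waits (HOST1X class). */
struct model_gather { uint32_t words, offset; };
struct model_wait   { uint32_t id, threshold; };

struct model_cmd {
	bool is_wait;
	union {
		struct model_gather gather;
		struct model_wait wait;
	};
};

struct model_job {
	struct model_cmd cmds[8];
	unsigned int num_cmds;
};

/* Append a gather, as host1x_job_add_gather() does in the patch. */
static void model_add_gather(struct model_job *job, uint32_t words,
			     uint32_t offset)
{
	struct model_cmd *cmd = &job->cmds[job->num_cmds++];

	cmd->is_wait = false;
	cmd->gather.words = words;
	cmd->gather.offset = offset;
}

/* Append a wait, as host1x_job_add_wait() does in the patch. */
static void model_add_wait(struct model_job *job, uint32_t id, uint32_t thresh)
{
	struct model_cmd *cmd = &job->cmds[job->num_cmds++];

	cmd->is_wait = true;
	cmd->wait.id = id;
	cmd->wait.threshold = thresh;
}

/* Count only gather entries, mirroring the "skip waits" loops the
 * patch adds to pin_job()/copy_gathers(). */
static unsigned int model_count_gathers(const struct model_job *job)
{
	unsigned int i, n = 0;

	for (i = 0; i < job->num_cmds; i++)
		if (!job->cmds[i].is_wait)
			n++;

	return n;
}
```

A job mixing two gathers around one wait would end up with `num_cmds == 3` but
only two entries visited by the pinning and firewall-copy loops.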

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
 drivers/gpu/host1x/hw/channel_hw.c | 51 +++++++++++++++--------
 drivers/gpu/host1x/hw/debug_hw.c   |  9 +++-
 drivers/gpu/host1x/job.c           | 67 +++++++++++++++++++++---------
 drivers/gpu/host1x/job.h           | 14 +++++++
 include/linux/host1x.h             |  5 ++-
 5 files changed, 105 insertions(+), 41 deletions(-)

diff --git a/drivers/gpu/host1x/hw/channel_hw.c b/drivers/gpu/host1x/hw/channel_hw.c
index bf21512e5078..d88a32f73f5e 100644
--- a/drivers/gpu/host1x/hw/channel_hw.c
+++ b/drivers/gpu/host1x/hw/channel_hw.c
@@ -55,31 +55,46 @@ static void submit_gathers(struct host1x_job *job)
 #endif
 	unsigned int i;
 
-	for (i = 0; i < job->num_gathers; i++) {
-		struct host1x_job_gather *g = &job->gathers[i];
-		dma_addr_t addr = g->base + g->offset;
-		u32 op2, op3;
+	for (i = 0; i < job->num_cmds; i++) {
+		struct host1x_job_cmd *cmd = &job->cmds[i];
 
-		op2 = lower_32_bits(addr);
-		op3 = upper_32_bits(addr);
+		if (cmd->is_wait) {
+			/* TODO use modern wait */
+			host1x_cdma_push(cdma,
+				 host1x_opcode_setclass(HOST1X_CLASS_HOST1X,
+					host1x_uclass_wait_syncpt_r(), 1),
+				 host1x_class_host_wait_syncpt(cmd->wait.id,
+					cmd->wait.threshold));
+			host1x_cdma_push(
+				cdma, host1x_opcode_setclass(job->class, 0, 0),
+				HOST1X_OPCODE_NOP);
+		} else {
+			struct host1x_job_gather *g = &cmd->gather;
 
-		trace_write_gather(cdma, g->bo, g->offset, g->words);
+			dma_addr_t addr = g->base + g->offset;
+			u32 op2, op3;
 
-		if (op3 != 0) {
+			op2 = lower_32_bits(addr);
+			op3 = upper_32_bits(addr);
+
+			trace_write_gather(cdma, g->bo, g->offset, g->words);
+
+			if (op3 != 0) {
 #if HOST1X_HW >= 6
-			u32 op1 = host1x_opcode_gather_wide(g->words);
-			u32 op4 = HOST1X_OPCODE_NOP;
+				u32 op1 = host1x_opcode_gather_wide(g->words);
+				u32 op4 = HOST1X_OPCODE_NOP;
 
-			host1x_cdma_push_wide(cdma, op1, op2, op3, op4);
+				host1x_cdma_push_wide(cdma, op1, op2, op3, op4);
 #else
-			dev_err(dev, "invalid gather for push buffer %pad\n",
-				&addr);
-			continue;
+				dev_err(dev, "invalid gather for push buffer %pad\n",
+					&addr);
+				continue;
 #endif
-		} else {
-			u32 op1 = host1x_opcode_gather(g->words);
+			} else {
+				u32 op1 = host1x_opcode_gather(g->words);
 
-			host1x_cdma_push(cdma, op1, op2);
+				host1x_cdma_push(cdma, op1, op2);
+			}
 		}
 	}
 }
@@ -126,7 +141,7 @@ static int channel_submit(struct host1x_job *job)
 	struct host1x *host = dev_get_drvdata(ch->dev->parent);
 
 	trace_host1x_channel_submit(dev_name(ch->dev),
-				    job->num_gathers, job->num_relocs,
+				    job->num_cmds, job->num_relocs,
 				    job->syncpt->id, job->syncpt_incrs);
 
 	/* before error checks, return current max */
diff --git a/drivers/gpu/host1x/hw/debug_hw.c b/drivers/gpu/host1x/hw/debug_hw.c
index ceb48229d14b..35952fd5597e 100644
--- a/drivers/gpu/host1x/hw/debug_hw.c
+++ b/drivers/gpu/host1x/hw/debug_hw.c
@@ -208,10 +208,15 @@ static void show_channel_gathers(struct output *o, struct host1x_cdma *cdma)
 				    job->first_get, job->timeout,
 				    job->num_slots, job->num_unpins);
 
-		for (i = 0; i < job->num_gathers; i++) {
-			struct host1x_job_gather *g = &job->gathers[i];
+		for (i = 0; i < job->num_cmds; i++) {
+			struct host1x_job_gather *g;
 			u32 *mapped;
 
+			if (job->cmds[i].is_wait)
+				continue;
+
+			g = &job->cmds[i].gather;
+
 			if (job->gather_copy_mapped)
 				mapped = (u32 *)job->gather_copy_mapped;
 			else
diff --git a/drivers/gpu/host1x/job.c b/drivers/gpu/host1x/job.c
index 09097e19c0d0..a2ba9995582a 100644
--- a/drivers/gpu/host1x/job.c
+++ b/drivers/gpu/host1x/job.c
@@ -38,7 +38,7 @@ struct host1x_job *host1x_job_alloc(struct host1x_channel *ch,
 	total = sizeof(struct host1x_job) +
 		(u64)num_relocs * sizeof(struct host1x_reloc) +
 		(u64)num_unpins * sizeof(struct host1x_job_unpin_data) +
-		(u64)num_cmdbufs * sizeof(struct host1x_job_gather) +
+		(u64)num_cmdbufs * sizeof(struct host1x_job_cmd) +
 		(u64)num_unpins * sizeof(dma_addr_t) +
 		(u64)num_unpins * sizeof(u32 *);
 	if (total > ULONG_MAX)
@@ -57,8 +57,8 @@ struct host1x_job *host1x_job_alloc(struct host1x_channel *ch,
 	mem += num_relocs * sizeof(struct host1x_reloc);
 	job->unpins = num_unpins ? mem : NULL;
 	mem += num_unpins * sizeof(struct host1x_job_unpin_data);
-	job->gathers = num_cmdbufs ? mem : NULL;
-	mem += num_cmdbufs * sizeof(struct host1x_job_gather);
+	job->cmds = num_cmdbufs ? mem : NULL;
+	mem += num_cmdbufs * sizeof(struct host1x_job_cmd);
 	job->addr_phys = num_unpins ? mem : NULL;
 
 	job->reloc_addr_phys = job->addr_phys;
@@ -101,22 +101,35 @@ EXPORT_SYMBOL(host1x_job_put);
 void host1x_job_add_gather(struct host1x_job *job, struct host1x_bo *bo,
 			   unsigned int words, unsigned int offset)
 {
-	struct host1x_job_gather *gather = &job->gathers[job->num_gathers];
+	struct host1x_job_gather *gather = &job->cmds[job->num_cmds].gather;
 
 	gather->words = words;
 	gather->bo = bo;
 	gather->offset = offset;
 
-	job->num_gathers++;
+	job->num_cmds++;
 }
 EXPORT_SYMBOL(host1x_job_add_gather);
 
+void host1x_job_add_wait(struct host1x_job *job, u32 id, u32 thresh)
+{
+	struct host1x_job_cmd *cmd = &job->cmds[job->num_cmds];
+
+	cmd->is_wait = true;
+	cmd->wait.id = id;
+	cmd->wait.threshold = thresh;
+
+	job->num_cmds++;
+}
+EXPORT_SYMBOL(host1x_job_add_wait);
+
 static unsigned int pin_job(struct host1x *host, struct host1x_job *job)
 {
 	struct host1x_client *client = job->client;
 	struct device *dev = client->dev;
 	struct host1x_job_gather *g;
 	struct iommu_domain *domain;
+	struct sg_table *sgt;
 	unsigned int i;
 	int err;
 
@@ -126,7 +139,6 @@ static unsigned int pin_job(struct host1x *host, struct host1x_job *job)
 	for (i = 0; i < job->num_relocs; i++) {
 		struct host1x_reloc *reloc = &job->relocs[i];
 		dma_addr_t phys_addr, *phys;
-		struct sg_table *sgt;
 
 		reloc->target.bo = host1x_bo_get(reloc->target.bo);
 		if (!reloc->target.bo) {
@@ -202,17 +214,20 @@ static unsigned int pin_job(struct host1x *host, struct host1x_job *job)
 	if (IS_ENABLED(CONFIG_TEGRA_HOST1X_FIREWALL))
 		return 0;
 
-	for (i = 0; i < job->num_gathers; i++) {
+	for (i = 0; i < job->num_cmds; i++) {
 		size_t gather_size = 0;
 		struct scatterlist *sg;
-		struct sg_table *sgt;
 		dma_addr_t phys_addr;
 		unsigned long shift;
 		struct iova *alloc;
 		dma_addr_t *phys;
 		unsigned int j;
 
-		g = &job->gathers[i];
+		if (job->cmds[i].is_wait)
+			continue;
+
+		g = &job->cmds[i].gather;
+
 		g->bo = host1x_bo_get(g->bo);
 		if (!g->bo) {
 			err = -EINVAL;
@@ -545,8 +560,13 @@ static inline int copy_gathers(struct device *host, struct host1x_job *job,
 	fw.num_relocs = job->num_relocs;
 	fw.class = job->class;
 
-	for (i = 0; i < job->num_gathers; i++) {
-		struct host1x_job_gather *g = &job->gathers[i];
+	for (i = 0; i < job->num_cmds; i++) {
+		struct host1x_job_gather *g;
+
+		if (job->cmds[i].is_wait)
+			continue;
+
+		g = &job->cmds[i].gather;
 
 		size += g->words * sizeof(u32);
 	}
@@ -568,10 +588,14 @@ static inline int copy_gathers(struct device *host, struct host1x_job *job,
 
 	job->gather_copy_size = size;
 
-	for (i = 0; i < job->num_gathers; i++) {
-		struct host1x_job_gather *g = &job->gathers[i];
+	for (i = 0; i < job->num_cmds; i++) {
+		struct host1x_job_gather *g;
 		void *gather;
 
+		if (job->cmds[i].is_wait)
+			continue;
+		g = &job->cmds[i].gather;
+
 		/* Copy the gather */
 		gather = host1x_bo_mmap(g->bo);
 		memcpy(job->gather_copy_mapped + offset, gather + g->offset,
@@ -614,8 +638,12 @@ int host1x_job_pin(struct host1x_job *job, struct device *dev)
 	}
 
 	/* patch gathers */
-	for (i = 0; i < job->num_gathers; i++) {
-		struct host1x_job_gather *g = &job->gathers[i];
+	for (i = 0; i < job->num_cmds; i++) {
+		struct host1x_job_gather *g;
+
+		if (job->cmds[i].is_wait)
+			continue;
+		g = &job->cmds[i].gather;
 
 		/* process each gather mem only once */
 		if (g->handled)
@@ -625,10 +653,11 @@ int host1x_job_pin(struct host1x_job *job, struct device *dev)
 		if (!IS_ENABLED(CONFIG_TEGRA_HOST1X_FIREWALL))
 			g->base = job->gather_addr_phys[i];
 
-		for (j = i + 1; j < job->num_gathers; j++) {
-			if (job->gathers[j].bo == g->bo) {
-				job->gathers[j].handled = true;
-				job->gathers[j].base = g->base;
+		for (j = i + 1; j < job->num_cmds; j++) {
+			if (!job->cmds[j].is_wait &&
+			    job->cmds[j].gather.bo == g->bo) {
+				job->cmds[j].gather.handled = true;
+				job->cmds[j].gather.base = g->base;
 			}
 		}
 
diff --git a/drivers/gpu/host1x/job.h b/drivers/gpu/host1x/job.h
index 94bc2e4ae241..33adfaede842 100644
--- a/drivers/gpu/host1x/job.h
+++ b/drivers/gpu/host1x/job.h
@@ -18,6 +18,20 @@ struct host1x_job_gather {
 	bool handled;
 };
 
+struct host1x_job_wait {
+	u32 id;
+	u32 threshold;
+};
+
+struct host1x_job_cmd {
+	bool is_wait;
+
+	union {
+		struct host1x_job_gather gather;
+		struct host1x_job_wait wait;
+	};
+};
+
 struct host1x_job_unpin_data {
 	struct host1x_bo *bo;
 	struct sg_table *sgt;
diff --git a/include/linux/host1x.h b/include/linux/host1x.h
index d48cab563d5c..0a46d12b69f0 100644
--- a/include/linux/host1x.h
+++ b/include/linux/host1x.h
@@ -215,8 +215,8 @@ struct host1x_job {
 	struct host1x_client *client;
 
 	/* Gathers and their memory */
-	struct host1x_job_gather *gathers;
-	unsigned int num_gathers;
+	struct host1x_job_cmd *cmds;
+	unsigned int num_cmds;
 
 	/* Array of handles to be pinned & unpinned */
 	struct host1x_reloc *relocs;
@@ -275,6 +275,7 @@ struct host1x_job *host1x_job_alloc(struct host1x_channel *ch,
 				    u32 num_cmdbufs, u32 num_relocs);
 void host1x_job_add_gather(struct host1x_job *job, struct host1x_bo *bo,
 			   unsigned int words, unsigned int offset);
+void host1x_job_add_wait(struct host1x_job *job, u32 id, u32 thresh);
 struct host1x_job *host1x_job_get(struct host1x_job *job);
 void host1x_job_put(struct host1x_job *job);
 int host1x_job_pin(struct host1x_job *job, struct device *dev);
-- 
2.30.0

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 195+ messages in thread

* [PATCH v5 13/21] gpu: host1x: Reset max value when freeing a syncpoint
  2021-01-11 12:59 ` Mikko Perttunen
@ 2021-01-11 13:00   ` Mikko Perttunen
  -1 siblings, 0 replies; 195+ messages in thread
From: Mikko Perttunen @ 2021-01-11 13:00 UTC (permalink / raw)
  To: thierry.reding, jonathanh, digetx, airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman, Mikko Perttunen

With job recovery becoming optional, syncpoints may have a mismatch
between their value and max value when freed. As such, when freeing,
set the max value to the current value of the syncpoint so that it
is in a sane state for the next user.
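
A minimal sketch (plain C, not the kernel's atomics) of the state being fixed
up here; the `model_*` names are illustrative, not the kernel's:

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of a syncpoint: "value" is the hardware counter, "max_val"
 * tracks how far software has promised to increment it. With job
 * recovery disabled, a timed-out job can leave max_val ahead of value. */
struct model_syncpt {
	uint32_t value;
	uint32_t max_val;
};

/* Mirror of the one-line fix: on release, clamp max_val back to the
 * current hardware value so the next user starts from a consistent
 * state. */
static void model_syncpt_release(struct model_syncpt *sp)
{
	sp->max_val = sp->value;
}
```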

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
v3:
* Use host1x_syncpt_read instead of read_min to ensure syncpoint
  value is current.
---
 drivers/gpu/host1x/syncpt.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/host1x/syncpt.c b/drivers/gpu/host1x/syncpt.c
index 81315cd1a3ed..9c39f5bfc70c 100644
--- a/drivers/gpu/host1x/syncpt.c
+++ b/drivers/gpu/host1x/syncpt.c
@@ -385,6 +385,7 @@ static void syncpt_release(struct kref *ref)
 {
 	struct host1x_syncpt *sp = container_of(ref, struct host1x_syncpt, ref);
 
+	atomic_set(&sp->max_val, host1x_syncpt_read(sp));
 	sp->locked = false;
 
 	mutex_lock(&sp->host->syncpt_mutex);
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 195+ messages in thread

* [PATCH v5 14/21] gpu: host1x: Reserve VBLANK syncpoints at initialization
  2021-01-11 12:59 ` Mikko Perttunen
@ 2021-01-11 13:00   ` Mikko Perttunen
  -1 siblings, 0 replies; 195+ messages in thread
From: Mikko Perttunen @ 2021-01-11 13:00 UTC (permalink / raw)
  To: thierry.reding, jonathanh, digetx, airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman, Mikko Perttunen

On T20-T148 chips, the bootloader can set up a boot splash
screen with DC configured to increment syncpoints 26/27
at VBLANK. Because of this, we shouldn't allow these syncpoints
to be allocated until DC has been reset and will no longer
increment them in the background.

As such, on these chips, reserve those two syncpoints at
initialization, and only mark them free once the DC
driver has indicated it's safe to do so.
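
The reservation scheme can be sketched as a refcount trick (a toy model in
plain C; `resv_*` names are illustrative, not the kernel's kref API): reserving
a syncpoint at init means starting its refcount at 1 (the patch's
`kref_init`), so the allocator's "refcount == 0 means free" scan skips it, and
the display driver's later release (`kref_put` dropping it to 0) makes it
allocatable.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the per-syncpoint reference count. */
struct resv_syncpt { int refcount; };

/* Reserve at init: refcount starts at 1 without any real user. */
static void resv_reserve(struct resv_syncpt *sp)
{
	sp->refcount = 1;
}

/* Release the init-time reservation (the kref_put with a no-op
 * release callback in the patch). */
static void resv_release(struct resv_syncpt *sp)
{
	if (sp->refcount > 0)
		sp->refcount--;
}

/* The allocator treats refcount == 0 as "free", matching the patch's
 * kref_read()-based scan in host1x_syncpt_alloc(). */
static bool resv_is_free(const struct resv_syncpt *sp)
{
	return sp->refcount == 0;
}
```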

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
v3:
* New patch
---
 drivers/gpu/drm/tegra/dc.c  |  6 ++++++
 drivers/gpu/host1x/dev.c    |  6 ++++++
 drivers/gpu/host1x/dev.h    |  6 ++++++
 drivers/gpu/host1x/syncpt.c | 34 +++++++++++++++++++++++++++++++++-
 include/linux/host1x.h      |  3 +++
 5 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/tegra/dc.c b/drivers/gpu/drm/tegra/dc.c
index 033032dfc4b9..d0cee40ab3e6 100644
--- a/drivers/gpu/drm/tegra/dc.c
+++ b/drivers/gpu/drm/tegra/dc.c
@@ -2033,6 +2033,12 @@ static int tegra_dc_init(struct host1x_client *client)
 	struct drm_plane *cursor = NULL;
 	int err;
 
+	/*
+	 * DC has been reset by now, so VBLANK syncpoint can be released
+	 * for general use.
+	 */
+	host1x_syncpt_release_vblank_reservation(client, 26 + dc->pipe);
+
 	/*
 	 * XXX do not register DCs with no window groups because we cannot
 	 * assign a primary plane to them, which in turn will cause KMS to
diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c
index 641317d23828..8b50fbb22846 100644
--- a/drivers/gpu/host1x/dev.c
+++ b/drivers/gpu/host1x/dev.c
@@ -77,6 +77,7 @@ static const struct host1x_info host1x01_info = {
 	.has_hypervisor = false,
 	.num_sid_entries = 0,
 	.sid_table = NULL,
+	.reserve_vblank_syncpts = true,
 };
 
 static const struct host1x_info host1x02_info = {
@@ -91,6 +92,7 @@ static const struct host1x_info host1x02_info = {
 	.has_hypervisor = false,
 	.num_sid_entries = 0,
 	.sid_table = NULL,
+	.reserve_vblank_syncpts = true,
 };
 
 static const struct host1x_info host1x04_info = {
@@ -105,6 +107,7 @@ static const struct host1x_info host1x04_info = {
 	.has_hypervisor = false,
 	.num_sid_entries = 0,
 	.sid_table = NULL,
+	.reserve_vblank_syncpts = false,
 };
 
 static const struct host1x_info host1x05_info = {
@@ -119,6 +122,7 @@ static const struct host1x_info host1x05_info = {
 	.has_hypervisor = false,
 	.num_sid_entries = 0,
 	.sid_table = NULL,
+	.reserve_vblank_syncpts = false,
 };
 
 static const struct host1x_sid_entry tegra186_sid_table[] = {
@@ -142,6 +146,7 @@ static const struct host1x_info host1x06_info = {
 	.has_hypervisor = true,
 	.num_sid_entries = ARRAY_SIZE(tegra186_sid_table),
 	.sid_table = tegra186_sid_table,
+	.reserve_vblank_syncpts = false,
 };
 
 static const struct host1x_sid_entry tegra194_sid_table[] = {
@@ -165,6 +170,7 @@ static const struct host1x_info host1x07_info = {
 	.has_hypervisor = true,
 	.num_sid_entries = ARRAY_SIZE(tegra194_sid_table),
 	.sid_table = tegra194_sid_table,
+	.reserve_vblank_syncpts = false,
 };
 
 static const struct of_device_id host1x_of_match[] = {
diff --git a/drivers/gpu/host1x/dev.h b/drivers/gpu/host1x/dev.h
index 7b8b7e20e32b..e360bc4a25f6 100644
--- a/drivers/gpu/host1x/dev.h
+++ b/drivers/gpu/host1x/dev.h
@@ -102,6 +102,12 @@ struct host1x_info {
 	bool has_hypervisor; /* has hypervisor registers */
 	unsigned int num_sid_entries;
 	const struct host1x_sid_entry *sid_table;
+	/*
+	 * On T20-T148, the boot chain may set up DC to increment syncpoints
+	 * 26/27 on VBLANK. As such we cannot use these syncpoints until
+	 * the display driver disables VBLANK increments.
+	 */
+	bool reserve_vblank_syncpts;
 };
 
 struct host1x {
diff --git a/drivers/gpu/host1x/syncpt.c b/drivers/gpu/host1x/syncpt.c
index 9c39f5bfc70c..100270ac5bcf 100644
--- a/drivers/gpu/host1x/syncpt.c
+++ b/drivers/gpu/host1x/syncpt.c
@@ -52,7 +52,7 @@ struct host1x_syncpt *host1x_syncpt_alloc(struct host1x *host,
 
 	mutex_lock(&host->syncpt_mutex);
 
-	for (i = 0; i < host->info->nb_pts && sp->name; i++, sp++)
+	for (i = 0; i < host->info->nb_pts && kref_read(&sp->ref); i++, sp++)
 		;
 
 	if (i >= host->info->nb_pts)
@@ -359,6 +359,11 @@ int host1x_syncpt_init(struct host1x *host)
 	if (!host->nop_sp)
 		return -ENOMEM;
 
+	if (host->info->reserve_vblank_syncpts) {
+		kref_init(&host->syncpt[26].ref);
+		kref_init(&host->syncpt[27].ref);
+	}
+
 	return 0;
 }
 
@@ -545,3 +550,30 @@ u32 host1x_syncpt_base_id(struct host1x_syncpt_base *base)
 	return base->id;
 }
 EXPORT_SYMBOL(host1x_syncpt_base_id);
+
+static void do_nothing(struct kref *ref)
+{
+}
+
+/**
+ * host1x_syncpt_release_vblank_reservation() - Make VBLANK syncpoint
+ *   available for allocation
+ *
+ * @client: host1x bus client
+ *
+ * Makes VBLANK<i> syncpoint available for allocation if it was
+ * reserved at initialization time. This should be called by the display
+ * driver after it has ensured that any VBLANK increment programming configured
+ * by the boot chain has been disabled.
+ */
+void host1x_syncpt_release_vblank_reservation(struct host1x_client *client,
+					      u32 syncpt_id)
+{
+	struct host1x *host = dev_get_drvdata(client->host->parent);
+
+	if (!host->info->reserve_vblank_syncpts)
+		return;
+
+	kref_put(&host->syncpt[syncpt_id].ref, do_nothing);
+}
+EXPORT_SYMBOL(host1x_syncpt_release_vblank_reservation);
diff --git a/include/linux/host1x.h b/include/linux/host1x.h
index 0a46d12b69f0..5890f91dd286 100644
--- a/include/linux/host1x.h
+++ b/include/linux/host1x.h
@@ -163,6 +163,9 @@ struct host1x_syncpt *host1x_syncpt_alloc(struct host1x *host,
 struct host1x_syncpt_base *host1x_syncpt_get_base(struct host1x_syncpt *sp);
 u32 host1x_syncpt_base_id(struct host1x_syncpt_base *base);
 
+void host1x_syncpt_release_vblank_reservation(struct host1x_client *client,
+					      u32 syncpt_id);
+
 struct host1x_syncpt *host1x_syncpt_fd_get(int fd);
 
 struct dma_fence *host1x_fence_create(struct host1x_syncpt *sp, u32 threshold);
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 195+ messages in thread

* [PATCH v5 15/21] drm/tegra: Add new UAPI to header
  2021-01-11 12:59 ` Mikko Perttunen
@ 2021-01-11 13:00   ` Mikko Perttunen
  -1 siblings, 0 replies; 195+ messages in thread
From: Mikko Perttunen @ 2021-01-11 13:00 UTC (permalink / raw)
  To: thierry.reding, jonathanh, digetx, airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman, Mikko Perttunen

Update the tegra_drm.h UAPI header, adding the new proposed UAPI.
The old staging UAPI is left in for now, with minor modification
to avoid name collisions.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
v4:
* Remove features that are not strictly necessary
* Remove padding/reserved fields in IOCTL structs where
  DRM's zero extension feature allows future expansion
v3:
* Remove timeout field
* Inline the syncpt_incrs array to the submit structure
* Remove WRITE_RELOC (it is now implicit)
---
 include/uapi/drm/tegra_drm.h | 338 ++++++++++++++++++++++++++++++++---
 1 file changed, 311 insertions(+), 27 deletions(-)

diff --git a/include/uapi/drm/tegra_drm.h b/include/uapi/drm/tegra_drm.h
index c4df3c3668b3..014bc393c298 100644
--- a/include/uapi/drm/tegra_drm.h
+++ b/include/uapi/drm/tegra_drm.h
@@ -1,24 +1,5 @@
-/*
- * Copyright (c) 2012-2013, NVIDIA CORPORATION.  All rights reserved.
- *
- * Permission is hereby granted, free of charge, to any person obtaining a
- * copy of this software and associated documentation files (the "Software"),
- * to deal in the Software without restriction, including without limitation
- * the rights to use, copy, modify, merge, publish, distribute, sublicense,
- * and/or sell copies of the Software, and to permit persons to whom the
- * Software is furnished to do so, subject to the following conditions:
- *
- * The above copyright notice and this permission notice shall be included in
- * all copies or substantial portions of the Software.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
- * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
- * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
- * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
- * OTHER DEALINGS IN THE SOFTWARE.
- */
+/* SPDX-License-Identifier: MIT */
+/* Copyright (c) 2012-2020 NVIDIA Corporation */
 
 #ifndef _UAPI_TEGRA_DRM_H_
 #define _UAPI_TEGRA_DRM_H_
@@ -29,6 +10,8 @@
 extern "C" {
 #endif
 
+/* TegraDRM legacy UAPI. Only enabled with STAGING */
+
 #define DRM_TEGRA_GEM_CREATE_TILED     (1 << 0)
 #define DRM_TEGRA_GEM_CREATE_BOTTOM_UP (1 << 1)
 
@@ -644,13 +627,13 @@ struct drm_tegra_gem_get_flags {
 	__u32 flags;
 };
 
-#define DRM_TEGRA_GEM_CREATE		0x00
-#define DRM_TEGRA_GEM_MMAP		0x01
+#define DRM_TEGRA_GEM_CREATE_LEGACY	0x00
+#define DRM_TEGRA_GEM_MMAP_LEGACY	0x01
 #define DRM_TEGRA_SYNCPT_READ		0x02
 #define DRM_TEGRA_SYNCPT_INCR		0x03
 #define DRM_TEGRA_SYNCPT_WAIT		0x04
-#define DRM_TEGRA_OPEN_CHANNEL		0x05
-#define DRM_TEGRA_CLOSE_CHANNEL		0x06
+#define DRM_TEGRA_OPEN_CHANNEL	        0x05
+#define DRM_TEGRA_CLOSE_CHANNEL	        0x06
 #define DRM_TEGRA_GET_SYNCPT		0x07
 #define DRM_TEGRA_SUBMIT		0x08
 #define DRM_TEGRA_GET_SYNCPT_BASE	0x09
@@ -659,8 +642,8 @@ struct drm_tegra_gem_get_flags {
 #define DRM_TEGRA_GEM_SET_FLAGS		0x0c
 #define DRM_TEGRA_GEM_GET_FLAGS		0x0d
 
-#define DRM_IOCTL_TEGRA_GEM_CREATE DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_GEM_CREATE, struct drm_tegra_gem_create)
-#define DRM_IOCTL_TEGRA_GEM_MMAP DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_GEM_MMAP, struct drm_tegra_gem_mmap)
+#define DRM_IOCTL_TEGRA_GEM_CREATE_LEGACY DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_GEM_CREATE_LEGACY, struct drm_tegra_gem_create)
+#define DRM_IOCTL_TEGRA_GEM_MMAP_LEGACY DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_GEM_MMAP_LEGACY, struct drm_tegra_gem_mmap)
 #define DRM_IOCTL_TEGRA_SYNCPT_READ DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_SYNCPT_READ, struct drm_tegra_syncpt_read)
 #define DRM_IOCTL_TEGRA_SYNCPT_INCR DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_SYNCPT_INCR, struct drm_tegra_syncpt_incr)
 #define DRM_IOCTL_TEGRA_SYNCPT_WAIT DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_SYNCPT_WAIT, struct drm_tegra_syncpt_wait)
@@ -674,6 +657,307 @@ struct drm_tegra_gem_get_flags {
 #define DRM_IOCTL_TEGRA_GEM_SET_FLAGS DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_GEM_SET_FLAGS, struct drm_tegra_gem_set_flags)
 #define DRM_IOCTL_TEGRA_GEM_GET_FLAGS DRM_IOWR(DRM_COMMAND_BASE + DRM_TEGRA_GEM_GET_FLAGS, struct drm_tegra_gem_get_flags)
 
+/* New TegraDRM UAPI */
+
+struct drm_tegra_channel_open {
+	/**
+	 * @host1x_class: [in]
+	 *
+	 * Host1x class of the engine that will be programmed using this
+	 * channel.
+	 */
+	__u32 host1x_class;
+
+	/**
+	 * @flags: [in]
+	 *
+	 * Flags.
+	 */
+	__u32 flags;
+
+	/**
+	 * @channel_ctx: [out]
+	 *
+	 * Opaque identifier corresponding to the opened channel.
+	 */
+	__u32 channel_ctx;
+
+	/**
+	 * @hardware_version: [out]
+	 *
+	 * Version of the engine hardware. This can be used by userspace
+	 * to determine how the engine needs to be programmed.
+	 */
+	__u32 hardware_version;
+};
+
+struct drm_tegra_channel_close {
+	/**
+	 * @channel_ctx: [in]
+	 *
+	 * Identifier of the channel to close.
+	 */
+	__u32 channel_ctx;
+};
+
+#define DRM_TEGRA_CHANNEL_MAP_READWRITE			(1<<0)
+
+struct drm_tegra_channel_map {
+	/**
+	 * @channel_ctx: [in]
+	 *
+	 * Identifier of the channel to make memory available to.
+	 */
+	__u32 channel_ctx;
+
+	/**
+	 * @handle: [in]
+	 *
+	 * GEM handle of the memory to map.
+	 */
+	__u32 handle;
+
+	/**
+	 * @flags: [in]
+	 *
+	 * Flags.
+	 */
+	__u32 flags;
+
+	/**
+	 * @mapping_id: [out]
+	 *
+	 * Identifier corresponding to the mapping, to be used for
+	 * relocations or unmapping later.
+	 */
+	__u32 mapping_id;
+};
+
+struct drm_tegra_channel_unmap {
+	/**
+	 * @channel_ctx: [in]
+	 *
+	 * Channel identifier of the channel to unmap memory from.
+	 */
+	__u32 channel_ctx;
+
+	/**
+	 * @mapping_id: [in]
+	 *
+	 * Mapping identifier of the memory mapping to unmap.
+	 */
+	__u32 mapping_id;
+};
+
+/* Submission */
+
+/**
+ * Specify that bit 39 of the patched-in address should be set to
+ * trigger layout swizzling between Tegra and non-Tegra Blocklinear
+ * layout on systems that store surfaces in system memory in non-Tegra
+ * Blocklinear layout.
+ */
+#define DRM_TEGRA_SUBMIT_BUF_RELOC_BLOCKLINEAR		(1<<0)
+
+struct drm_tegra_submit_buf {
+	/**
+	 * @mapping_id: [in]
+	 *
+	 * Identifier of the mapping to use in the submission.
+	 */
+	__u32 mapping_id;
+
+	/**
+	 * @flags: [in]
+	 *
+	 * Flags.
+	 */
+	__u32 flags;
+
+	/**
+	 * Information for relocation patching. Relocation patching will
+	 * be done if the MAP IOCTL that created `mapping_id` did not
+	 * return an IOVA. If an IOVA was returned, the application is
+	 * responsible for patching the address into the gather.
+	 */
+	struct {
+		/**
+		 * @target_offset: [in]
+		 *
+		 * Offset from the start of the mapping of the data whose
+		 * address is to be patched into the gather.
+		 */
+		__u64 target_offset;
+
+		/**
+		 * @gather_offset_words: [in]
+		 *
+		 * Offset in words from the start of the gather data to
+		 * where the address should be patched into.
+		 */
+		__u32 gather_offset_words;
+
+		/**
+		 * @shift: [in]
+		 *
+		 * Number of bits the address should be shifted right before
+		 * patching in.
+		 */
+		__u32 shift;
+	} reloc;
+};
+
+struct drm_tegra_submit_syncpt_incr {
+	/**
+	 * @syncpt_fd: [in]
+	 *
+	 * Syncpoint file descriptor of the syncpoint that the job will
+	 * increment.
+	 */
+	__s32 syncpt_fd;
+
+	/**
+	 * @flags: [in]
+	 *
+	 * Flags.
+	 */
+	__u32 flags;
+
+	/**
+	 * @num_incrs: [in]
+	 *
+	 * Number of times the job will increment this syncpoint.
+	 */
+	__u32 num_incrs;
+
+	/**
+	 * @fence_value: [out]
+	 *
+	 * Value the syncpoint will have once the job has completed all
+	 * its specified syncpoint increments.
+	 *
+	 * Note that the kernel may increment the syncpoint before or after
+	 * the job. These increments are not reflected in this field.
+	 *
+	 * If the job hangs or times out, not all of the increments may
+	 * get executed.
+	 */
+	__u32 fence_value;
+};
+
+/**
+ * Execute `words` words of Host1x opcodes specified in the `gather_data_ptr`
+ * buffer. Each GATHER_UPTR command uses successive words from the buffer.
+ */
+#define DRM_TEGRA_SUBMIT_CMD_GATHER_UPTR		0
+/**
+ * Wait for a syncpoint to reach a value before continuing with further
+ * commands.
+ */
+#define DRM_TEGRA_SUBMIT_CMD_WAIT_SYNCPT		1
+
+struct drm_tegra_submit_cmd_gather_uptr {
+	__u32 words;
+	__u32 reserved[3];
+};
+
+struct drm_tegra_submit_cmd_wait_syncpt {
+	__u32 id;
+	__u32 threshold;
+	__u32 reserved[2];
+};
+
+struct drm_tegra_submit_cmd {
+	/**
+	 * @type: [in]
+	 *
+	 * Command type to execute. One of the DRM_TEGRA_SUBMIT_CMD*
+	 * defines.
+	 */
+	__u32 type;
+
+	/**
+	 * @flags: [in]
+	 *
+	 * Flags.
+	 */
+	__u32 flags;
+
+	union {
+		struct drm_tegra_submit_cmd_gather_uptr gather_uptr;
+		struct drm_tegra_submit_cmd_wait_syncpt wait_syncpt;
+		__u32 reserved[4];
+	};
+};
+
+struct drm_tegra_channel_submit {
+	/**
+	 * @channel_ctx: [in]
+	 *
+	 * Identifier of the channel to submit this job to.
+	 */
+	__u32 channel_ctx;
+
+	/**
+	 * @num_bufs: [in]
+	 *
+	 * Number of elements in the `bufs_ptr` array.
+	 */
+	__u32 num_bufs;
+
+	/**
+	 * @num_cmds: [in]
+	 *
+	 * Number of elements in the `cmds_ptr` array.
+	 */
+	__u32 num_cmds;
+
+	/**
+	 * @gather_data_words: [in]
+	 *
+	 * Number of 32-bit words in the `gather_data_ptr` array.
+	 */
+	__u32 gather_data_words;
+
+	/**
+	 * @bufs_ptr: [in]
+	 *
+	 * Pointer to an array of drm_tegra_submit_buf structures.
+	 */
+	__u64 bufs_ptr;
+
+	/**
+	 * @cmds_ptr: [in]
+	 *
+	 * Pointer to an array of drm_tegra_submit_cmd structures.
+	 */
+	__u64 cmds_ptr;
+
+	/**
+	 * @gather_data_ptr: [in]
+	 *
+	 * Pointer to an array of Host1x opcodes to be used by GATHER_UPTR
+	 * commands.
+	 */
+	__u64 gather_data_ptr;
+
+	/**
+	 * @syncpt_incr: [in,out]
+	 *
+	 * Information about the syncpoint the job will increment.
+	 */
+	struct drm_tegra_submit_syncpt_incr syncpt_incr;
+};
+
+#define DRM_IOCTL_TEGRA_CHANNEL_OPEN     DRM_IOWR(DRM_COMMAND_BASE + 0x10, struct drm_tegra_channel_open)
+#define DRM_IOCTL_TEGRA_CHANNEL_CLOSE    DRM_IOWR(DRM_COMMAND_BASE + 0x11, struct drm_tegra_channel_close)
+#define DRM_IOCTL_TEGRA_CHANNEL_MAP      DRM_IOWR(DRM_COMMAND_BASE + 0x12, struct drm_tegra_channel_map)
+#define DRM_IOCTL_TEGRA_CHANNEL_UNMAP    DRM_IOWR(DRM_COMMAND_BASE + 0x13, struct drm_tegra_channel_unmap)
+#define DRM_IOCTL_TEGRA_CHANNEL_SUBMIT   DRM_IOWR(DRM_COMMAND_BASE + 0x14, struct drm_tegra_channel_submit)
+
+#define DRM_IOCTL_TEGRA_GEM_CREATE       DRM_IOWR(DRM_COMMAND_BASE + 0x15, struct drm_tegra_gem_create)
+#define DRM_IOCTL_TEGRA_GEM_MMAP         DRM_IOWR(DRM_COMMAND_BASE + 0x16, struct drm_tegra_gem_mmap)
+
 #if defined(__cplusplus)
 }
 #endif
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 195+ messages in thread
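The new IOCTL structs above avoid implicit padding by ordering fields so that 64-bit members land on 8-byte boundaries (pointers are carried as `__u64`, per DRM convention). A standalone mirror of a few of the structs can be used to sanity-check the expected sizes and offsets; note this duplicates the header's layout for illustration and is not the UAPI header itself:

```c
/* Standalone mirror of several of the proposed UAPI structs, using
 * <stdint.h> types in place of __u32/__u64, to sanity-check layout.
 * This is an illustrative copy, not include/uapi/drm/tegra_drm.h. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct drm_tegra_channel_open {
	uint32_t host1x_class;
	uint32_t flags;
	uint32_t channel_ctx;
	uint32_t hardware_version;
};

struct drm_tegra_submit_buf {
	uint32_t mapping_id;
	uint32_t flags;
	struct {
		uint64_t target_offset;		/* 8-byte aligned at offset 8 */
		uint32_t gather_offset_words;
		uint32_t shift;
	} reloc;
};

struct drm_tegra_submit_syncpt_incr {
	int32_t syncpt_fd;
	uint32_t flags;
	uint32_t num_incrs;
	uint32_t fence_value;
};

struct drm_tegra_channel_submit {
	uint32_t channel_ctx;
	uint32_t num_bufs;
	uint32_t num_cmds;
	uint32_t gather_data_words;
	uint64_t bufs_ptr;			/* userspace pointer as u64 */
	uint64_t cmds_ptr;
	uint64_t gather_data_ptr;
	struct drm_tegra_submit_syncpt_incr syncpt_incr;
};
```

Because no padding is needed, the structures have the same layout on 32-bit and 64-bit userspace, which keeps the IOCTLs compat-friendly.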

* [PATCH v5 16/21] drm/tegra: Boot VIC during runtime PM resume
  2021-01-11 12:59 ` Mikko Perttunen
@ 2021-01-11 13:00   ` Mikko Perttunen
  -1 siblings, 0 replies; 195+ messages in thread
From: Mikko Perttunen @ 2021-01-11 13:00 UTC (permalink / raw)
  To: thierry.reding, jonathanh, digetx, airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman, Mikko Perttunen

With the new UAPI implementation, engines are powered on and off
when there are active jobs, and the core code handles channel
allocation. To accommodate that, boot the engine as part of
runtime PM instead of using the open_channel callback, which is
not used by the new submit path.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
v3:
* runtime_get/put is now done directly from submit path, so no
  callbacks are added
* Reworded.
---
 drivers/gpu/drm/tegra/vic.c | 114 +++++++++++++++++-------------------
 1 file changed, 53 insertions(+), 61 deletions(-)

diff --git a/drivers/gpu/drm/tegra/vic.c b/drivers/gpu/drm/tegra/vic.c
index cb476da59adc..5d2ad125dca3 100644
--- a/drivers/gpu/drm/tegra/vic.c
+++ b/drivers/gpu/drm/tegra/vic.c
@@ -29,7 +29,6 @@ struct vic_config {
 
 struct vic {
 	struct falcon falcon;
-	bool booted;
 
 	void __iomem *regs;
 	struct tegra_drm_client client;
@@ -52,48 +51,6 @@ static void vic_writel(struct vic *vic, u32 value, unsigned int offset)
 	writel(value, vic->regs + offset);
 }
 
-static int vic_runtime_resume(struct device *dev)
-{
-	struct vic *vic = dev_get_drvdata(dev);
-	int err;
-
-	err = clk_prepare_enable(vic->clk);
-	if (err < 0)
-		return err;
-
-	usleep_range(10, 20);
-
-	err = reset_control_deassert(vic->rst);
-	if (err < 0)
-		goto disable;
-
-	usleep_range(10, 20);
-
-	return 0;
-
-disable:
-	clk_disable_unprepare(vic->clk);
-	return err;
-}
-
-static int vic_runtime_suspend(struct device *dev)
-{
-	struct vic *vic = dev_get_drvdata(dev);
-	int err;
-
-	err = reset_control_assert(vic->rst);
-	if (err < 0)
-		return err;
-
-	usleep_range(2000, 4000);
-
-	clk_disable_unprepare(vic->clk);
-
-	vic->booted = false;
-
-	return 0;
-}
-
 static int vic_boot(struct vic *vic)
 {
 #ifdef CONFIG_IOMMU_API
@@ -103,9 +60,6 @@ static int vic_boot(struct vic *vic)
 	void *hdr;
 	int err = 0;
 
-	if (vic->booted)
-		return 0;
-
 #ifdef CONFIG_IOMMU_API
 	if (vic->config->supports_sid && spec) {
 		u32 value;
@@ -153,8 +107,6 @@ static int vic_boot(struct vic *vic)
 		return err;
 	}
 
-	vic->booted = true;
-
 	return 0;
 }
 
@@ -308,35 +260,76 @@ static int vic_load_firmware(struct vic *vic)
 	return err;
 }
 
-static int vic_open_channel(struct tegra_drm_client *client,
-			    struct tegra_drm_context *context)
+
+static int vic_runtime_resume(struct device *dev)
 {
-	struct vic *vic = to_vic(client);
+	struct vic *vic = dev_get_drvdata(dev);
 	int err;
 
-	err = pm_runtime_get_sync(vic->dev);
+	err = clk_prepare_enable(vic->clk);
 	if (err < 0)
 		return err;
 
+	usleep_range(10, 20);
+
+	err = reset_control_deassert(vic->rst);
+	if (err < 0)
+		goto disable;
+
+	usleep_range(10, 20);
+
 	err = vic_load_firmware(vic);
 	if (err < 0)
-		goto rpm_put;
+		goto assert;
 
 	err = vic_boot(vic);
 	if (err < 0)
-		goto rpm_put;
+		goto assert;
+
+	return 0;
+
+assert:
+	reset_control_assert(vic->rst);
+disable:
+	clk_disable_unprepare(vic->clk);
+	return err;
+}
+
+static int vic_runtime_suspend(struct device *dev)
+{
+	struct vic *vic = dev_get_drvdata(dev);
+	int err;
+
+	err = reset_control_assert(vic->rst);
+	if (err < 0)
+		return err;
+
+	usleep_range(2000, 4000);
+
+	clk_disable_unprepare(vic->clk);
+
+	return 0;
+}
+
+static int vic_open_channel(struct tegra_drm_client *client,
+			    struct tegra_drm_context *context)
+{
+	struct vic *vic = to_vic(client);
+	int err;
+
+	err = pm_runtime_get_sync(vic->dev);
+	if (err < 0) {
+		pm_runtime_put(vic->dev);
+		return err;
+	}
 
 	context->channel = host1x_channel_get(vic->channel);
 	if (!context->channel) {
-		err = -ENOMEM;
-		goto rpm_put;
+		pm_runtime_put(vic->dev);
+		return -ENOMEM;
 	}
 
 	return 0;
-
-rpm_put:
-	pm_runtime_put(vic->dev);
-	return err;
 }
 
 static void vic_close_channel(struct tegra_drm_context *context)
@@ -344,7 +337,6 @@ static void vic_close_channel(struct tegra_drm_context *context)
 	struct vic *vic = to_vic(context->client);
 
 	host1x_channel_put(context->channel);
-
 	pm_runtime_put(vic->dev);
 }
 
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 195+ messages in thread
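The reworked resume path acquires resources in a fixed order (clock enable, reset deassert, firmware load, boot) and unwinds them in reverse on failure, replacing the old `booted` flag with runtime PM state. A minimal userspace model of that error-unwinding structure, with the hardware steps stubbed out as flags (the `fail_boot` knob is hypothetical, purely to exercise both paths):

```c
/* Illustrative model of vic_runtime_resume()'s goto-based unwinding;
 * each hardware step is stubbed with a flag, and fail_boot simulates
 * vic_boot() returning an error. Names are hypothetical stand-ins. */
#include <assert.h>
#include <stdbool.h>

static bool clk_on, reset_deasserted, fw_loaded, booted;
static bool fail_boot;	/* simulate vic_boot() failing */

static int model_runtime_resume(void)
{
	clk_on = true;			/* clk_prepare_enable() */
	reset_deasserted = true;	/* reset_control_deassert() */
	fw_loaded = true;		/* vic_load_firmware() */

	if (fail_boot)
		goto unwind;		/* vic_boot() failed */

	booted = true;
	return 0;

unwind:
	/* Unwind in reverse order of acquisition, as the patch does
	 * with its "assert" and "disable" labels. */
	reset_deasserted = false;	/* reset_control_assert() */
	clk_on = false;			/* clk_disable_unprepare() */
	return -1;
}
```

The key property is that a failed resume leaves the engine held in reset with its clock gated, exactly the state a subsequent resume attempt expects to start from.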

* [PATCH v5 17/21] drm/tegra: Set resv fields when importing/exporting GEMs
  2021-01-11 12:59 ` Mikko Perttunen
@ 2021-01-11 13:00   ` Mikko Perttunen
  -1 siblings, 0 replies; 195+ messages in thread
From: Mikko Perttunen @ 2021-01-11 13:00 UTC (permalink / raw)
  To: thierry.reding, jonathanh, digetx, airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman, Mikko Perttunen

To allow sharing of implicit fences when exporting/importing dma_buf
objects, set the 'resv' fields when importing or exporting GEM
objects.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
 drivers/gpu/drm/tegra/gem.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/tegra/gem.c b/drivers/gpu/drm/tegra/gem.c
index 26af8daa9a16..731e6bdc01b4 100644
--- a/drivers/gpu/drm/tegra/gem.c
+++ b/drivers/gpu/drm/tegra/gem.c
@@ -431,6 +431,7 @@ static struct tegra_bo *tegra_bo_import(struct drm_device *drm,
 	}
 
 	bo->gem.import_attach = attach;
+	bo->gem.resv = buf->resv;
 
 	return bo;
 
@@ -683,6 +684,7 @@ struct dma_buf *tegra_gem_prime_export(struct drm_gem_object *gem,
 	exp_info.size = gem->size;
 	exp_info.flags = flags;
 	exp_info.priv = gem;
+	exp_info.resv = gem->resv;
 
 	return drm_gem_dmabuf_export(gem->dev, &exp_info);
 }
-- 
2.30.0


* [PATCH v5 18/21] drm/tegra: Allocate per-engine channel in core code
  2021-01-11 12:59 ` Mikko Perttunen
@ 2021-01-11 13:00   ` Mikko Perttunen
  -1 siblings, 0 replies; 195+ messages in thread
From: Mikko Perttunen @ 2021-01-11 13:00 UTC (permalink / raw)
  To: thierry.reding, jonathanh, digetx, airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman, Mikko Perttunen

To avoid duplication, allocate the per-engine shared channel in the
core code instead. Once MLOCKs are implemented on the Host1x side,
this can be updated to skip allocating a shared channel when MLOCKs
are enabled.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
 drivers/gpu/drm/tegra/drm.c | 11 +++++++++++
 drivers/gpu/drm/tegra/drm.h |  4 ++++
 2 files changed, 15 insertions(+)

diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
index cd81b52a9e06..afd3f143c5e0 100644
--- a/drivers/gpu/drm/tegra/drm.c
+++ b/drivers/gpu/drm/tegra/drm.c
@@ -880,6 +880,14 @@ static const struct drm_driver tegra_drm_driver = {
 int tegra_drm_register_client(struct tegra_drm *tegra,
 			      struct tegra_drm_client *client)
 {
+	/*
+	 * When MLOCKs are implemented, change to allocate a shared channel
+	 * only when MLOCKs are disabled.
+	 */
+	client->shared_channel = host1x_channel_request(&client->base);
+	if (!client->shared_channel)
+		return -EBUSY;
+
 	mutex_lock(&tegra->clients_lock);
 	list_add_tail(&client->list, &tegra->clients);
 	client->drm = tegra;
@@ -896,6 +904,9 @@ int tegra_drm_unregister_client(struct tegra_drm *tegra,
 	client->drm = NULL;
 	mutex_unlock(&tegra->clients_lock);
 
+	if (client->shared_channel)
+		host1x_channel_put(client->shared_channel);
+
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/tegra/drm.h b/drivers/gpu/drm/tegra/drm.h
index f38de08e0c95..0f38f159aa8e 100644
--- a/drivers/gpu/drm/tegra/drm.h
+++ b/drivers/gpu/drm/tegra/drm.h
@@ -87,8 +87,12 @@ struct tegra_drm_client {
 	struct list_head list;
 	struct tegra_drm *drm;
 
+	/* Set by driver */
 	unsigned int version;
 	const struct tegra_drm_client_ops *ops;
+
+	/* Set by TegraDRM core */
+	struct host1x_channel *shared_channel;
 };
 
 static inline struct tegra_drm_client *
-- 
2.30.0


* [PATCH v5 19/21] drm/tegra: Implement new UAPI
  2021-01-11 12:59 ` Mikko Perttunen
@ 2021-01-11 13:00   ` Mikko Perttunen
  -1 siblings, 0 replies; 195+ messages in thread
From: Mikko Perttunen @ 2021-01-11 13:00 UTC (permalink / raw)
  To: thierry.reding, jonathanh, digetx, airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman, Mikko Perttunen

Implement the non-submission parts of the new UAPI, including
channel management and memory mapping. The UAPI is under the
CONFIG_DRM_TEGRA_STAGING config flag for now.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
v5:
* Set iova_end in both mapping paths
v4:
* New patch, split out from combined UAPI + submit patch.
---
 drivers/gpu/drm/tegra/Makefile    |   1 +
 drivers/gpu/drm/tegra/drm.c       |  41 ++--
 drivers/gpu/drm/tegra/drm.h       |   5 +
 drivers/gpu/drm/tegra/uapi.h      |  63 ++++++
 drivers/gpu/drm/tegra/uapi/uapi.c | 307 ++++++++++++++++++++++++++++++
 5 files changed, 401 insertions(+), 16 deletions(-)
 create mode 100644 drivers/gpu/drm/tegra/uapi.h
 create mode 100644 drivers/gpu/drm/tegra/uapi/uapi.c

diff --git a/drivers/gpu/drm/tegra/Makefile b/drivers/gpu/drm/tegra/Makefile
index d6cf202414f0..0abdb21b38b9 100644
--- a/drivers/gpu/drm/tegra/Makefile
+++ b/drivers/gpu/drm/tegra/Makefile
@@ -3,6 +3,7 @@ ccflags-$(CONFIG_DRM_TEGRA_DEBUG) += -DDEBUG
 
 tegra-drm-y := \
 	drm.o \
+	uapi/uapi.o \
 	gem.o \
 	fb.o \
 	dp.o \
diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
index afd3f143c5e0..6a51035ce33f 100644
--- a/drivers/gpu/drm/tegra/drm.c
+++ b/drivers/gpu/drm/tegra/drm.c
@@ -20,6 +20,7 @@
 #include <drm/drm_prime.h>
 #include <drm/drm_vblank.h>
 
+#include "uapi.h"
 #include "drm.h"
 #include "gem.h"
 
@@ -33,11 +34,6 @@
 #define CARVEOUT_SZ SZ_64M
 #define CDMA_GATHER_FETCHES_MAX_NB 16383
 
-struct tegra_drm_file {
-	struct idr contexts;
-	struct mutex lock;
-};
-
 static int tegra_atomic_check(struct drm_device *drm,
 			      struct drm_atomic_state *state)
 {
@@ -90,7 +86,8 @@ static int tegra_drm_open(struct drm_device *drm, struct drm_file *filp)
 	if (!fpriv)
 		return -ENOMEM;
 
-	idr_init_base(&fpriv->contexts, 1);
+	idr_init_base(&fpriv->legacy_contexts, 1);
+	xa_init_flags(&fpriv->contexts, XA_FLAGS_ALLOC1);
 	mutex_init(&fpriv->lock);
 	filp->driver_priv = fpriv;
 
@@ -429,7 +426,7 @@ static int tegra_client_open(struct tegra_drm_file *fpriv,
 	if (err < 0)
 		return err;
 
-	err = idr_alloc(&fpriv->contexts, context, 1, 0, GFP_KERNEL);
+	err = idr_alloc(&fpriv->legacy_contexts, context, 1, 0, GFP_KERNEL);
 	if (err < 0) {
 		client->ops->close_channel(context);
 		return err;
@@ -484,13 +481,13 @@ static int tegra_close_channel(struct drm_device *drm, void *data,
 
 	mutex_lock(&fpriv->lock);
 
-	context = idr_find(&fpriv->contexts, args->context);
+	context = idr_find(&fpriv->legacy_contexts, args->context);
 	if (!context) {
 		err = -EINVAL;
 		goto unlock;
 	}
 
-	idr_remove(&fpriv->contexts, context->id);
+	idr_remove(&fpriv->legacy_contexts, context->id);
 	tegra_drm_context_free(context);
 
 unlock:
@@ -509,7 +506,7 @@ static int tegra_get_syncpt(struct drm_device *drm, void *data,
 
 	mutex_lock(&fpriv->lock);
 
-	context = idr_find(&fpriv->contexts, args->context);
+	context = idr_find(&fpriv->legacy_contexts, args->context);
 	if (!context) {
 		err = -ENODEV;
 		goto unlock;
@@ -538,7 +535,7 @@ static int tegra_submit(struct drm_device *drm, void *data,
 
 	mutex_lock(&fpriv->lock);
 
-	context = idr_find(&fpriv->contexts, args->context);
+	context = idr_find(&fpriv->legacy_contexts, args->context);
 	if (!context) {
 		err = -ENODEV;
 		goto unlock;
@@ -563,7 +560,7 @@ static int tegra_get_syncpt_base(struct drm_device *drm, void *data,
 
 	mutex_lock(&fpriv->lock);
 
-	context = idr_find(&fpriv->contexts, args->context);
+	context = idr_find(&fpriv->legacy_contexts, args->context);
 	if (!context) {
 		err = -ENODEV;
 		goto unlock;
@@ -732,10 +729,21 @@ static int tegra_gem_get_flags(struct drm_device *drm, void *data,
 
 static const struct drm_ioctl_desc tegra_drm_ioctls[] = {
 #ifdef CONFIG_DRM_TEGRA_STAGING
-	DRM_IOCTL_DEF_DRV(TEGRA_GEM_CREATE, tegra_gem_create,
+	DRM_IOCTL_DEF_DRV(TEGRA_CHANNEL_OPEN, tegra_drm_ioctl_channel_open,
+			  DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(TEGRA_CHANNEL_CLOSE, tegra_drm_ioctl_channel_close,
+			  DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(TEGRA_CHANNEL_MAP, tegra_drm_ioctl_channel_map,
 			  DRM_RENDER_ALLOW),
-	DRM_IOCTL_DEF_DRV(TEGRA_GEM_MMAP, tegra_gem_mmap,
+	DRM_IOCTL_DEF_DRV(TEGRA_CHANNEL_UNMAP, tegra_drm_ioctl_channel_unmap,
 			  DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(TEGRA_GEM_CREATE, tegra_drm_ioctl_gem_create,
+			  DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(TEGRA_GEM_MMAP, tegra_drm_ioctl_gem_mmap,
+			  DRM_RENDER_ALLOW),
+
+	DRM_IOCTL_DEF_DRV(TEGRA_GEM_CREATE_LEGACY, tegra_gem_create, DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(TEGRA_GEM_MMAP_LEGACY, tegra_gem_mmap, DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(TEGRA_SYNCPT_READ, tegra_syncpt_read,
 			  DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(TEGRA_SYNCPT_INCR, tegra_syncpt_incr,
@@ -789,10 +797,11 @@ static void tegra_drm_postclose(struct drm_device *drm, struct drm_file *file)
 	struct tegra_drm_file *fpriv = file->driver_priv;
 
 	mutex_lock(&fpriv->lock);
-	idr_for_each(&fpriv->contexts, tegra_drm_context_cleanup, NULL);
+	idr_for_each(&fpriv->legacy_contexts, tegra_drm_context_cleanup, NULL);
+	tegra_drm_uapi_close_file(fpriv);
 	mutex_unlock(&fpriv->lock);
 
-	idr_destroy(&fpriv->contexts);
+	idr_destroy(&fpriv->legacy_contexts);
 	mutex_destroy(&fpriv->lock);
 	kfree(fpriv);
 }
diff --git a/drivers/gpu/drm/tegra/drm.h b/drivers/gpu/drm/tegra/drm.h
index 0f38f159aa8e..1af57c2016eb 100644
--- a/drivers/gpu/drm/tegra/drm.h
+++ b/drivers/gpu/drm/tegra/drm.h
@@ -59,6 +59,11 @@ struct tegra_drm {
 	struct tegra_display_hub *hub;
 };
 
+static inline struct host1x *tegra_drm_to_host1x(struct tegra_drm *tegra)
+{
+	return dev_get_drvdata(tegra->drm->dev->parent);
+}
+
 struct tegra_drm_client;
 
 struct tegra_drm_context {
diff --git a/drivers/gpu/drm/tegra/uapi.h b/drivers/gpu/drm/tegra/uapi.h
new file mode 100644
index 000000000000..5c422607e8fa
--- /dev/null
+++ b/drivers/gpu/drm/tegra/uapi.h
@@ -0,0 +1,63 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright (c) 2020 NVIDIA Corporation */
+
+#ifndef _TEGRA_DRM_UAPI_H
+#define _TEGRA_DRM_UAPI_H
+
+#include <linux/dma-mapping.h>
+#include <linux/idr.h>
+#include <linux/kref.h>
+#include <linux/xarray.h>
+
+#include <drm/drm.h>
+
+struct drm_file;
+struct drm_device;
+
+struct tegra_drm_file {
+	/* Legacy UAPI state */
+	struct idr legacy_contexts;
+	struct mutex lock;
+
+	/* New UAPI state */
+	struct xarray contexts;
+};
+
+struct tegra_drm_channel_ctx {
+	struct tegra_drm_client *client;
+	struct host1x_channel *channel;
+	struct xarray mappings;
+};
+
+struct tegra_drm_mapping {
+	struct kref ref;
+
+	struct device *dev;
+	struct host1x_bo *bo;
+	struct sg_table *sgt;
+	enum dma_data_direction direction;
+	dma_addr_t iova;
+	dma_addr_t iova_end;
+};
+
+int tegra_drm_ioctl_channel_open(struct drm_device *drm, void *data,
+				 struct drm_file *file);
+int tegra_drm_ioctl_channel_close(struct drm_device *drm, void *data,
+				  struct drm_file *file);
+int tegra_drm_ioctl_channel_map(struct drm_device *drm, void *data,
+				struct drm_file *file);
+int tegra_drm_ioctl_channel_unmap(struct drm_device *drm, void *data,
+				struct drm_file *file);
+int tegra_drm_ioctl_channel_submit(struct drm_device *drm, void *data,
+				   struct drm_file *file);
+int tegra_drm_ioctl_gem_create(struct drm_device *drm, void *data,
+				struct drm_file *file);
+int tegra_drm_ioctl_gem_mmap(struct drm_device *drm, void *data,
+				struct drm_file *file);
+
+void tegra_drm_uapi_close_file(struct tegra_drm_file *file);
+void tegra_drm_mapping_put(struct tegra_drm_mapping *mapping);
+struct tegra_drm_channel_ctx *
+tegra_drm_channel_ctx_lock(struct tegra_drm_file *file, u32 id);
+
+#endif
diff --git a/drivers/gpu/drm/tegra/uapi/uapi.c b/drivers/gpu/drm/tegra/uapi/uapi.c
new file mode 100644
index 000000000000..d503b5e817c4
--- /dev/null
+++ b/drivers/gpu/drm/tegra/uapi/uapi.c
@@ -0,0 +1,307 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (c) 2020 NVIDIA Corporation */
+
+#include <linux/host1x.h>
+#include <linux/iommu.h>
+#include <linux/list.h>
+
+#include <drm/drm_drv.h>
+#include <drm/drm_file.h>
+
+#include "../uapi.h"
+#include "../drm.h"
+
+struct tegra_drm_channel_ctx *
+tegra_drm_channel_ctx_lock(struct tegra_drm_file *file, u32 id)
+{
+	struct tegra_drm_channel_ctx *ctx;
+
+	mutex_lock(&file->lock);
+	ctx = xa_load(&file->contexts, id);
+	if (!ctx)
+		mutex_unlock(&file->lock);
+
+	return ctx;
+}
+
+static void tegra_drm_mapping_release(struct kref *ref)
+{
+	struct tegra_drm_mapping *mapping =
+		container_of(ref, struct tegra_drm_mapping, ref);
+
+	if (mapping->sgt)
+		dma_unmap_sgtable(mapping->dev, mapping->sgt,
+				  mapping->direction, DMA_ATTR_SKIP_CPU_SYNC);
+
+	host1x_bo_unpin(mapping->dev, mapping->bo, mapping->sgt);
+	host1x_bo_put(mapping->bo);
+
+	kfree(mapping);
+}
+
+void tegra_drm_mapping_put(struct tegra_drm_mapping *mapping)
+{
+	kref_put(&mapping->ref, tegra_drm_mapping_release);
+}
+
+static void tegra_drm_channel_ctx_close(struct tegra_drm_channel_ctx *ctx)
+{
+	unsigned long mapping_id;
+	struct tegra_drm_mapping *mapping;
+
+	xa_for_each(&ctx->mappings, mapping_id, mapping)
+		tegra_drm_mapping_put(mapping);
+
+	xa_destroy(&ctx->mappings);
+
+	host1x_channel_put(ctx->channel);
+
+	kfree(ctx);
+}
+
+int close_channel_ctx(int id, void *p, void *data)
+{
+	struct tegra_drm_channel_ctx *ctx = p;
+
+	tegra_drm_channel_ctx_close(ctx);
+
+	return 0;
+}
+
+void tegra_drm_uapi_close_file(struct tegra_drm_file *file)
+{
+	unsigned long ctx_id;
+	struct tegra_drm_channel_ctx *ctx;
+
+	xa_for_each(&file->contexts, ctx_id, ctx)
+		tegra_drm_channel_ctx_close(ctx);
+
+	xa_destroy(&file->contexts);
+}
+
+int tegra_drm_ioctl_channel_open(struct drm_device *drm, void *data,
+				 struct drm_file *file)
+{
+	struct tegra_drm_file *fpriv = file->driver_priv;
+	struct tegra_drm *tegra = drm->dev_private;
+	struct drm_tegra_channel_open *args = data;
+	struct tegra_drm_client *client = NULL;
+	struct tegra_drm_channel_ctx *ctx;
+	int err;
+
+	if (args->flags)
+		return -EINVAL;
+
+	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
+	if (!ctx)
+		return -ENOMEM;
+
+	err = -ENODEV;
+	list_for_each_entry(client, &tegra->clients, list) {
+		if (client->base.class == args->host1x_class) {
+			err = 0;
+			break;
+		}
+	}
+	if (err)
+		goto free_ctx;
+
+	if (client->shared_channel) {
+		ctx->channel = host1x_channel_get(client->shared_channel);
+	} else {
+		ctx->channel = host1x_channel_request(&client->base);
+		if (!ctx->channel) {
+			err = -EBUSY;
+			goto free_ctx;
+		}
+	}
+
+	err = xa_alloc(&fpriv->contexts, &args->channel_ctx, ctx,
+		       XA_LIMIT(1, U32_MAX), GFP_KERNEL);
+	if (err < 0)
+		goto put_channel;
+
+	ctx->client = client;
+	xa_init_flags(&ctx->mappings, XA_FLAGS_ALLOC1);
+
+	args->hardware_version = client->version;
+
+	return 0;
+
+put_channel:
+	host1x_channel_put(ctx->channel);
+free_ctx:
+	kfree(ctx);
+
+	return err;
+}
+
+int tegra_drm_ioctl_channel_close(struct drm_device *drm, void *data,
+				  struct drm_file *file)
+{
+	struct tegra_drm_file *fpriv = file->driver_priv;
+	struct drm_tegra_channel_close *args = data;
+	struct tegra_drm_channel_ctx *ctx;
+
+	ctx = tegra_drm_channel_ctx_lock(fpriv, args->channel_ctx);
+	if (!ctx)
+		return -EINVAL;
+
+	xa_erase(&fpriv->contexts, args->channel_ctx);
+
+	mutex_unlock(&fpriv->lock);
+
+	tegra_drm_channel_ctx_close(ctx);
+
+	return 0;
+}
+
+int tegra_drm_ioctl_channel_map(struct drm_device *drm, void *data,
+				struct drm_file *file)
+{
+	struct tegra_drm_file *fpriv = file->driver_priv;
+	struct drm_tegra_channel_map *args = data;
+	struct tegra_drm_channel_ctx *ctx;
+	struct tegra_drm_mapping *mapping;
+	struct drm_gem_object *gem;
+	u32 mapping_id;
+	int err = 0;
+
+	if (args->flags & ~DRM_TEGRA_CHANNEL_MAP_READWRITE)
+		return -EINVAL;
+
+	ctx = tegra_drm_channel_ctx_lock(fpriv, args->channel_ctx);
+	if (!ctx)
+		return -EINVAL;
+
+	mapping = kzalloc(sizeof(*mapping), GFP_KERNEL);
+	if (!mapping) {
+		err = -ENOMEM;
+		goto unlock;
+	}
+
+	kref_init(&mapping->ref);
+
+	gem = drm_gem_object_lookup(file, args->handle);
+	if (!gem) {
+		err = -EINVAL;
+		goto unlock;
+	}
+
+	mapping->dev = ctx->client->base.dev;
+	mapping->bo = &container_of(gem, struct tegra_bo, gem)->base;
+
+	if (!iommu_get_domain_for_dev(mapping->dev) ||
+	    ctx->client->base.group) {
+		host1x_bo_pin(mapping->dev, mapping->bo,
+			      &mapping->iova);
+	} else {
+		mapping->direction = DMA_TO_DEVICE;
+		if (args->flags & DRM_TEGRA_CHANNEL_MAP_READWRITE)
+			mapping->direction = DMA_BIDIRECTIONAL;
+
+		mapping->sgt =
+			host1x_bo_pin(mapping->dev, mapping->bo, NULL);
+		if (IS_ERR(mapping->sgt)) {
+			err = PTR_ERR(mapping->sgt);
+			goto put_gem;
+		}
+
+		err = dma_map_sgtable(mapping->dev, mapping->sgt,
+				      mapping->direction,
+				      DMA_ATTR_SKIP_CPU_SYNC);
+		if (err)
+			goto unpin;
+
+		/* TODO only map the requested part */
+		mapping->iova = sg_dma_address(mapping->sgt->sgl);
+	}
+
+	mapping->iova_end = mapping->iova + gem->size;
+
+	mutex_unlock(&fpriv->lock);
+
+	err = xa_alloc(&ctx->mappings, &mapping_id, mapping,
+		       XA_LIMIT(1, U32_MAX), GFP_KERNEL);
+	if (err < 0)
+		goto unmap;
+
+	args->mapping_id = mapping_id;
+
+	return 0;
+
+unmap:
+	if (mapping->sgt) {
+		dma_unmap_sgtable(mapping->dev, mapping->sgt,
+				  mapping->direction, DMA_ATTR_SKIP_CPU_SYNC);
+	}
+unpin:
+	host1x_bo_unpin(mapping->dev, mapping->bo, mapping->sgt);
+put_gem:
+	drm_gem_object_put(gem);
+	kfree(mapping);
+unlock:
+	mutex_unlock(&fpriv->lock);
+	return err;
+}
+
+int tegra_drm_ioctl_channel_unmap(struct drm_device *drm, void *data,
+				  struct drm_file *file)
+{
+	struct tegra_drm_file *fpriv = file->driver_priv;
+	struct drm_tegra_channel_unmap *args = data;
+	struct tegra_drm_channel_ctx *ctx;
+	struct tegra_drm_mapping *mapping;
+
+	ctx = tegra_drm_channel_ctx_lock(fpriv, args->channel_ctx);
+	if (!ctx)
+		return -EINVAL;
+
+	mapping = xa_erase(&ctx->mappings, args->mapping_id);
+
+	mutex_unlock(&fpriv->lock);
+
+	if (mapping) {
+		tegra_drm_mapping_put(mapping);
+		return 0;
+	} else {
+		return -EINVAL;
+	}
+}
+
+int tegra_drm_ioctl_gem_create(struct drm_device *drm, void *data,
+			       struct drm_file *file)
+{
+	struct drm_tegra_gem_create *args = data;
+	struct tegra_bo *bo;
+
+	if (args->flags)
+		return -EINVAL;
+
+	bo = tegra_bo_create_with_handle(file, drm, args->size, args->flags,
+					 &args->handle);
+	if (IS_ERR(bo))
+		return PTR_ERR(bo);
+
+	return 0;
+}
+
+int tegra_drm_ioctl_gem_mmap(struct drm_device *drm, void *data,
+			     struct drm_file *file)
+{
+	struct drm_tegra_gem_mmap *args = data;
+	struct drm_gem_object *gem;
+	struct tegra_bo *bo;
+
+	gem = drm_gem_object_lookup(file, args->handle);
+	if (!gem)
+		return -EINVAL;
+
+	bo = to_tegra_bo(gem);
+
+	args->offset = drm_vma_node_offset_addr(&bo->gem.vma_node);
+
+	drm_gem_object_put(gem);
+
+	return 0;
+}
-- 
2.30.0


+		return -EINVAL;
+
+	ctx = tegra_drm_channel_ctx_lock(fpriv, args->channel_ctx);
+	if (!ctx)
+		return -EINVAL;
+
+	mapping = kzalloc(sizeof(*mapping), GFP_KERNEL);
+	if (!mapping) {
+		err = -ENOMEM;
+		goto unlock;
+	}
+
+	kref_init(&mapping->ref);
+
+	gem = drm_gem_object_lookup(file, args->handle);
+	if (!gem) {
+		err = -EINVAL;
+		kfree(mapping);
+		goto unlock;
+	}
+
+	mapping->dev = ctx->client->base.dev;
+	mapping->bo = &container_of(gem, struct tegra_bo, gem)->base;
+
+	if (!iommu_get_domain_for_dev(mapping->dev) ||
+	    ctx->client->base.group) {
+		host1x_bo_pin(mapping->dev, mapping->bo,
+			      &mapping->iova);
+	} else {
+		mapping->direction = DMA_TO_DEVICE;
+		if (args->flags & DRM_TEGRA_CHANNEL_MAP_READWRITE)
+			mapping->direction = DMA_BIDIRECTIONAL;
+
+		mapping->sgt =
+			host1x_bo_pin(mapping->dev, mapping->bo, NULL);
+		if (IS_ERR(mapping->sgt)) {
+			err = PTR_ERR(mapping->sgt);
+			goto put_gem;
+		}
+
+		err = dma_map_sgtable(mapping->dev, mapping->sgt,
+				      mapping->direction,
+				      DMA_ATTR_SKIP_CPU_SYNC);
+		if (err)
+			goto unpin;
+
+		/* TODO only map the requested part */
+		mapping->iova = sg_dma_address(mapping->sgt->sgl);
+	}
+
+	mapping->iova_end = mapping->iova + gem->size;
+
+	err = xa_alloc(&ctx->mappings, &mapping_id, mapping,
+		       XA_LIMIT(1, U32_MAX), GFP_KERNEL);
+	if (err < 0)
+		goto unmap;
+
+	mutex_unlock(&fpriv->lock);
+
+	args->mapping_id = mapping_id;
+
+	return 0;
+
+unmap:
+	if (mapping->sgt) {
+		dma_unmap_sgtable(mapping->dev, mapping->sgt,
+				  mapping->direction, DMA_ATTR_SKIP_CPU_SYNC);
+	}
+unpin:
+	host1x_bo_unpin(mapping->dev, mapping->bo, mapping->sgt);
+put_gem:
+	drm_gem_object_put(gem);
+	kfree(mapping);
+unlock:
+	mutex_unlock(&fpriv->lock);
+	return err;
+}
+
+int tegra_drm_ioctl_channel_unmap(struct drm_device *drm, void *data,
+				  struct drm_file *file)
+{
+	struct tegra_drm_file *fpriv = file->driver_priv;
+	struct drm_tegra_channel_unmap *args = data;
+	struct tegra_drm_channel_ctx *ctx;
+	struct tegra_drm_mapping *mapping;
+
+	ctx = tegra_drm_channel_ctx_lock(fpriv, args->channel_ctx);
+	if (!ctx)
+		return -EINVAL;
+
+	mapping = xa_erase(&ctx->mappings, args->mapping_id);
+
+	mutex_unlock(&fpriv->lock);
+
+	if (mapping) {
+		tegra_drm_mapping_put(mapping);
+		return 0;
+	} else {
+		return -EINVAL;
+	}
+}
+
+int tegra_drm_ioctl_gem_create(struct drm_device *drm, void *data,
+			       struct drm_file *file)
+{
+	struct drm_tegra_gem_create *args = data;
+	struct tegra_bo *bo;
+
+	if (args->flags)
+		return -EINVAL;
+
+	bo = tegra_bo_create_with_handle(file, drm, args->size, args->flags,
+					 &args->handle);
+	if (IS_ERR(bo))
+		return PTR_ERR(bo);
+
+	return 0;
+}
+
+int tegra_drm_ioctl_gem_mmap(struct drm_device *drm, void *data,
+			     struct drm_file *file)
+{
+	struct drm_tegra_gem_mmap *args = data;
+	struct drm_gem_object *gem;
+	struct tegra_bo *bo;
+
+	gem = drm_gem_object_lookup(file, args->handle);
+	if (!gem)
+		return -EINVAL;
+
+	bo = to_tegra_bo(gem);
+
+	args->offset = drm_vma_node_offset_addr(&bo->gem.vma_node);
+
+	drm_gem_object_put(gem);
+
+	return 0;
+}
-- 
2.30.0

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


* [PATCH v5 20/21] drm/tegra: Implement job submission part of new UAPI
  2021-01-11 12:59 ` Mikko Perttunen
@ 2021-01-11 13:00   ` Mikko Perttunen
  -1 siblings, 0 replies; 195+ messages in thread
From: Mikko Perttunen @ 2021-01-11 13:00 UTC (permalink / raw)
  To: thierry.reding, jonathanh, digetx, airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman, Mikko Perttunen

Implement the job submission IOCTL with a minimum feature set.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
v5:
* Add 16K size limit to copies from userspace.
* Guard RELOC_BLOCKLINEAR flag handling so it is only compiled on
  ARM64, preventing an oversized shift on 32-bit platforms.
v4:
* Remove all features that are not strictly necessary.
* Split into two patches.
v3:
* Remove WRITE_RELOC. Relocations are now patched implicitly
  when patching is needed.
* Directly call PM runtime APIs on devices instead of using
  power_on/power_off callbacks.
* Remove incorrect mutex unlock in tegra_drm_ioctl_channel_open
* Use XA_FLAGS_ALLOC1 instead of XA_FLAGS_ALLOC
* Accommodate the removal of the timeout field and the inlining of
  the syncpt_incrs array.
* Copy entire user arrays at a time instead of going through
  elements one-by-one.
* Implement waiting on DMA reservations.
* Split out gather_bo implementation into a separate file.
* Fix length parameter passed to sg_init_one in gather_bo
* Cosmetic cleanup.
---
 drivers/gpu/drm/tegra/Makefile         |   2 +
 drivers/gpu/drm/tegra/drm.c            |   2 +
 drivers/gpu/drm/tegra/uapi/gather_bo.c |  86 +++++
 drivers/gpu/drm/tegra/uapi/gather_bo.h |  22 ++
 drivers/gpu/drm/tegra/uapi/submit.c    | 428 +++++++++++++++++++++++++
 drivers/gpu/drm/tegra/uapi/submit.h    |  17 +
 6 files changed, 557 insertions(+)
 create mode 100644 drivers/gpu/drm/tegra/uapi/gather_bo.c
 create mode 100644 drivers/gpu/drm/tegra/uapi/gather_bo.h
 create mode 100644 drivers/gpu/drm/tegra/uapi/submit.c
 create mode 100644 drivers/gpu/drm/tegra/uapi/submit.h

diff --git a/drivers/gpu/drm/tegra/Makefile b/drivers/gpu/drm/tegra/Makefile
index 0abdb21b38b9..059322e88943 100644
--- a/drivers/gpu/drm/tegra/Makefile
+++ b/drivers/gpu/drm/tegra/Makefile
@@ -4,6 +4,8 @@ ccflags-$(CONFIG_DRM_TEGRA_DEBUG) += -DDEBUG
 tegra-drm-y := \
 	drm.o \
 	uapi/uapi.o \
+	uapi/submit.o \
+	uapi/gather_bo.o \
 	gem.o \
 	fb.o \
 	dp.o \
diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
index 6a51035ce33f..60eab403ae9b 100644
--- a/drivers/gpu/drm/tegra/drm.c
+++ b/drivers/gpu/drm/tegra/drm.c
@@ -737,6 +737,8 @@ static const struct drm_ioctl_desc tegra_drm_ioctls[] = {
 			  DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(TEGRA_CHANNEL_UNMAP, tegra_drm_ioctl_channel_unmap,
 			  DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(TEGRA_CHANNEL_SUBMIT, tegra_drm_ioctl_channel_submit,
+			  DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(TEGRA_GEM_CREATE, tegra_drm_ioctl_gem_create,
 			  DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(TEGRA_GEM_MMAP, tegra_drm_ioctl_gem_mmap,
diff --git a/drivers/gpu/drm/tegra/uapi/gather_bo.c b/drivers/gpu/drm/tegra/uapi/gather_bo.c
new file mode 100644
index 000000000000..b487a0d44648
--- /dev/null
+++ b/drivers/gpu/drm/tegra/uapi/gather_bo.c
@@ -0,0 +1,86 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (c) 2020 NVIDIA Corporation */
+
+#include <linux/scatterlist.h>
+#include <linux/slab.h>
+
+#include "gather_bo.h"
+
+static struct host1x_bo *gather_bo_get(struct host1x_bo *host_bo)
+{
+	struct gather_bo *bo = container_of(host_bo, struct gather_bo, base);
+
+	kref_get(&bo->ref);
+
+	return host_bo;
+}
+
+static void gather_bo_release(struct kref *ref)
+{
+	struct gather_bo *bo = container_of(ref, struct gather_bo, ref);
+
+	kfree(bo->gather_data);
+	kfree(bo);
+}
+
+void gather_bo_put(struct host1x_bo *host_bo)
+{
+	struct gather_bo *bo = container_of(host_bo, struct gather_bo, base);
+
+	kref_put(&bo->ref, gather_bo_release);
+}
+
+static struct sg_table *
+gather_bo_pin(struct device *dev, struct host1x_bo *host_bo, dma_addr_t *phys)
+{
+	struct gather_bo *bo = container_of(host_bo, struct gather_bo, base);
+	struct sg_table *sgt;
+	int err;
+
+	if (phys) {
+		*phys = virt_to_phys(bo->gather_data);
+		return NULL;
+	}
+
+	sgt = kzalloc(sizeof(*sgt), GFP_KERNEL);
+	if (!sgt)
+		return ERR_PTR(-ENOMEM);
+
+	err = sg_alloc_table(sgt, 1, GFP_KERNEL);
+	if (err) {
+		kfree(sgt);
+		return ERR_PTR(err);
+	}
+
+	sg_init_one(sgt->sgl, bo->gather_data, bo->gather_data_words * 4);
+
+	return sgt;
+}
+
+static void gather_bo_unpin(struct device *dev, struct sg_table *sgt)
+{
+	if (sgt) {
+		sg_free_table(sgt);
+		kfree(sgt);
+	}
+}
+
+static void *gather_bo_mmap(struct host1x_bo *host_bo)
+{
+	struct gather_bo *bo = container_of(host_bo, struct gather_bo, base);
+
+	return bo->gather_data;
+}
+
+static void gather_bo_munmap(struct host1x_bo *host_bo, void *addr)
+{
+}
+
+const struct host1x_bo_ops gather_bo_ops = {
+	.get = gather_bo_get,
+	.put = gather_bo_put,
+	.pin = gather_bo_pin,
+	.unpin = gather_bo_unpin,
+	.mmap = gather_bo_mmap,
+	.munmap = gather_bo_munmap,
+};
diff --git a/drivers/gpu/drm/tegra/uapi/gather_bo.h b/drivers/gpu/drm/tegra/uapi/gather_bo.h
new file mode 100644
index 000000000000..6b4c9d83ac91
--- /dev/null
+++ b/drivers/gpu/drm/tegra/uapi/gather_bo.h
@@ -0,0 +1,22 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright (c) 2020 NVIDIA Corporation */
+
+#ifndef _TEGRA_DRM_SUBMIT_GATHER_BO_H
+#define _TEGRA_DRM_SUBMIT_GATHER_BO_H
+
+#include <linux/host1x.h>
+#include <linux/kref.h>
+
+struct gather_bo {
+	struct host1x_bo base;
+
+	struct kref ref;
+
+	u32 *gather_data;
+	size_t gather_data_words;
+};
+
+extern const struct host1x_bo_ops gather_bo_ops;
+void gather_bo_put(struct host1x_bo *host_bo);
+
+#endif
diff --git a/drivers/gpu/drm/tegra/uapi/submit.c b/drivers/gpu/drm/tegra/uapi/submit.c
new file mode 100644
index 000000000000..398be3065e21
--- /dev/null
+++ b/drivers/gpu/drm/tegra/uapi/submit.c
@@ -0,0 +1,428 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (c) 2020 NVIDIA Corporation */
+
+#include <linux/dma-fence-array.h>
+#include <linux/file.h>
+#include <linux/host1x.h>
+#include <linux/iommu.h>
+#include <linux/kref.h>
+#include <linux/list.h>
+#include <linux/nospec.h>
+#include <linux/pm_runtime.h>
+#include <linux/sync_file.h>
+
+#include <drm/drm_drv.h>
+#include <drm/drm_file.h>
+
+#include "../uapi.h"
+#include "../drm.h"
+#include "../gem.h"
+
+#include "gather_bo.h"
+#include "submit.h"
+
+static struct tegra_drm_mapping *
+tegra_drm_mapping_get(struct tegra_drm_channel_ctx *ctx, u32 id)
+{
+	struct tegra_drm_mapping *mapping;
+
+	xa_lock(&ctx->mappings);
+	mapping = xa_load(&ctx->mappings, id);
+	if (mapping)
+		kref_get(&mapping->ref);
+	xa_unlock(&ctx->mappings);
+
+	return mapping;
+}
+
+static void *alloc_copy_user_array(void __user *from, size_t count, size_t size)
+{
+	unsigned long copy_err;
+	size_t copy_len;
+	void *data;
+
+	if (check_mul_overflow(count, size, &copy_len))
+		return ERR_PTR(-EINVAL);
+
+	/* Cap user array copies at 16 KiB to bound kernel allocations. */
+	if (copy_len > 0x4000)
+		return ERR_PTR(-E2BIG);
+
+	data = kvmalloc(copy_len, GFP_KERNEL);
+	if (!data)
+		return ERR_PTR(-ENOMEM);
+
+	copy_err = copy_from_user(data, from, copy_len);
+	if (copy_err) {
+		kvfree(data);
+		return ERR_PTR(-EFAULT);
+	}
+
+	return data;
+}
+
+static int submit_copy_gather_data(struct drm_device *drm,
+				   struct gather_bo **pbo,
+				   struct drm_tegra_channel_submit *args)
+{
+	unsigned long copy_err;
+	struct gather_bo *bo;
+	size_t copy_len;
+
+	if (args->gather_data_words == 0) {
+		drm_info(drm, "gather_data_words can't be 0\n");
+		return -EINVAL;
+	}
+
+	if (check_mul_overflow((size_t)args->gather_data_words, (size_t)4, &copy_len))
+		return -EINVAL;
+
+	bo = kzalloc(sizeof(*bo), GFP_KERNEL);
+	if (!bo)
+		return -ENOMEM;
+
+	kref_init(&bo->ref);
+	host1x_bo_init(&bo->base, &gather_bo_ops);
+
+	bo->gather_data = kmalloc(copy_len, GFP_KERNEL | __GFP_NOWARN);
+	if (!bo->gather_data) {
+		kfree(bo);
+		return -ENOMEM;
+	}
+
+	copy_err = copy_from_user(bo->gather_data,
+				  u64_to_user_ptr(args->gather_data_ptr),
+				  copy_len);
+	if (copy_err) {
+		kfree(bo->gather_data);
+		kfree(bo);
+		return -EFAULT;
+	}
+
+	bo->gather_data_words = args->gather_data_words;
+
+	*pbo = bo;
+
+	return 0;
+}
+
+static int submit_write_reloc(struct gather_bo *bo,
+			      struct drm_tegra_submit_buf *buf,
+			      struct tegra_drm_mapping *mapping)
+{
+	/* TODO check that target_offset is within bounds */
+	dma_addr_t iova = mapping->iova + buf->reloc.target_offset;
+	u32 written_ptr;
+
+#ifdef CONFIG_ARM64
+	if (buf->flags & DRM_TEGRA_SUBMIT_BUF_RELOC_BLOCKLINEAR)
+		iova |= BIT_ULL(39);
+#endif
+
+	written_ptr = iova >> buf->reloc.shift;
+
+	if (buf->reloc.gather_offset_words >= bo->gather_data_words)
+		return -EINVAL;
+
+	buf->reloc.gather_offset_words = array_index_nospec(
+		buf->reloc.gather_offset_words, bo->gather_data_words);
+
+	bo->gather_data[buf->reloc.gather_offset_words] = written_ptr;
+
+	return 0;
+}
+
+static int submit_process_bufs(struct drm_device *drm, struct gather_bo *bo,
+			       struct tegra_drm_submit_data *job_data,
+			       struct tegra_drm_channel_ctx *ctx,
+			       struct drm_tegra_channel_submit *args)
+{
+	struct tegra_drm_used_mapping *mappings;
+	struct drm_tegra_submit_buf *bufs;
+	int err;
+	u32 i;
+
+	bufs = alloc_copy_user_array(u64_to_user_ptr(args->bufs_ptr),
+				     args->num_bufs, sizeof(*bufs));
+	if (IS_ERR(bufs))
+		return PTR_ERR(bufs);
+
+	mappings = kcalloc(args->num_bufs, sizeof(*mappings), GFP_KERNEL);
+	if (!mappings) {
+		err = -ENOMEM;
+		goto done;
+	}
+
+	for (i = 0; i < args->num_bufs; i++) {
+		struct drm_tegra_submit_buf *buf = &bufs[i];
+		struct tegra_drm_mapping *mapping;
+
+		if (buf->flags & ~DRM_TEGRA_SUBMIT_BUF_RELOC_BLOCKLINEAR) {
+			err = -EINVAL;
+			goto drop_refs;
+		}
+
+		mapping = tegra_drm_mapping_get(ctx, buf->mapping_id);
+		if (!mapping) {
+			drm_info(drm, "invalid mapping_id for buf: %u\n",
+				 buf->mapping_id);
+			err = -EINVAL;
+			goto drop_refs;
+		}
+
+		err = submit_write_reloc(bo, buf, mapping);
+		if (err) {
+			tegra_drm_mapping_put(mapping);
+			goto drop_refs;
+		}
+
+		mappings[i].mapping = mapping;
+		mappings[i].flags = buf->flags;
+	}
+
+	job_data->used_mappings = mappings;
+	job_data->num_used_mappings = i;
+
+	err = 0;
+
+	goto done;
+
+drop_refs:
+	while (i--)
+		tegra_drm_mapping_put(mappings[i].mapping);
+
+	kfree(mappings);
+	job_data->used_mappings = NULL;
+
+done:
+	kvfree(bufs);
+
+	return err;
+}
+
+static int submit_get_syncpt(struct drm_device *drm, struct host1x_job *job,
+			     struct drm_tegra_channel_submit *args)
+{
+	struct host1x_syncpt *sp;
+
+	if (args->syncpt_incr.flags)
+		return -EINVAL;
+
+	/* Syncpt ref will be dropped on job release */
+	sp = host1x_syncpt_fd_get(args->syncpt_incr.syncpt_fd);
+	if (IS_ERR(sp))
+		return PTR_ERR(sp);
+
+	job->syncpt = sp;
+	job->syncpt_incrs = args->syncpt_incr.num_incrs;
+
+	return 0;
+}
+
+static int submit_job_add_gather(struct host1x_job *job,
+				 struct tegra_drm_channel_ctx *ctx,
+				 struct drm_tegra_submit_cmd_gather_uptr *cmd,
+				 struct gather_bo *bo, u32 *offset,
+				 struct tegra_drm_submit_data *job_data)
+{
+	u32 next_offset;
+
+	if (cmd->reserved[0] || cmd->reserved[1] || cmd->reserved[2])
+		return -EINVAL;
+
+	/* Check for maximum gather size */
+	if (cmd->words > 16383)
+		return -EINVAL;
+
+	if (check_add_overflow(*offset, cmd->words, &next_offset))
+		return -EINVAL;
+
+	if (next_offset > bo->gather_data_words)
+		return -EINVAL;
+
+	host1x_job_add_gather(job, &bo->base, cmd->words, *offset * 4);
+
+	*offset = next_offset;
+
+	return 0;
+}
+
+static int submit_create_job(struct drm_device *drm, struct host1x_job **pjob,
+			     struct gather_bo *bo,
+			     struct tegra_drm_channel_ctx *ctx,
+			     struct drm_tegra_channel_submit *args,
+			     struct tegra_drm_submit_data *job_data)
+{
+	struct drm_tegra_submit_cmd *cmds;
+	u32 i, gather_offset = 0;
+	struct host1x_job *job;
+	int err;
+
+	cmds = alloc_copy_user_array(u64_to_user_ptr(args->cmds_ptr),
+				     args->num_cmds, sizeof(*cmds));
+	if (IS_ERR(cmds))
+		return PTR_ERR(cmds);
+
+	job = host1x_job_alloc(ctx->channel, args->num_cmds, 0);
+	if (!job) {
+		err = -ENOMEM;
+		goto done;
+	}
+
+	err = submit_get_syncpt(drm, job, args);
+	if (err < 0)
+		goto free_job;
+
+	job->client = &ctx->client->base;
+	job->class = ctx->client->base.class;
+	job->serialize = true;
+
+	for (i = 0; i < args->num_cmds; i++) {
+		struct drm_tegra_submit_cmd *cmd = &cmds[i];
+
+		if (cmd->type == DRM_TEGRA_SUBMIT_CMD_GATHER_UPTR) {
+			err = submit_job_add_gather(job, ctx, &cmd->gather_uptr,
+						    bo, &gather_offset,
+						    job_data);
+			if (err)
+				goto free_job;
+		} else if (cmd->type == DRM_TEGRA_SUBMIT_CMD_WAIT_SYNCPT) {
+			if (cmd->wait_syncpt.reserved[0] ||
+			    cmd->wait_syncpt.reserved[1]) {
+				err = -EINVAL;
+				goto free_job;
+			}
+
+			host1x_job_add_wait(job, cmd->wait_syncpt.id,
+					    cmd->wait_syncpt.threshold);
+		} else {
+			err = -EINVAL;
+			goto free_job;
+		}
+	}
+
+	if (gather_offset == 0) {
+		drm_info(drm, "Job must have at least one gather\n");
+		err = -EINVAL;
+		goto free_job;
+	}
+
+	*pjob = job;
+
+	err = 0;
+	goto done;
+
+free_job:
+	host1x_job_put(job);
+
+done:
+	kvfree(cmds);
+
+	return err;
+}
+
+static void release_job(struct host1x_job *job)
+{
+	struct tegra_drm_client *client =
+		container_of(job->client, struct tegra_drm_client, base);
+	struct tegra_drm_submit_data *job_data = job->user_data;
+	u32 i;
+
+	for (i = 0; i < job_data->num_used_mappings; i++)
+		tegra_drm_mapping_put(job_data->used_mappings[i].mapping);
+
+	kfree(job_data->used_mappings);
+	kfree(job_data);
+
+	pm_runtime_put_autosuspend(client->base.dev);
+}
+
+int tegra_drm_ioctl_channel_submit(struct drm_device *drm, void *data,
+				   struct drm_file *file)
+{
+	struct tegra_drm_file *fpriv = file->driver_priv;
+	struct drm_tegra_channel_submit *args = data;
+	struct tegra_drm_submit_data *job_data;
+	struct tegra_drm_channel_ctx *ctx;
+	struct host1x_job *job;
+	struct gather_bo *bo;
+	u32 i;
+	int err;
+
+	ctx = tegra_drm_channel_ctx_lock(fpriv, args->channel_ctx);
+	if (!ctx)
+		return -EINVAL;
+
+	/* Allocate gather BO and copy gather words in. */
+	err = submit_copy_gather_data(drm, &bo, args);
+	if (err)
+		goto unlock;
+
+	job_data = kzalloc(sizeof(*job_data), GFP_KERNEL);
+	if (!job_data) {
+		err = -ENOMEM;
+		goto put_bo;
+	}
+
+	/* Get data buffer mappings and do relocation patching. */
+	err = submit_process_bufs(drm, bo, job_data, ctx, args);
+	if (err)
+		goto free_job_data;
+
+	/* Allocate host1x_job and add gathers and waits to it. */
+	err = submit_create_job(drm, &job, bo, ctx, args,
+				job_data);
+	if (err)
+		goto free_job_data;
+
+	/* Map gather data for Host1x. */
+	err = host1x_job_pin(job, ctx->client->base.dev);
+	if (err)
+		goto put_job;
+
+	/* Boot engine. */
+	err = pm_runtime_get_sync(ctx->client->base.dev);
+	if (err < 0)
+		goto put_pm_runtime;
+
+	job->user_data = job_data;
+	job->release = release_job;
+	job->timeout = 10000;
+
+	/*
+	 * job_data is now part of job reference counting, so don't release
+	 * it from here.
+	 */
+	job_data = NULL;
+
+	/* Submit job to hardware. */
+	err = host1x_job_submit(job);
+	if (err)
+		goto put_job;
+
+	/* Return the postfence value to userspace. */
+	args->syncpt_incr.fence_value = job->syncpt_end;
+
+	goto put_job;
+
+put_pm_runtime:
+	if (!job->release)
+		pm_runtime_put(ctx->client->base.dev);
+	host1x_job_unpin(job);
+put_job:
+	host1x_job_put(job);
+free_job_data:
+	if (job_data && job_data->used_mappings) {
+		for (i = 0; i < job_data->num_used_mappings; i++)
+			tegra_drm_mapping_put(job_data->used_mappings[i].mapping);
+		kfree(job_data->used_mappings);
+	}
+	kfree(job_data);
+put_bo:
+	gather_bo_put(&bo->base);
+unlock:
+	mutex_unlock(&fpriv->lock);
+	return err;
+}
diff --git a/drivers/gpu/drm/tegra/uapi/submit.h b/drivers/gpu/drm/tegra/uapi/submit.h
new file mode 100644
index 000000000000..0a165e9e4bda
--- /dev/null
+++ b/drivers/gpu/drm/tegra/uapi/submit.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright (c) 2020 NVIDIA Corporation */
+
+#ifndef _TEGRA_DRM_UAPI_SUBMIT_H
+#define _TEGRA_DRM_UAPI_SUBMIT_H
+
+struct tegra_drm_used_mapping {
+	struct tegra_drm_mapping *mapping;
+	u32 flags;
+};
+
+struct tegra_drm_submit_data {
+	struct tegra_drm_used_mapping *used_mappings;
+	u32 num_used_mappings;
+};
+
+#endif
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 195+ messages in thread

* [PATCH v5 20/21] drm/tegra: Implement job submission part of new UAPI
@ 2021-01-11 13:00   ` Mikko Perttunen
  0 siblings, 0 replies; 195+ messages in thread
From: Mikko Perttunen @ 2021-01-11 13:00 UTC (permalink / raw)
  To: thierry.reding, jonathanh, digetx, airlied, daniel
  Cc: linux-tegra, talho, bhuntsman, dri-devel, Mikko Perttunen

Implement the job submission IOCTL with a minimum feature set.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
v5:
* Add 16K size limit to copies from userspace.
* Guard RELOC_BLOCKLINEAR flag handling to only exist in ARM64
  to prevent oversized shift on 32-bit platforms.
v4:
* Remove all features that are not strictly necessary.
* Split into two patches.
v3:
* Remove WRITE_RELOC. Relocations are now patched implicitly
  when patching is needed.
* Directly call PM runtime APIs on devices instead of using
  power_on/power_off callbacks.
* Remove incorrect mutex unlock in tegra_drm_ioctl_channel_open
* Use XA_FLAGS_ALLOC1 instead of XA_FLAGS_ALLOC
* Accommodate for removal of timeout field and inlining of
  syncpt_incrs array.
* Copy entire user arrays at a time instead of going through
  elements one-by-one.
* Implement waiting of DMA reservations.
* Split out gather_bo implementation into a separate file.
* Fix length parameter passed to sg_init_one in gather_bo
* Cosmetic cleanup.
---
 drivers/gpu/drm/tegra/Makefile         |   2 +
 drivers/gpu/drm/tegra/drm.c            |   2 +
 drivers/gpu/drm/tegra/uapi/gather_bo.c |  86 +++++
 drivers/gpu/drm/tegra/uapi/gather_bo.h |  22 ++
 drivers/gpu/drm/tegra/uapi/submit.c    | 428 +++++++++++++++++++++++++
 drivers/gpu/drm/tegra/uapi/submit.h    |  17 +
 6 files changed, 557 insertions(+)
 create mode 100644 drivers/gpu/drm/tegra/uapi/gather_bo.c
 create mode 100644 drivers/gpu/drm/tegra/uapi/gather_bo.h
 create mode 100644 drivers/gpu/drm/tegra/uapi/submit.c
 create mode 100644 drivers/gpu/drm/tegra/uapi/submit.h

diff --git a/drivers/gpu/drm/tegra/Makefile b/drivers/gpu/drm/tegra/Makefile
index 0abdb21b38b9..059322e88943 100644
--- a/drivers/gpu/drm/tegra/Makefile
+++ b/drivers/gpu/drm/tegra/Makefile
@@ -4,6 +4,8 @@ ccflags-$(CONFIG_DRM_TEGRA_DEBUG) += -DDEBUG
 tegra-drm-y := \
 	drm.o \
 	uapi/uapi.o \
+	uapi/submit.o \
+	uapi/gather_bo.o \
 	gem.o \
 	fb.o \
 	dp.o \
diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
index 6a51035ce33f..60eab403ae9b 100644
--- a/drivers/gpu/drm/tegra/drm.c
+++ b/drivers/gpu/drm/tegra/drm.c
@@ -737,6 +737,8 @@ static const struct drm_ioctl_desc tegra_drm_ioctls[] = {
 			  DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(TEGRA_CHANNEL_UNMAP, tegra_drm_ioctl_channel_unmap,
 			  DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(TEGRA_CHANNEL_SUBMIT, tegra_drm_ioctl_channel_submit,
+			  DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(TEGRA_GEM_CREATE, tegra_drm_ioctl_gem_create,
 			  DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(TEGRA_GEM_MMAP, tegra_drm_ioctl_gem_mmap,
diff --git a/drivers/gpu/drm/tegra/uapi/gather_bo.c b/drivers/gpu/drm/tegra/uapi/gather_bo.c
new file mode 100644
index 000000000000..b487a0d44648
--- /dev/null
+++ b/drivers/gpu/drm/tegra/uapi/gather_bo.c
@@ -0,0 +1,86 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (c) 2020 NVIDIA Corporation */
+
+#include <linux/scatterlist.h>
+#include <linux/slab.h>
+
+#include "gather_bo.h"
+
+static struct host1x_bo *gather_bo_get(struct host1x_bo *host_bo)
+{
+	struct gather_bo *bo = container_of(host_bo, struct gather_bo, base);
+
+	kref_get(&bo->ref);
+
+	return host_bo;
+}
+
+static void gather_bo_release(struct kref *ref)
+{
+	struct gather_bo *bo = container_of(ref, struct gather_bo, ref);
+
+	kfree(bo->gather_data);
+	kfree(bo);
+}
+
+void gather_bo_put(struct host1x_bo *host_bo)
+{
+	struct gather_bo *bo = container_of(host_bo, struct gather_bo, base);
+
+	kref_put(&bo->ref, gather_bo_release);
+}
+
+static struct sg_table *
+gather_bo_pin(struct device *dev, struct host1x_bo *host_bo, dma_addr_t *phys)
+{
+	struct gather_bo *bo = container_of(host_bo, struct gather_bo, base);
+	struct sg_table *sgt;
+	int err;
+
+	if (phys) {
+		*phys = virt_to_phys(bo->gather_data);
+		return NULL;
+	}
+
+	sgt = kzalloc(sizeof(*sgt), GFP_KERNEL);
+	if (!sgt)
+		return ERR_PTR(-ENOMEM);
+
+	err = sg_alloc_table(sgt, 1, GFP_KERNEL);
+	if (err) {
+		kfree(sgt);
+		return ERR_PTR(err);
+	}
+
+	sg_init_one(sgt->sgl, bo->gather_data, bo->gather_data_words*4);
+
+	return sgt;
+}
+
+static void gather_bo_unpin(struct device *dev, struct sg_table *sgt)
+{
+	if (sgt) {
+		sg_free_table(sgt);
+		kfree(sgt);
+	}
+}
+
+static void *gather_bo_mmap(struct host1x_bo *host_bo)
+{
+	struct gather_bo *bo = container_of(host_bo, struct gather_bo, base);
+
+	return bo->gather_data;
+}
+
+static void gather_bo_munmap(struct host1x_bo *host_bo, void *addr)
+{
+}
+
+const struct host1x_bo_ops gather_bo_ops = {
+	.get = gather_bo_get,
+	.put = gather_bo_put,
+	.pin = gather_bo_pin,
+	.unpin = gather_bo_unpin,
+	.mmap = gather_bo_mmap,
+	.munmap = gather_bo_munmap,
+};
diff --git a/drivers/gpu/drm/tegra/uapi/gather_bo.h b/drivers/gpu/drm/tegra/uapi/gather_bo.h
new file mode 100644
index 000000000000..6b4c9d83ac91
--- /dev/null
+++ b/drivers/gpu/drm/tegra/uapi/gather_bo.h
@@ -0,0 +1,22 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright (c) 2020 NVIDIA Corporation */
+
+#ifndef _TEGRA_DRM_SUBMIT_GATHER_BO_H
+#define _TEGRA_DRM_SUBMIT_GATHER_BO_H
+
+#include <linux/host1x.h>
+#include <linux/kref.h>
+
+struct gather_bo {
+	struct host1x_bo base;
+
+	struct kref ref;
+
+	u32 *gather_data;
+	size_t gather_data_words;
+};
+
+extern const struct host1x_bo_ops gather_bo_ops;
+void gather_bo_put(struct host1x_bo *host_bo);
+
+#endif
diff --git a/drivers/gpu/drm/tegra/uapi/submit.c b/drivers/gpu/drm/tegra/uapi/submit.c
new file mode 100644
index 000000000000..398be3065e21
--- /dev/null
+++ b/drivers/gpu/drm/tegra/uapi/submit.c
@@ -0,0 +1,428 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (c) 2020 NVIDIA Corporation */
+
+#include <linux/dma-fence-array.h>
+#include <linux/file.h>
+#include <linux/host1x.h>
+#include <linux/iommu.h>
+#include <linux/kref.h>
+#include <linux/list.h>
+#include <linux/nospec.h>
+#include <linux/pm_runtime.h>
+#include <linux/sync_file.h>
+
+#include <drm/drm_drv.h>
+#include <drm/drm_file.h>
+
+#include "../uapi.h"
+#include "../drm.h"
+#include "../gem.h"
+
+#include "gather_bo.h"
+#include "submit.h"
+
+static struct tegra_drm_mapping *
+tegra_drm_mapping_get(struct tegra_drm_channel_ctx *ctx, u32 id)
+{
+	struct tegra_drm_mapping *mapping;
+
+	xa_lock(&ctx->mappings);
+	mapping = xa_load(&ctx->mappings, id);
+	if (mapping)
+		kref_get(&mapping->ref);
+	xa_unlock(&ctx->mappings);
+
+	return mapping;
+}
+
+static void *alloc_copy_user_array(void __user *from, size_t count, size_t size)
+{
+	unsigned long copy_err;
+	size_t copy_len;
+	void *data;
+
+	if (check_mul_overflow(count, size, &copy_len))
+		return ERR_PTR(-EINVAL);
+
+	if (copy_len > 0x4000)
+		return ERR_PTR(-E2BIG);
+
+	data = kvmalloc(copy_len, GFP_KERNEL);
+	if (!data)
+		return ERR_PTR(-ENOMEM);
+
+	copy_err = copy_from_user(data, from, copy_len);
+	if (copy_err) {
+		kvfree(data);
+		return ERR_PTR(-EFAULT);
+	}
+
+	return data;
+}
+
+static int submit_copy_gather_data(struct drm_device *drm,
+				   struct gather_bo **pbo,
+				   struct drm_tegra_channel_submit *args)
+{
+	unsigned long copy_err;
+	struct gather_bo *bo;
+	size_t copy_len;
+
+	if (args->gather_data_words == 0) {
+		drm_info(drm, "gather_data_words can't be 0");
+		return -EINVAL;
+	}
+
+	if (check_mul_overflow((size_t)args->gather_data_words, (size_t)4, &copy_len))
+		return -EINVAL;
+
+	bo = kzalloc(sizeof(*bo), GFP_KERNEL);
+	if (!bo)
+		return -ENOMEM;
+
+	kref_init(&bo->ref);
+	host1x_bo_init(&bo->base, &gather_bo_ops);
+
+	bo->gather_data = kmalloc(copy_len, GFP_KERNEL | __GFP_NOWARN);
+	if (!bo->gather_data) {
+		kfree(bo);
+		return -ENOMEM;
+	}
+
+	copy_err = copy_from_user(bo->gather_data,
+				  u64_to_user_ptr(args->gather_data_ptr),
+				  copy_len);
+	if (copy_err) {
+		kfree(bo->gather_data);
+		kfree(bo);
+		return -EFAULT;
+	}
+
+	bo->gather_data_words = args->gather_data_words;
+
+	*pbo = bo;
+
+	return 0;
+}
+
+static int submit_write_reloc(struct gather_bo *bo,
+			      struct drm_tegra_submit_buf *buf,
+			      struct tegra_drm_mapping *mapping)
+{
+	/* TODO check that target_offset is within bounds */
+	dma_addr_t iova = mapping->iova + buf->reloc.target_offset;
+	u32 written_ptr = (u32)(iova >> buf->reloc.shift);
+
+#ifdef CONFIG_ARM64
+	if (buf->flags & DRM_TEGRA_SUBMIT_BUF_RELOC_BLOCKLINEAR)
+		written_ptr |= BIT(39);
+#endif
+
+	if (buf->reloc.gather_offset_words >= bo->gather_data_words)
+		return -EINVAL;
+
+	buf->reloc.gather_offset_words = array_index_nospec(
+		buf->reloc.gather_offset_words, bo->gather_data_words);
+
+	bo->gather_data[buf->reloc.gather_offset_words] = written_ptr;
+
+	return 0;
+}
+
+static int submit_process_bufs(struct drm_device *drm, struct gather_bo *bo,
+			       struct tegra_drm_submit_data *job_data,
+			       struct tegra_drm_channel_ctx *ctx,
+			       struct drm_tegra_channel_submit *args)
+{
+	struct tegra_drm_used_mapping *mappings;
+	struct drm_tegra_submit_buf *bufs;
+	int err;
+	u32 i;
+
+	bufs = alloc_copy_user_array(u64_to_user_ptr(args->bufs_ptr),
+				     args->num_bufs, sizeof(*bufs));
+	if (IS_ERR(bufs))
+		return PTR_ERR(bufs);
+
+	mappings = kcalloc(args->num_bufs, sizeof(*mappings), GFP_KERNEL);
+	if (!mappings) {
+		err = -ENOMEM;
+		goto done;
+	}
+
+	for (i = 0; i < args->num_bufs; i++) {
+		struct drm_tegra_submit_buf *buf = &bufs[i];
+		struct tegra_drm_mapping *mapping;
+
+		if (buf->flags & ~DRM_TEGRA_SUBMIT_BUF_RELOC_BLOCKLINEAR) {
+			err = -EINVAL;
+			goto drop_refs;
+		}
+
+		mapping = tegra_drm_mapping_get(ctx, buf->mapping_id);
+		if (!mapping) {
+			drm_info(drm, "invalid mapping_id for buf: %u",
+				 buf->mapping_id);
+			err = -EINVAL;
+			goto drop_refs;
+		}
+
+		err = submit_write_reloc(bo, buf, mapping);
+		if (err) {
+			tegra_drm_mapping_put(mapping);
+			goto drop_refs;
+		}
+
+		mappings[i].mapping = mapping;
+		mappings[i].flags = buf->flags;
+	}
+
+	job_data->used_mappings = mappings;
+	job_data->num_used_mappings = i;
+
+	err = 0;
+
+	goto done;
+
+drop_refs:
+	while (i--)
+		tegra_drm_mapping_put(mappings[i].mapping);
+
+	kfree(mappings);
+	job_data->used_mappings = NULL;
+
+done:
+	kvfree(bufs);
+
+	return err;
+}
+
+static int submit_get_syncpt(struct drm_device *drm, struct host1x_job *job,
+			     struct drm_tegra_channel_submit *args)
+{
+	struct host1x_syncpt *sp;
+
+	if (args->syncpt_incr.flags)
+		return -EINVAL;
+
+	/* Syncpt ref will be dropped on job release */
+	sp = host1x_syncpt_fd_get(args->syncpt_incr.syncpt_fd);
+	if (IS_ERR(sp))
+		return PTR_ERR(sp);
+
+	job->syncpt = sp;
+	job->syncpt_incrs = args->syncpt_incr.num_incrs;
+
+	return 0;
+}
+
+static int submit_job_add_gather(struct host1x_job *job,
+				 struct tegra_drm_channel_ctx *ctx,
+				 struct drm_tegra_submit_cmd_gather_uptr *cmd,
+				 struct gather_bo *bo, u32 *offset,
+				 struct tegra_drm_submit_data *job_data)
+{
+	u32 next_offset;
+
+	if (cmd->reserved[0] || cmd->reserved[1] || cmd->reserved[2])
+		return -EINVAL;
+
+	/* Check for maximum gather size */
+	if (cmd->words > 16383)
+		return -EINVAL;
+
+	if (check_add_overflow(*offset, cmd->words, &next_offset))
+		return -EINVAL;
+
+	if (next_offset > bo->gather_data_words)
+		return -EINVAL;
+
+	host1x_job_add_gather(job, &bo->base, cmd->words, *offset * 4);
+
+	*offset = next_offset;
+
+	return 0;
+}
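The bounds test above uses check_add_overflow() so that a huge cmd->words
cannot wrap *offset around and sneak past the size comparison. A hedged
userspace sketch of the same pattern, using the GCC/Clang builtin that the
kernel macro wraps (names here are illustrative):

```c
#include <stdbool.h>
#include <stdint.h>

/* Reject a gather whose [offset, offset + words) range either wraps
 * around in 32 bits or runs past the end of the gather data. */
static bool gather_range_ok(uint32_t offset, uint32_t words,
			    uint32_t data_words)
{
	uint32_t next;

	if (__builtin_add_overflow(offset, words, &next))
		return false;

	return next <= data_words;
}
```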
+
+static int submit_create_job(struct drm_device *drm, struct host1x_job **pjob,
+			     struct gather_bo *bo,
+			     struct tegra_drm_channel_ctx *ctx,
+			     struct drm_tegra_channel_submit *args,
+			     struct tegra_drm_submit_data *job_data)
+{
+	struct drm_tegra_submit_cmd *cmds;
+	u32 i, gather_offset = 0;
+	struct host1x_job *job;
+	int err;
+
+	cmds = alloc_copy_user_array(u64_to_user_ptr(args->cmds_ptr),
+				     args->num_cmds, sizeof(*cmds));
+	if (IS_ERR(cmds))
+		return PTR_ERR(cmds);
+
+	job = host1x_job_alloc(ctx->channel, args->num_cmds, 0);
+	if (!job) {
+		err = -ENOMEM;
+		goto done;
+	}
+
+	err = submit_get_syncpt(drm, job, args);
+	if (err < 0)
+		goto free_job;
+
+	job->client = &ctx->client->base;
+	job->class = ctx->client->base.class;
+	job->serialize = true;
+
+	for (i = 0; i < args->num_cmds; i++) {
+		struct drm_tegra_submit_cmd *cmd = &cmds[i];
+
+		if (cmd->type == DRM_TEGRA_SUBMIT_CMD_GATHER_UPTR) {
+			err = submit_job_add_gather(job, ctx, &cmd->gather_uptr,
+						    bo, &gather_offset,
+						    job_data);
+			if (err)
+				goto free_job;
+		} else if (cmd->type == DRM_TEGRA_SUBMIT_CMD_WAIT_SYNCPT) {
+			if (cmd->wait_syncpt.reserved[0] ||
+			    cmd->wait_syncpt.reserved[1]) {
+				err = -EINVAL;
+				goto free_job;
+			}
+
+			host1x_job_add_wait(job, cmd->wait_syncpt.id,
+					    cmd->wait_syncpt.threshold);
+		} else {
+			err = -EINVAL;
+			goto free_job;
+		}
+	}
+
+	if (gather_offset == 0) {
+		drm_info(drm, "Job must have at least one gather");
+		err = -EINVAL;
+		goto free_job;
+	}
+
+	*pjob = job;
+
+	err = 0;
+	goto done;
+
+free_job:
+	host1x_job_put(job);
+
+done:
+	kvfree(cmds);
+
+	return err;
+}
+
+static void release_job(struct host1x_job *job)
+{
+	struct tegra_drm_client *client =
+		container_of(job->client, struct tegra_drm_client, base);
+	struct tegra_drm_submit_data *job_data = job->user_data;
+	u32 i;
+
+	for (i = 0; i < job_data->num_used_mappings; i++)
+		tegra_drm_mapping_put(job_data->used_mappings[i].mapping);
+
+	kfree(job_data->used_mappings);
+	kfree(job_data);
+
+	pm_runtime_put_autosuspend(client->base.dev);
+}
+
+int tegra_drm_ioctl_channel_submit(struct drm_device *drm, void *data,
+				   struct drm_file *file)
+{
+	struct tegra_drm_file *fpriv = file->driver_priv;
+	struct drm_tegra_channel_submit *args = data;
+	struct tegra_drm_submit_data *job_data;
+	struct tegra_drm_channel_ctx *ctx;
+	struct host1x_job *job;
+	struct gather_bo *bo;
+	u32 i;
+	int err;
+
+	ctx = tegra_drm_channel_ctx_lock(fpriv, args->channel_ctx);
+	if (!ctx)
+		return -EINVAL;
+
+	/* Allocate gather BO and copy gather words in. */
+	err = submit_copy_gather_data(drm, &bo, args);
+	if (err)
+		goto unlock;
+
+	job_data = kzalloc(sizeof(*job_data), GFP_KERNEL);
+	if (!job_data) {
+		err = -ENOMEM;
+		goto put_bo;
+	}
+
+	/* Get data buffer mappings and do relocation patching. */
+	err = submit_process_bufs(drm, bo, job_data, ctx, args);
+	if (err)
+		goto free_job_data;
+
+	/* Allocate host1x_job and add gathers and waits to it. */
+	err = submit_create_job(drm, &job, bo, ctx, args,
+				job_data);
+	if (err)
+		goto free_job_data;
+
+	/* Map gather data for Host1x. */
+	err = host1x_job_pin(job, ctx->client->base.dev);
+	if (err)
+		goto put_job;
+
+	/* Boot engine. */
+	err = pm_runtime_get_sync(ctx->client->base.dev);
+	if (err < 0)
+		goto put_pm_runtime;
+
+	job->user_data = job_data;
+	job->release = release_job;
+	job->timeout = 10000;
+
+	/*
+	 * job_data is now part of job reference counting, so don't release
+	 * it from here.
+	 */
+	job_data = NULL;
+
+	/* Submit job to hardware. */
+	err = host1x_job_submit(job);
+	if (err)
+		goto put_job;
+
+	/* Return postfences to userspace and add fences to DMA reservations. */
+	args->syncpt_incr.fence_value = job->syncpt_end;
+
+	goto put_job;
+
+put_pm_runtime:
+	if (!job->release)
+		pm_runtime_put(ctx->client->base.dev);
+	host1x_job_unpin(job);
+put_job:
+	host1x_job_put(job);
+free_job_data:
+	if (job_data && job_data->used_mappings) {
+		for (i = 0; i < job_data->num_used_mappings; i++)
+			tegra_drm_mapping_put(job_data->used_mappings[i].mapping);
+		kfree(job_data->used_mappings);
+	}
+	kfree(job_data);
+put_bo:
+	gather_bo_put(&bo->base);
+unlock:
+	mutex_unlock(&fpriv->lock);
+	return err;
+}
diff --git a/drivers/gpu/drm/tegra/uapi/submit.h b/drivers/gpu/drm/tegra/uapi/submit.h
new file mode 100644
index 000000000000..0a165e9e4bda
--- /dev/null
+++ b/drivers/gpu/drm/tegra/uapi/submit.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/* Copyright (c) 2020 NVIDIA Corporation */
+
+#ifndef _TEGRA_DRM_UAPI_SUBMIT_H
+#define _TEGRA_DRM_UAPI_SUBMIT_H
+
+struct tegra_drm_used_mapping {
+	struct tegra_drm_mapping *mapping;
+	u32 flags;
+};
+
+struct tegra_drm_submit_data {
+	struct tegra_drm_used_mapping *used_mappings;
+	u32 num_used_mappings;
+};
+
+#endif
-- 
2.30.0

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 195+ messages in thread

* [PATCH v5 21/21] drm/tegra: Add job firewall
  2021-01-11 12:59 ` Mikko Perttunen
@ 2021-01-11 13:00   ` Mikko Perttunen
  -1 siblings, 0 replies; 195+ messages in thread
From: Mikko Perttunen @ 2021-01-11 13:00 UTC (permalink / raw)
  To: thierry.reding, jonathanh, digetx, airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman, Mikko Perttunen

Add a firewall that validates jobs before submission to ensure
they don't do anything they aren't allowed to do, like accessing
memory they should not access.

The firewall is functionally a copy of the firewall already
implemented in gpu/host1x. It is duplicated here because it makes
more sense for it to live on the DRM side: it is only needed for
userspace job submissions, and the data it needs to do its job is
generally easier to access here.

In the future, the other implementation will be removed.
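For readers not familiar with the command stream format: each host1x command
word carries its opcode in the top four bits, with the remaining fields
depending on the opcode. A hedged userspace sketch of the field extraction
the firewall performs (helper names are illustrative; the shifts and masks
mirror the decoding in tegra_drm_fw_validate()):

```c
#include <stdint.h>

/* Opcode lives in bits 31:28 of every command word. */
static uint32_t opcode_of(uint32_t word)       { return (word >> 28) & 0xf; }

/* SETCLASS: offset in bits 27:16, class in bits 15:6, register mask in 5:0. */
static uint32_t setclass_offset(uint32_t word) { return (word >> 16) & 0xfff; }
static uint32_t setclass_class(uint32_t word)  { return (word >> 6) & 0x3ff; }
static uint32_t setclass_mask(uint32_t word)   { return word & 0x3f; }

/* INCR/NONINCR: offset in bits 27:16, word count in bits 15:0. */
static uint32_t incr_offset(uint32_t word)     { return (word >> 16) & 0xfff; }
static uint32_t incr_count(uint32_t word)      { return word & 0xffff; }
```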

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
v5:
* Support SETCLASS opcode
v3:
* New patch
---
 drivers/gpu/drm/tegra/Makefile        |   1 +
 drivers/gpu/drm/tegra/uapi/firewall.c | 221 ++++++++++++++++++++++++++
 drivers/gpu/drm/tegra/uapi/submit.c   |  14 +-
 drivers/gpu/drm/tegra/uapi/submit.h   |   4 +
 4 files changed, 237 insertions(+), 3 deletions(-)
 create mode 100644 drivers/gpu/drm/tegra/uapi/firewall.c

diff --git a/drivers/gpu/drm/tegra/Makefile b/drivers/gpu/drm/tegra/Makefile
index 059322e88943..4e3295f436f1 100644
--- a/drivers/gpu/drm/tegra/Makefile
+++ b/drivers/gpu/drm/tegra/Makefile
@@ -5,6 +5,7 @@ tegra-drm-y := \
 	drm.o \
 	uapi/uapi.o \
 	uapi/submit.o \
+	uapi/firewall.o \
 	uapi/gather_bo.o \
 	gem.o \
 	fb.o \
diff --git a/drivers/gpu/drm/tegra/uapi/firewall.c b/drivers/gpu/drm/tegra/uapi/firewall.c
new file mode 100644
index 000000000000..57427c2d23fa
--- /dev/null
+++ b/drivers/gpu/drm/tegra/uapi/firewall.c
@@ -0,0 +1,221 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (c) 2010-2020 NVIDIA Corporation */
+
+#include "../drm.h"
+#include "../uapi.h"
+
+#include "submit.h"
+
+struct tegra_drm_firewall {
+	struct tegra_drm_submit_data *submit;
+	struct tegra_drm_client *client;
+	u32 *data;
+	u32 pos;
+	u32 end;
+	u32 class;
+};
+
+static int fw_next(struct tegra_drm_firewall *fw, u32 *word)
+{
+	if (fw->pos == fw->end)
+		return -EINVAL;
+
+	*word = fw->data[fw->pos++];
+
+	return 0;
+}
+
+static bool fw_check_addr_valid(struct tegra_drm_firewall *fw, u32 offset)
+{
+	u32 i;
+
+	for (i = 0; i < fw->submit->num_used_mappings; i++) {
+		struct tegra_drm_mapping *m = fw->submit->used_mappings[i].mapping;
+
+		if (offset >= m->iova && offset <= m->iova_end)
+			return true;
+	}
+
+	return false;
+}
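The check above treats iova_end as inclusive and accepts an address only if
it falls within one of the mappings the job referenced. A hedged userspace
sketch of the same range test (struct and names are illustrative, not the
driver's):

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative mapping table: an address written by a job is valid only
 * if it falls inside one of the job's mappings (end is inclusive). */
struct mapping_range {
	uint64_t iova;
	uint64_t iova_end;
};

static bool addr_in_mappings(const struct mapping_range *m, unsigned int n,
			     uint64_t addr)
{
	unsigned int i;

	for (i = 0; i < n; i++)
		if (addr >= m[i].iova && addr <= m[i].iova_end)
			return true;

	return false;
}
```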
+
+static int fw_check_reg(struct tegra_drm_firewall *fw, u32 offset)
+{
+	bool is_addr;
+	u32 word;
+	int err;
+
+	err = fw_next(fw, &word);
+	if (err)
+		return err;
+
+	if (!fw->client->ops->is_addr_reg)
+		return 0;
+
+	is_addr = fw->client->ops->is_addr_reg(fw->client->base.dev, fw->class,
+					       offset);
+
+	if (!is_addr)
+		return 0;
+
+	if (!fw_check_addr_valid(fw, word))
+		return -EINVAL;
+
+	return 0;
+}
+
+static int fw_check_regs_seq(struct tegra_drm_firewall *fw, u32 offset,
+			     u32 count, bool incr)
+{
+	u32 i;
+
+	for (i = 0; i < count; i++) {
+		if (fw_check_reg(fw, offset))
+			return -EINVAL;
+
+		if (incr)
+			offset++;
+	}
+
+	return 0;
+}
+
+static int fw_check_regs_mask(struct tegra_drm_firewall *fw, u32 offset,
+			      u16 mask)
+{
+	unsigned long bmask = mask;
+	unsigned int bit;
+
+	for_each_set_bit(bit, &bmask, 16) {
+		if (fw_check_reg(fw, offset + bit))
+			return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int fw_check_regs_imm(struct tegra_drm_firewall *fw, u32 offset)
+{
+	bool is_addr;
+
+	is_addr = fw->client->ops->is_addr_reg(fw->client->base.dev, fw->class,
+					       offset);
+	if (is_addr)
+		return -EINVAL;
+
+	return 0;
+}
+
+static int fw_check_class(struct tegra_drm_firewall *fw, u32 class)
+{
+	if (!fw->client->ops->is_valid_class)
+		return -EINVAL;
+
+	if (!fw->client->ops->is_valid_class(class))
+		return -EINVAL;
+
+	return 0;
+}
+
+enum {
+	HOST1X_OPCODE_SETCLASS  = 0x00,
+	HOST1X_OPCODE_INCR      = 0x01,
+	HOST1X_OPCODE_NONINCR   = 0x02,
+	HOST1X_OPCODE_MASK      = 0x03,
+	HOST1X_OPCODE_IMM       = 0x04,
+	HOST1X_OPCODE_RESTART   = 0x05,
+	HOST1X_OPCODE_GATHER    = 0x06,
+	HOST1X_OPCODE_SETSTRMID = 0x07,
+	HOST1X_OPCODE_SETAPPID  = 0x08,
+	HOST1X_OPCODE_SETPYLD   = 0x09,
+	HOST1X_OPCODE_INCR_W    = 0x0a,
+	HOST1X_OPCODE_NONINCR_W = 0x0b,
+	HOST1X_OPCODE_GATHER_W  = 0x0c,
+	HOST1X_OPCODE_RESTART_W = 0x0d,
+	HOST1X_OPCODE_EXTEND    = 0x0e,
+};
+
+int tegra_drm_fw_validate(struct tegra_drm_client *client, u32 *data, u32 start,
+			  u32 words, struct tegra_drm_submit_data *submit,
+			  u32 *job_class)
+{
+	struct tegra_drm_firewall fw = {
+		.submit = submit,
+		.client = client,
+		.data = data,
+		.pos = start,
+		.end = start + words,
+		.class = *job_class,
+	};
+	bool payload_valid = false;
+	u32 payload;
+	int err;
+
+	while (fw.pos != fw.end) {
+		u32 word, opcode, offset, count, mask, class;
+
+		err = fw_next(&fw, &word);
+		if (err)
+			return err;
+
+		opcode = (word & 0xf0000000) >> 28;
+
+		switch (opcode) {
+		case HOST1X_OPCODE_SETCLASS:
+			offset = (word >> 16) & 0xfff;
+			mask = word & 0x3f;
+			class = (word >> 6) & 0x3ff;
+			err = fw_check_class(&fw, class);
+			fw.class = class;
+			*job_class = class;
+			if (!err)
+				err = fw_check_regs_mask(&fw, offset, mask);
+			break;
+		case HOST1X_OPCODE_INCR:
+			offset = (word >> 16) & 0xfff;
+			count = word & 0xffff;
+			err = fw_check_regs_seq(&fw, offset, count, true);
+			break;
+		case HOST1X_OPCODE_NONINCR:
+			offset = (word >> 16) & 0xfff;
+			count = word & 0xffff;
+			err = fw_check_regs_seq(&fw, offset, count, false);
+			break;
+		case HOST1X_OPCODE_MASK:
+			offset = (word >> 16) & 0xfff;
+			mask = word & 0xffff;
+			err = fw_check_regs_mask(&fw, offset, mask);
+			break;
+		case HOST1X_OPCODE_IMM:
+			/* IMM cannot reasonably be used to write a pointer */
+			offset = (word >> 16) & 0xfff;
+			err = fw_check_regs_imm(&fw, offset);
+			break;
+		case HOST1X_OPCODE_SETPYLD:
+			payload = word & 0xffff;
+			payload_valid = true;
+			break;
+		case HOST1X_OPCODE_INCR_W:
+			if (!payload_valid)
+				return -EINVAL;
+
+			offset = word & 0x3fffff;
+			err = fw_check_regs_seq(&fw, offset, payload, true);
+			break;
+		case HOST1X_OPCODE_NONINCR_W:
+			if (!payload_valid)
+				return -EINVAL;
+
+			offset = word & 0x3fffff;
+			err = fw_check_regs_seq(&fw, offset, payload, false);
+			break;
+		default:
+			return -EINVAL;
+		}
+
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
diff --git a/drivers/gpu/drm/tegra/uapi/submit.c b/drivers/gpu/drm/tegra/uapi/submit.c
index 398be3065e21..8633844ae3d7 100644
--- a/drivers/gpu/drm/tegra/uapi/submit.c
+++ b/drivers/gpu/drm/tegra/uapi/submit.c
@@ -224,7 +224,8 @@ static int submit_job_add_gather(struct host1x_job *job,
 				 struct tegra_drm_channel_ctx *ctx,
 				 struct drm_tegra_submit_cmd_gather_uptr *cmd,
 				 struct gather_bo *bo, u32 *offset,
-				 struct tegra_drm_submit_data *job_data)
+				 struct tegra_drm_submit_data *job_data,
+				 u32 *class)
 {
 	u32 next_offset;
 
@@ -241,6 +242,10 @@ static int submit_job_add_gather(struct host1x_job *job,
 	if (next_offset > bo->gather_data_words)
 		return -EINVAL;
 
+	if (tegra_drm_fw_validate(ctx->client, bo->gather_data, *offset,
+				  cmd->words, job_data, class))
+		return -EINVAL;
+
 	host1x_job_add_gather(job, &bo->base, cmd->words, *offset * 4);
 
 	*offset = next_offset;
@@ -255,10 +260,13 @@ static int submit_create_job(struct drm_device *drm, struct host1x_job **pjob,
 			     struct tegra_drm_submit_data *job_data)
 {
 	struct drm_tegra_submit_cmd *cmds;
-	u32 i, gather_offset = 0;
+	u32 i, gather_offset = 0, class;
 	struct host1x_job *job;
 	int err;
 
+	/* Set initial class for firewall. */
+	class = ctx->client->base.class;
+
 	cmds = alloc_copy_user_array(u64_to_user_ptr(args->cmds_ptr),
 				     args->num_cmds, sizeof(*cmds));
 	if (IS_ERR(cmds))
@@ -284,7 +292,7 @@ static int submit_create_job(struct drm_device *drm, struct host1x_job **pjob,
 		if (cmd->type == DRM_TEGRA_SUBMIT_CMD_GATHER_UPTR) {
 			err = submit_job_add_gather(job, ctx, &cmd->gather_uptr,
 						    bo, &gather_offset,
-						    job_data);
+						    job_data, &class);
 			if (err)
 				goto free_job;
 		} else if (cmd->type == DRM_TEGRA_SUBMIT_CMD_WAIT_SYNCPT) {
diff --git a/drivers/gpu/drm/tegra/uapi/submit.h b/drivers/gpu/drm/tegra/uapi/submit.h
index 0a165e9e4bda..cf6a2f0a29fc 100644
--- a/drivers/gpu/drm/tegra/uapi/submit.h
+++ b/drivers/gpu/drm/tegra/uapi/submit.h
@@ -14,4 +14,8 @@ struct tegra_drm_submit_data {
 	u32 num_used_mappings;
 };
 
+int tegra_drm_fw_validate(struct tegra_drm_client *client, u32 *data, u32 start,
+			  u32 words, struct tegra_drm_submit_data *submit,
+			  u32 *job_class);
+
 #endif
-- 
2.30.0


^ permalink raw reply related	[flat|nested] 195+ messages in thread

* Re: [PATCH v5 19/21] drm/tegra: Implement new UAPI
  2021-01-11 13:00   ` Mikko Perttunen
  (?)
@ 2021-01-11 17:37     ` kernel test robot
  -1 siblings, 0 replies; 195+ messages in thread
From: kernel test robot @ 2021-01-11 17:37 UTC (permalink / raw)
  To: Mikko Perttunen, thierry.reding, jonathanh, digetx, airlied, daniel
  Cc: kbuild-all, linux-tegra, talho, bhuntsman, dri-devel, Mikko Perttunen

[-- Attachment #1: Type: text/plain, Size: 2040 bytes --]

Hi Mikko,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on next-20210111]
[cannot apply to tegra-drm/drm/tegra/for-next tegra/for-next linus/master v5.11-rc3 v5.11-rc2 v5.11-rc1 v5.11-rc3]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Mikko-Perttunen/Host1x-TegraDRM-UAPI/20210111-210543
base:    ef8b014ee4a1ccd9e751732690a8c7cdeed945e7
config: arm-defconfig (attached as .config)
compiler: arm-linux-gnueabi-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/2c6bdcf3f00bfb0a4399d29b8c066618067560a6
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Mikko-Perttunen/Host1x-TegraDRM-UAPI/20210111-210543
        git checkout 2c6bdcf3f00bfb0a4399d29b8c066618067560a6
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=arm 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> drivers/gpu/drm/tegra/uapi/uapi.c:62:5: warning: no previous prototype for 'close_channel_ctx' [-Wmissing-prototypes]
      62 | int close_channel_ctx(int id, void *p, void *data)
         |     ^~~~~~~~~~~~~~~~~


vim +/close_channel_ctx +62 drivers/gpu/drm/tegra/uapi/uapi.c

    61	
  > 62	int close_channel_ctx(int id, void *p, void *data)
    63	{
    64		struct tegra_drm_channel_ctx *ctx = p;
    65	
    66		tegra_drm_channel_ctx_close(ctx);
    67	
    68		return 0;
    69	}
    70	
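The fix for this W=1 warning is straightforward: the callback is only
referenced from this file, so (assuming it has no external users) it can be
marked static, which satisfies -Wmissing-prototypes. A minimal sketch of the
warning-free shape, with an illustrative body standing in for the driver's
tegra_drm_channel_ctx_close() call:

```c
#include <stddef.h>

/* File-local idr_for_each()-style callback; 'static' avoids the
 * -Wmissing-prototypes warning since no external prototype is needed. */
static int close_channel_ctx(int id, void *p, void *data)
{
	int *closed = p;

	(void)id;
	(void)data;

	/* In the driver this would call tegra_drm_channel_ctx_close(p). */
	if (closed)
		*closed = 1;

	return 0;
}
```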

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 54251 bytes --]

^ permalink raw reply	[flat|nested] 195+ messages in thread

* Re: [PATCH v5 04/21] gpu: host1x: Remove cancelled waiters immediately
  2021-01-11 13:00   ` Mikko Perttunen
@ 2021-01-12 22:07     ` Dmitry Osipenko
  -1 siblings, 0 replies; 195+ messages in thread
From: Dmitry Osipenko @ 2021-01-12 22:07 UTC (permalink / raw)
  To: Mikko Perttunen, thierry.reding, jonathanh, airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman

11.01.2021 16:00, Mikko Perttunen wrote:
> -void host1x_intr_put_ref(struct host1x *host, unsigned int id, void *ref)
> +void host1x_intr_put_ref(struct host1x *host, unsigned int id, void *ref,
> +			 bool flush)
>  {
>  	struct host1x_waitlist *waiter = ref;
>  	struct host1x_syncpt *syncpt;
>  
> -	while (atomic_cmpxchg(&waiter->state, WLS_PENDING, WLS_CANCELLED) ==
> -	       WLS_REMOVED)
> -		schedule();
> +	atomic_cmpxchg(&waiter->state, WLS_PENDING, WLS_CANCELLED);
>  
>  	syncpt = host->syncpt + id;
> -	(void)process_wait_list(host, syncpt,
> -				host1x_syncpt_load(host->syncpt + id));
> +
> +	spin_lock(&syncpt->intr.lock);
> +	if (atomic_cmpxchg(&waiter->state, WLS_CANCELLED, WLS_HANDLED) ==
> +	    WLS_CANCELLED) {
> +		list_del(&waiter->list);
> +		kref_put(&waiter->refcount, waiter_release);
> +	}
> +	spin_unlock(&syncpt->intr.lock);
> +
> +	if (flush) {
> +		/* Wait until any concurrently executing handler has finished. */
> +		while (atomic_read(&waiter->state) != WLS_HANDLED)
> +			cpu_relax();
> +	}

A busy-loop shouldn't be used in the kernel unless there is a very good
reason. wait_event() should be used instead.

But please don't hurry to update this patch; we may need or want to
retire the host1x-waiter, and then all these waiter-related patches won't
be needed.


* Re: [PATCH v5 04/21] gpu: host1x: Remove cancelled waiters immediately
  2021-01-12 22:07     ` Dmitry Osipenko
@ 2021-01-12 22:20       ` Mikko Perttunen
  -1 siblings, 0 replies; 195+ messages in thread
From: Mikko Perttunen @ 2021-01-12 22:20 UTC (permalink / raw)
  To: Dmitry Osipenko, Mikko Perttunen, thierry.reding, jonathanh,
	airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman

On 1/13/21 12:07 AM, Dmitry Osipenko wrote:
> 11.01.2021 16:00, Mikko Perttunen wrote:
>> -void host1x_intr_put_ref(struct host1x *host, unsigned int id, void *ref)
>> +void host1x_intr_put_ref(struct host1x *host, unsigned int id, void *ref,
>> +			 bool flush)
>>   {
>>   	struct host1x_waitlist *waiter = ref;
>>   	struct host1x_syncpt *syncpt;
>>   
>> -	while (atomic_cmpxchg(&waiter->state, WLS_PENDING, WLS_CANCELLED) ==
>> -	       WLS_REMOVED)
>> -		schedule();
>> +	atomic_cmpxchg(&waiter->state, WLS_PENDING, WLS_CANCELLED);
>>   
>>   	syncpt = host->syncpt + id;
>> -	(void)process_wait_list(host, syncpt,
>> -				host1x_syncpt_load(host->syncpt + id));
>> +
>> +	spin_lock(&syncpt->intr.lock);
>> +	if (atomic_cmpxchg(&waiter->state, WLS_CANCELLED, WLS_HANDLED) ==
>> +	    WLS_CANCELLED) {
>> +		list_del(&waiter->list);
>> +		kref_put(&waiter->refcount, waiter_release);
>> +	}
>> +	spin_unlock(&syncpt->intr.lock);
>> +
>> +	if (flush) {
>> +		/* Wait until any concurrently executing handler has finished. */
>> +		while (atomic_read(&waiter->state) != WLS_HANDLED)
>> +			cpu_relax();
>> +	}
> 
> A busy-loop shouldn't be used in the kernel unless there is a very good
> reason. wait_event() should be used instead.
> 
> But please don't hurry to update this patch; we may need or want to
> retire the host1x-waiter, and then all these waiter-related patches won't
> be needed.
> 

Yes, we should improve the intr code to remove all this complexity. But 
let's merge this first to get a functional baseline and do larger design 
changes in follow-up patches.

It is cumbersome for me to develop further series (of which I have 
several under work and planning) with this baseline series not being 
merged. The uncertainty on the approval of the UAPI design also makes it 
hard to know whether it makes sense for me to work on top of this code 
or not, so I'd like to focus on what's needed to get this merged instead 
of further redesign of the driver at this time.

Mikko


* Re: [PATCH v5 19/21] drm/tegra: Implement new UAPI
  2021-01-11 13:00   ` Mikko Perttunen
@ 2021-01-12 22:27     ` Dmitry Osipenko
  -1 siblings, 0 replies; 195+ messages in thread
From: Dmitry Osipenko @ 2021-01-12 22:27 UTC (permalink / raw)
  To: Mikko Perttunen, thierry.reding, jonathanh, airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman

11.01.2021 16:00, Mikko Perttunen wrote:
> +int tegra_drm_ioctl_channel_open(struct drm_device *drm, void *data,
> +				 struct drm_file *file)
> +{
> +	struct tegra_drm_file *fpriv = file->driver_priv;
> +	struct tegra_drm *tegra = drm->dev_private;
> +	struct drm_tegra_channel_open *args = data;
> +	struct tegra_drm_client *client = NULL;
> +	struct tegra_drm_channel_ctx *ctx;
> +	int err;
> +
> +	if (args->flags)
> +		return -EINVAL;
> +
> +	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
> +	if (!ctx)
> +		return -ENOMEM;
> +
> +	err = -ENODEV;
> +	list_for_each_entry(client, &tegra->clients, list) {
> +		if (client->base.class == args->host1x_class) {
> +			err = 0;
> +			break;
> +		}
> +	}
> +	if (err)
> +		goto free_ctx;
> +
> +	if (client->shared_channel) {
> +		ctx->channel = host1x_channel_get(client->shared_channel);

Let's omit the shared_channel until it is really needed and used.


* Re: [PATCH v5 04/21] gpu: host1x: Remove cancelled waiters immediately
  2021-01-12 22:20       ` Mikko Perttunen
@ 2021-01-13 16:29         ` Dmitry Osipenko
  -1 siblings, 0 replies; 195+ messages in thread
From: Dmitry Osipenko @ 2021-01-13 16:29 UTC (permalink / raw)
  To: Mikko Perttunen, Mikko Perttunen, thierry.reding, jonathanh,
	airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman

13.01.2021 01:20, Mikko Perttunen wrote:
> On 1/13/21 12:07 AM, Dmitry Osipenko wrote:
>> 11.01.2021 16:00, Mikko Perttunen wrote:
>>> -void host1x_intr_put_ref(struct host1x *host, unsigned int id, void
>>> *ref)
>>> +void host1x_intr_put_ref(struct host1x *host, unsigned int id, void
>>> *ref,
>>> +             bool flush)
>>>   {
>>>       struct host1x_waitlist *waiter = ref;
>>>       struct host1x_syncpt *syncpt;
>>>   -    while (atomic_cmpxchg(&waiter->state, WLS_PENDING,
>>> WLS_CANCELLED) ==
>>> -           WLS_REMOVED)
>>> -        schedule();
>>> +    atomic_cmpxchg(&waiter->state, WLS_PENDING, WLS_CANCELLED);
>>>         syncpt = host->syncpt + id;
>>> -    (void)process_wait_list(host, syncpt,
>>> -                host1x_syncpt_load(host->syncpt + id));
>>> +
>>> +    spin_lock(&syncpt->intr.lock);
>>> +    if (atomic_cmpxchg(&waiter->state, WLS_CANCELLED, WLS_HANDLED) ==
>>> +        WLS_CANCELLED) {
>>> +        list_del(&waiter->list);
>>> +        kref_put(&waiter->refcount, waiter_release);
>>> +    }
>>> +    spin_unlock(&syncpt->intr.lock);
>>> +
>>> +    if (flush) {
>>> +        /* Wait until any concurrently executing handler has
>>> finished. */
>>> +        while (atomic_read(&waiter->state) != WLS_HANDLED)
>>> +            cpu_relax();
>>> +    }
>>
>> A busy-loop shouldn't be used in the kernel unless there is a very good
>> reason. wait_event() should be used instead.
>>
>> But please don't hurry to update this patch; we may need or want to
>> retire the host1x-waiter, and then all these waiter-related patches won't
>> be needed.
>>
> 
> Yes, we should improve the intr code to remove all this complexity. But
> let's merge this first to get a functional baseline and do larger design
> changes in follow-up patches.
> 
> It is cumbersome for me to develop further series (of which I have
> several under work and planning) with this baseline series not being
> merged. The uncertainty on the approval of the UAPI design also makes it
> hard to know whether it makes sense for me to work on top of this code
> or not, so I'd like to focus on what's needed to get this merged instead
> of further redesign of the driver at this time.

Is this patch (and some others) necessary for the new UAPI? If not,
could we please narrow down the patches to the minimum that is needed
for trying out the new UAPI?


* Re: [PATCH v5 15/21] drm/tegra: Add new UAPI to header
  2021-01-11 13:00   ` Mikko Perttunen
@ 2021-01-13 18:14     ` Dmitry Osipenko
  -1 siblings, 0 replies; 195+ messages in thread
From: Dmitry Osipenko @ 2021-01-13 18:14 UTC (permalink / raw)
  To: Mikko Perttunen, thierry.reding, jonathanh, airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman

11.01.2021 16:00, Mikko Perttunen wrote:
> +struct drm_tegra_submit_buf {
> +	/**
> +	 * @mapping_id: [in]
> +	 *
> +	 * Identifier of the mapping to use in the submission.
> +	 */
> +	__u32 mapping_id;

I'm now in the process of trying out the UAPI using the grate drivers,
and this is the first obstacle.

Looks like this is not going to work well for older Tegra SoCs, in
particular for T20, which has a small GART.

Given that the usefulness of the partial mapping feature is very
questionable until it is proven with a real userspace, we should start
with dynamic mappings that are created at job-submission time.

DRM should already have everything necessary for creating and managing
caches of mappings; the grate kernel driver has been using drm_mm_scan
for that for a long time now.

It should be fine to support the static mapping feature, but it should
be done separately with the drm_mm integration, IMO.

What do you think?


* Re: [PATCH v5 04/21] gpu: host1x: Remove cancelled waiters immediately
  2021-01-13 16:29         ` Dmitry Osipenko
@ 2021-01-13 18:16           ` Mikko Perttunen
  -1 siblings, 0 replies; 195+ messages in thread
From: Mikko Perttunen @ 2021-01-13 18:16 UTC (permalink / raw)
  To: Dmitry Osipenko, Mikko Perttunen, thierry.reding, jonathanh,
	airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman

On 1/13/21 6:29 PM, Dmitry Osipenko wrote:
> 13.01.2021 01:20, Mikko Perttunen wrote:
>> On 1/13/21 12:07 AM, Dmitry Osipenko wrote:
>>> 11.01.2021 16:00, Mikko Perttunen wrote:
>>>> -void host1x_intr_put_ref(struct host1x *host, unsigned int id, void
>>>> *ref)
>>>> +void host1x_intr_put_ref(struct host1x *host, unsigned int id, void
>>>> *ref,
>>>> +             bool flush)
>>>>    {
>>>>        struct host1x_waitlist *waiter = ref;
>>>>        struct host1x_syncpt *syncpt;
>>>>    -    while (atomic_cmpxchg(&waiter->state, WLS_PENDING,
>>>> WLS_CANCELLED) ==
>>>> -           WLS_REMOVED)
>>>> -        schedule();
>>>> +    atomic_cmpxchg(&waiter->state, WLS_PENDING, WLS_CANCELLED);
>>>>          syncpt = host->syncpt + id;
>>>> -    (void)process_wait_list(host, syncpt,
>>>> -                host1x_syncpt_load(host->syncpt + id));
>>>> +
>>>> +    spin_lock(&syncpt->intr.lock);
>>>> +    if (atomic_cmpxchg(&waiter->state, WLS_CANCELLED, WLS_HANDLED) ==
>>>> +        WLS_CANCELLED) {
>>>> +        list_del(&waiter->list);
>>>> +        kref_put(&waiter->refcount, waiter_release);
>>>> +    }
>>>> +    spin_unlock(&syncpt->intr.lock);
>>>> +
>>>> +    if (flush) {
>>>> +        /* Wait until any concurrently executing handler has
>>>> finished. */
>>>> +        while (atomic_read(&waiter->state) != WLS_HANDLED)
>>>> +            cpu_relax();
>>>> +    }
>>>
>>> A busy-loop shouldn't be used in the kernel unless there is a very good
>>> reason. wait_event() should be used instead.
>>>
>>> But please don't hurry to update this patch; we may need or want to
>>> retire the host1x-waiter, and then all these waiter-related patches won't
>>> be needed.
>>>
>>
>> Yes, we should improve the intr code to remove all this complexity. But
>> let's merge this first to get a functional baseline and do larger design
>> changes in follow-up patches.
>>
>> It is cumbersome for me to develop further series (of which I have
>> several under work and planning) with this baseline series not being
>> merged. The uncertainty on the approval of the UAPI design also makes it
>> hard to know whether it makes sense for me to work on top of this code
>> or not, so I'd like to focus on what's needed to get this merged instead
>> of further redesign of the driver at this time.
> 
> Is this patch (and some others) necessary for the new UAPI? If not,
> could we please narrow down the patches to the minimum that is needed
> for trying out the new UAPI?
> 

Yes, it is necessary. I tried to revert it and half the tests in the 
test suite start failing.

I think patches 01, 03, 14 and 17 are not strictly required, though 
reverting 03 will cause one of the syncpoint tests to fail.

Mikko


* Re: [PATCH v5 15/21] drm/tegra: Add new UAPI to header
  2021-01-13 18:14     ` Dmitry Osipenko
@ 2021-01-13 18:56       ` Mikko Perttunen
  -1 siblings, 0 replies; 195+ messages in thread
From: Mikko Perttunen @ 2021-01-13 18:56 UTC (permalink / raw)
  To: Dmitry Osipenko, Mikko Perttunen, thierry.reding, jonathanh,
	airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman

On 1/13/21 8:14 PM, Dmitry Osipenko wrote:
> 11.01.2021 16:00, Mikko Perttunen wrote:
>> +struct drm_tegra_submit_buf {
>> +	/**
>> +	 * @mapping_id: [in]
>> +	 *
>> +	 * Identifier of the mapping to use in the submission.
>> +	 */
>> +	__u32 mapping_id;
> 
> I'm now in the process of trying out the UAPI using the grate drivers,
> and this is the first obstacle.
> 
> Looks like this is not going to work well for older Tegra SoCs, in
> particular for T20, which has a small GART.
> 
> Given that the usefulness of the partial mapping feature is very
> questionable until it is proven with a real userspace, we should start
> with dynamic mappings that are created at job-submission time.
> 
> DRM should already have everything necessary for creating and managing
> caches of mappings; the grate kernel driver has been using drm_mm_scan
> for that for a long time now.
> 
> It should be fine to support the static mapping feature, but it should
> be done separately with the drm_mm integration, IMO.
> 
> What do you think?
> 

Can you elaborate on the requirements to be able to use GART? Are there 
any other reasons this would not work on older chips?

I think we should keep CHANNEL_MAP and mapping_ids, but if e.g. for GART 
we cannot do the mapping immediately at CHANNEL_MAP time, we can just 
treat it as a "registration" call for the GEM object - potentially a 
no-op, like direct physical addressing is. We can then do whatever is 
needed at submit time. This way we can have the best of both worlds.

Note that partial mappings are already not present in this version of 
the UAPI.

Mikko


* Re: [PATCH v5 15/21] drm/tegra: Add new UAPI to header
  2021-01-13 18:56       ` Mikko Perttunen
@ 2021-01-14  8:36         ` Dmitry Osipenko
  -1 siblings, 0 replies; 195+ messages in thread
From: Dmitry Osipenko @ 2021-01-14  8:36 UTC (permalink / raw)
  To: Mikko Perttunen, Mikko Perttunen, thierry.reding, jonathanh,
	airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman

13.01.2021 21:56, Mikko Perttunen wrote:
> On 1/13/21 8:14 PM, Dmitry Osipenko wrote:
>> 11.01.2021 16:00, Mikko Perttunen wrote:
>>> +struct drm_tegra_submit_buf {
>>> +    /**
>>> +     * @mapping_id: [in]
>>> +     *
>>> +     * Identifier of the mapping to use in the submission.
>>> +     */
>>> +    __u32 mapping_id;
>>
>> I'm now in the process of trying out the UAPI using the grate drivers,
>> and this is the first obstacle.
>>
>> Looks like this is not going to work well for older Tegra SoCs, in
>> particular for T20, which has a small GART.
>>
>> Given that the usefulness of the partial mapping feature is very
>> questionable until it is proven with a real userspace, we should
>> start with dynamic mappings that are created at job submission time.
>>
>> DRM should already have everything necessary for creating and managing
>> caches of mappings; the grate kernel driver has been using drm_mm_scan
>> for that for a long time now.
>>
>> It should be fine to support the static mapping feature, but it should
>> be done separately with the drm_mm integration, IMO.
>>
>> What do you think?
>>
> 
> Can you elaborate on the requirements to be able to use GART? Are there
> any other reasons this would not work on older chips?

We have all DRM devices in a single address space on T30+, hence having
duplicated mappings for each device would be a bit wasteful.

> I think we should keep CHANNEL_MAP and mapping_ids, but if e.g. for GART
> we cannot do mapping immediately at CHANNEL_MAP time, we can just treat
> it as a "registration" call for the GEM object - potentially no-op like
> direct physical addressing is. We can then do whatever is needed at
> submit time. This way we can have the best of both worlds.

I have some thoughts now, but nothing concrete yet. Maybe we will need
to create per-SoC ops for MM.

I'll finish with trying what we currently have to see what else is
missing and then we will decide what to do about it.

> Note that partial mappings are already not present in this version of
> the UAPI.

Oh, right :) I haven't gotten to this part of the review closely yet.

^ permalink raw reply	[flat|nested] 195+ messages in thread

* Re: [PATCH v5 15/21] drm/tegra: Add new UAPI to header
  2021-01-14  8:36         ` Dmitry Osipenko
@ 2021-01-14 10:34           ` Mikko Perttunen
  -1 siblings, 0 replies; 195+ messages in thread
From: Mikko Perttunen @ 2021-01-14 10:34 UTC (permalink / raw)
  To: Dmitry Osipenko, Mikko Perttunen, thierry.reding, jonathanh,
	airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman

On 1/14/21 10:36 AM, Dmitry Osipenko wrote:
> 13.01.2021 21:56, Mikko Perttunen wrote:
>> On 1/13/21 8:14 PM, Dmitry Osipenko wrote:
>>> 11.01.2021 16:00, Mikko Perttunen пишет:
>>>> +struct drm_tegra_submit_buf {
>>>> +    /**
>>>> +     * @mapping_id: [in]
>>>> +     *
>>>> +     * Identifier of the mapping to use in the submission.
>>>> +     */
>>>> +    __u32 mapping_id;
>>>
>>> I'm now in the process of trying out the UAPI using the grate drivers,
>>> and this is the first obstacle.
>>>
>>> Looks like this is not going to work well for older Tegra SoCs, in
>>> particular for T20, which has a small GART.
>>>
>>> Given that the usefulness of the partial mapping feature is very
>>> questionable until it is proven with a real userspace, we should
>>> start with dynamic mappings that are created at job submission time.
>>>
>>> DRM should already have everything necessary for creating and managing
>>> caches of mappings; the grate kernel driver has been using drm_mm_scan
>>> for that for a long time now.
>>>
>>> It should be fine to support the static mapping feature, but it should
>>> be done separately with the drm_mm integration, IMO.
>>>
>>> What do you think?
>>>
>>
>> Can you elaborate on the requirements to be able to use GART? Are there
>> any other reasons this would not work on older chips?
> 
> We have all DRM devices in a single address space on T30+, hence having
> duplicated mappings for each device would be a bit wasteful.

I guess this should be pretty easy to change to only keep one mapping 
per GEM object.

> 
>> I think we should keep CHANNEL_MAP and mapping_ids, but if e.g. for GART
>> we cannot do mapping immediately at CHANNEL_MAP time, we can just treat
>> it as a "registration" call for the GEM object - potentially no-op like
>> direct physical addressing is. We can then do whatever is needed at
>> submit time. This way we can have the best of both worlds.
> 
> I have some thoughts now, but nothing concrete yet. Maybe we will need
> to create per-SoC ops for MM.

Yep, I think some specialized code will be needed, but hopefully it will 
be relatively minor.

> 
> I'll finish with trying what we currently have to see what else is
> missing and then we will decide what to do about it.
> 

Great :)

>> Note that partial mappings are already not present in this version of
>> the UAPI.
> 
> Oh, right :) I haven't gotten to this part of the review closely yet.
> 

Mikko

^ permalink raw reply	[flat|nested] 195+ messages in thread

* Re: [PATCH v5 00/21] Host1x/TegraDRM UAPI
  2021-01-11 12:59 ` Mikko Perttunen
@ 2021-01-19 22:29   ` Dmitry Osipenko
  -1 siblings, 0 replies; 195+ messages in thread
From: Dmitry Osipenko @ 2021-01-19 22:29 UTC (permalink / raw)
  To: Mikko Perttunen, thierry.reding, jonathanh, airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman

11.01.2021 15:59, Mikko Perttunen wrote:
> Hi all,
> 
> here's the fifth revision of the Host1x/TegraDRM UAPI proposal,
> containing primarily small bug fixes. It has also been
> rebased on top of recent linux-next.
> 
> vaapi-tegra-driver has been updated to support the new UAPI
> as well as Tegra186:
> 
>   https://github.com/cyndis/vaapi-tegra-driver
> 
> The `putsurface` program has been tested to work.
> 
> The test suite for the new UAPI is available at
> https://github.com/cyndis/uapi-test
> 
> The series can be also found in
> https://github.com/cyndis/linux/commits/work/host1x-uapi-v5.
> 
> Older versions:
> v1: https://www.spinics.net/lists/linux-tegra/msg51000.html
> v2: https://www.spinics.net/lists/linux-tegra/msg53061.html
> v3: https://www.spinics.net/lists/linux-tegra/msg54370.html
> v4: https://www.spinics.net/lists/dri-devel/msg279897.html
> 
> Thank you,
> Mikko

Basic support for the v5 UAPI has now been added to the Opentegra driver.
Overall the UAPI works, but there are a couple of things that we need to
improve; I'll focus on them here.

Problems
========

1. The channel map/unmap API needs some more thought.

The main problem is the difficulty of tracking the liveness of BOs and
mappings.  The kernel driver refs the BO for each mapping, and userspace
needs to track both the BO and its mappings together; it's too easy to
make a mistake and leak BOs without noticing.

2. Host1x sync point UAPI should not be used for tracking DRM jobs.  I
remember we discussed this previously, but this pops up again and I
don't remember where we ended up previously.

This creates unnecessary complexity for userspace, which needs to go
through a lot of chores just to get a sync point and then to manage it
for its jobs.

Nothing stops two different channels from reusing a single sync point
for a job; fixing this will only add more complexity to the kernel
driver instead of removing it.

3. The signalling of DMA fences doesn't work properly in v5, apparently
because of the host1x waiter bug.  I see that the sync point interrupt
happens, but the waiter callback isn't invoked.

4. The sync_file API is not very suitable for DRM purposes because of
-EMFILE "Too many open files", which I saw while running x11perf.  It
also adds complexity to userspace instead of removing it.  This approach
is not suitable for the DRM scheduler either.

5. Sync points have a dirty hardware state when allocated / requested.
The special sync point reservation is meaningless in this case.

6. I found that the need to chop the cmdstream into gathers is a bit
cumbersome for userspace on older SoCs which don't have a h/w firewall.
Can we support an option where all commands are collected into a single
dedicated cmdstream for a job?

Possible solutions for the above problems
=========================================

1. Stop using the concept of "channels"; switch to DRM contexts only.

Each DRM context should get access to all engines once the DRM context
is created.  Think of it as "when a DRM context is opened, it opens a
channel for each engine".

Each DRM context will then get one instance of a mapping per engine for
each BO.

enum tegra_engine {
	TEGRA_GR2D,
	TEGRA_GR3D,
	TEGRA_VIC,
	...
	NUM_ENGINES
};

struct tegra_bo_mapping {
	dma_addr_t ioaddr;
	...
};

struct tegra_bo {
	...
	struct tegra_bo_mapping *hw_maps[NUM_ENGINES];
};

Instead of DRM_IOCTL_TEGRA_CHANNEL_MAP we should have
DRM_IOCTL_TEGRA_GEM_MAP_TO_ENGINE, which will create a BO mapping for a
specified h/w engine.

Once a BO is closed, all its mappings should be closed too.  This way
userspace doesn't need to track both BOs and mappings.

All userspace needs to do is:

	1) Open DRM context
	2) Create GEM
	3) Map GEM to required hardware engines
	4) Submit job that uses GEM handle
	5) Close GEM

If a GEM wasn't mapped prior to the job's submission, the job will fail
because the kernel driver won't resolve the IO mapping of the GEM.
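
As a rough model of these semantics (a hypothetical user-space sketch of
the proposal, not real kernel code; only the names from the
pseudo-structures above are reused, the helper names are made up):

```c
#include <assert.h>
#include <stdlib.h>

enum tegra_engine { TEGRA_GR2D, TEGRA_GR3D, TEGRA_VIC, NUM_ENGINES };

struct tegra_bo_mapping { unsigned long ioaddr; };

struct tegra_bo {
	struct tegra_bo_mapping *hw_maps[NUM_ENGINES];
};

/* DRM_IOCTL_TEGRA_GEM_MAP_TO_ENGINE analogue: create a per-engine mapping. */
static int bo_map_to_engine(struct tegra_bo *bo, enum tegra_engine e,
			    unsigned long ioaddr)
{
	if (bo->hw_maps[e])
		return 0; /* already mapped; nothing extra to track */
	bo->hw_maps[e] = malloc(sizeof(*bo->hw_maps[e]));
	if (!bo->hw_maps[e])
		return -1;
	bo->hw_maps[e]->ioaddr = ioaddr;
	return 0;
}

/* Submission fails unless the BO was mapped to the engine beforehand. */
static int submit_uses_bo(const struct tegra_bo *bo, enum tegra_engine e)
{
	return bo->hw_maps[e] ? 0 : -1;
}

/* Closing the BO tears down every mapping, so userspace only tracks BOs. */
static void bo_close(struct tegra_bo *bo)
{
	for (int e = 0; e < NUM_ENGINES; e++) {
		free(bo->hw_maps[e]);
		bo->hw_maps[e] = NULL;
	}
}
```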

2. We will probably need a dedicated drm_tegra_submit_cmd for sync point
increments.  The job's sync point will be allocated dynamically when the
job is submitted.  We will need a flag for the sync_incr and wait_syncpt
commands, saying "it's the job's sync point increment/wait".

3. We should use the dma-fence API directly, and the waiter shim should
be removed.  It's great that you're already working on this.

4. Sync files shouldn't be needed for the parts of the DRM API that
don't interact with external non-DRM devices.  We should use DRM syncobj
for everything related to DRM; it's a superior API over sync files and
is suitable for the DRM scheduler.

5. The hardware state of a sync point should be reset when the sync
point is requested, not when the host1x driver is initialized.

6. We will need to allocate a host1x BO for the job's cmdstream and add
a restart command to the end of the job's stream.  CDMA will jump into
the job's stream from the push buffer.

We could add a flag for that to drm_tegra_submit_cmd_gather, saying that
the gather should be inlined into the job's main cmdstream.

This will remove the need for a large push buffer that easily overflows;
it's a real problem, and the upstream driver even has a bug where it
locks up on overflow.

How it will look from the CDMA perspective:

PUSHBUF |
---------
...     |      | JOB   |
        |      ---------       | JOB GATHER |
RESTART	------> CMD    |       --------------
        |      |GATHER -------> DATA        |
... <---------- RESTART|       |            |
        |      |       |


What's missing
==============

1. Explicit and implicit fencing aren't there yet; we need to support
the DRM scheduler for that.

2. The "wait" command should probably be taught to take a syncobj
handle so the kernel driver can populate it with a dma-fence once the
job is submitted.  This will give us the intermediate fences of a job
and allow utilizing syncobj features like "wait until the job is
submitted to h/w before starting the wait timeout", which userspace will
need once the DRM scheduler is supported.

Miscellaneous
=============

1. Please don't forget to bump the driver version.  This is important
for userspace.

2. Please use proper kernel coding style; run checkpatch:

   # git format-patch -v5 ...
   # ./scripts/checkpatch.pl --strict v5*

3. The kernel driver needs rich error messages for each error condition,
and it should dump the submitted job when the firewall fails.  It's very
tedious to re-add all of this each time something doesn't work.

4. Previously the firewall used the client's class if is_valid_class
wasn't specified in tegra_drm_client_ops; you changed that, and now the
firewall fails for GR3D because it doesn't have is_valid_class() set in
the driver.  See [1].

5. The CDMA class should be restored to the class of the previous
gather after the wait command in submit_gathers(), not to the class of
the client.  GR2D supports multiple classes.  See [1].

[1]
https://github.com/grate-driver/linux/commit/024cba369c9c0e2762e9890068ff9944cb10c44f

^ permalink raw reply	[flat|nested] 195+ messages in thread

* Re: [PATCH v5 00/21] Host1x/TegraDRM UAPI
  2021-01-19 22:29   ` Dmitry Osipenko
@ 2021-01-26  2:45     ` Mikko Perttunen
  -1 siblings, 0 replies; 195+ messages in thread
From: Mikko Perttunen @ 2021-01-26  2:45 UTC (permalink / raw)
  To: Dmitry Osipenko, Mikko Perttunen, thierry.reding, jonathanh,
	airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman

On 1/20/21 12:29 AM, Dmitry Osipenko wrote:
> 11.01.2021 15:59, Mikko Perttunen wrote:
>> Hi all,
>>
>> here's the fifth revision of the Host1x/TegraDRM UAPI proposal,
>> containing primarily small bug fixes. It has also been
>> rebased on top of recent linux-next.
>>
>> vaapi-tegra-driver has been updated to support the new UAPI
>> as well as Tegra186:
>>
>>    https://github.com/cyndis/vaapi-tegra-driver
>>
>> The `putsurface` program has been tested to work.
>>
>> The test suite for the new UAPI is available at
>> https://github.com/cyndis/uapi-test
>>
>> The series can be also found in
>> https://github.com/cyndis/linux/commits/work/host1x-uapi-v5.
>>
>> Older versions:
>> v1: https://www.spinics.net/lists/linux-tegra/msg51000.html
>> v2: https://www.spinics.net/lists/linux-tegra/msg53061.html
>> v3: https://www.spinics.net/lists/linux-tegra/msg54370.html
>> v4: https://www.spinics.net/lists/dri-devel/msg279897.html
>>
>> Thank you,
>> Mikko
> 
> Basic support for the v5 UAPI has now been added to the Opentegra driver.
> Overall the UAPI works, but there are a couple of things that we need to
> improve; I'll focus on them here.
> 
> Problems
> ========
> 
> 1. The channel map/unmap API needs some more thought.
> 
> The main problem is the difficulty of tracking the liveness of BOs and
> mappings.  The kernel driver refs the BO for each mapping, and userspace
> needs to track both the BO and its mappings together; it's too easy to
> make a mistake and leak BOs without noticing.
> 
> 2. Host1x sync point UAPI should not be used for tracking DRM jobs.  I
> remember we discussed this previously, but this pops up again and I
> don't remember where we ended up previously.
> 
> This creates unnecessary complexity for userspace.  Userspace needs to
> go through a lot of chores just to get a sync point and then to manage
> it for the jobs.
> 
> Nothing stops two different channels from reusing a single sync point
> for a job; fixing this will only add more complexity to the kernel
> driver instead of removing it.
> 
> 3. The signalling of DMA fences doesn't work properly in v5 apparently
> because of the host1x waiter bug.  I see that sync point interrupt
> happens, but waiter callback isn't invoked.
> 
> 4. The sync_file API is not very suitable for DRM purposes because of
> -EMFILE "Too many open files", which I saw while running x11perf.  It
> also adds complexity to userspace instead of removing it.  This approach
> is not suitable for the DRM scheduler either.
> 
> 5. Sync points have a dirty hardware state when allocated / requested.
> The special sync point reservation is meaningless in this case.
> 
> 6. I found that the need to chop the cmdstream into gathers is a bit
> cumbersome for userspace on older SoCs which don't have a h/w firewall.
> Can we support an option where all commands are collected into a single
> dedicated cmdstream for a job?
> 
> Possible solutions for the above problems
> =========================================
> 
> 1. Stop using the concept of "channels"; switch to DRM contexts only.
> 
> Each DRM context should get access to all engines once DRM context is
> created.  Think of it like "when DRM context is opened, it opens a
> channel for each engine".
> 
> Then each DRM context will get one instance of mapping per-engine for
> each BO.
> 
> enum tegra_engine {
> 	TEGRA_GR2D,
> 	TEGRA_GR3D,
> 	TEGRA_VIC,
> 	...
> 	NUM_ENGINES
> };
> 
> struct tegra_bo_mapping {
> 	dma_addr_t ioaddr;
> 	...
> };
> 
> struct tegra_bo {
> 	...
> 	struct tegra_bo_mapping *hw_maps[NUM_ENGINES];
> };
> 
> Instead of DRM_IOCTL_TEGRA_CHANNEL_MAP we should have
> DRM_IOCTL_TEGRA_GEM_MAP_TO_ENGINE, which will create a BO mapping for a
> specified h/w engine.
> 
> Once BO is closed, all its mappings should be closed too.  This way
> userspace doesn't need to track both BOs and mappings.
> 
> Everything that userspace needs to do is:
> 
> 	1) Open DRM context
> 	2) Create GEM
> 	3) Map GEM to required hardware engines
> 	4) Submit job that uses GEM handle
> 	5) Close GEM
> 
> If a GEM wasn't mapped prior to the job's submission, the job will fail
> because the kernel driver won't resolve the IO mapping of the GEM.

Perhaps we can instead change the reference counting such that when you
close the GEM object, the mappings are dropped automatically as well.

> 
> 2. We will probably need a dedicated drm_tegra_submit_cmd for sync point
> increments.  The job's sync point will be allocated dynamically when the
> job is submitted.  We will need a flag for the sync_incr and wait_syncpt
> commands, saying "it's the job's sync point increment/wait".

Negative. As I have explained in previous discussions, with the current
approach the usage of hardware resources is much more deterministic and
obvious. I disagree with the point that this is much more complicated
for userspace. Separating syncpoint and channel allocation is one of the
primary motivations of this series for me.

> 
> 3. We should use the dma-fence API directly, and the waiter shim should
> be removed.  It's great that you're already working on this.
> 
> 4. Sync files shouldn't be needed for the parts of the DRM API that
> don't interact with external non-DRM devices.  We should use DRM syncobj
> for everything related to DRM; it's a superior API over sync files and
> is suitable for the DRM scheduler.

Considering the issues with fileno limits, I suppose there is no other
choice. Given the recent NTSYNC proposal by the Wine developers, maybe
we should also have NTHANDLEs to get rid of the restrictions of file
descriptors. DRM syncobjs may have some advantages over sync files, but
also disadvantages: they cannot be poll()ed, so they cannot be combined
with waits for other resources.

I'll look into this for v6.

> 
> 5. The hardware state of a sync point should be reset when the sync
> point is requested, not when the host1x driver is initialized.

This may be doable, but I don't think it is critical for this UAPI, so 
let's consider it after this series.

Userspace should in any case not be able to assume the initial value of
the syncpoint upon allocation. The kernel should set it to some high
value to catch any issues related to wraparound.
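
For illustration, a wraparound-safe expiration check (the kind of
comparison the "HW-equivalent syncpoint expiration check" patch in this
series is about) can be sketched like this; the helper name is made up:

```c
#include <assert.h>
#include <stdint.h>

/* A threshold counts as expired when the current syncpoint value is no
 * more than half the 32-bit range past it.  Unsigned subtraction wraps,
 * and reinterpreting the difference as signed gives the shortest signed
 * distance between the two values, so the check survives wraparound. */
static int syncpt_is_expired(uint32_t current, uint32_t threshold)
{
	return (int32_t)(current - threshold) >= 0;
}
```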

Also, this makes code more complicated since it now needs to ensure all 
waits on the syncpoint have completed before freeing the syncpoint, 
which can be nontrivial e.g. if the waiter is in a different virtual 
machine or some other device connected via PCIe (a real usecase).

> 
> 6. We will need to allocate a host1x BO for a job's cmdstream and add a
> restart command to the end of the job's stream.  CDMA will jump into the
> job's stream from push buffer.
> 
> We could add a flag for that to drm_tegra_submit_cmd_gather, saying that
> gather should be inlined into job's main cmdstream.
> 
> This will remove a need to have a large push buffer that will easily
> overflow, it's a real problem and upstream driver even has a bug where
> it locks up on overflow.
> 
> How it will look from CDMA perspective:
> 
> PUSHBUF |
> ---------
> ...     |      | JOB   |
>          |      ---------       | JOB GATHER |
> RESTART	------> CMD    |       --------------
>          |      |GATHER -------> DATA        |
> ... <---------- RESTART|       |            |
>          |      |       |
> 

Let me check if I understood you correctly:
- You would like to have the job's cmdbuf have further GATHER opcodes 
that jump into smaller gathers?

I assume this is needed because currently WAITs are placed into the 
pushbuffer, so the job will take a lot of space in the pushbuffer if 
there are a lot of waits (and GATHERs in between these waits)?

If so, perhaps as a simpler alternative we could change the firewall to 
allow SETCLASS into HOST1X for waiting specifically, then userspace 
could just submit one big cmdbuf taking only little space in the 
pushbuffer? Although that would only allow direct ID/threshold waits.

In any case, it seems that this can be added in a later patch, so we 
should omit it from this series for simplicity. If it is impossible for 
the userspace to deal with it, we could disable the firewall 
temporarily, or implement the above change in the firewall.

> 
> What's missing
> ==============
> 
> 1. Explicit and implicit fencing isn't there yet, we need to support DRM
> scheduler for that.
> 
> 2. The "wait" command probably should be taught to take a syncobj handle
> in order to populate it with a dma-fence by kernel driver once job is
> submitted.  This will give us intermediate fences of a job and allow
> utilize the syncobj features like "wait until job is submitted to h/w
> before starting to wait for timeout", which will be needed by userspace
> when DRM scheduler will be supported.
> 
> Miscellaneous
> =============
> 
> 1. Please don't forget to bump driver version.  This is important for
> userspace.

Sure. I didn't do it this time since it's backwards compatible and it's 
easy to detect if the new UAPI is available by checking for /dev/host1x. 
I can bump it in v6 if necessary.

> 
> 2. Please use a proper kernel coding style, use checkpatch.
> 
>     # git format-patch -v5 ...
>     # ./scripts/checkpatch.pl --strict v5*

Looks like I accidentally placed some spaces into firewall.c. Otherwise 
the warnings it prints do not look critical.

> 
> 3. Kernel driver needs a rich error messages for each error condition
> and it should dump submitted job when firewall fails.  It's very tedious
> to re-add it all each time when something doesn't work.

Yes, that's true. Will take a look for v6.

> 
> 4. Previously firewall was using the client's class if is_valid_class
> wasn't specified in tegra_drm_client_ops, you changed it and now
> firewall fails for GR3D because it doesn't have the is_valid_class() set
> in the driver.  See [1].
> 
> 5. The CDMA class should be restored to the class of a previous gather
> after the wait command in submit_gathers() and not to the class of the
> client.  GR2D supports multiple classes.  See [1].

Will take a look at these two for v6.

> 
> [1]
> https://github.com/grate-driver/linux/commit/024cba369c9c0e2762e9890068ff9944cb10c44f
> 

Mikko

^ permalink raw reply	[flat|nested] 195+ messages in thread

* Re: [PATCH v5 00/21] Host1x/TegraDRM UAPI
@ 2021-01-26  2:45     ` Mikko Perttunen
  0 siblings, 0 replies; 195+ messages in thread
From: Mikko Perttunen @ 2021-01-26  2:45 UTC (permalink / raw)
  To: Dmitry Osipenko, Mikko Perttunen, thierry.reding, jonathanh,
	airlied, daniel
  Cc: linux-tegra, talho, bhuntsman, dri-devel

On 1/20/21 12:29 AM, Dmitry Osipenko wrote:
> 11.01.2021 15:59, Mikko Perttunen пишет:
>> Hi all,
>>
>> here's the fifth revision of the Host1x/TegraDRM UAPI proposal,
>> containing primarily small bug fixes. It has also been
>> rebased on top of recent linux-next.
>>
>> vaapi-tegra-driver has been updated to support the new UAPI
>> as well as Tegra186:
>>
>>    https://github.com/cyndis/vaapi-tegra-driver
>>
>> The `putsurface` program has been tested to work.
>>
>> The test suite for the new UAPI is available at
>> https://github.com/cyndis/uapi-test
>>
>> The series can be also found in
>> https://github.com/cyndis/linux/commits/work/host1x-uapi-v5.
>>
>> Older versions:
>> v1: https://www.spinics.net/lists/linux-tegra/msg51000.html
>> v2: https://www.spinics.net/lists/linux-tegra/msg53061.html
>> v3: https://www.spinics.net/lists/linux-tegra/msg54370.html
>> v4: https://www.spinics.net/lists/dri-devel/msg279897.html
>>
>> Thank you,
>> Mikko
> 
> Basic support for the v5 UAPI has now been added to the Opentegra driver.
> Overall the UAPI works, but there are a couple of things that we need to
> improve; I'll focus on them here.
> 
> Problems
> ========
> 
> 1. The channel map/unmap API needs some more thought.
> 
> The main problem is the difficulty of tracking the liveness of BOs and
> mappings.  The kernel driver refs the BO for each mapping and userspace
> needs to track both the BO and its mappings together; it's too easy to
> make a mistake and leak BOs without noticing.
> 
> 2. Host1x sync point UAPI should not be used for tracking DRM jobs.  I
> remember we discussed this previously, but it has popped up again and I
> don't remember where we ended up.
> 
> This creates unnecessary complexity for userspace.  Userspace needs to
> go through a lot of chores just to get a sync point and then to manage
> it for the jobs.
> 
> Nothing stops two different channels from reusing a single sync point and
> use it for a job, fixing this will only add more complexity to the
> kernel driver instead of removing it.
> 
> 3. The signalling of DMA fences doesn't work properly in v5 apparently
> because of the host1x waiter bug.  I see that the sync point interrupt
> happens, but the waiter callback isn't invoked.
> 
> 4. The sync_file API is not very suitable for DRM purposes because of
> -EMFILE "Too many open files", which I saw while was running x11perf.
> It also adds complexity to userspace, instead of removing it.  This
> approach is not suitable for the DRM scheduler either.
> 
> 5. Sync points have a dirty hardware state when allocated / requested.
> The special sync point reservation is meaningless in this case.
> 
> 6. I found that the need to chop the cmdstream into gathers is a bit
> cumbersome for userspace on older SoCs which don't have the h/w firewall.
> Can we support an option where all commands are collected into a single
> dedicated cmdstream for a job?
> 
> Possible solutions for the above problems
> =========================================
> 
> 1. Stop using the concept of "channels". Switch to DRM contexts only.
> 
> Each DRM context should get access to all engines once the DRM context
> is created.  Think of it as "when a DRM context is opened, it opens a
> channel for each engine".
> 
> Then each DRM context will get one mapping instance per engine for
> each BO.
> 
> enum tegra_engine {
> 	TEGRA_GR2D,
> 	TEGRA_GR3D,
> 	TEGRA_VIC,
> 	...
> 	NUM_ENGINES
> };
> 
> struct tegra_bo_mapping {
> 	dma_addr_t ioaddr;
> 	...
> };
> 
> struct tegra_bo {
> 	...
> 	struct tegra_bo_mapping *hw_maps[NUM_ENGINES];
> };
> 
> Instead of DRM_IOCTL_TEGRA_CHANNEL_MAP we should have
> DRM_IOCTL_TEGRA_GEM_MAP_TO_ENGINE, which will create a BO mapping for a
> specified h/w engine.
> 
> Once a BO is closed, all its mappings should be closed too.  This way
> userspace doesn't need to track both BOs and mappings.
> 
> All that userspace needs to do is:
> 
> 	1) Open DRM context
> 	2) Create GEM
> 	3) Map GEM to required hardware engines
> 	4) Submit job that uses GEM handle
> 	5) Close GEM
> 
> If the GEM wasn't mapped prior to the job's submission, the job will
> fail because the kernel driver won't resolve the IO mapping of the GEM.
Perhaps we can instead change the reference counting such that if you 
close the GEM object, the mappings will be dropped as well automatically.

> 
> 2. We will probably need a dedicated drm_tegra_submit_cmd for sync point
> increments.  The job's sync point will be allocated dynamically when job
> is submitted.  We will need a flag for the sync_incr and wait_syncpt
> commands, saying "it's a job's sync point increment/wait"

Negative. As I have explained in previous discussions, with the 
current approach the usage of hardware resources is much more deterministic 
and obvious. I disagree on the point that this is much more complicated 
for the userspace. Separating syncpoint and channel allocation is one of 
the primary motivations of this series for me.

> 
> 3. We should use dma-fence API directly and waiter-shim should be
> removed.  It's great that you're already working on this.
> 
> 4. Sync file shouldn't be needed for the part of DRM API which doesn't
> interact with external non-DRM devices.  We should use DRM syncobj for
> everything related to DRM, it's a superior API over sync file, it's
> suitable for DRM scheduler.

Given the issues with fileno limits, I suppose there is no other 
choice. Considering the recent NTSYNC proposal by Wine developers, maybe 
we should also have NTHANDLEs to get rid of restrictions of file 
descriptors. DRM syncobjs may have some advantages over sync files, but 
also disadvantages. They cannot be poll()ed, so they cannot be combined 
with waits for other resources.

I'll look into this for v6.

> 
> 5. The hardware state of sync points should be reset when sync point is
> requested, not when host1x driver is initialized.

This may be doable, but I don't think it is critical for this UAPI, so 
let's consider it after this series.

Userspace should in any case not be able to assume the initial value of 
the syncpoint upon allocation. The kernel should set it to some high 
value to catch any issues related to wraparound.

Also, this makes the code more complicated since it now needs to ensure all 
waits on the syncpoint have completed before freeing the syncpoint, 
which can be nontrivial e.g. if the waiter is in a different virtual 
machine or some other device connected via PCIe (a real usecase).

> 
> 6. We will need to allocate a host1x BO for a job's cmdstream and add a
> restart command to the end of the job's stream.  CDMA will jump into the
> job's stream from push buffer.
> 
> We could add a flag for that to drm_tegra_submit_cmd_gather, saying that
> gather should be inlined into job's main cmdstream.
> 
> This will remove the need to have a large push buffer that will easily
> overflow, it's a real problem and upstream driver even has a bug where
> it locks up on overflow.
> 
> How it will look from CDMA perspective:
> 
> PUSHBUF |
> ---------
> ...     |      | JOB    |
>         |      ----------      | JOB GATHER |
> RESTART -----> | CMD    |      --------------
>         |      | GATHER -----> | DATA       |
> ... <--------- | RESTART|      |            |
>         |      ----------      --------------
> 

Let me check that I understood you correctly: you would like the job's 
cmdbuf to contain further GATHER opcodes that jump into smaller gathers?

I assume this is needed because currently WAITs are placed into the 
pushbuffer, so the job will take a lot of space in the pushbuffer if 
there are a lot of waits (and GATHERs in between these waits)?

If so, perhaps as a simpler alternative we could change the firewall to 
allow SETCLASS into HOST1X for waiting specifically, then userspace 
could just submit one big cmdbuf taking up only a little space in the 
pushbuffer? Although that would only allow direct ID/threshold waits.

In any case, it seems that this can be added in a later patch, so we 
should omit it from this series for simplicity. If it is impossible for 
the userspace to deal with it, we could disable the firewall 
temporarily, or implement the above change in the firewall.

> 
> What's missing
> ==============
> 
> 1. Explicit and implicit fencing isn't there yet, we need to support DRM
> scheduler for that.
> 
> 2. The "wait" command probably should be taught to take a syncobj handle
> in order to populate it with a dma-fence by the kernel driver once the
> job is submitted.  This will give us the intermediate fences of a job and
> allow us to utilize syncobj features like "wait until job is submitted to
> h/w before starting to wait for timeout", which will be needed by
> userspace once the DRM scheduler is supported.
> 
> Miscellaneous
> =============
> 
> 1. Please don't forget to bump the driver version.  This is important
> for userspace.

Sure. I didn't do it this time since it's backwards compatible and it's 
easy to detect if the new UAPI is available by checking for /dev/host1x. 
I can bump it in v6 if necessary.

> 
> 2. Please use a proper kernel coding style, use checkpatch.
> 
>     # git format-patch -v5 ...
>     # ./scripts/checkpatch.pl --strict v5*

Looks like I accidentally placed some spaces into firewall.c. Otherwise 
the warnings it prints do not look critical.

> 
> 3. The kernel driver needs rich error messages for each error condition
> and it should dump the submitted job when the firewall fails.  It's very
> tedious to re-add it all each time something doesn't work.

Yes, that's true. Will take a look for v6.

> 
> 4. Previously the firewall used the client's class if is_valid_class
> wasn't specified in tegra_drm_client_ops; you changed this and now the
> firewall fails for GR3D because it doesn't have is_valid_class() set
> in the driver.  See [1].
> 
> 5. The CDMA class should be restored to the class of a previous gather
> after the wait command in submit_gathers() and not to the class of the
> client.  GR2D supports multiple classes.  See [1].

Will take a look at these two for v6.

> 
> [1]
> https://github.com/grate-driver/linux/commit/024cba369c9c0e2762e9890068ff9944cb10c44f
> 

Mikko
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 195+ messages in thread

* Re: [PATCH v5 00/21] Host1x sync point UAPI should not be used for tracking DRM jobs
  2021-01-26  2:45     ` Mikko Perttunen
@ 2021-01-27 21:20       ` Dmitry Osipenko
  -1 siblings, 0 replies; 195+ messages in thread
From: Dmitry Osipenko @ 2021-01-27 21:20 UTC (permalink / raw)
  To: Mikko Perttunen, Mikko Perttunen, thierry.reding, jonathanh,
	airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman

26.01.2021 05:45, Mikko Perttunen пишет:
>> 2. We will probably need a dedicated drm_tegra_submit_cmd for sync point
>> increments.  The job's sync point will be allocated dynamically when job
>> is submitted.  We will need a flag for the sync_incr and wait_syncpt
>> commands, saying "it's a job's sync point increment/wait"
> 
> Negative. Like I have explained in previous discussions, with the
> current way the usage of hardware resources is much more deterministic
> and obvious. I disagree on the point that this is much more complicated
> for the userspace. Separating syncpoint and channel allocation is one of
> the primary motivations of this series for me.

Sync points are a limited resource. The most sensible way to work around
this is to keep sync points within the kernel as much as possible. This is
not only much simpler for user space, but also allows us to utilize the
DRM API properly without re-inventing what already exists, and makes it
easier to maintain the hardware in a good state.

If you need to use a dedicated sync point for VMs, then just allocate
that special sync point and use it. But this sync point won't be used
for job tracking by the kernel driver. Is there any problem with this?

The primary motivation for me is to get a practically usable kernel
driver for userspace.

^ permalink raw reply	[flat|nested] 195+ messages in thread

* Re: [PATCH v5 00/21] Host1x/TegraDRM UAPI
  2021-01-26  2:45     ` Mikko Perttunen
@ 2021-01-27 21:26       ` Dmitry Osipenko
  -1 siblings, 0 replies; 195+ messages in thread
From: Dmitry Osipenko @ 2021-01-27 21:26 UTC (permalink / raw)
  To: Mikko Perttunen, Mikko Perttunen, thierry.reding, jonathanh,
	airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman

26.01.2021 05:45, Mikko Perttunen пишет:
>> 5. The hardware state of sync points should be reset when sync point is
>> requested, not when host1x driver is initialized.
> 
> This may be doable, but I don't think it is critical for this UAPI, so
> let's consider it after this series.
> 
> The userspace should anyway not be able to assume the initial value of
> the syncpoint upon allocation. The kernel should set it to some high
> value to catch any issues related to wraparound.

This is critical because min != max when the sync point is requested.

> Also, this makes code more complicated since it now needs to ensure all
> waits on the syncpoint have completed before freeing the syncpoint,
> which can be nontrivial e.g. if the waiter is in a different virtual
> machine or some other device connected via PCIe (a real usecase).

It sounds to me like these VM sync points should be treated quite
separately from generic sync points, don't you think? Let's not mix
them and get the generic sync points usable first.

^ permalink raw reply	[flat|nested] 195+ messages in thread

* Re: [PATCH v5 00/21] sync_file API is not very suitable for DRM
  2021-01-26  2:45     ` Mikko Perttunen
@ 2021-01-27 21:35       ` Dmitry Osipenko
  -1 siblings, 0 replies; 195+ messages in thread
From: Dmitry Osipenko @ 2021-01-27 21:35 UTC (permalink / raw)
  To: Mikko Perttunen, Mikko Perttunen, thierry.reding, jonathanh,
	airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman

26.01.2021 05:45, Mikko Perttunen пишет:
>> 4. Sync file shouldn't be needed for the part of DRM API which doesn't
>> interact with external non-DRM devices.  We should use DRM syncobj for
>> everything related to DRM, it's a superior API over sync file, it's
>> suitable for DRM scheduler.
> 
> Considering the issues with fileno limits, I suppose there is no other
> choice. Considering the recent NTSYNC proposal by Wine developers, maybe
> we should also have NTHANDLEs to get rid of restrictions of file
> descriptors.

It's odd to me that you're trying to avoid the existing DRM API. This was
all solved in DRM a long time ago and the grate drivers have no problems
using the DRM APIs. Even if something is really missing, you should add
the missing features instead of re-inventing everything from scratch.

> DRM syncobjs may have some advantages over sync files, but
> also disadvantages. They cannot be poll()ed, so they cannot be combined
> with waits for other resources.

I'm not sure what you mean by "poll". Sync objects support polling very well.

^ permalink raw reply	[flat|nested] 195+ messages in thread

* Re: [PATCH v5 00/21] support option where all commands are collected into a single, dedicated cmdstream
  2021-01-26  2:45     ` Mikko Perttunen
@ 2021-01-27 21:52       ` Dmitry Osipenko
  -1 siblings, 0 replies; 195+ messages in thread
From: Dmitry Osipenko @ 2021-01-27 21:52 UTC (permalink / raw)
  To: Mikko Perttunen, Mikko Perttunen, thierry.reding, jonathanh,
	airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman

26.01.2021 05:45, Mikko Perttunen пишет:
>> 6. We will need to allocate a host1x BO for a job's cmdstream and add a
>> restart command to the end of the job's stream.  CDMA will jump into the
>> job's stream from push buffer.
>>
>> We could add a flag for that to drm_tegra_submit_cmd_gather, saying that
>> gather should be inlined into job's main cmdstream.
>>
>> This will remove a need to have a large push buffer that will easily
>> overflow, it's a real problem and upstream driver even has a bug where
>> it locks up on overflow.
>>
>> How it will look from CDMA perspective:
>>
>> PUSHBUF |
>> ---------
>> ...     |      | JOB    |
>>         |      ----------      | JOB GATHER |
>> RESTART -----> | CMD    |      --------------
>>         |      | GATHER -----> | DATA       |
>> ... <--------- | RESTART|      |            |
>>         |      ----------      --------------
>>
> 
> Let me check if I understood you correctly:
> - You would like to have the job's cmdbuf have further GATHER opcodes
> that jump into smaller gathers?

I want jobs to be self-contained. Instead of pushing commands to the
kernel driver's push buffer, we'll push them to the job's cmdstream. This
means that for each new job we'll need to allocate a host1x buffer.

> I assume this is needed because currently WAITs are placed into the
> pushbuffer, so the job will take a lot of space in the pushbuffer if
> there are a lot of waits (and GATHERs in between these waits)?

Yes, and with drm-sched we will just need to limit the maximum number of
jobs in the h/w queue (i.e. the push buffer), and then the push buffer
won't ever overflow. Problem solved.

> If so, perhaps as a simpler alternative we could change the firewall to
> allow SETCLASS into HOST1X for waiting specifically, then userspace
> could just submit one big cmdbuf taking only little space in the
> pushbuffer? Although that would only allow direct ID/threshold waits.

My solution doesn't require changes to the firewall; I'm not sure whether
it's easier.

> In any case, it seems that this can be added in a later patch, so we
> should omit it from this series for simplicity. If it is impossible for
> the userspace to deal with it, we could disable the firewall
> temporarily, or implement the above change in the firewall.

I won't be able to test the UAPI fully until all features are at least on
par with the experimental driver of the grate kernel.

^ permalink raw reply	[flat|nested] 195+ messages in thread

* Re: [PATCH v5 00/21] sync_file API is not very suitable for DRM
  2021-01-27 21:35       ` Dmitry Osipenko
@ 2021-01-27 21:53         ` Mikko Perttunen
  -1 siblings, 0 replies; 195+ messages in thread
From: Mikko Perttunen @ 2021-01-27 21:53 UTC (permalink / raw)
  To: Dmitry Osipenko, Mikko Perttunen, thierry.reding, jonathanh,
	airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman

On 1/27/21 11:35 PM, Dmitry Osipenko wrote:
> 26.01.2021 05:45, Mikko Perttunen пишет:
>>> 4. Sync file shouldn't be needed for the part of DRM API which doesn't
>>> interact with external non-DRM devices.  We should use DRM syncobj for
>>> everything related to DRM, it's a superior API over sync file, it's
>>> suitable for DRM scheduler.
>>
>> Considering the issues with fileno limits, I suppose there is no other
>> choice. Considering the recent NTSYNC proposal by Wine developers, maybe
>> we should also have NTHANDLEs to get rid of restrictions of file
>> descriptors.
> 
> It's odd to me that you trying to avoid the existing DRM API. This all
> was solved in DRM long time ago and grate drivers have no problems with
> using the DRM APIs. Even if something is really missing, then you should
> add the missing features instead of re-inventing everything from scratch.
> 

DRM is only one of many subsystems that will have to deal with 
syncpoints, so I have wanted to have a central solution instead of 
reimplementing the same stuff everywhere. sync_files seem like the 
"missing feature", but they are difficult to use it with the fileno 
limits. But as has been said many times, they are intended only to 
transfer fences between the implementations in individual drivers, so I 
guess I will have to abandon this dream.
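For concreteness, the fileno limit in question is the per-process descriptor budget that every sync_file fence fd counts against. A minimal way to inspect it (an illustrative sketch, not part of this series; the function name is mine):

```c
#include <sys/resource.h>

/* Each sync_file fence occupies one file descriptor, so heavy fence
 * use is bounded by the per-process RLIMIT_NOFILE budget.  Returns
 * the soft limit, or -1 on error. */
static long open_fd_soft_limit(void)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
		return -1;

	return (long)rl.rlim_cur;
}
```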

>> DRM syncobjs may have some advantages over sync files, but
>> also disadvantages. They cannot be poll()ed, so they cannot be combined
>> with waits for other resources.
> 
> I'm not sure what you mean by "poll". Sync object supports polling very well.
> 

I mean the poll/select family of functions, which wait for file 
descriptors to become ready. If there's some trick that allows syncobjs 
to be used for that, then please tell.
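To illustrate what I mean: a signalled sync_file fd becomes readable (POLLIN), so it can sit in the same poll() set as sockets or pipes. A minimal sketch (the helper name is mine, and any readable fd can stand in for the fence fd):

```c
#include <poll.h>

/* Wait for a sync_file fence fd to signal.  The fd reports POLLIN once
 * the underlying fence has signalled, so it can be combined with other
 * descriptors in one poll() call.  Returns >0 when ready, 0 on
 * timeout, -1 on error. */
static int fence_wait(int fence_fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fence_fd, .events = POLLIN };

	return poll(&pfd, 1, timeout_ms);
}
```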

Mikko

^ permalink raw reply	[flat|nested] 195+ messages in thread

* Re: [PATCH v5 00/21] Host1x/TegraDRM UAPI
  2021-01-27 21:26       ` Dmitry Osipenko
@ 2021-01-27 21:57         ` Mikko Perttunen
  -1 siblings, 0 replies; 195+ messages in thread
From: Mikko Perttunen @ 2021-01-27 21:57 UTC (permalink / raw)
  To: Dmitry Osipenko, Mikko Perttunen, thierry.reding, jonathanh,
	airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman



On 1/27/21 11:26 PM, Dmitry Osipenko wrote:
> 26.01.2021 05:45, Mikko Perttunen wrote:
>>> 5. The hardware state of sync points should be reset when sync point is
>>> requested, not when host1x driver is initialized.
>>
>> This may be doable, but I don't think it is critical for this UAPI, so
>> let's consider it after this series.
>>
>> The userspace should anyway not be able to assume the initial value of
>> the syncpoint upon allocation. The kernel should set it to some high
>> value to catch any issues related to wraparound.
> 
> This is critical because min != max when sync point is requested.

That I would just consider a bug, and it can be fixed. But it's 
orthogonal to whether the value gets reset every time the syncpoint is 
allocated.

> 
>> Also, this makes code more complicated since it now needs to ensure all
>> waits on the syncpoint have completed before freeing the syncpoint,
>> which can be nontrivial e.g. if the waiter is in a different virtual
>> machine or some other device connected via PCIe (a real usecase).
> 
> It sounds to me that these VM sync points should be treated very
> separately from generic sync points, don't you think so? Let's not mix
> them and get the generic sync points usable first.
> 

They are not special in any way, I'm just referring to cases where the 
waiter (consumer) is remote. The allocator of the syncpoint (producer) 
doesn't necessarily even need to know about it. The same concern is 
applicable within a single VM, or single application as well. Just 
putting out the point that this is something that needs to be taken care 
of if we were to reset the value.

Mikko

^ permalink raw reply	[flat|nested] 195+ messages in thread

* Re: [PATCH v5 00/21] Host1x/TegraDRM UAPI
  2021-01-27 21:57         ` Mikko Perttunen
@ 2021-01-27 22:06           ` Dmitry Osipenko
  -1 siblings, 0 replies; 195+ messages in thread
From: Dmitry Osipenko @ 2021-01-27 22:06 UTC (permalink / raw)
  To: Mikko Perttunen, Mikko Perttunen, thierry.reding, jonathanh,
	airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman

28.01.2021 00:57, Mikko Perttunen wrote:
> 
> 
> On 1/27/21 11:26 PM, Dmitry Osipenko wrote:
>> 26.01.2021 05:45, Mikko Perttunen wrote:
>>>> 5. The hardware state of sync points should be reset when sync point is
>>>> requested, not when host1x driver is initialized.
>>>
>>> This may be doable, but I don't think it is critical for this UAPI, so
>>> let's consider it after this series.
>>>
>>> The userspace should anyway not be able to assume the initial value of
>>> the syncpoint upon allocation. The kernel should set it to some high
>>> value to catch any issues related to wraparound.
>>
>> This is critical because min != max when sync point is requested.
> 
> That I would just consider a bug, and it can be fixed. But it's
> orthogonal to whether the value gets reset every time the syncpoint is
> allocated.
> 
>>
>>> Also, this makes code more complicated since it now needs to ensure all
>>> waits on the syncpoint have completed before freeing the syncpoint,
>>> which can be nontrivial e.g. if the waiter is in a different virtual
>>> machine or some other device connected via PCIe (a real usecase).
>>
>> It sounds to me that these VM sync points should be treated very
>> separately from generic sync points, don't you think so? Let's not mix
>> them and get the generic sync points usable first.
>>
> 
> They are not special in any way, I'm just referring to cases where the
> waiter (consumer) is remote. The allocator of the syncpoint (producer)
> doesn't necessarily even need to know about it. The same concern is
> applicable within a single VM, or single application as well. Just
> putting out the point that this is something that needs to be taken care
> of if we were to reset the value.

Will the kernel driver know that it deals with a VM sync point?

Will it be possible to get a non-VM sync point explicitly?

If the driver knows that it deals with a VM sync point, then we can treat it
specially, avoiding the reset, etc.

^ permalink raw reply	[flat|nested] 195+ messages in thread

* Re: [PATCH v5 00/21] sync_file API is not very suitable for DRM
  2021-01-27 21:53         ` Mikko Perttunen
@ 2021-01-27 22:26           ` Dmitry Osipenko
  -1 siblings, 0 replies; 195+ messages in thread
From: Dmitry Osipenko @ 2021-01-27 22:26 UTC (permalink / raw)
  To: Mikko Perttunen, Mikko Perttunen, thierry.reding, jonathanh,
	airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman

28.01.2021 00:53, Mikko Perttunen wrote:
> On 1/27/21 11:35 PM, Dmitry Osipenko wrote:
>> 26.01.2021 05:45, Mikko Perttunen wrote:
>>>> 4. Sync file shouldn't be needed for the part of DRM API which doesn't
>>>> interact with external non-DRM devices.  We should use DRM syncobj for
>>>> everything related to DRM, it's a superior API over sync file, it's
>>>> suitable for DRM scheduler.
>>>
>>> Considering the issues with fileno limits, I suppose there is no other
>>> choice. Considering the recent NTSYNC proposal by Wine developers, maybe
>>> we should also have NTHANDLEs to get rid of restrictions of file
>>> descriptors.
>>
>> It's odd to me that you're trying to avoid the existing DRM API. This all
>> was solved in DRM a long time ago and the grate drivers have no problems with
>> using the DRM APIs. Even if something is really missing, then you should
>> add the missing features instead of re-inventing everything from scratch.
>>
> 
> DRM is only one of many subsystems that will have to deal with
> syncpoints, so I have wanted to have a central solution instead of
> reimplementing the same stuff everywhere. sync_files seem like the
> "missing feature", but they are difficult to use with the fileno
> limits. But as has been said many times, they are intended only to
> transfer fences between the implementations in individual drivers, so I
> guess I will have to abandon this dream.

Let's focus on finishing the basics first, using what we already have.
Sync file + syncobj should be good enough for what we need right now.

>>> DRM syncobjs may have some advantages over sync files, but
>>> also disadvantages. They cannot be poll()ed, so they cannot be combined
>>> with waits for other resources.
>>
>> I'm not sure what you mean by "poll". Sync object supports polling very
>> well.
>>
> 
> I mean the poll/select family of functions, which wait for file
> descriptors to become ready. If there's some trick that allows syncobjs
> to be used for that, then please tell.

Please explain in detail what problem you need to solve; give an example.

^ permalink raw reply	[flat|nested] 195+ messages in thread

* Re: [PATCH v5 00/21] Host1x sync point UAPI should not be used for tracking DRM jobs
  2021-01-27 21:20       ` Dmitry Osipenko
@ 2021-01-28 11:08         ` Mikko Perttunen
  -1 siblings, 0 replies; 195+ messages in thread
From: Mikko Perttunen @ 2021-01-28 11:08 UTC (permalink / raw)
  To: Dmitry Osipenko, Mikko Perttunen, thierry.reding, jonathanh,
	airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman

On 1/27/21 11:20 PM, Dmitry Osipenko wrote:
> 26.01.2021 05:45, Mikko Perttunen wrote:
>>> 2. We will probably need a dedicated drm_tegra_submit_cmd for sync point
>>> increments.  The job's sync point will be allocated dynamically when job
>>> is submitted.  We will need a flag for the sync_incr and wait_syncpt
>>> commands, saying "it's a job's sync point increment/wait"
>>
>> Negative. Like I have explained in previous discussions, with the
>> current way the usage of hardware resources is much more deterministic
>> and obvious. I disagree on the point that this is much more complicated
>> for the userspace. Separating syncpoint and channel allocation is one of
>> the primary motivations of this series for me.
> 
> Sync points are a limited resource. The most sensible way to work around
> it is to keep sync points within the kernel as much as possible. This is not
> only much simpler for user space, but also allows utilizing the DRM API
> properly without re-inventing what already exists and it's easier to
> maintain hardware in a good state.

I've spent the last few years designing for automotive and industrial 
products, where we don't want to discover at runtime that the system is 
out of free syncpoints and that, as a result, the next camera frame takes 
a second or two to process instead of 16 milliseconds. We need to 
know that once we have allocated the resource, it is there. The newer 
chips are also designed to support this.

Considering Linux is increasingly being used for such applications, and 
they are important target markets for NVIDIA, these need to be supported.

Because of the above design constraint the userspace software that runs 
in these environments also expects resources to be allocated up front. 
This isn't a matter of having to design that software according to what 
kind of allocation API we decide to do at the Linux level -- it's no use 
designing for dynamic allocation if it leads to you not meeting the 
safety requirement of needing to ensure you have all resources allocated 
up front.

This isn't a good design feature just in a car, but in anything that 
needs to be reliable. However, it does pose some tradeoffs, and if you 
think that running out of syncpoints on T20-T114 because of upfront 
allocation is an actual problem, I'm not opposed to having both options 
available.

> 
> If you need to use a dedicated sync point for VMs, then just allocate
> that special sync point and use it. But this sync point won't be used
> for jobs tracking by kernel driver. Is there any problem with this?

In addition to the above, it would increase the number of syncpoints 
required. The number of syncpoints supported by hardware has been 
calculated for specific use cases, and increasing the number of required 
syncpoints risks not being able to support those use cases.

> 
> The primary motivation for me is to get a practically usable kernel
> driver for userspace.
> 

Me too. For the traditional "tablet chips" the task is quite well 
defined and supported. But my goal is to also get rid of the jank in 
downstream and allow fully-featured use of Tegra devices on upstream 
kernels and for that, the driver needs to be usable for the whole range 
of use cases.

Mikko

^ permalink raw reply	[flat|nested] 195+ messages in thread

* Re: [PATCH v5 00/21] Host1x/TegraDRM UAPI
  2021-01-27 22:06           ` Dmitry Osipenko
@ 2021-01-28 11:46             ` Mikko Perttunen
  -1 siblings, 0 replies; 195+ messages in thread
From: Mikko Perttunen @ 2021-01-28 11:46 UTC (permalink / raw)
  To: Dmitry Osipenko, Mikko Perttunen, thierry.reding, jonathanh,
	airlied, daniel
  Cc: linux-tegra, dri-devel, talho, bhuntsman

On 1/28/21 12:06 AM, Dmitry Osipenko wrote:
> 28.01.2021 00:57, Mikko Perttunen wrote:
>>
>>
>> On 1/27/21 11:26 PM, Dmitry Osipenko wrote:
>>> 26.01.2021 05:45, Mikko Perttunen wrote:
>>>>> 5. The hardware state of sync points should be reset when sync point is
>>>>> requested, not when host1x driver is initialized.
>>>>
>>>> This may be doable, but I don't think it is critical for this UAPI, so
>>>> let's consider it after this series.
>>>>
>>>> The userspace should anyway not be able to assume the initial value of
>>>> the syncpoint upon allocation. The kernel should set it to some high
>>>> value to catch any issues related to wraparound.
>>>
>>> This is critical because min != max when sync point is requested.
>>
>> That I would just consider a bug, and it can be fixed. But it's
>> orthogonal to whether the value gets reset every time the syncpoint is
>> allocated.
>>
>>>
>>>> Also, this makes code more complicated since it now needs to ensure all
>>>> waits on the syncpoint have completed before freeing the syncpoint,
>>>> which can be nontrivial e.g. if the waiter is in a different virtual
>>>> machine or some other device connected via PCIe (a real usecase).
>>>
>>> It sounds to me that these VM sync points should be treated very
>>> separately from generic sync points, don't you think so? Let's not mix
>>> them and get the generic sync points usable first.
>>>
>>
>> They are not special in any way, I'm just referring to cases where the
>> waiter (consumer) is remote. The allocator of the syncpoint (producer)
>> doesn't necessarily even need to know about it. The same concern is
>> applicable within a single VM, or single application as well. Just
>> putting out the point that this is something that needs to be taken care
>> of if we were to reset the value.
> 
> Will the kernel driver know that it deals with a VM sync point?
>
> Will it be possible to get a non-VM sync point explicitly?
> 
> If the driver knows that it deals with a VM sync point, then we can treat it
> specially, avoiding the reset, etc.
> 

There is no distinction between a "VM syncpoint" and a "non-VM 
syncpoint". To provide an example of the issue, consider we have VM1 and 
VM2. VM1 is running some camera software that produces frames. VM2 runs 
some analysis software that consumes those frames through shared memory. 
In between there is some software that takes the postfences of the 
camera software and transmits them to the analysis software to be used 
as prefences. Only this transmitting software needs to know anything 
about multiple VMs being in use.

At any time, if we want to reset the value of the syncpoint in question, 
we must ensure that all fences waiting on that syncpoint have observed 
the fence's threshold first.

Consider an interleaving like this:

VM1 (Camera)				VM2 (Analysis)
-------------------------------------------------------
Send postfence (threshold=X)
					Recv postfence (threshold=X)
Increment syncpoint value to X
Free syncpoint
Reset syncpoint value to Y
					Wait on postfence
-------------------------------------------------------

Now depending on the relative values of X and Y, either VM2 progresses 
(correctly), or gets stuck. If we didn't reset the syncpoint, the race 
could not occur (unless VM1 managed to increment the syncpoint 2^31 
times before VM2's wait starts, which is very unrealistic).
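The dependence on the relative values of X and Y comes from the usual wraparound-safe expiration check, which compares the signed 32-bit distance between value and threshold. A hypothetical sketch of that check (not driver code; the function name is mine) shows exactly how the reset breaks the wait:

```c
#include <stdbool.h>
#include <stdint.h>

/* Wraparound-safe check in the style of host1x fences: the fence has
 * expired once the syncpoint value is at or past the threshold,
 * provided the two are less than 2^31 increments apart. */
static bool fence_expired(uint32_t value, uint32_t threshold)
{
	return (int32_t)(value - threshold) >= 0;
}
```

With this check, a wait on threshold X that has already been satisfied starts stalling again as soon as the value is reset to some Y with (int32_t)(Y - X) < 0.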

We can remove "VM1" and "VM2" everywhere here, and just consider two 
applications in one VM, too, or two parts of one application. Within one 
VM the issue is of course easier because the driver can have knowledge 
about fences and solve the race internally, but I'd always prefer not 
having such special cases.

Now, admittedly this is probably not a huge problem unless we are 
freeing syncpoints all the time, but nevertheless something to consider.

Mikko

^ permalink raw reply	[flat|nested] 195+ messages in thread

* Re: [PATCH v5 00/21] Host1x sync point UAPI should not be used for tracking DRM jobs
  2021-01-28 11:08         ` Mikko Perttunen
@ 2021-01-28 16:58           ` Thierry Reding
  -1 siblings, 0 replies; 195+ messages in thread
From: Thierry Reding @ 2021-01-28 16:58 UTC (permalink / raw)
  To: Mikko Perttunen
  Cc: Dmitry Osipenko, Mikko Perttunen, jonathanh, airlied, daniel,
	linux-tegra, dri-devel, talho, bhuntsman

On Thu, Jan 28, 2021 at 01:08:54PM +0200, Mikko Perttunen wrote:
> On 1/27/21 11:20 PM, Dmitry Osipenko wrote:
> > 26.01.2021 05:45, Mikko Perttunen wrote:
> > > > 2. We will probably need a dedicated drm_tegra_submit_cmd for sync point
> > > > increments.  The job's sync point will be allocated dynamically when job
> > > > is submitted.  We will need a flag for the sync_incr and wait_syncpt
> > > > commands, saying "it's a job's sync point increment/wait"
> > > 
> > > Negative. Like I have explained in previous discussions, with the
> > > current way the usage of hardware resources is much more deterministic
> > > and obvious. I disagree on the point that this is much more complicated
> > > for the userspace. Separating syncpoint and channel allocation is one of
> > > the primary motivations of this series for me.
> > 
> > Sync points are a limited resource. The most sensible way to work around
> > it is to keep sync points within kernel as much as possible. This is not
> > only much simpler for user space, but also allows to utilize DRM API
> > properly without re-inventing what already exists and it's easier to
> > maintain hardware in a good state.
> 
> I've spent the last few years designing for automotive and industrial
> products, where we don't want to notice at runtime that the system is out of
> free syncpoints and because of that we can only process the next camera
> frame in a second or two instead of 16 milliseconds. We need to know that
> once we have allocated the resource, it is there. The newer chips are also
> designed to support this.
> 
> Considering Linux is increasingly being used for such applications, and they
> are important target markets for NVIDIA, these need to be supported.
> 
> Because of the above design constraint the userspace software that runs in
> these environments also expects resources to be allocated up front. This
> isn't a matter of having to design that software according to what kind of
> allocation API we decide to do at the Linux level -- it's no use designing for
> dynamic allocation if it leads to you not meeting the safety requirement of
> needing to ensure you have all resources allocated up front.
> 
> This isn't a good design feature just in a car, but in anything that needs
> to be reliable. However, it does pose some tradeoffs, and if you think that
> running out of syncpoints on T20-T114 because of upfront allocation is an
> actual problem, I'm not opposed to having both options available.

I think that's a fair point. I don't see why we can't support both
implicit and explicit syncpoint requests. If most of the use-cases can
work with implicit syncpoints and let the kernel handle all aspects of
it, that's great. But there's no reason we can't provide more explicit
controls for cases where it's really important that all the resources
are allocated upfront.

Ultimately this is very specific to each use-case, so I think having
both options will provide us with the most flexibility.

Thierry


* Re: [PATCH v5 00/21] Host1x sync point UAPI should not be used for tracking DRM jobs
  2021-01-28 16:58           ` Thierry Reding
@ 2021-01-29 17:30             ` Dmitry Osipenko
  -1 siblings, 0 replies; 195+ messages in thread
From: Dmitry Osipenko @ 2021-01-29 17:30 UTC (permalink / raw)
  To: Thierry Reding, Mikko Perttunen
  Cc: Mikko Perttunen, jonathanh, airlied, daniel, linux-tegra,
	dri-devel, talho, bhuntsman

28.01.2021 19:58, Thierry Reding wrote:
> On Thu, Jan 28, 2021 at 01:08:54PM +0200, Mikko Perttunen wrote:
>> On 1/27/21 11:20 PM, Dmitry Osipenko wrote:
>>> 26.01.2021 05:45, Mikko Perttunen wrote:
>>>>> 2. We will probably need a dedicated drm_tegra_submit_cmd for sync point
>>>>> increments.  The job's sync point will be allocated dynamically when job
>>>>> is submitted.  We will need a flag for the sync_incr and wait_syncpt
>>>>> commands, saying "it's a job's sync point increment/wait"
>>>>
>>>> Negative. Like I have explained in previous discussions, with the
>>>> current way the usage of hardware resources is much more deterministic
>>>> and obvious. I disagree on the point that this is much more complicated
>>>> for the userspace. Separating syncpoint and channel allocation is one of
>>>> the primary motivations of this series for me.
>>>
>>> Sync points are a limited resource. The most sensible way to work around
>>> it is to keep sync points within kernel as much as possible. This is not
>>> only much simpler for user space, but also allows to utilize DRM API
>>> properly without re-inventing what already exists and it's easier to
>>> maintain hardware in a good state.
>>
>> I've spent the last few years designing for automotive and industrial
>> products, where we don't want to notice at runtime that the system is out of
>> free syncpoints and because of that we can only process the next camera
>> frame in a second or two instead of 16 milliseconds. We need to know that
>> once we have allocated the resource, it is there. The newer chips are also
>> designed to support this.
>>
>> Considering Linux is increasingly being used for such applications, and they
>> are important target markets for NVIDIA, these need to be supported.
>>
>> Because of the above design constraint the userspace software that runs in
>> these environments also expects resources to be allocated up front. This
>> isn't a matter of having to design that software according to what kind of
>> allocation API we decide to do at the Linux level -- it's no use designing for
>> dynamic allocation if it leads to you not meeting the safety requirement of
>> needing to ensure you have all resources allocated up front.
>>
>> This isn't a good design feature just in a car, but in anything that needs
>> to be reliable. However, it does pose some tradeoffs, and if you think that
>> running out of syncpoints on T20-T114 because of upfront allocation is an
>> actual problem, I'm not opposed to having both options available.

The word "reliable" contradicts the error-prone approach. On the other
hand, it would be very difficult to change the stubborn downstream
firmware, and we want the driver to be as usable as possible, so in
reality not much can be done about it.

> I think that's a fair point. I don't see why we can't support both
> implicit and explicit syncpoint requests. If most of the use-cases can
> work with implicit syncpoints and let the kernel handle all aspects of
> it, that's great. But there's no reason we can't provide more explicit
> controls for cases where it's really important that all the resources
> are allocated upfront.
> 
> Ultimately this is very specific to each use-case, so I think having
> both options will provide us with the most flexibility.

It should be fine to support both. This will add complexity to the
driver, so it needs to be done wisely.

I'll need more time to think about it.



* Re: [PATCH v5 00/21] Host1x sync point UAPI should not be used for tracking DRM jobs
  2021-01-29 17:30             ` Dmitry Osipenko
@ 2021-02-03 11:18               ` Mikko Perttunen
  -1 siblings, 0 replies; 195+ messages in thread
From: Mikko Perttunen @ 2021-02-03 11:18 UTC (permalink / raw)
  To: Dmitry Osipenko, Thierry Reding
  Cc: Mikko Perttunen, jonathanh, airlied, daniel, linux-tegra,
	dri-devel, talho, bhuntsman

On 1/29/21 7:30 PM, Dmitry Osipenko wrote:
> 28.01.2021 19:58, Thierry Reding wrote:
>> On Thu, Jan 28, 2021 at 01:08:54PM +0200, Mikko Perttunen wrote:
>>> On 1/27/21 11:20 PM, Dmitry Osipenko wrote:
>>>> 26.01.2021 05:45, Mikko Perttunen wrote:
>>>>>> 2. We will probably need a dedicated drm_tegra_submit_cmd for sync point
>>>>>> increments.  The job's sync point will be allocated dynamically when job
>>>>>> is submitted.  We will need a flag for the sync_incr and wait_syncpt
>>>>>> commands, saying "it's a job's sync point increment/wait"
>>>>>
>>>>> Negative. Like I have explained in previous discussions, with the
>>>>> current way the usage of hardware resources is much more deterministic
>>>>> and obvious. I disagree on the point that this is much more complicated
>>>>> for the userspace. Separating syncpoint and channel allocation is one of
>>>>> the primary motivations of this series for me.
>>>>
>>>> Sync points are a limited resource. The most sensible way to work around
>>>> it is to keep sync points within kernel as much as possible. This is not
>>>> only much simpler for user space, but also allows to utilize DRM API
>>>> properly without re-inventing what already exists and it's easier to
>>>> maintain hardware in a good state.
>>>
>>> I've spent the last few years designing for automotive and industrial
>>> products, where we don't want to notice at runtime that the system is out of
>>> free syncpoints and because of that we can only process the next camera
>>> frame in a second or two instead of 16 milliseconds. We need to know that
>>> once we have allocated the resource, it is there. The newer chips are also
>>> designed to support this.
>>>
>>> Considering Linux is increasingly being used for such applications, and they
>>> are important target markets for NVIDIA, these need to be supported.
>>>
>>> Because of the above design constraint the userspace software that runs in
>>> these environments also expects resources to be allocated up front. This
>>> isn't a matter of having to design that software according to what kind of
>>> allocation API we decide to do at the Linux level -- it's no use designing for
>>> dynamic allocation if it leads to you not meeting the safety requirement of
>>> needing to ensure you have all resources allocated up front.
>>>
>>> This isn't a good design feature just in a car, but in anything that needs
>>> to be reliable. However, it does pose some tradeoffs, and if you think that
>>> running out of syncpoints on T20-T114 because of upfront allocation is an
>>> actual problem, I'm not opposed to having both options available.
> 
> The word "reliable" contradicts the error-prone approach. On the other
> hand, it would be very difficult to change the stubborn downstream
> firmware, and we want the driver to be as usable as possible, so in
> reality not much can be done about it.

Depends on the perspective.

> 
>> I think that's a fair point. I don't see why we can't support both
>> implicit and explicit syncpoint requests. If most of the use-cases can
>> work with implicit syncpoints and let the kernel handle all aspects of
>> it, that's great. But there's no reason we can't provide more explicit
>> controls for cases where it's really important that all the resources
>> are allocated upfront.
>>
>> Ultimately this is very specific to each use-case, so I think having
>> both options will provide us with the most flexibility.
> It should be fine to support both. This will add complexity to the
> driver, thus it needs to be done wisely.
> 
> I'll need more time to think about it.
> 

How about something like this:

Turn the syncpt_incr field back into an array of structs like

#define DRM_TEGRA_SUBMIT_SYNCPT_INCR_REPLACE_SYNCOBJ		(1<<0)
#define DRM_TEGRA_SUBMIT_SYNCPT_INCR_PATCH_DYNAMIC_SYNCPT	(1<<1)

struct drm_tegra_submit_syncpt_incr {
	/* can be left as zero if using dynamic syncpt */
	__u32 syncpt_id;
	__u32 flags;

	struct {
		__u32 syncobj;
		__u32 value;
	} fence;

	/* patch word as such:
	 * *word = *word | (syncpt_id << shift)
	 */
	struct {
		__u32 gather_offset_words;
		__u32 shift;
	} patch;
};

So this will work similarly to the buffer reloc system; the kernel
driver will allocate a job syncpoint, patch in the syncpoint ID if
requested, and allow outputting syncobjs for each increment.

Mikko


* Re: [PATCH v5 00/21] Host1x sync point UAPI should not be used for tracking DRM jobs
  2021-02-03 11:18               ` Mikko Perttunen
@ 2021-02-27 11:19                 ` Dmitry Osipenko
  -1 siblings, 0 replies; 195+ messages in thread
From: Dmitry Osipenko @ 2021-02-27 11:19 UTC (permalink / raw)
  To: Mikko Perttunen, Thierry Reding
  Cc: Mikko Perttunen, jonathanh, airlied, daniel, linux-tegra,
	dri-devel, talho, bhuntsman

03.02.2021 14:18, Mikko Perttunen wrote:
...
>> I'll need more time to think about it.
>>
> 
> How about something like this:
> 
> Turn the syncpt_incr field back into an array of structs like
> 
> #define DRM_TEGRA_SUBMIT_SYNCPT_INCR_REPLACE_SYNCOBJ        (1<<0)
> #define DRM_TEGRA_SUBMIT_SYNCPT_INCR_PATCH_DYNAMIC_SYNCPT    (1<<1)
> 
> struct drm_tegra_submit_syncpt_incr {
>     /* can be left as zero if using dynamic syncpt */
>     __u32 syncpt_id;
>     __u32 flags;
> 
>     struct {
>         __u32 syncobj;
>         __u32 value;
>     } fence;
> 
>     /* patch word as such:
>      * *word = *word | (syncpt_id << shift)
>      */
>     struct {
>         __u32 gather_offset_words;
>         __u32 shift;
>     } patch;
> };
> 
> So this will work similarly to the buffer reloc system; the kernel
> driver will allocate a job syncpoint, patch in the syncpoint ID if
> requested, and allow outputting syncobjs for each increment.

I haven't got any great ideas so far, but it feels like it will be easier
and cleaner if we could have separate job paths (and job IOCTLs) based
on hardware generation, since the workloads are too different. The needs
of the newer h/w are too obscure to me, and the absence of userspace
code, firmware sources and full h/w documentation does not help.

There should still be quite a lot to share, but things like
mapping-to-channel and VM sync points are too far removed from the older
h/w, IMO. This means that the code path before drm-sched and the path
for job-timeout handling should be separate.

Maybe later on it will become clear that we actually could unify it
all nicely, but for now it doesn't look like a good idea to me.

Mikko, do you have any objections to trying out a variant with
separate paths?


* Re: [PATCH v5 00/21] Host1x sync point UAPI should not be used for tracking DRM jobs
  2021-02-27 11:19                 ` Dmitry Osipenko
@ 2021-03-01  8:19                   ` Mikko Perttunen
  -1 siblings, 0 replies; 195+ messages in thread
From: Mikko Perttunen @ 2021-03-01  8:19 UTC (permalink / raw)
  To: Dmitry Osipenko, Thierry Reding
  Cc: Mikko Perttunen, jonathanh, airlied, daniel, linux-tegra,
	dri-devel, talho, bhuntsman

On 2/27/21 1:19 PM, Dmitry Osipenko wrote:
> 03.02.2021 14:18, Mikko Perttunen wrote:
> ...
>>> I'll need more time to think about it.
>>>
>>
>> How about something like this:
>>
>> Turn the syncpt_incr field back into an array of structs like
>>
>> #define DRM_TEGRA_SUBMIT_SYNCPT_INCR_REPLACE_SYNCOBJ        (1<<0)
>> #define DRM_TEGRA_SUBMIT_SYNCPT_INCR_PATCH_DYNAMIC_SYNCPT    (1<<1)
>>
>> struct drm_tegra_submit_syncpt_incr {
>>      /* can be left as zero if using dynamic syncpt */
>>      __u32 syncpt_id;
>>      __u32 flags;
>>
>>      struct {
>>          __u32 syncobj;
>>          __u32 value;
>>      } fence;
>>
>>      /* patch word as such:
>>           * *word = *word | (syncpt_id << shift)
>>           */
>>      struct {
>>          __u32 gather_offset_words;
>>          __u32 shift;
>>      } patch;
>> };
>>
>> So this will work similarly to the buffer reloc system; the kernel
>> driver will allocate a job syncpoint and patch in the syncpoint ID if
>> requested, and allow outputting syncobjs for each increment.
> 
> I haven't got any great ideas so far, but it feels that it will be easier
> and cleaner if we could have separate job paths (and job IOCTLs) based
> on hardware generation, since the workloads are too different. The needs
> of the newer h/w are too obscure to me, and the absence of userspace
> code, firmware sources and full h/w documentation does not help.
> 
> There still should be quite a lot to share, but things like
> mapping-to-channel and VM sync points are too far away from the older
> h/w, IMO. This means that the code path before drm-sched and the path
> for job-timeout handling should be separate.
> 
> Maybe later on it will become clear that we actually could unify it
> all nicely, but for now it doesn't look like a good idea to me.
> 
> Mikko, do you have any objections to trying out a variant with the
> separate paths?
> 

I'm on vacation for the next two weeks. I'll think about it and post a 
proposal once I'm back to work.

Mikko


* Re: [PATCH v5 01/21] gpu: host1x: Use different lock classes for each client
  2021-01-11 12:59   ` Mikko Perttunen
@ 2021-03-22 14:46     ` Thierry Reding
  -1 siblings, 0 replies; 195+ messages in thread
From: Thierry Reding @ 2021-03-22 14:46 UTC (permalink / raw)
  To: Mikko Perttunen
  Cc: jonathanh, digetx, airlied, daniel, linux-tegra, dri-devel,
	talho, bhuntsman


On Mon, Jan 11, 2021 at 02:59:59PM +0200, Mikko Perttunen wrote:
> To avoid false lockdep warnings, give each client lock a different
> lock class, passed from the initialization site by macro.
> 
> Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
> ---
>  drivers/gpu/host1x/bus.c | 7 ++++---
>  include/linux/host1x.h   | 9 ++++++++-
>  2 files changed, 12 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/host1x/bus.c b/drivers/gpu/host1x/bus.c
> index 347fb962b6c9..8fc79e9cb652 100644
> --- a/drivers/gpu/host1x/bus.c
> +++ b/drivers/gpu/host1x/bus.c
> @@ -715,13 +715,14 @@ EXPORT_SYMBOL(host1x_driver_unregister);
>   * device and call host1x_device_init(), which will in turn call each client's
>   * &host1x_client_ops.init implementation.
>   */
> -int host1x_client_register(struct host1x_client *client)
> +int __host1x_client_register(struct host1x_client *client,
> +			   struct lock_class_key *key)

I've seen the kbuild robot warn about this because the kerneldoc is now
out of date.

>  {
>  	struct host1x *host1x;
>  	int err;
>  
>  	INIT_LIST_HEAD(&client->list);
> -	mutex_init(&client->lock);
> +	__mutex_init(&client->lock, "host1x client lock", key);

Should we maybe attempt to make this unique? Could we use something like
dev_name(client->dev) for this?

Thierry



* Re: [PATCH v5 01/21] gpu: host1x: Use different lock classes for each client
  2021-03-22 14:46     ` Thierry Reding
@ 2021-03-22 14:48       ` Dmitry Osipenko
  -1 siblings, 0 replies; 195+ messages in thread
From: Dmitry Osipenko @ 2021-03-22 14:48 UTC (permalink / raw)
  To: Thierry Reding, Mikko Perttunen
  Cc: jonathanh, airlied, daniel, linux-tegra, dri-devel, talho, bhuntsman

22.03.2021 17:46, Thierry Reding пишет:
> On Mon, Jan 11, 2021 at 02:59:59PM +0200, Mikko Perttunen wrote:
>> To avoid false lockdep warnings, give each client lock a different
>> lock class, passed from the initialization site by macro.
>>
>> Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
>> ---
>>  drivers/gpu/host1x/bus.c | 7 ++++---
>>  include/linux/host1x.h   | 9 ++++++++-
>>  2 files changed, 12 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/gpu/host1x/bus.c b/drivers/gpu/host1x/bus.c
>> index 347fb962b6c9..8fc79e9cb652 100644
>> --- a/drivers/gpu/host1x/bus.c
>> +++ b/drivers/gpu/host1x/bus.c
>> @@ -715,13 +715,14 @@ EXPORT_SYMBOL(host1x_driver_unregister);
>>   * device and call host1x_device_init(), which will in turn call each client's
>>   * &host1x_client_ops.init implementation.
>>   */
>> -int host1x_client_register(struct host1x_client *client)
>> +int __host1x_client_register(struct host1x_client *client,
>> +			   struct lock_class_key *key)
> 
> I've seen the kbuild robot warn about this because the kerneldoc is now
> out of date.
> 
>>  {
>>  	struct host1x *host1x;
>>  	int err;
>>  
>>  	INIT_LIST_HEAD(&client->list);
>> -	mutex_init(&client->lock);
>> +	__mutex_init(&client->lock, "host1x client lock", key);
> 
> Should we maybe attempt to make this unique? Could we use something like
> dev_name(client->dev) for this?

I'm curious how the lockdep warning could be triggered at all; I don't
recall ever seeing it. Mikko, could you please clarify how to reproduce
the warning?


* Re: [PATCH v5 01/21] gpu: host1x: Use different lock classes for each client
  2021-03-22 14:48       ` Dmitry Osipenko
@ 2021-03-22 15:19         ` Mikko Perttunen
  -1 siblings, 0 replies; 195+ messages in thread
From: Mikko Perttunen @ 2021-03-22 15:19 UTC (permalink / raw)
  To: Dmitry Osipenko, Thierry Reding
  Cc: jonathanh, airlied, daniel, linux-tegra, dri-devel, talho, bhuntsman

On 22.3.2021 16.48, Dmitry Osipenko wrote:
> 22.03.2021 17:46, Thierry Reding пишет:
>> On Mon, Jan 11, 2021 at 02:59:59PM +0200, Mikko Perttunen wrote:
>>> To avoid false lockdep warnings, give each client lock a different
>>> lock class, passed from the initialization site by macro.
>>>
>>> Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
>>> ---
>>>   drivers/gpu/host1x/bus.c | 7 ++++---
>>>   include/linux/host1x.h   | 9 ++++++++-
>>>   2 files changed, 12 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/gpu/host1x/bus.c b/drivers/gpu/host1x/bus.c
>>> index 347fb962b6c9..8fc79e9cb652 100644
>>> --- a/drivers/gpu/host1x/bus.c
>>> +++ b/drivers/gpu/host1x/bus.c
>>> @@ -715,13 +715,14 @@ EXPORT_SYMBOL(host1x_driver_unregister);
>>>    * device and call host1x_device_init(), which will in turn call each client's
>>>    * &host1x_client_ops.init implementation.
>>>    */
>>> -int host1x_client_register(struct host1x_client *client)
>>> +int __host1x_client_register(struct host1x_client *client,
>>> +			   struct lock_class_key *key)
>>
>> I've seen the kbuild robot warn about this because the kerneldoc is now
>> out of date.
>>
>>>   {
>>>   	struct host1x *host1x;
>>>   	int err;
>>>   
>>>   	INIT_LIST_HEAD(&client->list);
>>> -	mutex_init(&client->lock);
>>> +	__mutex_init(&client->lock, "host1x client lock", key);
>>
>> Should we maybe attempt to make this unique? Could we use something like
>> dev_name(client->dev) for this?
> 
> I'm curious how the lockdep warning could be triggered at all; I don't
> recall ever seeing it. Mikko, could you please clarify how to reproduce
> the warning?
> 

This is pretty difficult to read, but I guess it's some interaction
related to the delayed initialization of host1x clients? In any case, I
consistently get it at boot (though it may be triggered by vic probe
instead of nvdec).

I'll fix the kbuild robot warnings and see if I can add a 
client-specific lock name for v6.

Mikko

[   38.128257] WARNING: possible recursive locking detected
[   38.133567] 5.11.0-rc2-next-20210108+ #102 Tainted: G S
[   38.140089] --------------------------------------------
[   38.145395] systemd-udevd/239 is trying to acquire lock:
[   38.150703] ffff0000997aa218 (&client->lock){+.+.}-{3:3}, at: host1x_client_resume+0x30/0x100 [host1x]
[   38.160142]
[   38.160142] but task is already holding lock:
[   38.165968] ffff000080c3b148 (&client->lock){+.+.}-{3:3}, at: host1x_client_resume+0x30/0x100 [host1x]
[   38.175398]
[   38.175398] other info that might help us debug this:
[   38.181918]  Possible unsafe locking scenario:
[   38.181918]
[   38.187830]        CPU0
[   38.190275]        ----
[   38.192719]   lock(&client->lock);
[   38.196129]   lock(&client->lock);
[   38.199537]
[   38.199537]  *** DEADLOCK ***
[   38.199537]
[   38.205449]  May be due to missing lock nesting notation
[   38.205449]
[   38.212228] 6 locks held by systemd-udevd/239:
[   38.216669]  #0: ffff00009261c188 (&dev->mutex){....}-{3:3}, at: device_driver_attach+0x60/0x130
[   38.225487]  #1: ffff800009a17168 (devices_lock){+.+.}-{3:3}, at: host1x_client_register+0x7c/0x220 [host1x]
[   38.235441]  #2: ffff000083f94bb8 (&host->devices_lock){+.+.}-{3:3}, at: host1x_client_register+0xac/0x220 [host1x]
[   38.245996]  #3: ffff0000a2267190 (&dev->mutex){....}-{3:3}, at: __device_attach+0x8c/0x230
[   38.254372]  #4: ffff000092c880f0 (&wgrp->lock){+.+.}-{3:3}, at: tegra_display_hub_prepare+0xd8/0x170 [tegra_drm]
[   38.264788]  #5: ffff000080c3b148 (&client->lock){+.+.}-{3:3}, at: host1x_client_resume+0x30/0x100 [host1x]
[   38.274658]
[   38.274658] stack backtrace:
[   38.279012] CPU: 0 PID: 239 Comm: systemd-udevd Tainted: G S         5.11.0-rc2-next-20210108+ #102
[   38.288660] Hardware name: NVIDIA Jetson TX2 Developer Kit (DT)
[   38.294577] Call trace:
[   38.297022]  dump_backtrace+0x0/0x2c0
[   38.300695]  show_stack+0x18/0x6c
[   38.304013]  dump_stack+0x120/0x19c
[   38.307507]  __lock_acquire+0x171c/0x2c34
[   38.311521]  lock_acquire.part.0+0x230/0x490
[   38.315793]  lock_acquire+0x70/0x90
[   38.319285]  __mutex_lock+0x11c/0x6d0
[   38.322952]  mutex_lock_nested+0x58/0x90
[   38.326877]  host1x_client_resume+0x30/0x100 [host1x]
[   38.332047]  host1x_client_resume+0x44/0x100 [host1x]
[   38.337200]  tegra_display_hub_prepare+0xf8/0x170 [tegra_drm]
[   38.343084]  host1x_drm_probe+0x1fc/0x4f0 [tegra_drm]
[   38.348256]  host1x_device_probe+0x3c/0x50 [host1x]
[   38.353240]  really_probe+0x148/0x6f0
[   38.356906]  driver_probe_device+0x78/0xe4
[   38.361005]  __device_attach_driver+0x10c/0x170
[   38.365536]  bus_for_each_drv+0xf0/0x160
[   38.369461]  __device_attach+0x168/0x230
[   38.373385]  device_initial_probe+0x14/0x20
[   38.377571]  bus_probe_device+0xec/0x100
[   38.381494]  device_add+0x580/0xbcc
[   38.384985]  host1x_subdev_register+0x178/0x1cc [host1x]
[   38.390397]  host1x_client_register+0x138/0x220 [host1x]
[   38.395808]  nvdec_probe+0x240/0x3ec [tegra_drm]
[   38.400549]  platform_probe+0x8c/0x110
[   38.404302]  really_probe+0x148/0x6f0
[   38.407966]  driver_probe_device+0x78/0xe4
[   38.412065]  device_driver_attach+0x120/0x130
[   38.416423]  __driver_attach+0xb4/0x190
[   38.420261]  bus_for_each_dev+0xe8/0x160
[   38.424185]  driver_attach+0x34/0x44
[   38.427761]  bus_add_driver+0x1a4/0x2b0
[   38.431598]  driver_register+0xe0/0x210
[   38.435437]  __platform_register_drivers+0x6c/0x104
[   38.440318]  host1x_drm_init+0x54/0x1000 [tegra_drm]
[   38.445405]  do_one_initcall+0xec/0x5e0
[   38.449244]  do_init_module+0xe0/0x384
[   38.453000]  load_module+0x32d8/0x3c60


* Re: [PATCH v5 01/21] gpu: host1x: Use different lock classes for each client
  2021-03-22 15:19         ` Mikko Perttunen
@ 2021-03-22 16:01           ` Dmitry Osipenko
  -1 siblings, 0 replies; 195+ messages in thread
From: Dmitry Osipenko @ 2021-03-22 16:01 UTC (permalink / raw)
  To: Mikko Perttunen, Thierry Reding
  Cc: jonathanh, airlied, daniel, linux-tegra, dri-devel, talho, bhuntsman

22.03.2021 18:19, Mikko Perttunen пишет:
> On 22.3.2021 16.48, Dmitry Osipenko wrote:
>> 22.03.2021 17:46, Thierry Reding пишет:
>>> On Mon, Jan 11, 2021 at 02:59:59PM +0200, Mikko Perttunen wrote:
>>>> To avoid false lockdep warnings, give each client lock a different
>>>> lock class, passed from the initialization site by macro.
>>>>
>>>> Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
>>>> ---
>>>>   drivers/gpu/host1x/bus.c | 7 ++++---
>>>>   include/linux/host1x.h   | 9 ++++++++-
>>>>   2 files changed, 12 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/host1x/bus.c b/drivers/gpu/host1x/bus.c
>>>> index 347fb962b6c9..8fc79e9cb652 100644
>>>> --- a/drivers/gpu/host1x/bus.c
>>>> +++ b/drivers/gpu/host1x/bus.c
>>>> @@ -715,13 +715,14 @@ EXPORT_SYMBOL(host1x_driver_unregister);
>>>>    * device and call host1x_device_init(), which will in turn call
>>>> each client's
>>>>    * &host1x_client_ops.init implementation.
>>>>    */
>>>> -int host1x_client_register(struct host1x_client *client)
>>>> +int __host1x_client_register(struct host1x_client *client,
>>>> +               struct lock_class_key *key)
>>>
>>> I've seen the kbuild robot warn about this because the kerneldoc is now
>>> out of date.
>>>
>>>>   {
>>>>       struct host1x *host1x;
>>>>       int err;
>>>>         INIT_LIST_HEAD(&client->list);
>>>> -    mutex_init(&client->lock);
>>>> +    __mutex_init(&client->lock, "host1x client lock", key);
>>>
>>> Should we maybe attempt to make this unique? Could we use something like
>>> dev_name(client->dev) for this?
>>
>> I'm curious how the lockdep warning could be triggered at all; I don't
>> recall ever seeing it. Mikko, could you please clarify how to reproduce
>> the warning?
>>
> 
> This is pretty difficult to read but I guess it's some interaction
> related to the delayed initialization of host1x clients? In any case, I
> consistently get it at boot (though it may be triggered by vic probe
> instead of nvdec).
> 
> I'll fix the kbuild robot warnings and see if I can add a
> client-specific lock name for v6.

Thank you for the clarification! We now actually have a similar problem
on Tegra20 after fixing the coupling of the display controllers using
dc1_client->parent = dc0_client, and I see the same warning when DC1 is
enabled.

[    3.808338] ============================================
[    3.808355] WARNING: possible recursive locking detected
[    3.808376] 5.12.0-rc3-next-20210319-00176-g60867e51e180 #7219 Tainted: G        W        
[    3.808406] --------------------------------------------
[    3.808421] kworker/1:2/108 is trying to acquire lock:
[    3.808449] c36b70a4 (&client->lock){+.+.}-{3:3}, at: host1x_client_resume+0x17/0x58
[    3.808586] 
               but task is already holding lock:
[    3.808603] c34df8a4 (&client->lock){+.+.}-{3:3}, at: host1x_client_resume+0x17/0x58
[    3.808712] 
               other info that might help us debug this:
[    3.808729]  Possible unsafe locking scenario:

[    3.808744]        CPU0
[    3.808757]        ----
[    3.808771]   lock(&client->lock);
[    3.808810]   lock(&client->lock);
[    3.808821] 
                *** DEADLOCK ***

[    3.808825]  May be due to missing lock nesting notation

[    3.808829] 15 locks held by kworker/1:2/108:
[    3.808836]  #0: c20068a8 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x15a/0x608
[    3.808878]  #1: c2bbbf18 (deferred_probe_work){+.+.}-{0:0}, at: process_one_work+0x15a/0x608
[    3.808912]  #2: c366d4d8 (&dev->mutex){....}-{3:3}, at: __device_attach+0x29/0xdc
[    3.808953]  #3: c141a980 (devices_lock){+.+.}-{3:3}, at: host1x_client_register+0x35/0xfc
[    3.808986]  #4: c34df64c (&host1x->devices_lock){+.+.}-{3:3}, at: host1x_client_register+0x51/0xfc
[    3.809017]  #5: c34ed4d8 (&dev->mutex){....}-{3:3}, at: __device_attach+0x29/0xdc
[    3.809050]  #6: c13faf5c (registration_lock){+.+.}-{3:3}, at: register_framebuffer+0x2d/0x274
[    3.809092]  #7: c132566c (console_lock){+.+.}-{0:0}, at: register_framebuffer+0x219/0x274
[    3.809124]  #8: c36e7848 (&fb_info->lock){+.+.}-{3:3}, at: register_framebuffer+0x19f/0x274
[    3.809157]  #9: c36d2d6c (&helper->lock){+.+.}-{3:3}, at: __drm_fb_helper_restore_fbdev_mode_unlocked+0x41/0x8c
[    3.809199]  #10: c36f00e8 (&dev->master_mutex){+.+.}-{3:3}, at: drm_master_internal_acquire+0x17/0x28
[    3.809233]  #11: c36d2c50 (&client->modeset_mutex){+.+.}-{3:3}, at: drm_client_modeset_commit_locked+0x1d/0x138
[    3.809272]  #12: c2bbba28 (crtc_ww_class_acquire){+.+.}-{0:0}, at: drm_client_modeset_commit_atomic+0x2f/0x1c4
[    3.809306]  #13: c36e6448 (crtc_ww_class_mutex){+.+.}-{3:3}, at: drm_modeset_backoff+0x63/0x190
[    3.809337]  #14: c34df8a4 (&client->lock){+.+.}-{3:3}, at: host1x_client_resume+0x17/0x58
[    3.809369] 
               stack backtrace:
[    3.809375] CPU: 1 PID: 108 Comm: kworker/1:2 Tainted: G        W         5.12.0-rc3-next-20210319-00176-g60867e51e180 #7219
[    3.809387] Hardware name: NVIDIA Tegra SoC (Flattened Device Tree)
[    3.809396] Workqueue: events deferred_probe_work_func
[    3.809417] [<c010d1ad>] (unwind_backtrace) from [<c010961d>] (show_stack+0x11/0x14)
[    3.809447] [<c010961d>] (show_stack) from [<c0b7d7c9>] (dump_stack+0x9f/0xb8)
[    3.809467] [<c0b7d7c9>] (dump_stack) from [<c0179eef>] (__lock_acquire+0x7fb/0x253c)
[    3.809495] [<c0179eef>] (__lock_acquire) from [<c017c403>] (lock_acquire+0xf3/0x420)
[    3.809516] [<c017c403>] (lock_acquire) from [<c0b87663>] (__mutex_lock+0x87/0x814)
[    3.809544] [<c0b87663>] (__mutex_lock) from [<c0b87e09>] (mutex_lock_nested+0x19/0x20)
[    3.809565] [<c0b87e09>] (mutex_lock_nested) from [<c05ccd2f>] (host1x_client_resume+0x17/0x58)
[    3.809587] [<c05ccd2f>] (host1x_client_resume) from [<c05ccd37>] (host1x_client_resume+0x1f/0x58)
[    3.809604] [<c05ccd37>] (host1x_client_resume) from [<c061d9a3>] (tegra_crtc_atomic_enable+0x33/0x21c4)
[    3.809634] [<c061d9a3>] (tegra_crtc_atomic_enable) from [<c05e0355>] (drm_atomic_helper_commit_modeset_enables+0x131/0x16c)
[    3.809667] [<c05e0355>] (drm_atomic_helper_commit_modeset_enables) from [<c05e0e89>] (drm_atomic_helper_commit_tail_rpm+0x1d/0x4c)
[    3.809691] [<c05e0e89>] (drm_atomic_helper_commit_tail_rpm) from [<c0610157>] (tegra_atomic_commit_tail+0x83/0x84)
[    3.809712] [<c0610157>] (tegra_atomic_commit_tail) from [<c05e1271>] (commit_tail+0x71/0x138)
[    3.809732] [<c05e1271>] (commit_tail) from [<c05e1b95>] (drm_atomic_helper_commit+0xf1/0x114)
[    3.809753] [<c05e1b95>] (drm_atomic_helper_commit) from [<c0607355>] (drm_client_modeset_commit_atomic+0x199/0x1c4)
[    3.809777] [<c0607355>] (drm_client_modeset_commit_atomic) from [<c0607401>] (drm_client_modeset_commit_locked+0x3d/0x138)
[    3.809798] [<c0607401>] (drm_client_modeset_commit_locked) from [<c0607517>] (drm_client_modeset_commit+0x1b/0x2c)
[    3.809818] [<c0607517>] (drm_client_modeset_commit) from [<c05e5c4f>] (__drm_fb_helper_restore_fbdev_mode_unlocked+0x73/0x8c)
[    3.809842] [<c05e5c4f>] (__drm_fb_helper_restore_fbdev_mode_unlocked) from [<c05e5cc9>] (drm_fb_helper_set_par+0x2d/0x4c)
[    3.809862] [<c05e5cc9>] (drm_fb_helper_set_par) from [<c056c763>] (fbcon_init+0x1cb/0x370)
[    3.809883] [<c056c763>] (fbcon_init) from [<c05af8c7>] (visual_init+0x8b/0xc8)
[    3.809902] [<c05af8c7>] (visual_init) from [<c05b07c5>] (do_bind_con_driver+0x13d/0x2b4)
[    3.809919] [<c05b07c5>] (do_bind_con_driver) from [<c05b0b93>] (do_take_over_console+0xdf/0x15c)
[    3.809937] [<c05b0b93>] (do_take_over_console) from [<c056b1df>] (do_fbcon_takeover+0x4f/0x90)
[    3.809955] [<c056b1df>] (do_fbcon_takeover) from [<c056545d>] (register_framebuffer+0x1a5/0x274)
[    3.809977] [<c056545d>] (register_framebuffer) from [<c05e57cf>] (__drm_fb_helper_initial_config_and_unlock+0x29f/0x438)
[    3.809999] [<c05e57cf>] (__drm_fb_helper_initial_config_and_unlock) from [<c06115e1>] (tegra_drm_fb_init+0x25/0x5c)
[    3.810022] [<c06115e1>] (tegra_drm_fb_init) from [<c060feff>] (host1x_drm_probe+0x247/0x404)
[    3.810041] [<c060feff>] (host1x_drm_probe) from [<c0647ad9>] (really_probe+0xb1/0x2a4)
[    3.810064] [<c0647ad9>] (really_probe) from [<c0647d0b>] (driver_probe_device+0x3f/0x78)
[    3.810086] [<c0647d0b>] (driver_probe_device) from [<c0646737>] (bus_for_each_drv+0x4f/0x78)
[    3.810107] [<c0646737>] (bus_for_each_drv) from [<c06479d5>] (__device_attach+0x95/0xdc)
[    3.810127] [<c06479d5>] (__device_attach) from [<c064702d>] (bus_probe_device+0x5d/0x64)
[    3.810147] [<c064702d>] (bus_probe_device) from [<c064581b>] (device_add+0x293/0x5c0)
[    3.810166] [<c064581b>] (device_add) from [<c05cd211>] (host1x_subdev_register+0x8d/0xac)
[    3.810186] [<c05cd211>] (host1x_subdev_register) from [<c05cd4d3>] (host1x_client_register+0x8f/0xfc)
[    3.810204] [<c05cd4d3>] (host1x_client_register) from [<c061870f>] (tegra_dc_probe+0x1bf/0x2b0)
[    3.810225] [<c061870f>] (tegra_dc_probe) from [<c064977b>] (platform_probe+0x43/0x80)
[    3.810247] [<c064977b>] (platform_probe) from [<c0647ad9>] (really_probe+0xb1/0x2a4)
[    3.810266] [<c0647ad9>] (really_probe) from [<c0647d0b>] (driver_probe_device+0x3f/0x78)
[    3.810286] [<c0647d0b>] (driver_probe_device) from [<c0646737>] (bus_for_each_drv+0x4f/0x78)
[    3.810307] [<c0646737>] (bus_for_each_drv) from [<c06479d5>] (__device_attach+0x95/0xdc)
[    3.810326] [<c06479d5>] (__device_attach) from [<c064702d>] (bus_probe_device+0x5d/0x64)
[    3.810346] [<c064702d>] (bus_probe_device) from [<c0647379>] (deferred_probe_work_func+0x4d/0x70)
[    3.810367] [<c0647379>] (deferred_probe_work_func) from [<c0139557>] (process_one_work+0x1eb/0x608)
[    3.810391] [<c0139557>] (process_one_work) from [<c0139a6d>] (worker_thread+0xf9/0x3bc)
[    3.810411] [<c0139a6d>] (worker_thread) from [<c013f3db>] (kthread+0xff/0x134)
[    3.810432] [<c013f3db>] (kthread) from [<c0100159>] (ret_from_fork+0x11/0x38)
[    3.810449] Exception stack(0xc2bbbfb0 to 0xc2bbbff8)

^ permalink raw reply	[flat|nested] 195+ messages in thread

* Re: [PATCH v5 01/21] gpu: host1x: Use different lock classes for each client
@ 2021-03-22 16:01           ` Dmitry Osipenko
  0 siblings, 0 replies; 195+ messages in thread
From: Dmitry Osipenko @ 2021-03-22 16:01 UTC (permalink / raw)
  To: Mikko Perttunen, Thierry Reding
  Cc: airlied, dri-devel, jonathanh, talho, bhuntsman, linux-tegra

22.03.2021 18:19, Mikko Perttunen wrote:
> On 22.3.2021 16.48, Dmitry Osipenko wrote:
>> 22.03.2021 17:46, Thierry Reding wrote:
>>> On Mon, Jan 11, 2021 at 02:59:59PM +0200, Mikko Perttunen wrote:
>>>> To avoid false lockdep warnings, give each client lock a different
>>>> lock class, passed from the initialization site by macro.
>>>>
>>>> Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
>>>> ---
>>>>   drivers/gpu/host1x/bus.c | 7 ++++---
>>>>   include/linux/host1x.h   | 9 ++++++++-
>>>>   2 files changed, 12 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/host1x/bus.c b/drivers/gpu/host1x/bus.c
>>>> index 347fb962b6c9..8fc79e9cb652 100644
>>>> --- a/drivers/gpu/host1x/bus.c
>>>> +++ b/drivers/gpu/host1x/bus.c
>>>> @@ -715,13 +715,14 @@ EXPORT_SYMBOL(host1x_driver_unregister);
>>>>    * device and call host1x_device_init(), which will in turn call
>>>> each client's
>>>>    * &host1x_client_ops.init implementation.
>>>>    */
>>>> -int host1x_client_register(struct host1x_client *client)
>>>> +int __host1x_client_register(struct host1x_client *client,
>>>> +               struct lock_class_key *key)
>>>
>>> I've seen the kbuild robot warn about this because the kerneldoc is now
>>> out of date.
>>>
>>>>   {
>>>>       struct host1x *host1x;
>>>>       int err;
>>>>         INIT_LIST_HEAD(&client->list);
>>>> -    mutex_init(&client->lock);
>>>> +    __mutex_init(&client->lock, "host1x client lock", key);
>>>
>>> Should we maybe attempt to make this unique? Could we use something like
>>> dev_name(client->dev) for this?
>>
>> I'm curious how the lockdep warning could be triggered at all, I don't
>> recall ever seeing it. Mikko, could you please clarify how to reproduce
>> the warning?
>>
> 
> This is pretty difficult to read but I guess it's some interaction
> related to the delayed initialization of host1x clients? In any case, I
> consistently get it at boot (though it may be triggered by vic probe
> instead of nvdec).
> 
> I'll fix the kbuild robot warnings and see if I can add a
> client-specific lock name for v6.

Thank you for the clarification! We now have a similar problem on
Tegra20: after fixing the coupling of the display controllers using
dc1_client->parent = dc0_client, I see the same warning when DC1 is
enabled.

[    3.808338] ============================================
[    3.808355] WARNING: possible recursive locking detected
[    3.808376] 5.12.0-rc3-next-20210319-00176-g60867e51e180 #7219 Tainted: G        W        
[    3.808406] --------------------------------------------
[    3.808421] kworker/1:2/108 is trying to acquire lock:
[    3.808449] c36b70a4 (&client->lock){+.+.}-{3:3}, at: host1x_client_resume+0x17/0x58
[    3.808586] 
               but task is already holding lock:
[    3.808603] c34df8a4 (&client->lock){+.+.}-{3:3}, at: host1x_client_resume+0x17/0x58
[    3.808712] 
               other info that might help us debug this:
[    3.808729]  Possible unsafe locking scenario:

[    3.808744]        CPU0
[    3.808757]        ----
[    3.808771]   lock(&client->lock);
[    3.808810]   lock(&client->lock);
[    3.808821] 
                *** DEADLOCK ***

[    3.808825]  May be due to missing lock nesting notation

[    3.808829] 15 locks held by kworker/1:2/108:
[    3.808836]  #0: c20068a8 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x15a/0x608
[    3.808878]  #1: c2bbbf18 (deferred_probe_work){+.+.}-{0:0}, at: process_one_work+0x15a/0x608
[    3.808912]  #2: c366d4d8 (&dev->mutex){....}-{3:3}, at: __device_attach+0x29/0xdc
[    3.808953]  #3: c141a980 (devices_lock){+.+.}-{3:3}, at: host1x_client_register+0x35/0xfc
[    3.808986]  #4: c34df64c (&host1x->devices_lock){+.+.}-{3:3}, at: host1x_client_register+0x51/0xfc
[    3.809017]  #5: c34ed4d8 (&dev->mutex){....}-{3:3}, at: __device_attach+0x29/0xdc
[    3.809050]  #6: c13faf5c (registration_lock){+.+.}-{3:3}, at: register_framebuffer+0x2d/0x274
[    3.809092]  #7: c132566c (console_lock){+.+.}-{0:0}, at: register_framebuffer+0x219/0x274
[    3.809124]  #8: c36e7848 (&fb_info->lock){+.+.}-{3:3}, at: register_framebuffer+0x19f/0x274
[    3.809157]  #9: c36d2d6c (&helper->lock){+.+.}-{3:3}, at: __drm_fb_helper_restore_fbdev_mode_unlocked+0x41/0x8c
[    3.809199]  #10: c36f00e8 (&dev->master_mutex){+.+.}-{3:3}, at: drm_master_internal_acquire+0x17/0x28
[    3.809233]  #11: c36d2c50 (&client->modeset_mutex){+.+.}-{3:3}, at: drm_client_modeset_commit_locked+0x1d/0x138
[    3.809272]  #12: c2bbba28 (crtc_ww_class_acquire){+.+.}-{0:0}, at: drm_client_modeset_commit_atomic+0x2f/0x1c4
[    3.809306]  #13: c36e6448 (crtc_ww_class_mutex){+.+.}-{3:3}, at: drm_modeset_backoff+0x63/0x190
[    3.809337]  #14: c34df8a4 (&client->lock){+.+.}-{3:3}, at: host1x_client_resume+0x17/0x58
[    3.809369] 
               stack backtrace:
[    3.809375] CPU: 1 PID: 108 Comm: kworker/1:2 Tainted: G        W         5.12.0-rc3-next-20210319-00176-g60867e51e180 #7219
[    3.809387] Hardware name: NVIDIA Tegra SoC (Flattened Device Tree)
[    3.809396] Workqueue: events deferred_probe_work_func
[    3.809417] [<c010d1ad>] (unwind_backtrace) from [<c010961d>] (show_stack+0x11/0x14)
[    3.809447] [<c010961d>] (show_stack) from [<c0b7d7c9>] (dump_stack+0x9f/0xb8)
[    3.809467] [<c0b7d7c9>] (dump_stack) from [<c0179eef>] (__lock_acquire+0x7fb/0x253c)
[    3.809495] [<c0179eef>] (__lock_acquire) from [<c017c403>] (lock_acquire+0xf3/0x420)
[    3.809516] [<c017c403>] (lock_acquire) from [<c0b87663>] (__mutex_lock+0x87/0x814)
[    3.809544] [<c0b87663>] (__mutex_lock) from [<c0b87e09>] (mutex_lock_nested+0x19/0x20)
[    3.809565] [<c0b87e09>] (mutex_lock_nested) from [<c05ccd2f>] (host1x_client_resume+0x17/0x58)
[    3.809587] [<c05ccd2f>] (host1x_client_resume) from [<c05ccd37>] (host1x_client_resume+0x1f/0x58)
[    3.809604] [<c05ccd37>] (host1x_client_resume) from [<c061d9a3>] (tegra_crtc_atomic_enable+0x33/0x21c4)
[    3.809634] [<c061d9a3>] (tegra_crtc_atomic_enable) from [<c05e0355>] (drm_atomic_helper_commit_modeset_enables+0x131/0x16c)
[    3.809667] [<c05e0355>] (drm_atomic_helper_commit_modeset_enables) from [<c05e0e89>] (drm_atomic_helper_commit_tail_rpm+0x1d/0x4c)
[    3.809691] [<c05e0e89>] (drm_atomic_helper_commit_tail_rpm) from [<c0610157>] (tegra_atomic_commit_tail+0x83/0x84)
[    3.809712] [<c0610157>] (tegra_atomic_commit_tail) from [<c05e1271>] (commit_tail+0x71/0x138)
[    3.809732] [<c05e1271>] (commit_tail) from [<c05e1b95>] (drm_atomic_helper_commit+0xf1/0x114)
[    3.809753] [<c05e1b95>] (drm_atomic_helper_commit) from [<c0607355>] (drm_client_modeset_commit_atomic+0x199/0x1c4)
[    3.809777] [<c0607355>] (drm_client_modeset_commit_atomic) from [<c0607401>] (drm_client_modeset_commit_locked+0x3d/0x138)
[    3.809798] [<c0607401>] (drm_client_modeset_commit_locked) from [<c0607517>] (drm_client_modeset_commit+0x1b/0x2c)
[    3.809818] [<c0607517>] (drm_client_modeset_commit) from [<c05e5c4f>] (__drm_fb_helper_restore_fbdev_mode_unlocked+0x73/0x8c)
[    3.809842] [<c05e5c4f>] (__drm_fb_helper_restore_fbdev_mode_unlocked) from [<c05e5cc9>] (drm_fb_helper_set_par+0x2d/0x4c)
[    3.809862] [<c05e5cc9>] (drm_fb_helper_set_par) from [<c056c763>] (fbcon_init+0x1cb/0x370)
[    3.809883] [<c056c763>] (fbcon_init) from [<c05af8c7>] (visual_init+0x8b/0xc8)
[    3.809902] [<c05af8c7>] (visual_init) from [<c05b07c5>] (do_bind_con_driver+0x13d/0x2b4)
[    3.809919] [<c05b07c5>] (do_bind_con_driver) from [<c05b0b93>] (do_take_over_console+0xdf/0x15c)
[    3.809937] [<c05b0b93>] (do_take_over_console) from [<c056b1df>] (do_fbcon_takeover+0x4f/0x90)
[    3.809955] [<c056b1df>] (do_fbcon_takeover) from [<c056545d>] (register_framebuffer+0x1a5/0x274)
[    3.809977] [<c056545d>] (register_framebuffer) from [<c05e57cf>] (__drm_fb_helper_initial_config_and_unlock+0x29f/0x438)
[    3.809999] [<c05e57cf>] (__drm_fb_helper_initial_config_and_unlock) from [<c06115e1>] (tegra_drm_fb_init+0x25/0x5c)
[    3.810022] [<c06115e1>] (tegra_drm_fb_init) from [<c060feff>] (host1x_drm_probe+0x247/0x404)
[    3.810041] [<c060feff>] (host1x_drm_probe) from [<c0647ad9>] (really_probe+0xb1/0x2a4)
[    3.810064] [<c0647ad9>] (really_probe) from [<c0647d0b>] (driver_probe_device+0x3f/0x78)
[    3.810086] [<c0647d0b>] (driver_probe_device) from [<c0646737>] (bus_for_each_drv+0x4f/0x78)
[    3.810107] [<c0646737>] (bus_for_each_drv) from [<c06479d5>] (__device_attach+0x95/0xdc)
[    3.810127] [<c06479d5>] (__device_attach) from [<c064702d>] (bus_probe_device+0x5d/0x64)
[    3.810147] [<c064702d>] (bus_probe_device) from [<c064581b>] (device_add+0x293/0x5c0)
[    3.810166] [<c064581b>] (device_add) from [<c05cd211>] (host1x_subdev_register+0x8d/0xac)
[    3.810186] [<c05cd211>] (host1x_subdev_register) from [<c05cd4d3>] (host1x_client_register+0x8f/0xfc)
[    3.810204] [<c05cd4d3>] (host1x_client_register) from [<c061870f>] (tegra_dc_probe+0x1bf/0x2b0)
[    3.810225] [<c061870f>] (tegra_dc_probe) from [<c064977b>] (platform_probe+0x43/0x80)
[    3.810247] [<c064977b>] (platform_probe) from [<c0647ad9>] (really_probe+0xb1/0x2a4)
[    3.810266] [<c0647ad9>] (really_probe) from [<c0647d0b>] (driver_probe_device+0x3f/0x78)
[    3.810286] [<c0647d0b>] (driver_probe_device) from [<c0646737>] (bus_for_each_drv+0x4f/0x78)
[    3.810307] [<c0646737>] (bus_for_each_drv) from [<c06479d5>] (__device_attach+0x95/0xdc)
[    3.810326] [<c06479d5>] (__device_attach) from [<c064702d>] (bus_probe_device+0x5d/0x64)
[    3.810346] [<c064702d>] (bus_probe_device) from [<c0647379>] (deferred_probe_work_func+0x4d/0x70)
[    3.810367] [<c0647379>] (deferred_probe_work_func) from [<c0139557>] (process_one_work+0x1eb/0x608)
[    3.810391] [<c0139557>] (process_one_work) from [<c0139a6d>] (worker_thread+0xf9/0x3bc)
[    3.810411] [<c0139a6d>] (worker_thread) from [<c013f3db>] (kthread+0xff/0x134)
[    3.810432] [<c013f3db>] (kthread) from [<c0100159>] (ret_from_fork+0x11/0x38)
[    3.810449] Exception stack(0xc2bbbfb0 to 0xc2bbbff8)


_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


* Re: [PATCH v5 02/21] gpu: host1x: Allow syncpoints without associated client
  2021-01-11 13:00   ` Mikko Perttunen
@ 2021-03-23 10:10     ` Thierry Reding
  -1 siblings, 0 replies; 195+ messages in thread
From: Thierry Reding @ 2021-03-23 10:10 UTC (permalink / raw)
  To: Mikko Perttunen
  Cc: jonathanh, digetx, airlied, daniel, linux-tegra, dri-devel,
	talho, bhuntsman

[-- Attachment #1: Type: text/plain, Size: 2088 bytes --]

On Mon, Jan 11, 2021 at 03:00:00PM +0200, Mikko Perttunen wrote:
> Syncpoints don't need to be associated with any client,
> so remove the property, and expose host1x_syncpt_alloc.
> This will allow allocating syncpoints without prior knowledge
> of the engine that they will be used with.
> 
> Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
> ---
> v3:
> * Clean up host1x_syncpt_alloc signature to allow specifying
>   a name for the syncpoint.
> * Export the function.
> ---
>  drivers/gpu/host1x/syncpt.c | 22 ++++++++++------------
>  drivers/gpu/host1x/syncpt.h |  1 -
>  include/linux/host1x.h      |  3 +++
>  3 files changed, 13 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/gpu/host1x/syncpt.c b/drivers/gpu/host1x/syncpt.c
> index fce7892d5137..5982fdf64e1c 100644
> --- a/drivers/gpu/host1x/syncpt.c
> +++ b/drivers/gpu/host1x/syncpt.c
> @@ -42,13 +42,13 @@ static void host1x_syncpt_base_free(struct host1x_syncpt_base *base)
>  		base->requested = false;
>  }
>  
> -static struct host1x_syncpt *host1x_syncpt_alloc(struct host1x *host,
> -						 struct host1x_client *client,
> -						 unsigned long flags)
> +struct host1x_syncpt *host1x_syncpt_alloc(struct host1x *host,
> +					  unsigned long flags,
> +					  const char *name)

If we expose it publicly, it's a good idea to add kerneldoc.

>  {
>  	struct host1x_syncpt *sp = host->syncpt;
> +	char *full_name;
>  	unsigned int i;
> -	char *name;
>  
>  	mutex_lock(&host->syncpt_mutex);
>  
> @@ -64,13 +64,11 @@ static struct host1x_syncpt *host1x_syncpt_alloc(struct host1x *host,
>  			goto unlock;
>  	}
>  
> -	name = kasprintf(GFP_KERNEL, "%02u-%s", sp->id,
> -			 client ? dev_name(client->dev) : NULL);
> -	if (!name)
> +	full_name = kasprintf(GFP_KERNEL, "%u-%s", sp->id, name);
> +	if (!full_name)

I know this just keeps with the status quo, but I wonder if we should
change this to be just "%u" if name == NULL to avoid a weird-looking
name. Or perhaps we want to enforce name != NULL by failing if that's
not the case?

Thierry

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* Re: [PATCH v5 03/21] gpu: host1x: Show number of pending waiters in debugfs
  2021-01-11 13:00   ` Mikko Perttunen
@ 2021-03-23 10:16     ` Thierry Reding
  -1 siblings, 0 replies; 195+ messages in thread
From: Thierry Reding @ 2021-03-23 10:16 UTC (permalink / raw)
  To: Mikko Perttunen
  Cc: jonathanh, digetx, airlied, daniel, linux-tegra, dri-devel,
	talho, bhuntsman

[-- Attachment #1: Type: text/plain, Size: 2048 bytes --]

On Mon, Jan 11, 2021 at 03:00:01PM +0200, Mikko Perttunen wrote:
> Show the number of pending waiters in the debugfs status file.
> This is useful for testing to verify that waiters do not leak
> or accumulate incorrectly.
> 
> Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
> ---
>  drivers/gpu/host1x/debug.c | 14 +++++++++++---
>  1 file changed, 11 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/host1x/debug.c b/drivers/gpu/host1x/debug.c
> index 1b4997bda1c7..8a14880c61bb 100644
> --- a/drivers/gpu/host1x/debug.c
> +++ b/drivers/gpu/host1x/debug.c
> @@ -69,6 +69,7 @@ static int show_channel(struct host1x_channel *ch, void *data, bool show_fifo)
>  
>  static void show_syncpts(struct host1x *m, struct output *o)
>  {
> +	struct list_head *pos;
>  	unsigned int i;
>  
>  	host1x_debug_output(o, "---- syncpts ----\n");
> @@ -76,12 +77,19 @@ static void show_syncpts(struct host1x *m, struct output *o)
>  	for (i = 0; i < host1x_syncpt_nb_pts(m); i++) {
>  		u32 max = host1x_syncpt_read_max(m->syncpt + i);
>  		u32 min = host1x_syncpt_load(m->syncpt + i);
> +		unsigned int waiters = 0;
>  
> -		if (!min && !max)
> +		spin_lock(&m->syncpt[i].intr.lock);
> +		list_for_each(pos, &m->syncpt[i].intr.wait_head)
> +			waiters++;
> +		spin_unlock(&m->syncpt[i].intr.lock);

Would it make sense to keep a running count so that we don't have to
compute it here?

> +
> +		if (!min && !max && !waiters)
>  			continue;
>  
> -		host1x_debug_output(o, "id %u (%s) min %d max %d\n",
> -				    i, m->syncpt[i].name, min, max);
> +		host1x_debug_output(o,
> +				    "id %u (%s) min %d max %d (%d waiters)\n",
> +				    i, m->syncpt[i].name, min, max, waiters);

Or alternatively, would it be useful to collect a bit more information
about waiters so that when they leak we get a better understanding of
which ones leak?

It doesn't look like we currently have much information in struct
host1x_waitlist to identify waiters, but perhaps that can be extended?

Thierry

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]


* Re: [PATCH v5 01/21] gpu: host1x: Use different lock classes for each client
  2021-03-22 16:01           ` Dmitry Osipenko
@ 2021-03-23 10:20             ` Thierry Reding
  -1 siblings, 0 replies; 195+ messages in thread
From: Thierry Reding @ 2021-03-23 10:20 UTC (permalink / raw)
  To: Dmitry Osipenko
  Cc: Mikko Perttunen, jonathanh, airlied, daniel, linux-tegra,
	dri-devel, talho, bhuntsman

[-- Attachment #1: Type: text/plain, Size: 10688 bytes --]

On Mon, Mar 22, 2021 at 07:01:34PM +0300, Dmitry Osipenko wrote:
> 22.03.2021 18:19, Mikko Perttunen wrote:
> > On 22.3.2021 16.48, Dmitry Osipenko wrote:
> >> 22.03.2021 17:46, Thierry Reding wrote:
> >>> On Mon, Jan 11, 2021 at 02:59:59PM +0200, Mikko Perttunen wrote:
> >>>> To avoid false lockdep warnings, give each client lock a different
> >>>> lock class, passed from the initialization site by macro.
> >>>>
> >>>> Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
> >>>> ---
> >>>>   drivers/gpu/host1x/bus.c | 7 ++++---
> >>>>   include/linux/host1x.h   | 9 ++++++++-
> >>>>   2 files changed, 12 insertions(+), 4 deletions(-)
> >>>>
> >>>> diff --git a/drivers/gpu/host1x/bus.c b/drivers/gpu/host1x/bus.c
> >>>> index 347fb962b6c9..8fc79e9cb652 100644
> >>>> --- a/drivers/gpu/host1x/bus.c
> >>>> +++ b/drivers/gpu/host1x/bus.c
> >>>> @@ -715,13 +715,14 @@ EXPORT_SYMBOL(host1x_driver_unregister);
> >>>>    * device and call host1x_device_init(), which will in turn call
> >>>> each client's
> >>>>    * &host1x_client_ops.init implementation.
> >>>>    */
> >>>> -int host1x_client_register(struct host1x_client *client)
> >>>> +int __host1x_client_register(struct host1x_client *client,
> >>>> +               struct lock_class_key *key)
> >>>
> >>> I've seen the kbuild robot warn about this because the kerneldoc is now
> >>> out of date.
> >>>
> >>>>   {
> >>>>       struct host1x *host1x;
> >>>>       int err;
> >>>>         INIT_LIST_HEAD(&client->list);
> >>>> -    mutex_init(&client->lock);
> >>>> +    __mutex_init(&client->lock, "host1x client lock", key);
> >>>
> >>> Should we maybe attempt to make this unique? Could we use something like
> >>> dev_name(client->dev) for this?
> >>
> >> I'm curious how the lockdep warning could be triggered at all, I don't
> >> recall ever seeing it. Mikko, could you please clarify how to reproduce
> >> the warning?
> >>
> > 
> > This is pretty difficult to read but I guess it's some interaction
> > related to the delayed initialization of host1x clients? In any case, I
> > consistently get it at boot (though it may be triggered by vic probe
> > instead of nvdec).
> > 
> > I'll fix the kbuild robot warnings and see if I can add a
> > client-specific lock name for v6.
> 
> Thank you for the clarification! We now have a similar problem on
> Tegra20: after fixing the coupling of the display controllers using
> dc1_client->parent = dc0_client, I see the same warning when DC1 is
> enabled.
> 
> [    3.808338] ============================================
> [    3.808355] WARNING: possible recursive locking detected
> [    3.808376] 5.12.0-rc3-next-20210319-00176-g60867e51e180 #7219 Tainted: G        W        
> [    3.808406] --------------------------------------------
> [    3.808421] kworker/1:2/108 is trying to acquire lock:
> [    3.808449] c36b70a4 (&client->lock){+.+.}-{3:3}, at: host1x_client_resume+0x17/0x58
> [    3.808586] 
>                but task is already holding lock:
> [    3.808603] c34df8a4 (&client->lock){+.+.}-{3:3}, at: host1x_client_resume+0x17/0x58
> [    3.808712] 
>                other info that might help us debug this:
> [    3.808729]  Possible unsafe locking scenario:
> 
> [    3.808744]        CPU0
> [    3.808757]        ----
> [    3.808771]   lock(&client->lock);
> [    3.808810]   lock(&client->lock);
> [    3.808821] 
>                 *** DEADLOCK ***
> 
> [    3.808825]  May be due to missing lock nesting notation
> 
> [    3.808829] 15 locks held by kworker/1:2/108:
> [    3.808836]  #0: c20068a8 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x15a/0x608
> [    3.808878]  #1: c2bbbf18 (deferred_probe_work){+.+.}-{0:0}, at: process_one_work+0x15a/0x608
> [    3.808912]  #2: c366d4d8 (&dev->mutex){....}-{3:3}, at: __device_attach+0x29/0xdc
> [    3.808953]  #3: c141a980 (devices_lock){+.+.}-{3:3}, at: host1x_client_register+0x35/0xfc
> [    3.808986]  #4: c34df64c (&host1x->devices_lock){+.+.}-{3:3}, at: host1x_client_register+0x51/0xfc
> [    3.809017]  #5: c34ed4d8 (&dev->mutex){....}-{3:3}, at: __device_attach+0x29/0xdc
> [    3.809050]  #6: c13faf5c (registration_lock){+.+.}-{3:3}, at: register_framebuffer+0x2d/0x274
> [    3.809092]  #7: c132566c (console_lock){+.+.}-{0:0}, at: register_framebuffer+0x219/0x274
> [    3.809124]  #8: c36e7848 (&fb_info->lock){+.+.}-{3:3}, at: register_framebuffer+0x19f/0x274
> [    3.809157]  #9: c36d2d6c (&helper->lock){+.+.}-{3:3}, at: __drm_fb_helper_restore_fbdev_mode_unlocked+0x41/0x8c
> [    3.809199]  #10: c36f00e8 (&dev->master_mutex){+.+.}-{3:3}, at: drm_master_internal_acquire+0x17/0x28
> [    3.809233]  #11: c36d2c50 (&client->modeset_mutex){+.+.}-{3:3}, at: drm_client_modeset_commit_locked+0x1d/0x138
> [    3.809272]  #12: c2bbba28 (crtc_ww_class_acquire){+.+.}-{0:0}, at: drm_client_modeset_commit_atomic+0x2f/0x1c4
> [    3.809306]  #13: c36e6448 (crtc_ww_class_mutex){+.+.}-{3:3}, at: drm_modeset_backoff+0x63/0x190
> [    3.809337]  #14: c34df8a4 (&client->lock){+.+.}-{3:3}, at: host1x_client_resume+0x17/0x58
> [    3.809369] 
>                stack backtrace:
> [    3.809375] CPU: 1 PID: 108 Comm: kworker/1:2 Tainted: G        W         5.12.0-rc3-next-20210319-00176-g60867e51e180 #7219
> [    3.809387] Hardware name: NVIDIA Tegra SoC (Flattened Device Tree)
> [    3.809396] Workqueue: events deferred_probe_work_func
> [    3.809417] [<c010d1ad>] (unwind_backtrace) from [<c010961d>] (show_stack+0x11/0x14)
> [    3.809447] [<c010961d>] (show_stack) from [<c0b7d7c9>] (dump_stack+0x9f/0xb8)
> [    3.809467] [<c0b7d7c9>] (dump_stack) from [<c0179eef>] (__lock_acquire+0x7fb/0x253c)
> [    3.809495] [<c0179eef>] (__lock_acquire) from [<c017c403>] (lock_acquire+0xf3/0x420)
> [    3.809516] [<c017c403>] (lock_acquire) from [<c0b87663>] (__mutex_lock+0x87/0x814)
> [    3.809544] [<c0b87663>] (__mutex_lock) from [<c0b87e09>] (mutex_lock_nested+0x19/0x20)
> [    3.809565] [<c0b87e09>] (mutex_lock_nested) from [<c05ccd2f>] (host1x_client_resume+0x17/0x58)
> [    3.809587] [<c05ccd2f>] (host1x_client_resume) from [<c05ccd37>] (host1x_client_resume+0x1f/0x58)
> [    3.809604] [<c05ccd37>] (host1x_client_resume) from [<c061d9a3>] (tegra_crtc_atomic_enable+0x33/0x21c4)
> [    3.809634] [<c061d9a3>] (tegra_crtc_atomic_enable) from [<c05e0355>] (drm_atomic_helper_commit_modeset_enables+0x131/0x16c)
> [    3.809667] [<c05e0355>] (drm_atomic_helper_commit_modeset_enables) from [<c05e0e89>] (drm_atomic_helper_commit_tail_rpm+0x1d/0x4c)
> [    3.809691] [<c05e0e89>] (drm_atomic_helper_commit_tail_rpm) from [<c0610157>] (tegra_atomic_commit_tail+0x83/0x84)
> [    3.809712] [<c0610157>] (tegra_atomic_commit_tail) from [<c05e1271>] (commit_tail+0x71/0x138)
> [    3.809732] [<c05e1271>] (commit_tail) from [<c05e1b95>] (drm_atomic_helper_commit+0xf1/0x114)
> [    3.809753] [<c05e1b95>] (drm_atomic_helper_commit) from [<c0607355>] (drm_client_modeset_commit_atomic+0x199/0x1c4)
> [    3.809777] [<c0607355>] (drm_client_modeset_commit_atomic) from [<c0607401>] (drm_client_modeset_commit_locked+0x3d/0x138)
> [    3.809798] [<c0607401>] (drm_client_modeset_commit_locked) from [<c0607517>] (drm_client_modeset_commit+0x1b/0x2c)
> [    3.809818] [<c0607517>] (drm_client_modeset_commit) from [<c05e5c4f>] (__drm_fb_helper_restore_fbdev_mode_unlocked+0x73/0x8c)
> [    3.809842] [<c05e5c4f>] (__drm_fb_helper_restore_fbdev_mode_unlocked) from [<c05e5cc9>] (drm_fb_helper_set_par+0x2d/0x4c)
> [    3.809862] [<c05e5cc9>] (drm_fb_helper_set_par) from [<c056c763>] (fbcon_init+0x1cb/0x370)
> [    3.809883] [<c056c763>] (fbcon_init) from [<c05af8c7>] (visual_init+0x8b/0xc8)
> [    3.809902] [<c05af8c7>] (visual_init) from [<c05b07c5>] (do_bind_con_driver+0x13d/0x2b4)
> [    3.809919] [<c05b07c5>] (do_bind_con_driver) from [<c05b0b93>] (do_take_over_console+0xdf/0x15c)
> [    3.809937] [<c05b0b93>] (do_take_over_console) from [<c056b1df>] (do_fbcon_takeover+0x4f/0x90)
> [    3.809955] [<c056b1df>] (do_fbcon_takeover) from [<c056545d>] (register_framebuffer+0x1a5/0x274)
> [    3.809977] [<c056545d>] (register_framebuffer) from [<c05e57cf>] (__drm_fb_helper_initial_config_and_unlock+0x29f/0x438)
> [    3.809999] [<c05e57cf>] (__drm_fb_helper_initial_config_and_unlock) from [<c06115e1>] (tegra_drm_fb_init+0x25/0x5c)
> [    3.810022] [<c06115e1>] (tegra_drm_fb_init) from [<c060feff>] (host1x_drm_probe+0x247/0x404)
> [    3.810041] [<c060feff>] (host1x_drm_probe) from [<c0647ad9>] (really_probe+0xb1/0x2a4)
> [    3.810064] [<c0647ad9>] (really_probe) from [<c0647d0b>] (driver_probe_device+0x3f/0x78)
> [    3.810086] [<c0647d0b>] (driver_probe_device) from [<c0646737>] (bus_for_each_drv+0x4f/0x78)
> [    3.810107] [<c0646737>] (bus_for_each_drv) from [<c06479d5>] (__device_attach+0x95/0xdc)
> [    3.810127] [<c06479d5>] (__device_attach) from [<c064702d>] (bus_probe_device+0x5d/0x64)
> [    3.810147] [<c064702d>] (bus_probe_device) from [<c064581b>] (device_add+0x293/0x5c0)
> [    3.810166] [<c064581b>] (device_add) from [<c05cd211>] (host1x_subdev_register+0x8d/0xac)
> [    3.810186] [<c05cd211>] (host1x_subdev_register) from [<c05cd4d3>] (host1x_client_register+0x8f/0xfc)
> [    3.810204] [<c05cd4d3>] (host1x_client_register) from [<c061870f>] (tegra_dc_probe+0x1bf/0x2b0)
> [    3.810225] [<c061870f>] (tegra_dc_probe) from [<c064977b>] (platform_probe+0x43/0x80)
> [    3.810247] [<c064977b>] (platform_probe) from [<c0647ad9>] (really_probe+0xb1/0x2a4)
> [    3.810266] [<c0647ad9>] (really_probe) from [<c0647d0b>] (driver_probe_device+0x3f/0x78)
> [    3.810286] [<c0647d0b>] (driver_probe_device) from [<c0646737>] (bus_for_each_drv+0x4f/0x78)
> [    3.810307] [<c0646737>] (bus_for_each_drv) from [<c06479d5>] (__device_attach+0x95/0xdc)
> [    3.810326] [<c06479d5>] (__device_attach) from [<c064702d>] (bus_probe_device+0x5d/0x64)
> [    3.810346] [<c064702d>] (bus_probe_device) from [<c0647379>] (deferred_probe_work_func+0x4d/0x70)
> [    3.810367] [<c0647379>] (deferred_probe_work_func) from [<c0139557>] (process_one_work+0x1eb/0x608)
> [    3.810391] [<c0139557>] (process_one_work) from [<c0139a6d>] (worker_thread+0xf9/0x3bc)
> [    3.810411] [<c0139a6d>] (worker_thread) from [<c013f3db>] (kthread+0xff/0x134)
> [    3.810432] [<c013f3db>] (kthread) from [<c0100159>] (ret_from_fork+0x11/0x38)
> [    3.810449] Exception stack(0xc2bbbfb0 to 0xc2bbbff8)

Sounds like we should decouple this from the series and fast-track this
for v5.13, or perhaps even v5.12 along with the DC coupling fix?

Thierry

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 195+ messages in thread

* Re: [PATCH v5 04/21] gpu: host1x: Remove cancelled waiters immediately
  2021-01-12 22:20       ` Mikko Perttunen
@ 2021-03-23 10:23         ` Thierry Reding
  -1 siblings, 0 replies; 195+ messages in thread
From: Thierry Reding @ 2021-03-23 10:23 UTC (permalink / raw)
  To: Mikko Perttunen
  Cc: Dmitry Osipenko, Mikko Perttunen, jonathanh, airlied, daniel,
	linux-tegra, talho, bhuntsman, dri-devel

[-- Attachment #1: Type: text/plain, Size: 2100 bytes --]

On Wed, Jan 13, 2021 at 12:20:38AM +0200, Mikko Perttunen wrote:
> On 1/13/21 12:07 AM, Dmitry Osipenko wrote:
> > 11.01.2021 16:00, Mikko Perttunen wrote:
> > > -void host1x_intr_put_ref(struct host1x *host, unsigned int id, void *ref)
> > > +void host1x_intr_put_ref(struct host1x *host, unsigned int id, void *ref,
> > > +			 bool flush)
> > >   {
> > >   	struct host1x_waitlist *waiter = ref;
> > >   	struct host1x_syncpt *syncpt;
> > > -	while (atomic_cmpxchg(&waiter->state, WLS_PENDING, WLS_CANCELLED) ==
> > > -	       WLS_REMOVED)
> > > -		schedule();
> > > +	atomic_cmpxchg(&waiter->state, WLS_PENDING, WLS_CANCELLED);
> > >   	syncpt = host->syncpt + id;
> > > -	(void)process_wait_list(host, syncpt,
> > > -				host1x_syncpt_load(host->syncpt + id));
> > > +
> > > +	spin_lock(&syncpt->intr.lock);
> > > +	if (atomic_cmpxchg(&waiter->state, WLS_CANCELLED, WLS_HANDLED) ==
> > > +	    WLS_CANCELLED) {
> > > +		list_del(&waiter->list);
> > > +		kref_put(&waiter->refcount, waiter_release);
> > > +	}
> > > +	spin_unlock(&syncpt->intr.lock);
> > > +
> > > +	if (flush) {
> > > +		/* Wait until any concurrently executing handler has finished. */
> > > +		while (atomic_read(&waiter->state) != WLS_HANDLED)
> > > +			cpu_relax();
> > > +	}
> > 
> > A busy-loop shouldn't be used in kernel unless there is a very good
> > reason. The wait_event() should be used instead.
> > 
> > But please don't hurry to update this patch, we may need or want to
> > retire the host1x-waiter and then all these waiter-related patches won't
> > be needed.
> > 
> 
> Yes, we should improve the intr code to remove all this complexity. But
> let's merge this first to get a functional baseline and do larger design
> changes in follow-up patches.

I agree, there's no reason to hold back any interim improvements. But I
do agree with Dmitry's argument about the busy loop. Prior to this, the
code used schedule() to let the CPU run other code while waiting for the
waiter to change state. Is there any reason why we can't keep schedule()
here?

Thierry

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 195+ messages in thread

* Re: [PATCH v5 05/21] gpu: host1x: Use HW-equivalent syncpoint expiration check
  2021-01-11 13:00   ` Mikko Perttunen
@ 2021-03-23 10:26     ` Thierry Reding
  -1 siblings, 0 replies; 195+ messages in thread
From: Thierry Reding @ 2021-03-23 10:26 UTC (permalink / raw)
  To: Mikko Perttunen
  Cc: jonathanh, digetx, airlied, daniel, linux-tegra, dri-devel,
	talho, bhuntsman

[-- Attachment #1: Type: text/plain, Size: 3190 bytes --]

On Mon, Jan 11, 2021 at 03:00:03PM +0200, Mikko Perttunen wrote:
> Make syncpoint expiration checks always use the same logic used by
> the hardware. This ensures that there are no race conditions that
> could occur because of the hardware triggering a syncpoint interrupt
> and then the driver disagreeing.
> 
> One situation where this could occur is if a job incremented a
> syncpoint too many times -- then the hardware would trigger an
> interrupt, but the driver would assume that a syncpoint value
> greater than the syncpoint's max value is in the future, and not
> clean up the job.
> 
> Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
> ---
>  drivers/gpu/host1x/syncpt.c | 51 ++-----------------------------------
>  1 file changed, 2 insertions(+), 49 deletions(-)
> 
> diff --git a/drivers/gpu/host1x/syncpt.c b/drivers/gpu/host1x/syncpt.c
> index e48b4595cf53..9ccdf7709946 100644
> --- a/drivers/gpu/host1x/syncpt.c
> +++ b/drivers/gpu/host1x/syncpt.c
> @@ -306,59 +306,12 @@ EXPORT_SYMBOL(host1x_syncpt_wait);
>  bool host1x_syncpt_is_expired(struct host1x_syncpt *sp, u32 thresh)
>  {
>  	u32 current_val;
> -	u32 future_val;
>  
>  	smp_rmb();
>  
>  	current_val = (u32)atomic_read(&sp->min_val);
> -	future_val = (u32)atomic_read(&sp->max_val);
> -
> -	/* Note the use of unsigned arithmetic here (mod 1<<32).
> -	 *
> -	 * c = current_val = min_val	= the current value of the syncpoint.
> -	 * t = thresh			= the value we are checking
> -	 * f = future_val  = max_val	= the value c will reach when all
> -	 *				  outstanding increments have completed.
> -	 *
> -	 * Note that c always chases f until it reaches f.
> -	 *
> -	 * Dtf = (f - t)
> -	 * Dtc = (c - t)
> -	 *
> -	 *  Consider all cases:
> -	 *
> -	 *	A) .....c..t..f.....	Dtf < Dtc	need to wait
> -	 *	B) .....c.....f..t..	Dtf > Dtc	expired
> -	 *	C) ..t..c.....f.....	Dtf > Dtc	expired	   (Dct very large)
> -	 *
> -	 *  Any case where f==c: always expired (for any t).	Dtf == Dcf
> -	 *  Any case where t==c: always expired (for any f).	Dtf >= Dtc (because Dtc==0)
> -	 *  Any case where t==f!=c: always wait.		Dtf <  Dtc (because Dtf==0,
> -	 *							Dtc!=0)
> -	 *
> -	 *  Other cases:
> -	 *
> -	 *	A) .....t..f..c.....	Dtf < Dtc	need to wait
> -	 *	A) .....f..c..t.....	Dtf < Dtc	need to wait
> -	 *	A) .....f..t..c.....	Dtf > Dtc	expired
> -	 *
> -	 *   So:
> -	 *	   Dtf >= Dtc implies EXPIRED	(return true)
> -	 *	   Dtf <  Dtc implies WAIT	(return false)
> -	 *
> -	 * Note: If t is expired then we *cannot* wait on it. We would wait
> -	 * forever (hang the system).
> -	 *
> -	 * Note: do NOT get clever and remove the -thresh from both sides. It
> -	 * is NOT the same.
> -	 *
> -	 * If future valueis zero, we have a client managed sync point. In that
> -	 * case we do a direct comparison.
> -	 */
> -	if (!host1x_syncpt_client_managed(sp))
> -		return future_val - thresh >= current_val - thresh;
> -	else
> -		return (s32)(current_val - thresh) >= 0;
> +
> +	return ((current_val - thresh) & 0x80000000U) == 0U;

Heh... now I finally understand what this is supposed to do. =)

Nice one.

Thierry

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 195+ messages in thread

* Re: [PATCH v5 02/21] gpu: host1x: Allow syncpoints without associated client
  2021-03-23 10:10     ` Thierry Reding
@ 2021-03-23 10:32       ` Mikko Perttunen
  -1 siblings, 0 replies; 195+ messages in thread
From: Mikko Perttunen @ 2021-03-23 10:32 UTC (permalink / raw)
  To: Thierry Reding, Mikko Perttunen
  Cc: jonathanh, digetx, airlied, daniel, linux-tegra, dri-devel,
	talho, bhuntsman

On 3/23/21 12:10 PM, Thierry Reding wrote:
> On Mon, Jan 11, 2021 at 03:00:00PM +0200, Mikko Perttunen wrote:
>> Syncpoints don't need to be associated with any client,
>> so remove the property, and expose host1x_syncpt_alloc.
>> This will allow allocating syncpoints without prior knowledge
>> of the engine that it will be used with.
>>
>> Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
>> ---
>> v3:
>> * Clean up host1x_syncpt_alloc signature to allow specifying
>>    a name for the syncpoint.
>> * Export the function.
>> ---
>>   drivers/gpu/host1x/syncpt.c | 22 ++++++++++------------
>>   drivers/gpu/host1x/syncpt.h |  1 -
>>   include/linux/host1x.h      |  3 +++
>>   3 files changed, 13 insertions(+), 13 deletions(-)
>>
>> diff --git a/drivers/gpu/host1x/syncpt.c b/drivers/gpu/host1x/syncpt.c
>> index fce7892d5137..5982fdf64e1c 100644
>> --- a/drivers/gpu/host1x/syncpt.c
>> +++ b/drivers/gpu/host1x/syncpt.c
>> @@ -42,13 +42,13 @@ static void host1x_syncpt_base_free(struct host1x_syncpt_base *base)
>>   		base->requested = false;
>>   }
>>   
>> -static struct host1x_syncpt *host1x_syncpt_alloc(struct host1x *host,
>> -						 struct host1x_client *client,
>> -						 unsigned long flags)
>> +struct host1x_syncpt *host1x_syncpt_alloc(struct host1x *host,
>> +					  unsigned long flags,
>> +					  const char *name)
> 
> If we expose it publicly, it's a good idea to add kerneldoc.

Will fix.

> 
>>   {
>>   	struct host1x_syncpt *sp = host->syncpt;
>> +	char *full_name;
>>   	unsigned int i;
>> -	char *name;
>>   
>>   	mutex_lock(&host->syncpt_mutex);
>>   
>> @@ -64,13 +64,11 @@ static struct host1x_syncpt *host1x_syncpt_alloc(struct host1x *host,
>>   			goto unlock;
>>   	}
>>   
>> -	name = kasprintf(GFP_KERNEL, "%02u-%s", sp->id,
>> -			 client ? dev_name(client->dev) : NULL);
>> -	if (!name)
>> +	full_name = kasprintf(GFP_KERNEL, "%u-%s", sp->id, name);
>> +	if (!full_name)
> 
> I know this just keeps with the status quo, but I wonder if we should
> change this to be just "%u" if name == NULL to avoid a weird-looking
> name. Or perhaps we want to enforce name != NULL by failing if that's
> not the case?

I'll see about making the name mandatory.

> 
> Thierry
> 

Mikko

^ permalink raw reply	[flat|nested] 195+ messages in thread

* Re: [PATCH v5 06/21] gpu: host1x: Cleanup and refcounting for syncpoints
  2021-01-11 13:00   ` Mikko Perttunen
@ 2021-03-23 10:36     ` Thierry Reding
  -1 siblings, 0 replies; 195+ messages in thread
From: Thierry Reding @ 2021-03-23 10:36 UTC (permalink / raw)
  To: Mikko Perttunen
  Cc: jonathanh, digetx, airlied, daniel, linux-tegra, talho,
	bhuntsman, dri-devel

On Mon, Jan 11, 2021 at 03:00:04PM +0200, Mikko Perttunen wrote:
> Add reference counting for allocated syncpoints to allow keeping
> them allocated while jobs are referencing them. Additionally,
> clean up various places using syncpoint IDs to use host1x_syncpt
> pointers instead.
> 
> Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
> ---
> v5:
> - Remove host1x_syncpt_put in submit code, as job_put already
>   puts the syncpoint.
> - Changes due to rebase in VI driver.
> v4:
> - Update from _free to _put in VI driver as well
> ---
>  drivers/gpu/drm/tegra/dc.c             |  4 +-
>  drivers/gpu/drm/tegra/drm.c            | 14 ++---
>  drivers/gpu/drm/tegra/gr2d.c           |  4 +-
>  drivers/gpu/drm/tegra/gr3d.c           |  4 +-
>  drivers/gpu/drm/tegra/vic.c            |  4 +-
>  drivers/gpu/host1x/cdma.c              | 11 ++--
>  drivers/gpu/host1x/dev.h               |  7 ++-
>  drivers/gpu/host1x/hw/cdma_hw.c        |  2 +-
>  drivers/gpu/host1x/hw/channel_hw.c     | 10 ++--
>  drivers/gpu/host1x/hw/debug_hw.c       |  2 +-
>  drivers/gpu/host1x/job.c               |  5 +-
>  drivers/gpu/host1x/syncpt.c            | 75 +++++++++++++++++++-------
>  drivers/gpu/host1x/syncpt.h            |  3 ++
>  drivers/staging/media/tegra-video/vi.c |  4 +-
>  include/linux/host1x.h                 |  8 +--
>  15 files changed, 98 insertions(+), 59 deletions(-)
> 
> diff --git a/drivers/gpu/drm/tegra/dc.c b/drivers/gpu/drm/tegra/dc.c
> index 85dd7131553a..033032dfc4b9 100644
> --- a/drivers/gpu/drm/tegra/dc.c
> +++ b/drivers/gpu/drm/tegra/dc.c
> @@ -2129,7 +2129,7 @@ static int tegra_dc_init(struct host1x_client *client)
>  		drm_plane_cleanup(primary);
>  
>  	host1x_client_iommu_detach(client);
> -	host1x_syncpt_free(dc->syncpt);
> +	host1x_syncpt_put(dc->syncpt);
>  
>  	return err;
>  }
> @@ -2154,7 +2154,7 @@ static int tegra_dc_exit(struct host1x_client *client)
>  	}
>  
>  	host1x_client_iommu_detach(client);
> -	host1x_syncpt_free(dc->syncpt);
> +	host1x_syncpt_put(dc->syncpt);
>  
>  	return 0;
>  }
> diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
> index e45c8414e2a3..5a6037eff37f 100644
> --- a/drivers/gpu/drm/tegra/drm.c
> +++ b/drivers/gpu/drm/tegra/drm.c
> @@ -171,7 +171,7 @@ int tegra_drm_submit(struct tegra_drm_context *context,
>  	struct drm_tegra_syncpt syncpt;
>  	struct host1x *host1x = dev_get_drvdata(drm->dev->parent);
>  	struct drm_gem_object **refs;
> -	struct host1x_syncpt *sp;
> +	struct host1x_syncpt *sp = NULL;
>  	struct host1x_job *job;
>  	unsigned int num_refs;
>  	int err;
> @@ -298,8 +298,8 @@ int tegra_drm_submit(struct tegra_drm_context *context,
>  		goto fail;
>  	}
>  
> -	/* check whether syncpoint ID is valid */
> -	sp = host1x_syncpt_get(host1x, syncpt.id);
> +	/* Syncpoint ref will be dropped on job release. */
> +	sp = host1x_syncpt_get_by_id(host1x, syncpt.id);

It's a bit odd to replace the comment like that. Perhaps instead of
replacing it, just extend it with the note about the lifetime?

>  	if (!sp) {
>  		err = -ENOENT;
>  		goto fail;
> @@ -308,7 +308,7 @@ int tegra_drm_submit(struct tegra_drm_context *context,
>  	job->is_addr_reg = context->client->ops->is_addr_reg;
>  	job->is_valid_class = context->client->ops->is_valid_class;
>  	job->syncpt_incrs = syncpt.incrs;
> -	job->syncpt_id = syncpt.id;
> +	job->syncpt = sp;
>  	job->timeout = 10000;
>  
>  	if (args->timeout && args->timeout < 10000)
> @@ -380,7 +380,7 @@ static int tegra_syncpt_read(struct drm_device *drm, void *data,
>  	struct drm_tegra_syncpt_read *args = data;
>  	struct host1x_syncpt *sp;
>  
> -	sp = host1x_syncpt_get(host, args->id);
> +	sp = host1x_syncpt_get_by_id_noref(host, args->id);

Why don't we need a reference here? It's perhaps unlikely, because this
function is short-lived, but the otherwise last reference to this could
go away at any point after this line and cause sp to become invalid.

In general it's very rare not to need a reference to a reference-counted
object.

>  	if (!sp)
>  		return -EINVAL;
>  
> @@ -395,7 +395,7 @@ static int tegra_syncpt_incr(struct drm_device *drm, void *data,
>  	struct drm_tegra_syncpt_incr *args = data;
>  	struct host1x_syncpt *sp;
>  
> -	sp = host1x_syncpt_get(host1x, args->id);
> +	sp = host1x_syncpt_get_by_id_noref(host1x, args->id);

Same here. Or am I missing some other way by which it is ensured that
the reference stays around?

Generally I like this because it makes the handling of syncpoints much
more consistent.

Thierry


* Re: [PATCH v5 06/21] gpu: host1x: Cleanup and refcounting for syncpoints
  2021-03-23 10:36     ` Thierry Reding
@ 2021-03-23 10:44       ` Mikko Perttunen
  -1 siblings, 0 replies; 195+ messages in thread
From: Mikko Perttunen @ 2021-03-23 10:44 UTC (permalink / raw)
  To: Thierry Reding, Mikko Perttunen
  Cc: jonathanh, digetx, airlied, daniel, linux-tegra, talho,
	bhuntsman, dri-devel

On 3/23/21 12:36 PM, Thierry Reding wrote:
> On Mon, Jan 11, 2021 at 03:00:04PM +0200, Mikko Perttunen wrote:
>> Add reference counting for allocated syncpoints to allow keeping
>> them allocated while jobs are referencing them. Additionally,
>> clean up various places using syncpoint IDs to use host1x_syncpt
>> pointers instead.
>>
>> Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
>> ---
>> v5:
>> - Remove host1x_syncpt_put in submit code, as job_put already
>>    puts the syncpoint.
>> - Changes due to rebase in VI driver.
>> v4:
>> - Update from _free to _put in VI driver as well
>> ---
>>   drivers/gpu/drm/tegra/dc.c             |  4 +-
>>   drivers/gpu/drm/tegra/drm.c            | 14 ++---
>>   drivers/gpu/drm/tegra/gr2d.c           |  4 +-
>>   drivers/gpu/drm/tegra/gr3d.c           |  4 +-
>>   drivers/gpu/drm/tegra/vic.c            |  4 +-
>>   drivers/gpu/host1x/cdma.c              | 11 ++--
>>   drivers/gpu/host1x/dev.h               |  7 ++-
>>   drivers/gpu/host1x/hw/cdma_hw.c        |  2 +-
>>   drivers/gpu/host1x/hw/channel_hw.c     | 10 ++--
>>   drivers/gpu/host1x/hw/debug_hw.c       |  2 +-
>>   drivers/gpu/host1x/job.c               |  5 +-
>>   drivers/gpu/host1x/syncpt.c            | 75 +++++++++++++++++++-------
>>   drivers/gpu/host1x/syncpt.h            |  3 ++
>>   drivers/staging/media/tegra-video/vi.c |  4 +-
>>   include/linux/host1x.h                 |  8 +--
>>   15 files changed, 98 insertions(+), 59 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/tegra/dc.c b/drivers/gpu/drm/tegra/dc.c
>> index 85dd7131553a..033032dfc4b9 100644
>> --- a/drivers/gpu/drm/tegra/dc.c
>> +++ b/drivers/gpu/drm/tegra/dc.c
>> @@ -2129,7 +2129,7 @@ static int tegra_dc_init(struct host1x_client *client)
>>   		drm_plane_cleanup(primary);
>>   
>>   	host1x_client_iommu_detach(client);
>> -	host1x_syncpt_free(dc->syncpt);
>> +	host1x_syncpt_put(dc->syncpt);
>>   
>>   	return err;
>>   }
>> @@ -2154,7 +2154,7 @@ static int tegra_dc_exit(struct host1x_client *client)
>>   	}
>>   
>>   	host1x_client_iommu_detach(client);
>> -	host1x_syncpt_free(dc->syncpt);
>> +	host1x_syncpt_put(dc->syncpt);
>>   
>>   	return 0;
>>   }
>> diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
>> index e45c8414e2a3..5a6037eff37f 100644
>> --- a/drivers/gpu/drm/tegra/drm.c
>> +++ b/drivers/gpu/drm/tegra/drm.c
>> @@ -171,7 +171,7 @@ int tegra_drm_submit(struct tegra_drm_context *context,
>>   	struct drm_tegra_syncpt syncpt;
>>   	struct host1x *host1x = dev_get_drvdata(drm->dev->parent);
>>   	struct drm_gem_object **refs;
>> -	struct host1x_syncpt *sp;
>> +	struct host1x_syncpt *sp = NULL;
>>   	struct host1x_job *job;
>>   	unsigned int num_refs;
>>   	int err;
>> @@ -298,8 +298,8 @@ int tegra_drm_submit(struct tegra_drm_context *context,
>>   		goto fail;
>>   	}
>>   
>> -	/* check whether syncpoint ID is valid */
>> -	sp = host1x_syncpt_get(host1x, syncpt.id);
>> +	/* Syncpoint ref will be dropped on job release. */
>> +	sp = host1x_syncpt_get_by_id(host1x, syncpt.id);
> 
> It's a bit odd to replace the comment like that. Perhaps instead of
> replacing it, just extend it with the note about the lifetime?

I replaced it because in the past the check was there just to verify 
that the ID is valid (the pointer was thrown away) -- now we actually 
pass the pointer into the job structure, so the call serves the more 
general "get the syncpoint" purpose, which is clear from the name of 
the function. The new comment then clarifies the lifetime of the 
reference.

> 
>>   	if (!sp) {
>>   		err = -ENOENT;
>>   		goto fail;
>> @@ -308,7 +308,7 @@ int tegra_drm_submit(struct tegra_drm_context *context,
>>   	job->is_addr_reg = context->client->ops->is_addr_reg;
>>   	job->is_valid_class = context->client->ops->is_valid_class;
>>   	job->syncpt_incrs = syncpt.incrs;
>> -	job->syncpt_id = syncpt.id;
>> +	job->syncpt = sp;
>>   	job->timeout = 10000;
>>   
>>   	if (args->timeout && args->timeout < 10000)
>> @@ -380,7 +380,7 @@ static int tegra_syncpt_read(struct drm_device *drm, void *data,
>>   	struct drm_tegra_syncpt_read *args = data;
>>   	struct host1x_syncpt *sp;
>>   
>> -	sp = host1x_syncpt_get(host, args->id);
>> +	sp = host1x_syncpt_get_by_id_noref(host, args->id);
> 
> Why don't we need a reference here? It's perhaps unlikely, because this
> function is short-lived, but the otherwise last reference to this could
> go away at any point after this line and cause sp to become invalid.
> 
> In general it's very rare to not have to keep a reference to a reference
> counted object.

Having a reference to a syncpoint indicates ownership of the syncpoint. 
Since here we are just reading it, we don't want ownership. (The 
non-_noref functions will fail if the syncpoint is not currently 
allocated, which would break this interface.) The host1x_syncpt 
structure itself always exists even if the refcount drops to zero.
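That ownership model can be condensed into a toy model (all names here are made up for illustration; the real driver uses kref and proper locking):

```c
#include <assert.h>
#include <stddef.h>

#define NUM_SYNCPTS 4

/* Toy model: syncpoint slots always exist; the refcount only
 * tracks whether the syncpoint is currently allocated (owned). */
struct toy_syncpt {
	unsigned int refcount; /* 0 == free */
	unsigned int value;
};

static struct toy_syncpt syncpts[NUM_SYNCPTS];

/* Owning lookup: fails unless the syncpoint is currently allocated,
 * and takes an additional reference on success. */
struct toy_syncpt *toy_get_by_id(unsigned int id)
{
	if (id >= NUM_SYNCPTS || syncpts[id].refcount == 0)
		return NULL;
	syncpts[id].refcount++;
	return &syncpts[id];
}

/* Non-owning lookup: valid even for free syncpoints, because the
 * backing structure is never deallocated. */
struct toy_syncpt *toy_get_by_id_noref(unsigned int id)
{
	if (id >= NUM_SYNCPTS)
		return NULL;
	return &syncpts[id];
}

void toy_put(struct toy_syncpt *sp)
{
	if (sp && sp->refcount > 0)
		sp->refcount--;
}
```

The noref lookup matches the read-only ioctl path: it can observe any syncpoint but never keeps it allocated.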

> 
>>   	if (!sp)
>>   		return -EINVAL;
>>   
>> @@ -395,7 +395,7 @@ static int tegra_syncpt_incr(struct drm_device *drm, void *data,
>>   	struct drm_tegra_syncpt_incr *args = data;
>>   	struct host1x_syncpt *sp;
>>   
>> -	sp = host1x_syncpt_get(host1x, args->id);
>> +	sp = host1x_syncpt_get_by_id_noref(host1x, args->id);
> 
> Same here. Or am I missing some other way by which it is ensured that
> the reference stays around?

As above, though here we actually mutate the syncpoint even though we 
hold no reference and thus no ownership. But that's just a quirk of 
this old interface, which allows incrementing syncpoints you don't own.

> 
> Generally I like this because it makes the handling of syncpoints much
> more consistent.
> 
> Thierry
> 

Mikko


* Re: [PATCH v5 07/21] gpu: host1x: Introduce UAPI header
  2021-01-11 13:00   ` Mikko Perttunen
@ 2021-03-23 10:52     ` Thierry Reding
  -1 siblings, 0 replies; 195+ messages in thread
From: Thierry Reding @ 2021-03-23 10:52 UTC (permalink / raw)
  To: Mikko Perttunen
  Cc: jonathanh, digetx, airlied, daniel, linux-tegra, dri-devel,
	talho, bhuntsman

On Mon, Jan 11, 2021 at 03:00:05PM +0200, Mikko Perttunen wrote:
> Add the userspace interface header, specifying interfaces
> for allocating and accessing syncpoints from userspace,
> and for creating sync_file based fences based on syncpoint
> thresholds.
> 
> Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
> ---
>  include/uapi/linux/host1x.h | 134 ++++++++++++++++++++++++++++++++++++
>  1 file changed, 134 insertions(+)
>  create mode 100644 include/uapi/linux/host1x.h

What's the number of these syncpoints that we expect userspace to
create? There's a limited amount of open file descriptors available by
default, so this needs to be kept reasonably low.

> diff --git a/include/uapi/linux/host1x.h b/include/uapi/linux/host1x.h
> new file mode 100644
> index 000000000000..9c8fb9425cb2
> --- /dev/null
> +++ b/include/uapi/linux/host1x.h
> @@ -0,0 +1,134 @@
> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
> +/* Copyright (c) 2020 NVIDIA Corporation */
> +
> +#ifndef _UAPI__LINUX_HOST1X_H
> +#define _UAPI__LINUX_HOST1X_H
> +
> +#include <linux/ioctl.h>
> +#include <linux/types.h>
> +
> +#if defined(__cplusplus)
> +extern "C" {
> +#endif
> +
> +struct host1x_allocate_syncpoint {
> +	/**
> +	 * @fd: [out]
> +	 *
> +	 * New file descriptor representing the allocated syncpoint.
> +	 */
> +	__s32 fd;
> +
> +	__u32 reserved[3];
> +};
> +
> +struct host1x_syncpoint_info {
> +	/**
> +	 * @id: [out]
> +	 *
> +	 * System-global ID of the syncpoint.
> +	 */
> +	__u32 id;
> +
> +	__u32 reserved[3];
> +};

Given that this has only out parameters, I expect this will be called on
the FD returned by HOST1X_IOCTL_ALLOCATE_SYNCPOINT? It might be worth
pointing that out explicitly in a comment.

> +
> +struct host1x_syncpoint_increment {
> +	/**
> +	 * @count: [in]
> +	 *
> +	 * Number of times to increment the syncpoint. The syncpoint can
> +	 * be observed at in-between values, but each increment is atomic.
> +	 */
> +	__u32 count;
> +};

This seems like it would have to be called on the FD as well...

> +
> +struct host1x_read_syncpoint {
> +	/**
> +	 * @id: [in]
> +	 *
> +	 * ID of the syncpoint to read.
> +	 */
> +	__u32 id;
> +
> +	/**
> +	 * @value: [out]
> +	 *
> +	 * Current value of the syncpoint.
> +	 */
> +	__u32 value;
> +};

... but then, all of a sudden you seem to switch things around and allow
reading the value of an arbitrary syncpoint specified by ID.

Now, I suspect that's because reading the syncpoint is harmless and does
not allow abuse, whereas incrementing could be abused if allowed on an
arbitrary syncpoint ID. But I think it's worth spelling all that out in
some documentation to make this clear from a security point of view and
from a usability point of view for people trying to figure out how to
use these interfaces.

> +
> +struct host1x_create_fence {
> +	/**
> +	 * @id: [in]
> +	 *
> +	 * ID of the syncpoint to create a fence for.
> +	 */
> +	__u32 id;
> +
> +	/**
> +	 * @threshold: [in]
> +	 *
> +	 * When the syncpoint reaches this value, the fence will be signaled.
> +	 * The syncpoint is considered to have reached the threshold when the
> +	 * following condition is true:
> +	 *
> +	 * 	((value - threshold) & 0x80000000U) == 0U
> +	 *
> +	 */
> +	__u32 threshold;
> +
> +	/**
> +	 * @fence_fd: [out]
> +	 *
> +	 * New sync_file file descriptor containing the created fence.
> +	 */
> +	__s32 fence_fd;
> +
> +	__u32 reserved[1];
> +};

Again this takes an arbitrary syncpoint ID as input, so I expect that
the corresponding IOCTL will have to be called on the host1x device
node? Again, I think it would be good to either point that out for each
structure or IOCTL, or alternatively maybe reorder these such that this
becomes clearer.
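As an aside, the wraparound-safe threshold condition quoted from the proposed header can be checked in isolation:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A threshold counts as reached when it lies at most 2^31 steps
 * behind the current value, modulo 2^32 -- the exact condition
 * spelled out in the proposed UAPI header. */
bool threshold_reached(uint32_t value, uint32_t threshold)
{
	return ((value - threshold) & 0x80000000U) == 0U;
}
```

This keeps comparisons correct across 32-bit counter wraparound: for example, threshold_reached(5, 0xFFFFFFF0U) is true, because the counter has wrapped past the threshold.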

> +
> +struct host1x_fence_extract_fence {
> +	__u32 id;
> +	__u32 threshold;
> +};
> +
> +struct host1x_fence_extract {
> +	/**
> +	 * @fence_fd: [in]
> +	 *
> +	 * sync_file file descriptor
> +	 */
> +	__s32 fence_fd;
> +
> +	/**
> +	 * @num_fences: [in,out]
> +	 *
> +	 * In: size of the `fences_ptr` array counted in elements.
> +	 * Out: required size of the `fences_ptr` array counted in elements.
> +	 */
> +	__u32 num_fences;
> +
> +	/**
> +	 * @fences_ptr: [in]
> +	 *
> +	 * Pointer to array of `struct host1x_fence_extract_fence`.
> +	 */
> +	__u64 fences_ptr;
> +
> +	__u32 reserved[2];
> +};

For the others it's pretty clear to me what the purpose is, but I'm at a
complete loss with this one. What's the use-case for this?
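Whatever the use-case turns out to be, an in/out size field like @num_fences conventionally implies a two-call query-then-fill pattern. A generic userspace mock of that contract (hypothetical helper and data, not this driver's actual entry point):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct extract_fence {
	unsigned int id;
	unsigned int threshold;
};

/* Backing data the mock "extracts" from; purely illustrative. */
static const struct extract_fence backing[] = {
	{ 3, 100 }, { 7, 42 },
};

/* Mock of an in/out num_fences contract: the caller passes the array
 * capacity in; the callee fills as many entries as fit and writes the
 * required count back, so a first call with capacity 0 sizes the
 * allocation for a second call. */
int mock_fence_extract(struct extract_fence *fences, unsigned int *num_fences)
{
	unsigned int avail = sizeof(backing) / sizeof(backing[0]);
	unsigned int copy = *num_fences < avail ? *num_fences : avail;

	if (fences)
		memcpy(fences, backing, copy * sizeof(*fences));

	*num_fences = avail; /* report the required array size */
	return 0;
}
```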

In general I think it'd make sense to add a bit more documentation about
how all these IOCTLs are meant to be used to give people a better
understanding of why these are needed.

Thierry

> +
> +#define HOST1X_IOCTL_ALLOCATE_SYNCPOINT  _IOWR('X', 0x00, struct host1x_allocate_syncpoint)
> +#define HOST1X_IOCTL_READ_SYNCPOINT      _IOR ('X', 0x01, struct host1x_read_syncpoint)
> +#define HOST1X_IOCTL_CREATE_FENCE        _IOWR('X', 0x02, struct host1x_create_fence)
> +#define HOST1X_IOCTL_SYNCPOINT_INFO      _IOWR('X', 0x03, struct host1x_syncpoint_info)
> +#define HOST1X_IOCTL_SYNCPOINT_INCREMENT _IOWR('X', 0x04, struct host1x_syncpoint_increment)
> +#define HOST1X_IOCTL_FENCE_EXTRACT       _IOWR('X', 0x05, struct host1x_fence_extract)
> +
> +#if defined(__cplusplus)
> +}
> +#endif
> +
> +#endif
> -- 
> 2.30.0
> 


> +	 *
> +	 * sync_file file descriptor
> +	 */
> +	__s32 fence_fd;
> +
> +	/**
> +	 * @num_fences: [in,out]
> +	 *
> +	 * In: size of the `fences_ptr` array counted in elements.
> +	 * Out: required size of the `fences_ptr` array counted in elements.
> +	 */
> +	__u32 num_fences;
> +
> +	/**
> +	 * @fences_ptr: [in]
> +	 *
> +	 * Pointer to array of `struct host1x_fence_extract_fence`.
> +	 */
> +	__u64 fences_ptr;
> +
> +	__u32 reserved[2];
> +};

For the others it's pretty clear to me what the purpose is, but I'm at a
complete loss with this one. What's the use-case for this?

In general I think it'd make sense to add a bit more documentation about
how all these IOCTLs are meant to be used to give people a better
understanding of why these are needed.

Thierry

> +
> +#define HOST1X_IOCTL_ALLOCATE_SYNCPOINT  _IOWR('X', 0x00, struct host1x_allocate_syncpoint)
> +#define HOST1X_IOCTL_READ_SYNCPOINT      _IOR ('X', 0x01, struct host1x_read_syncpoint)
> +#define HOST1X_IOCTL_CREATE_FENCE        _IOWR('X', 0x02, struct host1x_create_fence)
> +#define HOST1X_IOCTL_SYNCPOINT_INFO      _IOWR('X', 0x03, struct host1x_syncpoint_info)
> +#define HOST1X_IOCTL_SYNCPOINT_INCREMENT _IOWR('X', 0x04, struct host1x_syncpoint_increment)
> +#define HOST1X_IOCTL_FENCE_EXTRACT       _IOWR('X', 0x05, struct host1x_fence_extract)
> +
> +#if defined(__cplusplus)
> +}
> +#endif
> +
> +#endif
> -- 
> 2.30.0
> 

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 195+ messages in thread

* Re: [PATCH v5 08/21] gpu: host1x: Implement /dev/host1x device node
  2021-01-11 13:00   ` Mikko Perttunen
@ 2021-03-23 11:02     ` Thierry Reding
  -1 siblings, 0 replies; 195+ messages in thread
From: Thierry Reding @ 2021-03-23 11:02 UTC (permalink / raw)
  To: Mikko Perttunen
  Cc: jonathanh, digetx, airlied, daniel, linux-tegra, dri-devel,
	talho, bhuntsman

[-- Attachment #1: Type: text/plain, Size: 6776 bytes --]

On Mon, Jan 11, 2021 at 03:00:06PM +0200, Mikko Perttunen wrote:
> Add the /dev/host1x device node, implementing the following
> functionality:
> 
> - Reading syncpoint values
> - Allocating syncpoints (providing syncpoint FDs)
> - Incrementing syncpoints (based on syncpoint FD)
> 
> Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
> ---
> v4:
> * Put UAPI under CONFIG_DRM_TEGRA_STAGING
> v3:
> * Pass process name as syncpoint name when allocating
>   syncpoint.
> ---
>  drivers/gpu/host1x/Makefile |   1 +
>  drivers/gpu/host1x/dev.c    |   9 ++
>  drivers/gpu/host1x/dev.h    |   3 +
>  drivers/gpu/host1x/uapi.c   | 282 ++++++++++++++++++++++++++++++++++++
>  drivers/gpu/host1x/uapi.h   |  22 +++
>  include/linux/host1x.h      |   2 +
>  6 files changed, 319 insertions(+)
>  create mode 100644 drivers/gpu/host1x/uapi.c
>  create mode 100644 drivers/gpu/host1x/uapi.h
> 
> diff --git a/drivers/gpu/host1x/Makefile b/drivers/gpu/host1x/Makefile
> index 096017b8789d..882f928d75e1 100644
> --- a/drivers/gpu/host1x/Makefile
> +++ b/drivers/gpu/host1x/Makefile
> @@ -9,6 +9,7 @@ host1x-y = \
>  	job.o \
>  	debug.o \
>  	mipi.o \
> +	uapi.o \
>  	hw/host1x01.o \
>  	hw/host1x02.o \
>  	hw/host1x04.o \
> diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c
> index d0ebb70e2fdd..641317d23828 100644
> --- a/drivers/gpu/host1x/dev.c
> +++ b/drivers/gpu/host1x/dev.c
> @@ -461,6 +461,12 @@ static int host1x_probe(struct platform_device *pdev)
>  		goto deinit_syncpt;
>  	}
>  
> +	err = host1x_uapi_init(&host->uapi, host);

It's a bit pointless to pass &host->uapi and host to the function since
you can access the former through the latter.

> +	if (err) {
> +		dev_err(&pdev->dev, "failed to initialize uapi\n");

s/uapi/UAPI/, and perhaps include the error code to give a better hint
as to why things failed.

> +		goto deinit_intr;
> +	}
> +
>  	host1x_debug_init(host);
>  
>  	if (host->info->has_hypervisor)
> @@ -480,6 +486,8 @@ static int host1x_probe(struct platform_device *pdev)
>  	host1x_unregister(host);
>  deinit_debugfs:
>  	host1x_debug_deinit(host);
> +	host1x_uapi_deinit(&host->uapi);
> +deinit_intr:
>  	host1x_intr_deinit(host);
>  deinit_syncpt:
>  	host1x_syncpt_deinit(host);
> @@ -501,6 +509,7 @@ static int host1x_remove(struct platform_device *pdev)
>  
>  	host1x_unregister(host);
>  	host1x_debug_deinit(host);
> +	host1x_uapi_deinit(&host->uapi);
>  	host1x_intr_deinit(host);
>  	host1x_syncpt_deinit(host);
>  	reset_control_assert(host->rst);
> diff --git a/drivers/gpu/host1x/dev.h b/drivers/gpu/host1x/dev.h
> index 63010ae37a97..7b8b7e20e32b 100644
> --- a/drivers/gpu/host1x/dev.h
> +++ b/drivers/gpu/host1x/dev.h
> @@ -17,6 +17,7 @@
>  #include "intr.h"
>  #include "job.h"
>  #include "syncpt.h"
> +#include "uapi.h"
>  
>  struct host1x_syncpt;
>  struct host1x_syncpt_base;
> @@ -143,6 +144,8 @@ struct host1x {
>  	struct list_head list;
>  
>  	struct device_dma_parameters dma_parms;
> +
> +	struct host1x_uapi uapi;
>  };
>  
>  void host1x_hypervisor_writel(struct host1x *host1x, u32 r, u32 v);
> diff --git a/drivers/gpu/host1x/uapi.c b/drivers/gpu/host1x/uapi.c
> new file mode 100644
> index 000000000000..27b8761c3f35
> --- /dev/null
> +++ b/drivers/gpu/host1x/uapi.c
> @@ -0,0 +1,282 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * /dev/host1x syncpoint interface
> + *
> + * Copyright (c) 2020, NVIDIA Corporation.
> + */
> +
> +#include <linux/anon_inodes.h>
> +#include <linux/cdev.h>
> +#include <linux/file.h>
> +#include <linux/fs.h>
> +#include <linux/host1x.h>
> +#include <linux/nospec.h>
> +
> +#include "dev.h"
> +#include "syncpt.h"
> +#include "uapi.h"
> +
> +#include <uapi/linux/host1x.h>
> +
> +static int syncpt_file_release(struct inode *inode, struct file *file)
> +{
> +	struct host1x_syncpt *sp = file->private_data;
> +
> +	host1x_syncpt_put(sp);
> +
> +	return 0;
> +}
> +
> +static int syncpt_file_ioctl_info(struct host1x_syncpt *sp, void __user *data)
> +{
> +	struct host1x_syncpoint_info args;
> +	unsigned long copy_err;
> +
> +	copy_err = copy_from_user(&args, data, sizeof(args));
> +	if (copy_err)
> +		return -EFAULT;
> +
> +	if (args.reserved[0] || args.reserved[1] || args.reserved[2])
> +		return -EINVAL;

Yes! \o/

> +
> +	args.id = sp->id;
> +
> +	copy_err = copy_to_user(data, &args, sizeof(args));
> +	if (copy_err)
> +		return -EFAULT;
> +
> +	return 0;
> +}
> +
> +static int syncpt_file_ioctl_incr(struct host1x_syncpt *sp, void __user *data)
> +{
> +	struct host1x_syncpoint_increment args;
> +	unsigned long copy_err;
> +	u32 i;
> +
> +	copy_err = copy_from_user(&args, data, sizeof(args));
> +	if (copy_err)
> +		return -EFAULT;
> +
> +	for (i = 0; i < args.count; i++) {
> +		host1x_syncpt_incr(sp);
> +		if (signal_pending(current))
> +			return -EINTR;
> +	}
> +
> +	return 0;
> +}
> +
> +static long syncpt_file_ioctl(struct file *file, unsigned int cmd,
> +			      unsigned long arg)
> +{
> +	void __user *data = (void __user *)arg;
> +	long err;
> +
> +	switch (cmd) {
> +	case HOST1X_IOCTL_SYNCPOINT_INFO:
> +		err = syncpt_file_ioctl_info(file->private_data, data);
> +		break;
> +
> +	case HOST1X_IOCTL_SYNCPOINT_INCREMENT:
> +		err = syncpt_file_ioctl_incr(file->private_data, data);
> +		break;
> +
> +	default:
> +		err = -ENOTTY;
> +	}
> +
> +	return err;
> +}

I wonder if it's worth adding some more logic to this demuxing. I'm
thinking along the lines of what the DRM IOCTL demuxer does, which
ultimately allows the IOCTLs to be extended. It does this by doing a
bit of sanitizing and removing the parameter size field from the cmd
argument so that the same IOCTL may handle different parameter sizes.

> +static const struct file_operations syncpt_file_fops = {
> +	.owner = THIS_MODULE,
> +	.release = syncpt_file_release,
> +	.unlocked_ioctl = syncpt_file_ioctl,
> +	.compat_ioctl = syncpt_file_ioctl,
> +};
> +
> +struct host1x_syncpt *host1x_syncpt_fd_get(int fd)
> +{
> +	struct host1x_syncpt *sp;
> +	struct file *file = fget(fd);
> +
> +	if (!file)
> +		return ERR_PTR(-EINVAL);
> +
> +	if (file->f_op != &syncpt_file_fops) {
> +		fput(file);
> +		return ERR_PTR(-EINVAL);
> +	}
> +
> +	sp = file->private_data;
> +
> +	host1x_syncpt_get(sp);
> +
> +	fput(file);
> +
> +	return sp;
> +}
> +EXPORT_SYMBOL(host1x_syncpt_fd_get);
> +
> +static int dev_file_open(struct inode *inode, struct file *file)

Maybe use the more specific host1x_ as prefix instead of the generic
dev_? That might make things like stack traces more readable.

Otherwise looks good.

Thierry

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 195+ messages in thread

* Re: [PATCH v5 07/21] gpu: host1x: Introduce UAPI header
  2021-03-23 10:52     ` Thierry Reding
@ 2021-03-23 11:12       ` Mikko Perttunen
  -1 siblings, 0 replies; 195+ messages in thread
From: Mikko Perttunen @ 2021-03-23 11:12 UTC (permalink / raw)
  To: Thierry Reding, Mikko Perttunen
  Cc: jonathanh, digetx, airlied, daniel, linux-tegra, dri-devel,
	talho, bhuntsman

On 3/23/21 12:52 PM, Thierry Reding wrote:
> On Mon, Jan 11, 2021 at 03:00:05PM +0200, Mikko Perttunen wrote:
>> Add the userspace interface header, specifying interfaces
>> for allocating and accessing syncpoints from userspace,
>> and for creating sync_file based fences based on syncpoint
>> thresholds.
>>
>> Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
>> ---
>>   include/uapi/linux/host1x.h | 134 ++++++++++++++++++++++++++++++++++++
>>   1 file changed, 134 insertions(+)
>>   create mode 100644 include/uapi/linux/host1x.h
> 
> What's the number of these syncpoints that we expect userspace to
> create? There's a limited amount of open file descriptors available by
> default, so this needs to be kept reasonably low.
> 
>> diff --git a/include/uapi/linux/host1x.h b/include/uapi/linux/host1x.h
>> new file mode 100644
>> index 000000000000..9c8fb9425cb2
>> --- /dev/null
>> +++ b/include/uapi/linux/host1x.h
>> @@ -0,0 +1,134 @@
>> +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
>> +/* Copyright (c) 2020 NVIDIA Corporation */
>> +
>> +#ifndef _UAPI__LINUX_HOST1X_H
>> +#define _UAPI__LINUX_HOST1X_H
>> +
>> +#include <linux/ioctl.h>
>> +#include <linux/types.h>
>> +
>> +#if defined(__cplusplus)
>> +extern "C" {
>> +#endif
>> +
>> +struct host1x_allocate_syncpoint {
>> +	/**
>> +	 * @fd: [out]
>> +	 *
>> +	 * New file descriptor representing the allocated syncpoint.
>> +	 */
>> +	__s32 fd;
>> +
>> +	__u32 reserved[3];
>> +};
>> +
>> +struct host1x_syncpoint_info {
>> +	/**
>> +	 * @id: [out]
>> +	 *
>> +	 * System-global ID of the syncpoint.
>> +	 */
>> +	__u32 id;
>> +
>> +	__u32 reserved[3];
>> +};
> 
> Given that this has only out parameters, I expect this will be called on
> the FD returned by HOST1X_IOCTL_ALLOCATE_SYNCPOINT? It might be worth
> pointing that out explicitly in a comment.
> 

Correct.

>> +
>> +struct host1x_syncpoint_increment {
>> +	/**
>> +	 * @count: [in]
>> +	 *
>> +	 * Number of times to increment the syncpoint. The syncpoint can
>> +	 * be observed at in-between values, but each increment is atomic.
>> +	 */
>> +	__u32 count;
>> +};
> 
> This seems like it would have to be called on the FD as well...

Yep.

> 
>> +
>> +struct host1x_read_syncpoint {
>> +	/**
>> +	 * @id: [in]
>> +	 *
>> +	 * ID of the syncpoint to read.
>> +	 */
>> +	__u32 id;
>> +
>> +	/**
>> +	 * @value: [out]
>> +	 *
>> +	 * Current value of the syncpoint.
>> +	 */
>> +	__u32 value;
>> +};
> 
> ... but then, all of a sudden you seem to switch things around and allow
> reading the value of an arbitrary syncpoint specified by ID.
> 
> Now, I suspect that's because reading the syncpoint is harmless and does
> not allow abuse, whereas incrementing could be abused if allowed on an
> arbitrary syncpoint ID. But I think it's worth spelling all that out in
> some documentation to make this clear from a security point of view and
> from a usability point of view for people trying to figure out how to
> use these interfaces.

Yeah. The model is that reading any syncpoint is OK but writing is not. 
I think these things were mentioned in the original proposal text but I 
did not carry them over to the comments. Will fix (however, see below)

> 
>> +
>> +struct host1x_create_fence {
>> +	/**
>> +	 * @id: [in]
>> +	 *
>> +	 * ID of the syncpoint to create a fence for.
>> +	 */
>> +	__u32 id;
>> +
>> +	/**
>> +	 * @threshold: [in]
>> +	 *
>> +	 * When the syncpoint reaches this value, the fence will be signaled.
>> +	 * The syncpoint is considered to have reached the threshold when the
>> +	 * following condition is true:
>> +	 *
>> +	 * 	((value - threshold) & 0x80000000U) == 0U
>> +	 *
>> +	 */
>> +	__u32 threshold;
>> +
>> +	/**
>> +	 * @fence_fd: [out]
>> +	 *
>> +	 * New sync_file file descriptor containing the created fence.
>> +	 */
>> +	__s32 fence_fd;
>> +
>> +	__u32 reserved[1];
>> +};
> 
> Again this takes an arbitrary syncpoint ID as input, so I expect that
> the corresponding IOCTL will have to be called on the host1x device
> node? Again, I think it would be good to either point that out for each
> structure or IOCTL, or alternatively maybe reorder these such that this
> becomes clearer.
> 
>> +
>> +struct host1x_fence_extract_fence {
>> +	__u32 id;
>> +	__u32 threshold;
>> +};
>> +
>> +struct host1x_fence_extract {
>> +	/**
>> +	 * @fence_fd: [in]
>> +	 *
>> +	 * sync_file file descriptor
>> +	 */
>> +	__s32 fence_fd;
>> +
>> +	/**
>> +	 * @num_fences: [in,out]
>> +	 *
>> +	 * In: size of the `fences_ptr` array counted in elements.
>> +	 * Out: required size of the `fences_ptr` array counted in elements.
>> +	 */
>> +	__u32 num_fences;
>> +
>> +	/**
>> +	 * @fences_ptr: [in]
>> +	 *
>> +	 * Pointer to array of `struct host1x_fence_extract_fence`.
>> +	 */
>> +	__u64 fences_ptr;
>> +
>> +	__u32 reserved[2];
>> +};
> 
> For the others it's pretty clear to me what the purpose is, but I'm at a
> complete loss with this one. What's the use-case for this?

This is needed to process incoming prefences for userspace-programmed 
engines -- mainly, the GPU with usermode submit enabled.

To align with other upstream code, I've been thinking of removing this 
whole UAPI; moving the syncpoint allocation part to the DRM UAPI, and 
dropping the sync_file stuff altogether (if we have support for job 
submission outputting syncobjs, those could still be converted into 
sync_files). This doesn't support use cases like GPU usermode submit, so 
for downstream we'll have to add it back in, though. Would like to hear 
your opinion on it as well.

Mikko

> 
> In general I think it'd make sense to add a bit more documentation about
> how all these IOCTLs are meant to be used to give people a better
> understanding of why these are needed.
> 
> Thierry
> 
>> +
>> +#define HOST1X_IOCTL_ALLOCATE_SYNCPOINT  _IOWR('X', 0x00, struct host1x_allocate_syncpoint)
>> +#define HOST1X_IOCTL_READ_SYNCPOINT      _IOR ('X', 0x01, struct host1x_read_syncpoint)
>> +#define HOST1X_IOCTL_CREATE_FENCE        _IOWR('X', 0x02, struct host1x_create_fence)
>> +#define HOST1X_IOCTL_SYNCPOINT_INFO      _IOWR('X', 0x03, struct host1x_syncpoint_info)
>> +#define HOST1X_IOCTL_SYNCPOINT_INCREMENT _IOWR('X', 0x04, struct host1x_syncpoint_increment)
>> +#define HOST1X_IOCTL_FENCE_EXTRACT       _IOWR('X', 0x05, struct host1x_fence_extract)
>> +
>> +#if defined(__cplusplus)
>> +}
>> +#endif
>> +
>> +#endif
>> -- 
>> 2.30.0
>>

^ permalink raw reply	[flat|nested] 195+ messages in thread

* Re: [PATCH v5 09/21] gpu: host1x: DMA fences and userspace fence creation
  2021-01-11 13:00   ` Mikko Perttunen
@ 2021-03-23 11:15     ` Thierry Reding
  -1 siblings, 0 replies; 195+ messages in thread
From: Thierry Reding @ 2021-03-23 11:15 UTC (permalink / raw)
  To: Mikko Perttunen
  Cc: jonathanh, digetx, airlied, daniel, linux-tegra, talho,
	bhuntsman, dri-devel

On Mon, Jan 11, 2021 at 03:00:07PM +0200, Mikko Perttunen wrote:
> Add an implementation of dma_fences based on syncpoints. Syncpoint
> interrupts are used to signal fences. Additionally, after
> software signaling has been enabled, a 30 second timeout is started.
> If the syncpoint threshold is not reached within this period,
> the fence is signalled with an -ETIMEDOUT error code. This is to
> allow fences that would never reach their syncpoint threshold to
> be cleaned up.
> 
> Additionally, add a new /dev/host1x IOCTL for creating sync_file
> file descriptors backed by syncpoint fences.
> 
> Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
> ---
> v5:
> * Update for change in put_ref prototype.
> v4:
> * Fix _signal prototype and include it to avoid warning
> * Remove use of unused local in error path
> v3:
> * Move declaration of host1x_fence_extract to public header
> ---
>  drivers/gpu/host1x/Makefile |   1 +
>  drivers/gpu/host1x/fence.c  | 208 ++++++++++++++++++++++++++++++++++++
>  drivers/gpu/host1x/fence.h  |  13 +++
>  drivers/gpu/host1x/intr.c   |   9 ++
>  drivers/gpu/host1x/intr.h   |   2 +
>  drivers/gpu/host1x/uapi.c   | 103 ++++++++++++++++++
>  include/linux/host1x.h      |   4 +
>  7 files changed, 340 insertions(+)
>  create mode 100644 drivers/gpu/host1x/fence.c
>  create mode 100644 drivers/gpu/host1x/fence.h
> 
> diff --git a/drivers/gpu/host1x/Makefile b/drivers/gpu/host1x/Makefile
> index 882f928d75e1..a48af2cefae1 100644
> --- a/drivers/gpu/host1x/Makefile
> +++ b/drivers/gpu/host1x/Makefile
> @@ -10,6 +10,7 @@ host1x-y = \
>  	debug.o \
>  	mipi.o \
>  	uapi.o \
> +	fence.o \
>  	hw/host1x01.o \
>  	hw/host1x02.o \
>  	hw/host1x04.o \
> diff --git a/drivers/gpu/host1x/fence.c b/drivers/gpu/host1x/fence.c
> new file mode 100644
> index 000000000000..e96ad93ff656
> --- /dev/null
> +++ b/drivers/gpu/host1x/fence.c
> @@ -0,0 +1,208 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Syncpoint dma_fence implementation
> + *
> + * Copyright (c) 2020, NVIDIA Corporation.
> + */
> +
> +#include <linux/dma-fence.h>
> +#include <linux/file.h>
> +#include <linux/fs.h>
> +#include <linux/slab.h>
> +#include <linux/sync_file.h>
> +
> +#include "fence.h"
> +#include "intr.h"
> +#include "syncpt.h"
> +
> +DEFINE_SPINLOCK(lock);
> +
> +struct host1x_syncpt_fence {
> +	struct dma_fence base;
> +
> +	atomic_t signaling;
> +
> +	struct host1x_syncpt *sp;
> +	u32 threshold;
> +
> +	struct host1x_waitlist *waiter;
> +	void *waiter_ref;
> +
> +	struct delayed_work timeout_work;
> +};
> +
> +static const char *syncpt_fence_get_driver_name(struct dma_fence *f)
> +{
> +	return "host1x";
> +}
> +
> +static const char *syncpt_fence_get_timeline_name(struct dma_fence *f)
> +{
> +	return "syncpoint";
> +}
> +
> +static bool syncpt_fence_enable_signaling(struct dma_fence *f)
> +{
> +	struct host1x_syncpt_fence *sf =
> +		container_of(f, struct host1x_syncpt_fence, base);

Maybe add a casting helper to make this less annoying.
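
Something along these lines, say (the helper name here is just for 
illustration; stand-in struct definitions and a local container_of() 
are included only to make the sketch self-contained):

```c
#include <stddef.h>

/* Stand-ins for the kernel types, just enough to show the helper. */
struct dma_fence { unsigned int seqno; };
struct host1x_syncpt_fence {
	struct dma_fence base;
	unsigned int threshold;
};

/* container_of() as in the kernel, reduced to the simple case. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical casting helper; each fence op could then open with
 * "struct host1x_syncpt_fence *sf = to_host1x_fence(f);". */
static inline struct host1x_syncpt_fence *to_host1x_fence(struct dma_fence *f)
{
	return container_of(f, struct host1x_syncpt_fence, base);
}
```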

> +const struct dma_fence_ops syncpt_fence_ops = {

I'd prefer this to use the host1x_syncpt_ prefix for better scoping.

> +	.get_driver_name = syncpt_fence_get_driver_name,
> +	.get_timeline_name = syncpt_fence_get_timeline_name,
> +	.enable_signaling = syncpt_fence_enable_signaling,
> +	.release = syncpt_fence_release,

Maybe also do that for these, while at it.

> +static int dev_file_ioctl_create_fence(struct host1x *host1x, void __user *data)
> +{
> +	struct host1x_create_fence args;
> +	unsigned long copy_err;

Any reason why this needs to have that cumbersome copy_ prefix? There's
no other "err" variables, so why not just use the shorter "err" for
this?

> +	int fd;
> +
> +	copy_err = copy_from_user(&args, data, sizeof(args));
> +	if (copy_err)
> +		return -EFAULT;
> +
> +	if (args.reserved[0])
> +		return -EINVAL;
> +
> +	if (args.id >= host1x_syncpt_nb_pts(host1x))
> +		return -EINVAL;
> +
> +	args.id = array_index_nospec(args.id, host1x_syncpt_nb_pts(host1x));
> +
> +	fd = host1x_fence_create_fd(&host1x->syncpt[args.id], args.threshold);
> +	if (fd < 0)
> +		return fd;
> +
> +	args.fence_fd = fd;
> +
> +	copy_err = copy_to_user(data, &args, sizeof(args));
> +	if (copy_err)
> +		return -EFAULT;
> +
> +	return 0;
> +}
> +
> +static int dev_file_ioctl_fence_extract(struct host1x *host1x, void __user *data)
> +{
> +	struct host1x_fence_extract_fence __user *fences_user_ptr;
> +	struct dma_fence *fence, **fences;
> +	struct host1x_fence_extract args;
> +	struct dma_fence_array *array;
> +	unsigned int num_fences, i;
> +	unsigned long copy_err;

Can't do the same here, but perhaps just do what other copy_from_user()
callsites do and just use it directly in the conditional so you don't
even need to store the return value since you're not reusing it anyway.

In fact you could do the same thing above and just get rid of that
variable and render the code more idiomatic.

> +	int err;
> +
> +	copy_err = copy_from_user(&args, data, sizeof(args));
> +	if (copy_err)
> +		return -EFAULT;
> +
> +	fences_user_ptr = u64_to_user_ptr(args.fences_ptr);
> +
> +	if (args.reserved[0] || args.reserved[1])
> +		return -EINVAL;
> +
> +	fence = sync_file_get_fence(args.fence_fd);
> +	if (!fence)
> +		return -EINVAL;
> +
> +	array = to_dma_fence_array(fence);
> +	if (array) {
> +		fences = array->fences;
> +		num_fences = array->num_fences;
> +	} else {
> +		fences = &fence;
> +		num_fences = 1;
> +	}
> +
> +	for (i = 0; i < min(num_fences, args.num_fences); i++) {
> +		struct host1x_fence_extract_fence f;
> +
> +		err = host1x_fence_extract(fences[i], &f.id, &f.threshold);
> +		if (err)
> +			goto put_fence;
> +
> +		copy_err = copy_to_user(fences_user_ptr + i, &f, sizeof(f));
> +		if (copy_err) {
> +			err = -EFAULT;
> +			goto put_fence;
> +		}
> +	}
> +
> +	args.num_fences = i+1;

checkpatch will probably complain about this not having spaces around
that '+'.

> +
> +	copy_err = copy_to_user(data, &args, sizeof(args));
> +	if (copy_err) {
> +		err = -EFAULT;
> +		goto put_fence;
> +	}
> +
> +	return 0;
> +
> +put_fence:
> +	dma_fence_put(fence);
> +
> +	return err;
> +}
> +
>  static long dev_file_ioctl(struct file *file, unsigned int cmd,
>  			   unsigned long arg)
>  {
> @@ -210,6 +305,14 @@ static long dev_file_ioctl(struct file *file, unsigned int cmd,
>  		err = dev_file_ioctl_alloc_syncpoint(file->private_data, data);
>  		break;
>  
> +	case HOST1X_IOCTL_CREATE_FENCE:
> +		err = dev_file_ioctl_create_fence(file->private_data, data);
> +		break;
> +
> +	case HOST1X_IOCTL_FENCE_EXTRACT:
> +		err = dev_file_ioctl_fence_extract(file->private_data, data);
> +		break;
> +
>  	default:
>  		err = -ENOTTY;
>  	}
> diff --git a/include/linux/host1x.h b/include/linux/host1x.h
> index b3178ae51cae..080f9d3d29eb 100644
> --- a/include/linux/host1x.h
> +++ b/include/linux/host1x.h
> @@ -165,6 +165,10 @@ u32 host1x_syncpt_base_id(struct host1x_syncpt_base *base);
>  
>  struct host1x_syncpt *host1x_syncpt_fd_get(int fd);
>  
> +struct dma_fence *host1x_fence_create(struct host1x_syncpt *sp, u32 threshold);
> +int host1x_fence_create_fd(struct host1x_syncpt *sp, u32 threshold);
> +int host1x_fence_extract(struct dma_fence *fence, u32 *id, u32 *threshold);

Do we need these outside of the IOCTL implementations?

Thierry

* Re: [PATCH v5 08/21] gpu: host1x: Implement /dev/host1x device node
  2021-03-23 11:02     ` Thierry Reding
@ 2021-03-23 11:15       ` Mikko Perttunen
  -1 siblings, 0 replies; 195+ messages in thread
From: Mikko Perttunen @ 2021-03-23 11:15 UTC (permalink / raw)
  To: Thierry Reding, Mikko Perttunen
  Cc: jonathanh, digetx, airlied, daniel, linux-tegra, dri-devel,
	talho, bhuntsman

On 3/23/21 1:02 PM, Thierry Reding wrote:
> On Mon, Jan 11, 2021 at 03:00:06PM +0200, Mikko Perttunen wrote:
>> Add the /dev/host1x device node, implementing the following
>> functionality:
>>
>> - Reading syncpoint values
>> - Allocating syncpoints (providing syncpoint FDs)
>> - Incrementing syncpoints (based on syncpoint FD)
>>
>> Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
>> ---
>> v4:
>> * Put UAPI under CONFIG_DRM_TEGRA_STAGING
>> v3:
>> * Pass process name as syncpoint name when allocating
>>    syncpoint.
>> ---
>>   drivers/gpu/host1x/Makefile |   1 +
>>   drivers/gpu/host1x/dev.c    |   9 ++
>>   drivers/gpu/host1x/dev.h    |   3 +
>>   drivers/gpu/host1x/uapi.c   | 282 ++++++++++++++++++++++++++++++++++++
>>   drivers/gpu/host1x/uapi.h   |  22 +++
>>   include/linux/host1x.h      |   2 +
>>   6 files changed, 319 insertions(+)
>>   create mode 100644 drivers/gpu/host1x/uapi.c
>>   create mode 100644 drivers/gpu/host1x/uapi.h
>>
>> diff --git a/drivers/gpu/host1x/Makefile b/drivers/gpu/host1x/Makefile
>> index 096017b8789d..882f928d75e1 100644
>> --- a/drivers/gpu/host1x/Makefile
>> +++ b/drivers/gpu/host1x/Makefile
>> @@ -9,6 +9,7 @@ host1x-y = \
>>   	job.o \
>>   	debug.o \
>>   	mipi.o \
>> +	uapi.o \
>>   	hw/host1x01.o \
>>   	hw/host1x02.o \
>>   	hw/host1x04.o \
>> diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c
>> index d0ebb70e2fdd..641317d23828 100644
>> --- a/drivers/gpu/host1x/dev.c
>> +++ b/drivers/gpu/host1x/dev.c
>> @@ -461,6 +461,12 @@ static int host1x_probe(struct platform_device *pdev)
>>   		goto deinit_syncpt;
>>   	}
>>   
>> +	err = host1x_uapi_init(&host->uapi, host);
> 
> It's a bit pointless to pass &host->uapi and host to the function since
> you can access the former through the latter.

Yeah. I originally did it to separate the uapi module from the rest of 
the code interface-wise as much as possible, but I don't think I have 
done that consistently, so it just looks weird.

> 
>> +	if (err) {
>> +		dev_err(&pdev->dev, "failed to initialize uapi\n");
> 
> s/uapi/UAPI/, and perhaps include the error code to give a better hint
> as to why things failed.

Sure (if this code is kept).

> 
>> +		goto deinit_intr;
>> +	}
>> +
>>   	host1x_debug_init(host);
>>   
>>   	if (host->info->has_hypervisor)
>> @@ -480,6 +486,8 @@ static int host1x_probe(struct platform_device *pdev)
>>   	host1x_unregister(host);
>>   deinit_debugfs:
>>   	host1x_debug_deinit(host);
>> +	host1x_uapi_deinit(&host->uapi);
>> +deinit_intr:
>>   	host1x_intr_deinit(host);
>>   deinit_syncpt:
>>   	host1x_syncpt_deinit(host);
>> @@ -501,6 +509,7 @@ static int host1x_remove(struct platform_device *pdev)
>>   
>>   	host1x_unregister(host);
>>   	host1x_debug_deinit(host);
>> +	host1x_uapi_deinit(&host->uapi);
>>   	host1x_intr_deinit(host);
>>   	host1x_syncpt_deinit(host);
>>   	reset_control_assert(host->rst);
>> diff --git a/drivers/gpu/host1x/dev.h b/drivers/gpu/host1x/dev.h
>> index 63010ae37a97..7b8b7e20e32b 100644
>> --- a/drivers/gpu/host1x/dev.h
>> +++ b/drivers/gpu/host1x/dev.h
>> @@ -17,6 +17,7 @@
>>   #include "intr.h"
>>   #include "job.h"
>>   #include "syncpt.h"
>> +#include "uapi.h"
>>   
>>   struct host1x_syncpt;
>>   struct host1x_syncpt_base;
>> @@ -143,6 +144,8 @@ struct host1x {
>>   	struct list_head list;
>>   
>>   	struct device_dma_parameters dma_parms;
>> +
>> +	struct host1x_uapi uapi;
>>   };
>>   
>>   void host1x_hypervisor_writel(struct host1x *host1x, u32 r, u32 v);
>> diff --git a/drivers/gpu/host1x/uapi.c b/drivers/gpu/host1x/uapi.c
>> new file mode 100644
>> index 000000000000..27b8761c3f35
>> --- /dev/null
>> +++ b/drivers/gpu/host1x/uapi.c
>> @@ -0,0 +1,282 @@
>> +// SPDX-License-Identifier: GPL-2.0-only
>> +/*
>> + * /dev/host1x syncpoint interface
>> + *
>> + * Copyright (c) 2020, NVIDIA Corporation.
>> + */
>> +
>> +#include <linux/anon_inodes.h>
>> +#include <linux/cdev.h>
>> +#include <linux/file.h>
>> +#include <linux/fs.h>
>> +#include <linux/host1x.h>
>> +#include <linux/nospec.h>
>> +
>> +#include "dev.h"
>> +#include "syncpt.h"
>> +#include "uapi.h"
>> +
>> +#include <uapi/linux/host1x.h>
>> +
>> +static int syncpt_file_release(struct inode *inode, struct file *file)
>> +{
>> +	struct host1x_syncpt *sp = file->private_data;
>> +
>> +	host1x_syncpt_put(sp);
>> +
>> +	return 0;
>> +}
>> +
>> +static int syncpt_file_ioctl_info(struct host1x_syncpt *sp, void __user *data)
>> +{
>> +	struct host1x_syncpoint_info args;
>> +	unsigned long copy_err;
>> +
>> +	copy_err = copy_from_user(&args, data, sizeof(args));
>> +	if (copy_err)
>> +		return -EFAULT;
>> +
>> +	if (args.reserved[0] || args.reserved[1] || args.reserved[2])
>> +		return -EINVAL;
> 
> Yes! \o/
> 
>> +
>> +	args.id = sp->id;
>> +
>> +	copy_err = copy_to_user(data, &args, sizeof(args));
>> +	if (copy_err)
>> +		return -EFAULT;
>> +
>> +	return 0;
>> +}
>> +
>> +static int syncpt_file_ioctl_incr(struct host1x_syncpt *sp, void __user *data)
>> +{
>> +	struct host1x_syncpoint_increment args;
>> +	unsigned long copy_err;
>> +	u32 i;
>> +
>> +	copy_err = copy_from_user(&args, data, sizeof(args));
>> +	if (copy_err)
>> +		return -EFAULT;
>> +
>> +	for (i = 0; i < args.count; i++) {
>> +		host1x_syncpt_incr(sp);
>> +		if (signal_pending(current))
>> +			return -EINTR;
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +static long syncpt_file_ioctl(struct file *file, unsigned int cmd,
>> +			      unsigned long arg)
>> +{
>> +	void __user *data = (void __user *)arg;
>> +	long err;
>> +
>> +	switch (cmd) {
>> +	case HOST1X_IOCTL_SYNCPOINT_INFO:
>> +		err = syncpt_file_ioctl_info(file->private_data, data);
>> +		break;
>> +
>> +	case HOST1X_IOCTL_SYNCPOINT_INCREMENT:
>> +		err = syncpt_file_ioctl_incr(file->private_data, data);
>> +		break;
>> +
>> +	default:
>> +		err = -ENOTTY;
>> +	}
>> +
>> +	return err;
>> +}
> 
> I wonder if it's worth adding some more logic to this demuxing. I'm
> thinking along the lines of what the DRM IOCTL demuxer does, which
> ultimately allows the IOCTLs to be extended. It does this by doing a
> bit of sanitizing and removing the parameter size field from the cmd
> argument so that the same IOCTL may handle different parameter sizes.

Yep, seems like a good idea (if we keep this).
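
For reference, a DRM-style demux could look roughly like this (the 
struct, ioctl name and handler below are made up for illustration; the 
point is dispatching on the ioctl number so a later struct revision, 
which changes the size encoded in cmd, still reaches the same handler):

```c
#include <sys/ioctl.h>	/* _IOWR, _IOC_NR, _IOC_SIZE (Linux) */

/* Illustrative v1 of a parameter struct. */
struct syncpoint_info_v1 { unsigned int id; unsigned int reserved[3]; };

#define IOCTL_SYNCPOINT_INFO _IOWR('X', 0x03, struct syncpoint_info_v1)

static long demux(unsigned int cmd)
{
	unsigned int size = _IOC_SIZE(cmd);

	switch (_IOC_NR(cmd)) {
	case 0x03:
		/* A real handler would copy min(size, sizeof(latest
		 * struct)) from userspace and zero the remainder, as
		 * the DRM core does; here we just report the size. */
		return size;
	default:
		return -1; /* -ENOTTY in a real driver */
	}
}
```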

> 
>> +static const struct file_operations syncpt_file_fops = {
>> +	.owner = THIS_MODULE,
>> +	.release = syncpt_file_release,
>> +	.unlocked_ioctl = syncpt_file_ioctl,
>> +	.compat_ioctl = syncpt_file_ioctl,
>> +};
>> +
>> +struct host1x_syncpt *host1x_syncpt_fd_get(int fd)
>> +{
>> +	struct host1x_syncpt *sp;
>> +	struct file *file = fget(fd);
>> +
>> +	if (!file)
>> +		return ERR_PTR(-EINVAL);
>> +
>> +	if (file->f_op != &syncpt_file_fops) {
>> +		fput(file);
>> +		return ERR_PTR(-EINVAL);
>> +	}
>> +
>> +	sp = file->private_data;
>> +
>> +	host1x_syncpt_get(sp);
>> +
>> +	fput(file);
>> +
>> +	return sp;
>> +}
>> +EXPORT_SYMBOL(host1x_syncpt_fd_get);
>> +
>> +static int dev_file_open(struct inode *inode, struct file *file)
> 
> Maybe use the more specific host1x_ as prefix instead of the generic
> dev_? That might make things like stack traces more readable.

Yep.

> 
> Otherwise looks good.
> 
> Thierry
> 

thanks,
Mikko

* Re: [PATCH v5 06/21] gpu: host1x: Cleanup and refcounting for syncpoints
  2021-03-23 10:44       ` Mikko Perttunen
@ 2021-03-23 11:21         ` Thierry Reding
  -1 siblings, 0 replies; 195+ messages in thread
From: Thierry Reding @ 2021-03-23 11:21 UTC (permalink / raw)
  To: Mikko Perttunen
  Cc: Mikko Perttunen, jonathanh, digetx, airlied, daniel, linux-tegra,
	talho, bhuntsman, dri-devel

On Tue, Mar 23, 2021 at 12:44:28PM +0200, Mikko Perttunen wrote:
> On 3/23/21 12:36 PM, Thierry Reding wrote:
> > On Mon, Jan 11, 2021 at 03:00:04PM +0200, Mikko Perttunen wrote:
> > > Add reference counting for allocated syncpoints to allow keeping
> > > them allocated while jobs are referencing them. Additionally,
> > > clean up various places using syncpoint IDs to use host1x_syncpt
> > > pointers instead.
> > > 
> > > Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
> > > ---
> > > v5:
> > > - Remove host1x_syncpt_put in submit code, as job_put already
> > >    puts the syncpoint.
> > > - Changes due to rebase in VI driver.
> > > v4:
> > > - Update from _free to _put in VI driver as well
> > > ---
> > >   drivers/gpu/drm/tegra/dc.c             |  4 +-
> > >   drivers/gpu/drm/tegra/drm.c            | 14 ++---
> > >   drivers/gpu/drm/tegra/gr2d.c           |  4 +-
> > >   drivers/gpu/drm/tegra/gr3d.c           |  4 +-
> > >   drivers/gpu/drm/tegra/vic.c            |  4 +-
> > >   drivers/gpu/host1x/cdma.c              | 11 ++--
> > >   drivers/gpu/host1x/dev.h               |  7 ++-
> > >   drivers/gpu/host1x/hw/cdma_hw.c        |  2 +-
> > >   drivers/gpu/host1x/hw/channel_hw.c     | 10 ++--
> > >   drivers/gpu/host1x/hw/debug_hw.c       |  2 +-
> > >   drivers/gpu/host1x/job.c               |  5 +-
> > >   drivers/gpu/host1x/syncpt.c            | 75 +++++++++++++++++++-------
> > >   drivers/gpu/host1x/syncpt.h            |  3 ++
> > >   drivers/staging/media/tegra-video/vi.c |  4 +-
> > >   include/linux/host1x.h                 |  8 +--
> > >   15 files changed, 98 insertions(+), 59 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/tegra/dc.c b/drivers/gpu/drm/tegra/dc.c
> > > index 85dd7131553a..033032dfc4b9 100644
> > > --- a/drivers/gpu/drm/tegra/dc.c
> > > +++ b/drivers/gpu/drm/tegra/dc.c
> > > @@ -2129,7 +2129,7 @@ static int tegra_dc_init(struct host1x_client *client)
> > >   		drm_plane_cleanup(primary);
> > >   	host1x_client_iommu_detach(client);
> > > -	host1x_syncpt_free(dc->syncpt);
> > > +	host1x_syncpt_put(dc->syncpt);
> > >   	return err;
> > >   }
> > > @@ -2154,7 +2154,7 @@ static int tegra_dc_exit(struct host1x_client *client)
> > >   	}
> > >   	host1x_client_iommu_detach(client);
> > > -	host1x_syncpt_free(dc->syncpt);
> > > +	host1x_syncpt_put(dc->syncpt);
> > >   	return 0;
> > >   }
> > > diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
> > > index e45c8414e2a3..5a6037eff37f 100644
> > > --- a/drivers/gpu/drm/tegra/drm.c
> > > +++ b/drivers/gpu/drm/tegra/drm.c
> > > @@ -171,7 +171,7 @@ int tegra_drm_submit(struct tegra_drm_context *context,
> > >   	struct drm_tegra_syncpt syncpt;
> > >   	struct host1x *host1x = dev_get_drvdata(drm->dev->parent);
> > >   	struct drm_gem_object **refs;
> > > -	struct host1x_syncpt *sp;
> > > +	struct host1x_syncpt *sp = NULL;
> > >   	struct host1x_job *job;
> > >   	unsigned int num_refs;
> > >   	int err;
> > > @@ -298,8 +298,8 @@ int tegra_drm_submit(struct tegra_drm_context *context,
> > >   		goto fail;
> > >   	}
> > > -	/* check whether syncpoint ID is valid */
> > > -	sp = host1x_syncpt_get(host1x, syncpt.id);
> > > +	/* Syncpoint ref will be dropped on job release. */
> > > +	sp = host1x_syncpt_get_by_id(host1x, syncpt.id);
> > 
> > It's a bit odd to replace the comment like that. Perhaps instead of
> > replacing it, just extend it with the note about the lifetime?
> 
> I replaced it because in the past the check was there really to just check
> if the ID is valid (the pointer was thrown away) -- now we actually pass the
> pointer into the job structure, so it serves a more general "get the
> syncpoint" purpose which is clear based on the name of the function. The new
> comment is then a new comment to clarify the lifetime of the reference.

Alright, makes sense.

> > 
> > >   	if (!sp) {
> > >   		err = -ENOENT;
> > >   		goto fail;
> > > @@ -308,7 +308,7 @@ int tegra_drm_submit(struct tegra_drm_context *context,
> > >   	job->is_addr_reg = context->client->ops->is_addr_reg;
> > >   	job->is_valid_class = context->client->ops->is_valid_class;
> > >   	job->syncpt_incrs = syncpt.incrs;
> > > -	job->syncpt_id = syncpt.id;
> > > +	job->syncpt = sp;
> > >   	job->timeout = 10000;
> > >   	if (args->timeout && args->timeout < 10000)
> > > @@ -380,7 +380,7 @@ static int tegra_syncpt_read(struct drm_device *drm, void *data,
> > >   	struct drm_tegra_syncpt_read *args = data;
> > >   	struct host1x_syncpt *sp;
> > > -	sp = host1x_syncpt_get(host, args->id);
> > > +	sp = host1x_syncpt_get_by_id_noref(host, args->id);
> > 
> > Why don't we need a reference here? It's perhaps unlikely, because this
> > function is short-lived, but the otherwise last reference to this could
> > go away at any point after this line and cause sp to become invalid.
> > 
> > In general it's very rare to not have to keep a reference to a reference
> > counted object.
> 
> Having a reference to a syncpoint indicates ownership of the syncpoint.
> Since here we are just reading it, we don't want ownership. (The non _noref
> functions will fail if the syncpoint is not currently allocated, which would
> break this interface.) The host1x_syncpt structure itself always exists even
> if the refcount drops to zero.

Ah... you're right. host1x_syncpt_put() on the last reference doesn't
actually cause the backing memory to be freed. That's a bit counter-
intuitive, but I don't see why that can't work.

> > >   	if (!sp)
> > >   		return -EINVAL;
> > > @@ -395,7 +395,7 @@ static int tegra_syncpt_incr(struct drm_device *drm, void *data,
> > >   	struct drm_tegra_syncpt_incr *args = data;
> > >   	struct host1x_syncpt *sp;
> > > -	sp = host1x_syncpt_get(host1x, args->id);
> > > +	sp = host1x_syncpt_get_by_id_noref(host1x, args->id);
> > 
> > Same here. Or am I missing some other way by which it is ensured that
> > the reference stays around?
> 
> As above, though here we actually mutate the syncpoint even though we don't
> have a reference and thus no ownership. But that's just a quirk of this old
> interface allowing incrementing of syncpoints you don't own.

Yeah, doesn't actually make anything worse.

Thierry




* Re: [PATCH v5 07/21] gpu: host1x: Introduce UAPI header
  2021-03-23 11:12       ` Mikko Perttunen
@ 2021-03-23 11:43         ` Thierry Reding
  -1 siblings, 0 replies; 195+ messages in thread
From: Thierry Reding @ 2021-03-23 11:43 UTC (permalink / raw)
  To: Mikko Perttunen
  Cc: Mikko Perttunen, jonathanh, digetx, airlied, daniel, linux-tegra,
	dri-devel, talho, bhuntsman


On Tue, Mar 23, 2021 at 01:12:36PM +0200, Mikko Perttunen wrote:
> On 3/23/21 12:52 PM, Thierry Reding wrote:
> > On Mon, Jan 11, 2021 at 03:00:05PM +0200, Mikko Perttunen wrote:
[...]
> > > +struct host1x_fence_extract_fence {
> > > +	__u32 id;
> > > +	__u32 threshold;
> > > +};
> > > +
> > > +struct host1x_fence_extract {
> > > +	/**
> > > +	 * @fence_fd: [in]
> > > +	 *
> > > +	 * sync_file file descriptor
> > > +	 */
> > > +	__s32 fence_fd;
> > > +
> > > +	/**
> > > +	 * @num_fences: [in,out]
> > > +	 *
> > > +	 * In: size of the `fences_ptr` array counted in elements.
> > > +	 * Out: required size of the `fences_ptr` array counted in elements.
> > > +	 */
> > > +	__u32 num_fences;
> > > +
> > > +	/**
> > > +	 * @fences_ptr: [in]
> > > +	 *
> > > +	 * Pointer to array of `struct host1x_fence_extract_fence`.
> > > +	 */
> > > +	__u64 fences_ptr;
> > > +
> > > +	__u32 reserved[2];
> > > +};
> > 
> > For the others it's pretty clear to me what the purpose is, but I'm at a
> > complete loss with this one. What's the use-case for this?
> 
> This is needed to process incoming prefences for userspace-programmed
> engines -- mainly, the GPU with usermode submit enabled.

I'm not sure what GPU usermode submit is. The name would imply that it's
somehow a mechanism to submit work to the GPU without getting the kernel
involved at all. That's something we'd have to clarify with the Nouveau
team to see if it's something they'd consider implementing, or implement
it ourselves.

Currently there's no interoperation at the syncpoint level between
Nouveau and Tegra DRM, so Nouveau on Tegra doesn't use any syncpoints at
all and hence there's currently no use at all for this kind of API.

> To align with other upstream code, I've been thinking of removing this whole
> UAPI; moving the syncpoint allocation part to the DRM UAPI, and dropping the
> sync_file stuff altogether (if we have support for job submission outputting
> syncobjs, those could still be converted into sync_files). This doesn't
> support use-cases like GPU usermode submit, so for downstream we'll have to
> add it back in, though. Would like to hear your opinion on it as well.

That certainly sounds like a much easier sell because we have use-cases
for all of that. Along with your patches for NVDEC, the existing
userspace for VIC and your work-in-progress NVDEC userspace, this should
cover all the requirements.

Long story short, I think we have some ground to cover before we can
start thinking about how to do GPU usermode submits in an upstream
stack. As such we have no clear idea of what this is going to look like
in the end, or if it's going to be supported at all, so I think it'd be
best to move forward with your alternate proposal and move the syncpoint
functionality into Tegra DRM so that it can be used for VIC, NVDEC and
potentially other engines. If we ever get to the point of having to
support GPU usermode submit, we can take another look at how best to
support it.

Thierry




* Re: [PATCH v5 11/21] gpu: host1x: Add job release callback
  2021-01-11 13:00   ` Mikko Perttunen
@ 2021-03-23 11:55     ` Thierry Reding
  -1 siblings, 0 replies; 195+ messages in thread
From: Thierry Reding @ 2021-03-23 11:55 UTC (permalink / raw)
  To: Mikko Perttunen
  Cc: jonathanh, digetx, airlied, daniel, linux-tegra, dri-devel,
	talho, bhuntsman


On Mon, Jan 11, 2021 at 03:00:09PM +0200, Mikko Perttunen wrote:
> Add a callback field to the job structure, to be called just before
> the job is to be freed. This allows the job's submitter to clean
> up any of its own state, like decrement runtime PM refcounts.
> 
> Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
> ---
>  drivers/gpu/host1x/job.c | 3 +++
>  include/linux/host1x.h   | 4 ++++
>  2 files changed, 7 insertions(+)
> 
> diff --git a/drivers/gpu/host1x/job.c b/drivers/gpu/host1x/job.c
> index 8f59b34672c2..09097e19c0d0 100644
> --- a/drivers/gpu/host1x/job.c
> +++ b/drivers/gpu/host1x/job.c
> @@ -79,6 +79,9 @@ static void job_free(struct kref *ref)
>  {
>  	struct host1x_job *job = container_of(ref, struct host1x_job, ref);
>  
> +	if (job->release)
> +		job->release(job);
> +
>  	if (job->waiter)
>  		host1x_intr_put_ref(job->syncpt->host, job->syncpt->id,
>  				    job->waiter, false);
> diff --git a/include/linux/host1x.h b/include/linux/host1x.h
> index 81ca70066c76..d48cab563d5c 100644
> --- a/include/linux/host1x.h
> +++ b/include/linux/host1x.h
> @@ -265,6 +265,10 @@ struct host1x_job {
>  
>  	/* Fast-forward syncpoint increments on job timeout */
>  	bool syncpt_recovery;
> +
> +	/* Callback called when job is freed */
> +	void (*release)(struct host1x_job *job);
> +	void *user_data;

It's not clear to me what the user_data is used for. It's not used in
this patch at all, but perhaps it'll become relevant later on. I guess
I'll see.

Thierry




* Re: [PATCH v5 15/21] drm/tegra: Add new UAPI to header
  2021-01-14 10:34           ` Mikko Perttunen
@ 2021-03-23 12:30             ` Thierry Reding
  -1 siblings, 0 replies; 195+ messages in thread
From: Thierry Reding @ 2021-03-23 12:30 UTC (permalink / raw)
  To: Mikko Perttunen
  Cc: Dmitry Osipenko, Mikko Perttunen, jonathanh, airlied, daniel,
	linux-tegra, talho, bhuntsman, dri-devel


On Thu, Jan 14, 2021 at 12:34:22PM +0200, Mikko Perttunen wrote:
> On 1/14/21 10:36 AM, Dmitry Osipenko wrote:
> > 13.01.2021 21:56, Mikko Perttunen wrote:
> > > On 1/13/21 8:14 PM, Dmitry Osipenko wrote:
> > > > 11.01.2021 16:00, Mikko Perttunen wrote:
> > > > > +struct drm_tegra_submit_buf {
> > > > > +    /**
> > > > > +     * @mapping_id: [in]
> > > > > +     *
> > > > > +     * Identifier of the mapping to use in the submission.
> > > > > +     */
> > > > > +    __u32 mapping_id;
> > > > 
> > > > I'm now in process of trying out the UAPI using grate drivers and this
> > > > becomes the first obstacle.
> > > > 
> > > > Looks like this is not going to work well for older Tegra SoCs, in
> > > > particular for T20, which has a small GART.
> > > > 
> > > > Given that the usefulness of the partial mapping feature is very
> > > > questionable until it is proven with a real userspace, we should
> > > > start with dynamic mappings that are done at the time of job submission.
> > > > 
> > > > DRM should already have everything necessary for creating and managing
> > > > caches of mappings; the grate kernel driver has been using drm_mm_scan
> > > > for a long time now for that.
> > > > 
> > > > It should be fine to support the static mapping feature, but it should
> > > > be done separately with the drm_mm integration, IMO.
> > > > 
> > > > What do you think?
> > > > 
> > > 
> > > Can you elaborate on the requirements to be able to use GART? Are there
> > > any other reasons this would not work on older chips?
> > 
> > We have all DRM devices in a single address space on T30+, hence having
> > duplicated mappings for each device would be a bit wasteful.
> 
> I guess this should be pretty easy to change to only keep one mapping per
> GEM object.

The important point here is the semantics: this IOCTL establishes a
mapping for a given GEM object on a given channel. If the underlying
implementation is such that the mapping doesn't fit into the GART, then
that's an implementation detail that the driver needs to take care of.
Similarly, if multiple devices share a single address space, that's
something the driver already knows and can take advantage of by simply
reusing an existing mapping if one already exists. In both cases the
semantics would be correctly implemented and that's really all that
matters.

Overall this interface seems sound from a high-level point of view and
allows these mappings to be properly created even for the cases we have
where each channel may have a separate address space. It may not be the
optimal interface for all use-cases or any one individual case, but the
very nature of these interfaces is to abstract away certain differences
in order to provide a unified interface to a common programming model.
So there will always be certain tradeoffs.

Thierry




* Re: [PATCH v5 18/21] drm/tegra: Allocate per-engine channel in core code
  2021-01-11 13:00   ` Mikko Perttunen
@ 2021-03-23 12:35     ` Thierry Reding
  -1 siblings, 0 replies; 195+ messages in thread
From: Thierry Reding @ 2021-03-23 12:35 UTC (permalink / raw)
  To: Mikko Perttunen
  Cc: jonathanh, digetx, airlied, daniel, linux-tegra, talho,
	bhuntsman, dri-devel


On Mon, Jan 11, 2021 at 03:00:16PM +0200, Mikko Perttunen wrote:
> To avoid duplication, allocate the per-engine shared channel in the
> core code instead. Once MLOCKs are implemented on Host1x side, we
> can also update this to avoid allocating a shared channel when
> MLOCKs are enabled.
> 
> Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
> ---
>  drivers/gpu/drm/tegra/drm.c | 11 +++++++++++
>  drivers/gpu/drm/tegra/drm.h |  4 ++++
>  2 files changed, 15 insertions(+)

It'd be helpful if the commit message explained what these per-engine
shared channels are used for.

> 
> diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
> index cd81b52a9e06..afd3f143c5e0 100644
> --- a/drivers/gpu/drm/tegra/drm.c
> +++ b/drivers/gpu/drm/tegra/drm.c
> @@ -880,6 +880,14 @@ static const struct drm_driver tegra_drm_driver = {
>  int tegra_drm_register_client(struct tegra_drm *tegra,
>  			      struct tegra_drm_client *client)
>  {
> +	/*
> +	 * When MLOCKs are implemented, change to allocate a shared channel
> +	 * only when MLOCKs are disabled.
> +	 */
> +	client->shared_channel = host1x_channel_request(&client->base);
> +	if (!client->shared_channel)
> +		return -EBUSY;
> +
>  	mutex_lock(&tegra->clients_lock);
>  	list_add_tail(&client->list, &tegra->clients);
>  	client->drm = tegra;
> @@ -896,6 +904,9 @@ int tegra_drm_unregister_client(struct tegra_drm *tegra,
>  	client->drm = NULL;
>  	mutex_unlock(&tegra->clients_lock);
>  
> +	if (client->shared_channel)
> +		host1x_channel_put(client->shared_channel);
> +
>  	return 0;
>  }
>  
> diff --git a/drivers/gpu/drm/tegra/drm.h b/drivers/gpu/drm/tegra/drm.h
> index f38de08e0c95..0f38f159aa8e 100644
> --- a/drivers/gpu/drm/tegra/drm.h
> +++ b/drivers/gpu/drm/tegra/drm.h
> @@ -87,8 +87,12 @@ struct tegra_drm_client {
>  	struct list_head list;
>  	struct tegra_drm *drm;
>  
> +	/* Set by driver */
>  	unsigned int version;
>  	const struct tegra_drm_client_ops *ops;
> +
> +	/* Set by TegraDRM core */
> +	struct host1x_channel *shared_channel;

Perhaps reorder this so that the core-initialized fields are closer to
the top and the client-initialized fields are closer to the bottom? That
seems like a more natural order.
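
Concretely, the suggested ordering might look something like the
following sketch (stub types stand in for the real definitions, and
grouping `list` and `drm` with the core-set fields is an assumption
here, not part of the patch):

```c
#include <stddef.h>

/* Stub types for illustration only; the real ones live in the driver. */
struct host1x_client { int unused; };
struct host1x_channel;
struct tegra_drm;
struct tegra_drm_client_ops;
struct list_head { struct list_head *prev, *next; };

struct tegra_drm_client {
	struct host1x_client base;

	/* Set by TegraDRM core */
	struct list_head list;
	struct tegra_drm *drm;
	struct host1x_channel *shared_channel;

	/* Set by driver */
	unsigned int version;
	const struct tegra_drm_client_ops *ops;
};
```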

Thierry

^ permalink raw reply	[flat|nested] 195+ messages in thread

* Re: [PATCH v5 18/21] drm/tegra: Allocate per-engine channel in core code
  2021-03-23 12:35     ` Thierry Reding
@ 2021-03-23 13:15       ` Mikko Perttunen
  -1 siblings, 0 replies; 195+ messages in thread
From: Mikko Perttunen @ 2021-03-23 13:15 UTC (permalink / raw)
  To: Thierry Reding, Mikko Perttunen
  Cc: jonathanh, digetx, airlied, daniel, linux-tegra, talho,
	bhuntsman, dri-devel

On 3/23/21 2:35 PM, Thierry Reding wrote:
> On Mon, Jan 11, 2021 at 03:00:16PM +0200, Mikko Perttunen wrote:
>> To avoid duplication, allocate the per-engine shared channel in the
>> core code instead. Once MLOCKs are implemented on Host1x side, we
>> can also update this to avoid allocating a shared channel when
>> MLOCKs are enabled.
>>
>> Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
>> ---
>>   drivers/gpu/drm/tegra/drm.c | 11 +++++++++++
>>   drivers/gpu/drm/tegra/drm.h |  4 ++++
>>   2 files changed, 15 insertions(+)
> 
> It'd be helpful if the commit message explained what these per-engine
> shared channels are used for.

The per-engine shared channel is just the regular HW channel that is
currently shared by all userspace logical channels targeting a given
engine. In the future the plan is to use one HW channel per logical
channel opened by userspace on Tegra186+, so this will be extended
then.

I will rephrase to make it clearer.

> 
>>
>> diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
>> index cd81b52a9e06..afd3f143c5e0 100644
>> --- a/drivers/gpu/drm/tegra/drm.c
>> +++ b/drivers/gpu/drm/tegra/drm.c
>> @@ -880,6 +880,14 @@ static const struct drm_driver tegra_drm_driver = {
>>   int tegra_drm_register_client(struct tegra_drm *tegra,
>>   			      struct tegra_drm_client *client)
>>   {
>> +	/*
>> +	 * When MLOCKs are implemented, change to allocate a shared channel
>> +	 * only when MLOCKs are disabled.
>> +	 */
>> +	client->shared_channel = host1x_channel_request(&client->base);
>> +	if (!client->shared_channel)
>> +		return -EBUSY;
>> +
>>   	mutex_lock(&tegra->clients_lock);
>>   	list_add_tail(&client->list, &tegra->clients);
>>   	client->drm = tegra;
>> @@ -896,6 +904,9 @@ int tegra_drm_unregister_client(struct tegra_drm *tegra,
>>   	client->drm = NULL;
>>   	mutex_unlock(&tegra->clients_lock);
>>   
>> +	if (client->shared_channel)
>> +		host1x_channel_put(client->shared_channel);
>> +
>>   	return 0;
>>   }
>>   
>> diff --git a/drivers/gpu/drm/tegra/drm.h b/drivers/gpu/drm/tegra/drm.h
>> index f38de08e0c95..0f38f159aa8e 100644
>> --- a/drivers/gpu/drm/tegra/drm.h
>> +++ b/drivers/gpu/drm/tegra/drm.h
>> @@ -87,8 +87,12 @@ struct tegra_drm_client {
>>   	struct list_head list;
>>   	struct tegra_drm *drm;
>>   
>> +	/* Set by driver */
>>   	unsigned int version;
>>   	const struct tegra_drm_client_ops *ops;
>> +
>> +	/* Set by TegraDRM core */
>> +	struct host1x_channel *shared_channel;
> 
> Perhaps reorder this so that the core-initialized fields are closer to
> the top and the client-initialized fields are closer to the bottom? That
> seems like a more natural order.

Will do.

> 
> Thierry
> 

Mikko

^ permalink raw reply	[flat|nested] 195+ messages in thread

* Re: [PATCH v5 01/21] gpu: host1x: Use different lock classes for each client
  2021-03-23 10:20             ` Thierry Reding
@ 2021-03-23 13:25               ` Dmitry Osipenko
  -1 siblings, 0 replies; 195+ messages in thread
From: Dmitry Osipenko @ 2021-03-23 13:25 UTC (permalink / raw)
  To: Thierry Reding, Mikko Perttunen
  Cc: jonathanh, airlied, daniel, linux-tegra, dri-devel, talho, bhuntsman

23.03.2021 13:20, Thierry Reding writes:
> On Mon, Mar 22, 2021 at 07:01:34PM +0300, Dmitry Osipenko wrote:
>> 22.03.2021 18:19, Mikko Perttunen writes:
>>> On 22.3.2021 16.48, Dmitry Osipenko wrote:
>>>> 22.03.2021 17:46, Thierry Reding writes:
>>>>> On Mon, Jan 11, 2021 at 02:59:59PM +0200, Mikko Perttunen wrote:
>>>>>> To avoid false lockdep warnings, give each client lock a different
>>>>>> lock class, passed from the initialization site by macro.
>>>>>>
>>>>>> Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
>>>>>> ---
>>>>>>   drivers/gpu/host1x/bus.c | 7 ++++---
>>>>>>   include/linux/host1x.h   | 9 ++++++++-
>>>>>>   2 files changed, 12 insertions(+), 4 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/host1x/bus.c b/drivers/gpu/host1x/bus.c
>>>>>> index 347fb962b6c9..8fc79e9cb652 100644
>>>>>> --- a/drivers/gpu/host1x/bus.c
>>>>>> +++ b/drivers/gpu/host1x/bus.c
>>>>>> @@ -715,13 +715,14 @@ EXPORT_SYMBOL(host1x_driver_unregister);
>>>>>>    * device and call host1x_device_init(), which will in turn call
>>>>>> each client's
>>>>>>    * &host1x_client_ops.init implementation.
>>>>>>    */
>>>>>> -int host1x_client_register(struct host1x_client *client)
>>>>>> +int __host1x_client_register(struct host1x_client *client,
>>>>>> +               struct lock_class_key *key)
>>>>>
>>>>> I've seen the kbuild robot warn about this because the kerneldoc is now
>>>>> out of date.
>>>>>
>>>>>>   {
>>>>>>       struct host1x *host1x;
>>>>>>       int err;
>>>>>>         INIT_LIST_HEAD(&client->list);
>>>>>> -    mutex_init(&client->lock);
>>>>>> +    __mutex_init(&client->lock, "host1x client lock", key);
>>>>>
>>>>> Should we maybe attempt to make this unique? Could we use something like
>>>>> dev_name(client->dev) for this?
>>>>
>>>> I'm curious how the lockdep warning could be triggered at all; I don't
>>>> recall ever seeing it. Mikko, could you please clarify how to reproduce
>>>> the warning?
>>>>
>>>
>>> This is pretty difficult to read but I guess it's some interaction
>>> related to the delayed initialization of host1x clients? In any case, I
>>> consistently get it at boot (though it may be triggered by vic probe
>>> instead of nvdec).
>>>
>>> I'll fix the kbuild robot warnings and see if I can add a
>>> client-specific lock name for v6.
>>
>> Thank you for the clarification! We now actually have a similar problem on Tegra20 after fixing the coupling of the display controllers (via dc1_client->parent = dc0_client), and I see the same warning when DC1 is enabled.
>>
...
> Sounds like we should decouple this from the series and fast-track this
> for v5.13, or perhaps even v5.12 along with the DC coupling fix?

Agree that the patch should be decoupled since it fixes a standalone
problem that isn't related to the rest of the patches.

It would also be good to have it backported, although this is optional.
If there are no merge conflicts with stable kernels for this patch, then
I'd add a stable tag to it.

Mikko, please update this patch and send it separately.

^ permalink raw reply	[flat|nested] 195+ messages in thread

* Re: [PATCH v5 19/21] drm/tegra: Implement new UAPI
  2021-01-11 13:00   ` Mikko Perttunen
@ 2021-03-23 13:25     ` Thierry Reding
  -1 siblings, 0 replies; 195+ messages in thread
From: Thierry Reding @ 2021-03-23 13:25 UTC (permalink / raw)
  To: Mikko Perttunen
  Cc: jonathanh, digetx, airlied, daniel, linux-tegra, dri-devel,
	talho, bhuntsman

On Mon, Jan 11, 2021 at 03:00:17PM +0200, Mikko Perttunen wrote:
> Implement the non-submission parts of the new UAPI, including
> channel management and memory mapping. The UAPI is under the
> CONFIG_DRM_TEGRA_STAGING config flag for now.
> 
> Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
> ---
> v5:
> * Set iova_end in both mapping paths
> v4:
> * New patch, split out from combined UAPI + submit patch.
> ---
>  drivers/gpu/drm/tegra/Makefile    |   1 +
>  drivers/gpu/drm/tegra/drm.c       |  41 ++--
>  drivers/gpu/drm/tegra/drm.h       |   5 +
>  drivers/gpu/drm/tegra/uapi.h      |  63 ++++++
>  drivers/gpu/drm/tegra/uapi/uapi.c | 307 ++++++++++++++++++++++++++++++

I'd prefer if we kept the directory structure flat. There's something
like 19 pairs of files in the top-level directory, which is reasonably
manageable. Also, it looks like there's going to be a couple more files
in this new subdirectory. I'd prefer if that was all merged into the
single uapi.c source file to keep things simpler. These are all really
small files, so there's no need to aggressively split things up. Helps
with compilation time, too.

FWIW, I would've been fine with stashing all of this into drm.c as well,
since the rest of the UAPI is in there already. The churn in this patch
is reasonably small, but it would've been even smaller if this had all
been in drm.c.

>  5 files changed, 401 insertions(+), 16 deletions(-)
>  create mode 100644 drivers/gpu/drm/tegra/uapi.h
>  create mode 100644 drivers/gpu/drm/tegra/uapi/uapi.c
> 
> diff --git a/drivers/gpu/drm/tegra/Makefile b/drivers/gpu/drm/tegra/Makefile
> index d6cf202414f0..0abdb21b38b9 100644
> --- a/drivers/gpu/drm/tegra/Makefile
> +++ b/drivers/gpu/drm/tegra/Makefile
> @@ -3,6 +3,7 @@ ccflags-$(CONFIG_DRM_TEGRA_DEBUG) += -DDEBUG
>  
>  tegra-drm-y := \
>  	drm.o \
> +	uapi/uapi.o \
>  	gem.o \
>  	fb.o \
>  	dp.o \
> diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
> index afd3f143c5e0..6a51035ce33f 100644
> --- a/drivers/gpu/drm/tegra/drm.c
> +++ b/drivers/gpu/drm/tegra/drm.c
> @@ -20,6 +20,7 @@
>  #include <drm/drm_prime.h>
>  #include <drm/drm_vblank.h>
>  
> +#include "uapi.h"
>  #include "drm.h"
>  #include "gem.h"
>  
> @@ -33,11 +34,6 @@
>  #define CARVEOUT_SZ SZ_64M
>  #define CDMA_GATHER_FETCHES_MAX_NB 16383
>  
> -struct tegra_drm_file {
> -	struct idr contexts;
> -	struct mutex lock;
> -};
> -
>  static int tegra_atomic_check(struct drm_device *drm,
>  			      struct drm_atomic_state *state)
>  {
> @@ -90,7 +86,8 @@ static int tegra_drm_open(struct drm_device *drm, struct drm_file *filp)
>  	if (!fpriv)
>  		return -ENOMEM;
>  
> -	idr_init_base(&fpriv->contexts, 1);
> +	idr_init_base(&fpriv->legacy_contexts, 1);
> +	xa_init_flags(&fpriv->contexts, XA_FLAGS_ALLOC1);
>  	mutex_init(&fpriv->lock);
>  	filp->driver_priv = fpriv;
>  
> @@ -429,7 +426,7 @@ static int tegra_client_open(struct tegra_drm_file *fpriv,
>  	if (err < 0)
>  		return err;
>  
> -	err = idr_alloc(&fpriv->contexts, context, 1, 0, GFP_KERNEL);
> +	err = idr_alloc(&fpriv->legacy_contexts, context, 1, 0, GFP_KERNEL);
>  	if (err < 0) {
>  		client->ops->close_channel(context);
>  		return err;
> @@ -484,13 +481,13 @@ static int tegra_close_channel(struct drm_device *drm, void *data,
>  
>  	mutex_lock(&fpriv->lock);
>  
> -	context = idr_find(&fpriv->contexts, args->context);
> +	context = idr_find(&fpriv->legacy_contexts, args->context);
>  	if (!context) {
>  		err = -EINVAL;
>  		goto unlock;
>  	}
>  
> -	idr_remove(&fpriv->contexts, context->id);
> +	idr_remove(&fpriv->legacy_contexts, context->id);
>  	tegra_drm_context_free(context);
>  
>  unlock:
> @@ -509,7 +506,7 @@ static int tegra_get_syncpt(struct drm_device *drm, void *data,
>  
>  	mutex_lock(&fpriv->lock);
>  
> -	context = idr_find(&fpriv->contexts, args->context);
> +	context = idr_find(&fpriv->legacy_contexts, args->context);
>  	if (!context) {
>  		err = -ENODEV;
>  		goto unlock;
> @@ -538,7 +535,7 @@ static int tegra_submit(struct drm_device *drm, void *data,
>  
>  	mutex_lock(&fpriv->lock);
>  
> -	context = idr_find(&fpriv->contexts, args->context);
> +	context = idr_find(&fpriv->legacy_contexts, args->context);
>  	if (!context) {
>  		err = -ENODEV;
>  		goto unlock;
> @@ -563,7 +560,7 @@ static int tegra_get_syncpt_base(struct drm_device *drm, void *data,
>  
>  	mutex_lock(&fpriv->lock);
>  
> -	context = idr_find(&fpriv->contexts, args->context);
> +	context = idr_find(&fpriv->legacy_contexts, args->context);
>  	if (!context) {
>  		err = -ENODEV;
>  		goto unlock;
> @@ -732,10 +729,21 @@ static int tegra_gem_get_flags(struct drm_device *drm, void *data,
>  
>  static const struct drm_ioctl_desc tegra_drm_ioctls[] = {
>  #ifdef CONFIG_DRM_TEGRA_STAGING
> -	DRM_IOCTL_DEF_DRV(TEGRA_GEM_CREATE, tegra_gem_create,
> +	DRM_IOCTL_DEF_DRV(TEGRA_CHANNEL_OPEN, tegra_drm_ioctl_channel_open,
> +			  DRM_RENDER_ALLOW),
> +	DRM_IOCTL_DEF_DRV(TEGRA_CHANNEL_CLOSE, tegra_drm_ioctl_channel_close,
> +			  DRM_RENDER_ALLOW),

I'd prefer to call these TEGRA_OPEN_CHANNEL and TEGRA_CLOSE_CHANNEL
because I find that easier to think about. My reasoning goes: the TEGRA_
prefix means we're operating at a global context and then we perform the
OPEN_CHANNEL and CLOSE_CHANNEL operations. Whereas by the same reasoning
TEGRA_CHANNEL_OPEN and TEGRA_CHANNEL_CLOSE suggest we're operating at
the channel context and perform OPEN and CLOSE operations. For close you
could make the argument that it makes sense, but you can't open a
channel that you don't have yet.

And if that doesn't convince you, I think appending _LEGACY here like we
do for CREATE and MMAP would be more consistent. Who's going to remember
which one is new: TEGRA_CHANNEL_OPEN or TEGRA_OPEN_CHANNEL?

> +	DRM_IOCTL_DEF_DRV(TEGRA_CHANNEL_MAP, tegra_drm_ioctl_channel_map,
>  			  DRM_RENDER_ALLOW),
> -	DRM_IOCTL_DEF_DRV(TEGRA_GEM_MMAP, tegra_gem_mmap,
> +	DRM_IOCTL_DEF_DRV(TEGRA_CHANNEL_UNMAP, tegra_drm_ioctl_channel_unmap,
>  			  DRM_RENDER_ALLOW),
> +	DRM_IOCTL_DEF_DRV(TEGRA_GEM_CREATE, tegra_drm_ioctl_gem_create,
> +			  DRM_RENDER_ALLOW),
> +	DRM_IOCTL_DEF_DRV(TEGRA_GEM_MMAP, tegra_drm_ioctl_gem_mmap,
> +			  DRM_RENDER_ALLOW),
> +
> +	DRM_IOCTL_DEF_DRV(TEGRA_GEM_CREATE_LEGACY, tegra_gem_create, DRM_RENDER_ALLOW),
> +	DRM_IOCTL_DEF_DRV(TEGRA_GEM_MMAP_LEGACY, tegra_gem_mmap, DRM_RENDER_ALLOW),
>  	DRM_IOCTL_DEF_DRV(TEGRA_SYNCPT_READ, tegra_syncpt_read,
>  			  DRM_RENDER_ALLOW),
>  	DRM_IOCTL_DEF_DRV(TEGRA_SYNCPT_INCR, tegra_syncpt_incr,
> @@ -789,10 +797,11 @@ static void tegra_drm_postclose(struct drm_device *drm, struct drm_file *file)
>  	struct tegra_drm_file *fpriv = file->driver_priv;
>  
>  	mutex_lock(&fpriv->lock);
> -	idr_for_each(&fpriv->contexts, tegra_drm_context_cleanup, NULL);
> +	idr_for_each(&fpriv->legacy_contexts, tegra_drm_context_cleanup, NULL);
> +	tegra_drm_uapi_close_file(fpriv);
>  	mutex_unlock(&fpriv->lock);
>  
> -	idr_destroy(&fpriv->contexts);
> +	idr_destroy(&fpriv->legacy_contexts);
>  	mutex_destroy(&fpriv->lock);
>  	kfree(fpriv);
>  }
> diff --git a/drivers/gpu/drm/tegra/drm.h b/drivers/gpu/drm/tegra/drm.h
> index 0f38f159aa8e..1af57c2016eb 100644
> --- a/drivers/gpu/drm/tegra/drm.h
> +++ b/drivers/gpu/drm/tegra/drm.h
> @@ -59,6 +59,11 @@ struct tegra_drm {
>  	struct tegra_display_hub *hub;
>  };
>  
> +static inline struct host1x *tegra_drm_to_host1x(struct tegra_drm *tegra)
> +{
> +	return dev_get_drvdata(tegra->drm->dev->parent);
> +}
> +
>  struct tegra_drm_client;
>  
>  struct tegra_drm_context {
> diff --git a/drivers/gpu/drm/tegra/uapi.h b/drivers/gpu/drm/tegra/uapi.h
> new file mode 100644
> index 000000000000..5c422607e8fa
> --- /dev/null
> +++ b/drivers/gpu/drm/tegra/uapi.h
> @@ -0,0 +1,63 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/* Copyright (c) 2020 NVIDIA Corporation */
> +
> +#ifndef _TEGRA_DRM_UAPI_H
> +#define _TEGRA_DRM_UAPI_H
> +
> +#include <linux/dma-mapping.h>
> +#include <linux/idr.h>
> +#include <linux/kref.h>
> +#include <linux/xarray.h>
> +
> +#include <drm/drm.h>
> +
> +struct drm_file;
> +struct drm_device;
> +
> +struct tegra_drm_file {
> +	/* Legacy UAPI state */
> +	struct idr legacy_contexts;
> +	struct mutex lock;
> +
> +	/* New UAPI state */
> +	struct xarray contexts;
> +};
> +
> +struct tegra_drm_channel_ctx {
> +	struct tegra_drm_client *client;
> +	struct host1x_channel *channel;
> +	struct xarray mappings;
> +};

This is mostly the same as tegra_drm_context, so can't we just merge the
two? There's going to be slight overlap, but overall things are going to
be less confusing to follow.

Even more so because I think we should consider phasing out the old UAPI
eventually and then we can just remove the unneeded fields from this.

> +
> +struct tegra_drm_mapping {
> +	struct kref ref;
> +
> +	struct device *dev;
> +	struct host1x_bo *bo;
> +	struct sg_table *sgt;
> +	enum dma_data_direction direction;
> +	dma_addr_t iova;
> +	dma_addr_t iova_end;

iova_end seems to never be used. Do we need it?

> +};
> +
> +int tegra_drm_ioctl_channel_open(struct drm_device *drm, void *data,
> +				 struct drm_file *file);
> +int tegra_drm_ioctl_channel_close(struct drm_device *drm, void *data,
> +				  struct drm_file *file);
> +int tegra_drm_ioctl_channel_map(struct drm_device *drm, void *data,
> +				struct drm_file *file);
> +int tegra_drm_ioctl_channel_unmap(struct drm_device *drm, void *data,
> +				struct drm_file *file);
> +int tegra_drm_ioctl_channel_submit(struct drm_device *drm, void *data,
> +				   struct drm_file *file);
> +int tegra_drm_ioctl_gem_create(struct drm_device *drm, void *data,
> +				struct drm_file *file);
> +int tegra_drm_ioctl_gem_mmap(struct drm_device *drm, void *data,
> +				struct drm_file *file);
> +
> +void tegra_drm_uapi_close_file(struct tegra_drm_file *file);
> +void tegra_drm_mapping_put(struct tegra_drm_mapping *mapping);
> +struct tegra_drm_channel_ctx *
> +tegra_drm_channel_ctx_lock(struct tegra_drm_file *file, u32 id);
> +
> +#endif
> diff --git a/drivers/gpu/drm/tegra/uapi/uapi.c b/drivers/gpu/drm/tegra/uapi/uapi.c
> new file mode 100644
> index 000000000000..d503b5e817c4
> --- /dev/null
> +++ b/drivers/gpu/drm/tegra/uapi/uapi.c
> @@ -0,0 +1,307 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/* Copyright (c) 2020 NVIDIA Corporation */
> +
> +#include <linux/host1x.h>
> +#include <linux/iommu.h>
> +#include <linux/list.h>
> +
> +#include <drm/drm_drv.h>
> +#include <drm/drm_file.h>
> +
> +#include "../uapi.h"
> +#include "../drm.h"
> +
> +struct tegra_drm_channel_ctx *
> +tegra_drm_channel_ctx_lock(struct tegra_drm_file *file, u32 id)
> +{
> +	struct tegra_drm_channel_ctx *ctx;
> +
> +	mutex_lock(&file->lock);
> +	ctx = xa_load(&file->contexts, id);
> +	if (!ctx)
> +		mutex_unlock(&file->lock);
> +
> +	return ctx;
> +}

This interface seems slightly odd. Looking at how this is used I see how
doing it this way saves a couple of lines. However, it also makes this
difficult to understand, so I wonder if it wouldn't be better to just
open-code this in the three call sites to make the code flow a bit more
idiomatic.
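
Open-coded at a call site, the pattern would read roughly as follows (a
userspace sketch; the toy mutex and fixed array stand in for the kernel
mutex and xarray, and all names here are hypothetical):

```c
#include <stddef.h>

#define MAX_CONTEXTS 16

/* Toy stand-ins for the kernel's mutex and xarray. */
struct mutex { int held; };
static void mutex_lock(struct mutex *m)   { m->held = 1; }
static void mutex_unlock(struct mutex *m) { m->held = 0; }

struct channel_ctx { unsigned int id; };

struct file_state {
	struct mutex lock;
	struct channel_ctx *contexts[MAX_CONTEXTS];
};

/*
 * Open-coded lookup at the call site: the lock, the lookup, and the
 * unlock on the error path are all visible right here, rather than in
 * a helper that conditionally returns with the lock still held.
 */
static int channel_op(struct file_state *file, unsigned int id)
{
	struct channel_ctx *ctx;

	mutex_lock(&file->lock);

	ctx = (id < MAX_CONTEXTS) ? file->contexts[id] : NULL;
	if (!ctx) {
		mutex_unlock(&file->lock);
		return -1; /* -EINVAL in the kernel */
	}

	/* ... operate on ctx under the lock ... */

	mutex_unlock(&file->lock);
	return 0;
}
```

Either way the lock is released on every path, but the open-coded form
makes that obvious without reading a second function.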

> +
> +static void tegra_drm_mapping_release(struct kref *ref)
> +{
> +	struct tegra_drm_mapping *mapping =
> +		container_of(ref, struct tegra_drm_mapping, ref);
> +
> +	if (mapping->sgt)
> +		dma_unmap_sgtable(mapping->dev, mapping->sgt,
> +				  mapping->direction, DMA_ATTR_SKIP_CPU_SYNC);
> +
> +	host1x_bo_unpin(mapping->dev, mapping->bo, mapping->sgt);
> +	host1x_bo_put(mapping->bo);
> +
> +	kfree(mapping);
> +}
> +
> +void tegra_drm_mapping_put(struct tegra_drm_mapping *mapping)
> +{
> +	kref_put(&mapping->ref, tegra_drm_mapping_release);
> +}
> +
> +static void tegra_drm_channel_ctx_close(struct tegra_drm_channel_ctx *ctx)

Yeah, the more often I read it, the more I'm in favour of just
collapsing tegra_drm_channel_ctx into tegra_drm_context, if for nothing
else than to get rid of that annoying _ctx suffix that's there for no
other reason than to differentiate it from "legacy" contexts.

> +{
> +	unsigned long mapping_id;

It's clear from the context that this is a mapping ID, so I think you
can just leave out the "mapping_" prefix to save a bit on screen space.

> +	struct tegra_drm_mapping *mapping;
> +
> +	xa_for_each(&ctx->mappings, mapping_id, mapping)
> +		tegra_drm_mapping_put(mapping);
> +
> +	xa_destroy(&ctx->mappings);
> +
> +	host1x_channel_put(ctx->channel);
> +
> +	kfree(ctx);
> +}
> +
> +int close_channel_ctx(int id, void *p, void *data)
> +{
> +	struct tegra_drm_channel_ctx *ctx = p;
> +
> +	tegra_drm_channel_ctx_close(ctx);
> +
> +	return 0;
> +}

The signature looked strange, so I went looking for where this is called
from, and it turns out I can't find any place where this is used. Do we
need it?

> +
> +void tegra_drm_uapi_close_file(struct tegra_drm_file *file)
> +{
> +	unsigned long ctx_id;

Just like for mappings above, I think it's fine to leave out the ctx_
prefix here.

> +	struct tegra_drm_channel_ctx *ctx;
> +
> +	xa_for_each(&file->contexts, ctx_id, ctx)
> +		tegra_drm_channel_ctx_close(ctx);
> +
> +	xa_destroy(&file->contexts);
> +}
> +
> +int tegra_drm_ioctl_channel_open(struct drm_device *drm, void *data,
> +				 struct drm_file *file)
> +{
> +	struct tegra_drm_file *fpriv = file->driver_priv;
> +	struct tegra_drm *tegra = drm->dev_private;
> +	struct drm_tegra_channel_open *args = data;
> +	struct tegra_drm_client *client = NULL;
> +	struct tegra_drm_channel_ctx *ctx;
> +	int err;
> +
> +	if (args->flags)
> +		return -EINVAL;
> +
> +	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
> +	if (!ctx)
> +		return -ENOMEM;
> +
> +	err = -ENODEV;
> +	list_for_each_entry(client, &tegra->clients, list) {
> +		if (client->base.class == args->host1x_class) {
> +			err = 0;
> +			break;
> +		}
> +	}
> +	if (err)
> +		goto free_ctx;

This type of construct looks weird. I found that a good way around this
is to split this off into a separate function that does the lookup and
just returns NULL when it doesn't find one, which is very elegant:

	struct tegra_drm_client *tegra_drm_find_client(struct tegra_drm *tegra, u32 class)
	{
		struct tegra_drm_client *client;

		list_for_each_entry(client, &tegra->clients, list)
			if (client->base.class == class)
				return client;

		return NULL;
	}

and then all of a sudden, the very cumbersome construct above becomes
this pretty piece of code:

	client = tegra_drm_find_client(tegra, args->host1x_class);
	if (!client) {
		err = -ENODEV;
		goto free_ctx;
	}

No need for initializing client to NULL or preventatively setting err =
-ENODEV or anything.

> +
> +	if (client->shared_channel) {
> +		ctx->channel = host1x_channel_get(client->shared_channel);
> +	} else {
> +		ctx->channel = host1x_channel_request(&client->base);
> +		if (!ctx->channel) {
> +			err = -EBUSY;

I -EBUSY really appropriate here? Can host1x_channel_request() fail for
other reasons?

> +			goto free_ctx;
> +		}
> +	}
> +
> +	err = xa_alloc(&fpriv->contexts, &args->channel_ctx, ctx,
> +		       XA_LIMIT(1, U32_MAX), GFP_KERNEL);
> +	if (err < 0)
> +		goto put_channel;
> +
> +	ctx->client = client;
> +	xa_init_flags(&ctx->mappings, XA_FLAGS_ALLOC1);
> +
> +	args->hardware_version = client->version;
> +
> +	return 0;
> +
> +put_channel:
> +	host1x_channel_put(ctx->channel);
> +free_ctx:
> +	kfree(ctx);
> +
> +	return err;
> +}
> +
> +int tegra_drm_ioctl_channel_close(struct drm_device *drm, void *data,
> +				  struct drm_file *file)
> +{
> +	struct tegra_drm_file *fpriv = file->driver_priv;
> +	struct drm_tegra_channel_close *args = data;
> +	struct tegra_drm_channel_ctx *ctx;
> +
> +	ctx = tegra_drm_channel_ctx_lock(fpriv, args->channel_ctx);
> +	if (!ctx)
> +		return -EINVAL;
> +
> +	xa_erase(&fpriv->contexts, args->channel_ctx);
> +
> +	mutex_unlock(&fpriv->lock);
> +
> +	tegra_drm_channel_ctx_close(ctx);
> +
> +	return 0;
> +}
> +
> +int tegra_drm_ioctl_channel_map(struct drm_device *drm, void *data,
> +				struct drm_file *file)
> +{
> +	struct tegra_drm_file *fpriv = file->driver_priv;
> +	struct drm_tegra_channel_map *args = data;
> +	struct tegra_drm_channel_ctx *ctx;
> +	struct tegra_drm_mapping *mapping;
> +	struct drm_gem_object *gem;
> +	u32 mapping_id;
> +	int err = 0;
> +
> +	if (args->flags & ~DRM_TEGRA_CHANNEL_MAP_READWRITE)
> +		return -EINVAL;
> +
> +	ctx = tegra_drm_channel_ctx_lock(fpriv, args->channel_ctx);
> +	if (!ctx)
> +		return -EINVAL;
> +
> +	mapping = kzalloc(sizeof(*mapping), GFP_KERNEL);
> +	if (!mapping) {
> +		err = -ENOMEM;
> +		goto unlock;
> +	}
> +
> +	kref_init(&mapping->ref);
> +
> +	gem = drm_gem_object_lookup(file, args->handle);
> +	if (!gem) {
> +		err = -EINVAL;
> +		goto unlock;
> +	}
> +
> +	mapping->dev = ctx->client->base.dev;
> +	mapping->bo = &container_of(gem, struct tegra_bo, gem)->base;

We already have host1x_bo_lookup() in drm.c that you can use to avoid
this strange cast.

> +
> +	if (!iommu_get_domain_for_dev(mapping->dev) ||
> +	    ctx->client->base.group) {

This expression is now used in at least two places, so I wonder if we
should have a helper for it along with some documentation about why this
is the right thing to do. I have a local patch that adds a comment to
the other instance of this because I had forgotten why this was correct,
so I can pick that up and refactor later on.

> +		host1x_bo_pin(mapping->dev, mapping->bo,
> +			      &mapping->iova);
> +	} else {
> +		mapping->direction = DMA_TO_DEVICE;
> +		if (args->flags & DRM_TEGRA_CHANNEL_MAP_READWRITE)
> +			mapping->direction = DMA_BIDIRECTIONAL;
> +
> +		mapping->sgt =
> +			host1x_bo_pin(mapping->dev, mapping->bo, NULL);
> +		if (IS_ERR(mapping->sgt)) {
> +			err = PTR_ERR(mapping->sgt);
> +			goto put_gem;
> +		}
> +
> +		err = dma_map_sgtable(mapping->dev, mapping->sgt,
> +				      mapping->direction,
> +				      DMA_ATTR_SKIP_CPU_SYNC);
> +		if (err)
> +			goto unpin;
> +
> +		/* TODO only map the requested part */
> +		mapping->iova = sg_dma_address(mapping->sgt->sgl);

That comment seems misplaced here since the mapping already happens
above. Also, wouldn't the same TODO apply to the host1x_bo_pin() path in
the if block? Maybe the TODO should be at the top of the function?

Alternatively, if this isn't implemented in this patch anyway, maybe
just drop the comment altogether. In order to implement this, wouldn't
the UAPI have to change as well? In that case it might be better to add
the TODO somewhere in the UAPI header, or in a separate TODO file in the
driver's directory.

> +	}
> +
> +	mapping->iova_end = mapping->iova + gem->size;
> +
> +	mutex_unlock(&fpriv->lock);
> +
> +	err = xa_alloc(&ctx->mappings, &mapping_id, mapping,
> +		       XA_LIMIT(1, U32_MAX), GFP_KERNEL);
> +	if (err < 0)
> +		goto unmap;
> +
> +	args->mapping_id = mapping_id;
> +
> +	return 0;
> +
> +unmap:
> +	if (mapping->sgt) {
> +		dma_unmap_sgtable(mapping->dev, mapping->sgt,
> +				  mapping->direction, DMA_ATTR_SKIP_CPU_SYNC);
> +	}
> +unpin:
> +	host1x_bo_unpin(mapping->dev, mapping->bo, mapping->sgt);
> +put_gem:
> +	drm_gem_object_put(gem);
> +	kfree(mapping);
> +unlock:
> +	mutex_unlock(&fpriv->lock);
> +	return err;
> +}
> +
> +int tegra_drm_ioctl_channel_unmap(struct drm_device *drm, void *data,
> +				  struct drm_file *file)
> +{
> +	struct tegra_drm_file *fpriv = file->driver_priv;
> +	struct drm_tegra_channel_unmap *args = data;
> +	struct tegra_drm_channel_ctx *ctx;
> +	struct tegra_drm_mapping *mapping;
> +
> +	ctx = tegra_drm_channel_ctx_lock(fpriv, args->channel_ctx);
> +	if (!ctx)
> +		return -EINVAL;
> +
> +	mapping = xa_erase(&ctx->mappings, args->mapping_id);
> +
> +	mutex_unlock(&fpriv->lock);
> +
> +	if (mapping) {
> +		tegra_drm_mapping_put(mapping);
> +		return 0;
> +	} else {
> +		return -EINVAL;
> +	}
> +}
> +
> +int tegra_drm_ioctl_gem_create(struct drm_device *drm, void *data,
> +			       struct drm_file *file)
> +{
> +	struct drm_tegra_gem_create *args = data;
> +	struct tegra_bo *bo;
> +
> +	if (args->flags)
> +		return -EINVAL;

I'm not sure it's worth doing this, especially because this is now a new
IOCTL that's actually a subset of the original. I think we should just
keep the original and if we want to deprecate the flags, or replace them
with new ones, let's just try and phase out the deprecated ones.

Thierry

> +
> +	bo = tegra_bo_create_with_handle(file, drm, args->size, args->flags,
> +					 &args->handle);
> +	if (IS_ERR(bo))
> +		return PTR_ERR(bo);
> +
> +	return 0;
> +}
> +
> +int tegra_drm_ioctl_gem_mmap(struct drm_device *drm, void *data,
> +			     struct drm_file *file)
> +{
> +	struct drm_tegra_gem_mmap *args = data;
> +	struct drm_gem_object *gem;
> +	struct tegra_bo *bo;
> +
> +	gem = drm_gem_object_lookup(file, args->handle);
> +	if (!gem)
> +		return -EINVAL;
> +
> +	bo = to_tegra_bo(gem);
> +
> +	args->offset = drm_vma_node_offset_addr(&bo->gem.vma_node);
> +
> +	drm_gem_object_put(gem);
> +
> +	return 0;
> +}
> -- 
> 2.30.0
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 195+ messages in thread

* Re: [PATCH v5 19/21] drm/tegra: Implement new UAPI
@ 2021-03-23 13:25     ` Thierry Reding
  0 siblings, 0 replies; 195+ messages in thread
From: Thierry Reding @ 2021-03-23 13:25 UTC (permalink / raw)
  To: Mikko Perttunen
  Cc: airlied, dri-devel, jonathanh, talho, bhuntsman, linux-tegra, digetx


[-- Attachment #1.1: Type: text/plain, Size: 21556 bytes --]

On Mon, Jan 11, 2021 at 03:00:17PM +0200, Mikko Perttunen wrote:
> Implement the non-submission parts of the new UAPI, including
> channel management and memory mapping. The UAPI is under the
> CONFIG_DRM_TEGRA_STAGING config flag for now.
> 
> Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
> ---
> v5:
> * Set iova_end in both mapping paths
> v4:
> * New patch, split out from combined UAPI + submit patch.
> ---
>  drivers/gpu/drm/tegra/Makefile    |   1 +
>  drivers/gpu/drm/tegra/drm.c       |  41 ++--
>  drivers/gpu/drm/tegra/drm.h       |   5 +
>  drivers/gpu/drm/tegra/uapi.h      |  63 ++++++
>  drivers/gpu/drm/tegra/uapi/uapi.c | 307 ++++++++++++++++++++++++++++++

I'd prefer if we kept the directory structure flat. There's something
like 19 pairs of files in the top-level directory, which is reasonably
manageable. Also, it looks like there's going to be a couple more files
in this new subdirectory. I'd prefer if that was all merged into the
single uapi.c source file to keep things simpler. These are all really
small files, so there's no need to aggressively split things up. Helps
with compilation time, too.

FWIW, I would've been fine with stashing all of this into drm.c as well,
since the rest of the UAPI is in there already. The churn in this patch
is reasonably small, but it would've been even smaller if this was all
in drm.c.

>  5 files changed, 401 insertions(+), 16 deletions(-)
>  create mode 100644 drivers/gpu/drm/tegra/uapi.h
>  create mode 100644 drivers/gpu/drm/tegra/uapi/uapi.c
> 
> diff --git a/drivers/gpu/drm/tegra/Makefile b/drivers/gpu/drm/tegra/Makefile
> index d6cf202414f0..0abdb21b38b9 100644
> --- a/drivers/gpu/drm/tegra/Makefile
> +++ b/drivers/gpu/drm/tegra/Makefile
> @@ -3,6 +3,7 @@ ccflags-$(CONFIG_DRM_TEGRA_DEBUG) += -DDEBUG
>  
>  tegra-drm-y := \
>  	drm.o \
> +	uapi/uapi.o \
>  	gem.o \
>  	fb.o \
>  	dp.o \
> diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
> index afd3f143c5e0..6a51035ce33f 100644
> --- a/drivers/gpu/drm/tegra/drm.c
> +++ b/drivers/gpu/drm/tegra/drm.c
> @@ -20,6 +20,7 @@
>  #include <drm/drm_prime.h>
>  #include <drm/drm_vblank.h>
>  
> +#include "uapi.h"
>  #include "drm.h"
>  #include "gem.h"
>  
> @@ -33,11 +34,6 @@
>  #define CARVEOUT_SZ SZ_64M
>  #define CDMA_GATHER_FETCHES_MAX_NB 16383
>  
> -struct tegra_drm_file {
> -	struct idr contexts;
> -	struct mutex lock;
> -};
> -
>  static int tegra_atomic_check(struct drm_device *drm,
>  			      struct drm_atomic_state *state)
>  {
> @@ -90,7 +86,8 @@ static int tegra_drm_open(struct drm_device *drm, struct drm_file *filp)
>  	if (!fpriv)
>  		return -ENOMEM;
>  
> -	idr_init_base(&fpriv->contexts, 1);
> +	idr_init_base(&fpriv->legacy_contexts, 1);
> +	xa_init_flags(&fpriv->contexts, XA_FLAGS_ALLOC1);
>  	mutex_init(&fpriv->lock);
>  	filp->driver_priv = fpriv;
>  
> @@ -429,7 +426,7 @@ static int tegra_client_open(struct tegra_drm_file *fpriv,
>  	if (err < 0)
>  		return err;
>  
> -	err = idr_alloc(&fpriv->contexts, context, 1, 0, GFP_KERNEL);
> +	err = idr_alloc(&fpriv->legacy_contexts, context, 1, 0, GFP_KERNEL);
>  	if (err < 0) {
>  		client->ops->close_channel(context);
>  		return err;
> @@ -484,13 +481,13 @@ static int tegra_close_channel(struct drm_device *drm, void *data,
>  
>  	mutex_lock(&fpriv->lock);
>  
> -	context = idr_find(&fpriv->contexts, args->context);
> +	context = idr_find(&fpriv->legacy_contexts, args->context);
>  	if (!context) {
>  		err = -EINVAL;
>  		goto unlock;
>  	}
>  
> -	idr_remove(&fpriv->contexts, context->id);
> +	idr_remove(&fpriv->legacy_contexts, context->id);
>  	tegra_drm_context_free(context);
>  
>  unlock:
> @@ -509,7 +506,7 @@ static int tegra_get_syncpt(struct drm_device *drm, void *data,
>  
>  	mutex_lock(&fpriv->lock);
>  
> -	context = idr_find(&fpriv->contexts, args->context);
> +	context = idr_find(&fpriv->legacy_contexts, args->context);
>  	if (!context) {
>  		err = -ENODEV;
>  		goto unlock;
> @@ -538,7 +535,7 @@ static int tegra_submit(struct drm_device *drm, void *data,
>  
>  	mutex_lock(&fpriv->lock);
>  
> -	context = idr_find(&fpriv->contexts, args->context);
> +	context = idr_find(&fpriv->legacy_contexts, args->context);
>  	if (!context) {
>  		err = -ENODEV;
>  		goto unlock;
> @@ -563,7 +560,7 @@ static int tegra_get_syncpt_base(struct drm_device *drm, void *data,
>  
>  	mutex_lock(&fpriv->lock);
>  
> -	context = idr_find(&fpriv->contexts, args->context);
> +	context = idr_find(&fpriv->legacy_contexts, args->context);
>  	if (!context) {
>  		err = -ENODEV;
>  		goto unlock;
> @@ -732,10 +729,21 @@ static int tegra_gem_get_flags(struct drm_device *drm, void *data,
>  
>  static const struct drm_ioctl_desc tegra_drm_ioctls[] = {
>  #ifdef CONFIG_DRM_TEGRA_STAGING
> -	DRM_IOCTL_DEF_DRV(TEGRA_GEM_CREATE, tegra_gem_create,
> +	DRM_IOCTL_DEF_DRV(TEGRA_CHANNEL_OPEN, tegra_drm_ioctl_channel_open,
> +			  DRM_RENDER_ALLOW),
> +	DRM_IOCTL_DEF_DRV(TEGRA_CHANNEL_CLOSE, tegra_drm_ioctl_channel_close,
> +			  DRM_RENDER_ALLOW),

I'd prefer to call these TEGRA_OPEN_CHANNEL and TEGRA_CLOSE_CHANNEL
because I find that easier to think about. My reasoning goes: the TEGRA_
prefix means we're operating in a global context, and then we perform
the OPEN_CHANNEL and CLOSE_CHANNEL operations. By the same reasoning,
TEGRA_CHANNEL_OPEN and TEGRA_CHANNEL_CLOSE suggest we're operating in
the channel context and performing OPEN and CLOSE operations. For close
you could make the argument that it makes sense, but you can't open a
channel that you don't have yet.

And if that doesn't convince you, I think appending _LEGACY here like we
do for CREATE and MMAP would be more consistent. Who's going to remember
which one is new: TEGRA_CHANNEL_OPEN or TEGRA_OPEN_CHANNEL?

> +	DRM_IOCTL_DEF_DRV(TEGRA_CHANNEL_MAP, tegra_drm_ioctl_channel_map,
>  			  DRM_RENDER_ALLOW),
> -	DRM_IOCTL_DEF_DRV(TEGRA_GEM_MMAP, tegra_gem_mmap,
> +	DRM_IOCTL_DEF_DRV(TEGRA_CHANNEL_UNMAP, tegra_drm_ioctl_channel_unmap,
>  			  DRM_RENDER_ALLOW),
> +	DRM_IOCTL_DEF_DRV(TEGRA_GEM_CREATE, tegra_drm_ioctl_gem_create,
> +			  DRM_RENDER_ALLOW),
> +	DRM_IOCTL_DEF_DRV(TEGRA_GEM_MMAP, tegra_drm_ioctl_gem_mmap,
> +			  DRM_RENDER_ALLOW),
> +
> +	DRM_IOCTL_DEF_DRV(TEGRA_GEM_CREATE_LEGACY, tegra_gem_create, DRM_RENDER_ALLOW),
> +	DRM_IOCTL_DEF_DRV(TEGRA_GEM_MMAP_LEGACY, tegra_gem_mmap, DRM_RENDER_ALLOW),
>  	DRM_IOCTL_DEF_DRV(TEGRA_SYNCPT_READ, tegra_syncpt_read,
>  			  DRM_RENDER_ALLOW),
>  	DRM_IOCTL_DEF_DRV(TEGRA_SYNCPT_INCR, tegra_syncpt_incr,
> @@ -789,10 +797,11 @@ static void tegra_drm_postclose(struct drm_device *drm, struct drm_file *file)
>  	struct tegra_drm_file *fpriv = file->driver_priv;
>  
>  	mutex_lock(&fpriv->lock);
> -	idr_for_each(&fpriv->contexts, tegra_drm_context_cleanup, NULL);
> +	idr_for_each(&fpriv->legacy_contexts, tegra_drm_context_cleanup, NULL);
> +	tegra_drm_uapi_close_file(fpriv);
>  	mutex_unlock(&fpriv->lock);
>  
> -	idr_destroy(&fpriv->contexts);
> +	idr_destroy(&fpriv->legacy_contexts);
>  	mutex_destroy(&fpriv->lock);
>  	kfree(fpriv);
>  }
> diff --git a/drivers/gpu/drm/tegra/drm.h b/drivers/gpu/drm/tegra/drm.h
> index 0f38f159aa8e..1af57c2016eb 100644
> --- a/drivers/gpu/drm/tegra/drm.h
> +++ b/drivers/gpu/drm/tegra/drm.h
> @@ -59,6 +59,11 @@ struct tegra_drm {
>  	struct tegra_display_hub *hub;
>  };
>  
> +static inline struct host1x *tegra_drm_to_host1x(struct tegra_drm *tegra)
> +{
> +	return dev_get_drvdata(tegra->drm->dev->parent);
> +}
> +
>  struct tegra_drm_client;
>  
>  struct tegra_drm_context {
> diff --git a/drivers/gpu/drm/tegra/uapi.h b/drivers/gpu/drm/tegra/uapi.h
> new file mode 100644
> index 000000000000..5c422607e8fa
> --- /dev/null
> +++ b/drivers/gpu/drm/tegra/uapi.h
> @@ -0,0 +1,63 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/* Copyright (c) 2020 NVIDIA Corporation */
> +
> +#ifndef _TEGRA_DRM_UAPI_H
> +#define _TEGRA_DRM_UAPI_H
> +
> +#include <linux/dma-mapping.h>
> +#include <linux/idr.h>
> +#include <linux/kref.h>
> +#include <linux/xarray.h>
> +
> +#include <drm/drm.h>
> +
> +struct drm_file;
> +struct drm_device;
> +
> +struct tegra_drm_file {
> +	/* Legacy UAPI state */
> +	struct idr legacy_contexts;
> +	struct mutex lock;
> +
> +	/* New UAPI state */
> +	struct xarray contexts;
> +};
> +
> +struct tegra_drm_channel_ctx {
> +	struct tegra_drm_client *client;
> +	struct host1x_channel *channel;
> +	struct xarray mappings;
> +};

This is mostly the same as tegra_drm_context, so can't we just merge the
two? There's going to be slight overlap, but overall things are going to
be less confusing to follow.

Even more so because I think we should consider phasing out the old UAPI
eventually and then we can just remove the unneeded fields from this.

> +
> +struct tegra_drm_mapping {
> +	struct kref ref;
> +
> +	struct device *dev;
> +	struct host1x_bo *bo;
> +	struct sg_table *sgt;
> +	enum dma_data_direction direction;
> +	dma_addr_t iova;
> +	dma_addr_t iova_end;

iova_end seems to never be used. Do we need it?

> +};
> +
> +int tegra_drm_ioctl_channel_open(struct drm_device *drm, void *data,
> +				 struct drm_file *file);
> +int tegra_drm_ioctl_channel_close(struct drm_device *drm, void *data,
> +				  struct drm_file *file);
> +int tegra_drm_ioctl_channel_map(struct drm_device *drm, void *data,
> +				struct drm_file *file);
> +int tegra_drm_ioctl_channel_unmap(struct drm_device *drm, void *data,
> +				struct drm_file *file);
> +int tegra_drm_ioctl_channel_submit(struct drm_device *drm, void *data,
> +				   struct drm_file *file);
> +int tegra_drm_ioctl_gem_create(struct drm_device *drm, void *data,
> +				struct drm_file *file);
> +int tegra_drm_ioctl_gem_mmap(struct drm_device *drm, void *data,
> +				struct drm_file *file);
> +
> +void tegra_drm_uapi_close_file(struct tegra_drm_file *file);
> +void tegra_drm_mapping_put(struct tegra_drm_mapping *mapping);
> +struct tegra_drm_channel_ctx *
> +tegra_drm_channel_ctx_lock(struct tegra_drm_file *file, u32 id);
> +
> +#endif
> diff --git a/drivers/gpu/drm/tegra/uapi/uapi.c b/drivers/gpu/drm/tegra/uapi/uapi.c
> new file mode 100644
> index 000000000000..d503b5e817c4
> --- /dev/null
> +++ b/drivers/gpu/drm/tegra/uapi/uapi.c
> @@ -0,0 +1,307 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/* Copyright (c) 2020 NVIDIA Corporation */
> +
> +#include <linux/host1x.h>
> +#include <linux/iommu.h>
> +#include <linux/list.h>
> +
> +#include <drm/drm_drv.h>
> +#include <drm/drm_file.h>
> +
> +#include "../uapi.h"
> +#include "../drm.h"
> +
> +struct tegra_drm_channel_ctx *
> +tegra_drm_channel_ctx_lock(struct tegra_drm_file *file, u32 id)
> +{
> +	struct tegra_drm_channel_ctx *ctx;
> +
> +	mutex_lock(&file->lock);
> +	ctx = xa_load(&file->contexts, id);
> +	if (!ctx)
> +		mutex_unlock(&file->lock);
> +
> +	return ctx;
> +}

This interface seems slightly odd. Looking at how it is used, I see how
doing it this way saves a couple of lines. However, it also makes the
code difficult to understand, so I wonder if it wouldn't be better to
just open-code this in the three callsites to make the code flow a bit
more idiomatic.
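For reference, here is what the two styles look like side by side in a
self-contained model (a plain flag and a fixed array stand in for
fpriv->lock and the xarray; every name below is illustrative, not the
driver's):

```c
/* Model of the locking helper versus the open-coded callsite. The
 * helper returns with the "lock" held only on success, which is the
 * asymmetry that makes it hard to follow. */
#include <stddef.h>

#define NUM_CTX 8

struct file_priv {
    int lock_held;               /* models mutex_lock()/mutex_unlock() */
    void *contexts[NUM_CTX];     /* models the xarray */
};

void priv_lock(struct file_priv *f)   { f->lock_held = 1; }
void priv_unlock(struct file_priv *f) { f->lock_held = 0; }

/* Helper as in the patch: the caller must remember that the lock is
 * held if and only if the return value is non-NULL. */
void *ctx_lock(struct file_priv *f, unsigned int id)
{
    void *ctx = NULL;

    priv_lock(f);
    if (id < NUM_CTX)
        ctx = f->contexts[id];
    if (!ctx)
        priv_unlock(f);

    return ctx;
}

/* Open-coded variant: lock and unlock stay paired in one function. */
int ctx_close_open_coded(struct file_priv *f, unsigned int id)
{
    void *ctx;

    priv_lock(f);
    ctx = (id < NUM_CTX) ? f->contexts[id] : NULL;
    if (!ctx) {
        priv_unlock(f);
        return -1;               /* -EINVAL in the driver */
    }
    f->contexts[id] = NULL;      /* xa_erase() */
    priv_unlock(f);

    return 0;
}
```

The helper saves two lines per callsite, but it moves the unlock
responsibility across a function boundary; the open-coded version keeps
the lock's lifetime visible at a glance.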

> +
> +static void tegra_drm_mapping_release(struct kref *ref)
> +{
> +	struct tegra_drm_mapping *mapping =
> +		container_of(ref, struct tegra_drm_mapping, ref);
> +
> +	if (mapping->sgt)
> +		dma_unmap_sgtable(mapping->dev, mapping->sgt,
> +				  mapping->direction, DMA_ATTR_SKIP_CPU_SYNC);
> +
> +	host1x_bo_unpin(mapping->dev, mapping->bo, mapping->sgt);
> +	host1x_bo_put(mapping->bo);
> +
> +	kfree(mapping);
> +}
> +
> +void tegra_drm_mapping_put(struct tegra_drm_mapping *mapping)
> +{
> +	kref_put(&mapping->ref, tegra_drm_mapping_release);
> +}
> +
> +static void tegra_drm_channel_ctx_close(struct tegra_drm_channel_ctx *ctx)

Yeah, the more often I read it, the more I'm in favour of just
collapsing tegra_drm_channel_ctx into tegra_drm_channel, if only to get
rid of that annoying _ctx suffix, which is there for no other reason
than to differentiate it from "legacy" contexts.

> +{
> +	unsigned long mapping_id;

It's clear from the context that this is a mapping ID, so I think you
can just leave out the "mapping_" prefix to save a bit on screen space.

> +	struct tegra_drm_mapping *mapping;
> +
> +	xa_for_each(&ctx->mappings, mapping_id, mapping)
> +		tegra_drm_mapping_put(mapping);
> +
> +	xa_destroy(&ctx->mappings);
> +
> +	host1x_channel_put(ctx->channel);
> +
> +	kfree(ctx);
> +}
> +
> +int close_channel_ctx(int id, void *p, void *data)
> +{
> +	struct tegra_drm_channel_ctx *ctx = p;
> +
> +	tegra_drm_channel_ctx_close(ctx);
> +
> +	return 0;
> +}

The signature looked strange, so I went looking for where this is called
from, and it turns out I can't find any place where it is used. Do we
need it?

> +
> +void tegra_drm_uapi_close_file(struct tegra_drm_file *file)
> +{
> +	unsigned long ctx_id;

Just like for mappings above, I think it's fine to leave out the ctx_
prefix here.

> +	struct tegra_drm_channel_ctx *ctx;
> +
> +	xa_for_each(&file->contexts, ctx_id, ctx)
> +		tegra_drm_channel_ctx_close(ctx);
> +
> +	xa_destroy(&file->contexts);
> +}
> +
> +int tegra_drm_ioctl_channel_open(struct drm_device *drm, void *data,
> +				 struct drm_file *file)
> +{
> +	struct tegra_drm_file *fpriv = file->driver_priv;
> +	struct tegra_drm *tegra = drm->dev_private;
> +	struct drm_tegra_channel_open *args = data;
> +	struct tegra_drm_client *client = NULL;
> +	struct tegra_drm_channel_ctx *ctx;
> +	int err;
> +
> +	if (args->flags)
> +		return -EINVAL;
> +
> +	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
> +	if (!ctx)
> +		return -ENOMEM;
> +
> +	err = -ENODEV;
> +	list_for_each_entry(client, &tegra->clients, list) {
> +		if (client->base.class == args->host1x_class) {
> +			err = 0;
> +			break;
> +		}
> +	}
> +	if (err)
> +		goto free_ctx;

This type of construct looks weird. I found that a good way around this
is to split this off into a separate function that does the lookup and
just returns NULL when it doesn't find one, which is very elegant:

	struct tegra_drm_client *tegra_drm_find_client(struct tegra_drm *tegra, u32 class)
	{
		struct tegra_drm_client *client;

		list_for_each_entry(client, &tegra->clients, list)
			if (client->base.class == class)
				return client;

		return NULL;
	}

and then all of a sudden, the very cumbersome construct above becomes
this pretty piece of code:

	client = tegra_drm_find_client(tegra, args->host1x_class);
	if (!client) {
		err = -ENODEV;
		goto free_ctx;
	}

No need to initialize client to NULL or preemptively set err to
-ENODEV.
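The suggested refactor is easy to sanity-check in isolation; here is a
minimal standalone model (a plain array instead of the kernel's
list_head, hypothetical names throughout):

```c
/* Lookup helper returning the match or NULL, so the caller assigns the
 * error code exactly once instead of pre-seeding it. */
#include <stddef.h>

struct client {
    unsigned int class;
};

struct client *find_client(struct client *clients, size_t count,
                           unsigned int class)
{
    size_t i;

    for (i = 0; i < count; i++)
        if (clients[i].class == class)
            return &clients[i];

    return NULL;
}

/* Caller pattern matching the suggestion: no NULL-initialized iterator
 * variable, no err = -ENODEV set up front. */
int open_channel(struct client *clients, size_t count, unsigned int class)
{
    struct client *client = find_client(clients, count, class);

    if (!client)
        return -19;              /* -ENODEV */

    return 0;
}
```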

> +
> +	if (client->shared_channel) {
> +		ctx->channel = host1x_channel_get(client->shared_channel);
> +	} else {
> +		ctx->channel = host1x_channel_request(&client->base);
> +		if (!ctx->channel) {
> +			err = -EBUSY;

Is -EBUSY really appropriate here? Can host1x_channel_request() fail for
other reasons?

> +			goto free_ctx;
> +		}
> +	}
> +
> +	err = xa_alloc(&fpriv->contexts, &args->channel_ctx, ctx,
> +		       XA_LIMIT(1, U32_MAX), GFP_KERNEL);
> +	if (err < 0)
> +		goto put_channel;
> +
> +	ctx->client = client;
> +	xa_init_flags(&ctx->mappings, XA_FLAGS_ALLOC1);
> +
> +	args->hardware_version = client->version;
> +
> +	return 0;
> +
> +put_channel:
> +	host1x_channel_put(ctx->channel);
> +free_ctx:
> +	kfree(ctx);
> +
> +	return err;
> +}
> +
> +int tegra_drm_ioctl_channel_close(struct drm_device *drm, void *data,
> +				  struct drm_file *file)
> +{
> +	struct tegra_drm_file *fpriv = file->driver_priv;
> +	struct drm_tegra_channel_close *args = data;
> +	struct tegra_drm_channel_ctx *ctx;
> +
> +	ctx = tegra_drm_channel_ctx_lock(fpriv, args->channel_ctx);
> +	if (!ctx)
> +		return -EINVAL;
> +
> +	xa_erase(&fpriv->contexts, args->channel_ctx);
> +
> +	mutex_unlock(&fpriv->lock);
> +
> +	tegra_drm_channel_ctx_close(ctx);
> +
> +	return 0;
> +}
> +
> +int tegra_drm_ioctl_channel_map(struct drm_device *drm, void *data,
> +				struct drm_file *file)
> +{
> +	struct tegra_drm_file *fpriv = file->driver_priv;
> +	struct drm_tegra_channel_map *args = data;
> +	struct tegra_drm_channel_ctx *ctx;
> +	struct tegra_drm_mapping *mapping;
> +	struct drm_gem_object *gem;
> +	u32 mapping_id;
> +	int err = 0;
> +
> +	if (args->flags & ~DRM_TEGRA_CHANNEL_MAP_READWRITE)
> +		return -EINVAL;
> +
> +	ctx = tegra_drm_channel_ctx_lock(fpriv, args->channel_ctx);
> +	if (!ctx)
> +		return -EINVAL;
> +
> +	mapping = kzalloc(sizeof(*mapping), GFP_KERNEL);
> +	if (!mapping) {
> +		err = -ENOMEM;
> +		goto unlock;
> +	}
> +
> +	kref_init(&mapping->ref);
> +
> +	gem = drm_gem_object_lookup(file, args->handle);
> +	if (!gem) {
> +		err = -EINVAL;
> +		goto unlock;
> +	}
> +
> +	mapping->dev = ctx->client->base.dev;
> +	mapping->bo = &container_of(gem, struct tegra_bo, gem)->base;

We already have host1x_bo_lookup() in drm.c that you can use to avoid
this strange cast.
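For readers less familiar with it, the cast in question is the standard
container_of idiom: given a pointer to a member, recover the enclosing
structure. A self-contained illustration (toy types and a generic macro;
only the pointer arithmetic mirrors the patch):

```c
#include <stddef.h>

/* Generic container_of: subtract the member's offset from the member
 * pointer to get back to the enclosing structure. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct gem_object { unsigned long size; };
struct host_bo    { int token; };

/* Toy stand-in for struct tegra_bo, which embeds both objects. */
struct toy_bo {
    struct host_bo base;
    struct gem_object gem;
};

/* Mirrors &container_of(gem, struct tegra_bo, gem)->base from the
 * patch, which a host1x_bo_lookup()-style helper would hide. */
struct host_bo *bo_base_from_gem(struct gem_object *gem)
{
    return &container_of(gem, struct toy_bo, gem)->base;
}
```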

> +
> +	if (!iommu_get_domain_for_dev(mapping->dev) ||
> +	    ctx->client->base.group) {

This expression is now used in at least two places, so I wonder if we
should have a helper for it along with some documentation about why this
is the right thing to do. I have a local patch that adds a comment to
the other instance of this because I had forgotten why this was correct,
so I can pick that up and refactor later on.
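If such a helper materializes, the decision it encodes could be as small
as the following sketch. Note that the rationale in the comment is my
reading of the condition, not documentation from the series, and all
names are made up:

```c
#include <stdbool.h>

enum map_path { MAP_DIRECT_PIN, MAP_DMA_API };

/* Mirrors !iommu_get_domain_for_dev(dev) || client->base.group:
 * with no IOMMU translation, or with a client-managed IOMMU group,
 * host1x_bo_pin() already yields a usable address. Only when the DMA
 * API itself manages the device's IOMMU domain does the driver need
 * dma_map_sgtable() to program it. */
enum map_path choose_map_path(bool has_iommu_domain, bool has_own_group)
{
    if (!has_iommu_domain || has_own_group)
        return MAP_DIRECT_PIN;

    return MAP_DMA_API;
}
```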

> +		host1x_bo_pin(mapping->dev, mapping->bo,
> +			      &mapping->iova);
> +	} else {
> +		mapping->direction = DMA_TO_DEVICE;
> +		if (args->flags & DRM_TEGRA_CHANNEL_MAP_READWRITE)
> +			mapping->direction = DMA_BIDIRECTIONAL;
> +
> +		mapping->sgt =
> +			host1x_bo_pin(mapping->dev, mapping->bo, NULL);
> +		if (IS_ERR(mapping->sgt)) {
> +			err = PTR_ERR(mapping->sgt);
> +			goto put_gem;
> +		}
> +
> +		err = dma_map_sgtable(mapping->dev, mapping->sgt,
> +				      mapping->direction,
> +				      DMA_ATTR_SKIP_CPU_SYNC);
> +		if (err)
> +			goto unpin;
> +
> +		/* TODO only map the requested part */
> +		mapping->iova = sg_dma_address(mapping->sgt->sgl);

That comment seems misplaced here since the mapping already happens
above. Also, wouldn't the same TODO apply to the host1x_bo_pin() path in
the if block? Maybe the TODO should be at the top of the function?

Alternatively, if this isn't implemented in this patch anyway, maybe
just drop the comment altogether. In order to implement this, wouldn't
the UAPI have to change as well? In that case it might be better to add
the TODO somewhere in the UAPI header, or in a separate TODO file in the
driver's directory.

> +	}
> +
> +	mapping->iova_end = mapping->iova + gem->size;
> +
> +	mutex_unlock(&fpriv->lock);
> +
> +	err = xa_alloc(&ctx->mappings, &mapping_id, mapping,
> +		       XA_LIMIT(1, U32_MAX), GFP_KERNEL);
> +	if (err < 0)
> +		goto unmap;
> +
> +	args->mapping_id = mapping_id;
> +
> +	return 0;
> +
> +unmap:
> +	if (mapping->sgt) {
> +		dma_unmap_sgtable(mapping->dev, mapping->sgt,
> +				  mapping->direction, DMA_ATTR_SKIP_CPU_SYNC);
> +	}
> +unpin:
> +	host1x_bo_unpin(mapping->dev, mapping->bo, mapping->sgt);
> +put_gem:
> +	drm_gem_object_put(gem);
> +	kfree(mapping);
> +unlock:
> +	mutex_unlock(&fpriv->lock);
> +	return err;
> +}
> +
> +int tegra_drm_ioctl_channel_unmap(struct drm_device *drm, void *data,
> +				  struct drm_file *file)
> +{
> +	struct tegra_drm_file *fpriv = file->driver_priv;
> +	struct drm_tegra_channel_unmap *args = data;
> +	struct tegra_drm_channel_ctx *ctx;
> +	struct tegra_drm_mapping *mapping;
> +
> +	ctx = tegra_drm_channel_ctx_lock(fpriv, args->channel_ctx);
> +	if (!ctx)
> +		return -EINVAL;
> +
> +	mapping = xa_erase(&ctx->mappings, args->mapping_id);
> +
> +	mutex_unlock(&fpriv->lock);
> +
> +	if (mapping) {
> +		tegra_drm_mapping_put(mapping);
> +		return 0;
> +	} else {
> +		return -EINVAL;
> +	}
> +}
> +
> +int tegra_drm_ioctl_gem_create(struct drm_device *drm, void *data,
> +			       struct drm_file *file)
> +{
> +	struct drm_tegra_gem_create *args = data;
> +	struct tegra_bo *bo;
> +
> +	if (args->flags)
> +		return -EINVAL;

I'm not sure it's worth doing this, especially because this is now a new
IOCTL that's actually a subset of the original. I think we should just
keep the original and if we want to deprecate the flags, or replace them
with new ones, let's just try and phase out the deprecated ones.

Thierry

> +
> +	bo = tegra_bo_create_with_handle(file, drm, args->size, args->flags,
> +					 &args->handle);
> +	if (IS_ERR(bo))
> +		return PTR_ERR(bo);
> +
> +	return 0;
> +}
> +
> +int tegra_drm_ioctl_gem_mmap(struct drm_device *drm, void *data,
> +			     struct drm_file *file)
> +{
> +	struct drm_tegra_gem_mmap *args = data;
> +	struct drm_gem_object *gem;
> +	struct tegra_bo *bo;
> +
> +	gem = drm_gem_object_lookup(file, args->handle);
> +	if (!gem)
> +		return -EINVAL;
> +
> +	bo = to_tegra_bo(gem);
> +
> +	args->offset = drm_vma_node_offset_addr(&bo->gem.vma_node);
> +
> +	drm_gem_object_put(gem);
> +
> +	return 0;
> +}
> -- 
> 2.30.0
> 

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 195+ messages in thread

* Re: [PATCH v5 20/21] drm/tegra: Implement job submission part of new UAPI
  2021-01-11 13:00   ` Mikko Perttunen
@ 2021-03-23 13:38     ` Thierry Reding
  -1 siblings, 0 replies; 195+ messages in thread
From: Thierry Reding @ 2021-03-23 13:38 UTC (permalink / raw)
  To: Mikko Perttunen
  Cc: jonathanh, digetx, airlied, daniel, linux-tegra, talho,
	bhuntsman, dri-devel

[-- Attachment #1: Type: text/plain, Size: 10608 bytes --]

On Mon, Jan 11, 2021 at 03:00:18PM +0200, Mikko Perttunen wrote:
> Implement the job submission IOCTL with a minimum feature set.
> 
> Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
> ---
> v5:
> * Add 16K size limit to copies from userspace.
> * Guard RELOC_BLOCKLINEAR flag handling to only exist in ARM64
>   to prevent oversized shift on 32-bit platforms.
> v4:
> * Remove all features that are not strictly necessary.
> * Split into two patches.
> v3:
> * Remove WRITE_RELOC. Relocations are now patched implicitly
>   when patching is needed.
> * Directly call PM runtime APIs on devices instead of using
>   power_on/power_off callbacks.
> * Remove incorrect mutex unlock in tegra_drm_ioctl_channel_open
> * Use XA_FLAGS_ALLOC1 instead of XA_FLAGS_ALLOC
> * Accommodate for removal of timeout field and inlining of
>   syncpt_incrs array.
> * Copy entire user arrays at a time instead of going through
>   elements one-by-one.
> * Implement waiting of DMA reservations.
> * Split out gather_bo implementation into a separate file.
> * Fix length parameter passed to sg_init_one in gather_bo
> * Cosmetic cleanup.
> ---
>  drivers/gpu/drm/tegra/Makefile         |   2 +
>  drivers/gpu/drm/tegra/drm.c            |   2 +
>  drivers/gpu/drm/tegra/uapi/gather_bo.c |  86 +++++
>  drivers/gpu/drm/tegra/uapi/gather_bo.h |  22 ++
>  drivers/gpu/drm/tegra/uapi/submit.c    | 428 +++++++++++++++++++++++++
>  drivers/gpu/drm/tegra/uapi/submit.h    |  17 +
>  6 files changed, 557 insertions(+)
>  create mode 100644 drivers/gpu/drm/tegra/uapi/gather_bo.c
>  create mode 100644 drivers/gpu/drm/tegra/uapi/gather_bo.h
>  create mode 100644 drivers/gpu/drm/tegra/uapi/submit.c
>  create mode 100644 drivers/gpu/drm/tegra/uapi/submit.h
> 
> diff --git a/drivers/gpu/drm/tegra/Makefile b/drivers/gpu/drm/tegra/Makefile
> index 0abdb21b38b9..059322e88943 100644
> --- a/drivers/gpu/drm/tegra/Makefile
> +++ b/drivers/gpu/drm/tegra/Makefile
> @@ -4,6 +4,8 @@ ccflags-$(CONFIG_DRM_TEGRA_DEBUG) += -DDEBUG
>  tegra-drm-y := \
>  	drm.o \
>  	uapi/uapi.o \
> +	uapi/submit.o \
> +	uapi/gather_bo.o \
>  	gem.o \
>  	fb.o \
>  	dp.o \
> diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
> index 6a51035ce33f..60eab403ae9b 100644
> --- a/drivers/gpu/drm/tegra/drm.c
> +++ b/drivers/gpu/drm/tegra/drm.c
> @@ -737,6 +737,8 @@ static const struct drm_ioctl_desc tegra_drm_ioctls[] = {
>  			  DRM_RENDER_ALLOW),
>  	DRM_IOCTL_DEF_DRV(TEGRA_CHANNEL_UNMAP, tegra_drm_ioctl_channel_unmap,
>  			  DRM_RENDER_ALLOW),
> +	DRM_IOCTL_DEF_DRV(TEGRA_CHANNEL_SUBMIT, tegra_drm_ioctl_channel_submit,
> +			  DRM_RENDER_ALLOW),
>  	DRM_IOCTL_DEF_DRV(TEGRA_GEM_CREATE, tegra_drm_ioctl_gem_create,
>  			  DRM_RENDER_ALLOW),
>  	DRM_IOCTL_DEF_DRV(TEGRA_GEM_MMAP, tegra_drm_ioctl_gem_mmap,
> diff --git a/drivers/gpu/drm/tegra/uapi/gather_bo.c b/drivers/gpu/drm/tegra/uapi/gather_bo.c
> new file mode 100644
> index 000000000000..b487a0d44648
> --- /dev/null
> +++ b/drivers/gpu/drm/tegra/uapi/gather_bo.c
> @@ -0,0 +1,86 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/* Copyright (c) 2020 NVIDIA Corporation */
> +
> +#include <linux/scatterlist.h>
> +#include <linux/slab.h>
> +
> +#include "gather_bo.h"
> +
> +static struct host1x_bo *gather_bo_get(struct host1x_bo *host_bo)
> +{
> +	struct gather_bo *bo = container_of(host_bo, struct gather_bo, base);
> +
> +	kref_get(&bo->ref);
> +
> +	return host_bo;
> +}
> +
> +static void gather_bo_release(struct kref *ref)
> +{
> +	struct gather_bo *bo = container_of(ref, struct gather_bo, ref);
> +
> +	kfree(bo->gather_data);
> +	kfree(bo);
> +}
> +
> +void gather_bo_put(struct host1x_bo *host_bo)
> +{
> +	struct gather_bo *bo = container_of(host_bo, struct gather_bo, base);
> +
> +	kref_put(&bo->ref, gather_bo_release);
> +}
> +
> +static struct sg_table *
> +gather_bo_pin(struct device *dev, struct host1x_bo *host_bo, dma_addr_t *phys)
> +{
> +	struct gather_bo *bo = container_of(host_bo, struct gather_bo, base);
> +	struct sg_table *sgt;
> +	int err;
> +
> +	if (phys) {
> +		*phys = virt_to_phys(bo->gather_data);
> +		return NULL;
> +	}
> +
> +	sgt = kzalloc(sizeof(*sgt), GFP_KERNEL);
> +	if (!sgt)
> +		return ERR_PTR(-ENOMEM);
> +
> +	err = sg_alloc_table(sgt, 1, GFP_KERNEL);
> +	if (err) {
> +		kfree(sgt);
> +		return ERR_PTR(err);
> +	}
> +
> +	sg_init_one(sgt->sgl, bo->gather_data, bo->gather_data_words*4);
> +
> +	return sgt;
> +}
> +
> +static void gather_bo_unpin(struct device *dev, struct sg_table *sgt)
> +{
> +	if (sgt) {
> +		sg_free_table(sgt);
> +		kfree(sgt);
> +	}
> +}
> +
> +static void *gather_bo_mmap(struct host1x_bo *host_bo)
> +{
> +	struct gather_bo *bo = container_of(host_bo, struct gather_bo, base);
> +
> +	return bo->gather_data;
> +}
> +
> +static void gather_bo_munmap(struct host1x_bo *host_bo, void *addr)
> +{
> +}
> +
> +const struct host1x_bo_ops gather_bo_ops = {
> +	.get = gather_bo_get,
> +	.put = gather_bo_put,
> +	.pin = gather_bo_pin,
> +	.unpin = gather_bo_unpin,
> +	.mmap = gather_bo_mmap,
> +	.munmap = gather_bo_munmap,
> +};
> diff --git a/drivers/gpu/drm/tegra/uapi/gather_bo.h b/drivers/gpu/drm/tegra/uapi/gather_bo.h
> new file mode 100644
> index 000000000000..6b4c9d83ac91
> --- /dev/null
> +++ b/drivers/gpu/drm/tegra/uapi/gather_bo.h
> @@ -0,0 +1,22 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/* Copyright (c) 2020 NVIDIA Corporation */
> +
> +#ifndef _TEGRA_DRM_SUBMIT_GATHER_BO_H
> +#define _TEGRA_DRM_SUBMIT_GATHER_BO_H
> +
> +#include <linux/host1x.h>
> +#include <linux/kref.h>
> +
> +struct gather_bo {
> +	struct host1x_bo base;
> +
> +	struct kref ref;
> +
> +	u32 *gather_data;
> +	size_t gather_data_words;
> +};
> +
> +extern const struct host1x_bo_ops gather_bo_ops;
> +void gather_bo_put(struct host1x_bo *host_bo);
> +
> +#endif
> diff --git a/drivers/gpu/drm/tegra/uapi/submit.c b/drivers/gpu/drm/tegra/uapi/submit.c
> new file mode 100644
> index 000000000000..398be3065e21
> --- /dev/null
> +++ b/drivers/gpu/drm/tegra/uapi/submit.c
> @@ -0,0 +1,428 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/* Copyright (c) 2020 NVIDIA Corporation */
> +
> +#include <linux/dma-fence-array.h>
> +#include <linux/file.h>
> +#include <linux/host1x.h>
> +#include <linux/iommu.h>
> +#include <linux/kref.h>
> +#include <linux/list.h>
> +#include <linux/nospec.h>
> +#include <linux/pm_runtime.h>
> +#include <linux/sync_file.h>
> +
> +#include <drm/drm_drv.h>
> +#include <drm/drm_file.h>
> +
> +#include "../uapi.h"
> +#include "../drm.h"
> +#include "../gem.h"
> +
> +#include "gather_bo.h"
> +#include "submit.h"
> +
> +static struct tegra_drm_mapping *
> +tegra_drm_mapping_get(struct tegra_drm_channel_ctx *ctx, u32 id)
> +{
> +	struct tegra_drm_mapping *mapping;
> +
> +	xa_lock(&ctx->mappings);
> +	mapping = xa_load(&ctx->mappings, id);
> +	if (mapping)
> +		kref_get(&mapping->ref);
> +	xa_unlock(&ctx->mappings);
> +
> +	return mapping;
> +}
> +
> +static void *alloc_copy_user_array(void __user *from, size_t count, size_t size)
> +{
> +	unsigned long copy_err;
> +	size_t copy_len;
> +	void *data;
> +
> +	if (check_mul_overflow(count, size, &copy_len))
> +		return ERR_PTR(-EINVAL);
> +
> +	if (copy_len > 0x4000)
> +		return ERR_PTR(-E2BIG);
> +
> +	data = kvmalloc(copy_len, GFP_KERNEL);
> +	if (!data)
> +		return ERR_PTR(-ENOMEM);
> +
> +	copy_err = copy_from_user(data, from, copy_len);
> +	if (copy_err) {
> +		kvfree(data);
> +		return ERR_PTR(-EFAULT);
> +	}
> +
> +	return data;
> +}
> +
> +static int submit_copy_gather_data(struct drm_device *drm,
> +				   struct gather_bo **pbo,
> +				   struct drm_tegra_channel_submit *args)
> +{
> +	unsigned long copy_err;
> +	struct gather_bo *bo;
> +	size_t copy_len;
> +
> +	if (args->gather_data_words == 0) {
> +		drm_info(drm, "gather_data_words can't be 0");
> +		return -EINVAL;
> +	}
> +
> +	if (check_mul_overflow((size_t)args->gather_data_words, (size_t)4, &copy_len))
> +		return -EINVAL;
> +
> +	bo = kzalloc(sizeof(*bo), GFP_KERNEL);
> +	if (!bo)
> +		return -ENOMEM;
> +
> +	kref_init(&bo->ref);
> +	host1x_bo_init(&bo->base, &gather_bo_ops);
> +
> +	bo->gather_data = kmalloc(copy_len, GFP_KERNEL | __GFP_NOWARN);
> +	if (!bo->gather_data) {
> +		kfree(bo);
> +		return -ENOMEM;
> +	}
> +
> +	copy_err = copy_from_user(bo->gather_data,
> +				  u64_to_user_ptr(args->gather_data_ptr),
> +				  copy_len);
> +	if (copy_err) {
> +		kfree(bo->gather_data);
> +		kfree(bo);
> +		return -EFAULT;
> +	}
> +
> +	bo->gather_data_words = args->gather_data_words;
> +
> +	*pbo = bo;
> +
> +	return 0;
> +}
> +
> +static int submit_write_reloc(struct gather_bo *bo,
> +			      struct drm_tegra_submit_buf *buf,
> +			      struct tegra_drm_mapping *mapping)
> +{
> +	/* TODO check that target_offset is within bounds */
> +	dma_addr_t iova = mapping->iova + buf->reloc.target_offset;
> +	u32 written_ptr = (u32)(iova >> buf->reloc.shift);
> +
> +#ifdef CONFIG_ARM64
> +	if (buf->flags & DRM_TEGRA_SUBMIT_BUF_RELOC_BLOCKLINEAR)
> +		written_ptr |= BIT(39);
> +#endif

Sorry, but this still isn't correct. written_ptr is still only 32-bits
wide, so your BIT(39) is going to get discarded even on 64-bit ARM. The
idiomatic way to do this is to make written_ptr dma_addr_t and use a
CONFIG_ARCH_DMA_ADDR_T_64BIT guard.

But even with that this looks wrong because you're OR'ing this in after
shifting by buf->reloc.shift. Doesn't that OR it in at the wrong offset?
Should you perhaps be doing this instead:

	#ifdef CONFIG_ARCH_DMA_ADDR_T_64BIT
		if (buf->flags & DRM_TEGRA_SUBMIT_BUF_RELOC_BLOCKLINEAR)
			iova |= BIT(39);
	#endif

	written_ptr = (u32)(iova >> buf->reloc.shift);

?

Also, on a side-note: BLOCKLINEAR really isn't the right term here. I
recently dealt with this for display (though I haven't sent out that
patch yet) and this is actually a bit that selects which sector layout
swizzling is being applied. That's independent of block linear format
and I think you can have different sector layouts irrespective of the
block linear format (though I don't think that's usually done).

That said, I wonder if a better interface here would be to reuse format
modifiers here. That would allow us to more fully describe the format of
a surface in case we ever need it, and it already includes the sector
layout information as well.

Thierry

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 195+ messages in thread

* Re: [PATCH v5 15/21] drm/tegra: Add new UAPI to header
  2021-03-23 12:30             ` Thierry Reding
@ 2021-03-23 14:00               ` Dmitry Osipenko
  -1 siblings, 0 replies; 195+ messages in thread
From: Dmitry Osipenko @ 2021-03-23 14:00 UTC (permalink / raw)
  To: Thierry Reding, Mikko Perttunen
  Cc: Mikko Perttunen, jonathanh, airlied, daniel, linux-tegra, talho,
	bhuntsman, dri-devel

23.03.2021 15:30, Thierry Reding wrote:
> On Thu, Jan 14, 2021 at 12:34:22PM +0200, Mikko Perttunen wrote:
>> On 1/14/21 10:36 AM, Dmitry Osipenko wrote:
>>> 13.01.2021 21:56, Mikko Perttunen wrote:
>>>> On 1/13/21 8:14 PM, Dmitry Osipenko wrote:
>>>>> 11.01.2021 16:00, Mikko Perttunen wrote:
>>>>>> +struct drm_tegra_submit_buf {
>>>>>> +    /**
>>>>>> +     * @mapping_id: [in]
>>>>>> +     *
>>>>>> +     * Identifier of the mapping to use in the submission.
>>>>>> +     */
>>>>>> +    __u32 mapping_id;
>>>>>
>>>>> I'm now in the process of trying out the UAPI using the grate drivers,
>>>>> and this is the first obstacle.
>>>>>
>>>>> Looks like this is not going to work well for older Tegra SoCs, in
>>>>> particular for T20, which has a small GART.
>>>>>
>>>>> Given that the usefulness of the partial mapping feature is very
>>>>> questionable until it is proven with a real userspace, we should
>>>>> start with dynamic mappings that are done at the time of job submission.
>>>>>
>>>>> DRM should already have everything necessary for creating and managing
>>>>> caches of mappings; the grate kernel driver has been using drm_mm_scan for a
>>>>> long time now for that.
>>>>>
>>>>> It should be fine to support the static mapping feature, but it should
>>>>> be done separately with the drm_mm integration, IMO.
>>>>>
>>>>> What do you think?
>>>>>
>>>>
>>>> Can you elaborate on the requirements to be able to use GART? Are there
>>>> any other reasons this would not work on older chips?
>>>
>>> We have all DRM devices in a single address space on T30+, hence having
>>> duplicated mappings for each device would be a bit wasteful.
>>
>> I guess this should be pretty easy to change to only keep one mapping per
>> GEM object.
> 
> The important point here is the semantics: this IOCTL establishes a
> mapping for a given GEM object on a given channel. If the underlying
> implementation is such that the mapping doesn't fit into the GART, then
> that's an implementation detail that the driver needs to take care of.
> Similarly, if multiple devices share a single address space, that's
> something the driver already knows and can take advantage of by simply
> reusing an existing mapping if one already exists. In both cases the
> semantics would be correctly implemented and that's really all that
> matters.
> 
> Overall this interface seems sound from a high-level point of view and
> allows these mappings to be properly created even for the cases we have
> where each channel may have a separate address space. It may not be the
> optimal interface for all use-cases or any one individual case, but the
> very nature of these interfaces is to abstract away certain differences
> in order to provide a unified interface to a common programming model.
> So there will always be certain tradeoffs.

For now this IOCTL isn't useful from a userspace perspective on older
SoCs, and I'll need to add a lot of code that won't do anything useful
just to conform to the specific needs of the newer SoCs. Trying to unify
everything into a single API doesn't sound like a good idea at this
point, and I've already suggested to Mikko trying out a variant with
separate per-SoC code paths in the next version; the mappings could then
be handled separately by the T186+ paths.

^ permalink raw reply	[flat|nested] 195+ messages in thread

* Re: [PATCH v5 20/21] drm/tegra: Implement job submission part of new UAPI
  2021-03-23 13:38     ` Thierry Reding
@ 2021-03-23 14:16       ` Mikko Perttunen
  -1 siblings, 0 replies; 195+ messages in thread
From: Mikko Perttunen @ 2021-03-23 14:16 UTC (permalink / raw)
  To: Thierry Reding, Mikko Perttunen
  Cc: jonathanh, digetx, airlied, daniel, linux-tegra, talho,
	bhuntsman, dri-devel

On 3/23/21 3:38 PM, Thierry Reding wrote:
> On Mon, Jan 11, 2021 at 03:00:18PM +0200, Mikko Perttunen wrote:
>> Implement the job submission IOCTL with a minimum feature set.
>>
>> Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
>> ---
>> v5:
>> * Add 16K size limit to copies from userspace.
>> * Guard RELOC_BLOCKLINEAR flag handling to only exist in ARM64
>>    to prevent oversized shift on 32-bit platforms.
>> v4:
>> * Remove all features that are not strictly necessary.
>> * Split into two patches.
>> v3:
>> * Remove WRITE_RELOC. Relocations are now patched implicitly
>>    when patching is needed.
>> * Directly call PM runtime APIs on devices instead of using
>>    power_on/power_off callbacks.
>> * Remove incorrect mutex unlock in tegra_drm_ioctl_channel_open
>> * Use XA_FLAGS_ALLOC1 instead of XA_FLAGS_ALLOC
>> * Accommodate for removal of timeout field and inlining of
>>    syncpt_incrs array.
>> * Copy entire user arrays at a time instead of going through
>>    elements one-by-one.
>> * Implement waiting of DMA reservations.
>> * Split out gather_bo implementation into a separate file.
>> * Fix length parameter passed to sg_init_one in gather_bo
>> * Cosmetic cleanup.
>> ---
>>   drivers/gpu/drm/tegra/Makefile         |   2 +
>>   drivers/gpu/drm/tegra/drm.c            |   2 +
>>   drivers/gpu/drm/tegra/uapi/gather_bo.c |  86 +++++
>>   drivers/gpu/drm/tegra/uapi/gather_bo.h |  22 ++
>>   drivers/gpu/drm/tegra/uapi/submit.c    | 428 +++++++++++++++++++++++++
>>   drivers/gpu/drm/tegra/uapi/submit.h    |  17 +
>>   6 files changed, 557 insertions(+)
>>   create mode 100644 drivers/gpu/drm/tegra/uapi/gather_bo.c
>>   create mode 100644 drivers/gpu/drm/tegra/uapi/gather_bo.h
>>   create mode 100644 drivers/gpu/drm/tegra/uapi/submit.c
>>   create mode 100644 drivers/gpu/drm/tegra/uapi/submit.h
>>
>> diff --git a/drivers/gpu/drm/tegra/Makefile b/drivers/gpu/drm/tegra/Makefile
>> index 0abdb21b38b9..059322e88943 100644
>> --- a/drivers/gpu/drm/tegra/Makefile
>> +++ b/drivers/gpu/drm/tegra/Makefile
>> @@ -4,6 +4,8 @@ ccflags-$(CONFIG_DRM_TEGRA_DEBUG) += -DDEBUG
>>   tegra-drm-y := \
>>   	drm.o \
>>   	uapi/uapi.o \
>> +	uapi/submit.o \
>> +	uapi/gather_bo.o \
>>   	gem.o \
>>   	fb.o \
>>   	dp.o \
* Re: [PATCH v5 20/21] drm/tegra: Implement job submission part of new UAPI
@ 2021-03-23 14:16       ` Mikko Perttunen
  0 siblings, 0 replies; 195+ messages in thread
From: Mikko Perttunen @ 2021-03-23 14:16 UTC (permalink / raw)
  To: Thierry Reding, Mikko Perttunen
  Cc: airlied, dri-devel, jonathanh, talho, bhuntsman, linux-tegra, digetx

On 3/23/21 3:38 PM, Thierry Reding wrote:
> On Mon, Jan 11, 2021 at 03:00:18PM +0200, Mikko Perttunen wrote:
>> Implement the job submission IOCTL with a minimum feature set.
>>
>> Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
>> ---
>> v5:
>> * Add 16K size limit to copies from userspace.
>> * Guard RELOC_BLOCKLINEAR flag handling to only exist in ARM64
>>    to prevent oversized shift on 32-bit platforms.
>> v4:
>> * Remove all features that are not strictly necessary.
>> * Split into two patches.
>> v3:
>> * Remove WRITE_RELOC. Relocations are now patched implicitly
>>    when patching is needed.
>> * Directly call PM runtime APIs on devices instead of using
>>    power_on/power_off callbacks.
>> * Remove incorrect mutex unlock in tegra_drm_ioctl_channel_open
>> * Use XA_FLAGS_ALLOC1 instead of XA_FLAGS_ALLOC
>> * Accommodate for removal of timeout field and inlining of
>>    syncpt_incrs array.
>> * Copy entire user arrays at a time instead of going through
>>    elements one-by-one.
>> * Implement waiting of DMA reservations.
>> * Split out gather_bo implementation into a separate file.
>> * Fix length parameter passed to sg_init_one in gather_bo
>> * Cosmetic cleanup.
>> ---
>>   drivers/gpu/drm/tegra/Makefile         |   2 +
>>   drivers/gpu/drm/tegra/drm.c            |   2 +
>>   drivers/gpu/drm/tegra/uapi/gather_bo.c |  86 +++++
>>   drivers/gpu/drm/tegra/uapi/gather_bo.h |  22 ++
>>   drivers/gpu/drm/tegra/uapi/submit.c    | 428 +++++++++++++++++++++++++
>>   drivers/gpu/drm/tegra/uapi/submit.h    |  17 +
>>   6 files changed, 557 insertions(+)
>>   create mode 100644 drivers/gpu/drm/tegra/uapi/gather_bo.c
>>   create mode 100644 drivers/gpu/drm/tegra/uapi/gather_bo.h
>>   create mode 100644 drivers/gpu/drm/tegra/uapi/submit.c
>>   create mode 100644 drivers/gpu/drm/tegra/uapi/submit.h
>>
>> diff --git a/drivers/gpu/drm/tegra/Makefile b/drivers/gpu/drm/tegra/Makefile
>> index 0abdb21b38b9..059322e88943 100644
>> --- a/drivers/gpu/drm/tegra/Makefile
>> +++ b/drivers/gpu/drm/tegra/Makefile
>> @@ -4,6 +4,8 @@ ccflags-$(CONFIG_DRM_TEGRA_DEBUG) += -DDEBUG
>>   tegra-drm-y := \
>>   	drm.o \
>>   	uapi/uapi.o \
>> +	uapi/submit.o \
>> +	uapi/gather_bo.o \
>>   	gem.o \
>>   	fb.o \
>>   	dp.o \
>> diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
>> index 6a51035ce33f..60eab403ae9b 100644
>> --- a/drivers/gpu/drm/tegra/drm.c
>> +++ b/drivers/gpu/drm/tegra/drm.c
>> @@ -737,6 +737,8 @@ static const struct drm_ioctl_desc tegra_drm_ioctls[] = {
>>   			  DRM_RENDER_ALLOW),
>>   	DRM_IOCTL_DEF_DRV(TEGRA_CHANNEL_UNMAP, tegra_drm_ioctl_channel_unmap,
>>   			  DRM_RENDER_ALLOW),
>> +	DRM_IOCTL_DEF_DRV(TEGRA_CHANNEL_SUBMIT, tegra_drm_ioctl_channel_submit,
>> +			  DRM_RENDER_ALLOW),
>>   	DRM_IOCTL_DEF_DRV(TEGRA_GEM_CREATE, tegra_drm_ioctl_gem_create,
>>   			  DRM_RENDER_ALLOW),
>>   	DRM_IOCTL_DEF_DRV(TEGRA_GEM_MMAP, tegra_drm_ioctl_gem_mmap,
>> diff --git a/drivers/gpu/drm/tegra/uapi/gather_bo.c b/drivers/gpu/drm/tegra/uapi/gather_bo.c
>> new file mode 100644
>> index 000000000000..b487a0d44648
>> --- /dev/null
>> +++ b/drivers/gpu/drm/tegra/uapi/gather_bo.c
>> @@ -0,0 +1,86 @@
>> +// SPDX-License-Identifier: GPL-2.0-only
>> +/* Copyright (c) 2020 NVIDIA Corporation */
>> +
>> +#include <linux/scatterlist.h>
>> +#include <linux/slab.h>
>> +
>> +#include "gather_bo.h"
>> +
>> +static struct host1x_bo *gather_bo_get(struct host1x_bo *host_bo)
>> +{
>> +	struct gather_bo *bo = container_of(host_bo, struct gather_bo, base);
>> +
>> +	kref_get(&bo->ref);
>> +
>> +	return host_bo;
>> +}
>> +
>> +static void gather_bo_release(struct kref *ref)
>> +{
>> +	struct gather_bo *bo = container_of(ref, struct gather_bo, ref);
>> +
>> +	kfree(bo->gather_data);
>> +	kfree(bo);
>> +}
>> +
>> +void gather_bo_put(struct host1x_bo *host_bo)
>> +{
>> +	struct gather_bo *bo = container_of(host_bo, struct gather_bo, base);
>> +
>> +	kref_put(&bo->ref, gather_bo_release);
>> +}
>> +
>> +static struct sg_table *
>> +gather_bo_pin(struct device *dev, struct host1x_bo *host_bo, dma_addr_t *phys)
>> +{
>> +	struct gather_bo *bo = container_of(host_bo, struct gather_bo, base);
>> +	struct sg_table *sgt;
>> +	int err;
>> +
>> +	if (phys) {
>> +		*phys = virt_to_phys(bo->gather_data);
>> +		return NULL;
>> +	}
>> +
>> +	sgt = kzalloc(sizeof(*sgt), GFP_KERNEL);
>> +	if (!sgt)
>> +		return ERR_PTR(-ENOMEM);
>> +
>> +	err = sg_alloc_table(sgt, 1, GFP_KERNEL);
>> +	if (err) {
>> +		kfree(sgt);
>> +		return ERR_PTR(err);
>> +	}
>> +
>> +	sg_init_one(sgt->sgl, bo->gather_data, bo->gather_data_words*4);
>> +
>> +	return sgt;
>> +}
>> +
>> +static void gather_bo_unpin(struct device *dev, struct sg_table *sgt)
>> +{
>> +	if (sgt) {
>> +		sg_free_table(sgt);
>> +		kfree(sgt);
>> +	}
>> +}
>> +
>> +static void *gather_bo_mmap(struct host1x_bo *host_bo)
>> +{
>> +	struct gather_bo *bo = container_of(host_bo, struct gather_bo, base);
>> +
>> +	return bo->gather_data;
>> +}
>> +
>> +static void gather_bo_munmap(struct host1x_bo *host_bo, void *addr)
>> +{
>> +}
>> +
>> +const struct host1x_bo_ops gather_bo_ops = {
>> +	.get = gather_bo_get,
>> +	.put = gather_bo_put,
>> +	.pin = gather_bo_pin,
>> +	.unpin = gather_bo_unpin,
>> +	.mmap = gather_bo_mmap,
>> +	.munmap = gather_bo_munmap,
>> +};
>> diff --git a/drivers/gpu/drm/tegra/uapi/gather_bo.h b/drivers/gpu/drm/tegra/uapi/gather_bo.h
>> new file mode 100644
>> index 000000000000..6b4c9d83ac91
>> --- /dev/null
>> +++ b/drivers/gpu/drm/tegra/uapi/gather_bo.h
>> @@ -0,0 +1,22 @@
>> +/* SPDX-License-Identifier: GPL-2.0-only */
>> +/* Copyright (c) 2020 NVIDIA Corporation */
>> +
>> +#ifndef _TEGRA_DRM_SUBMIT_GATHER_BO_H
>> +#define _TEGRA_DRM_SUBMIT_GATHER_BO_H
>> +
>> +#include <linux/host1x.h>
>> +#include <linux/kref.h>
>> +
>> +struct gather_bo {
>> +	struct host1x_bo base;
>> +
>> +	struct kref ref;
>> +
>> +	u32 *gather_data;
>> +	size_t gather_data_words;
>> +};
>> +
>> +extern const struct host1x_bo_ops gather_bo_ops;
>> +void gather_bo_put(struct host1x_bo *host_bo);
>> +
>> +#endif
>> diff --git a/drivers/gpu/drm/tegra/uapi/submit.c b/drivers/gpu/drm/tegra/uapi/submit.c
>> new file mode 100644
>> index 000000000000..398be3065e21
>> --- /dev/null
>> +++ b/drivers/gpu/drm/tegra/uapi/submit.c
>> @@ -0,0 +1,428 @@
>> +// SPDX-License-Identifier: GPL-2.0-only
>> +/* Copyright (c) 2020 NVIDIA Corporation */
>> +
>> +#include <linux/dma-fence-array.h>
>> +#include <linux/file.h>
>> +#include <linux/host1x.h>
>> +#include <linux/iommu.h>
>> +#include <linux/kref.h>
>> +#include <linux/list.h>
>> +#include <linux/nospec.h>
>> +#include <linux/pm_runtime.h>
>> +#include <linux/sync_file.h>
>> +
>> +#include <drm/drm_drv.h>
>> +#include <drm/drm_file.h>
>> +
>> +#include "../uapi.h"
>> +#include "../drm.h"
>> +#include "../gem.h"
>> +
>> +#include "gather_bo.h"
>> +#include "submit.h"
>> +
>> +static struct tegra_drm_mapping *
>> +tegra_drm_mapping_get(struct tegra_drm_channel_ctx *ctx, u32 id)
>> +{
>> +	struct tegra_drm_mapping *mapping;
>> +
>> +	xa_lock(&ctx->mappings);
>> +	mapping = xa_load(&ctx->mappings, id);
>> +	if (mapping)
>> +		kref_get(&mapping->ref);
>> +	xa_unlock(&ctx->mappings);
>> +
>> +	return mapping;
>> +}
>> +
>> +static void *alloc_copy_user_array(void __user *from, size_t count, size_t size)
>> +{
>> +	unsigned long copy_err;
>> +	size_t copy_len;
>> +	void *data;
>> +
>> +	if (check_mul_overflow(count, size, &copy_len))
>> +		return ERR_PTR(-EINVAL);
>> +
>> +	if (copy_len > 0x4000)
>> +		return ERR_PTR(-E2BIG);
>> +
>> +	data = kvmalloc(copy_len, GFP_KERNEL);
>> +	if (!data)
>> +		return ERR_PTR(-ENOMEM);
>> +
>> +	copy_err = copy_from_user(data, from, copy_len);
>> +	if (copy_err) {
>> +		kvfree(data);
>> +		return ERR_PTR(-EFAULT);
>> +	}
>> +
>> +	return data;
>> +}
>> +
>> +static int submit_copy_gather_data(struct drm_device *drm,
>> +				   struct gather_bo **pbo,
>> +				   struct drm_tegra_channel_submit *args)
>> +{
>> +	unsigned long copy_err;
>> +	struct gather_bo *bo;
>> +	size_t copy_len;
>> +
>> +	if (args->gather_data_words == 0) {
>> +		drm_info(drm, "gather_data_words can't be 0");
>> +		return -EINVAL;
>> +	}
>> +
>> +	if (check_mul_overflow((size_t)args->gather_data_words, (size_t)4, &copy_len))
>> +		return -EINVAL;
>> +
>> +	bo = kzalloc(sizeof(*bo), GFP_KERNEL);
>> +	if (!bo)
>> +		return -ENOMEM;
>> +
>> +	kref_init(&bo->ref);
>> +	host1x_bo_init(&bo->base, &gather_bo_ops);
>> +
>> +	bo->gather_data = kmalloc(copy_len, GFP_KERNEL | __GFP_NOWARN);
>> +	if (!bo->gather_data) {
>> +		kfree(bo);
>> +		return -ENOMEM;
>> +	}
>> +
>> +	copy_err = copy_from_user(bo->gather_data,
>> +				  u64_to_user_ptr(args->gather_data_ptr),
>> +				  copy_len);
>> +	if (copy_err) {
>> +		kfree(bo->gather_data);
>> +		kfree(bo);
>> +		return -EFAULT;
>> +	}
>> +
>> +	bo->gather_data_words = args->gather_data_words;
>> +
>> +	*pbo = bo;
>> +
>> +	return 0;
>> +}
>> +
>> +static int submit_write_reloc(struct gather_bo *bo,
>> +			      struct drm_tegra_submit_buf *buf,
>> +			      struct tegra_drm_mapping *mapping)
>> +{
>> +	/* TODO check that target_offset is within bounds */
>> +	dma_addr_t iova = mapping->iova + buf->reloc.target_offset;
>> +	u32 written_ptr = (u32)(iova >> buf->reloc.shift);
>> +
>> +#ifdef CONFIG_ARM64
>> +	if (buf->flags & DRM_TEGRA_SUBMIT_BUF_RELOC_BLOCKLINEAR)
>> +		written_ptr |= BIT(39);
>> +#endif
> 
> Sorry, but this still isn't correct. written_ptr is still only 32-bits
> wide, so your BIT(39) is going to get discarded even on 64-bit ARM. The
> idiomatic way to do this is to make written_ptr dma_addr_t and use a
> CONFIG_ARCH_DMA_ADDR_T_64BIT guard.
> 
> But even with that this looks wrong because you're OR'ing this in after
> shifting by buf->reloc.shift. Doesn't that OR it in at the wrong offset?
> Should you perhaps be doing this instead:
> 
> 	#ifdef CONFIG_ARCH_DMA_ADDR_T_64BIT
> 		if (buf->flags & DRM_TEGRA_SUBMIT_BUF_RELOC_BLOCKLINEAR)
> 			iova |= BIT(39);
> 	#endif
> 
> 	written_ptr = (u32)(iova >> buf->reloc_shift);
> 
> ?

Yes, you are of course right; will fix this. That might explain some of 
the VIC test failures I've seen.
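
For the record, a standalone sketch of that corrected ordering (plain C
with simplified types; reloc_value is just an illustration here, not the
actual kernel function, and it assumes the CONFIG_ARCH_DMA_ADDR_T_64BIT
case):

```c
#include <assert.h>
#include <stdint.h>

#define DRM_TEGRA_SUBMIT_BUF_RELOC_BLOCKLINEAR (1 << 0)
#define BIT_ULL(n) (1ULL << (n))

/* Stand-in for the kernel type when CONFIG_ARCH_DMA_ADDR_T_64BIT=y. */
typedef uint64_t dma_addr_t;

static uint32_t reloc_value(dma_addr_t iova, uint32_t flags, uint32_t shift)
{
	/* OR the sector-layout bit into the full 64-bit address first... */
	if (flags & DRM_TEGRA_SUBMIT_BUF_RELOC_BLOCKLINEAR)
		iova |= BIT_ULL(39);

	/* ...and only then shift, so the bit lands at the right offset
	 * instead of being discarded by the 32-bit truncation. */
	return (uint32_t)(iova >> shift);
}
```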

> 
> Also, on a side-note: BLOCKLINEAR really isn't the right term here. I
> recently dealt with this for display (though I haven't sent out that
> patch yet) and this is actually a bit that selects which sector layout
> swizzling is being applied. That's independent of block linear format
> and I think you can have different sector layouts irrespective of the
> block linear format (though I don't think that's usually done).
> 
> That said, I wonder if a better interface here would be to reuse format
> modifiers here. That would allow us to more fully describe the format of
> a surface in case we ever need it, and it already includes the sector
> layout information as well.

I think having just a flag that enables or disables the swizzling is 
better -- that way it is the responsibility of the userspace, which is 
where all the engine knowledge is as well, to know for each buffer 
whether it wants swizzling or not. Now, in practice at the moment the 
kernel can just lookup the format and set the bit based on that, but 
e.g. if there was an engine that could do the swizzling natively, and we 
had the format modifier here, we'd need to have the knowledge in the 
kernel to decide for each chip/engine whether to apply the bit.

For display it is a bit different since the knowledge is already in the 
kernel.

Mikko

> 
> Thierry
> 
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 195+ messages in thread

* Re: [PATCH v5 19/21] drm/tegra: Implement new UAPI
  2021-03-23 13:25     ` Thierry Reding
@ 2021-03-23 14:43       ` Mikko Perttunen
  -1 siblings, 0 replies; 195+ messages in thread
From: Mikko Perttunen @ 2021-03-23 14:43 UTC (permalink / raw)
  To: Thierry Reding, Mikko Perttunen
  Cc: jonathanh, digetx, airlied, daniel, linux-tegra, dri-devel,
	talho, bhuntsman

On 3/23/21 3:25 PM, Thierry Reding wrote:
> On Mon, Jan 11, 2021 at 03:00:17PM +0200, Mikko Perttunen wrote:
>> Implement the non-submission parts of the new UAPI, including
>> channel management and memory mapping. The UAPI is under the
>> CONFIG_DRM_TEGRA_STAGING config flag for now.
>>
>> Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
>> ---
>> v5:
>> * Set iova_end in both mapping paths
>> v4:
>> * New patch, split out from combined UAPI + submit patch.
>> ---
>>   drivers/gpu/drm/tegra/Makefile    |   1 +
>>   drivers/gpu/drm/tegra/drm.c       |  41 ++--
>>   drivers/gpu/drm/tegra/drm.h       |   5 +
>>   drivers/gpu/drm/tegra/uapi.h      |  63 ++++++
>>   drivers/gpu/drm/tegra/uapi/uapi.c | 307 ++++++++++++++++++++++++++++++
> 
> I'd prefer if we kept the directory structure flat. There's something
> like 19 pairs of files in the top-level directory, which is reasonably
> manageable. Also, it looks like there's going to be a couple more files
> in this new subdirectory. I'd prefer if that was all merged into the
> single uapi.c source file to keep things simpler. These are all really
> small files, so there's no need to aggressively split things up. Helps
> with compilation time, too.

Will do, although I think having plenty of subdirectories makes things 
more organized :)

> 
> FWIW, I would've been fine with stashing all of this into drm.c as well
> since the rest of the UAPI is in that already. The churn in this patch
> is reasonably small, but it would've been even less if this was just all
> in drm.c.

I think we shouldn't have the uapi in drm.c -- it just makes the file a 
bit of a dumping ground. I think drm.c should have the code that relates 
to initialization and initial registration with DRM.

> 
>>   5 files changed, 401 insertions(+), 16 deletions(-)
>>   create mode 100644 drivers/gpu/drm/tegra/uapi.h
>>   create mode 100644 drivers/gpu/drm/tegra/uapi/uapi.c
>>
>> diff --git a/drivers/gpu/drm/tegra/Makefile b/drivers/gpu/drm/tegra/Makefile
>> index d6cf202414f0..0abdb21b38b9 100644
>> --- a/drivers/gpu/drm/tegra/Makefile
>> +++ b/drivers/gpu/drm/tegra/Makefile
>> @@ -3,6 +3,7 @@ ccflags-$(CONFIG_DRM_TEGRA_DEBUG) += -DDEBUG
>>   
>>   tegra-drm-y := \
>>   	drm.o \
>> +	uapi/uapi.o \
>>   	gem.o \
>>   	fb.o \
>>   	dp.o \
>> diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
>> index afd3f143c5e0..6a51035ce33f 100644
>> --- a/drivers/gpu/drm/tegra/drm.c
>> +++ b/drivers/gpu/drm/tegra/drm.c
>> @@ -20,6 +20,7 @@
>>   #include <drm/drm_prime.h>
>>   #include <drm/drm_vblank.h>
>>   
>> +#include "uapi.h"
>>   #include "drm.h"
>>   #include "gem.h"
>>   
>> @@ -33,11 +34,6 @@
>>   #define CARVEOUT_SZ SZ_64M
>>   #define CDMA_GATHER_FETCHES_MAX_NB 16383
>>   
>> -struct tegra_drm_file {
>> -	struct idr contexts;
>> -	struct mutex lock;
>> -};
>> -
>>   static int tegra_atomic_check(struct drm_device *drm,
>>   			      struct drm_atomic_state *state)
>>   {
>> @@ -90,7 +86,8 @@ static int tegra_drm_open(struct drm_device *drm, struct drm_file *filp)
>>   	if (!fpriv)
>>   		return -ENOMEM;
>>   
>> -	idr_init_base(&fpriv->contexts, 1);
>> +	idr_init_base(&fpriv->legacy_contexts, 1);
>> +	xa_init_flags(&fpriv->contexts, XA_FLAGS_ALLOC1);
>>   	mutex_init(&fpriv->lock);
>>   	filp->driver_priv = fpriv;
>>   
>> @@ -429,7 +426,7 @@ static int tegra_client_open(struct tegra_drm_file *fpriv,
>>   	if (err < 0)
>>   		return err;
>>   
>> -	err = idr_alloc(&fpriv->contexts, context, 1, 0, GFP_KERNEL);
>> +	err = idr_alloc(&fpriv->legacy_contexts, context, 1, 0, GFP_KERNEL);
>>   	if (err < 0) {
>>   		client->ops->close_channel(context);
>>   		return err;
>> @@ -484,13 +481,13 @@ static int tegra_close_channel(struct drm_device *drm, void *data,
>>   
>>   	mutex_lock(&fpriv->lock);
>>   
>> -	context = idr_find(&fpriv->contexts, args->context);
>> +	context = idr_find(&fpriv->legacy_contexts, args->context);
>>   	if (!context) {
>>   		err = -EINVAL;
>>   		goto unlock;
>>   	}
>>   
>> -	idr_remove(&fpriv->contexts, context->id);
>> +	idr_remove(&fpriv->legacy_contexts, context->id);
>>   	tegra_drm_context_free(context);
>>   
>>   unlock:
>> @@ -509,7 +506,7 @@ static int tegra_get_syncpt(struct drm_device *drm, void *data,
>>   
>>   	mutex_lock(&fpriv->lock);
>>   
>> -	context = idr_find(&fpriv->contexts, args->context);
>> +	context = idr_find(&fpriv->legacy_contexts, args->context);
>>   	if (!context) {
>>   		err = -ENODEV;
>>   		goto unlock;
>> @@ -538,7 +535,7 @@ static int tegra_submit(struct drm_device *drm, void *data,
>>   
>>   	mutex_lock(&fpriv->lock);
>>   
>> -	context = idr_find(&fpriv->contexts, args->context);
>> +	context = idr_find(&fpriv->legacy_contexts, args->context);
>>   	if (!context) {
>>   		err = -ENODEV;
>>   		goto unlock;
>> @@ -563,7 +560,7 @@ static int tegra_get_syncpt_base(struct drm_device *drm, void *data,
>>   
>>   	mutex_lock(&fpriv->lock);
>>   
>> -	context = idr_find(&fpriv->contexts, args->context);
>> +	context = idr_find(&fpriv->legacy_contexts, args->context);
>>   	if (!context) {
>>   		err = -ENODEV;
>>   		goto unlock;
>> @@ -732,10 +729,21 @@ static int tegra_gem_get_flags(struct drm_device *drm, void *data,
>>   
>>   static const struct drm_ioctl_desc tegra_drm_ioctls[] = {
>>   #ifdef CONFIG_DRM_TEGRA_STAGING
>> -	DRM_IOCTL_DEF_DRV(TEGRA_GEM_CREATE, tegra_gem_create,
>> +	DRM_IOCTL_DEF_DRV(TEGRA_CHANNEL_OPEN, tegra_drm_ioctl_channel_open,
>> +			  DRM_RENDER_ALLOW),
>> +	DRM_IOCTL_DEF_DRV(TEGRA_CHANNEL_CLOSE, tegra_drm_ioctl_channel_close,
>> +			  DRM_RENDER_ALLOW),
> 
> I'd prefer to keep call these TEGRA_OPEN_CHANNEL and TEGRA_CLOSE_CHANNEL
> because I find that easier to think of. My reasoning goes: the TEGRA_
> prefix means we're operating at a global context and then we perform the
> OPEN_CHANNEL and CLOSE_CHANNEL operations. Whereas by the same reasoning
> TEGRA_CHANNEL_OPEN and TEGRA_CHANNEL_CLOSE suggest we're operating at
> the channel context and perform OPEN and CLOSE operations. For close you
> could make the argument that it makes sense, but you can't open a
> channel that you don't have yet.

I go by the same argument but consider TEGRA_CHANNEL_OPEN a bit of a 
"static method" of channels, and as such acceptable :p But I do see your 
point -- I can change it.

> 
> And if that doesn't convince you, I think appending _LEGACY here like we
> do for CREATE and MMAP would be more consistent. Who's going to remember
> which one is new: TEGRA_CHANNEL_OPEN or TEGRA_OPEN_CHANNEL?
> 
>> +	DRM_IOCTL_DEF_DRV(TEGRA_CHANNEL_MAP, tegra_drm_ioctl_channel_map,
>>   			  DRM_RENDER_ALLOW),
>> -	DRM_IOCTL_DEF_DRV(TEGRA_GEM_MMAP, tegra_gem_mmap,
>> +	DRM_IOCTL_DEF_DRV(TEGRA_CHANNEL_UNMAP, tegra_drm_ioctl_channel_unmap,
>>   			  DRM_RENDER_ALLOW),
>> +	DRM_IOCTL_DEF_DRV(TEGRA_GEM_CREATE, tegra_drm_ioctl_gem_create,
>> +			  DRM_RENDER_ALLOW),
>> +	DRM_IOCTL_DEF_DRV(TEGRA_GEM_MMAP, tegra_drm_ioctl_gem_mmap,
>> +			  DRM_RENDER_ALLOW),
>> +
>> +	DRM_IOCTL_DEF_DRV(TEGRA_GEM_CREATE_LEGACY, tegra_gem_create, DRM_RENDER_ALLOW),
>> +	DRM_IOCTL_DEF_DRV(TEGRA_GEM_MMAP_LEGACY, tegra_gem_mmap, DRM_RENDER_ALLOW),
>>   	DRM_IOCTL_DEF_DRV(TEGRA_SYNCPT_READ, tegra_syncpt_read,
>>   			  DRM_RENDER_ALLOW),
>>   	DRM_IOCTL_DEF_DRV(TEGRA_SYNCPT_INCR, tegra_syncpt_incr,
>> @@ -789,10 +797,11 @@ static void tegra_drm_postclose(struct drm_device *drm, struct drm_file *file)
>>   	struct tegra_drm_file *fpriv = file->driver_priv;
>>   
>>   	mutex_lock(&fpriv->lock);
>> -	idr_for_each(&fpriv->contexts, tegra_drm_context_cleanup, NULL);
>> +	idr_for_each(&fpriv->legacy_contexts, tegra_drm_context_cleanup, NULL);
>> +	tegra_drm_uapi_close_file(fpriv);
>>   	mutex_unlock(&fpriv->lock);
>>   
>> -	idr_destroy(&fpriv->contexts);
>> +	idr_destroy(&fpriv->legacy_contexts);
>>   	mutex_destroy(&fpriv->lock);
>>   	kfree(fpriv);
>>   }
>> diff --git a/drivers/gpu/drm/tegra/drm.h b/drivers/gpu/drm/tegra/drm.h
>> index 0f38f159aa8e..1af57c2016eb 100644
>> --- a/drivers/gpu/drm/tegra/drm.h
>> +++ b/drivers/gpu/drm/tegra/drm.h
>> @@ -59,6 +59,11 @@ struct tegra_drm {
>>   	struct tegra_display_hub *hub;
>>   };
>>   
>> +static inline struct host1x *tegra_drm_to_host1x(struct tegra_drm *tegra)
>> +{
>> +	return dev_get_drvdata(tegra->drm->dev->parent);
>> +}
>> +
>>   struct tegra_drm_client;
>>   
>>   struct tegra_drm_context {
>> diff --git a/drivers/gpu/drm/tegra/uapi.h b/drivers/gpu/drm/tegra/uapi.h
>> new file mode 100644
>> index 000000000000..5c422607e8fa
>> --- /dev/null
>> +++ b/drivers/gpu/drm/tegra/uapi.h
>> @@ -0,0 +1,63 @@
>> +/* SPDX-License-Identifier: GPL-2.0-only */
>> +/* Copyright (c) 2020 NVIDIA Corporation */
>> +
>> +#ifndef _TEGRA_DRM_UAPI_H
>> +#define _TEGRA_DRM_UAPI_H
>> +
>> +#include <linux/dma-mapping.h>
>> +#include <linux/idr.h>
>> +#include <linux/kref.h>
>> +#include <linux/xarray.h>
>> +
>> +#include <drm/drm.h>
>> +
>> +struct drm_file;
>> +struct drm_device;
>> +
>> +struct tegra_drm_file {
>> +	/* Legacy UAPI state */
>> +	struct idr legacy_contexts;
>> +	struct mutex lock;
>> +
>> +	/* New UAPI state */
>> +	struct xarray contexts;
>> +};
>> +
>> +struct tegra_drm_channel_ctx {
>> +	struct tegra_drm_client *client;
>> +	struct host1x_channel *channel;
>> +	struct xarray mappings;
>> +};
> 
> This is mostly the same as tegra_drm_context, so can't we just merge the
> two? There's going to be slight overlap, but overall things are going to
> be less confusing to follow.
> 
> Even more so because I think we should consider phasing out the old UAPI
> eventually and then we can just remove the unneeded fields from this.

Okay.

> 
>> +
>> +struct tegra_drm_mapping {
>> +	struct kref ref;
>> +
>> +	struct device *dev;
>> +	struct host1x_bo *bo;
>> +	struct sg_table *sgt;
>> +	enum dma_data_direction direction;
>> +	dma_addr_t iova;
>> +	dma_addr_t iova_end;
> 
> iova_end seems to never be used. Do we need it?

It is used in the firewall.

> 
>> +};
>> +
>> +int tegra_drm_ioctl_channel_open(struct drm_device *drm, void *data,
>> +				 struct drm_file *file);
>> +int tegra_drm_ioctl_channel_close(struct drm_device *drm, void *data,
>> +				  struct drm_file *file);
>> +int tegra_drm_ioctl_channel_map(struct drm_device *drm, void *data,
>> +				struct drm_file *file);
>> +int tegra_drm_ioctl_channel_unmap(struct drm_device *drm, void *data,
>> +				struct drm_file *file);
>> +int tegra_drm_ioctl_channel_submit(struct drm_device *drm, void *data,
>> +				   struct drm_file *file);
>> +int tegra_drm_ioctl_gem_create(struct drm_device *drm, void *data,
>> +				struct drm_file *file);
>> +int tegra_drm_ioctl_gem_mmap(struct drm_device *drm, void *data,
>> +				struct drm_file *file);
>> +
>> +void tegra_drm_uapi_close_file(struct tegra_drm_file *file);
>> +void tegra_drm_mapping_put(struct tegra_drm_mapping *mapping);
>> +struct tegra_drm_channel_ctx *
>> +tegra_drm_channel_ctx_lock(struct tegra_drm_file *file, u32 id);
>> +
>> +#endif
>> diff --git a/drivers/gpu/drm/tegra/uapi/uapi.c b/drivers/gpu/drm/tegra/uapi/uapi.c
>> new file mode 100644
>> index 000000000000..d503b5e817c4
>> --- /dev/null
>> +++ b/drivers/gpu/drm/tegra/uapi/uapi.c
>> @@ -0,0 +1,307 @@
>> +// SPDX-License-Identifier: GPL-2.0-only
>> +/* Copyright (c) 2020 NVIDIA Corporation */
>> +
>> +#include <linux/host1x.h>
>> +#include <linux/iommu.h>
>> +#include <linux/list.h>
>> +
>> +#include <drm/drm_drv.h>
>> +#include <drm/drm_file.h>
>> +
>> +#include "../uapi.h"
>> +#include "../drm.h"
>> +
>> +struct tegra_drm_channel_ctx *
>> +tegra_drm_channel_ctx_lock(struct tegra_drm_file *file, u32 id)
>> +{
>> +	struct tegra_drm_channel_ctx *ctx;
>> +
>> +	mutex_lock(&file->lock);
>> +	ctx = xa_load(&file->contexts, id);
>> +	if (!ctx)
>> +		mutex_unlock(&file->lock);
>> +
>> +	return ctx;
>> +}
> 
> This interface seems slightly odd. Looking at how this is used I see how
> doing it this way saves a couple of lines. However, it also make this
> difficult to understand, so I wonder if it wouldn't be better to just
> open-code this in the three callsites to make the code flow a bit more
> idiomatic.

Ok, will do. (Another option may be to add a 
tegra_drm_channel_ctx_unlock that just unlocks file->lock -- that'd 
abstract it out even better, which I quite like -- but I'll go with your 
preference)

> 
>> +
>> +static void tegra_drm_mapping_release(struct kref *ref)
>> +{
>> +	struct tegra_drm_mapping *mapping =
>> +		container_of(ref, struct tegra_drm_mapping, ref);
>> +
>> +	if (mapping->sgt)
>> +		dma_unmap_sgtable(mapping->dev, mapping->sgt,
>> +				  mapping->direction, DMA_ATTR_SKIP_CPU_SYNC);
>> +
>> +	host1x_bo_unpin(mapping->dev, mapping->bo, mapping->sgt);
>> +	host1x_bo_put(mapping->bo);
>> +
>> +	kfree(mapping);
>> +}
>> +
>> +void tegra_drm_mapping_put(struct tegra_drm_mapping *mapping)
>> +{
>> +	kref_put(&mapping->ref, tegra_drm_mapping_release);
>> +}
>> +
>> +static void tegra_drm_channel_ctx_close(struct tegra_drm_channel_ctx *ctx)
> 
> Yeah, the more often I read it, the more I'm in favour of just
> collapsing tegra_drm_channel_ctx into tegra_drm_channel if for nothing
> else but to get rid of that annoying _ctx suffix that's there for no
> other reason than to differentiate it from "legacy" contexts.
> 
>> +{
>> +	unsigned long mapping_id;
> 
> It's clear from the context that this is a mapping ID, so I think you
> can just leave out the "mapping_" prefix to save a bit on screen space.

Sure.

> 
>> +	struct tegra_drm_mapping *mapping;
>> +
>> +	xa_for_each(&ctx->mappings, mapping_id, mapping)
>> +		tegra_drm_mapping_put(mapping);
>> +
>> +	xa_destroy(&ctx->mappings);
>> +
>> +	host1x_channel_put(ctx->channel);
>> +
>> +	kfree(ctx);
>> +}
>> +
>> +int close_channel_ctx(int id, void *p, void *data)
>> +{
>> +	struct tegra_drm_channel_ctx *ctx = p;
>> +
>> +	tegra_drm_channel_ctx_close(ctx);
>> +
>> +	return 0;
>> +}
> 
> The signature looked strange, so I went looking for where this is called
> from and turns out I can't find any place where this is used. Do we need
> it?

Ah, maybe I have left it from some previous version. Will fix.

> 
>> +
>> +void tegra_drm_uapi_close_file(struct tegra_drm_file *file)
>> +{
>> +	unsigned long ctx_id;
> 
> Just like for mappings above, I think it's fine to leave out the ctx_
> prefix here.

Yep.

> 
>> +	struct tegra_drm_channel_ctx *ctx;
>> +
>> +	xa_for_each(&file->contexts, ctx_id, ctx)
>> +		tegra_drm_channel_ctx_close(ctx);
>> +
>> +	xa_destroy(&file->contexts);
>> +}
>> +
>> +int tegra_drm_ioctl_channel_open(struct drm_device *drm, void *data,
>> +				 struct drm_file *file)
>> +{
>> +	struct tegra_drm_file *fpriv = file->driver_priv;
>> +	struct tegra_drm *tegra = drm->dev_private;
>> +	struct drm_tegra_channel_open *args = data;
>> +	struct tegra_drm_client *client = NULL;
>> +	struct tegra_drm_channel_ctx *ctx;
>> +	int err;
>> +
>> +	if (args->flags)
>> +		return -EINVAL;
>> +
>> +	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
>> +	if (!ctx)
>> +		return -ENOMEM;
>> +
>> +	err = -ENODEV;
>> +	list_for_each_entry(client, &tegra->clients, list) {
>> +		if (client->base.class == args->host1x_class) {
>> +			err = 0;
>> +			break;
>> +		}
>> +	}
>> +	if (err)
>> +		goto free_ctx;
> 
> This type of construct looks weird. I found that a good way around this
> is to split this off into a separate function that does the lookup and
> just returns NULL when it doesn't find one, which is very elegant:
> 
> 	struct tegra_drm_client *tegra_drm_find_client(struct tegra_drm *tegra, u32 class)
> 	{
> 		struct tegra_drm_client *client;
> 
> 		list_for_each_entry(client, &tegra->clients, list)
> 			if (client->base.class == class)
> 				return client;
> 
> 		return NULL;
> 	}
> 
> and then all of a sudden, the very cumbersome construct above becomes
> this pretty piece of code:
> 
> 	client = tegra_drm_find_client(tegra, args->host1x_class);
> 	if (!client) {
> 		err = -ENODEV;
> 		goto free_ctx;
> 	}
> 
> No need for initializing client to NULL or preventatively setting err =
> -ENODEV or anything.

Yep.

> 
>> +
>> +	if (client->shared_channel) {
>> +		ctx->channel = host1x_channel_get(client->shared_channel);
>> +	} else {
>> +		ctx->channel = host1x_channel_request(&client->base);
>> +		if (!ctx->channel) {
>> +			err = -EBUSY;
> 
> I -EBUSY really appropriate here? Can host1x_channel_request() fail for
> other reasons?

It could also fail due to being out of memory (failing to allocate space 
for CDMA) - I guess we should plumb the error code here. But -EBUSY is 
really the most likely thing to happen anyway. Perhaps that can be done 
separately.

> 
>> +			goto free_ctx;
>> +		}
>> +	}
>> +
>> +	err = xa_alloc(&fpriv->contexts, &args->channel_ctx, ctx,
>> +		       XA_LIMIT(1, U32_MAX), GFP_KERNEL);
>> +	if (err < 0)
>> +		goto put_channel;
>> +
>> +	ctx->client = client;
>> +	xa_init_flags(&ctx->mappings, XA_FLAGS_ALLOC1);
>> +
>> +	args->hardware_version = client->version;
>> +
>> +	return 0;
>> +
>> +put_channel:
>> +	host1x_channel_put(ctx->channel);
>> +free_ctx:
>> +	kfree(ctx);
>> +
>> +	return err;
>> +}
>> +
>> +int tegra_drm_ioctl_channel_close(struct drm_device *drm, void *data,
>> +				  struct drm_file *file)
>> +{
>> +	struct tegra_drm_file *fpriv = file->driver_priv;
>> +	struct drm_tegra_channel_close *args = data;
>> +	struct tegra_drm_channel_ctx *ctx;
>> +
>> +	ctx = tegra_drm_channel_ctx_lock(fpriv, args->channel_ctx);
>> +	if (!ctx)
>> +		return -EINVAL;
>> +
>> +	xa_erase(&fpriv->contexts, args->channel_ctx);
>> +
>> +	mutex_unlock(&fpriv->lock);
>> +
>> +	tegra_drm_channel_ctx_close(ctx);
>> +
>> +	return 0;
>> +}
>> +
>> +int tegra_drm_ioctl_channel_map(struct drm_device *drm, void *data,
>> +				struct drm_file *file)
>> +{
>> +	struct tegra_drm_file *fpriv = file->driver_priv;
>> +	struct drm_tegra_channel_map *args = data;
>> +	struct tegra_drm_channel_ctx *ctx;
>> +	struct tegra_drm_mapping *mapping;
>> +	struct drm_gem_object *gem;
>> +	u32 mapping_id;
>> +	int err = 0;
>> +
>> +	if (args->flags & ~DRM_TEGRA_CHANNEL_MAP_READWRITE)
>> +		return -EINVAL;
>> +
>> +	ctx = tegra_drm_channel_ctx_lock(fpriv, args->channel_ctx);
>> +	if (!ctx)
>> +		return -EINVAL;
>> +
>> +	mapping = kzalloc(sizeof(*mapping), GFP_KERNEL);
>> +	if (!mapping) {
>> +		err = -ENOMEM;
>> +		goto unlock;
>> +	}
>> +
>> +	kref_init(&mapping->ref);
>> +
>> +	gem = drm_gem_object_lookup(file, args->handle);
>> +	if (!gem) {
>> +		err = -EINVAL;
>> +		goto unlock;
>> +	}
>> +
>> +	mapping->dev = ctx->client->base.dev;
>> +	mapping->bo = &container_of(gem, struct tegra_bo, gem)->base;
> 
> We already have host1x_bo_lookup() in drm.c that you can use to avoid
> this strange cast.

Okay, will fix.

> 
>> +
>> +	if (!iommu_get_domain_for_dev(mapping->dev) ||
>> +	    ctx->client->base.group) {
> 
> This expression is now used in at least two places, so I wonder if we
> should have a helper for it along with some documentation about why this
> is the right thing to do. I have a local patch that adds a comment to
> the other instance of this because I had forgotten why this was correct,
> so I can pick that up and refactor later on.

Actually, just last week I found out that the condition here is wrong 
(at least for this particular instance) -- with the current condition, 
if IOMMU is disabled we end up in the first branch, but if the BO in 
question was imported through DMA-BUF the iova will be NULL - 
host1x_bo_pin returns an SGT instead, so we need to go to the else path, 
which works fine. (If ctx->client->base.group is set, this is not a 
problem since import will IOMMU map the BO and set iova). I have a local 
fix for this which I'll add to v6.

> 
>> +		host1x_bo_pin(mapping->dev, mapping->bo,
>> +			      &mapping->iova);
>> +	} else {
>> +		mapping->direction = DMA_TO_DEVICE;
>> +		if (args->flags & DRM_TEGRA_CHANNEL_MAP_READWRITE)
>> +			mapping->direction = DMA_BIDIRECTIONAL;
>> +
>> +		mapping->sgt =
>> +			host1x_bo_pin(mapping->dev, mapping->bo, NULL);
>> +		if (IS_ERR(mapping->sgt)) {
>> +			err = PTR_ERR(mapping->sgt);
>> +			goto put_gem;
>> +		}
>> +
>> +		err = dma_map_sgtable(mapping->dev, mapping->sgt,
>> +				      mapping->direction,
>> +				      DMA_ATTR_SKIP_CPU_SYNC);
>> +		if (err)
>> +			goto unpin;
>> +
>> +		/* TODO only map the requested part */
>> +		mapping->iova = sg_dma_address(mapping->sgt->sgl);
> 
> That comment seems misplaced here since the mapping already happens
> above. Also, wouldn't the same TODO apply to the host1x_bo_pin() path in
> the if block? Maybe the TODO should be at the top of the function?
> 
> Alternatively, if this isn't implemented in this patch anyway, maybe
> just drop the comment altogether. In order to implement this, wouldn't
> the UAPI have to change as well? In that case it might be better to add
> the TODO somewhere in the UAPI header, or in a separate TODO file in the
> driver's directory.

Yeah, I'll drop the comment. The UAPI originally had support for this 
but I dropped it upon Dmitry's objection.

> 
>> +	}
>> +
>> +	mapping->iova_end = mapping->iova + gem->size;
>> +
>> +	mutex_unlock(&fpriv->lock);
>> +
>> +	err = xa_alloc(&ctx->mappings, &mapping_id, mapping,
>> +		       XA_LIMIT(1, U32_MAX), GFP_KERNEL);
>> +	if (err < 0)
>> +		goto unmap;
>> +
>> +	args->mapping_id = mapping_id;
>> +
>> +	return 0;
>> +
>> +unmap:
>> +	if (mapping->sgt) {
>> +		dma_unmap_sgtable(mapping->dev, mapping->sgt,
>> +				  mapping->direction, DMA_ATTR_SKIP_CPU_SYNC);
>> +	}
>> +unpin:
>> +	host1x_bo_unpin(mapping->dev, mapping->bo, mapping->sgt);
>> +put_gem:
>> +	drm_gem_object_put(gem);
>> +	kfree(mapping);
>> +unlock:
>> +	mutex_unlock(&fpriv->lock);
>> +	return err;
>> +}
>> +
>> +int tegra_drm_ioctl_channel_unmap(struct drm_device *drm, void *data,
>> +				  struct drm_file *file)
>> +{
>> +	struct tegra_drm_file *fpriv = file->driver_priv;
>> +	struct drm_tegra_channel_unmap *args = data;
>> +	struct tegra_drm_channel_ctx *ctx;
>> +	struct tegra_drm_mapping *mapping;
>> +
>> +	ctx = tegra_drm_channel_ctx_lock(fpriv, args->channel_ctx);
>> +	if (!ctx)
>> +		return -EINVAL;
>> +
>> +	mapping = xa_erase(&ctx->mappings, args->mapping_id);
>> +
>> +	mutex_unlock(&fpriv->lock);
>> +
>> +	if (mapping) {
>> +		tegra_drm_mapping_put(mapping);
>> +		return 0;
>> +	} else {
>> +		return -EINVAL;
>> +	}
>> +}
>> +
>> +int tegra_drm_ioctl_gem_create(struct drm_device *drm, void *data,
>> +			       struct drm_file *file)
>> +{
>> +	struct drm_tegra_gem_create *args = data;
>> +	struct tegra_bo *bo;
>> +
>> +	if (args->flags)
>> +		return -EINVAL;
> 
> I'm not sure it's worth doing this, especially because this is now a new
> IOCTL that's actually a subset of the original. I think we should just
> keep the original and if we want to deprecate the flags, or replace them
> with new ones, let's just try and phase out the deprecated ones.

Ok, I'll look into it.

> 
> Thierry

Thanks,
Mikko

> 
>> +
>> +	bo = tegra_bo_create_with_handle(file, drm, args->size, args->flags,
>> +					 &args->handle);
>> +	if (IS_ERR(bo))
>> +		return PTR_ERR(bo);
>> +
>> +	return 0;
>> +}
>> +
>> +int tegra_drm_ioctl_gem_mmap(struct drm_device *drm, void *data,
>> +			     struct drm_file *file)
>> +{
>> +	struct drm_tegra_gem_mmap *args = data;
>> +	struct drm_gem_object *gem;
>> +	struct tegra_bo *bo;
>> +
>> +	gem = drm_gem_object_lookup(file, args->handle);
>> +	if (!gem)
>> +		return -EINVAL;
>> +
>> +	bo = to_tegra_bo(gem);
>> +
>> +	args->offset = drm_vma_node_offset_addr(&bo->gem.vma_node);
>> +
>> +	drm_gem_object_put(gem);
>> +
>> +	return 0;
>> +}
>> -- 
>> 2.30.0
>>

^ permalink raw reply	[flat|nested] 195+ messages in thread

* Re: [PATCH v5 19/21] drm/tegra: Implement new UAPI
@ 2021-03-23 14:43       ` Mikko Perttunen
  0 siblings, 0 replies; 195+ messages in thread
From: Mikko Perttunen @ 2021-03-23 14:43 UTC (permalink / raw)
  To: Thierry Reding, Mikko Perttunen
  Cc: airlied, dri-devel, jonathanh, talho, bhuntsman, linux-tegra, digetx

On 3/23/21 3:25 PM, Thierry Reding wrote:
> On Mon, Jan 11, 2021 at 03:00:17PM +0200, Mikko Perttunen wrote:
>> Implement the non-submission parts of the new UAPI, including
>> channel management and memory mapping. The UAPI is under the
>> CONFIG_DRM_TEGRA_STAGING config flag for now.
>>
>> Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
>> ---
>> v5:
>> * Set iova_end in both mapping paths
>> v4:
>> * New patch, split out from combined UAPI + submit patch.
>> ---
>>   drivers/gpu/drm/tegra/Makefile    |   1 +
>>   drivers/gpu/drm/tegra/drm.c       |  41 ++--
>>   drivers/gpu/drm/tegra/drm.h       |   5 +
>>   drivers/gpu/drm/tegra/uapi.h      |  63 ++++++
>>   drivers/gpu/drm/tegra/uapi/uapi.c | 307 ++++++++++++++++++++++++++++++
> 
> I'd prefer if we kept the directory structure flat. There's something
> like 19 pairs of files in the top-level directory, which is reasonably
> manageable. Also, it looks like there's going to be a couple more files
> in this new subdirectory. I'd prefer if that was all merged into the
> single uapi.c source file to keep things simpler. These are all really
> small files, so there's no need to aggressively split things up. Helps
> with compilation time, too.

Will do, although I think having plenty of subdirectories makes things 
more organized :)

> 
> FWIW, I would've been fine with stashing all of this into drm.c as well
> since the rest of the UAPI is in that already. The churn in this patch
> is reasonably small, but it would've been even less if this was just all
> in drm.c.

I think we shouldn't have the uapi in drm.c -- it just makes the file a 
bit of a dumping ground. I think drm.c should have the code that relates 
to initialization and initial registration with DRM.

> 
>>   5 files changed, 401 insertions(+), 16 deletions(-)
>>   create mode 100644 drivers/gpu/drm/tegra/uapi.h
>>   create mode 100644 drivers/gpu/drm/tegra/uapi/uapi.c
>>
>> diff --git a/drivers/gpu/drm/tegra/Makefile b/drivers/gpu/drm/tegra/Makefile
>> index d6cf202414f0..0abdb21b38b9 100644
>> --- a/drivers/gpu/drm/tegra/Makefile
>> +++ b/drivers/gpu/drm/tegra/Makefile
>> @@ -3,6 +3,7 @@ ccflags-$(CONFIG_DRM_TEGRA_DEBUG) += -DDEBUG
>>   
>>   tegra-drm-y := \
>>   	drm.o \
>> +	uapi/uapi.o \
>>   	gem.o \
>>   	fb.o \
>>   	dp.o \
>> diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
>> index afd3f143c5e0..6a51035ce33f 100644
>> --- a/drivers/gpu/drm/tegra/drm.c
>> +++ b/drivers/gpu/drm/tegra/drm.c
>> @@ -20,6 +20,7 @@
>>   #include <drm/drm_prime.h>
>>   #include <drm/drm_vblank.h>
>>   
>> +#include "uapi.h"
>>   #include "drm.h"
>>   #include "gem.h"
>>   
>> @@ -33,11 +34,6 @@
>>   #define CARVEOUT_SZ SZ_64M
>>   #define CDMA_GATHER_FETCHES_MAX_NB 16383
>>   
>> -struct tegra_drm_file {
>> -	struct idr contexts;
>> -	struct mutex lock;
>> -};
>> -
>>   static int tegra_atomic_check(struct drm_device *drm,
>>   			      struct drm_atomic_state *state)
>>   {
>> @@ -90,7 +86,8 @@ static int tegra_drm_open(struct drm_device *drm, struct drm_file *filp)
>>   	if (!fpriv)
>>   		return -ENOMEM;
>>   
>> -	idr_init_base(&fpriv->contexts, 1);
>> +	idr_init_base(&fpriv->legacy_contexts, 1);
>> +	xa_init_flags(&fpriv->contexts, XA_FLAGS_ALLOC1);
>>   	mutex_init(&fpriv->lock);
>>   	filp->driver_priv = fpriv;
>>   
>> @@ -429,7 +426,7 @@ static int tegra_client_open(struct tegra_drm_file *fpriv,
>>   	if (err < 0)
>>   		return err;
>>   
>> -	err = idr_alloc(&fpriv->contexts, context, 1, 0, GFP_KERNEL);
>> +	err = idr_alloc(&fpriv->legacy_contexts, context, 1, 0, GFP_KERNEL);
>>   	if (err < 0) {
>>   		client->ops->close_channel(context);
>>   		return err;
>> @@ -484,13 +481,13 @@ static int tegra_close_channel(struct drm_device *drm, void *data,
>>   
>>   	mutex_lock(&fpriv->lock);
>>   
>> -	context = idr_find(&fpriv->contexts, args->context);
>> +	context = idr_find(&fpriv->legacy_contexts, args->context);
>>   	if (!context) {
>>   		err = -EINVAL;
>>   		goto unlock;
>>   	}
>>   
>> -	idr_remove(&fpriv->contexts, context->id);
>> +	idr_remove(&fpriv->legacy_contexts, context->id);
>>   	tegra_drm_context_free(context);
>>   
>>   unlock:
>> @@ -509,7 +506,7 @@ static int tegra_get_syncpt(struct drm_device *drm, void *data,
>>   
>>   	mutex_lock(&fpriv->lock);
>>   
>> -	context = idr_find(&fpriv->contexts, args->context);
>> +	context = idr_find(&fpriv->legacy_contexts, args->context);
>>   	if (!context) {
>>   		err = -ENODEV;
>>   		goto unlock;
>> @@ -538,7 +535,7 @@ static int tegra_submit(struct drm_device *drm, void *data,
>>   
>>   	mutex_lock(&fpriv->lock);
>>   
>> -	context = idr_find(&fpriv->contexts, args->context);
>> +	context = idr_find(&fpriv->legacy_contexts, args->context);
>>   	if (!context) {
>>   		err = -ENODEV;
>>   		goto unlock;
>> @@ -563,7 +560,7 @@ static int tegra_get_syncpt_base(struct drm_device *drm, void *data,
>>   
>>   	mutex_lock(&fpriv->lock);
>>   
>> -	context = idr_find(&fpriv->contexts, args->context);
>> +	context = idr_find(&fpriv->legacy_contexts, args->context);
>>   	if (!context) {
>>   		err = -ENODEV;
>>   		goto unlock;
>> @@ -732,10 +729,21 @@ static int tegra_gem_get_flags(struct drm_device *drm, void *data,
>>   
>>   static const struct drm_ioctl_desc tegra_drm_ioctls[] = {
>>   #ifdef CONFIG_DRM_TEGRA_STAGING
>> -	DRM_IOCTL_DEF_DRV(TEGRA_GEM_CREATE, tegra_gem_create,
>> +	DRM_IOCTL_DEF_DRV(TEGRA_CHANNEL_OPEN, tegra_drm_ioctl_channel_open,
>> +			  DRM_RENDER_ALLOW),
>> +	DRM_IOCTL_DEF_DRV(TEGRA_CHANNEL_CLOSE, tegra_drm_ioctl_channel_close,
>> +			  DRM_RENDER_ALLOW),
> 
> I'd prefer to keep call these TEGRA_OPEN_CHANNEL and TEGRA_CLOSE_CHANNEL
> because I find that easier to think of. My reasoning goes: the TEGRA_
> prefix means we're operating at a global context and then we perform the
> OPEN_CHANNEL and CLOSE_CHANNEL operations. Whereas by the same reasoning
> TEGRA_CHANNEL_OPEN and TEGRA_CHANNEL_CLOSE suggest we're operating at
> the channel context and perform OPEN and CLOSE operations. For close you
> could make the argument that it makes sense, but you can't open a
> channel that you don't have yet.

I go by the same argument but consider TEGRA_CHANNEL_OPEN a bit of a 
"static method" of channels, and as such acceptable :p But I do see your 
point -- I can change it.

> 
> And if that doesn't convince you, I think appending _LEGACY here like we
> do for CREATE and MMAP would be more consistent. Who's going to remember
> which one is new: TEGRA_CHANNEL_OPEN or TEGRA_OPEN_CHANNEL?
> 
>> +	DRM_IOCTL_DEF_DRV(TEGRA_CHANNEL_MAP, tegra_drm_ioctl_channel_map,
>>   			  DRM_RENDER_ALLOW),
>> -	DRM_IOCTL_DEF_DRV(TEGRA_GEM_MMAP, tegra_gem_mmap,
>> +	DRM_IOCTL_DEF_DRV(TEGRA_CHANNEL_UNMAP, tegra_drm_ioctl_channel_unmap,
>>   			  DRM_RENDER_ALLOW),
>> +	DRM_IOCTL_DEF_DRV(TEGRA_GEM_CREATE, tegra_drm_ioctl_gem_create,
>> +			  DRM_RENDER_ALLOW),
>> +	DRM_IOCTL_DEF_DRV(TEGRA_GEM_MMAP, tegra_drm_ioctl_gem_mmap,
>> +			  DRM_RENDER_ALLOW),
>> +
>> +	DRM_IOCTL_DEF_DRV(TEGRA_GEM_CREATE_LEGACY, tegra_gem_create, DRM_RENDER_ALLOW),
>> +	DRM_IOCTL_DEF_DRV(TEGRA_GEM_MMAP_LEGACY, tegra_gem_mmap, DRM_RENDER_ALLOW),
>>   	DRM_IOCTL_DEF_DRV(TEGRA_SYNCPT_READ, tegra_syncpt_read,
>>   			  DRM_RENDER_ALLOW),
>>   	DRM_IOCTL_DEF_DRV(TEGRA_SYNCPT_INCR, tegra_syncpt_incr,
>> @@ -789,10 +797,11 @@ static void tegra_drm_postclose(struct drm_device *drm, struct drm_file *file)
>>   	struct tegra_drm_file *fpriv = file->driver_priv;
>>   
>>   	mutex_lock(&fpriv->lock);
>> -	idr_for_each(&fpriv->contexts, tegra_drm_context_cleanup, NULL);
>> +	idr_for_each(&fpriv->legacy_contexts, tegra_drm_context_cleanup, NULL);
>> +	tegra_drm_uapi_close_file(fpriv);
>>   	mutex_unlock(&fpriv->lock);
>>   
>> -	idr_destroy(&fpriv->contexts);
>> +	idr_destroy(&fpriv->legacy_contexts);
>>   	mutex_destroy(&fpriv->lock);
>>   	kfree(fpriv);
>>   }
>> diff --git a/drivers/gpu/drm/tegra/drm.h b/drivers/gpu/drm/tegra/drm.h
>> index 0f38f159aa8e..1af57c2016eb 100644
>> --- a/drivers/gpu/drm/tegra/drm.h
>> +++ b/drivers/gpu/drm/tegra/drm.h
>> @@ -59,6 +59,11 @@ struct tegra_drm {
>>   	struct tegra_display_hub *hub;
>>   };
>>   
>> +static inline struct host1x *tegra_drm_to_host1x(struct tegra_drm *tegra)
>> +{
>> +	return dev_get_drvdata(tegra->drm->dev->parent);
>> +}
>> +
>>   struct tegra_drm_client;
>>   
>>   struct tegra_drm_context {
>> diff --git a/drivers/gpu/drm/tegra/uapi.h b/drivers/gpu/drm/tegra/uapi.h
>> new file mode 100644
>> index 000000000000..5c422607e8fa
>> --- /dev/null
>> +++ b/drivers/gpu/drm/tegra/uapi.h
>> @@ -0,0 +1,63 @@
>> +/* SPDX-License-Identifier: GPL-2.0-only */
>> +/* Copyright (c) 2020 NVIDIA Corporation */
>> +
>> +#ifndef _TEGRA_DRM_UAPI_H
>> +#define _TEGRA_DRM_UAPI_H
>> +
>> +#include <linux/dma-mapping.h>
>> +#include <linux/idr.h>
>> +#include <linux/kref.h>
>> +#include <linux/xarray.h>
>> +
>> +#include <drm/drm.h>
>> +
>> +struct drm_file;
>> +struct drm_device;
>> +
>> +struct tegra_drm_file {
>> +	/* Legacy UAPI state */
>> +	struct idr legacy_contexts;
>> +	struct mutex lock;
>> +
>> +	/* New UAPI state */
>> +	struct xarray contexts;
>> +};
>> +
>> +struct tegra_drm_channel_ctx {
>> +	struct tegra_drm_client *client;
>> +	struct host1x_channel *channel;
>> +	struct xarray mappings;
>> +};
> 
> This is mostly the same as tegra_drm_context, so can't we just merge the
> two? There's going to be slight overlap, but overall things are going to
> be less confusing to follow.
> 
> Even more so because I think we should consider phasing out the old UAPI
> eventually and then we can just remove the unneeded fields from this.

Okay.

> 
>> +
>> +struct tegra_drm_mapping {
>> +	struct kref ref;
>> +
>> +	struct device *dev;
>> +	struct host1x_bo *bo;
>> +	struct sg_table *sgt;
>> +	enum dma_data_direction direction;
>> +	dma_addr_t iova;
>> +	dma_addr_t iova_end;
> 
> iova_end seems to never be used. Do we need it?

It is used in the firewall.

> 
>> +};
>> +
>> +int tegra_drm_ioctl_channel_open(struct drm_device *drm, void *data,
>> +				 struct drm_file *file);
>> +int tegra_drm_ioctl_channel_close(struct drm_device *drm, void *data,
>> +				  struct drm_file *file);
>> +int tegra_drm_ioctl_channel_map(struct drm_device *drm, void *data,
>> +				struct drm_file *file);
>> +int tegra_drm_ioctl_channel_unmap(struct drm_device *drm, void *data,
>> +				struct drm_file *file);
>> +int tegra_drm_ioctl_channel_submit(struct drm_device *drm, void *data,
>> +				   struct drm_file *file);
>> +int tegra_drm_ioctl_gem_create(struct drm_device *drm, void *data,
>> +				struct drm_file *file);
>> +int tegra_drm_ioctl_gem_mmap(struct drm_device *drm, void *data,
>> +				struct drm_file *file);
>> +
>> +void tegra_drm_uapi_close_file(struct tegra_drm_file *file);
>> +void tegra_drm_mapping_put(struct tegra_drm_mapping *mapping);
>> +struct tegra_drm_channel_ctx *
>> +tegra_drm_channel_ctx_lock(struct tegra_drm_file *file, u32 id);
>> +
>> +#endif
>> diff --git a/drivers/gpu/drm/tegra/uapi/uapi.c b/drivers/gpu/drm/tegra/uapi/uapi.c
>> new file mode 100644
>> index 000000000000..d503b5e817c4
>> --- /dev/null
>> +++ b/drivers/gpu/drm/tegra/uapi/uapi.c
>> @@ -0,0 +1,307 @@
>> +// SPDX-License-Identifier: GPL-2.0-only
>> +/* Copyright (c) 2020 NVIDIA Corporation */
>> +
>> +#include <linux/host1x.h>
>> +#include <linux/iommu.h>
>> +#include <linux/list.h>
>> +
>> +#include <drm/drm_drv.h>
>> +#include <drm/drm_file.h>
>> +
>> +#include "../uapi.h"
>> +#include "../drm.h"
>> +
>> +struct tegra_drm_channel_ctx *
>> +tegra_drm_channel_ctx_lock(struct tegra_drm_file *file, u32 id)
>> +{
>> +	struct tegra_drm_channel_ctx *ctx;
>> +
>> +	mutex_lock(&file->lock);
>> +	ctx = xa_load(&file->contexts, id);
>> +	if (!ctx)
>> +		mutex_unlock(&file->lock);
>> +
>> +	return ctx;
>> +}
> 
> This interface seems slightly odd. Looking at how this is used I see how
> doing it this way saves a couple of lines. However, it also makes this
> difficult to understand, so I wonder if it wouldn't be better to just
> open-code this in the three callsites to make the code flow a bit more
> idiomatic.

Ok, will do. (Another option may be to add a 
tegra_drm_channel_ctx_unlock that just unlocks file->lock -- that'd 
abstract it out even better, which I quite like -- but I'll go with your 
preference)
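
For what it's worth, the lock-on-success lookup plus the proposed unlock
counterpart can be sketched in self-contained userspace C (pthreads stand
in for the kernel mutex; `ctx_lookup_locked` and `ctx_unlock` are
hypothetical names, not the driver's actual API):

```c
#include <pthread.h>
#include <stddef.h>

struct file_state {
	pthread_mutex_t lock;
	void *contexts[16];	/* toy stand-in for the xarray */
};

/* On success this returns the context WITH file->lock held; on failure
 * the lock is dropped before returning, so callers unlock only on the
 * success path. */
static void *ctx_lookup_locked(struct file_state *file, unsigned int id)
{
	void *ctx;

	pthread_mutex_lock(&file->lock);
	ctx = (id < 16) ? file->contexts[id] : NULL;
	if (!ctx)
		pthread_mutex_unlock(&file->lock);

	return ctx;
}

/* The trivial counterpart suggested above; it makes the call sites
 * symmetric instead of ending with a bare mutex_unlock(). */
static void ctx_unlock(struct file_state *file)
{
	pthread_mutex_unlock(&file->lock);
}
```

Whether this reads better than open-coding the three call sites is
exactly the judgment call being discussed.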

> 
>> +
>> +static void tegra_drm_mapping_release(struct kref *ref)
>> +{
>> +	struct tegra_drm_mapping *mapping =
>> +		container_of(ref, struct tegra_drm_mapping, ref);
>> +
>> +	if (mapping->sgt)
>> +		dma_unmap_sgtable(mapping->dev, mapping->sgt,
>> +				  mapping->direction, DMA_ATTR_SKIP_CPU_SYNC);
>> +
>> +	host1x_bo_unpin(mapping->dev, mapping->bo, mapping->sgt);
>> +	host1x_bo_put(mapping->bo);
>> +
>> +	kfree(mapping);
>> +}
>> +
>> +void tegra_drm_mapping_put(struct tegra_drm_mapping *mapping)
>> +{
>> +	kref_put(&mapping->ref, tegra_drm_mapping_release);
>> +}
>> +
>> +static void tegra_drm_channel_ctx_close(struct tegra_drm_channel_ctx *ctx)
> 
> Yeah, the more often I read it, the more I'm in favour of just
> collapsing tegra_drm_channel_ctx into tegra_drm_channel if for nothing
> else but to get rid of that annoying _ctx suffix that's there for no
> other reason than to differentiate it from "legacy" contexts.
> 
>> +{
>> +	unsigned long mapping_id;
> 
> It's clear from the context that this is a mapping ID, so I think you
> can just leave out the "mapping_" prefix to save a bit on screen space.

Sure.

> 
>> +	struct tegra_drm_mapping *mapping;
>> +
>> +	xa_for_each(&ctx->mappings, mapping_id, mapping)
>> +		tegra_drm_mapping_put(mapping);
>> +
>> +	xa_destroy(&ctx->mappings);
>> +
>> +	host1x_channel_put(ctx->channel);
>> +
>> +	kfree(ctx);
>> +}
>> +
>> +int close_channel_ctx(int id, void *p, void *data)
>> +{
>> +	struct tegra_drm_channel_ctx *ctx = p;
>> +
>> +	tegra_drm_channel_ctx_close(ctx);
>> +
>> +	return 0;
>> +}
> 
> The signature looked strange, so I went looking for where this is called
> from and turns out I can't find any place where this is used. Do we need
> it?

Ah, maybe I have left it from some previous version. Will fix.

> 
>> +
>> +void tegra_drm_uapi_close_file(struct tegra_drm_file *file)
>> +{
>> +	unsigned long ctx_id;
> 
> Just like for mappings above, I think it's fine to leave out the ctx_
> prefix here.

Yep.

> 
>> +	struct tegra_drm_channel_ctx *ctx;
>> +
>> +	xa_for_each(&file->contexts, ctx_id, ctx)
>> +		tegra_drm_channel_ctx_close(ctx);
>> +
>> +	xa_destroy(&file->contexts);
>> +}
>> +
>> +int tegra_drm_ioctl_channel_open(struct drm_device *drm, void *data,
>> +				 struct drm_file *file)
>> +{
>> +	struct tegra_drm_file *fpriv = file->driver_priv;
>> +	struct tegra_drm *tegra = drm->dev_private;
>> +	struct drm_tegra_channel_open *args = data;
>> +	struct tegra_drm_client *client = NULL;
>> +	struct tegra_drm_channel_ctx *ctx;
>> +	int err;
>> +
>> +	if (args->flags)
>> +		return -EINVAL;
>> +
>> +	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
>> +	if (!ctx)
>> +		return -ENOMEM;
>> +
>> +	err = -ENODEV;
>> +	list_for_each_entry(client, &tegra->clients, list) {
>> +		if (client->base.class == args->host1x_class) {
>> +			err = 0;
>> +			break;
>> +		}
>> +	}
>> +	if (err)
>> +		goto free_ctx;
> 
> This type of construct looks weird. I found that a good way around this
> is to split this off into a separate function that does the lookup and
> just returns NULL when it doesn't find one, which is very elegant:
> 
> 	struct tegra_drm_client *tegra_drm_find_client(struct tegra_drm *tegra, u32 class)
> 	{
> 		struct tegra_drm_client *client;
> 
> 		list_for_each_entry(client, &tegra->clients, list)
> 			if (client->base.class == class)
> 				return client;
> 
> 		return NULL;
> 	}
> 
> and then all of a sudden, the very cumbersome construct above becomes
> this pretty piece of code:
> 
> 	client = tegra_drm_find_client(tegra, args->host1x_class);
> 	if (!client) {
> 		err = -ENODEV;
> 		goto free_ctx;
> 	}
> 
> No need for initializing client to NULL or preventatively setting err =
> -ENODEV or anything.

Yep.

> 
>> +
>> +	if (client->shared_channel) {
>> +		ctx->channel = host1x_channel_get(client->shared_channel);
>> +	} else {
>> +		ctx->channel = host1x_channel_request(&client->base);
>> +		if (!ctx->channel) {
>> +			err = -EBUSY;
> 
> Is -EBUSY really appropriate here? Can host1x_channel_request() fail for
> other reasons?

It could also fail due to being out of memory (failing to allocate space 
for CDMA) - I guess we should plumb the error code here. But -EBUSY is 
really the most likely thing to happen anyway. Perhaps that can be done 
separately.
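
Plumbing the real cause through would presumably use the kernel's
ERR_PTR convention from <linux/err.h>, where the pointer return value
encodes a small negative errno. A minimal userspace re-implementation of
that idiom (`request_channel` is a made-up stand-in for
host1x_channel_request(), not the actual API):

```c
#include <errno.h>

#define MAX_ERRNO 4095	/* same convention as <linux/err.h> */

static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

/* Hypothetical stand-in: instead of returning bare NULL on failure it
 * returns ERR_PTR(-EBUSY) or ERR_PTR(-ENOMEM), so the ioctl could
 * forward the real cause to userspace. */
static void *request_channel(int all_busy, int out_of_memory)
{
	static int channel;

	if (all_busy)
		return ERR_PTR(-EBUSY);
	if (out_of_memory)
		return ERR_PTR(-ENOMEM);

	return &channel;
}
```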

> 
>> +			goto free_ctx;
>> +		}
>> +	}
>> +
>> +	err = xa_alloc(&fpriv->contexts, &args->channel_ctx, ctx,
>> +		       XA_LIMIT(1, U32_MAX), GFP_KERNEL);
>> +	if (err < 0)
>> +		goto put_channel;
>> +
>> +	ctx->client = client;
>> +	xa_init_flags(&ctx->mappings, XA_FLAGS_ALLOC1);
>> +
>> +	args->hardware_version = client->version;
>> +
>> +	return 0;
>> +
>> +put_channel:
>> +	host1x_channel_put(ctx->channel);
>> +free_ctx:
>> +	kfree(ctx);
>> +
>> +	return err;
>> +}
>> +
>> +int tegra_drm_ioctl_channel_close(struct drm_device *drm, void *data,
>> +				  struct drm_file *file)
>> +{
>> +	struct tegra_drm_file *fpriv = file->driver_priv;
>> +	struct drm_tegra_channel_close *args = data;
>> +	struct tegra_drm_channel_ctx *ctx;
>> +
>> +	ctx = tegra_drm_channel_ctx_lock(fpriv, args->channel_ctx);
>> +	if (!ctx)
>> +		return -EINVAL;
>> +
>> +	xa_erase(&fpriv->contexts, args->channel_ctx);
>> +
>> +	mutex_unlock(&fpriv->lock);
>> +
>> +	tegra_drm_channel_ctx_close(ctx);
>> +
>> +	return 0;
>> +}
>> +
>> +int tegra_drm_ioctl_channel_map(struct drm_device *drm, void *data,
>> +				struct drm_file *file)
>> +{
>> +	struct tegra_drm_file *fpriv = file->driver_priv;
>> +	struct drm_tegra_channel_map *args = data;
>> +	struct tegra_drm_channel_ctx *ctx;
>> +	struct tegra_drm_mapping *mapping;
>> +	struct drm_gem_object *gem;
>> +	u32 mapping_id;
>> +	int err = 0;
>> +
>> +	if (args->flags & ~DRM_TEGRA_CHANNEL_MAP_READWRITE)
>> +		return -EINVAL;
>> +
>> +	ctx = tegra_drm_channel_ctx_lock(fpriv, args->channel_ctx);
>> +	if (!ctx)
>> +		return -EINVAL;
>> +
>> +	mapping = kzalloc(sizeof(*mapping), GFP_KERNEL);
>> +	if (!mapping) {
>> +		err = -ENOMEM;
>> +		goto unlock;
>> +	}
>> +
>> +	kref_init(&mapping->ref);
>> +
>> +	gem = drm_gem_object_lookup(file, args->handle);
>> +	if (!gem) {
>> +		err = -EINVAL;
>> +		goto unlock;
>> +	}
>> +
>> +	mapping->dev = ctx->client->base.dev;
>> +	mapping->bo = &container_of(gem, struct tegra_bo, gem)->base;
> 
> We already have host1x_bo_lookup() in drm.c that you can use to avoid
> this strange cast.

Okay, will fix.

> 
>> +
>> +	if (!iommu_get_domain_for_dev(mapping->dev) ||
>> +	    ctx->client->base.group) {
> 
> This expression is now used in at least two places, so I wonder if we
> should have a helper for it along with some documentation about why this
> is the right thing to do. I have a local patch that adds a comment to
> the other instance of this because I had forgotten why this was correct,
> so I can pick that up and refactor later on.

Actually, just last week I found out that the condition here is wrong 
(at least for this particular instance) -- with the current condition, 
if IOMMU is disabled we end up in the first branch, but if the BO in 
question was imported through DMA-BUF, the iova will be NULL -- 
host1x_bo_pin returns an SGT instead -- so we need to go to the else path, 
which works fine. (If ctx->client->base.group is set, this is not a 
problem since import will IOMMU map the BO and set iova). I have a local 
fix for this which I'll add to v6.
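
For reference, my reading of the corrected branch condition described
above, written out as a self-contained truth-table model (an
illustration of the reasoning only, not the actual v6 code):

```c
#include <stdbool.h>

/*
 * Decide whether the direct host1x_bo_pin() iova path is safe, per the
 * discussion above:
 *  - with an explicit IOMMU group, import already mapped the BO and set
 *    the iova, so the direct path works;
 *  - with no IOMMU domain at all, physical addresses work -- except for
 *    dma-buf imports, where host1x_bo_pin() yields an SGT and a NULL
 *    iova, so the SGT/DMA-API path must be taken instead.
 */
static bool use_direct_iova_path(bool has_iommu_domain, bool client_has_group,
				 bool bo_is_dmabuf_import)
{
	if (client_has_group)
		return true;

	if (!has_iommu_domain && !bo_is_dmabuf_import)
		return true;

	return false;
}
```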

> 
>> +		host1x_bo_pin(mapping->dev, mapping->bo,
>> +			      &mapping->iova);
>> +	} else {
>> +		mapping->direction = DMA_TO_DEVICE;
>> +		if (args->flags & DRM_TEGRA_CHANNEL_MAP_READWRITE)
>> +			mapping->direction = DMA_BIDIRECTIONAL;
>> +
>> +		mapping->sgt =
>> +			host1x_bo_pin(mapping->dev, mapping->bo, NULL);
>> +		if (IS_ERR(mapping->sgt)) {
>> +			err = PTR_ERR(mapping->sgt);
>> +			goto put_gem;
>> +		}
>> +
>> +		err = dma_map_sgtable(mapping->dev, mapping->sgt,
>> +				      mapping->direction,
>> +				      DMA_ATTR_SKIP_CPU_SYNC);
>> +		if (err)
>> +			goto unpin;
>> +
>> +		/* TODO only map the requested part */
>> +		mapping->iova = sg_dma_address(mapping->sgt->sgl);
> 
> That comment seems misplaced here since the mapping already happens
> above. Also, wouldn't the same TODO apply to the host1x_bo_pin() path in
> the if block? Maybe the TODO should be at the top of the function?
> 
> Alternatively, if this isn't implemented in this patch anyway, maybe
> just drop the comment altogether. In order to implement this, wouldn't
> the UAPI have to change as well? In that case it might be better to add
> the TODO somewhere in the UAPI header, or in a separate TODO file in the
> driver's directory.

Yeah, I'll drop the comment. The UAPI originally had support for this 
but I dropped it upon Dmitry's objection.

> 
>> +	}
>> +
>> +	mapping->iova_end = mapping->iova + gem->size;
>> +
>> +	mutex_unlock(&fpriv->lock);
>> +
>> +	err = xa_alloc(&ctx->mappings, &mapping_id, mapping,
>> +		       XA_LIMIT(1, U32_MAX), GFP_KERNEL);
>> +	if (err < 0)
>> +		goto unmap;
>> +
>> +	args->mapping_id = mapping_id;
>> +
>> +	return 0;
>> +
>> +unmap:
>> +	if (mapping->sgt) {
>> +		dma_unmap_sgtable(mapping->dev, mapping->sgt,
>> +				  mapping->direction, DMA_ATTR_SKIP_CPU_SYNC);
>> +	}
>> +unpin:
>> +	host1x_bo_unpin(mapping->dev, mapping->bo, mapping->sgt);
>> +put_gem:
>> +	drm_gem_object_put(gem);
>> +	kfree(mapping);
>> +unlock:
>> +	mutex_unlock(&fpriv->lock);
>> +	return err;
>> +}
>> +
>> +int tegra_drm_ioctl_channel_unmap(struct drm_device *drm, void *data,
>> +				  struct drm_file *file)
>> +{
>> +	struct tegra_drm_file *fpriv = file->driver_priv;
>> +	struct drm_tegra_channel_unmap *args = data;
>> +	struct tegra_drm_channel_ctx *ctx;
>> +	struct tegra_drm_mapping *mapping;
>> +
>> +	ctx = tegra_drm_channel_ctx_lock(fpriv, args->channel_ctx);
>> +	if (!ctx)
>> +		return -EINVAL;
>> +
>> +	mapping = xa_erase(&ctx->mappings, args->mapping_id);
>> +
>> +	mutex_unlock(&fpriv->lock);
>> +
>> +	if (mapping) {
>> +		tegra_drm_mapping_put(mapping);
>> +		return 0;
>> +	} else {
>> +		return -EINVAL;
>> +	}
>> +}
>> +
>> +int tegra_drm_ioctl_gem_create(struct drm_device *drm, void *data,
>> +			       struct drm_file *file)
>> +{
>> +	struct drm_tegra_gem_create *args = data;
>> +	struct tegra_bo *bo;
>> +
>> +	if (args->flags)
>> +		return -EINVAL;
> 
> I'm not sure it's worth doing this, especially because this is now a new
> IOCTL that's actually a subset of the original. I think we should just
> keep the original and if we want to deprecate the flags, or replace them
> with new ones, let's just try and phase out the deprecated ones.

Ok, I'll look into it.

> 
> Thierry

Thanks,
Mikko

> 
>> +
>> +	bo = tegra_bo_create_with_handle(file, drm, args->size, args->flags,
>> +					 &args->handle);
>> +	if (IS_ERR(bo))
>> +		return PTR_ERR(bo);
>> +
>> +	return 0;
>> +}
>> +
>> +int tegra_drm_ioctl_gem_mmap(struct drm_device *drm, void *data,
>> +			     struct drm_file *file)
>> +{
>> +	struct drm_tegra_gem_mmap *args = data;
>> +	struct drm_gem_object *gem;
>> +	struct tegra_bo *bo;
>> +
>> +	gem = drm_gem_object_lookup(file, args->handle);
>> +	if (!gem)
>> +		return -EINVAL;
>> +
>> +	bo = to_tegra_bo(gem);
>> +
>> +	args->offset = drm_vma_node_offset_addr(&bo->gem.vma_node);
>> +
>> +	drm_gem_object_put(gem);
>> +
>> +	return 0;
>> +}
>> -- 
>> 2.30.0
>>
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 195+ messages in thread

* Re: [PATCH v5 19/21] drm/tegra: Implement new UAPI
  2021-03-23 14:43       ` Mikko Perttunen
@ 2021-03-23 15:00         ` Dmitry Osipenko
  -1 siblings, 0 replies; 195+ messages in thread
From: Dmitry Osipenko @ 2021-03-23 15:00 UTC (permalink / raw)
  To: Mikko Perttunen, Thierry Reding, Mikko Perttunen
  Cc: jonathanh, airlied, daniel, linux-tegra, dri-devel, talho, bhuntsman

23.03.2021 17:43, Mikko Perttunen wrote:
>>
>> FWIW, I would've been fine with stashing all of this into drm.c as well
>> since the rest of the UAPI is in that already. The churn in this patch
>> is reasonably small, but it would've been even less if this was just all
>> in drm.c.
> 
> I think we shouldn't have the uapi in drm.c -- it just makes the file a
> bit of a dumping ground. I think drm.c should have the code that relates
> to initialization and initial registration with DRM.

+1, drm.c is already very unmanageable / difficult to follow; it
absolutely must be split up.

^ permalink raw reply	[flat|nested] 195+ messages in thread

* Re: [PATCH v5 15/21] drm/tegra: Add new UAPI to header
  2021-03-23 14:00               ` Dmitry Osipenko
@ 2021-03-23 16:44                 ` Thierry Reding
  -1 siblings, 0 replies; 195+ messages in thread
From: Thierry Reding @ 2021-03-23 16:44 UTC (permalink / raw)
  To: Dmitry Osipenko
  Cc: Mikko Perttunen, Mikko Perttunen, jonathanh, airlied, daniel,
	linux-tegra, talho, bhuntsman, dri-devel

[-- Attachment #1: Type: text/plain, Size: 3832 bytes --]

On Tue, Mar 23, 2021 at 05:00:30PM +0300, Dmitry Osipenko wrote:
> 23.03.2021 15:30, Thierry Reding wrote:
> > On Thu, Jan 14, 2021 at 12:34:22PM +0200, Mikko Perttunen wrote:
> >> On 1/14/21 10:36 AM, Dmitry Osipenko wrote:
> >>> 13.01.2021 21:56, Mikko Perttunen wrote:
> >>>> On 1/13/21 8:14 PM, Dmitry Osipenko wrote:
> >>>>> 11.01.2021 16:00, Mikko Perttunen wrote:
> >>>>>> +struct drm_tegra_submit_buf {
> >>>>>> +    /**
> >>>>>> +     * @mapping_id: [in]
> >>>>>> +     *
> >>>>>> +     * Identifier of the mapping to use in the submission.
> >>>>>> +     */
> >>>>>> +    __u32 mapping_id;
> >>>>>
> >>>>> I'm now in process of trying out the UAPI using grate drivers and this
> >>>>> becomes the first obstacle.
> >>>>>
> >>>>> Looks like this is not going to work well for older Tegra SoCs, in
> >>>>> particular for T20, which has a small GART.
> >>>>>
> >>>>> Given that the usefulness of the partial mapping feature is very
> >>>>> questionable until it is proven with a real userspace, we should
> >>>>> start with dynamic mappings that are done at job-submission time.
> >>>>>
> >>>>> DRM should already have everything necessary for creating and managing
> >>>>> caches of mappings; the grate kernel driver has been using drm_mm_scan
> >>>>> for that for a long time now.
> >>>>>
> >>>>> It should be fine to support the static mapping feature, but it should
> >>>>> be done separately with the drm_mm integration, IMO.
> >>>>>
> >>>>> What do you think?
> >>>>>
> >>>>
> >>>> Can you elaborate on the requirements to be able to use GART? Are there
> >>>> any other reasons this would not work on older chips?
> >>>
> >>> We have all DRM devices in a single address space on T30+, hence having
> >>> duplicated mappings for each device would be a bit wasteful.
> >>
> >> I guess this should be pretty easy to change to only keep one mapping per
> >> GEM object.
> > 
> > The important point here is the semantics: this IOCTL establishes a
> > mapping for a given GEM object on a given channel. If the underlying
> > implementation is such that the mapping doesn't fit into the GART, then
> > that's an implementation detail that the driver needs to take care of.
> > Similarly, if multiple devices share a single address space, that's
> > something the driver already knows and can take advantage of by simply
> > reusing an existing mapping if one already exists. In both cases the
> > semantics would be correctly implemented and that's really all that
> > matters.
> > 
> > Overall this interface seems sound from a high-level point of view and
> > allows these mappings to be properly created even for the cases we have
> > where each channel may have a separate address space. It may not be the
> > optimal interface for all use-cases or any one individual case, but the
> > very nature of these interfaces is to abstract away certain differences
> > in order to provide a unified interface to a common programming model.
> > So there will always be certain tradeoffs.
> 
> For now this IOCTL isn't useful from the userspace perspective on older
> SoCs, and I'll need to add a lot of code that won't do anything useful
> just to conform to the specific needs of the newer SoCs. Trying to unify
> everything into a single API doesn't sound like a good idea at this
> point, and I already suggested that Mikko try out a variant with
> separate per-SoC code paths in the next version; the mappings could
> then be handled separately by the T186+ paths.

I'm not sure I understand what you're saying. Obviously the underlying
implementation of this might have to differ depending on SoC generation.
But it sounds like you're suggesting having different UAPIs depending on
SoC generation.

Thierry

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 195+ messages in thread

* Re: [PATCH v5 19/21] drm/tegra: Implement new UAPI
  2021-03-23 15:00         ` Dmitry Osipenko
@ 2021-03-23 16:59           ` Thierry Reding
  -1 siblings, 0 replies; 195+ messages in thread
From: Thierry Reding @ 2021-03-23 16:59 UTC (permalink / raw)
  To: Dmitry Osipenko
  Cc: Mikko Perttunen, Mikko Perttunen, jonathanh, airlied, daniel,
	linux-tegra, dri-devel, talho, bhuntsman

[-- Attachment #1: Type: text/plain, Size: 1171 bytes --]

On Tue, Mar 23, 2021 at 06:00:34PM +0300, Dmitry Osipenko wrote:
> 23.03.2021 17:43, Mikko Perttunen wrote:
> >>
> >> FWIW, I would've been fine with stashing all of this into drm.c as well
> >> since the rest of the UAPI is in that already. The churn in this patch
> >> is reasonably small, but it would've been even less if this was just all
> >> in drm.c.
> > 
> > I think we shouldn't have the uapi in drm.c -- it just makes the file a
> > bit of a dumping ground. I think drm.c should have the code that relates
> > to initialization and initial registration with DRM.
> 
> +1, drm.c is already very unmanageable / difficult to follow; it
> absolutely must be split up.

I guess this comes down to personal preference. I don't find it
difficult to navigate large files. On the contrary, I find it much more
challenging to navigate a code base spread over lots and lots of files.
I don't feel strongly about moving this code into a separate file,
though, so let's compromise and leave it at that. No need to
split this out into 5 (or whatever it was) different tiny files that
the end result of this series seems to yield.

Thierry

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 195+ messages in thread

* Re: [PATCH v5 20/21] drm/tegra: Implement job submission part of new UAPI
  2021-03-23 14:16       ` Mikko Perttunen
@ 2021-03-23 17:04         ` Thierry Reding
  -1 siblings, 0 replies; 195+ messages in thread
From: Thierry Reding @ 2021-03-23 17:04 UTC (permalink / raw)
  To: Mikko Perttunen
  Cc: Mikko Perttunen, jonathanh, digetx, airlied, daniel, linux-tegra,
	talho, bhuntsman, dri-devel

[-- Attachment #1: Type: text/plain, Size: 13141 bytes --]

On Tue, Mar 23, 2021 at 04:16:00PM +0200, Mikko Perttunen wrote:
> On 3/23/21 3:38 PM, Thierry Reding wrote:
> > On Mon, Jan 11, 2021 at 03:00:18PM +0200, Mikko Perttunen wrote:
> > > Implement the job submission IOCTL with a minimum feature set.
> > > 
> > > Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
> > > ---
> > > v5:
> > > * Add 16K size limit to copies from userspace.
> > > * Guard RELOC_BLOCKLINEAR flag handling to only exist in ARM64
> > >    to prevent oversized shift on 32-bit platforms.
> > > v4:
> > > * Remove all features that are not strictly necessary.
> > > * Split into two patches.
> > > v3:
> > > * Remove WRITE_RELOC. Relocations are now patched implicitly
> > >    when patching is needed.
> > > * Directly call PM runtime APIs on devices instead of using
> > >    power_on/power_off callbacks.
> > > * Remove incorrect mutex unlock in tegra_drm_ioctl_channel_open
> > > * Use XA_FLAGS_ALLOC1 instead of XA_FLAGS_ALLOC
> > > * Accommodate for removal of timeout field and inlining of
> > >    syncpt_incrs array.
> > > * Copy entire user arrays at a time instead of going through
> > >    elements one-by-one.
> > > * Implement waiting of DMA reservations.
> > > * Split out gather_bo implementation into a separate file.
> > > * Fix length parameter passed to sg_init_one in gather_bo
> > > * Cosmetic cleanup.
> > > ---
> > >   drivers/gpu/drm/tegra/Makefile         |   2 +
> > >   drivers/gpu/drm/tegra/drm.c            |   2 +
> > >   drivers/gpu/drm/tegra/uapi/gather_bo.c |  86 +++++
> > >   drivers/gpu/drm/tegra/uapi/gather_bo.h |  22 ++
> > >   drivers/gpu/drm/tegra/uapi/submit.c    | 428 +++++++++++++++++++++++++
> > >   drivers/gpu/drm/tegra/uapi/submit.h    |  17 +
> > >   6 files changed, 557 insertions(+)
> > >   create mode 100644 drivers/gpu/drm/tegra/uapi/gather_bo.c
> > >   create mode 100644 drivers/gpu/drm/tegra/uapi/gather_bo.h
> > >   create mode 100644 drivers/gpu/drm/tegra/uapi/submit.c
> > >   create mode 100644 drivers/gpu/drm/tegra/uapi/submit.h
> > > 
> > > diff --git a/drivers/gpu/drm/tegra/Makefile b/drivers/gpu/drm/tegra/Makefile
> > > index 0abdb21b38b9..059322e88943 100644
> > > --- a/drivers/gpu/drm/tegra/Makefile
> > > +++ b/drivers/gpu/drm/tegra/Makefile
> > > @@ -4,6 +4,8 @@ ccflags-$(CONFIG_DRM_TEGRA_DEBUG) += -DDEBUG
> > >   tegra-drm-y := \
> > >   	drm.o \
> > >   	uapi/uapi.o \
> > > +	uapi/submit.o \
> > > +	uapi/gather_bo.o \
> > >   	gem.o \
> > >   	fb.o \
> > >   	dp.o \
> > > diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
> > > index 6a51035ce33f..60eab403ae9b 100644
> > > --- a/drivers/gpu/drm/tegra/drm.c
> > > +++ b/drivers/gpu/drm/tegra/drm.c
> > > @@ -737,6 +737,8 @@ static const struct drm_ioctl_desc tegra_drm_ioctls[] = {
> > >   			  DRM_RENDER_ALLOW),
> > >   	DRM_IOCTL_DEF_DRV(TEGRA_CHANNEL_UNMAP, tegra_drm_ioctl_channel_unmap,
> > >   			  DRM_RENDER_ALLOW),
> > > +	DRM_IOCTL_DEF_DRV(TEGRA_CHANNEL_SUBMIT, tegra_drm_ioctl_channel_submit,
> > > +			  DRM_RENDER_ALLOW),
> > >   	DRM_IOCTL_DEF_DRV(TEGRA_GEM_CREATE, tegra_drm_ioctl_gem_create,
> > >   			  DRM_RENDER_ALLOW),
> > >   	DRM_IOCTL_DEF_DRV(TEGRA_GEM_MMAP, tegra_drm_ioctl_gem_mmap,
> > > diff --git a/drivers/gpu/drm/tegra/uapi/gather_bo.c b/drivers/gpu/drm/tegra/uapi/gather_bo.c
> > > new file mode 100644
> > > index 000000000000..b487a0d44648
> > > --- /dev/null
> > > +++ b/drivers/gpu/drm/tegra/uapi/gather_bo.c
> > > @@ -0,0 +1,86 @@
> > > +// SPDX-License-Identifier: GPL-2.0-only
> > > +/* Copyright (c) 2020 NVIDIA Corporation */
> > > +
> > > +#include <linux/scatterlist.h>
> > > +#include <linux/slab.h>
> > > +
> > > +#include "gather_bo.h"
> > > +
> > > +static struct host1x_bo *gather_bo_get(struct host1x_bo *host_bo)
> > > +{
> > > +	struct gather_bo *bo = container_of(host_bo, struct gather_bo, base);
> > > +
> > > +	kref_get(&bo->ref);
> > > +
> > > +	return host_bo;
> > > +}
> > > +
> > > +static void gather_bo_release(struct kref *ref)
> > > +{
> > > +	struct gather_bo *bo = container_of(ref, struct gather_bo, ref);
> > > +
> > > +	kfree(bo->gather_data);
> > > +	kfree(bo);
> > > +}
> > > +
> > > +void gather_bo_put(struct host1x_bo *host_bo)
> > > +{
> > > +	struct gather_bo *bo = container_of(host_bo, struct gather_bo, base);
> > > +
> > > +	kref_put(&bo->ref, gather_bo_release);
> > > +}
> > > +
> > > +static struct sg_table *
> > > +gather_bo_pin(struct device *dev, struct host1x_bo *host_bo, dma_addr_t *phys)
> > > +{
> > > +	struct gather_bo *bo = container_of(host_bo, struct gather_bo, base);
> > > +	struct sg_table *sgt;
> > > +	int err;
> > > +
> > > +	if (phys) {
> > > +		*phys = virt_to_phys(bo->gather_data);
> > > +		return NULL;
> > > +	}
> > > +
> > > +	sgt = kzalloc(sizeof(*sgt), GFP_KERNEL);
> > > +	if (!sgt)
> > > +		return ERR_PTR(-ENOMEM);
> > > +
> > > +	err = sg_alloc_table(sgt, 1, GFP_KERNEL);
> > > +	if (err) {
> > > +		kfree(sgt);
> > > +		return ERR_PTR(err);
> > > +	}
> > > +
> > > +	sg_init_one(sgt->sgl, bo->gather_data, bo->gather_data_words*4);
> > > +
> > > +	return sgt;
> > > +}
> > > +
> > > +static void gather_bo_unpin(struct device *dev, struct sg_table *sgt)
> > > +{
> > > +	if (sgt) {
> > > +		sg_free_table(sgt);
> > > +		kfree(sgt);
> > > +	}
> > > +}
> > > +
> > > +static void *gather_bo_mmap(struct host1x_bo *host_bo)
> > > +{
> > > +	struct gather_bo *bo = container_of(host_bo, struct gather_bo, base);
> > > +
> > > +	return bo->gather_data;
> > > +}
> > > +
> > > +static void gather_bo_munmap(struct host1x_bo *host_bo, void *addr)
> > > +{
> > > +}
> > > +
> > > +const struct host1x_bo_ops gather_bo_ops = {
> > > +	.get = gather_bo_get,
> > > +	.put = gather_bo_put,
> > > +	.pin = gather_bo_pin,
> > > +	.unpin = gather_bo_unpin,
> > > +	.mmap = gather_bo_mmap,
> > > +	.munmap = gather_bo_munmap,
> > > +};
> > > diff --git a/drivers/gpu/drm/tegra/uapi/gather_bo.h b/drivers/gpu/drm/tegra/uapi/gather_bo.h
> > > new file mode 100644
> > > index 000000000000..6b4c9d83ac91
> > > --- /dev/null
> > > +++ b/drivers/gpu/drm/tegra/uapi/gather_bo.h
> > > @@ -0,0 +1,22 @@
> > > +/* SPDX-License-Identifier: GPL-2.0-only */
> > > +/* Copyright (c) 2020 NVIDIA Corporation */
> > > +
> > > +#ifndef _TEGRA_DRM_SUBMIT_GATHER_BO_H
> > > +#define _TEGRA_DRM_SUBMIT_GATHER_BO_H
> > > +
> > > +#include <linux/host1x.h>
> > > +#include <linux/kref.h>
> > > +
> > > +struct gather_bo {
> > > +	struct host1x_bo base;
> > > +
> > > +	struct kref ref;
> > > +
> > > +	u32 *gather_data;
> > > +	size_t gather_data_words;
> > > +};
> > > +
> > > +extern const struct host1x_bo_ops gather_bo_ops;
> > > +void gather_bo_put(struct host1x_bo *host_bo);
> > > +
> > > +#endif
> > > diff --git a/drivers/gpu/drm/tegra/uapi/submit.c b/drivers/gpu/drm/tegra/uapi/submit.c
> > > new file mode 100644
> > > index 000000000000..398be3065e21
> > > --- /dev/null
> > > +++ b/drivers/gpu/drm/tegra/uapi/submit.c
> > > @@ -0,0 +1,428 @@
> > > +// SPDX-License-Identifier: GPL-2.0-only
> > > +/* Copyright (c) 2020 NVIDIA Corporation */
> > > +
> > > +#include <linux/dma-fence-array.h>
> > > +#include <linux/file.h>
> > > +#include <linux/host1x.h>
> > > +#include <linux/iommu.h>
> > > +#include <linux/kref.h>
> > > +#include <linux/list.h>
> > > +#include <linux/nospec.h>
> > > +#include <linux/pm_runtime.h>
> > > +#include <linux/sync_file.h>
> > > +
> > > +#include <drm/drm_drv.h>
> > > +#include <drm/drm_file.h>
> > > +
> > > +#include "../uapi.h"
> > > +#include "../drm.h"
> > > +#include "../gem.h"
> > > +
> > > +#include "gather_bo.h"
> > > +#include "submit.h"
> > > +
> > > +static struct tegra_drm_mapping *
> > > +tegra_drm_mapping_get(struct tegra_drm_channel_ctx *ctx, u32 id)
> > > +{
> > > +	struct tegra_drm_mapping *mapping;
> > > +
> > > +	xa_lock(&ctx->mappings);
> > > +	mapping = xa_load(&ctx->mappings, id);
> > > +	if (mapping)
> > > +		kref_get(&mapping->ref);
> > > +	xa_unlock(&ctx->mappings);
> > > +
> > > +	return mapping;
> > > +}
> > > +
> > > +static void *alloc_copy_user_array(void __user *from, size_t count, size_t size)
> > > +{
> > > +	unsigned long copy_err;
> > > +	size_t copy_len;
> > > +	void *data;
> > > +
> > > +	if (check_mul_overflow(count, size, &copy_len))
> > > +		return ERR_PTR(-EINVAL);
> > > +
> > > +	if (copy_len > 0x4000)
> > > +		return ERR_PTR(-E2BIG);
> > > +
> > > +	data = kvmalloc(copy_len, GFP_KERNEL);
> > > +	if (!data)
> > > +		return ERR_PTR(-ENOMEM);
> > > +
> > > +	copy_err = copy_from_user(data, from, copy_len);
> > > +	if (copy_err) {
> > > +		kvfree(data);
> > > +		return ERR_PTR(-EFAULT);
> > > +	}
> > > +
> > > +	return data;
> > > +}
> > > +
> > > +static int submit_copy_gather_data(struct drm_device *drm,
> > > +				   struct gather_bo **pbo,
> > > +				   struct drm_tegra_channel_submit *args)
> > > +{
> > > +	unsigned long copy_err;
> > > +	struct gather_bo *bo;
> > > +	size_t copy_len;
> > > +
> > > +	if (args->gather_data_words == 0) {
> > > +		drm_info(drm, "gather_data_words can't be 0");
> > > +		return -EINVAL;
> > > +	}
> > > +
> > > +	if (check_mul_overflow((size_t)args->gather_data_words, (size_t)4, &copy_len))
> > > +		return -EINVAL;
> > > +
> > > +	bo = kzalloc(sizeof(*bo), GFP_KERNEL);
> > > +	if (!bo)
> > > +		return -ENOMEM;
> > > +
> > > +	kref_init(&bo->ref);
> > > +	host1x_bo_init(&bo->base, &gather_bo_ops);
> > > +
> > > +	bo->gather_data = kmalloc(copy_len, GFP_KERNEL | __GFP_NOWARN);
> > > +	if (!bo->gather_data) {
> > > +		kfree(bo);
> > > +		return -ENOMEM;
> > > +	}
> > > +
> > > +	copy_err = copy_from_user(bo->gather_data,
> > > +				  u64_to_user_ptr(args->gather_data_ptr),
> > > +				  copy_len);
> > > +	if (copy_err) {
> > > +		kfree(bo->gather_data);
> > > +		kfree(bo);
> > > +		return -EFAULT;
> > > +	}
> > > +
> > > +	bo->gather_data_words = args->gather_data_words;
> > > +
> > > +	*pbo = bo;
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > +static int submit_write_reloc(struct gather_bo *bo,
> > > +			      struct drm_tegra_submit_buf *buf,
> > > +			      struct tegra_drm_mapping *mapping)
> > > +{
> > > +	/* TODO check that target_offset is within bounds */
> > > +	dma_addr_t iova = mapping->iova + buf->reloc.target_offset;
> > > +	u32 written_ptr = (u32)(iova >> buf->reloc.shift);
> > > +
> > > +#ifdef CONFIG_ARM64
> > > +	if (buf->flags & DRM_TEGRA_SUBMIT_BUF_RELOC_BLOCKLINEAR)
> > > +		written_ptr |= BIT(39);
> > > +#endif
> > 
> > Sorry, but this still isn't correct. written_ptr is still only 32-bits
> > wide, so your BIT(39) is going to get discarded even on 64-bit ARM. The
> > idiomatic way to do this is to make written_ptr dma_addr_t and use a
> > CONFIG_ARCH_DMA_ADDR_T_64BIT guard. >
> > But even with that this looks wrong because you're OR'ing this in after
> > shifting by buf->reloc.shift. Doesn't that OR it in at the wrong offset?
> > Should you perhaps be doing this instead:
> > 
> > 	#ifdef CONFIG_ARCH_DMA_ADDR_T_64BIT
> > 		if (buf->flags & DRM_TEGRA_SUBMIT_BUF_RELOC_BLOCKLINEAR)
> > 			iova |= BIT(39);
> > 	#endif
> > 
> > 	written_ptr = (u32)(iova >> buf->reloc_shift);
> > 
> > ?
> 
> Yes, you are of course right; I'll fix this. That might explain some of the
> VIC test failures I've seen.
> 
> > 
> > Also, on a side-note: BLOCKLINEAR really isn't the right term here. I
> > recently dealt with this for display (though I haven't sent out that
> > patch yet) and this is actually a bit that selects which sector layout
> > swizzling is being applied. That's independent of block linear format
> > and I think you can have different sector layouts irrespective of the
> > block linear format (though I don't think that's usually done).
> > 
> > That said, I wonder if a better interface here would be to reuse format
> > modifiers here. That would allow us to more fully describe the format of
> > a surface in case we ever need it, and it already includes the sector
> > layout information as well.
> 
> I think having just a flag that enables or disables the swizzling is better
> -- that way it is the responsibility of the userspace, which is where all
> the engine knowledge is as well, to know for each buffer whether it wants
> swizzling or not. Now, in practice at the moment the kernel can just lookup
> the format and set the bit based on that, but e.g. if there was an engine
> that could do the swizzling natively, and we had the format modifier here,
> we'd need to have the knowledge in the kernel to decide for each chip/engine
> whether to apply the bit.

Fine, let's try it this way. I'm just slightly worried that we'll end up
duplicating a lot of the same information that we already have in the
framebuffer modifiers. We made the same mistake a long time ago with
those odd flags in the CREATE IOCTL and that turned out not to be usable
at all, and also completely insufficient.

Thierry

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 195+ messages in thread

> > >   tegra-drm-y := \
> > >   	drm.o \
> > >   	uapi/uapi.o \
> > > +	uapi/submit.o \
> > > +	uapi/gather_bo.o \
> > >   	gem.o \
> > >   	fb.o \
> > >   	dp.o \
> > > diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
> > > index 6a51035ce33f..60eab403ae9b 100644
> > > --- a/drivers/gpu/drm/tegra/drm.c
> > > +++ b/drivers/gpu/drm/tegra/drm.c
> > > @@ -737,6 +737,8 @@ static const struct drm_ioctl_desc tegra_drm_ioctls[] = {
> > >   			  DRM_RENDER_ALLOW),
> > >   	DRM_IOCTL_DEF_DRV(TEGRA_CHANNEL_UNMAP, tegra_drm_ioctl_channel_unmap,
> > >   			  DRM_RENDER_ALLOW),
> > > +	DRM_IOCTL_DEF_DRV(TEGRA_CHANNEL_SUBMIT, tegra_drm_ioctl_channel_submit,
> > > +			  DRM_RENDER_ALLOW),
> > >   	DRM_IOCTL_DEF_DRV(TEGRA_GEM_CREATE, tegra_drm_ioctl_gem_create,
> > >   			  DRM_RENDER_ALLOW),
> > >   	DRM_IOCTL_DEF_DRV(TEGRA_GEM_MMAP, tegra_drm_ioctl_gem_mmap,
> > > diff --git a/drivers/gpu/drm/tegra/uapi/gather_bo.c b/drivers/gpu/drm/tegra/uapi/gather_bo.c
> > > new file mode 100644
> > > index 000000000000..b487a0d44648
> > > --- /dev/null
> > > +++ b/drivers/gpu/drm/tegra/uapi/gather_bo.c
> > > @@ -0,0 +1,86 @@
> > > +// SPDX-License-Identifier: GPL-2.0-only
> > > +/* Copyright (c) 2020 NVIDIA Corporation */
> > > +
> > > +#include <linux/scatterlist.h>
> > > +#include <linux/slab.h>
> > > +
> > > +#include "gather_bo.h"
> > > +
> > > +static struct host1x_bo *gather_bo_get(struct host1x_bo *host_bo)
> > > +{
> > > +	struct gather_bo *bo = container_of(host_bo, struct gather_bo, base);
> > > +
> > > +	kref_get(&bo->ref);
> > > +
> > > +	return host_bo;
> > > +}
> > > +
> > > +static void gather_bo_release(struct kref *ref)
> > > +{
> > > +	struct gather_bo *bo = container_of(ref, struct gather_bo, ref);
> > > +
> > > +	kfree(bo->gather_data);
> > > +	kfree(bo);
> > > +}
> > > +
> > > +void gather_bo_put(struct host1x_bo *host_bo)
> > > +{
> > > +	struct gather_bo *bo = container_of(host_bo, struct gather_bo, base);
> > > +
> > > +	kref_put(&bo->ref, gather_bo_release);
> > > +}
> > > +
> > > +static struct sg_table *
> > > +gather_bo_pin(struct device *dev, struct host1x_bo *host_bo, dma_addr_t *phys)
> > > +{
> > > +	struct gather_bo *bo = container_of(host_bo, struct gather_bo, base);
> > > +	struct sg_table *sgt;
> > > +	int err;
> > > +
> > > +	if (phys) {
> > > +		*phys = virt_to_phys(bo->gather_data);
> > > +		return NULL;
> > > +	}
> > > +
> > > +	sgt = kzalloc(sizeof(*sgt), GFP_KERNEL);
> > > +	if (!sgt)
> > > +		return ERR_PTR(-ENOMEM);
> > > +
> > > +	err = sg_alloc_table(sgt, 1, GFP_KERNEL);
> > > +	if (err) {
> > > +		kfree(sgt);
> > > +		return ERR_PTR(err);
> > > +	}
> > > +
> > > +	sg_init_one(sgt->sgl, bo->gather_data, bo->gather_data_words*4);
> > > +
> > > +	return sgt;
> > > +}
> > > +
> > > +static void gather_bo_unpin(struct device *dev, struct sg_table *sgt)
> > > +{
> > > +	if (sgt) {
> > > +		sg_free_table(sgt);
> > > +		kfree(sgt);
> > > +	}
> > > +}
> > > +
> > > +static void *gather_bo_mmap(struct host1x_bo *host_bo)
> > > +{
> > > +	struct gather_bo *bo = container_of(host_bo, struct gather_bo, base);
> > > +
> > > +	return bo->gather_data;
> > > +}
> > > +
> > > +static void gather_bo_munmap(struct host1x_bo *host_bo, void *addr)
> > > +{
> > > +}
> > > +
> > > +const struct host1x_bo_ops gather_bo_ops = {
> > > +	.get = gather_bo_get,
> > > +	.put = gather_bo_put,
> > > +	.pin = gather_bo_pin,
> > > +	.unpin = gather_bo_unpin,
> > > +	.mmap = gather_bo_mmap,
> > > +	.munmap = gather_bo_munmap,
> > > +};
> > > diff --git a/drivers/gpu/drm/tegra/uapi/gather_bo.h b/drivers/gpu/drm/tegra/uapi/gather_bo.h
> > > new file mode 100644
> > > index 000000000000..6b4c9d83ac91
> > > --- /dev/null
> > > +++ b/drivers/gpu/drm/tegra/uapi/gather_bo.h
> > > @@ -0,0 +1,22 @@
> > > +/* SPDX-License-Identifier: GPL-2.0-only */
> > > +/* Copyright (c) 2020 NVIDIA Corporation */
> > > +
> > > +#ifndef _TEGRA_DRM_SUBMIT_GATHER_BO_H
> > > +#define _TEGRA_DRM_SUBMIT_GATHER_BO_H
> > > +
> > > +#include <linux/host1x.h>
> > > +#include <linux/kref.h>
> > > +
> > > +struct gather_bo {
> > > +	struct host1x_bo base;
> > > +
> > > +	struct kref ref;
> > > +
> > > +	u32 *gather_data;
> > > +	size_t gather_data_words;
> > > +};
> > > +
> > > +extern const struct host1x_bo_ops gather_bo_ops;
> > > +void gather_bo_put(struct host1x_bo *host_bo);
> > > +
> > > +#endif
> > > diff --git a/drivers/gpu/drm/tegra/uapi/submit.c b/drivers/gpu/drm/tegra/uapi/submit.c
> > > new file mode 100644
> > > index 000000000000..398be3065e21
> > > --- /dev/null
> > > +++ b/drivers/gpu/drm/tegra/uapi/submit.c
> > > @@ -0,0 +1,428 @@
> > > +// SPDX-License-Identifier: GPL-2.0-only
> > > +/* Copyright (c) 2020 NVIDIA Corporation */
> > > +
> > > +#include <linux/dma-fence-array.h>
> > > +#include <linux/file.h>
> > > +#include <linux/host1x.h>
> > > +#include <linux/iommu.h>
> > > +#include <linux/kref.h>
> > > +#include <linux/list.h>
> > > +#include <linux/nospec.h>
> > > +#include <linux/pm_runtime.h>
> > > +#include <linux/sync_file.h>
> > > +
> > > +#include <drm/drm_drv.h>
> > > +#include <drm/drm_file.h>
> > > +
> > > +#include "../uapi.h"
> > > +#include "../drm.h"
> > > +#include "../gem.h"
> > > +
> > > +#include "gather_bo.h"
> > > +#include "submit.h"
> > > +
> > > +static struct tegra_drm_mapping *
> > > +tegra_drm_mapping_get(struct tegra_drm_channel_ctx *ctx, u32 id)
> > > +{
> > > +	struct tegra_drm_mapping *mapping;
> > > +
> > > +	xa_lock(&ctx->mappings);
> > > +	mapping = xa_load(&ctx->mappings, id);
> > > +	if (mapping)
> > > +		kref_get(&mapping->ref);
> > > +	xa_unlock(&ctx->mappings);
> > > +
> > > +	return mapping;
> > > +}
> > > +
> > > +static void *alloc_copy_user_array(void __user *from, size_t count, size_t size)
> > > +{
> > > +	unsigned long copy_err;
> > > +	size_t copy_len;
> > > +	void *data;
> > > +
> > > +	if (check_mul_overflow(count, size, &copy_len))
> > > +		return ERR_PTR(-EINVAL);
> > > +
> > > +	if (copy_len > 0x4000)
> > > +		return ERR_PTR(-E2BIG);
> > > +
> > > +	data = kvmalloc(copy_len, GFP_KERNEL);
> > > +	if (!data)
> > > +		return ERR_PTR(-ENOMEM);
> > > +
> > > +	copy_err = copy_from_user(data, from, copy_len);
> > > +	if (copy_err) {
> > > +		kvfree(data);
> > > +		return ERR_PTR(-EFAULT);
> > > +	}
> > > +
> > > +	return data;
> > > +}
> > > +
> > > +static int submit_copy_gather_data(struct drm_device *drm,
> > > +				   struct gather_bo **pbo,
> > > +				   struct drm_tegra_channel_submit *args)
> > > +{
> > > +	unsigned long copy_err;
> > > +	struct gather_bo *bo;
> > > +	size_t copy_len;
> > > +
> > > +	if (args->gather_data_words == 0) {
> > > +		drm_info(drm, "gather_data_words can't be 0");
> > > +		return -EINVAL;
> > > +	}
> > > +
> > > +	if (check_mul_overflow((size_t)args->gather_data_words, (size_t)4, &copy_len))
> > > +		return -EINVAL;
> > > +
> > > +	bo = kzalloc(sizeof(*bo), GFP_KERNEL);
> > > +	if (!bo)
> > > +		return -ENOMEM;
> > > +
> > > +	kref_init(&bo->ref);
> > > +	host1x_bo_init(&bo->base, &gather_bo_ops);
> > > +
> > > +	bo->gather_data = kmalloc(copy_len, GFP_KERNEL | __GFP_NOWARN);
> > > +	if (!bo->gather_data) {
> > > +		kfree(bo);
> > > +		return -ENOMEM;
> > > +	}
> > > +
> > > +	copy_err = copy_from_user(bo->gather_data,
> > > +				  u64_to_user_ptr(args->gather_data_ptr),
> > > +				  copy_len);
> > > +	if (copy_err) {
> > > +		kfree(bo->gather_data);
> > > +		kfree(bo);
> > > +		return -EFAULT;
> > > +	}
> > > +
> > > +	bo->gather_data_words = args->gather_data_words;
> > > +
> > > +	*pbo = bo;
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > +static int submit_write_reloc(struct gather_bo *bo,
> > > +			      struct drm_tegra_submit_buf *buf,
> > > +			      struct tegra_drm_mapping *mapping)
> > > +{
> > > +	/* TODO check that target_offset is within bounds */
> > > +	dma_addr_t iova = mapping->iova + buf->reloc.target_offset;
> > > +	u32 written_ptr = (u32)(iova >> buf->reloc.shift);
> > > +
> > > +#ifdef CONFIG_ARM64
> > > +	if (buf->flags & DRM_TEGRA_SUBMIT_BUF_RELOC_BLOCKLINEAR)
> > > +		written_ptr |= BIT(39);
> > > +#endif
> > 
> > Sorry, but this still isn't correct. written_ptr is still only 32-bits
> > wide, so your BIT(39) is going to get discarded even on 64-bit ARM. The
> > idiomatic way to do this is to make written_ptr dma_addr_t and use a
> > CONFIG_ARCH_DMA_ADDR_T_64BIT guard.
> >
> > But even with that this looks wrong because you're OR'ing this in after
> > shifting by buf->reloc.shift. Doesn't that OR it in at the wrong offset?
> > Should you perhaps be doing this instead:
> > 
> > 	#ifdef CONFIG_ARCH_DMA_ADDR_T_64BIT
> > 		if (buf->flags & DRM_TEGRA_SUBMIT_BUF_RELOC_BLOCKLINEAR)
> > 			iova |= BIT(39);
> > 	#endif
> > 
> > 	written_ptr = (u32)(iova >> buf->reloc_shift);
> > 
> > ?
> 
> Yes, you are of course right... will fix this. That might explain some of the
> VIC test failures I've seen.
> 
> > 
> > Also, on a side-note: BLOCKLINEAR really isn't the right term here. I
> > recently dealt with this for display (though I haven't sent out that
> > patch yet) and this is actually a bit that selects which sector layout
> > swizzling is being applied. That's independent of block linear format
> > and I think you can have different sector layouts irrespective of the
> > block linear format (though I don't think that's usually done).
> > 
> > That said, I wonder if a better interface here would be to reuse format
> > modifiers here. That would allow us to more fully describe the format of
> > a surface in case we ever need it, and it already includes the sector
> > layout information as well.
> 
> I think having just a flag that enables or disables the swizzling is better
> -- that way it is the responsibility of the userspace, which is where all
> the engine knowledge is as well, to know for each buffer whether it wants
> swizzling or not. Now, in practice at the moment the kernel can just look up
> the format and set the bit based on that, but e.g. if there was an engine
> that could do the swizzling natively, and we had the format modifier here,
> we'd need to have the knowledge in the kernel to decide for each chip/engine
> whether to apply the bit.

Fine, let's try it this way. I'm just slightly worried that we'll end up
duplicating a lot of the same information that we already have in the
framebuffer modifiers. We made the same mistake a long time ago with
those odd flags in the CREATE IOCTL and that turned out not to be usable
at all, and also completely insufficient.

Thierry

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 195+ messages in thread

* Re: [PATCH v5 15/21] drm/tegra: Add new UAPI to header
  2021-03-23 16:44                 ` Thierry Reding
@ 2021-03-23 17:32                   ` Dmitry Osipenko
  -1 siblings, 0 replies; 195+ messages in thread
From: Dmitry Osipenko @ 2021-03-23 17:32 UTC (permalink / raw)
  To: Thierry Reding
  Cc: Mikko Perttunen, Mikko Perttunen, jonathanh, airlied, daniel,
	linux-tegra, talho, bhuntsman, dri-devel

23.03.2021 19:44, Thierry Reding wrote:
> On Tue, Mar 23, 2021 at 05:00:30PM +0300, Dmitry Osipenko wrote:
>> 23.03.2021 15:30, Thierry Reding wrote:
>>> On Thu, Jan 14, 2021 at 12:34:22PM +0200, Mikko Perttunen wrote:
>>>> On 1/14/21 10:36 AM, Dmitry Osipenko wrote:
>>>>> 13.01.2021 21:56, Mikko Perttunen wrote:
>>>>>> On 1/13/21 8:14 PM, Dmitry Osipenko wrote:
>>>>>>> 11.01.2021 16:00, Mikko Perttunen wrote:
>>>>>>>> +struct drm_tegra_submit_buf {
>>>>>>>> +    /**
>>>>>>>> +     * @mapping_id: [in]
>>>>>>>> +     *
>>>>>>>> +     * Identifier of the mapping to use in the submission.
>>>>>>>> +     */
>>>>>>>> +    __u32 mapping_id;
>>>>>>>
>>>>>>> I'm now in the process of trying out the UAPI using the grate drivers
>>>>>>> and this is the first obstacle.
>>>>>>>
>>>>>>> Looks like this is not going to work well for older Tegra SoCs, in
>>>>>>> particular for T20, which has a small GART.
>>>>>>>
>>>>>>> Given that the usefulness of the partial mapping feature is very
>>>>>>> questionable until it will be proven with a real userspace, we should
>>>>>>> start with a dynamic mappings that are done at a time of job submission.
>>>>>>>
>>>>>>> DRM already should have everything necessary for creating and managing
>>>>>>> caches of mappings, grate kernel driver has been using drm_mm_scan for a
>>>>>>> long time now for that.
>>>>>>>
>>>>>>> It should be fine to support the static mapping feature, but it should
>>>>>>> be done separately with the drm_mm integration, IMO.
>>>>>>>
>>>>>>> What do think?
>>>>>>>
>>>>>>
>>>>>> Can you elaborate on the requirements to be able to use GART? Are there
>>>>>> any other reasons this would not work on older chips?
>>>>>
>>>>> We have all DRM devices in a single address space on T30+, hence having
>>>>> duplicated mappings for each device should be a bit wasteful.
>>>>
>>>> I guess this should be pretty easy to change to only keep one mapping per
>>>> GEM object.
>>>
>>> The important point here is the semantics: this IOCTL establishes a
>>> mapping for a given GEM object on a given channel. If the underlying
>>> implementation is such that the mapping doesn't fit into the GART, then
>>> that's an implementation detail that the driver needs to take care of.
>>> Similarly, if multiple devices share a single address space, that's
>>> something the driver already knows and can take advantage of by simply
>>> reusing an existing mapping if one already exists. In both cases the
>>> semantics would be correctly implemented and that's really all that
>>> matters.
>>>
>>> Overall this interface seems sound from a high-level point of view and
>>> allows these mappings to be properly created even for the cases we have
>>> where each channel may have a separate address space. It may not be the
>>> optimal interface for all use-cases or any one individual case, but the
>>> very nature of these interfaces is to abstract away certain differences
>>> in order to provide a unified interface to a common programming model.
>>> So there will always be certain tradeoffs.
>>
>> For now this IOCTL isn't useful from a userspace perspective on older
>> SoCs, and I'll need to add a lot of code that won't do anything useful
>> just to conform to the specific needs of the newer SoCs. Trying to unify
>> everything into a single API doesn't sound like a good idea at this
>> point, and I already suggested to Mikko to try out a variant with
>> separate per-SoC code paths in the next version, so that the mappings
>> could be handled separately by the T186+ paths.
> 
> I'm not sure I understand what you're saying. Obviously the underlying
> implementation of this might have to differ depending on SoC generation.
> But it sounds like you're suggesting having different UAPIs depending on
> SoC generation.

I suggested that this IOCTL shouldn't be mandatory for older SoCs, for
which we need to keep the older UAPI anyway. Yes, this makes the UAPI
different, and I want to see how it will look in the next version, since
the current variant was sub-optimal.

^ permalink raw reply	[flat|nested] 195+ messages in thread

* Re: [PATCH v5 15/21] drm/tegra: Add new UAPI to header
  2021-03-23 17:32                   ` Dmitry Osipenko
@ 2021-03-23 17:57                     ` Thierry Reding
  -1 siblings, 0 replies; 195+ messages in thread
From: Thierry Reding @ 2021-03-23 17:57 UTC (permalink / raw)
  To: Dmitry Osipenko
  Cc: Mikko Perttunen, Mikko Perttunen, jonathanh, airlied, daniel,
	linux-tegra, talho, bhuntsman, dri-devel

[-- Attachment #1: Type: text/plain, Size: 6669 bytes --]

On Tue, Mar 23, 2021 at 08:32:50PM +0300, Dmitry Osipenko wrote:
> 23.03.2021 19:44, Thierry Reding wrote:
> > On Tue, Mar 23, 2021 at 05:00:30PM +0300, Dmitry Osipenko wrote:
> >> 23.03.2021 15:30, Thierry Reding wrote:
> >>> On Thu, Jan 14, 2021 at 12:34:22PM +0200, Mikko Perttunen wrote:
> >>>> On 1/14/21 10:36 AM, Dmitry Osipenko wrote:
> >>>>> 13.01.2021 21:56, Mikko Perttunen wrote:
> >>>>>> On 1/13/21 8:14 PM, Dmitry Osipenko wrote:
> >>>>>>> 11.01.2021 16:00, Mikko Perttunen wrote:
> >>>>>>>> +struct drm_tegra_submit_buf {
> >>>>>>>> +    /**
> >>>>>>>> +     * @mapping_id: [in]
> >>>>>>>> +     *
> >>>>>>>> +     * Identifier of the mapping to use in the submission.
> >>>>>>>> +     */
> >>>>>>>> +    __u32 mapping_id;
> >>>>>>>
> >>>>>>> I'm now in the process of trying out the UAPI using the grate drivers
> >>>>>>> and this is the first obstacle.
> >>>>>>>
> >>>>>>> Looks like this is not going to work well for older Tegra SoCs, in
> >>>>>>> particular for T20, which has a small GART.
> >>>>>>>
> >>>>>>> Given that the usefulness of the partial mapping feature is very
> >>>>>>> questionable until it is proven with a real userspace, we should
> >>>>>>> start with dynamic mappings that are done at the time of job submission.
> >>>>>>>
> >>>>>>> DRM should already have everything necessary for creating and managing
> >>>>>>> caches of mappings; the grate kernel driver has been using drm_mm_scan
> >>>>>>> for a long time now for that.
> >>>>>>>
> >>>>>>> It should be fine to support the static mapping feature, but it should
> >>>>>>> be done separately with the drm_mm integration, IMO.
> >>>>>>>
> >>>>>>> What do you think?
> >>>>>>>
> >>>>>>
> >>>>>> Can you elaborate on the requirements to be able to use GART? Are there
> >>>>>> any other reasons this would not work on older chips?
> >>>>>
> >>>>> We have all DRM devices in a single address space on T30+, hence having
> >>>>> duplicated mappings for each device should be a bit wasteful.
> >>>>
> >>>> I guess this should be pretty easy to change to only keep one mapping per
> >>>> GEM object.
> >>>
> >>> The important point here is the semantics: this IOCTL establishes a
> >>> mapping for a given GEM object on a given channel. If the underlying
> >>> implementation is such that the mapping doesn't fit into the GART, then
> >>> that's an implementation detail that the driver needs to take care of.
> >>> Similarly, if multiple devices share a single address space, that's
> >>> something the driver already knows and can take advantage of by simply
> >>> reusing an existing mapping if one already exists. In both cases the
> >>> semantics would be correctly implemented and that's really all that
> >>> matters.
> >>>
> >>> Overall this interface seems sound from a high-level point of view and
> >>> allows these mappings to be properly created even for the cases we have
> >>> where each channel may have a separate address space. It may not be the
> >>> optimal interface for all use-cases or any one individual case, but the
> >>> very nature of these interfaces is to abstract away certain differences
> >>> in order to provide a unified interface to a common programming model.
> >>> So there will always be certain tradeoffs.
> >>
> >> For now this IOCTL isn't useful from a userspace perspective on older
> >> SoCs, and I'll need to add a lot of code that won't do anything useful
> >> just to conform to the specific needs of the newer SoCs. Trying to unify
> >> everything into a single API doesn't sound like a good idea at this
> >> point, and I already suggested to Mikko to try out a variant with
> >> separate per-SoC code paths in the next version, so that the mappings
> >> could be handled separately by the T186+ paths.
> > 
> > I'm not sure I understand what you're saying. Obviously the underlying
> > implementation of this might have to differ depending on SoC generation.
> > But it sounds like you're suggesting having different UAPIs depending on
> > SoC generation.
> 
> I suggested that this IOCTL shouldn't be mandatory for older SoCs, for
> which we need to keep the older UAPI anyway. Yes, this makes the UAPI
> different, and I want to see how it will look in the next version, since
> the current variant was sub-optimal.

What exactly is sub-optimal about the current variant? And what would an
alternative look like? Like what we have in the old ABI where we pass in
GEM handles directly during submissions?

I can see how this new variant would be a bit more work than the
alternative, but even on older SoCs, wouldn't the explicit mapping be
much better for performance than having to constantly remap GEM objects
for every job?

In general I don't think it's useful to have separate UAPIs for what's
basically the same hardware. I mean from a high-level point of view what
we need to do for each job remains exactly the same whether the job is
executed on Tegra20 or Tegra194. We have to map a bunch of buffers so
that they can be accessed by hardware and then we have a command stream
that references the mappings and does something to the memory that they
represent. The only thing that's different between the SoC generations
is how these mappings are created.

The difference between the old UABI and this is that we create mappings
upfront, and I'm not sure I understand how that could be suboptimal. If
anything it should increase the efficiency of job submissions by
reducing per-submit overhead. It should be fairly easy to compare this
in terms of performance with implicit mappings by running tests against
the old UABI and the new one. If there's a significant impact we may
need to take a closer look.

Yes, this will require a bit of work in userspace to adapt to these
changes, but those are a one-time cost, so I don't think it's wise to
ignore potential performance improvements because we don't want to
update userspace.

In either case, I don't think we're quite done yet. There's still a bit
of work left to do on the userspace side to get a couple of use-cases up
with this new UABI and it's not entirely clear yet what the results will
be. However, we have to move forward somehow or this will end up being
yet another attempt that didn't go anywhere. We were in a similar place
a few years back and I vividly remember how frustrating that was for me
personally to spend all of that time working through this stuff and then
seeing it all go to waste.

So can we please keep going for a little longer while there's still
momentum?

Thierry

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 195+ messages in thread

* Re: [PATCH v5 15/21] drm/tegra: Add new UAPI to header
@ 2021-03-23 17:57                     ` Thierry Reding
  0 siblings, 0 replies; 195+ messages in thread
From: Thierry Reding @ 2021-03-23 17:57 UTC (permalink / raw)
  To: Dmitry Osipenko
  Cc: Mikko Perttunen, airlied, dri-devel, jonathanh, talho, bhuntsman,
	linux-tegra, Mikko Perttunen


[-- Attachment #1.1: Type: text/plain, Size: 6669 bytes --]

On Tue, Mar 23, 2021 at 08:32:50PM +0300, Dmitry Osipenko wrote:
> 23.03.2021 19:44, Thierry Reding пишет:
> > On Tue, Mar 23, 2021 at 05:00:30PM +0300, Dmitry Osipenko wrote:
> >> 23.03.2021 15:30, Thierry Reding пишет:
> >>> On Thu, Jan 14, 2021 at 12:34:22PM +0200, Mikko Perttunen wrote:
> >>>> On 1/14/21 10:36 AM, Dmitry Osipenko wrote:
> >>>>> 13.01.2021 21:56, Mikko Perttunen пишет:
> >>>>>> On 1/13/21 8:14 PM, Dmitry Osipenko wrote:
> >>>>>>> 11.01.2021 16:00, Mikko Perttunen пишет:
> >>>>>>>> +struct drm_tegra_submit_buf {
> >>>>>>>> +    /**
> >>>>>>>> +     * @mapping_id: [in]
> >>>>>>>> +     *
> >>>>>>>> +     * Identifier of the mapping to use in the submission.
> >>>>>>>> +     */
> >>>>>>>> +    __u32 mapping_id;
> >>>>>>>
> >>>>>>> I'm now in process of trying out the UAPI using grate drivers and this
> >>>>>>> becomes the first obstacle.
> >>>>>>>
> >>>>>>> Looks like this is not going to work well for older Tegra SoCs, in
> >>>>>>> particular for T20, which has a small GART.
> >>>>>>>
> >>>>>>> Given that the usefulness of the partial mapping feature is very
> >>>>>>> questionable until it will be proven with a real userspace, we should
> >>>>>>> start with a dynamic mappings that are done at a time of job submission.
> >>>>>>>
> >>>>>>> DRM already should have everything necessary for creating and managing
> >>>>>>> caches of mappings, grate kernel driver has been using drm_mm_scan for a
> >>>>>>> long time now for that.
> >>>>>>>
> >>>>>>> It should be fine to support the static mapping feature, but it should
> >>>>>>> be done separately with the drm_mm integration, IMO.
> >>>>>>>
> >>>>>>> What do think?
> >>>>>>>
> >>>>>>
> >>>>>> Can you elaborate on the requirements to be able to use GART? Are there
> >>>>>> any other reasons this would not work on older chips?
> >>>>>
> >>>>> We have all DRM devices in a single address space on T30+, hence having
> >>>>> duplicated mappings for each device should be a bit wasteful.
> >>>>
> >>>> I guess this should be pretty easy to change to only keep one mapping per
> >>>> GEM object.
> >>>
> >>> The important point here is the semantics: this IOCTL establishes a
> >>> mapping for a given GEM object on a given channel. If the underlying
> >>> implementation is such that the mapping doesn't fit into the GART, then
> >>> that's an implementation detail that the driver needs to take care of.
> >>> Similarly, if multiple devices share a single address space, that's
> >>> something the driver already knows and can take advantage of by simply
> >>> reusing an existing mapping if one already exists. In both cases the
> >>> semantics would be correctly implemented and that's really all that
> >>> matters.
> >>>
> >>> Overall this interface seems sound from a high-level point of view and
> >>> allows these mappings to be properly created even for the cases we have
> >>> where each channel may have a separate address space. It may not be the
> >>> optimal interface for all use-cases or any one individual case, but the
> >>> very nature of these interfaces is to abstract away certain differences
> >>> in order to provide a unified interface to a common programming model.
> >>> So there will always be certain tradeoffs.
> >>
> >> For now this IOCTL isn't useful from the userspace perspective on
> >> older SoCs, and I'll need to add a lot of code that won't do anything
> >> useful just to conform to the specific needs of the newer SoCs. Trying
> >> to unify everything into a single API doesn't sound like a good idea
> >> at this point, and I already suggested that Mikko try out a variant
> >> with separate per-SoC code paths in the next version; the mappings
> >> could then be handled separately by the T186+ paths.
> > 
> > I'm not sure I understand what you're saying. Obviously the underlying
> > implementation of this might have to differ depending on SoC generation.
> > But it sounds like you're suggesting having different UAPIs depending on
> > SoC generation.
> 
> I suggested that this IOCTL shouldn't be mandatory for older SoCs, which
> we need to support anyway to preserve the older UAPI. Yes, this makes
> the UAPI different, and I want to see how it will look in the next
> version, since the current variant was sub-optimal.

What exactly is sub-optimal about the current variant? And what would an
alternative look like? Like what we have in the old ABI where we pass in
GEM handles directly during submissions?

I can see how this new variant would be a bit more work than the
alternative, but even on older SoCs, wouldn't the explicit mapping be
much better for performance than having to constantly remap GEM objects
for every job?

In general I don't think it's useful to have separate UAPIs for what's
basically the same hardware. I mean from a high-level point of view what
we need to do for each job remains exactly the same whether the job is
executed on Tegra20 or Tegra194. We have to map a bunch of buffers so
that they can be accessed by hardware and then we have a command stream
that references the mappings and does something to the memory that they
represent. The only thing that's different between the SoC generations
is how these mappings are created.

The difference between the old UABI and this is that we create mappings
upfront, and I'm not sure I understand how that could be suboptimal. If
anything it should increase the efficiency of job submissions by
reducing per-submit overhead. It should be fairly easy to compare this
in terms of performance with implicit mappings by running tests against
the old UABI and the new one. If there's a significant impact we may
need to take a closer look.

Yes, this will require a bit of work in userspace to adapt to these
changes, but those are a one-time cost, so I don't think it's wise to
ignore potential performance improvements because we don't want to
update userspace.

In either case, I don't think we're quite done yet. There's still a bit
of work left to do on the userspace side to get a couple of use-cases up
with this new UABI and it's not entirely clear yet what the results will
be. However, we have to move forward somehow or this will end up being
yet another attempt that didn't go anywhere. We were in a similar place
a few years back and I vividly remember how frustrating that was for me
personally to spend all of that time working through this stuff and then
seeing it all go to waste.

So can we please keep going for a little longer while there's still
momentum?

Thierry

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 195+ messages in thread

* Re: [PATCH v5 00/21] Host1x sync point UAPI should not be used for tracking DRM jobs
  2021-02-27 11:19                 ` Dmitry Osipenko
@ 2021-03-23 18:21                   ` Thierry Reding
  -1 siblings, 0 replies; 195+ messages in thread
From: Thierry Reding @ 2021-03-23 18:21 UTC (permalink / raw)
  To: Dmitry Osipenko
  Cc: Mikko Perttunen, Mikko Perttunen, jonathanh, airlied, daniel,
	linux-tegra, dri-devel, talho, bhuntsman

[-- Attachment #1: Type: text/plain, Size: 3509 bytes --]

On Sat, Feb 27, 2021 at 02:19:39PM +0300, Dmitry Osipenko wrote:
> 03.02.2021 14:18, Mikko Perttunen wrote:
> ...
> >> I'll need more time to think about it.
> >>
> > 
> > How about something like this:
> > 
> > Turn the syncpt_incr field back into an array of structs like
> > 
> > #define DRM_TEGRA_SUBMIT_SYNCPT_INCR_REPLACE_SYNCOBJ        (1<<0)
> > #define DRM_TEGRA_SUBMIT_SYNCPT_INCR_PATCH_DYNAMIC_SYNCPT    (1<<1)
> > 
> > struct drm_tegra_submit_syncpt_incr {
> >     /* can be left as zero if using dynamic syncpt */
> >     __u32 syncpt_id;
> >     __u32 flags;
> > 
> >     struct {
> >         __u32 syncobj;
> >         __u32 value;
> >     } fence;
> > 
> >     /* patch word as such:
> >          * *word = *word | (syncpt_id << shift)
> >          */
> >     struct {
> >         __u32 gather_offset_words;
> >         __u32 shift;
> >     } patch;
> > };
> > 
> > So this will work similarly to the buffer reloc system; the kernel
> > driver will allocate a job syncpoint and patch in the syncpoint ID if
> > requested, and allows outputting syncobjs for each increment.
> 
> I haven't got any great ideas so far, but it feels like it will be
> easier and cleaner if we could have separate job paths (and job
> IOCTLs) based on hardware generation, since the workloads are too
> different. The needs of newer h/w are too obscure for me, and the
> absence of userspace code, firmware sources and full h/w documentation
> does not help.
> 
> There still should be quite a lot to share, but things like
> mapping-to-channel and VM sync points are too far away from older h/w,
> IMO. This means that the code path before drm-sched and the path for
> job-timeout handling should be separate.
> 
> Maybe later on it will become clear that we actually could unify it
> all nicely, but for now it doesn't look like a good idea to me.

Sorry for jumping in rather randomly here and elsewhere, but it's been a
long time since the discussion and I just want to share my thoughts on a
couple of topics in order to hopefully help move this forward somehow.

For UAPI, "unifying it later" doesn't really work. So I think the only
realistic option is to make a best attempt at getting the UABI right so
that it works for all existing use-cases and ideally perhaps even as of
yet unknown use-cases in the future. As with all APIs this means that
there's going to be a need to abstract away some of the hardware details
so that we don't have to deal with too many low-level details in
userspace, because otherwise the UAPI is just going to be a useless
mess.

I think a proposal such as the above to allow both implicit and explicit
syncpoints makes sense. For what it's worth, it's fairly similar to what
we had come up with last time we tried destaging the ABI, although back
at the time I'm not sure we had even considered explicit syncpoint usage
yet. I think where reasonably possible this kind of optional behaviour
is acceptable, but I don't think having two completely separate paths is
going to help in any way. If anything it's just going to make it more
difficult to maintain the code and get it to a usable state in the first
place.

Like I said elsewhere, the programming model for host1x hasn't changed
since Tegra20. It's rather evolved and gained a couple more features,
but that doesn't change anything about how userspace uses it.

Thierry

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 195+ messages in thread

* Re: [PATCH v5 00/21] Host1x sync point UAPI should not be used for tracking DRM jobs
  2021-03-23 18:21                   ` Thierry Reding
@ 2021-03-23 19:57                     ` Dmitry Osipenko
  -1 siblings, 0 replies; 195+ messages in thread
From: Dmitry Osipenko @ 2021-03-23 19:57 UTC (permalink / raw)
  To: Thierry Reding
  Cc: Mikko Perttunen, Mikko Perttunen, jonathanh, airlied, daniel,
	linux-tegra, dri-devel, talho, bhuntsman

23.03.2021 21:21, Thierry Reding wrote:
> On Sat, Feb 27, 2021 at 02:19:39PM +0300, Dmitry Osipenko wrote:
>> 03.02.2021 14:18, Mikko Perttunen wrote:
>> ...
>>>> I'll need more time to think about it.
>>>>
>>>
>>> How about something like this:
>>>
>>> Turn the syncpt_incr field back into an array of structs like
>>>
>>> #define DRM_TEGRA_SUBMIT_SYNCPT_INCR_REPLACE_SYNCOBJ        (1<<0)
>>> #define DRM_TEGRA_SUBMIT_SYNCPT_INCR_PATCH_DYNAMIC_SYNCPT    (1<<1)
>>>
>>> struct drm_tegra_submit_syncpt_incr {
>>>     /* can be left as zero if using dynamic syncpt */
>>>     __u32 syncpt_id;
>>>     __u32 flags;
>>>
>>>     struct {
>>>         __u32 syncobj;
>>>         __u32 value;
>>>     } fence;
>>>
>>>     /* patch word as such:
>>>          * *word = *word | (syncpt_id << shift)
>>>          */
>>>     struct {
>>>         __u32 gather_offset_words;
>>>         __u32 shift;
>>>     } patch;
>>> };
>>>
>>> So this will work similarly to the buffer reloc system; the kernel
>>> driver will allocate a job syncpoint and patch in the syncpoint ID if
>>> requested, and allows outputting syncobjs for each increment.
>>
>> I haven't got any great ideas so far, but it feels like it will be
>> easier and cleaner if we could have separate job paths (and job
>> IOCTLs) based on hardware generation, since the workloads are too
>> different. The needs of newer h/w are too obscure for me, and the
>> absence of userspace code, firmware sources and full h/w documentation
>> does not help.
>>
>> There still should be quite a lot to share, but things like
>> mapping-to-channel and VM sync points are too far away from older h/w,
>> IMO. This means that the code path before drm-sched and the path for
>> job-timeout handling should be separate.
>>
>> Maybe later on it will become clear that we actually could unify it
>> all nicely, but for now it doesn't look like a good idea to me.
> 
> Sorry for jumping in rather randomly here and elsewhere, but it's been a
> long time since the discussion and I just want to share my thoughts on a
> couple of topics in order to hopefully help move this forward somehow.
> 
> For UAPI, "unifying it later" doesn't really work.

Of course I meant a "later version of this series" :) Sorry for not
making it clear.

> So I think the only
> realistic option is to make a best attempt at getting the UABI right so
> that it works for all existing use-cases and ideally perhaps even as of
> yet unknown use-cases in the future. As with all APIs this means that
> there's going to be a need to abstract away some of the hardware details
> so that we don't have to deal with too many low-level details in
> userspace, because otherwise the UAPI is just going to be a useless
> mess.
> 
> I think a proposal such as the above to allow both implicit and explicit
> syncpoints makes sense. For what it's worth, it's fairly similar to what
> we had come up with last time we tried destaging the ABI, although back
> at the time I'm not sure we had even considered explicit syncpoint usage
> yet. I think where reasonably possible this kind of optional behaviour
> is acceptable, but I don't think having two completely separate paths is
> going to help in any way. If anything it's just going to make it more
> difficult to maintain the code and get it to a usable state in the first
> place.
> 
> Like I said elsewhere, the programming model for host1x hasn't changed
> since Tegra20. It's rather evolved and gained a couple more features,
> but that doesn't change anything about how userspace uses it.

Not having complete control over sync point state is a radical change,
IMO. I prefer not to use this legacy and error-prone way of handling
sync points, at least for older h/w where it's possible to avoid. This
is what the downstream driver did 10 years ago and what it still
continues to do. I was very happy to move away from this unnecessary
complication in the experimental grate-kernel driver, and I think it
would be great to do the same in mainline as well.

The need to map buffers explicitly is also a big difference. The need
to map a BO for each channel is quite a big over-complication, as we
already found out in the current version of the UAPI.

Alright, perhaps the mapping could be improved for older userspace if
we move away from per-channel contexts to a single DRM context, as I
already suggested to Mikko before. I.e. instead of mapping a BO "to a
channel", we would map the BO "to h/w clients" within the DRM context.
This should allow older userspace to create a single mapping for all
channels/clients using a single IOCTL and then have a single "mapping
handle" to care about. Objections?

^ permalink raw reply	[flat|nested] 195+ messages in thread

* Re: [PATCH v5 00/21] Host1x sync point UAPI should not be used for tracking DRM jobs
  2021-03-23 19:57                     ` Dmitry Osipenko
@ 2021-03-23 20:13                       ` Dmitry Osipenko
  -1 siblings, 0 replies; 195+ messages in thread
From: Dmitry Osipenko @ 2021-03-23 20:13 UTC (permalink / raw)
  To: Thierry Reding
  Cc: Mikko Perttunen, Mikko Perttunen, jonathanh, airlied, daniel,
	linux-tegra, dri-devel, talho, bhuntsman

23.03.2021 22:57, Dmitry Osipenko wrote:
> 23.03.2021 21:21, Thierry Reding wrote:
>> On Sat, Feb 27, 2021 at 02:19:39PM +0300, Dmitry Osipenko wrote:
>>> 03.02.2021 14:18, Mikko Perttunen wrote:
>>> ...
>>>>> I'll need more time to think about it.
>>>>>
>>>>
>>>> How about something like this:
>>>>
>>>> Turn the syncpt_incr field back into an array of structs like
>>>>
>>>> #define DRM_TEGRA_SUBMIT_SYNCPT_INCR_REPLACE_SYNCOBJ        (1<<0)
>>>> #define DRM_TEGRA_SUBMIT_SYNCPT_INCR_PATCH_DYNAMIC_SYNCPT    (1<<1)
>>>>
>>>> struct drm_tegra_submit_syncpt_incr {
>>>>     /* can be left as zero if using dynamic syncpt */
>>>>     __u32 syncpt_id;
>>>>     __u32 flags;
>>>>
>>>>     struct {
>>>>         __u32 syncobj;
>>>>         __u32 value;
>>>>     } fence;
>>>>
>>>>     /* patch word as such:
>>>>          * *word = *word | (syncpt_id << shift)
>>>>          */
>>>>     struct {
>>>>         __u32 gather_offset_words;
>>>>         __u32 shift;
>>>>     } patch;
>>>> };
>>>>
>>>> So this will work similarly to the buffer reloc system; the kernel
>>>> driver will allocate a job syncpoint and patch in the syncpoint ID if
>>>> requested, and allows outputting syncobjs for each increment.
>>>
>>> I haven't got any great ideas so far, but it feels like it will be
>>> easier and cleaner if we could have separate job paths (and job
>>> IOCTLs) based on hardware generation, since the workloads are too
>>> different. The needs of newer h/w are too obscure for me, and the
>>> absence of userspace code, firmware sources and full h/w
>>> documentation does not help.
>>>
>>> There still should be quite a lot to share, but things like
>>> mapping-to-channel and VM sync points are too far away from older h/w,
>>> IMO. This means that the code path before drm-sched and the path for
>>> job-timeout handling should be separate.
>>>
>>> Maybe later on it will become clear that we actually could unify it
>>> all nicely, but for now it doesn't look like a good idea to me.
>>
>> Sorry for jumping in rather randomly here and elsewhere, but it's been a
>> long time since the discussion and I just want to share my thoughts on a
>> couple of topics in order to hopefully help move this forward somehow.
>>
>> For UAPI, "unifying it later" doesn't really work.
> 
> Of course I meant a "later version of this series" :) Sorry for not
> making it clear.
> 
>> So I think the only
>> realistic option is to make a best attempt at getting the UABI right so
>> that it works for all existing use-cases and ideally perhaps even as of
>> yet unknown use-cases in the future. As with all APIs this means that
>> there's going to be a need to abstract away some of the hardware details
>> so that we don't have to deal with too many low-level details in
>> userspace, because otherwise the UAPI is just going to be a useless
>> mess.
>>
>> I think a proposal such as the above to allow both implicit and explicit
>> syncpoints makes sense. For what it's worth, it's fairly similar to what
>> we had come up with last time we tried destaging the ABI, although back
>> at the time I'm not sure we had even considered explicit syncpoint usage
>> yet. I think where reasonably possible this kind of optional behaviour
>> is acceptable, but I don't think having two completely separate paths is
>> going to help in any way. If anything it's just going to make it more
>> difficult to maintain the code and get it to a usable state in the first
>> place.
>>
>> Like I said elsewhere, the programming model for host1x hasn't changed
>> since Tegra20. It's rather evolved and gained a couple more features,
>> but that doesn't change anything about how userspace uses it.
> 
> Not having complete control over sync point state is a radical change,
> IMO. I prefer not to use this legacy and error-prone way of handling
> sync points, at least for older h/w where it's possible to avoid. This
> is what the downstream driver did 10 years ago and what it still
> continues to do. I was very happy to move away from this unnecessary
> complication in the experimental grate-kernel driver, and I think it
> would be great to do the same in mainline as well.
> 
> The need to map buffers explicitly is also a big difference. The need
> to map a BO for each channel is quite a big over-complication, as we
> already found out in the current version of the UAPI.
> 
> Alright, perhaps the mapping could be improved for older userspace if
> we move away from per-channel contexts to a single DRM context, as I
> already suggested to Mikko before. I.e. instead of mapping a BO "to a
> channel", we would map the BO "to h/w clients" within the DRM context.
> This should allow older userspace to create a single mapping for all
> channels/clients using a single IOCTL and then have a single "mapping
> handle" to care about. Objections?
> 

I just recalled that Mikko didn't like *not* having the per-channel
contexts, saying that they should be good to have for later SoCs.

We could have a DRM context and channel contexts within the DRM context.
The mapping would then need to be done to the channel contexts of the
DRM context; this should allow us to retain the per-channel contexts and
still have a single mapping for older userspace.

^ permalink raw reply	[flat|nested] 195+ messages in thread

* Re: [PATCH v5 00/21] Host1x sync point UAPI should not be used for tracking DRM jobs
@ 2021-03-23 20:13                       ` Dmitry Osipenko
  0 siblings, 0 replies; 195+ messages in thread
From: Dmitry Osipenko @ 2021-03-23 20:13 UTC (permalink / raw)
  To: Thierry Reding
  Cc: Mikko Perttunen, airlied, dri-devel, jonathanh, talho, bhuntsman,
	linux-tegra, Mikko Perttunen

23.03.2021 22:57, Dmitry Osipenko пишет:
> 23.03.2021 21:21, Thierry Reding пишет:
>> On Sat, Feb 27, 2021 at 02:19:39PM +0300, Dmitry Osipenko wrote:
>>> 03.02.2021 14:18, Mikko Perttunen пишет:
>>> ...
>>>>> I'll need more time to think about it.
>>>>>
>>>>
>>>> How about something like this:
>>>>
>>>> Turn the syncpt_incr field back into an array of structs like
>>>>
>>>> #define DRM_TEGRA_SUBMIT_SYNCPT_INCR_REPLACE_SYNCOBJ        (1<<0)
>>>> #define DRM_TEGRA_SUBMIT_SYNCPT_INCR_PATCH_DYNAMIC_SYNCPT    (1<<1)
>>>>
>>>> struct drm_tegra_submit_syncpt_incr {
>>>>     /* can be left as zero if using dynamic syncpt */
>>>>     __u32 syncpt_id;
>>>>     __u32 flags;
>>>>
>>>>     struct {
>>>>         __u32 syncobj;
>>>>         __u32 value;
>>>>     } fence;
>>>>
>>>>     /* patch word as such:
>>>>          * *word = *word | (syncpt_id << shift)
>>>>          */
>>>>     struct {
>>>>         __u32 gather_offset_words;
>>>>         __u32 shift;
>>>>     } patch;
>>>> };
>>>>
>>>> So this will work similarly to the buffer reloc system; the kernel
>>>> driver will allocate a job syncpoint and patch in the syncpoint ID if
>>>> requested, and allows outputting syncobjs for each increment.
>>>
>>> I haven't got any great ideas so far, but it feels that will be easier
>>> and cleaner if we could have separate job paths (and job IOCTLS) based
>>> on hardware generation since the workloads a too different. The needs of
>>> a newer h/w are too obscure for me and absence of userspace code,
>>> firmware sources and full h/w documentation do not help.
>>>
>>> There still should be quite a lot to share, but things like
>>> mapping-to-channel and VM sync points are too far away from older h/w,
>>> IMO. This means that code path before drm-sched and path for job-timeout
>>> handling should be separate.
>>>
>>> Maybe later on it will become cleaner that we actually could unify it
>>> all nicely, but for now it doesn't look like a good idea to me.
>>
>> Sorry for jumping in rather randomly here and elsewhere, but it's been a
>> long time since the discussion and I just want to share my thoughts on a
>> couple of topics in order to hopefully help move this forward somehow.
>>
>> For UAPI, "unifying it later" doesn't really work.
> 
> Of course I meant a "later version of this series" :) Sorry for not
> making it clear.
> 
>> So I think the only
>> realistic option is to make a best attempt at getting the UABI right so
>> that it works for all existing use-cases and ideally perhaps even as of
>> yet unknown use-cases in the future. As with all APIs this means that
>> there's going to be a need to abstract away some of the hardware details
>> so that we don't have to deal with too many low-level details in
>> userspace, because otherwise the UAPI is just going to be a useless
>> mess.
>>
>> I think a proposal such as the above to allow both implicit and explicit
>> syncpoints makes sense. For what it's worth, it's fairly similar to what
>> we had come up with last time we tried destaging the ABI, although back
>> at the time I'm not sure we had even considered explicit syncpoint usage
>> yet. I think where reasonably possible this kind of optional behaviour
>> is acceptable, but I don't think having two completely separate paths is
>> going to help in any way. If anything it's just going to make it more
>> difficult to maintain the code and get it to a usable state in the first
>> place.
>>
>> Like I said elsewhere, the programming model for host1x hasn't changed
>> since Tegra20. It's rather evolved and gained a couple more features,
>> but that doesn't change anything about how userspace uses it.
> 
> Not having complete control over sync point state is a radical change,
> IMO. I prefer not to use this legacy and error-prone way of handling
> sync points, at least for older h/w where avoiding it is possible. This
> is what the downstream driver did 10 years ago and what it still
> continues to do. I was very happy to move away from this unnecessary
> complication in the experimental grate-kernel driver, and I think it
> will be great to do the same in mainline as well.
> 
> The need to map buffers explicitly is also a big difference. Having to
> map a BO for each channel is quite a big over-complication, as we
> already found out in the current version of the UAPI.
> 
> Alright, perhaps the mapping could be improved for older userspace if we
> move away from per-channel contexts to a single DRM context, as I
> already suggested to Mikko before. I.e. instead of mapping a BO "to a
> channel", we would map the BO "to h/w clients" within the DRM context.
> This should allow older userspace to create a single mapping for all
> channels/clients using a single IOCTL and then have a single "mapping
> handle" to care about. Objections?
> 

I just recalled that Mikko didn't like *not* having the per-channel
contexts, saying that they should be good to have for later SoCs.

We could have a DRM context with channel contexts within it. The mapping
would then be done to the channel contexts of the DRM context; this
should allow us to retain the per-channel contexts and still have a
single mapping for older userspace.
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 195+ messages in thread

* Re: [PATCH v5 03/21] gpu: host1x: Show number of pending waiters in debugfs
  2021-03-23 10:16     ` Thierry Reding
@ 2021-03-26 14:34       ` Mikko Perttunen
  -1 siblings, 0 replies; 195+ messages in thread
From: Mikko Perttunen @ 2021-03-26 14:34 UTC (permalink / raw)
  To: Thierry Reding, Mikko Perttunen
  Cc: jonathanh, digetx, airlied, daniel, linux-tegra, dri-devel,
	talho, bhuntsman

On 3/23/21 12:16 PM, Thierry Reding wrote:
> On Mon, Jan 11, 2021 at 03:00:01PM +0200, Mikko Perttunen wrote:
>> Show the number of pending waiters in the debugfs status file.
>> This is useful for testing to verify that waiters do not leak
>> or accumulate incorrectly.
>>
>> Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
>> ---
>>   drivers/gpu/host1x/debug.c | 14 +++++++++++---
>>   1 file changed, 11 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/gpu/host1x/debug.c b/drivers/gpu/host1x/debug.c
>> index 1b4997bda1c7..8a14880c61bb 100644
>> --- a/drivers/gpu/host1x/debug.c
>> +++ b/drivers/gpu/host1x/debug.c
>> @@ -69,6 +69,7 @@ static int show_channel(struct host1x_channel *ch, void *data, bool show_fifo)
>>   
>>   static void show_syncpts(struct host1x *m, struct output *o)
>>   {
>> +	struct list_head *pos;
>>   	unsigned int i;
>>   
>>   	host1x_debug_output(o, "---- syncpts ----\n");
>> @@ -76,12 +77,19 @@ static void show_syncpts(struct host1x *m, struct output *o)
>>   	for (i = 0; i < host1x_syncpt_nb_pts(m); i++) {
>>   		u32 max = host1x_syncpt_read_max(m->syncpt + i);
>>   		u32 min = host1x_syncpt_load(m->syncpt + i);
>> +		unsigned int waiters = 0;
>>   
>> -		if (!min && !max)
>> +		spin_lock(&m->syncpt[i].intr.lock);
>> +		list_for_each(pos, &m->syncpt[i].intr.wait_head)
>> +			waiters++;
>> +		spin_unlock(&m->syncpt[i].intr.lock);
> 
> Would it make sense to keep a running count so that we don't have to
> compute it here?

Considering this is just a debug facility, I think I prefer not adding a 
new field just for it.

> 
>> +
>> +		if (!min && !max && !waiters)
>>   			continue;
>>   
>> -		host1x_debug_output(o, "id %u (%s) min %d max %d\n",
>> -				    i, m->syncpt[i].name, min, max);
>> +		host1x_debug_output(o,
>> +				    "id %u (%s) min %d max %d (%d waiters)\n",
>> +				    i, m->syncpt[i].name, min, max, waiters);
> 
> Or alternatively, would it be useful to collect a bit more information
> about waiters so that when they leak we get a better understanding of
> which ones leak?
> 
> It doesn't look like we currently have much information in struct
> host1x_waitlist to identify waiters, but perhaps that can be extended?

I added this patch mainly for use with integration tests, so they can 
verify no waiters leaked in negative tests. I think let's put off adding 
other information until there's some need for it.

Mikko

> 
> Thierry
> 

* Re: [PATCH v5 01/21] gpu: host1x: Use different lock classes for each client
  2021-03-22 15:19         ` Mikko Perttunen
@ 2021-03-26 14:54           ` Mikko Perttunen
  -1 siblings, 0 replies; 195+ messages in thread
From: Mikko Perttunen @ 2021-03-26 14:54 UTC (permalink / raw)
  To: Mikko Perttunen, Dmitry Osipenko, Thierry Reding
  Cc: jonathanh, airlied, daniel, linux-tegra, dri-devel, talho, bhuntsman

On 3/22/21 5:19 PM, Mikko Perttunen wrote:
> On 22.3.2021 16.48, Dmitry Osipenko wrote:
>> 22.03.2021 17:46, Thierry Reding пишет:
>>> On Mon, Jan 11, 2021 at 02:59:59PM +0200, Mikko Perttunen wrote:
>>>> To avoid false lockdep warnings, give each client lock a different
>>>> lock class, passed from the initialization site by macro.
>>>>
>>>> Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
>>>> ---
>>>>   drivers/gpu/host1x/bus.c | 7 ++++---
>>>>   include/linux/host1x.h   | 9 ++++++++-
>>>>   2 files changed, 12 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/host1x/bus.c b/drivers/gpu/host1x/bus.c
>>>> index 347fb962b6c9..8fc79e9cb652 100644
>>>> --- a/drivers/gpu/host1x/bus.c
>>>> +++ b/drivers/gpu/host1x/bus.c
>>>> @@ -715,13 +715,14 @@ EXPORT_SYMBOL(host1x_driver_unregister);
>>>>    * device and call host1x_device_init(), which will in turn call 
>>>> each client's
>>>>    * &host1x_client_ops.init implementation.
>>>>    */
>>>> -int host1x_client_register(struct host1x_client *client)
>>>> +int __host1x_client_register(struct host1x_client *client,
>>>> +               struct lock_class_key *key)
>>>
>>> I've seen the kbuild robot warn about this because the kerneldoc is now
>>> out of date.
>>>
>>>>   {
>>>>       struct host1x *host1x;
>>>>       int err;
>>>>       INIT_LIST_HEAD(&client->list);
>>>> -    mutex_init(&client->lock);
>>>> +    __mutex_init(&client->lock, "host1x client lock", key);
>>>
>>> Should we maybe attempt to make this unique? Could we use something like
>>> dev_name(client->dev) for this?
>>
>> I'm curious who the lockdep warning could be triggered at all, I don't
>> recall ever seeing it. Mikko, could you please clarify how to reproduce
>> the warning?
>>
> 
> This is pretty difficult to read, but I guess it's some interaction
> related to the delayed initialization of host1x clients? In any case, I
> consistently get it at boot (though it may be triggered by vic probe
> instead of nvdec).
> 
> I'll fix the kbuild robot warnings and see if I can add a 
> client-specific lock name for v6.

Lockdep doesn't seem to be liking dev_name() for the name, and I think 
allocating a string for this purpose seems a bit overkill, so I'll keep 
the lock name as is if there are no objections.

Mikko

> 
> Mikko
> 
> [   38.128257] WARNING: possible recursive locking detected
> [   38.133567] 5.11.0-rc2-next-20210108+ #102 Tainted: G S
> [   38.140089] --------------------------------------------
> [   38.145395] systemd-udevd/239 is trying to acquire lock:
> [   38.150703] ffff0000997aa218 (&client->lock){+.+.}-{3:3}, at: 
> host1x_client_resume+0x30/0x100 [host1x]
> [   38.160142]
> [   38.160142] but task is already holding lock:
> [   38.165968] ffff000080c3b148 (&client->lock){+.+.}-{3:3}, at: 
> host1x_client_resume+0x30/0x100 [host1x]
> [   38.175398]
> [   38.175398] other info that might help us debug this:
> [   38.181918]  Possible unsafe locking scenario:
> [   38.181918]
> [   38.187830]        CPU0
> [   38.190275]        ----
> [   38.192719]   lock(&client->lock);
> [   38.196129]   lock(&client->lock);
> [   38.199537]
> [   38.199537]  *** DEADLOCK ***
> [   38.199537]
> [   38.205449]  May be due to missing lock nesting notation
> [   38.205449]
> [   38.212228] 6 locks held by systemd-udevd/239:
> [   38.216669]  #0: ffff00009261c188 (&dev->mutex){....}-{3:3}, at: 
> device_driver_attach+0x60/0x130
> [   38.225487]  #1: ffff800009a17168 (devices_lock){+.+.}-{3:3}, at: 
> host1x_client_register+0x7c/0x220 [host1x]
> [   38.235441]  #2: ffff000083f94bb8 (&host->devices_lock){+.+.}-{3:3}, 
> at: host1x_client_register+0xac/0x220 [host1x]
> [   38.245996]  #3: ffff0000a2267190 (&dev->mutex){....}-{3:3}, at: 
> __device_attach+0x8c/0x230
> [   38.254372]  #4: ffff000092c880f0 (&wgrp->lock){+.+.}-{3:3}, at: 
> tegra_display_hub_prepare+0xd8/0x170 [tegra_drm]
> [   38.264788]  #5: ffff000080c3b148 (&client->lock){+.+.}-{3:3}, at: 
> host1x_client_resume+0x30/0x100 [host1x]
> [   38.274658]
> [   38.274658] stack backtrace:
> [   38.279012] CPU: 0 PID: 239 Comm: systemd-udevd Tainted: G S       
> 5.11.0-rc2-next-20210108+ #102
> [   38.288660] Hardware name: NVIDIA Jetson TX2 Developer Kit (DT)
> [   38.294577] Call trace:
> [   38.297022]  dump_backtrace+0x0/0x2c0
> [   38.300695]  show_stack+0x18/0x6c
> [   38.304013]  dump_stack+0x120/0x19c
> [   38.307507]  __lock_acquire+0x171c/0x2c34
> [   38.311521]  lock_acquire.part.0+0x230/0x490
> [   38.315793]  lock_acquire+0x70/0x90
> [   38.319285]  __mutex_lock+0x11c/0x6d0
> [   38.322952]  mutex_lock_nested+0x58/0x90
> [   38.326877]  host1x_client_resume+0x30/0x100 [host1x]
> [   38.332047]  host1x_client_resume+0x44/0x100 [host1x]
> [   38.337200]  tegra_display_hub_prepare+0xf8/0x170 [tegra_drm]
> [   38.343084]  host1x_drm_probe+0x1fc/0x4f0 [tegra_drm]
> [   38.348256]  host1x_device_probe+0x3c/0x50 [host1x]
> [   38.353240]  really_probe+0x148/0x6f0
> [   38.356906]  driver_probe_device+0x78/0xe4
> [   38.361005]  __device_attach_driver+0x10c/0x170
> [   38.365536]  bus_for_each_drv+0xf0/0x160
> [   38.369461]  __device_attach+0x168/0x230
> [   38.373385]  device_initial_probe+0x14/0x20
> [   38.377571]  bus_probe_device+0xec/0x100
> [   38.381494]  device_add+0x580/0xbcc
> [   38.384985]  host1x_subdev_register+0x178/0x1cc [host1x]
> [   38.390397]  host1x_client_register+0x138/0x220 [host1x]
> [   38.395808]  nvdec_probe+0x240/0x3ec [tegra_drm]
> [   38.400549]  platform_probe+0x8c/0x110
> [   38.404302]  really_probe+0x148/0x6f0
> [   38.407966]  driver_probe_device+0x78/0xe4
> [   38.412065]  device_driver_attach+0x120/0x130
> [   38.416423]  __driver_attach+0xb4/0x190
> [   38.420261]  bus_for_each_dev+0xe8/0x160
> [   38.424185]  driver_attach+0x34/0x44
> [   38.427761]  bus_add_driver+0x1a4/0x2b0
> [   38.431598]  driver_register+0xe0/0x210
> [   38.435437]  __platform_register_drivers+0x6c/0x104
> [   38.440318]  host1x_drm_init+0x54/0x1000 [tegra_drm]
> [   38.445405]  do_one_initcall+0xec/0x5e0
> [   38.449244]  do_init_module+0xe0/0x384
> [   38.453000]  load_module+0x32d8/0x3c60

* Re: [PATCH v5 01/21] gpu: host1x: Use different lock classes for each client
  2021-03-26 14:54           ` Mikko Perttunen
@ 2021-03-26 18:31             ` Dmitry Osipenko
  -1 siblings, 0 replies; 195+ messages in thread
From: Dmitry Osipenko @ 2021-03-26 18:31 UTC (permalink / raw)
  To: Mikko Perttunen, Mikko Perttunen, Thierry Reding
  Cc: jonathanh, airlied, daniel, linux-tegra, dri-devel, talho, bhuntsman

26.03.2021 17:54, Mikko Perttunen пишет:
> 
> Lockdep doesn't seem to be liking dev_name() for the name, and I think
> allocating a string for this purpose seems a bit overkill, so I'll keep
> the lock name as is if there are no objections.

What does "liking" mean?

* Re: [PATCH v5 01/21] gpu: host1x: Use different lock classes for each client
  2021-03-26 18:31             ` Dmitry Osipenko
@ 2021-03-26 19:10               ` Mikko Perttunen
  -1 siblings, 0 replies; 195+ messages in thread
From: Mikko Perttunen @ 2021-03-26 19:10 UTC (permalink / raw)
  To: Dmitry Osipenko, Mikko Perttunen, Thierry Reding
  Cc: jonathanh, airlied, daniel, linux-tegra, dri-devel, talho, bhuntsman

On 3/26/21 8:31 PM, Dmitry Osipenko wrote:
> 26.03.2021 17:54, Mikko Perttunen пишет:
>>
>> Lockdep doesn't seem to be liking dev_name() for the name, and I think
>> allocating a string for this purpose seems a bit overkill, so I'll keep
>> the lock name as is if there are no objections.
> 
> What does "liking" mean?
> 

kernel/locking/lockdep.c:894 (on my local tree):

                        WARN_ON_ONCE(class->name != lock->name &&
                                     lock->key != &__lockdep_no_validate__);

Mikko

* Re: [PATCH v5 01/21] gpu: host1x: Use different lock classes for each client
  2021-03-26 19:10               ` Mikko Perttunen
@ 2021-03-26 22:47                 ` Dmitry Osipenko
  -1 siblings, 0 replies; 195+ messages in thread
From: Dmitry Osipenko @ 2021-03-26 22:47 UTC (permalink / raw)
  To: Mikko Perttunen, Mikko Perttunen, Thierry Reding
  Cc: jonathanh, airlied, daniel, linux-tegra, dri-devel, talho, bhuntsman

26.03.2021 22:10, Mikko Perttunen пишет:
> On 3/26/21 8:31 PM, Dmitry Osipenko wrote:
>> 26.03.2021 17:54, Mikko Perttunen пишет:
>>>
>>> Lockdep doesn't seem to be liking dev_name() for the name, and I think
>>> allocating a string for this purpose seems a bit overkill, so I'll keep
>>> the lock name as is if there are no objections.
>>
>> What does "liking" mean?
>>
> 
> kernel/locking/lockdep.c:894 (on my local tree):
> 
>                        WARN_ON_ONCE(class->name != lock->name &&
>                                      lock->key !=
> &__lockdep_no_validate__);

Alright, I see now that lockdep requires the same name to be used.


* Re: [PATCH v5 03/21] gpu: host1x: Show number of pending waiters in debugfs
  2021-03-26 14:34       ` Mikko Perttunen
@ 2021-04-01 21:19         ` Michał Mirosław
  -1 siblings, 0 replies; 195+ messages in thread
From: Michał Mirosław @ 2021-04-01 21:19 UTC (permalink / raw)
  To: Mikko Perttunen
  Cc: Thierry Reding, Mikko Perttunen, jonathanh, digetx, airlied,
	daniel, linux-tegra, dri-devel, talho, bhuntsman

On Fri, Mar 26, 2021 at 04:34:13PM +0200, Mikko Perttunen wrote:
> On 3/23/21 12:16 PM, Thierry Reding wrote:
> > On Mon, Jan 11, 2021 at 03:00:01PM +0200, Mikko Perttunen wrote:
> > > Show the number of pending waiters in the debugfs status file.
> > > This is useful for testing to verify that waiters do not leak
> > > or accumulate incorrectly.
> > > 
> > > Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
> > > ---
> > >   drivers/gpu/host1x/debug.c | 14 +++++++++++---
> > >   1 file changed, 11 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/host1x/debug.c b/drivers/gpu/host1x/debug.c
> > > index 1b4997bda1c7..8a14880c61bb 100644
> > > --- a/drivers/gpu/host1x/debug.c
> > > +++ b/drivers/gpu/host1x/debug.c
> > > @@ -69,6 +69,7 @@ static int show_channel(struct host1x_channel *ch, void *data, bool show_fifo)
> > >   static void show_syncpts(struct host1x *m, struct output *o)
> > >   {
> > > +	struct list_head *pos;
> > >   	unsigned int i;
> > >   	host1x_debug_output(o, "---- syncpts ----\n");
> > > @@ -76,12 +77,19 @@ static void show_syncpts(struct host1x *m, struct output *o)
> > >   	for (i = 0; i < host1x_syncpt_nb_pts(m); i++) {
> > >   		u32 max = host1x_syncpt_read_max(m->syncpt + i);
> > >   		u32 min = host1x_syncpt_load(m->syncpt + i);
> > > +		unsigned int waiters = 0;
> > > -		if (!min && !max)
> > > +		spin_lock(&m->syncpt[i].intr.lock);
> > > +		list_for_each(pos, &m->syncpt[i].intr.wait_head)
> > > +			waiters++;
> > > +		spin_unlock(&m->syncpt[i].intr.lock);
> > 
> > Would it make sense to keep a running count so that we don't have to
> > compute it here?
> 
> Considering this is just a debug facility, I think I prefer not adding a new
> field just for it.

This looks like an IRQ-disabled region, so unless only root can trigger
this code, maybe the additional field could save a potential headache?
How many waiters can there be in the worst case?

Best Regards
Michał Mirosław

* Re: [PATCH v5 03/21] gpu: host1x: Show number of pending waiters in debugfs
@ 2021-04-01 21:19         ` Michał Mirosław
  0 siblings, 0 replies; 195+ messages in thread
From: Michał Mirosław @ 2021-04-01 21:19 UTC (permalink / raw)
  To: Mikko Perttunen
  Cc: airlied, dri-devel, jonathanh, talho, bhuntsman, Thierry Reding,
	linux-tegra, digetx, Mikko Perttunen

On Fri, Mar 26, 2021 at 04:34:13PM +0200, Mikko Perttunen wrote:
> On 3/23/21 12:16 PM, Thierry Reding wrote:
> > On Mon, Jan 11, 2021 at 03:00:01PM +0200, Mikko Perttunen wrote:
> > > Show the number of pending waiters in the debugfs status file.
> > > This is useful for testing to verify that waiters do not leak
> > > or accumulate incorrectly.
> > > 
> > > Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
> > > ---
> > >   drivers/gpu/host1x/debug.c | 14 +++++++++++---
> > >   1 file changed, 11 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/host1x/debug.c b/drivers/gpu/host1x/debug.c
> > > index 1b4997bda1c7..8a14880c61bb 100644
> > > --- a/drivers/gpu/host1x/debug.c
> > > +++ b/drivers/gpu/host1x/debug.c
> > > @@ -69,6 +69,7 @@ static int show_channel(struct host1x_channel *ch, void *data, bool show_fifo)
> > >   static void show_syncpts(struct host1x *m, struct output *o)
> > >   {
> > > +	struct list_head *pos;
> > >   	unsigned int i;
> > >   	host1x_debug_output(o, "---- syncpts ----\n");
> > > @@ -76,12 +77,19 @@ static void show_syncpts(struct host1x *m, struct output *o)
> > >   	for (i = 0; i < host1x_syncpt_nb_pts(m); i++) {
> > >   		u32 max = host1x_syncpt_read_max(m->syncpt + i);
> > >   		u32 min = host1x_syncpt_load(m->syncpt + i);
> > > +		unsigned int waiters = 0;
> > > -		if (!min && !max)
> > > +		spin_lock(&m->syncpt[i].intr.lock);
> > > +		list_for_each(pos, &m->syncpt[i].intr.wait_head)
> > > +			waiters++;
> > > +		spin_unlock(&m->syncpt[i].intr.lock);
> > 
> > Would it make sense to keep a running count so that we don't have to
> > compute it here?
> 
> Considering this is just a debug facility, I think I prefer not adding a new
> field just for it.

This looks like an IRQ-disabled region, so unless only root can trigger
this code, maybe the additional field could save a potential headache?
How many waiters can there be in the worst case?

Best Regards
Michał Mirosław
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 195+ messages in thread

* Re: [PATCH v5 03/21] gpu: host1x: Show number of pending waiters in debugfs
  2021-04-01 21:19         ` Michał Mirosław
@ 2021-04-02 16:02           ` Dmitry Osipenko
  -1 siblings, 0 replies; 195+ messages in thread
From: Dmitry Osipenko @ 2021-04-02 16:02 UTC (permalink / raw)
  To: Michał Mirosław, Mikko Perttunen
  Cc: Thierry Reding, Mikko Perttunen, jonathanh, airlied, daniel,
	linux-tegra, dri-devel, talho, bhuntsman

02.04.2021 00:19, Michał Mirosław wrote:
> On Fri, Mar 26, 2021 at 04:34:13PM +0200, Mikko Perttunen wrote:
>> On 3/23/21 12:16 PM, Thierry Reding wrote:
>>> On Mon, Jan 11, 2021 at 03:00:01PM +0200, Mikko Perttunen wrote:
>>>> Show the number of pending waiters in the debugfs status file.
>>>> This is useful for testing to verify that waiters do not leak
>>>> or accumulate incorrectly.
>>>>
>>>> Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
>>>> ---
>>>>   drivers/gpu/host1x/debug.c | 14 +++++++++++---
>>>>   1 file changed, 11 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/host1x/debug.c b/drivers/gpu/host1x/debug.c
>>>> index 1b4997bda1c7..8a14880c61bb 100644
>>>> --- a/drivers/gpu/host1x/debug.c
>>>> +++ b/drivers/gpu/host1x/debug.c
>>>> @@ -69,6 +69,7 @@ static int show_channel(struct host1x_channel *ch, void *data, bool show_fifo)
>>>>   static void show_syncpts(struct host1x *m, struct output *o)
>>>>   {
>>>> +	struct list_head *pos;
>>>>   	unsigned int i;
>>>>   	host1x_debug_output(o, "---- syncpts ----\n");
>>>> @@ -76,12 +77,19 @@ static void show_syncpts(struct host1x *m, struct output *o)
>>>>   	for (i = 0; i < host1x_syncpt_nb_pts(m); i++) {
>>>>   		u32 max = host1x_syncpt_read_max(m->syncpt + i);
>>>>   		u32 min = host1x_syncpt_load(m->syncpt + i);
>>>> +		unsigned int waiters = 0;
>>>> -		if (!min && !max)
>>>> +		spin_lock(&m->syncpt[i].intr.lock);
>>>> +		list_for_each(pos, &m->syncpt[i].intr.wait_head)
>>>> +			waiters++;
>>>> +		spin_unlock(&m->syncpt[i].intr.lock);
>>>
>>> Would it make sense to keep a running count so that we don't have to
>>> compute it here?
>>
>> Considering this is just a debug facility, I think I prefer not adding a new
>> field just for it.
> 
> This looks like IRQ-disabled region, so unless only root can trigger
> this code, maybe the additional field could save a potential headache?
> How many waiters can there be in the worst case?

The host1x's IRQ handler runs in a workqueue, so it should be okay.


^ permalink raw reply	[flat|nested] 195+ messages in thread

* Re: [PATCH v5 03/21] gpu: host1x: Show number of pending waiters in debugfs
  2021-04-02 16:02           ` Dmitry Osipenko
@ 2021-04-08  4:13             ` Michał Mirosław
  -1 siblings, 0 replies; 195+ messages in thread
From: Michał Mirosław @ 2021-04-08  4:13 UTC (permalink / raw)
  To: Dmitry Osipenko
  Cc: Mikko Perttunen, Thierry Reding, Mikko Perttunen, jonathanh,
	airlied, daniel, linux-tegra, dri-devel, talho, bhuntsman

On Fri, Apr 02, 2021 at 07:02:32PM +0300, Dmitry Osipenko wrote:
> 02.04.2021 00:19, Michał Mirosław wrote:
> > On Fri, Mar 26, 2021 at 04:34:13PM +0200, Mikko Perttunen wrote:
> >> On 3/23/21 12:16 PM, Thierry Reding wrote:
> >>> On Mon, Jan 11, 2021 at 03:00:01PM +0200, Mikko Perttunen wrote:
> >>>> Show the number of pending waiters in the debugfs status file.
> >>>> This is useful for testing to verify that waiters do not leak
> >>>> or accumulate incorrectly.
> >>>>
> >>>> Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
> >>>> ---
> >>>>   drivers/gpu/host1x/debug.c | 14 +++++++++++---
> >>>>   1 file changed, 11 insertions(+), 3 deletions(-)
> >>>>
> >>>> diff --git a/drivers/gpu/host1x/debug.c b/drivers/gpu/host1x/debug.c
> >>>> index 1b4997bda1c7..8a14880c61bb 100644
> >>>> --- a/drivers/gpu/host1x/debug.c
> >>>> +++ b/drivers/gpu/host1x/debug.c
> >>>> @@ -69,6 +69,7 @@ static int show_channel(struct host1x_channel *ch, void *data, bool show_fifo)
> >>>>   static void show_syncpts(struct host1x *m, struct output *o)
> >>>>   {
> >>>> +	struct list_head *pos;
> >>>>   	unsigned int i;
> >>>>   	host1x_debug_output(o, "---- syncpts ----\n");
> >>>> @@ -76,12 +77,19 @@ static void show_syncpts(struct host1x *m, struct output *o)
> >>>>   	for (i = 0; i < host1x_syncpt_nb_pts(m); i++) {
> >>>>   		u32 max = host1x_syncpt_read_max(m->syncpt + i);
> >>>>   		u32 min = host1x_syncpt_load(m->syncpt + i);
> >>>> +		unsigned int waiters = 0;
> >>>> -		if (!min && !max)
> >>>> +		spin_lock(&m->syncpt[i].intr.lock);
> >>>> +		list_for_each(pos, &m->syncpt[i].intr.wait_head)
> >>>> +			waiters++;
> >>>> +		spin_unlock(&m->syncpt[i].intr.lock);
> >>>
> >>> Would it make sense to keep a running count so that we don't have to
> >>> compute it here?
> >>
> >> Considering this is just a debug facility, I think I prefer not adding a new
> >> field just for it.
> > 
> > This looks like IRQ-disabled region, so unless only root can trigger
> > this code, maybe the additional field could save a potential headache?
> > How many waiters can there be in the worst case?
> 
> The host1x's IRQ handler runs in a workqueue, so it should be okay.

Why, then, does this use a spinlock (and why does it have 'intr' in its name)?

Best Regards
Michał Mirosław

^ permalink raw reply	[flat|nested] 195+ messages in thread

* Re: [PATCH v5 03/21] gpu: host1x: Show number of pending waiters in debugfs
  2021-04-08  4:13             ` Michał Mirosław
@ 2021-04-08  4:25               ` Michał Mirosław
  -1 siblings, 0 replies; 195+ messages in thread
From: Michał Mirosław @ 2021-04-08  4:25 UTC (permalink / raw)
  To: Dmitry Osipenko
  Cc: Mikko Perttunen, Thierry Reding, Mikko Perttunen, jonathanh,
	airlied, daniel, linux-tegra, dri-devel, talho, bhuntsman

On Thu, Apr 08, 2021 at 06:13:44AM +0200, Michał Mirosław wrote:
> On Fri, Apr 02, 2021 at 07:02:32PM +0300, Dmitry Osipenko wrote:
> > 02.04.2021 00:19, Michał Mirosław wrote:
> > > On Fri, Mar 26, 2021 at 04:34:13PM +0200, Mikko Perttunen wrote:
> > >> On 3/23/21 12:16 PM, Thierry Reding wrote:
> > >>> On Mon, Jan 11, 2021 at 03:00:01PM +0200, Mikko Perttunen wrote:
> > >>>> Show the number of pending waiters in the debugfs status file.
> > >>>> This is useful for testing to verify that waiters do not leak
> > >>>> or accumulate incorrectly.
> > >>>>
> > >>>> Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
> > >>>> ---
> > >>>>   drivers/gpu/host1x/debug.c | 14 +++++++++++---
> > >>>>   1 file changed, 11 insertions(+), 3 deletions(-)
> > >>>>
> > >>>> diff --git a/drivers/gpu/host1x/debug.c b/drivers/gpu/host1x/debug.c
> > >>>> index 1b4997bda1c7..8a14880c61bb 100644
> > >>>> --- a/drivers/gpu/host1x/debug.c
> > >>>> +++ b/drivers/gpu/host1x/debug.c
> > >>>> @@ -69,6 +69,7 @@ static int show_channel(struct host1x_channel *ch, void *data, bool show_fifo)
> > >>>>   static void show_syncpts(struct host1x *m, struct output *o)
> > >>>>   {
> > >>>> +	struct list_head *pos;
> > >>>>   	unsigned int i;
> > >>>>   	host1x_debug_output(o, "---- syncpts ----\n");
> > >>>> @@ -76,12 +77,19 @@ static void show_syncpts(struct host1x *m, struct output *o)
> > >>>>   	for (i = 0; i < host1x_syncpt_nb_pts(m); i++) {
> > >>>>   		u32 max = host1x_syncpt_read_max(m->syncpt + i);
> > >>>>   		u32 min = host1x_syncpt_load(m->syncpt + i);
> > >>>> +		unsigned int waiters = 0;
> > >>>> -		if (!min && !max)
> > >>>> +		spin_lock(&m->syncpt[i].intr.lock);
> > >>>> +		list_for_each(pos, &m->syncpt[i].intr.wait_head)
> > >>>> +			waiters++;
> > >>>> +		spin_unlock(&m->syncpt[i].intr.lock);
> > >>>
> > >>> Would it make sense to keep a running count so that we don't have to
> > >>> compute it here?
> > >>
> > >> Considering this is just a debug facility, I think I prefer not adding a new
> > >> field just for it.
> > > 
> > > This looks like IRQ-disabled region, so unless only root can trigger
> > > this code, maybe the additional field could save a potential headache?
> > > How many waiters can there be in the worst case?
> > 
> > The host1x's IRQ handler runs in a workqueue, so it should be okay.
> 
> Why, then, this uses a spinlock (and it has 'intr' in its name)?

The critical sections are already O(n) in the number of waiters, so this
patch doesn't make things worse as I previously thought it did. The
questions remain: what is the expected number and upper bound of waiters?
Shouldn't this be a mutex instead?

Best Regards
Michał Mirosław

^ permalink raw reply	[flat|nested] 195+ messages in thread

* Re: [PATCH v5 03/21] gpu: host1x: Show number of pending waiters in debugfs
  2021-04-08  4:25               ` Michał Mirosław
@ 2021-04-08 11:58                 ` Mikko Perttunen
  -1 siblings, 0 replies; 195+ messages in thread
From: Mikko Perttunen @ 2021-04-08 11:58 UTC (permalink / raw)
  To: Michał Mirosław, Dmitry Osipenko
  Cc: Thierry Reding, Mikko Perttunen, jonathanh, airlied, daniel,
	linux-tegra, dri-devel, talho, bhuntsman

On 4/8/21 7:25 AM, Michał Mirosław wrote:
> On Thu, Apr 08, 2021 at 06:13:44AM +0200, Michał Mirosław wrote:
>> On Fri, Apr 02, 2021 at 07:02:32PM +0300, Dmitry Osipenko wrote:
>>> 02.04.2021 00:19, Michał Mirosław wrote:
>>>> On Fri, Mar 26, 2021 at 04:34:13PM +0200, Mikko Perttunen wrote:
>>>>> On 3/23/21 12:16 PM, Thierry Reding wrote:
>>>>>> On Mon, Jan 11, 2021 at 03:00:01PM +0200, Mikko Perttunen wrote:
>>>>>>> Show the number of pending waiters in the debugfs status file.
>>>>>>> This is useful for testing to verify that waiters do not leak
>>>>>>> or accumulate incorrectly.
>>>>>>>
>>>>>>> Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
>>>>>>> ---
>>>>>>>    drivers/gpu/host1x/debug.c | 14 +++++++++++---
>>>>>>>    1 file changed, 11 insertions(+), 3 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/host1x/debug.c b/drivers/gpu/host1x/debug.c
>>>>>>> index 1b4997bda1c7..8a14880c61bb 100644
>>>>>>> --- a/drivers/gpu/host1x/debug.c
>>>>>>> +++ b/drivers/gpu/host1x/debug.c
>>>>>>> @@ -69,6 +69,7 @@ static int show_channel(struct host1x_channel *ch, void *data, bool show_fifo)
>>>>>>>    static void show_syncpts(struct host1x *m, struct output *o)
>>>>>>>    {
>>>>>>> +	struct list_head *pos;
>>>>>>>    	unsigned int i;
>>>>>>>    	host1x_debug_output(o, "---- syncpts ----\n");
>>>>>>> @@ -76,12 +77,19 @@ static void show_syncpts(struct host1x *m, struct output *o)
>>>>>>>    	for (i = 0; i < host1x_syncpt_nb_pts(m); i++) {
>>>>>>>    		u32 max = host1x_syncpt_read_max(m->syncpt + i);
>>>>>>>    		u32 min = host1x_syncpt_load(m->syncpt + i);
>>>>>>> +		unsigned int waiters = 0;
>>>>>>> -		if (!min && !max)
>>>>>>> +		spin_lock(&m->syncpt[i].intr.lock);
>>>>>>> +		list_for_each(pos, &m->syncpt[i].intr.wait_head)
>>>>>>> +			waiters++;
>>>>>>> +		spin_unlock(&m->syncpt[i].intr.lock);
>>>>>>
>>>>>> Would it make sense to keep a running count so that we don't have to
>>>>>> compute it here?
>>>>>
>>>>> Considering this is just a debug facility, I think I prefer not adding a new
>>>>> field just for it.
>>>>
>>>> This looks like IRQ-disabled region, so unless only root can trigger
>>>> this code, maybe the additional field could save a potential headache?
>>>> How many waiters can there be in the worst case?
>>>
>>> The host1x's IRQ handler runs in a workqueue, so it should be okay.
>>
>> Why, then, this uses a spinlock (and it has 'intr' in its name)?
> 
> The critical sections are already O(n) in number of waiters, so this
> patch doesn't make things worse as I previously thought. The questions
> remain: What is the expected number and upper bound of workers?
> Shouldn't this be a mutex instead?

Everything is primarily for historical reasons. The name 'intr' is
because this is the part of the host1x driver that handles syncpoint
threshold interrupts - some of it runs in interrupt context and some
does not.

In any case, this code is scheduled for a complete redesign once we get 
the UAPI changes done. I'll take this into account at that point.

Cheers,
Mikko

> 
> Best Regards
> Michał Mirosław
> 

^ permalink raw reply	[flat|nested] 195+ messages in thread

end of thread, other threads:[~2021-04-08 11:58 UTC | newest]

Thread overview: 195+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-11 12:59 [PATCH v5 00/21] Host1x/TegraDRM UAPI Mikko Perttunen
2021-01-11 12:59 ` Mikko Perttunen
2021-01-11 12:59 ` [PATCH v5 01/21] gpu: host1x: Use different lock classes for each client Mikko Perttunen
2021-01-11 12:59   ` Mikko Perttunen
2021-03-22 14:46   ` Thierry Reding
2021-03-22 14:46     ` Thierry Reding
2021-03-22 14:48     ` Dmitry Osipenko
2021-03-22 14:48       ` Dmitry Osipenko
2021-03-22 15:19       ` Mikko Perttunen
2021-03-22 15:19         ` Mikko Perttunen
2021-03-22 16:01         ` Dmitry Osipenko
2021-03-22 16:01           ` Dmitry Osipenko
2021-03-23 10:20           ` Thierry Reding
2021-03-23 10:20             ` Thierry Reding
2021-03-23 13:25             ` Dmitry Osipenko
2021-03-23 13:25               ` Dmitry Osipenko
2021-03-26 14:54         ` Mikko Perttunen
2021-03-26 14:54           ` Mikko Perttunen
2021-03-26 18:31           ` Dmitry Osipenko
2021-03-26 18:31             ` Dmitry Osipenko
2021-03-26 19:10             ` Mikko Perttunen
2021-03-26 19:10               ` Mikko Perttunen
2021-03-26 22:47               ` Dmitry Osipenko
2021-03-26 22:47                 ` Dmitry Osipenko
2021-01-11 13:00 ` [PATCH v5 02/21] gpu: host1x: Allow syncpoints without associated client Mikko Perttunen
2021-01-11 13:00   ` Mikko Perttunen
2021-03-23 10:10   ` Thierry Reding
2021-03-23 10:10     ` Thierry Reding
2021-03-23 10:32     ` Mikko Perttunen
2021-03-23 10:32       ` Mikko Perttunen
2021-01-11 13:00 ` [PATCH v5 03/21] gpu: host1x: Show number of pending waiters in debugfs Mikko Perttunen
2021-01-11 13:00   ` Mikko Perttunen
2021-03-23 10:16   ` Thierry Reding
2021-03-23 10:16     ` Thierry Reding
2021-03-26 14:34     ` Mikko Perttunen
2021-03-26 14:34       ` Mikko Perttunen
2021-04-01 21:19       ` Michał Mirosław
2021-04-01 21:19         ` Michał Mirosław
2021-04-02 16:02         ` Dmitry Osipenko
2021-04-02 16:02           ` Dmitry Osipenko
2021-04-08  4:13           ` Michał Mirosław
2021-04-08  4:13             ` Michał Mirosław
2021-04-08  4:25             ` Michał Mirosław
2021-04-08  4:25               ` Michał Mirosław
2021-04-08 11:58               ` Mikko Perttunen
2021-04-08 11:58                 ` Mikko Perttunen
2021-01-11 13:00 ` [PATCH v5 04/21] gpu: host1x: Remove cancelled waiters immediately Mikko Perttunen
2021-01-11 13:00   ` Mikko Perttunen
2021-01-12 22:07   ` Dmitry Osipenko
2021-01-12 22:07     ` Dmitry Osipenko
2021-01-12 22:20     ` Mikko Perttunen
2021-01-12 22:20       ` Mikko Perttunen
2021-01-13 16:29       ` Dmitry Osipenko
2021-01-13 16:29         ` Dmitry Osipenko
2021-01-13 18:16         ` Mikko Perttunen
2021-01-13 18:16           ` Mikko Perttunen
2021-03-23 10:23       ` Thierry Reding
2021-03-23 10:23         ` Thierry Reding
2021-01-11 13:00 ` [PATCH v5 05/21] gpu: host1x: Use HW-equivalent syncpoint expiration check Mikko Perttunen
2021-01-11 13:00   ` Mikko Perttunen
2021-03-23 10:26   ` Thierry Reding
2021-03-23 10:26     ` Thierry Reding
2021-01-11 13:00 ` [PATCH v5 06/21] gpu: host1x: Cleanup and refcounting for syncpoints Mikko Perttunen
2021-01-11 13:00   ` Mikko Perttunen
2021-03-23 10:36   ` Thierry Reding
2021-03-23 10:36     ` Thierry Reding
2021-03-23 10:44     ` Mikko Perttunen
2021-03-23 10:44       ` Mikko Perttunen
2021-03-23 11:21       ` Thierry Reding
2021-03-23 11:21         ` Thierry Reding
2021-01-11 13:00 ` [PATCH v5 07/21] gpu: host1x: Introduce UAPI header Mikko Perttunen
2021-01-11 13:00   ` Mikko Perttunen
2021-03-23 10:52   ` Thierry Reding
2021-03-23 10:52     ` Thierry Reding
2021-03-23 11:12     ` Mikko Perttunen
2021-03-23 11:12       ` Mikko Perttunen
2021-03-23 11:43       ` Thierry Reding
2021-03-23 11:43         ` Thierry Reding
2021-01-11 13:00 ` [PATCH v5 08/21] gpu: host1x: Implement /dev/host1x device node Mikko Perttunen
2021-01-11 13:00   ` Mikko Perttunen
2021-03-23 11:02   ` Thierry Reding
2021-03-23 11:02     ` Thierry Reding
2021-03-23 11:15     ` Mikko Perttunen
2021-03-23 11:15       ` Mikko Perttunen
2021-01-11 13:00 ` [PATCH v5 09/21] gpu: host1x: DMA fences and userspace fence creation Mikko Perttunen
2021-01-11 13:00   ` Mikko Perttunen
2021-03-23 11:15   ` Thierry Reding
2021-03-23 11:15     ` Thierry Reding
2021-01-11 13:00 ` [PATCH v5 10/21] gpu: host1x: Add no-recovery mode Mikko Perttunen
2021-01-11 13:00   ` Mikko Perttunen
2021-01-11 13:00 ` [PATCH v5 11/21] gpu: host1x: Add job release callback Mikko Perttunen
2021-01-11 13:00   ` Mikko Perttunen
2021-03-23 11:55   ` Thierry Reding
2021-03-23 11:55     ` Thierry Reding
2021-01-11 13:00 ` [PATCH v5 12/21] gpu: host1x: Add support for syncpoint waits in CDMA pushbuffer Mikko Perttunen
2021-01-11 13:00   ` Mikko Perttunen
2021-01-11 13:00 ` [PATCH v5 13/21] gpu: host1x: Reset max value when freeing a syncpoint Mikko Perttunen
2021-01-11 13:00   ` Mikko Perttunen
2021-01-11 13:00 ` [PATCH v5 14/21] gpu: host1x: Reserve VBLANK syncpoints at initialization Mikko Perttunen
2021-01-11 13:00   ` Mikko Perttunen
2021-01-11 13:00 ` [PATCH v5 15/21] drm/tegra: Add new UAPI to header Mikko Perttunen
2021-01-11 13:00   ` Mikko Perttunen
2021-01-13 18:14   ` Dmitry Osipenko
2021-01-13 18:14     ` Dmitry Osipenko
2021-01-13 18:56     ` Mikko Perttunen
2021-01-13 18:56       ` Mikko Perttunen
2021-01-14  8:36       ` Dmitry Osipenko
2021-01-14  8:36         ` Dmitry Osipenko
2021-01-14 10:34         ` Mikko Perttunen
2021-01-14 10:34           ` Mikko Perttunen
2021-03-23 12:30           ` Thierry Reding
2021-03-23 12:30             ` Thierry Reding
2021-03-23 14:00             ` Dmitry Osipenko
2021-03-23 14:00               ` Dmitry Osipenko
2021-03-23 16:44               ` Thierry Reding
2021-03-23 16:44                 ` Thierry Reding
2021-03-23 17:32                 ` Dmitry Osipenko
2021-03-23 17:32                   ` Dmitry Osipenko
2021-03-23 17:57                   ` Thierry Reding
2021-03-23 17:57                     ` Thierry Reding
2021-01-11 13:00 ` [PATCH v5 16/21] drm/tegra: Boot VIC during runtime PM resume Mikko Perttunen
2021-01-11 13:00   ` Mikko Perttunen
2021-01-11 13:00 ` [PATCH v5 17/21] drm/tegra: Set resv fields when importing/exporting GEMs Mikko Perttunen
2021-01-11 13:00   ` Mikko Perttunen
2021-01-11 13:00 ` [PATCH v5 18/21] drm/tegra: Allocate per-engine channel in core code Mikko Perttunen
2021-01-11 13:00   ` Mikko Perttunen
2021-03-23 12:35   ` Thierry Reding
2021-03-23 12:35     ` Thierry Reding
2021-03-23 13:15     ` Mikko Perttunen
2021-03-23 13:15       ` Mikko Perttunen
2021-01-11 13:00 ` [PATCH v5 19/21] drm/tegra: Implement new UAPI Mikko Perttunen
2021-01-11 17:37   ` kernel test robot
2021-01-12 22:27   ` Dmitry Osipenko
2021-03-23 13:25   ` Thierry Reding
2021-03-23 14:43     ` Mikko Perttunen
2021-03-23 15:00       ` Dmitry Osipenko
2021-03-23 16:59         ` Thierry Reding
2021-01-11 13:00 ` [PATCH v5 20/21] drm/tegra: Implement job submission part of " Mikko Perttunen
2021-03-23 13:38   ` Thierry Reding
2021-03-23 14:16     ` Mikko Perttunen
2021-03-23 17:04       ` Thierry Reding
2021-01-11 13:00 ` [PATCH v5 21/21] drm/tegra: Add job firewall Mikko Perttunen
2021-01-19 22:29 ` [PATCH v5 00/21] Host1x/TegraDRM UAPI Dmitry Osipenko
2021-01-26  2:45   ` Mikko Perttunen
2021-01-27 21:20     ` [PATCH v5 00/21] Host1x sync point UAPI should not be used for tracking DRM jobs Dmitry Osipenko
2021-01-28 11:08       ` Mikko Perttunen
2021-01-28 16:58         ` Thierry Reding
2021-01-29 17:30           ` Dmitry Osipenko
2021-02-03 11:18             ` Mikko Perttunen
2021-02-27 11:19               ` Dmitry Osipenko
2021-03-01  8:19                 ` Mikko Perttunen
2021-03-23 18:21                 ` Thierry Reding
2021-03-23 19:57                   ` Dmitry Osipenko
2021-03-23 20:13                     ` Dmitry Osipenko
2021-01-27 21:26     ` [PATCH v5 00/21] Host1x/TegraDRM UAPI Dmitry Osipenko
2021-01-27 21:57       ` Mikko Perttunen
2021-01-27 22:06         ` Dmitry Osipenko
2021-01-28 11:46           ` Mikko Perttunen
2021-01-27 21:35     ` [PATCH v5 00/21] sync_file API is not very suitable for DRM Dmitry Osipenko
2021-01-27 21:53       ` Mikko Perttunen
2021-01-27 22:26         ` Dmitry Osipenko
2021-01-27 21:52     ` [PATCH v5 00/21] support option where all commands are collected into a single,dedicated cmdstream Dmitry Osipenko