* [PATCH 00/10] Dynamic Host1x channel allocation
@ 2017-11-05 11:01 Mikko Perttunen
  2017-11-05 11:01 ` [PATCH 02/10] gpu: host1x: Print MLOCK state in debug dumps on T186 Mikko Perttunen
                   ` (7 more replies)
  0 siblings, 8 replies; 52+ messages in thread
From: Mikko Perttunen @ 2017-11-05 11:01 UTC (permalink / raw)
  To: thierry.reding, jonathanh
  Cc: digetx, dri-devel, linux-tegra, linux-kernel, Mikko Perttunen

Hi all,

This series adds support for a new model of hardware channel allocation
for Host1x/TegraDRM. In the current model, one hardware channel is
allocated for each client device at probe time. This is simple but
does not allow optimal use of hardware resources.

In the new model, we allocate hardware channels dynamically whenever a
"userspace channel", opened using the channel open IOCTL, has pending
jobs. However, each userspace channel can have only one hardware
channel assigned at a time, so the current serialization behavior is
preserved. As such, there is no change in the programming model for
userspace.
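
To make the allocation lifecycle concrete, here is a minimal standalone
sketch of the model described above (names and types are made up for
illustration and do not match the driver's actual code): a context
lazily acquires a hardware channel when its first job is submitted and
returns it once no jobs remain.

```c
/*
 * Hypothetical sketch of the dynamic allocation model. A context holds
 * at most one hardware channel, taken only while it has pending jobs.
 */
#include <assert.h>

#define NUM_HW_CHANNELS 4

struct context {
	int hw_channel;		/* -1 while no hardware channel is assigned */
	int pending_jobs;
};

static unsigned long allocated_channels;	/* bitmap of in-use channels */

static int acquire_hw_channel(void)
{
	for (int i = 0; i < NUM_HW_CHANNELS; i++) {
		if (!(allocated_channels & (1UL << i))) {
			allocated_channels |= 1UL << i;
			return i;
		}
	}
	return -1;	/* all hardware channels are busy */
}

static int context_submit(struct context *ctx)
{
	/* Lazily assign a hardware channel on the first pending job. */
	if (ctx->hw_channel < 0) {
		ctx->hw_channel = acquire_hw_channel();
		if (ctx->hw_channel < 0)
			return -1;
	}
	ctx->pending_jobs++;
	return 0;
}

static void context_job_done(struct context *ctx)
{
	/* Release the hardware channel once no jobs remain. */
	if (--ctx->pending_jobs == 0) {
		allocated_channels &= ~(1UL << ctx->hw_channel);
		ctx->hw_channel = -1;
	}
}
```

Since each context maps to at most one hardware channel at a time, jobs
within a context still execute in order, which is why the serialization
guarantee is preserved.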

The series adapts VIC to use the new model. GR2D and GR3D are not
modified: the older Tegra chips they are found on do not have a large
number of hardware channels, so it is not clear whether the new model
would be beneficial there (and I don't have access to those chips to
test it).

Tested using the host1x_test test suite, and also by running
the performance test of host1x_test in parallel.

Thanks,
Mikko

Mikko Perttunen (10):
  gpu: host1x: Parameterize channel aperture size
  gpu: host1x: Print MLOCK state in debug dumps on T186
  gpu: host1x: Add lock around channel allocation
  gpu: host1x: Lock classes during job submission
  gpu: host1x: Add job done callback
  drm/tegra: Deliver job completion callback to client
  drm/tegra: Make syncpoints be per-context
  drm/tegra: Implement dynamic channel allocation model
  drm/tegra: Boot VIC in runtime resume
  gpu: host1x: Optionally block when acquiring channel

 drivers/gpu/drm/tegra/drm.c                    |  82 +++++++++++++++--
 drivers/gpu/drm/tegra/drm.h                    |  12 ++-
 drivers/gpu/drm/tegra/gr2d.c                   |   8 +-
 drivers/gpu/drm/tegra/gr3d.c                   |   8 +-
 drivers/gpu/drm/tegra/vic.c                    | 120 ++++++++++++------------
 drivers/gpu/host1x/cdma.c                      |  45 ++++++---
 drivers/gpu/host1x/cdma.h                      |   1 +
 drivers/gpu/host1x/channel.c                   |  47 ++++++++--
 drivers/gpu/host1x/channel.h                   |   3 +
 drivers/gpu/host1x/hw/cdma_hw.c                | 122 +++++++++++++++++++++++++
 drivers/gpu/host1x/hw/channel_hw.c             |  74 +++++++++++----
 drivers/gpu/host1x/hw/debug_hw_1x06.c          |  18 +++-
 drivers/gpu/host1x/hw/host1x01_hardware.h      |  10 ++
 drivers/gpu/host1x/hw/host1x02_hardware.h      |  10 ++
 drivers/gpu/host1x/hw/host1x04_hardware.h      |  10 ++
 drivers/gpu/host1x/hw/host1x05_hardware.h      |  10 ++
 drivers/gpu/host1x/hw/host1x06_hardware.h      |  10 ++
 drivers/gpu/host1x/hw/hw_host1x01_channel.h    |   2 +
 drivers/gpu/host1x/hw/hw_host1x01_sync.h       |   6 ++
 drivers/gpu/host1x/hw/hw_host1x02_channel.h    |   2 +
 drivers/gpu/host1x/hw/hw_host1x02_sync.h       |   6 ++
 drivers/gpu/host1x/hw/hw_host1x04_channel.h    |   2 +
 drivers/gpu/host1x/hw/hw_host1x04_sync.h       |   6 ++
 drivers/gpu/host1x/hw/hw_host1x05_channel.h    |   2 +
 drivers/gpu/host1x/hw/hw_host1x05_sync.h       |   6 ++
 drivers/gpu/host1x/hw/hw_host1x06_hypervisor.h |   5 +
 drivers/gpu/host1x/hw/hw_host1x06_vm.h         |   2 +
 include/linux/host1x.h                         |   6 +-
 28 files changed, 517 insertions(+), 118 deletions(-)

-- 
2.14.2

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH 01/10] gpu: host1x: Parameterize channel aperture size
  2017-11-05 11:01 [PATCH 00/10] Dynamic Host1x channel allocation Mikko Perttunen
@ 2017-11-05 11:01     ` Mikko Perttunen
  2017-11-05 11:01 ` [PATCH 03/10] gpu: host1x: Add lock around channel allocation Mikko Perttunen
                       ` (6 subsequent siblings)
  7 siblings, 0 replies; 52+ messages in thread
From: Mikko Perttunen @ 2017-11-05 11:01 UTC (permalink / raw)
  To: thierry.reding, jonathanh
  Cc: digetx, dri-devel, linux-tegra, linux-kernel, Mikko Perttunen

The size of a single channel's aperture is different on Tegra186 vs.
previous chips. Parameterize the value using a new define in the
register definition headers.
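
For illustration (a standalone sketch, not the driver code), the stride
values the new defines encode differ per generation: 0x4000 (16 KiB)
per channel on Tegra210 and earlier, 0x100 per channel in the Tegra186
VM register space.

```c
#include <assert.h>
#include <stdint.h>

/* Channel register base offsets as encoded by the per-SoC
 * HOST1X_CHANNEL_BASE(x) defines added in this patch. */
static uint32_t channel_base_legacy(unsigned int index)
{
	return index * 0x4000;	/* Tegra210 and earlier: 16 KiB apertures */
}

static uint32_t channel_base_t186(unsigned int index)
{
	return index * 0x100;	/* Tegra186 VM register space */
}
```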

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
 drivers/gpu/host1x/hw/channel_hw.c          | 3 +--
 drivers/gpu/host1x/hw/hw_host1x01_channel.h | 2 ++
 drivers/gpu/host1x/hw/hw_host1x02_channel.h | 2 ++
 drivers/gpu/host1x/hw/hw_host1x04_channel.h | 2 ++
 drivers/gpu/host1x/hw/hw_host1x05_channel.h | 2 ++
 drivers/gpu/host1x/hw/hw_host1x06_vm.h      | 2 ++
 6 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/host1x/hw/channel_hw.c b/drivers/gpu/host1x/hw/channel_hw.c
index 5c0dc6bb51d1..246b78c41281 100644
--- a/drivers/gpu/host1x/hw/channel_hw.c
+++ b/drivers/gpu/host1x/hw/channel_hw.c
@@ -26,7 +26,6 @@
 #include "../intr.h"
 #include "../job.h"
 
-#define HOST1X_CHANNEL_SIZE 16384
 #define TRACE_MAX_LENGTH 128U
 
 static void trace_write_gather(struct host1x_cdma *cdma, struct host1x_bo *bo,
@@ -205,7 +204,7 @@ static void enable_gather_filter(struct host1x *host,
 static int host1x_channel_init(struct host1x_channel *ch, struct host1x *dev,
 			       unsigned int index)
 {
-	ch->regs = dev->regs + index * HOST1X_CHANNEL_SIZE;
+	ch->regs = dev->regs + HOST1X_CHANNEL_BASE(index);
 	enable_gather_filter(dev, ch);
 	return 0;
 }
diff --git a/drivers/gpu/host1x/hw/hw_host1x01_channel.h b/drivers/gpu/host1x/hw/hw_host1x01_channel.h
index b4bc7ca4e051..be56a3a506de 100644
--- a/drivers/gpu/host1x/hw/hw_host1x01_channel.h
+++ b/drivers/gpu/host1x/hw/hw_host1x01_channel.h
@@ -51,6 +51,8 @@
 #ifndef __hw_host1x_channel_host1x_h__
 #define __hw_host1x_channel_host1x_h__
 
+#define HOST1X_CHANNEL_BASE(x)		((x) * 0x4000)
+
 static inline u32 host1x_channel_fifostat_r(void)
 {
 	return 0x0;
diff --git a/drivers/gpu/host1x/hw/hw_host1x02_channel.h b/drivers/gpu/host1x/hw/hw_host1x02_channel.h
index e490bcde33fe..a142576a2c6e 100644
--- a/drivers/gpu/host1x/hw/hw_host1x02_channel.h
+++ b/drivers/gpu/host1x/hw/hw_host1x02_channel.h
@@ -51,6 +51,8 @@
 #ifndef HOST1X_HW_HOST1X02_CHANNEL_H
 #define HOST1X_HW_HOST1X02_CHANNEL_H
 
+#define HOST1X_CHANNEL_BASE(x)		((x) * 0x4000)
+
 static inline u32 host1x_channel_fifostat_r(void)
 {
 	return 0x0;
diff --git a/drivers/gpu/host1x/hw/hw_host1x04_channel.h b/drivers/gpu/host1x/hw/hw_host1x04_channel.h
index 2e8b635aa660..645483c07fc2 100644
--- a/drivers/gpu/host1x/hw/hw_host1x04_channel.h
+++ b/drivers/gpu/host1x/hw/hw_host1x04_channel.h
@@ -51,6 +51,8 @@
 #ifndef HOST1X_HW_HOST1X04_CHANNEL_H
 #define HOST1X_HW_HOST1X04_CHANNEL_H
 
+#define HOST1X_CHANNEL_BASE(x)		((x) * 0x4000)
+
 static inline u32 host1x_channel_fifostat_r(void)
 {
 	return 0x0;
diff --git a/drivers/gpu/host1x/hw/hw_host1x05_channel.h b/drivers/gpu/host1x/hw/hw_host1x05_channel.h
index abbbc2641ce6..6aef6bc1c96d 100644
--- a/drivers/gpu/host1x/hw/hw_host1x05_channel.h
+++ b/drivers/gpu/host1x/hw/hw_host1x05_channel.h
@@ -51,6 +51,8 @@
 #ifndef HOST1X_HW_HOST1X05_CHANNEL_H
 #define HOST1X_HW_HOST1X05_CHANNEL_H
 
+#define HOST1X_CHANNEL_BASE(x)		((x) * 0x4000)
+
 static inline u32 host1x_channel_fifostat_r(void)
 {
 	return 0x0;
diff --git a/drivers/gpu/host1x/hw/hw_host1x06_vm.h b/drivers/gpu/host1x/hw/hw_host1x06_vm.h
index e54b33902332..0750aea78a30 100644
--- a/drivers/gpu/host1x/hw/hw_host1x06_vm.h
+++ b/drivers/gpu/host1x/hw/hw_host1x06_vm.h
@@ -15,6 +15,8 @@
  *
  */
 
+#define HOST1X_CHANNEL_BASE(x)				((x) * 0x100)
+
 #define HOST1X_CHANNEL_DMASTART				0x0000
 #define HOST1X_CHANNEL_DMASTART_HI			0x0004
 #define HOST1X_CHANNEL_DMAPUT				0x0008
-- 
2.14.2


* [PATCH 02/10] gpu: host1x: Print MLOCK state in debug dumps on T186
  2017-11-05 11:01 [PATCH 00/10] Dynamic Host1x channel allocation Mikko Perttunen
@ 2017-11-05 11:01 ` Mikko Perttunen
  2017-11-05 11:01 ` [PATCH 03/10] gpu: host1x: Add lock around channel allocation Mikko Perttunen
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 52+ messages in thread
From: Mikko Perttunen @ 2017-11-05 11:01 UTC (permalink / raw)
  To: thierry.reding, jonathanh
  Cc: digetx, dri-devel, linux-tegra, linux-kernel, Mikko Perttunen

Now that MLOCKs are used by the driver, add support for dumping the
current MLOCK state in debug dumps on T186 as well.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
 drivers/gpu/host1x/hw/debug_hw_1x06.c | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/host1x/hw/debug_hw_1x06.c b/drivers/gpu/host1x/hw/debug_hw_1x06.c
index b503c740c022..659dd6042ccc 100644
--- a/drivers/gpu/host1x/hw/debug_hw_1x06.c
+++ b/drivers/gpu/host1x/hw/debug_hw_1x06.c
@@ -131,5 +131,21 @@ static void host1x_debug_show_channel_fifo(struct host1x *host,
 
 static void host1x_debug_show_mlocks(struct host1x *host, struct output *o)
 {
-	/* TODO */
+	unsigned int i;
+
+	if (!host->hv_regs)
+		return;
+
+	host1x_debug_output(o, "---- mlocks ----\n");
+
+	for (i = 0; i < host1x_syncpt_nb_mlocks(host); i++) {
+		u32 val = host1x_hypervisor_readl(host, HOST1X_HV_MLOCK(i));
+		if (HOST1X_HV_MLOCK_LOCKED_V(val))
+			host1x_debug_output(o, "%u: locked by channel %u\n",
+					    i, HOST1X_HV_MLOCK_CH_V(val));
+		else
+			host1x_debug_output(o, "%u: unlocked\n", i);
+	}
+
+	host1x_debug_output(o, "\n");
 }
-- 
2.14.2


* [PATCH 03/10] gpu: host1x: Add lock around channel allocation
  2017-11-05 11:01 [PATCH 00/10] Dynamic Host1x channel allocation Mikko Perttunen
  2017-11-05 11:01 ` [PATCH 02/10] gpu: host1x: Print MLOCK state in debug dumps on T186 Mikko Perttunen
@ 2017-11-05 11:01 ` Mikko Perttunen
  2017-11-05 11:01   ` Mikko Perttunen
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 52+ messages in thread
From: Mikko Perttunen @ 2017-11-05 11:01 UTC (permalink / raw)
  To: thierry.reding, jonathanh
  Cc: digetx, dri-devel, linux-tegra, linux-kernel, Mikko Perttunen

With the new channel allocation model, multiple threads can allocate
channels simultaneously, so we need a lock around the allocation code.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
 drivers/gpu/host1x/channel.c | 7 +++++++
 drivers/gpu/host1x/channel.h | 2 ++
 2 files changed, 9 insertions(+)

diff --git a/drivers/gpu/host1x/channel.c b/drivers/gpu/host1x/channel.c
index 2fb93c27c1d9..9d8cad12f9d8 100644
--- a/drivers/gpu/host1x/channel.c
+++ b/drivers/gpu/host1x/channel.c
@@ -42,6 +42,8 @@ int host1x_channel_list_init(struct host1x_channel_list *chlist,
 
 	bitmap_zero(chlist->allocated_channels, num_channels);
 
+	mutex_init(&chlist->lock);
+
 	return 0;
 }
 
@@ -111,8 +113,11 @@ static struct host1x_channel *acquire_unused_channel(struct host1x *host)
 	unsigned int max_channels = host->info->nb_channels;
 	unsigned int index;
 
+	mutex_lock(&chlist->lock);
+
 	index = find_first_zero_bit(chlist->allocated_channels, max_channels);
 	if (index >= max_channels) {
+		mutex_unlock(&chlist->lock);
 		dev_err(host->dev, "failed to find free channel\n");
 		return NULL;
 	}
@@ -121,6 +126,8 @@ static struct host1x_channel *acquire_unused_channel(struct host1x *host)
 
 	set_bit(index, chlist->allocated_channels);
 
+	mutex_unlock(&chlist->lock);
+
 	return &chlist->channels[index];
 }
 
diff --git a/drivers/gpu/host1x/channel.h b/drivers/gpu/host1x/channel.h
index 7068e42d42df..e68a8ae9a670 100644
--- a/drivers/gpu/host1x/channel.h
+++ b/drivers/gpu/host1x/channel.h
@@ -29,6 +29,8 @@ struct host1x_channel;
 
 struct host1x_channel_list {
 	struct host1x_channel *channels;
+
+	struct mutex lock;
 	unsigned long *allocated_channels;
 };
 
-- 
2.14.2


* [PATCH 04/10] gpu: host1x: Lock classes during job submission
  2017-11-05 11:01 [PATCH 00/10] Dynamic Host1x channel allocation Mikko Perttunen
@ 2017-11-05 11:01   ` Mikko Perttunen
  2017-11-05 11:01 ` [PATCH 03/10] gpu: host1x: Add lock around channel allocation Mikko Perttunen
                     ` (6 subsequent siblings)
  7 siblings, 0 replies; 52+ messages in thread
From: Mikko Perttunen @ 2017-11-05 11:01 UTC (permalink / raw)
  To: thierry.reding, jonathanh
  Cc: linux-tegra, digetx, linux-kernel, dri-devel, Mikko Perttunen

Host1x has a feature called MLOCKs, which allows a class (roughly, a
hardware unit) to be locked (in the mutex sense) and unlocked during
command execution, preventing other channels from accessing the class
while it is locked. This is necessary to prevent concurrent jobs from
corrupting class state.

This has not been necessary so far because, with the current channel
allocation model, only a single hardware channel submits commands to
each class. Future patches, however, change the channel allocation
model to allow hardware-scheduled concurrency, so we need to start
locking.

This patch implements locking on all platforms from Tegra20 to
Tegra186.
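
As a rough sketch of what the locking looks like in a command stream
(illustrative only: class and MLOCK IDs are made up, and the opcode
encodings follow the host1x_opcode_acquire_mlock() and
host1x_opcode_release_mlock() helpers added below), a job is bracketed
by an acquire before its setclass and a release after its gathers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Opcode 14 is acquire_mlock; bit 24 turns it into release_mlock,
 * matching the helpers added by this patch. */
static uint32_t opcode_acquire_mlock(unsigned int id)
{
	return (14u << 28) | id;
}

static uint32_t opcode_release_mlock(unsigned int id)
{
	return (14u << 28) | (1u << 24) | id;
}

/*
 * Toy push buffer builder: the setclass and gather opcodes are passed
 * in pre-encoded; the point is only that the class is held (MLOCKed)
 * for the duration of the job so no other channel can interleave.
 */
static size_t build_locked_job(uint32_t *pb, unsigned int mlock_id,
			       uint32_t setclass_op, uint32_t gather_op)
{
	size_t n = 0;

	pb[n++] = opcode_acquire_mlock(mlock_id);
	pb[n++] = setclass_op;
	pb[n++] = gather_op;
	pb[n++] = opcode_release_mlock(mlock_id);

	return n;
}
```

If a job times out, the release opcode may never execute, which is why
this patch also adds timeout_release_mlock() to force-release a stuck
MLOCK during timeout handling.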

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
 drivers/gpu/host1x/cdma.c                      |   1 +
 drivers/gpu/host1x/cdma.h                      |   1 +
 drivers/gpu/host1x/hw/cdma_hw.c                | 122 +++++++++++++++++++++++++
 drivers/gpu/host1x/hw/channel_hw.c             |  71 ++++++++++----
 drivers/gpu/host1x/hw/host1x01_hardware.h      |  10 ++
 drivers/gpu/host1x/hw/host1x02_hardware.h      |  10 ++
 drivers/gpu/host1x/hw/host1x04_hardware.h      |  10 ++
 drivers/gpu/host1x/hw/host1x05_hardware.h      |  10 ++
 drivers/gpu/host1x/hw/host1x06_hardware.h      |  10 ++
 drivers/gpu/host1x/hw/hw_host1x01_sync.h       |   6 ++
 drivers/gpu/host1x/hw/hw_host1x02_sync.h       |   6 ++
 drivers/gpu/host1x/hw/hw_host1x04_sync.h       |   6 ++
 drivers/gpu/host1x/hw/hw_host1x05_sync.h       |   6 ++
 drivers/gpu/host1x/hw/hw_host1x06_hypervisor.h |   5 +
 14 files changed, 257 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/host1x/cdma.c b/drivers/gpu/host1x/cdma.c
index 28541b280739..f787cfe69c11 100644
--- a/drivers/gpu/host1x/cdma.c
+++ b/drivers/gpu/host1x/cdma.c
@@ -232,6 +232,7 @@ static void cdma_start_timer_locked(struct host1x_cdma *cdma,
 	}
 
 	cdma->timeout.client = job->client;
+	cdma->timeout.class = job->class;
 	cdma->timeout.syncpt = host1x_syncpt_get(host, job->syncpt_id);
 	cdma->timeout.syncpt_val = job->syncpt_end;
 	cdma->timeout.start_ktime = ktime_get();
diff --git a/drivers/gpu/host1x/cdma.h b/drivers/gpu/host1x/cdma.h
index 286d49386be9..e72660fc83c9 100644
--- a/drivers/gpu/host1x/cdma.h
+++ b/drivers/gpu/host1x/cdma.h
@@ -59,6 +59,7 @@ struct buffer_timeout {
 	ktime_t start_ktime;		/* starting time */
 	/* context timeout information */
 	int client;
+	u32 class;
 };
 
 enum cdma_event {
diff --git a/drivers/gpu/host1x/hw/cdma_hw.c b/drivers/gpu/host1x/hw/cdma_hw.c
index ce320534cbed..4d5970d863d5 100644
--- a/drivers/gpu/host1x/hw/cdma_hw.c
+++ b/drivers/gpu/host1x/hw/cdma_hw.c
@@ -16,6 +16,7 @@
  * along with this program.  If not, see <http://www.gnu.org/licenses/>.
  */
 
+#include <linux/iopoll.h>
 #include <linux/slab.h>
 #include <linux/scatterlist.h>
 #include <linux/dma-mapping.h>
@@ -243,6 +244,125 @@ static void cdma_resume(struct host1x_cdma *cdma, u32 getptr)
 	cdma_timeout_restart(cdma, getptr);
 }
 
+static int mlock_id_for_class(unsigned int class)
+{
+#if HOST1X_HW >= 6
+	switch (class)
+	{
+	case HOST1X_CLASS_HOST1X:
+		return 0;
+	case HOST1X_CLASS_VIC:
+		return 17;
+	default:
+		return -EINVAL;
+	}
+#else
+	switch (class)
+	{
+	case HOST1X_CLASS_HOST1X:
+		return 0;
+	case HOST1X_CLASS_GR2D:
+		return 1;
+	case HOST1X_CLASS_GR2D_SB:
+		return 2;
+	case HOST1X_CLASS_VIC:
+		return 3;
+	case HOST1X_CLASS_GR3D:
+		return 4;
+	default:
+		return -EINVAL;
+	}
+#endif
+}
+
+static void timeout_release_mlock(struct host1x_cdma *cdma)
+{
+#if HOST1X_HW >= 6
+	struct host1x_channel *ch = cdma_to_channel(cdma);
+	struct host1x *host = cdma_to_host1x(cdma);
+	u32 pb_pos, pb_temp[3], val;
+	int err, mlock_id;
+
+	if (!host->hv_regs)
+		return;
+
+	mlock_id = mlock_id_for_class(cdma->timeout.class);
+	if (WARN(mlock_id < 0, "Invalid class ID"))
+		return;
+
+	val = host1x_hypervisor_readl(host, HOST1X_HV_MLOCK(mlock_id));
+	if (!HOST1X_HV_MLOCK_LOCKED_V(val) ||
+	    HOST1X_HV_MLOCK_CH_V(val) != ch->id)
+	{
+		/* Channel is not holding the MLOCK, nothing to release. */
+		return;
+	}
+
+	/*
+	 * On Tegra186, there is no register to unlock an MLOCK (don't ask me
+	 * why). As such, we have to execute a release_mlock instruction to
+	 * do it. We do this by backing up the first three opcodes of the
+	 * pushbuffer and replacing them with our own short sequence to do
+	 * the unlocking. We set the .pos field to 12, which causes DMAEND
+	 * to be set accordingly such that only the three opcodes we set
+	 * here are executed before CDMA stops. Finally we restore the value
+	 * of pos and pushbuffer contents.
+	 */
+
+	pb_pos = cdma->push_buffer.pos;
+	memcpy(pb_temp, cdma->push_buffer.mapped, ARRAY_SIZE(pb_temp) * 4);
+
+	{
+		u32 *pb = cdma->push_buffer.mapped;
+		pb[0] = host1x_opcode_acquire_mlock(cdma->timeout.class);
+		pb[1] = host1x_opcode_setclass(cdma->timeout.class, 0, 0);
+		pb[2] = host1x_opcode_release_mlock(cdma->timeout.class);
+	}
+
+	/* Flush writecombine buffer */
+	wmb();
+
+	cdma->push_buffer.pos = ARRAY_SIZE(pb_temp) * 4;
+
+	cdma_resume(cdma, 0);
+
+	/* Wait until the release_mlock opcode has been executed */
+	err = readl_relaxed_poll_timeout(
+		host->hv_regs + HOST1X_HV_MLOCK(mlock_id), val,
+		!HOST1X_HV_MLOCK_LOCKED_V(val) ||
+			HOST1X_HV_MLOCK_CH_V(val) != ch->id,
+		10, 10000);
+	WARN(err, "Failed to unlock mlock %u\n", mlock_id);
+
+	cdma_freeze(cdma);
+
+	/* Restore original pushbuffer state */
+	cdma->push_buffer.pos = pb_pos;
+	memcpy(cdma->push_buffer.mapped, pb_temp, ARRAY_SIZE(pb_temp) * 4);
+	wmb();
+#else
+	struct host1x_channel *ch = cdma_to_channel(cdma);
+	struct host1x *host = cdma_to_host1x(cdma);
+	int mlock_id;
+	u32 val;
+
+	mlock_id = mlock_id_for_class(cdma->timeout.class);
+	if (WARN(mlock_id < 0, "Invalid class ID"))
+		return;
+
+	val = host1x_sync_readl(host, HOST1X_SYNC_MLOCK_OWNER(mlock_id));
+	if (!HOST1X_SYNC_MLOCK_OWNER_CH_OWNS_V(val) ||
+	    HOST1X_SYNC_MLOCK_OWNER_CHID_V(val) != ch->id)
+	{
+		/* Channel is not holding the MLOCK, nothing to release. */
+		return;
+	}
+
+	/* Unlock MLOCK */
+	host1x_sync_writel(host, 0x0, HOST1X_SYNC_MLOCK(mlock_id));
+#endif
+}
+
 /*
  * If this timeout fires, it indicates the current sync_queue entry has
  * exceeded its TTL and the userctx should be timed out and remaining
@@ -293,6 +413,8 @@ static void cdma_timeout_handler(struct work_struct *work)
 	/* stop HW, resetting channel/module */
 	host1x_hw_cdma_freeze(host1x, cdma);
 
+	timeout_release_mlock(cdma);
+
 	host1x_cdma_update_sync_queue(cdma, ch->dev);
 	mutex_unlock(&cdma->lock);
 }
diff --git a/drivers/gpu/host1x/hw/channel_hw.c b/drivers/gpu/host1x/hw/channel_hw.c
index 246b78c41281..f80fb8be38e6 100644
--- a/drivers/gpu/host1x/hw/channel_hw.c
+++ b/drivers/gpu/host1x/hw/channel_hw.c
@@ -89,6 +89,34 @@ static inline void synchronize_syncpt_base(struct host1x_job *job)
 			 HOST1X_UCLASS_LOAD_SYNCPT_BASE_VALUE_F(value));
 }
 
+static void channel_submit_serialize(struct host1x_channel *ch,
+				     struct host1x_syncpt *sp)
+{
+#if HOST1X_HW >= 6
+	/*
+	 * On T186, due to a hw issue, we need to issue an mlock acquire
+	 * for HOST1X class. It is still interpreted as a no-op in
+	 * hardware.
+	 */
+	host1x_cdma_push(&ch->cdma,
+		host1x_opcode_acquire_mlock(HOST1X_CLASS_HOST1X),
+		host1x_opcode_setclass(HOST1X_CLASS_HOST1X, 0, 0));
+	host1x_cdma_push(&ch->cdma,
+		host1x_opcode_nonincr(host1x_uclass_wait_syncpt_r(), 1),
+		host1x_class_host_wait_syncpt(host1x_syncpt_id(sp),
+					      host1x_syncpt_read_max(sp)));
+	host1x_cdma_push(&ch->cdma,
+		host1x_opcode_release_mlock(HOST1X_CLASS_HOST1X),
+		HOST1X_OPCODE_NOP);
+#else
+	host1x_cdma_push(&ch->cdma,
+		host1x_opcode_setclass(HOST1X_CLASS_HOST1X,
+				       host1x_uclass_wait_syncpt_r(), 1),
+		host1x_class_host_wait_syncpt(host1x_syncpt_id(sp),
+					      host1x_syncpt_read_max(sp)));
+#endif
+}
+
 static int channel_submit(struct host1x_job *job)
 {
 	struct host1x_channel *ch = job->channel;
@@ -96,6 +124,7 @@ static int channel_submit(struct host1x_job *job)
 	u32 user_syncpt_incrs = job->syncpt_incrs;
 	u32 prev_max = 0;
 	u32 syncval;
+	u32 mlock;
 	int err;
 	struct host1x_waitlist *completed_waiter = NULL;
 	struct host1x *host = dev_get_drvdata(ch->dev->parent);
@@ -128,17 +157,12 @@ static int channel_submit(struct host1x_job *job)
 		goto error;
 	}
 
-	if (job->serialize) {
-		/*
-		 * Force serialization by inserting a host wait for the
-		 * previous job to finish before this one can commence.
-		 */
-		host1x_cdma_push(&ch->cdma,
-				 host1x_opcode_setclass(HOST1X_CLASS_HOST1X,
-					host1x_uclass_wait_syncpt_r(), 1),
-				 host1x_class_host_wait_syncpt(job->syncpt_id,
-					host1x_syncpt_read_max(sp)));
-	}
+	/*
+	 * Force serialization by inserting a host wait for the
+	 * previous job to finish before this one can commence.
+	 */
+	if (job->serialize)
+		channel_submit_serialize(ch, sp);
 
 	/* Synchronize base register to allow using it for relative waiting */
 	if (sp->base)
@@ -149,16 +173,29 @@ static int channel_submit(struct host1x_job *job)
 	/* assign syncpoint to channel */
 	host1x_hw_syncpt_assign_channel(host, sp, ch);
 
-	job->syncpt_end = syncval;
-
-	/* add a setclass for modules that require it */
-	if (job->class)
+	/* acquire MLOCK and set channel class to specified class */
+	mlock = HOST1X_HW >= 6 ? job->class : mlock_id_for_class(job->class);
+	if (job->class) {
 		host1x_cdma_push(&ch->cdma,
-				 host1x_opcode_setclass(job->class, 0, 0),
-				 HOST1X_OPCODE_NOP);
+				 host1x_opcode_acquire_mlock(mlock),
+				 host1x_opcode_setclass(job->class, 0, 0));
+	}
 
 	submit_gathers(job);
 
+	if (job->class) {
+		/*
+		 * Push additional increment to catch jobs that crash before
+		 * finishing their gather, not reaching the unlock opcode.
+		 */
+		syncval = host1x_syncpt_incr_max(sp, 1);
+		host1x_cdma_push(&ch->cdma,
+			host1x_opcode_imm_incr_syncpt(0, job->syncpt_id),
+			host1x_opcode_release_mlock(mlock));
+	}
+
+	job->syncpt_end = syncval;
+
 	/* end CDMA submit & stash pinned hMems into sync queue */
 	host1x_cdma_end(&ch->cdma, job);
 
diff --git a/drivers/gpu/host1x/hw/host1x01_hardware.h b/drivers/gpu/host1x/hw/host1x01_hardware.h
index 5f0fb866efa8..6a2c9f905acc 100644
--- a/drivers/gpu/host1x/hw/host1x01_hardware.h
+++ b/drivers/gpu/host1x/hw/host1x01_hardware.h
@@ -138,6 +138,16 @@ static inline u32 host1x_opcode_gather_incr(unsigned offset, unsigned count)
 	return (6 << 28) | (offset << 16) | BIT(15) | BIT(14) | count;
 }
 
+static inline u32 host1x_opcode_acquire_mlock(unsigned id)
+{
+	return (14 << 28) | id;
+}
+
+static inline u32 host1x_opcode_release_mlock(unsigned id)
+{
+	return (14 << 28) | (1 << 24) | id;
+}
+
 #define HOST1X_OPCODE_NOP host1x_opcode_nonincr(0, 0)
 
 #endif
diff --git a/drivers/gpu/host1x/hw/host1x02_hardware.h b/drivers/gpu/host1x/hw/host1x02_hardware.h
index 154901860bc6..c524c6c8d82f 100644
--- a/drivers/gpu/host1x/hw/host1x02_hardware.h
+++ b/drivers/gpu/host1x/hw/host1x02_hardware.h
@@ -137,6 +137,16 @@ static inline u32 host1x_opcode_gather_incr(unsigned offset, unsigned count)
 	return (6 << 28) | (offset << 16) | BIT(15) | BIT(14) | count;
 }
 
+static inline u32 host1x_opcode_acquire_mlock(unsigned id)
+{
+	return (14 << 28) | id;
+}
+
+static inline u32 host1x_opcode_release_mlock(unsigned id)
+{
+	return (14 << 28) | (1 << 24) | id;
+}
+
 #define HOST1X_OPCODE_NOP host1x_opcode_nonincr(0, 0)
 
 #endif
diff --git a/drivers/gpu/host1x/hw/host1x04_hardware.h b/drivers/gpu/host1x/hw/host1x04_hardware.h
index de1a38175328..8dedd77b7a1b 100644
--- a/drivers/gpu/host1x/hw/host1x04_hardware.h
+++ b/drivers/gpu/host1x/hw/host1x04_hardware.h
@@ -137,6 +137,16 @@ static inline u32 host1x_opcode_gather_incr(unsigned offset, unsigned count)
 	return (6 << 28) | (offset << 16) | BIT(15) | BIT(14) | count;
 }
 
+static inline u32 host1x_opcode_acquire_mlock(unsigned id)
+{
+	return (14 << 28) | id;
+}
+
+static inline u32 host1x_opcode_release_mlock(unsigned id)
+{
+	return (14 << 28) | (1 << 24) | id;
+}
+
 #define HOST1X_OPCODE_NOP host1x_opcode_nonincr(0, 0)
 
 #endif
diff --git a/drivers/gpu/host1x/hw/host1x05_hardware.h b/drivers/gpu/host1x/hw/host1x05_hardware.h
index 2937ebb6be11..6aafcb5e97e6 100644
--- a/drivers/gpu/host1x/hw/host1x05_hardware.h
+++ b/drivers/gpu/host1x/hw/host1x05_hardware.h
@@ -137,6 +137,16 @@ static inline u32 host1x_opcode_gather_incr(unsigned offset, unsigned count)
 	return (6 << 28) | (offset << 16) | BIT(15) | BIT(14) | count;
 }
 
+static inline u32 host1x_opcode_acquire_mlock(unsigned id)
+{
+	return (14 << 28) | id;
+}
+
+static inline u32 host1x_opcode_release_mlock(unsigned id)
+{
+	return (14 << 28) | (1 << 24) | id;
+}
+
 #define HOST1X_OPCODE_NOP host1x_opcode_nonincr(0, 0)
 
 #endif
diff --git a/drivers/gpu/host1x/hw/host1x06_hardware.h b/drivers/gpu/host1x/hw/host1x06_hardware.h
index 3039c92ea605..60147d26ad9b 100644
--- a/drivers/gpu/host1x/hw/host1x06_hardware.h
+++ b/drivers/gpu/host1x/hw/host1x06_hardware.h
@@ -137,6 +137,16 @@ static inline u32 host1x_opcode_gather_incr(unsigned offset, unsigned count)
 	return (6 << 28) | (offset << 16) | BIT(15) | BIT(14) | count;
 }
 
+static inline u32 host1x_opcode_acquire_mlock(unsigned id)
+{
+	return (14 << 28) | id;
+}
+
+static inline u32 host1x_opcode_release_mlock(unsigned id)
+{
+	return (14 << 28) | (1 << 24) | id;
+}
+
 #define HOST1X_OPCODE_NOP host1x_opcode_nonincr(0, 0)
 
 #endif
diff --git a/drivers/gpu/host1x/hw/hw_host1x01_sync.h b/drivers/gpu/host1x/hw/hw_host1x01_sync.h
index 31238c285d46..4630fec2237a 100644
--- a/drivers/gpu/host1x/hw/hw_host1x01_sync.h
+++ b/drivers/gpu/host1x/hw/hw_host1x01_sync.h
@@ -125,6 +125,12 @@ static inline u32 host1x_sync_ip_busy_timeout_r(void)
 }
 #define HOST1X_SYNC_IP_BUSY_TIMEOUT \
 	host1x_sync_ip_busy_timeout_r()
+static inline u32 host1x_sync_mlock_r(unsigned int id)
+{
+	return 0x2c0 + id * REGISTER_STRIDE;
+}
+#define HOST1X_SYNC_MLOCK(id) \
+	host1x_sync_mlock_r(id)
 static inline u32 host1x_sync_mlock_owner_r(unsigned int id)
 {
 	return 0x340 + id * REGISTER_STRIDE;
diff --git a/drivers/gpu/host1x/hw/hw_host1x02_sync.h b/drivers/gpu/host1x/hw/hw_host1x02_sync.h
index 540c7b65995f..dec8e812a114 100644
--- a/drivers/gpu/host1x/hw/hw_host1x02_sync.h
+++ b/drivers/gpu/host1x/hw/hw_host1x02_sync.h
@@ -125,6 +125,12 @@ static inline u32 host1x_sync_ip_busy_timeout_r(void)
 }
 #define HOST1X_SYNC_IP_BUSY_TIMEOUT \
 	host1x_sync_ip_busy_timeout_r()
+static inline u32 host1x_sync_mlock_r(unsigned int id)
+{
+	return 0x2c0 + id * REGISTER_STRIDE;
+}
+#define HOST1X_SYNC_MLOCK(id) \
+	host1x_sync_mlock_r(id)
 static inline u32 host1x_sync_mlock_owner_r(unsigned int id)
 {
 	return 0x340 + id * REGISTER_STRIDE;
diff --git a/drivers/gpu/host1x/hw/hw_host1x04_sync.h b/drivers/gpu/host1x/hw/hw_host1x04_sync.h
index 3d6c8ec65934..9a951a4db07f 100644
--- a/drivers/gpu/host1x/hw/hw_host1x04_sync.h
+++ b/drivers/gpu/host1x/hw/hw_host1x04_sync.h
@@ -125,6 +125,12 @@ static inline u32 host1x_sync_ip_busy_timeout_r(void)
 }
 #define HOST1X_SYNC_IP_BUSY_TIMEOUT \
 	host1x_sync_ip_busy_timeout_r()
+static inline u32 host1x_sync_mlock_r(unsigned int id)
+{
+	return 0x2c0 + id * REGISTER_STRIDE;
+}
+#define HOST1X_SYNC_MLOCK(id) \
+	host1x_sync_mlock_r(id)
 static inline u32 host1x_sync_mlock_owner_r(unsigned int id)
 {
 	return 0x340 + id * REGISTER_STRIDE;
diff --git a/drivers/gpu/host1x/hw/hw_host1x05_sync.h b/drivers/gpu/host1x/hw/hw_host1x05_sync.h
index ca10eee5045c..e29e2e19a60b 100644
--- a/drivers/gpu/host1x/hw/hw_host1x05_sync.h
+++ b/drivers/gpu/host1x/hw/hw_host1x05_sync.h
@@ -125,6 +125,12 @@ static inline u32 host1x_sync_ip_busy_timeout_r(void)
 }
 #define HOST1X_SYNC_IP_BUSY_TIMEOUT \
 	host1x_sync_ip_busy_timeout_r()
+static inline u32 host1x_sync_mlock_r(unsigned int id)
+{
+	return 0x2c0 + id * REGISTER_STRIDE;
+}
+#define HOST1X_SYNC_MLOCK(id) \
+	host1x_sync_mlock_r(id)
 static inline u32 host1x_sync_mlock_owner_r(unsigned int id)
 {
 	return 0x340 + id * REGISTER_STRIDE;
diff --git a/drivers/gpu/host1x/hw/hw_host1x06_hypervisor.h b/drivers/gpu/host1x/hw/hw_host1x06_hypervisor.h
index c05dab8a178b..be73530c3585 100644
--- a/drivers/gpu/host1x/hw/hw_host1x06_hypervisor.h
+++ b/drivers/gpu/host1x/hw/hw_host1x06_hypervisor.h
@@ -18,6 +18,11 @@
 #define HOST1X_HV_SYNCPT_PROT_EN			0x1ac4
 #define HOST1X_HV_SYNCPT_PROT_EN_CH_EN			BIT(1)
 #define HOST1X_HV_CH_KERNEL_FILTER_GBUFFER(x)		(0x2020 + (x * 4))
+#define HOST1X_HV_MLOCK(x)				(0x2030 + (x * 4))
+#define HOST1X_HV_MLOCK_CH(x)				(((x) & 0x3f) << 2)
+#define HOST1X_HV_MLOCK_CH_V(x)				(((x) >> 2) & 0x3f)
+#define HOST1X_HV_MLOCK_LOCKED				BIT(0)
+#define HOST1X_HV_MLOCK_LOCKED_V(x)			((x) & 0x1)
 #define HOST1X_HV_CMDFIFO_PEEK_CTRL			0x233c
 #define HOST1X_HV_CMDFIFO_PEEK_CTRL_ADDR(x)		(x)
 #define HOST1X_HV_CMDFIFO_PEEK_CTRL_CHANNEL(x)		((x) << 16)
-- 
2.14.2

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 04/10] gpu: host1x: Lock classes during job submission
@ 2017-11-05 11:01   ` Mikko Perttunen
  0 siblings, 0 replies; 52+ messages in thread
From: Mikko Perttunen @ 2017-11-05 11:01 UTC (permalink / raw)
  To: thierry.reding, jonathanh
  Cc: digetx, dri-devel, linux-tegra, linux-kernel, Mikko Perttunen

Host1x has a feature called MLOCKs which allow a certain class
(~HW unit) to be locked (in the mutex sense) and unlocked during
command execution, preventing other channels from accessing the
class while it is locked. This is necessary to prevent concurrent
jobs from messing up class state.

This has not been necessary so far because, under the current channel
allocation model, only a single hardware channel submits commands to
each class. Future patches, however, change the channel allocation
model to allow hardware-scheduled concurrency, so we need to start
locking.

This patch implements locking on all platforms from Tegra20 to
Tegra186.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
 drivers/gpu/host1x/cdma.c                      |   1 +
 drivers/gpu/host1x/cdma.h                      |   1 +
 drivers/gpu/host1x/hw/cdma_hw.c                | 122 +++++++++++++++++++++++++
 drivers/gpu/host1x/hw/channel_hw.c             |  71 ++++++++++----
 drivers/gpu/host1x/hw/host1x01_hardware.h      |  10 ++
 drivers/gpu/host1x/hw/host1x02_hardware.h      |  10 ++
 drivers/gpu/host1x/hw/host1x04_hardware.h      |  10 ++
 drivers/gpu/host1x/hw/host1x05_hardware.h      |  10 ++
 drivers/gpu/host1x/hw/host1x06_hardware.h      |  10 ++
 drivers/gpu/host1x/hw/hw_host1x01_sync.h       |   6 ++
 drivers/gpu/host1x/hw/hw_host1x02_sync.h       |   6 ++
 drivers/gpu/host1x/hw/hw_host1x04_sync.h       |   6 ++
 drivers/gpu/host1x/hw/hw_host1x05_sync.h       |   6 ++
 drivers/gpu/host1x/hw/hw_host1x06_hypervisor.h |   5 +
 14 files changed, 257 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/host1x/cdma.c b/drivers/gpu/host1x/cdma.c
index 28541b280739..f787cfe69c11 100644
--- a/drivers/gpu/host1x/cdma.c
+++ b/drivers/gpu/host1x/cdma.c
@@ -232,6 +232,7 @@ static void cdma_start_timer_locked(struct host1x_cdma *cdma,
 	}
 
 	cdma->timeout.client = job->client;
+	cdma->timeout.class = job->class;
 	cdma->timeout.syncpt = host1x_syncpt_get(host, job->syncpt_id);
 	cdma->timeout.syncpt_val = job->syncpt_end;
 	cdma->timeout.start_ktime = ktime_get();
diff --git a/drivers/gpu/host1x/cdma.h b/drivers/gpu/host1x/cdma.h
index 286d49386be9..e72660fc83c9 100644
--- a/drivers/gpu/host1x/cdma.h
+++ b/drivers/gpu/host1x/cdma.h
@@ -59,6 +59,7 @@ struct buffer_timeout {
 	ktime_t start_ktime;		/* starting time */
 	/* context timeout information */
 	int client;
+	u32 class;
 };
 
 enum cdma_event {
diff --git a/drivers/gpu/host1x/hw/cdma_hw.c b/drivers/gpu/host1x/hw/cdma_hw.c
index ce320534cbed..4d5970d863d5 100644
--- a/drivers/gpu/host1x/hw/cdma_hw.c
+++ b/drivers/gpu/host1x/hw/cdma_hw.c
@@ -16,6 +16,7 @@
  * along with this program.  If not, see <http://www.gnu.org/licenses/>.
  */
 
+#include <linux/iopoll.h>
 #include <linux/slab.h>
 #include <linux/scatterlist.h>
 #include <linux/dma-mapping.h>
@@ -243,6 +244,125 @@ static void cdma_resume(struct host1x_cdma *cdma, u32 getptr)
 	cdma_timeout_restart(cdma, getptr);
 }
 
+static int mlock_id_for_class(unsigned int class)
+{
+#if HOST1X_HW >= 6
+	switch (class)
+	{
+	case HOST1X_CLASS_HOST1X:
+		return 0;
+	case HOST1X_CLASS_VIC:
+		return 17;
+	default:
+		return -EINVAL;
+	}
+#else
+	switch (class)
+	{
+	case HOST1X_CLASS_HOST1X:
+		return 0;
+	case HOST1X_CLASS_GR2D:
+		return 1;
+	case HOST1X_CLASS_GR2D_SB:
+		return 2;
+	case HOST1X_CLASS_VIC:
+		return 3;
+	case HOST1X_CLASS_GR3D:
+		return 4;
+	default:
+		return -EINVAL;
+	}
+#endif
+}
+
+static void timeout_release_mlock(struct host1x_cdma *cdma)
+{
+#if HOST1X_HW >= 6
+	struct host1x_channel *ch = cdma_to_channel(cdma);
+	struct host1x *host = cdma_to_host1x(cdma);
+	u32 pb_pos, pb_temp[3], val;
+	int err, mlock_id;
+
+	if (!host->hv_regs)
+		return;
+
+	mlock_id = mlock_id_for_class(cdma->timeout.class);
+	if (WARN(mlock_id < 0, "Invalid class ID"))
+		return;
+
+	val = host1x_hypervisor_readl(host, HOST1X_HV_MLOCK(mlock_id));
+	if (!HOST1X_HV_MLOCK_LOCKED_V(val) ||
+	    HOST1X_HV_MLOCK_CH_V(val) != ch->id)
+	{
+		/* Channel is not holding the MLOCK, nothing to release. */
+		return;
+	}
+
+	/*
+	 * On Tegra186, there is no register to unlock an MLOCK (don't ask me
+	 * why). As such, we have to execute a release_mlock instruction to
+	 * do it. We do this by backing up the first three opcodes of the
+	 * pushbuffer and replacing them with our own short sequence to do
+	 * the unlocking. We set the .pos field to 12, which causes DMAEND
+	 * to be set accordingly such that only the three opcodes we set
+	 * here are executed before CDMA stops. Finally we restore the value
+	 * of pos and pushbuffer contents.
+	 */
+
+	pb_pos = cdma->push_buffer.pos;
+	memcpy(pb_temp, cdma->push_buffer.mapped, ARRAY_SIZE(pb_temp) * 4);
+
+	{
+		u32 *pb = cdma->push_buffer.mapped;
+		pb[0] = host1x_opcode_acquire_mlock(cdma->timeout.class);
+		pb[1] = host1x_opcode_setclass(cdma->timeout.class, 0, 0);
+		pb[2] = host1x_opcode_release_mlock(cdma->timeout.class);
+	}
+
+	/* Flush writecombine buffer */
+	wmb();
+
+	cdma->push_buffer.pos = ARRAY_SIZE(pb_temp) * 4;
+
+	cdma_resume(cdma, 0);
+
+	/* Wait until the release_mlock opcode has been executed */
+	err = readl_relaxed_poll_timeout(
+		host->hv_regs + HOST1X_HV_MLOCK(mlock_id), val,
+		!HOST1X_HV_MLOCK_LOCKED_V(val) ||
+			HOST1X_HV_MLOCK_CH_V(val) != ch->id,
+		10, 10000);
+	WARN(err, "Failed to unlock mlock %d\n", mlock_id);
+
+	cdma_freeze(cdma);
+
+	/* Restore original pushbuffer state */
+	cdma->push_buffer.pos = pb_pos;
+	memcpy(cdma->push_buffer.mapped, pb_temp, ARRAY_SIZE(pb_temp) * 4);
+	wmb();
+#else
+	struct host1x_channel *ch = cdma_to_channel(cdma);
+	struct host1x *host = cdma_to_host1x(cdma);
+	int mlock_id;
+	u32 val;
+
+	mlock_id = mlock_id_for_class(cdma->timeout.class);
+	if (WARN(mlock_id < 0, "Invalid class ID"))
+		return;
+
+	val = host1x_sync_readl(host, HOST1X_SYNC_MLOCK_OWNER(mlock_id));
+	if (!HOST1X_SYNC_MLOCK_OWNER_CH_OWNS_V(val) ||
+	    HOST1X_SYNC_MLOCK_OWNER_CHID_V(val) != ch->id)
+	{
+		/* Channel is not holding the MLOCK, nothing to release. */
+		return;
+	}
+
+	/* Unlock MLOCK */
+	host1x_sync_writel(host, 0x0, HOST1X_SYNC_MLOCK(mlock_id));
+#endif
+}
+
 /*
  * If this timeout fires, it indicates the current sync_queue entry has
  * exceeded its TTL and the userctx should be timed out and remaining
@@ -293,6 +413,8 @@ static void cdma_timeout_handler(struct work_struct *work)
 	/* stop HW, resetting channel/module */
 	host1x_hw_cdma_freeze(host1x, cdma);
 
+	timeout_release_mlock(cdma);
+
 	host1x_cdma_update_sync_queue(cdma, ch->dev);
 	mutex_unlock(&cdma->lock);
 }
diff --git a/drivers/gpu/host1x/hw/channel_hw.c b/drivers/gpu/host1x/hw/channel_hw.c
index 246b78c41281..f80fb8be38e6 100644
--- a/drivers/gpu/host1x/hw/channel_hw.c
+++ b/drivers/gpu/host1x/hw/channel_hw.c
@@ -89,6 +89,34 @@ static inline void synchronize_syncpt_base(struct host1x_job *job)
 			 HOST1X_UCLASS_LOAD_SYNCPT_BASE_VALUE_F(value));
 }
 
+static void channel_submit_serialize(struct host1x_channel *ch,
+				     struct host1x_syncpt *sp)
+{
+#if HOST1X_HW >= 6
+	/*
+	 * On T186, due to a hw issue, we need to issue an mlock acquire
+	 * for HOST1X class. It is still interpreted as a no-op in
+	 * hardware.
+	 */
+	host1x_cdma_push(&ch->cdma,
+		host1x_opcode_acquire_mlock(HOST1X_CLASS_HOST1X),
+		host1x_opcode_setclass(HOST1X_CLASS_HOST1X, 0, 0));
+	host1x_cdma_push(&ch->cdma,
+		host1x_opcode_nonincr(host1x_uclass_wait_syncpt_r(), 1),
+		host1x_class_host_wait_syncpt(host1x_syncpt_id(sp),
+					      host1x_syncpt_read_max(sp)));
+	host1x_cdma_push(&ch->cdma,
+		host1x_opcode_release_mlock(HOST1X_CLASS_HOST1X),
+		HOST1X_OPCODE_NOP);
+#else
+	host1x_cdma_push(&ch->cdma,
+		host1x_opcode_setclass(HOST1X_CLASS_HOST1X,
+				       host1x_uclass_wait_syncpt_r(), 1),
+		host1x_class_host_wait_syncpt(host1x_syncpt_id(sp),
+					      host1x_syncpt_read_max(sp)));
+#endif
+}
+
 static int channel_submit(struct host1x_job *job)
 {
 	struct host1x_channel *ch = job->channel;
@@ -96,6 +124,7 @@ static int channel_submit(struct host1x_job *job)
 	u32 user_syncpt_incrs = job->syncpt_incrs;
 	u32 prev_max = 0;
 	u32 syncval;
+	u32 mlock;
 	int err;
 	struct host1x_waitlist *completed_waiter = NULL;
 	struct host1x *host = dev_get_drvdata(ch->dev->parent);
@@ -128,17 +157,12 @@ static int channel_submit(struct host1x_job *job)
 		goto error;
 	}
 
-	if (job->serialize) {
-		/*
-		 * Force serialization by inserting a host wait for the
-		 * previous job to finish before this one can commence.
-		 */
-		host1x_cdma_push(&ch->cdma,
-				 host1x_opcode_setclass(HOST1X_CLASS_HOST1X,
-					host1x_uclass_wait_syncpt_r(), 1),
-				 host1x_class_host_wait_syncpt(job->syncpt_id,
-					host1x_syncpt_read_max(sp)));
-	}
+	/*
+	 * Force serialization by inserting a host wait for the
+	 * previous job to finish before this one can commence.
+	 */
+	if (job->serialize)
+		channel_submit_serialize(ch, sp);
 
 	/* Synchronize base register to allow using it for relative waiting */
 	if (sp->base)
@@ -149,16 +173,29 @@ static int channel_submit(struct host1x_job *job)
 	/* assign syncpoint to channel */
 	host1x_hw_syncpt_assign_channel(host, sp, ch);
 
-	job->syncpt_end = syncval;
-
-	/* add a setclass for modules that require it */
-	if (job->class)
+	/* acquire MLOCK and set channel class to specified class */
+	mlock = HOST1X_HW >= 6 ? job->class : mlock_id_for_class(job->class);
+	if (job->class) {
 		host1x_cdma_push(&ch->cdma,
-				 host1x_opcode_setclass(job->class, 0, 0),
-				 HOST1X_OPCODE_NOP);
+				 host1x_opcode_acquire_mlock(mlock),
+				 host1x_opcode_setclass(job->class, 0, 0));
+	}
 
 	submit_gathers(job);
 
+	if (job->class) {
+		/*
+		 * Push additional increment to catch jobs that crash before
+		 * finishing their gather, not reaching the unlock opcode.
+		 */
+		syncval = host1x_syncpt_incr_max(sp, 1);
+		host1x_cdma_push(&ch->cdma,
+			host1x_opcode_imm_incr_syncpt(0, job->syncpt_id),
+			host1x_opcode_release_mlock(mlock));
+	}
+
+	job->syncpt_end = syncval;
+
 	/* end CDMA submit & stash pinned hMems into sync queue */
 	host1x_cdma_end(&ch->cdma, job);
 
diff --git a/drivers/gpu/host1x/hw/host1x01_hardware.h b/drivers/gpu/host1x/hw/host1x01_hardware.h
index 5f0fb866efa8..6a2c9f905acc 100644
--- a/drivers/gpu/host1x/hw/host1x01_hardware.h
+++ b/drivers/gpu/host1x/hw/host1x01_hardware.h
@@ -138,6 +138,16 @@ static inline u32 host1x_opcode_gather_incr(unsigned offset, unsigned count)
 	return (6 << 28) | (offset << 16) | BIT(15) | BIT(14) | count;
 }
 
+static inline u32 host1x_opcode_acquire_mlock(unsigned id)
+{
+	return (14 << 28) | id;
+}
+
+static inline u32 host1x_opcode_release_mlock(unsigned id)
+{
+	return (14 << 28) | (1 << 24) | id;
+}
+
 #define HOST1X_OPCODE_NOP host1x_opcode_nonincr(0, 0)
 
 #endif
diff --git a/drivers/gpu/host1x/hw/host1x02_hardware.h b/drivers/gpu/host1x/hw/host1x02_hardware.h
index 154901860bc6..c524c6c8d82f 100644
--- a/drivers/gpu/host1x/hw/host1x02_hardware.h
+++ b/drivers/gpu/host1x/hw/host1x02_hardware.h
@@ -137,6 +137,16 @@ static inline u32 host1x_opcode_gather_incr(unsigned offset, unsigned count)
 	return (6 << 28) | (offset << 16) | BIT(15) | BIT(14) | count;
 }
 
+static inline u32 host1x_opcode_acquire_mlock(unsigned id)
+{
+	return (14 << 28) | id;
+}
+
+static inline u32 host1x_opcode_release_mlock(unsigned id)
+{
+	return (14 << 28) | (1 << 24) | id;
+}
+
 #define HOST1X_OPCODE_NOP host1x_opcode_nonincr(0, 0)
 
 #endif
diff --git a/drivers/gpu/host1x/hw/host1x04_hardware.h b/drivers/gpu/host1x/hw/host1x04_hardware.h
index de1a38175328..8dedd77b7a1b 100644
--- a/drivers/gpu/host1x/hw/host1x04_hardware.h
+++ b/drivers/gpu/host1x/hw/host1x04_hardware.h
@@ -137,6 +137,16 @@ static inline u32 host1x_opcode_gather_incr(unsigned offset, unsigned count)
 	return (6 << 28) | (offset << 16) | BIT(15) | BIT(14) | count;
 }
 
+static inline u32 host1x_opcode_acquire_mlock(unsigned id)
+{
+	return (14 << 28) | id;
+}
+
+static inline u32 host1x_opcode_release_mlock(unsigned id)
+{
+	return (14 << 28) | (1 << 24) | id;
+}
+
 #define HOST1X_OPCODE_NOP host1x_opcode_nonincr(0, 0)
 
 #endif
diff --git a/drivers/gpu/host1x/hw/host1x05_hardware.h b/drivers/gpu/host1x/hw/host1x05_hardware.h
index 2937ebb6be11..6aafcb5e97e6 100644
--- a/drivers/gpu/host1x/hw/host1x05_hardware.h
+++ b/drivers/gpu/host1x/hw/host1x05_hardware.h
@@ -137,6 +137,16 @@ static inline u32 host1x_opcode_gather_incr(unsigned offset, unsigned count)
 	return (6 << 28) | (offset << 16) | BIT(15) | BIT(14) | count;
 }
 
+static inline u32 host1x_opcode_acquire_mlock(unsigned id)
+{
+	return (14 << 28) | id;
+}
+
+static inline u32 host1x_opcode_release_mlock(unsigned id)
+{
+	return (14 << 28) | (1 << 24) | id;
+}
+
 #define HOST1X_OPCODE_NOP host1x_opcode_nonincr(0, 0)
 
 #endif
diff --git a/drivers/gpu/host1x/hw/host1x06_hardware.h b/drivers/gpu/host1x/hw/host1x06_hardware.h
index 3039c92ea605..60147d26ad9b 100644
--- a/drivers/gpu/host1x/hw/host1x06_hardware.h
+++ b/drivers/gpu/host1x/hw/host1x06_hardware.h
@@ -137,6 +137,16 @@ static inline u32 host1x_opcode_gather_incr(unsigned offset, unsigned count)
 	return (6 << 28) | (offset << 16) | BIT(15) | BIT(14) | count;
 }
 
+static inline u32 host1x_opcode_acquire_mlock(unsigned id)
+{
+	return (14 << 28) | id;
+}
+
+static inline u32 host1x_opcode_release_mlock(unsigned id)
+{
+	return (14 << 28) | (1 << 24) | id;
+}
+
 #define HOST1X_OPCODE_NOP host1x_opcode_nonincr(0, 0)
 
 #endif
diff --git a/drivers/gpu/host1x/hw/hw_host1x01_sync.h b/drivers/gpu/host1x/hw/hw_host1x01_sync.h
index 31238c285d46..4630fec2237a 100644
--- a/drivers/gpu/host1x/hw/hw_host1x01_sync.h
+++ b/drivers/gpu/host1x/hw/hw_host1x01_sync.h
@@ -125,6 +125,12 @@ static inline u32 host1x_sync_ip_busy_timeout_r(void)
 }
 #define HOST1X_SYNC_IP_BUSY_TIMEOUT \
 	host1x_sync_ip_busy_timeout_r()
+static inline u32 host1x_sync_mlock_r(unsigned int id)
+{
+	return 0x2c0 + id * REGISTER_STRIDE;
+}
+#define HOST1X_SYNC_MLOCK(id) \
+	host1x_sync_mlock_r(id)
 static inline u32 host1x_sync_mlock_owner_r(unsigned int id)
 {
 	return 0x340 + id * REGISTER_STRIDE;
diff --git a/drivers/gpu/host1x/hw/hw_host1x02_sync.h b/drivers/gpu/host1x/hw/hw_host1x02_sync.h
index 540c7b65995f..dec8e812a114 100644
--- a/drivers/gpu/host1x/hw/hw_host1x02_sync.h
+++ b/drivers/gpu/host1x/hw/hw_host1x02_sync.h
@@ -125,6 +125,12 @@ static inline u32 host1x_sync_ip_busy_timeout_r(void)
 }
 #define HOST1X_SYNC_IP_BUSY_TIMEOUT \
 	host1x_sync_ip_busy_timeout_r()
+static inline u32 host1x_sync_mlock_r(unsigned int id)
+{
+	return 0x2c0 + id * REGISTER_STRIDE;
+}
+#define HOST1X_SYNC_MLOCK(id) \
+	host1x_sync_mlock_r(id)
 static inline u32 host1x_sync_mlock_owner_r(unsigned int id)
 {
 	return 0x340 + id * REGISTER_STRIDE;
diff --git a/drivers/gpu/host1x/hw/hw_host1x04_sync.h b/drivers/gpu/host1x/hw/hw_host1x04_sync.h
index 3d6c8ec65934..9a951a4db07f 100644
--- a/drivers/gpu/host1x/hw/hw_host1x04_sync.h
+++ b/drivers/gpu/host1x/hw/hw_host1x04_sync.h
@@ -125,6 +125,12 @@ static inline u32 host1x_sync_ip_busy_timeout_r(void)
 }
 #define HOST1X_SYNC_IP_BUSY_TIMEOUT \
 	host1x_sync_ip_busy_timeout_r()
+static inline u32 host1x_sync_mlock_r(unsigned int id)
+{
+	return 0x2c0 + id * REGISTER_STRIDE;
+}
+#define HOST1X_SYNC_MLOCK(id) \
+	host1x_sync_mlock_r(id)
 static inline u32 host1x_sync_mlock_owner_r(unsigned int id)
 {
 	return 0x340 + id * REGISTER_STRIDE;
diff --git a/drivers/gpu/host1x/hw/hw_host1x05_sync.h b/drivers/gpu/host1x/hw/hw_host1x05_sync.h
index ca10eee5045c..e29e2e19a60b 100644
--- a/drivers/gpu/host1x/hw/hw_host1x05_sync.h
+++ b/drivers/gpu/host1x/hw/hw_host1x05_sync.h
@@ -125,6 +125,12 @@ static inline u32 host1x_sync_ip_busy_timeout_r(void)
 }
 #define HOST1X_SYNC_IP_BUSY_TIMEOUT \
 	host1x_sync_ip_busy_timeout_r()
+static inline u32 host1x_sync_mlock_r(unsigned int id)
+{
+	return 0x2c0 + id * REGISTER_STRIDE;
+}
+#define HOST1X_SYNC_MLOCK(id) \
+	host1x_sync_mlock_r(id)
 static inline u32 host1x_sync_mlock_owner_r(unsigned int id)
 {
 	return 0x340 + id * REGISTER_STRIDE;
diff --git a/drivers/gpu/host1x/hw/hw_host1x06_hypervisor.h b/drivers/gpu/host1x/hw/hw_host1x06_hypervisor.h
index c05dab8a178b..be73530c3585 100644
--- a/drivers/gpu/host1x/hw/hw_host1x06_hypervisor.h
+++ b/drivers/gpu/host1x/hw/hw_host1x06_hypervisor.h
@@ -18,6 +18,11 @@
 #define HOST1X_HV_SYNCPT_PROT_EN			0x1ac4
 #define HOST1X_HV_SYNCPT_PROT_EN_CH_EN			BIT(1)
 #define HOST1X_HV_CH_KERNEL_FILTER_GBUFFER(x)		(0x2020 + (x * 4))
+#define HOST1X_HV_MLOCK(x)				(0x2030 + (x * 4))
+#define HOST1X_HV_MLOCK_CH(x)				(((x) & 0x3f) << 2)
+#define HOST1X_HV_MLOCK_CH_V(x)				(((x) >> 2) & 0x3f)
+#define HOST1X_HV_MLOCK_LOCKED				BIT(0)
+#define HOST1X_HV_MLOCK_LOCKED_V(x)			((x) & 0x1)
 #define HOST1X_HV_CMDFIFO_PEEK_CTRL			0x233c
 #define HOST1X_HV_CMDFIFO_PEEK_CTRL_ADDR(x)		(x)
 #define HOST1X_HV_CMDFIFO_PEEK_CTRL_CHANNEL(x)		((x) << 16)
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 05/10] gpu: host1x: Add job done callback
  2017-11-05 11:01 [PATCH 00/10] Dynamic Host1x channel allocation Mikko Perttunen
                   ` (2 preceding siblings ...)
  2017-11-05 11:01   ` Mikko Perttunen
@ 2017-11-05 11:01 ` Mikko Perttunen
       [not found] ` <20171105110118.15142-1-mperttunen-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 52+ messages in thread
From: Mikko Perttunen @ 2017-11-05 11:01 UTC (permalink / raw)
  To: thierry.reding, jonathanh
  Cc: digetx, dri-devel, linux-tegra, linux-kernel, Mikko Perttunen

Allow job submitters to set a callback to be called when the job has
completed. Finished jobs are collected while holding the CDMA lock,
but the callbacks are invoked outside of it, so that a callback can
perform operations that take the lock themselves, such as freeing
channels.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
 drivers/gpu/host1x/cdma.c | 44 +++++++++++++++++++++++++++++++++-----------
 include/linux/host1x.h    |  4 ++++
 2 files changed, 37 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/host1x/cdma.c b/drivers/gpu/host1x/cdma.c
index f787cfe69c11..57221d199d33 100644
--- a/drivers/gpu/host1x/cdma.c
+++ b/drivers/gpu/host1x/cdma.c
@@ -251,17 +251,24 @@ static void stop_cdma_timer_locked(struct host1x_cdma *cdma)
 	cdma->timeout.client = 0;
 }
 
-/*
- * For all sync queue entries that have already finished according to the
- * current sync point registers:
- *  - unpin & unref their mems
- *  - pop their push buffer slots
- *  - remove them from the sync queue
+/**
+ * update_cdma_locked() - Update CDMA sync queue
+ * @cdma: CDMA instance to update
+ * @done_jobs: List that finished jobs will be added to
+ *
+ * Go through the CDMA's sync queue, and for each job that has been finished,
+ * - unpin it
+ * - pop its push buffer slots
+ * - remove it from the sync queue
+ * - add it to the done_jobs list.
+ *
  * This is normally called from the host code's worker thread, but can be
  * called manually if necessary.
- * Must be called with the cdma lock held.
+ *
+ * Must be called with the CDMA lock held.
  */
-static void update_cdma_locked(struct host1x_cdma *cdma)
+static void update_cdma_locked(struct host1x_cdma *cdma,
+			       struct list_head *done_jobs)
 {
 	bool signal = false;
 	struct host1x *host1x = cdma_to_host1x(cdma);
@@ -305,8 +312,7 @@ static void update_cdma_locked(struct host1x_cdma *cdma)
 				signal = true;
 		}
 
-		list_del(&job->list);
-		host1x_job_put(job);
+		list_move_tail(&job->list, done_jobs);
 	}
 
 	if (cdma->event == CDMA_EVENT_SYNC_QUEUE_EMPTY &&
@@ -542,7 +548,23 @@ void host1x_cdma_end(struct host1x_cdma *cdma,
  */
 void host1x_cdma_update(struct host1x_cdma *cdma)
 {
+	struct host1x_job *job, *tmp;
+	LIST_HEAD(done_jobs);
+
 	mutex_lock(&cdma->lock);
-	update_cdma_locked(cdma);
+	update_cdma_locked(cdma, &done_jobs);
 	mutex_unlock(&cdma->lock);
+
+	/*
+	 * The done callback may want to free the channel, which requires
+	 * taking the CDMA lock, so we need to do it outside the above lock
+	 * region.
+	 */
+	list_for_each_entry_safe(job, tmp, &done_jobs, list) {
+		if (job->done)
+			job->done(job);
+
+		list_del(&job->list);
+		host1x_job_put(job);
+	}
 }
diff --git a/include/linux/host1x.h b/include/linux/host1x.h
index 630b1a98ab58..f931d28a68ff 100644
--- a/include/linux/host1x.h
+++ b/include/linux/host1x.h
@@ -253,6 +253,10 @@ struct host1x_job {
 	/* Check if class belongs to the unit */
 	int (*is_valid_class)(u32 class);
 
+	/* Job done callback */
+	void (*done)(struct host1x_job *job);
+	void *callback_data;
+
 	/* Request a SETCLASS to this class */
 	u32 class;
 
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 06/10] drm/tegra: Deliver job completion callback to client
@ 2017-11-05 11:01     ` Mikko Perttunen
  0 siblings, 0 replies; 52+ messages in thread
From: Mikko Perttunen @ 2017-11-05 11:01 UTC (permalink / raw)
  To: thierry.reding, jonathanh
  Cc: digetx, dri-devel, linux-tegra, linux-kernel, Mikko Perttunen

To allow client drivers to free resources when jobs have completed,
deliver job completion callbacks to them. Since a job can complete
after the userspace application has closed the context it was
submitted on, also add kref-based reference counting to context
objects.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
 drivers/gpu/drm/tegra/drm.c | 27 ++++++++++++++++++++++++---
 drivers/gpu/drm/tegra/drm.h |  4 ++++
 2 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
index 2cdd054520bf..3e2a4a19412e 100644
--- a/drivers/gpu/drm/tegra/drm.c
+++ b/drivers/gpu/drm/tegra/drm.c
@@ -281,8 +281,11 @@ static int tegra_drm_open(struct drm_device *drm, struct drm_file *filp)
 	return 0;
 }
 
-static void tegra_drm_context_free(struct tegra_drm_context *context)
+static void tegra_drm_context_free(struct kref *ref)
 {
+	struct tegra_drm_context *context =
+		container_of(ref, struct tegra_drm_context, ref);
+
 	context->client->ops->close_channel(context);
 	kfree(context);
 }
@@ -379,6 +382,16 @@ static int host1x_waitchk_copy_from_user(struct host1x_waitchk *dest,
 	return 0;
 }
 
+static void tegra_drm_job_done(struct host1x_job *job)
+{
+	struct tegra_drm_context *context = job->callback_data;
+
+	if (context->client->ops->submit_done)
+		context->client->ops->submit_done(context);
+
+	kref_put(&context->ref, tegra_drm_context_free);
+}
+
 int tegra_drm_submit(struct tegra_drm_context *context,
 		     struct drm_tegra_submit *args, struct drm_device *drm,
 		     struct drm_file *file)
@@ -560,6 +573,9 @@ int tegra_drm_submit(struct tegra_drm_context *context,
 	job->syncpt_id = syncpt.id;
 	job->timeout = 10000;
 
+	job->done = tegra_drm_job_done;
+	job->callback_data = context;
+
 	if (args->timeout && args->timeout < 10000)
 		job->timeout = args->timeout;
 
@@ -567,8 +583,11 @@ int tegra_drm_submit(struct tegra_drm_context *context,
 	if (err)
 		goto fail;
 
+	kref_get(&context->ref);
+
 	err = host1x_job_submit(job);
 	if (err) {
+		kref_put(&context->ref, tegra_drm_context_free);
 		host1x_job_unpin(job);
 		goto fail;
 	}
@@ -717,6 +736,8 @@ static int tegra_open_channel(struct drm_device *drm, void *data,
 	if (err < 0)
 		kfree(context);
 
+	if (err >= 0)
+		kref_init(&context->ref);
 	mutex_unlock(&fpriv->lock);
 	return err;
 }
@@ -738,7 +759,7 @@ static int tegra_close_channel(struct drm_device *drm, void *data,
 	}
 
 	idr_remove(&fpriv->contexts, context->id);
-	tegra_drm_context_free(context);
+	kref_put(&context->ref, tegra_drm_context_free);
 
 unlock:
 	mutex_unlock(&fpriv->lock);
@@ -1026,7 +1047,7 @@ static int tegra_drm_context_cleanup(int id, void *p, void *data)
 {
 	struct tegra_drm_context *context = p;
 
-	tegra_drm_context_free(context);
+	kref_put(&context->ref, tegra_drm_context_free);
 
 	return 0;
 }
diff --git a/drivers/gpu/drm/tegra/drm.h b/drivers/gpu/drm/tegra/drm.h
index 063f5d397526..079aebb3fb38 100644
--- a/drivers/gpu/drm/tegra/drm.h
+++ b/drivers/gpu/drm/tegra/drm.h
@@ -13,6 +13,7 @@
 #include <uapi/drm/tegra_drm.h>
 #include <linux/host1x.h>
 #include <linux/iova.h>
+#include <linux/kref.h>
 #include <linux/of_gpio.h>
 
 #include <drm/drmP.h>
@@ -74,6 +75,8 @@ struct tegra_drm {
 struct tegra_drm_client;
 
 struct tegra_drm_context {
+	struct kref ref;
+
 	struct tegra_drm_client *client;
 	struct host1x_channel *channel;
 	unsigned int id;
@@ -88,6 +91,7 @@ struct tegra_drm_client_ops {
 	int (*submit)(struct tegra_drm_context *context,
 		      struct drm_tegra_submit *args, struct drm_device *drm,
 		      struct drm_file *file);
+	void (*submit_done)(struct tegra_drm_context *context);
 };
 
 int tegra_drm_submit(struct tegra_drm_context *context,
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 07/10] drm/tegra: Make syncpoints be per-context
  2017-11-05 11:01 [PATCH 00/10] Dynamic Host1x channel allocation Mikko Perttunen
@ 2017-11-05 11:01     ` Mikko Perttunen
  2017-11-05 11:01 ` [PATCH 03/10] gpu: host1x: Add lock around channel allocation Mikko Perttunen
                       ` (6 subsequent siblings)
  7 siblings, 0 replies; 52+ messages in thread
From: Mikko Perttunen @ 2017-11-05 11:01 UTC (permalink / raw)
  To: thierry.reding, jonathanh
  Cc: digetx, dri-devel, linux-tegra, linux-kernel, Mikko Perttunen

In preparation for each context potentially having its own hardware
channel, and thus requiring its own syncpoint, store syncpoints
inside each context instead of in the global client data.
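
The lookup change below can be sketched in plain, userspace C: a context
now carries exactly one syncpoint, so any index other than 0 is rejected
with -EINVAL. The `mock_*` types are stand-ins for the kernel structures,
not the real Tegra DRM API.

```c
#include <errno.h>
#include <stddef.h>
#include <assert.h>

/* Mock stand-ins for host1x_syncpt and tegra_drm_context. */
struct mock_syncpt {
	unsigned int id;
};

struct mock_context {
	struct mock_syncpt *syncpt; /* was client->base.syncpts[index] */
};

/* Sketch of the validation in tegra_get_syncpt() after this patch:
 * with one syncpoint per context, only index 0 is valid. */
static int context_get_syncpt(struct mock_context *ctx, unsigned int index,
			      struct mock_syncpt **out)
{
	if (index >= 1)
		return -EINVAL;

	*out = ctx->syncpt;
	return 0;
}
```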

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
 drivers/gpu/drm/tegra/drm.c  | 8 ++++----
 drivers/gpu/drm/tegra/drm.h  | 1 +
 drivers/gpu/drm/tegra/gr2d.c | 2 ++
 drivers/gpu/drm/tegra/gr3d.c | 2 ++
 drivers/gpu/drm/tegra/vic.c  | 2 ++
 5 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
index 3e2a4a19412e..b964e18e3058 100644
--- a/drivers/gpu/drm/tegra/drm.c
+++ b/drivers/gpu/drm/tegra/drm.c
@@ -783,12 +783,12 @@ static int tegra_get_syncpt(struct drm_device *drm, void *data,
 		goto unlock;
 	}
 
-	if (args->index >= context->client->base.num_syncpts) {
+	if (args->index >= 1) {
 		err = -EINVAL;
 		goto unlock;
 	}
 
-	syncpt = context->client->base.syncpts[args->index];
+	syncpt = context->syncpt;
 	args->id = host1x_syncpt_id(syncpt);
 
 unlock:
@@ -837,12 +837,12 @@ static int tegra_get_syncpt_base(struct drm_device *drm, void *data,
 		goto unlock;
 	}
 
-	if (args->syncpt >= context->client->base.num_syncpts) {
+	if (args->syncpt >= 1) {
 		err = -EINVAL;
 		goto unlock;
 	}
 
-	syncpt = context->client->base.syncpts[args->syncpt];
+	syncpt = context->syncpt;
 
 	base = host1x_syncpt_get_base(syncpt);
 	if (!base) {
diff --git a/drivers/gpu/drm/tegra/drm.h b/drivers/gpu/drm/tegra/drm.h
index 079aebb3fb38..11d690846fd0 100644
--- a/drivers/gpu/drm/tegra/drm.h
+++ b/drivers/gpu/drm/tegra/drm.h
@@ -79,6 +79,7 @@ struct tegra_drm_context {
 
 	struct tegra_drm_client *client;
 	struct host1x_channel *channel;
+	struct host1x_syncpt *syncpt;
 	unsigned int id;
 };
 
diff --git a/drivers/gpu/drm/tegra/gr2d.c b/drivers/gpu/drm/tegra/gr2d.c
index 6ea070da7718..3db3bcac48b9 100644
--- a/drivers/gpu/drm/tegra/gr2d.c
+++ b/drivers/gpu/drm/tegra/gr2d.c
@@ -76,6 +76,8 @@ static int gr2d_open_channel(struct tegra_drm_client *client,
 	if (!context->channel)
 		return -ENOMEM;
 
+	context->syncpt = client->base.syncpts[0];
+
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/tegra/gr3d.c b/drivers/gpu/drm/tegra/gr3d.c
index cee2ab645cde..279438342c8c 100644
--- a/drivers/gpu/drm/tegra/gr3d.c
+++ b/drivers/gpu/drm/tegra/gr3d.c
@@ -86,6 +86,8 @@ static int gr3d_open_channel(struct tegra_drm_client *client,
 	if (!context->channel)
 		return -ENOMEM;
 
+	context->syncpt = client->base.syncpts[0];
+
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/tegra/vic.c b/drivers/gpu/drm/tegra/vic.c
index 6697a21a250d..efe5f3af933e 100644
--- a/drivers/gpu/drm/tegra/vic.c
+++ b/drivers/gpu/drm/tegra/vic.c
@@ -240,6 +240,8 @@ static int vic_open_channel(struct tegra_drm_client *client,
 		return -ENOMEM;
 	}
 
+	context->syncpt = client->base.syncpts[0];
+
 	return 0;
 }
 
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 08/10] drm/tegra: Implement dynamic channel allocation model
  2017-11-05 11:01 [PATCH 00/10] Dynamic Host1x channel allocation Mikko Perttunen
                   ` (4 preceding siblings ...)
       [not found] ` <20171105110118.15142-1-mperttunen-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
@ 2017-11-05 11:01 ` Mikko Perttunen
       [not found]   ` <20171105110118.15142-9-mperttunen-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
  2017-11-05 11:01 ` [PATCH 09/10] drm/tegra: Boot VIC in runtime resume Mikko Perttunen
  2017-11-05 11:01   ` Mikko Perttunen
  7 siblings, 1 reply; 52+ messages in thread
From: Mikko Perttunen @ 2017-11-05 11:01 UTC (permalink / raw)
  To: thierry.reding, jonathanh
  Cc: digetx, dri-devel, linux-tegra, linux-kernel, Mikko Perttunen

In the traditional channel allocation model, a single hardware channel
was allocated for each client. This is simple from an implementation
perspective but prevents use of hardware scheduling.

This patch implements a channel allocation model where when a user
submits a job for a context, a hardware channel is allocated for
that context. The same channel is kept for as long as there are
incomplete jobs for that context. This way we can use hardware
scheduling and channel isolation between userspace processes, while
also preventing idle contexts from taking up hardware resources.

For now, this patch only adapts VIC to the new model.
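
The get/put pairing this patch introduces can be sketched as a small
userspace mock: a channel is requested when the first job for a context
arrives and released when its last pending job completes. A pthread mutex
stands in for context->lock and a stub stands in for
host1x_channel_request(); the names mirror the patch, but the bodies are
illustrative only.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <assert.h>

/* Single mock hardware channel with a busy flag. */
struct mock_channel {
	int id;
};

static struct mock_channel the_channel = { .id = 0 };
static bool channel_busy;

static struct mock_channel *mock_channel_request(void)
{
	if (channel_busy)
		return NULL;
	channel_busy = true;
	return &the_channel;
}

static void mock_channel_put(struct mock_channel *ch)
{
	(void)ch;
	channel_busy = false;
}

struct mock_context {
	pthread_mutex_t lock;
	struct mock_channel *channel;
	unsigned int pending_jobs;
};

/* Mirrors tegra_drm_context_get_channel(): allocate a channel only for
 * the first pending job, otherwise just bump the count. */
static int context_get_channel(struct mock_context *ctx)
{
	int err = 0;

	pthread_mutex_lock(&ctx->lock);

	if (ctx->pending_jobs == 0) {
		ctx->channel = mock_channel_request();
		if (!ctx->channel)
			err = -1; /* -EBUSY in the kernel version */
	}

	if (!err)
		ctx->pending_jobs++;

	pthread_mutex_unlock(&ctx->lock);

	return err;
}

/* Mirrors tegra_drm_context_put_channel(): release the hardware channel
 * once the last pending job is done. */
static void context_put_channel(struct mock_context *ctx)
{
	pthread_mutex_lock(&ctx->lock);

	if (--ctx->pending_jobs == 0)
		mock_channel_put(ctx->channel);

	pthread_mutex_unlock(&ctx->lock);
}
```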

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
 drivers/gpu/drm/tegra/drm.c | 46 ++++++++++++++++++++++++++
 drivers/gpu/drm/tegra/drm.h |  7 +++-
 drivers/gpu/drm/tegra/vic.c | 79 +++++++++++++++++++++++----------------------
 3 files changed, 92 insertions(+), 40 deletions(-)

diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
index b964e18e3058..658bc8814f38 100644
--- a/drivers/gpu/drm/tegra/drm.c
+++ b/drivers/gpu/drm/tegra/drm.c
@@ -382,6 +382,51 @@ static int host1x_waitchk_copy_from_user(struct host1x_waitchk *dest,
 	return 0;
 }
 
+/**
+ * tegra_drm_context_get_channel() - Get a channel for submissions
+ * @context: Context to get a channel for
+ *
+ * Request a free hardware host1x channel for this user context, or if the
+ * context already has one, bump its refcount.
+ *
+ * Returns 0 on success, or -EBUSY if there were no free hardware channels.
+ */
+int tegra_drm_context_get_channel(struct tegra_drm_context *context)
+{
+	struct host1x_client *client = &context->client->base;
+
+	mutex_lock(&context->lock);
+
+	if (context->pending_jobs == 0) {
+		context->channel = host1x_channel_request(client->dev);
+		if (!context->channel) {
+			mutex_unlock(&context->lock);
+			return -EBUSY;
+		}
+	}
+
+	context->pending_jobs++;
+
+	mutex_unlock(&context->lock);
+
+	return 0;
+}
+
+/**
+ * tegra_drm_context_put_channel() - Put a previously gotten channel
+ * @context: Context whose channel is no longer needed
+ *
+ * Decrease the refcount of the channel associated with this context,
+ * freeing it if the refcount drops to zero.
+ */
+void tegra_drm_context_put_channel(struct tegra_drm_context *context)
+{
+	mutex_lock(&context->lock);
+	if (--context->pending_jobs == 0)
+		host1x_channel_put(context->channel);
+	mutex_unlock(&context->lock);
+}
+
 static void tegra_drm_job_done(struct host1x_job *job)
 {
 	struct tegra_drm_context *context = job->callback_data;
@@ -737,6 +782,7 @@ static int tegra_open_channel(struct drm_device *drm, void *data,
 		kfree(context);
 
 	kref_init(&context->ref);
+	mutex_init(&context->lock);
 
 	mutex_unlock(&fpriv->lock);
 	return err;
diff --git a/drivers/gpu/drm/tegra/drm.h b/drivers/gpu/drm/tegra/drm.h
index 11d690846fd0..d0c3f1f779f6 100644
--- a/drivers/gpu/drm/tegra/drm.h
+++ b/drivers/gpu/drm/tegra/drm.h
@@ -78,9 +78,12 @@ struct tegra_drm_context {
 	struct kref ref;
 
 	struct tegra_drm_client *client;
+	unsigned int id;
+
+	struct mutex lock;
 	struct host1x_channel *channel;
 	struct host1x_syncpt *syncpt;
-	unsigned int id;
+	unsigned int pending_jobs;
 };
 
 struct tegra_drm_client_ops {
@@ -95,6 +98,8 @@ struct tegra_drm_client_ops {
 	void (*submit_done)(struct tegra_drm_context *context);
 };
 
+int tegra_drm_context_get_channel(struct tegra_drm_context *context);
+void tegra_drm_context_put_channel(struct tegra_drm_context *context);
 int tegra_drm_submit(struct tegra_drm_context *context,
 		     struct drm_tegra_submit *args, struct drm_device *drm,
 		     struct drm_file *file);
diff --git a/drivers/gpu/drm/tegra/vic.c b/drivers/gpu/drm/tegra/vic.c
index efe5f3af933e..0cacf023a890 100644
--- a/drivers/gpu/drm/tegra/vic.c
+++ b/drivers/gpu/drm/tegra/vic.c
@@ -33,7 +33,6 @@ struct vic {
 
 	void __iomem *regs;
 	struct tegra_drm_client client;
-	struct host1x_channel *channel;
 	struct iommu_domain *domain;
 	struct device *dev;
 	struct clk *clk;
@@ -161,28 +160,12 @@ static int vic_init(struct host1x_client *client)
 			goto detach_device;
 	}
 
-	vic->channel = host1x_channel_request(client->dev);
-	if (!vic->channel) {
-		err = -ENOMEM;
-		goto detach_device;
-	}
-
-	client->syncpts[0] = host1x_syncpt_request(client->dev, 0);
-	if (!client->syncpts[0]) {
-		err = -ENOMEM;
-		goto free_channel;
-	}
-
 	err = tegra_drm_register_client(tegra, drm);
 	if (err < 0)
-		goto free_syncpt;
+		goto detach_device;
 
 	return 0;
 
-free_syncpt:
-	host1x_syncpt_free(client->syncpts[0]);
-free_channel:
-	host1x_channel_put(vic->channel);
 detach_device:
 	if (tegra->domain)
 		iommu_detach_device(tegra->domain, vic->dev);
@@ -202,9 +185,6 @@ static int vic_exit(struct host1x_client *client)
 	if (err < 0)
 		return err;
 
-	host1x_syncpt_free(client->syncpts[0]);
-	host1x_channel_put(vic->channel);
-
 	if (vic->domain) {
 		iommu_detach_device(vic->domain, vic->dev);
 		vic->domain = NULL;
@@ -221,7 +201,24 @@ static const struct host1x_client_ops vic_client_ops = {
 static int vic_open_channel(struct tegra_drm_client *client,
 			    struct tegra_drm_context *context)
 {
-	struct vic *vic = to_vic(client);
+	context->syncpt = host1x_syncpt_request(client->base.dev, 0);
+	if (!context->syncpt)
+		return -ENOMEM;
+
+	return 0;
+}
+
+static void vic_close_channel(struct tegra_drm_context *context)
+{
+	host1x_syncpt_free(context->syncpt);
+}
+
+static int vic_submit(struct tegra_drm_context *context,
+		      struct drm_tegra_submit *args, struct drm_device *drm,
+		      struct drm_file *file)
+{
+	struct host1x_client *client = &context->client->base;
+	struct vic *vic = dev_get_drvdata(client->dev);
 	int err;
 
 	err = pm_runtime_get_sync(vic->dev);
@@ -229,35 +226,41 @@ static int vic_open_channel(struct tegra_drm_client *client,
 		return err;
 
 	err = vic_boot(vic);
-	if (err < 0) {
-		pm_runtime_put(vic->dev);
-		return err;
-	}
+	if (err < 0)
+		goto put_vic;
 
-	context->channel = host1x_channel_get(vic->channel);
-	if (!context->channel) {
-		pm_runtime_put(vic->dev);
-		return -ENOMEM;
-	}
+	err = tegra_drm_context_get_channel(context);
+	if (err < 0)
+		goto put_vic;
 
-	context->syncpt = client->base.syncpts[0];
+	err = tegra_drm_submit(context, args, drm, file);
+	if (err)
+		goto put_channel;
 
 	return 0;
+
+put_channel:
+	tegra_drm_context_put_channel(context);
+put_vic:
+	pm_runtime_put(vic->dev);
+
+	return err;
 }
 
-static void vic_close_channel(struct tegra_drm_context *context)
+static void vic_submit_done(struct tegra_drm_context *context)
 {
-	struct vic *vic = to_vic(context->client);
-
-	host1x_channel_put(context->channel);
+	struct host1x_client *client = &context->client->base;
+	struct vic *vic = dev_get_drvdata(client->dev);
 
+	tegra_drm_context_put_channel(context);
 	pm_runtime_put(vic->dev);
 }
 
 static const struct tegra_drm_client_ops vic_ops = {
 	.open_channel = vic_open_channel,
 	.close_channel = vic_close_channel,
-	.submit = tegra_drm_submit,
+	.submit = vic_submit,
+	.submit_done = vic_submit_done,
 };
 
 #define NVIDIA_TEGRA_124_VIC_FIRMWARE "nvidia/tegra124/vic03_ucode.bin"
@@ -340,8 +343,6 @@ static int vic_probe(struct platform_device *pdev)
 	vic->client.base.ops = &vic_client_ops;
 	vic->client.base.dev = dev;
 	vic->client.base.class = HOST1X_CLASS_VIC;
-	vic->client.base.syncpts = syncpts;
-	vic->client.base.num_syncpts = 1;
 	vic->dev = dev;
 	vic->config = vic_config;
 
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 09/10] drm/tegra: Boot VIC in runtime resume
  2017-11-05 11:01 [PATCH 00/10] Dynamic Host1x channel allocation Mikko Perttunen
                   ` (5 preceding siblings ...)
  2017-11-05 11:01 ` [PATCH 08/10] drm/tegra: Implement dynamic channel allocation model Mikko Perttunen
@ 2017-11-05 11:01 ` Mikko Perttunen
  2017-11-05 11:01   ` Mikko Perttunen
  7 siblings, 0 replies; 52+ messages in thread
From: Mikko Perttunen @ 2017-11-05 11:01 UTC (permalink / raw)
  To: thierry.reding, jonathanh
  Cc: digetx, dri-devel, linux-tegra, linux-kernel, Mikko Perttunen

This ensures there are no concurrency issues when multiple users try
to use the VIC at the same time, and also simplifies the code
slightly.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
 drivers/gpu/drm/tegra/vic.c | 47 +++++++++++++++++++--------------------------
 1 file changed, 20 insertions(+), 27 deletions(-)

diff --git a/drivers/gpu/drm/tegra/vic.c b/drivers/gpu/drm/tegra/vic.c
index 0cacf023a890..3de20f287112 100644
--- a/drivers/gpu/drm/tegra/vic.c
+++ b/drivers/gpu/drm/tegra/vic.c
@@ -29,7 +29,6 @@ struct vic_config {
 
 struct vic {
 	struct falcon falcon;
-	bool booted;
 
 	void __iomem *regs;
 	struct tegra_drm_client client;
@@ -51,33 +50,12 @@ static void vic_writel(struct vic *vic, u32 value, unsigned int offset)
 	writel(value, vic->regs + offset);
 }
 
-static int vic_runtime_resume(struct device *dev)
-{
-	struct vic *vic = dev_get_drvdata(dev);
-
-	return clk_prepare_enable(vic->clk);
-}
-
-static int vic_runtime_suspend(struct device *dev)
-{
-	struct vic *vic = dev_get_drvdata(dev);
-
-	clk_disable_unprepare(vic->clk);
-
-	vic->booted = false;
-
-	return 0;
-}
-
 static int vic_boot(struct vic *vic)
 {
 	u32 fce_ucode_size, fce_bin_data_offset;
 	void *hdr;
 	int err = 0;
 
-	if (vic->booted)
-		return 0;
-
 	/* setup clockgating registers */
 	vic_writel(vic, CG_IDLE_CG_DLY_CNT(4) |
 			CG_IDLE_CG_EN |
@@ -108,7 +86,26 @@ static int vic_boot(struct vic *vic)
 		return err;
 	}
 
-	vic->booted = true;
+	return 0;
+}
+
+static int vic_runtime_resume(struct device *dev)
+{
+	struct vic *vic = dev_get_drvdata(dev);
+	int err;
+
+	err = clk_prepare_enable(vic->clk);
+	if (err < 0)
+		return err;
+
+	return vic_boot(vic);
+}
+
+static int vic_runtime_suspend(struct device *dev)
+{
+	struct vic *vic = dev_get_drvdata(dev);
+
+	clk_disable_unprepare(vic->clk);
 
 	return 0;
 }
@@ -225,10 +222,6 @@ static int vic_submit(struct tegra_drm_context *context,
 	if (err < 0)
 		return err;
 
-	err = vic_boot(vic);
-	if (err < 0)
-		goto put_vic;
-
 	err = tegra_drm_context_get_channel(context);
 	if (err < 0)
 		goto put_vic;
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 10/10] gpu: host1x: Optionally block when acquiring channel
  2017-11-05 11:01 [PATCH 00/10] Dynamic Host1x channel allocation Mikko Perttunen
@ 2017-11-05 11:01   ` Mikko Perttunen
  2017-11-05 11:01 ` [PATCH 03/10] gpu: host1x: Add lock around channel allocation Mikko Perttunen
                     ` (6 subsequent siblings)
  7 siblings, 0 replies; 52+ messages in thread
From: Mikko Perttunen @ 2017-11-05 11:01 UTC (permalink / raw)
  To: thierry.reding, jonathanh
  Cc: linux-tegra, digetx, linux-kernel, dri-devel, Mikko Perttunen

Add an option to host1x_channel_request() to interruptibly wait for a
free channel. This allows IOCTLs that acquire a channel to block
userspace until a channel becomes available.
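
The acquire path this patch adds can be sketched in userspace with a POSIX
counting semaphore initialized to the number of channels: sem_wait() and
sem_trywait() stand in for the kernel's down_interruptible() and
down_trylock(), and the bitmap scan that follows in the kernel code is
omitted. This is an illustrative mock, not the host1x API.

```c
#include <errno.h>
#include <semaphore.h>
#include <stdbool.h>
#include <assert.h>

#define NUM_CHANNELS 3

/* Counting semaphore gating channel allocation, as in
 * host1x_channel_list. */
static sem_t channel_sema;

static int channel_sema_init(void)
{
	return sem_init(&channel_sema, 0, NUM_CHANNELS);
}

/* Returns 0 on success, or -EBUSY if @wait is false and all channels
 * are reserved. In the kernel version, a blocking wait can also fail
 * with -EINTR when interrupted by a signal. */
static int acquire_channel_slot(bool wait)
{
	if (wait) {
		if (sem_wait(&channel_sema))
			return -EINTR;
	} else {
		if (sem_trywait(&channel_sema))
			return -EBUSY;
	}

	return 0;
}

/* Counterpart of up(&chlist->sema) in release_channel(). */
static void release_channel_slot(void)
{
	sem_post(&channel_sema);
}
```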

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
 drivers/gpu/drm/tegra/drm.c  |  9 +++++----
 drivers/gpu/drm/tegra/gr2d.c |  6 +++---
 drivers/gpu/drm/tegra/gr3d.c |  6 +++---
 drivers/gpu/host1x/channel.c | 40 ++++++++++++++++++++++++++++++----------
 drivers/gpu/host1x/channel.h |  1 +
 include/linux/host1x.h       |  2 +-
 6 files changed, 43 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
index 658bc8814f38..19f77c1a76c0 100644
--- a/drivers/gpu/drm/tegra/drm.c
+++ b/drivers/gpu/drm/tegra/drm.c
@@ -389,7 +389,8 @@ static int host1x_waitchk_copy_from_user(struct host1x_waitchk *dest,
  * Request a free hardware host1x channel for this user context, or if the
  * context already has one, bump its refcount.
  *
- * Returns 0 on success, or -EBUSY if there were no free hardware channels.
+ * Returns 0 on success, -EINTR if wait for a free channel was interrupted,
+ * or other error.
  */
 int tegra_drm_context_get_channel(struct tegra_drm_context *context)
 {
@@ -398,10 +399,10 @@ int tegra_drm_context_get_channel(struct tegra_drm_context *context)
 	mutex_lock(&context->lock);
 
 	if (context->pending_jobs == 0) {
-		context->channel = host1x_channel_request(client->dev);
-		if (!context->channel) {
+		context->channel = host1x_channel_request(client->dev, true);
+		if (IS_ERR(context->channel)) {
 			mutex_unlock(&context->lock);
-			return -EBUSY;
+			return PTR_ERR(context->channel);
 		}
 	}
 
diff --git a/drivers/gpu/drm/tegra/gr2d.c b/drivers/gpu/drm/tegra/gr2d.c
index 3db3bcac48b9..c1853402f69b 100644
--- a/drivers/gpu/drm/tegra/gr2d.c
+++ b/drivers/gpu/drm/tegra/gr2d.c
@@ -32,9 +32,9 @@ static int gr2d_init(struct host1x_client *client)
 	unsigned long flags = HOST1X_SYNCPT_HAS_BASE;
 	struct gr2d *gr2d = to_gr2d(drm);
 
-	gr2d->channel = host1x_channel_request(client->dev);
-	if (!gr2d->channel)
-		return -ENOMEM;
+	gr2d->channel = host1x_channel_request(client->dev, false);
+	if (IS_ERR(gr2d->channel))
+		return PTR_ERR(gr2d->channel);
 
 	client->syncpts[0] = host1x_syncpt_request(client->dev, flags);
 	if (!client->syncpts[0]) {
diff --git a/drivers/gpu/drm/tegra/gr3d.c b/drivers/gpu/drm/tegra/gr3d.c
index 279438342c8c..793a91d577cb 100644
--- a/drivers/gpu/drm/tegra/gr3d.c
+++ b/drivers/gpu/drm/tegra/gr3d.c
@@ -42,9 +42,9 @@ static int gr3d_init(struct host1x_client *client)
 	unsigned long flags = HOST1X_SYNCPT_HAS_BASE;
 	struct gr3d *gr3d = to_gr3d(drm);
 
-	gr3d->channel = host1x_channel_request(client->dev);
-	if (!gr3d->channel)
-		return -ENOMEM;
+	gr3d->channel = host1x_channel_request(client->dev, false);
+	if (IS_ERR(gr3d->channel))
+		return PTR_ERR(gr3d->channel);
 
 	client->syncpts[0] = host1x_syncpt_request(client->dev, flags);
 	if (!client->syncpts[0]) {
diff --git a/drivers/gpu/host1x/channel.c b/drivers/gpu/host1x/channel.c
index 9d8cad12f9d8..eebcd51261df 100644
--- a/drivers/gpu/host1x/channel.c
+++ b/drivers/gpu/host1x/channel.c
@@ -43,6 +43,7 @@ int host1x_channel_list_init(struct host1x_channel_list *chlist,
 	bitmap_zero(chlist->allocated_channels, num_channels);
 
 	mutex_init(&chlist->lock);
+	sema_init(&chlist->sema, num_channels);
 
 	return 0;
 }
@@ -99,6 +100,8 @@ static void release_channel(struct kref *kref)
 	host1x_cdma_deinit(&channel->cdma);
 
 	clear_bit(channel->id, chlist->allocated_channels);
+
+	up(&chlist->sema);
 }
 
 void host1x_channel_put(struct host1x_channel *channel)
@@ -107,19 +110,30 @@ void host1x_channel_put(struct host1x_channel *channel)
 }
 EXPORT_SYMBOL(host1x_channel_put);
 
-static struct host1x_channel *acquire_unused_channel(struct host1x *host)
+static struct host1x_channel *acquire_unused_channel(struct host1x *host,
+						     bool wait)
 {
 	struct host1x_channel_list *chlist = &host->channel_list;
 	unsigned int max_channels = host->info->nb_channels;
 	unsigned int index;
+	int err;
+
+	if (wait) {
+		err = down_interruptible(&chlist->sema);
+		if (err)
+			return ERR_PTR(err);
+	} else {
+		if (down_trylock(&chlist->sema))
+			return ERR_PTR(-EBUSY);
+	}
 
 	mutex_lock(&chlist->lock);
 
 	index = find_first_zero_bit(chlist->allocated_channels, max_channels);
-	if (index >= max_channels) {
+	if (WARN(index >= max_channels, "failed to find free channel")) {
 		mutex_unlock(&chlist->lock);
 		dev_err(host->dev, "failed to find free channel\n");
-		return NULL;
+		return ERR_PTR(-EBUSY);
 	}
 
 	chlist->channels[index].id = index;
@@ -134,20 +148,26 @@ static struct host1x_channel *acquire_unused_channel(struct host1x *host)
 /**
  * host1x_channel_request() - Allocate a channel
  * @device: Host1x unit this channel will be used to send commands to
+ * @wait: Whether to wait for a free channel if all are reserved
+ *
+ * Allocates a new host1x channel for @device. If all channels are in use,
+ * and @wait is true, does an interruptible wait until one is available.
  *
- * Allocates a new host1x channel for @device. May return NULL if CDMA
- * initialization fails.
+ * If a channel was acquired, returns a pointer to it. Otherwise returns
+ * an error pointer with -EINTR if the wait was interrupted, -EBUSY
+ * if a channel could not be acquired or another error code if channel
+ * initialization failed.
  */
-struct host1x_channel *host1x_channel_request(struct device *dev)
+struct host1x_channel *host1x_channel_request(struct device *dev, bool wait)
 {
 	struct host1x *host = dev_get_drvdata(dev->parent);
 	struct host1x_channel_list *chlist = &host->channel_list;
 	struct host1x_channel *channel;
 	int err;
 
-	channel = acquire_unused_channel(host);
-	if (!channel)
-		return NULL;
+	channel = acquire_unused_channel(host, wait);
+	if (IS_ERR(channel))
+		return channel;
 
 	kref_init(&channel->refcount);
 	mutex_init(&channel->submitlock);
@@ -168,6 +188,6 @@ struct host1x_channel *host1x_channel_request(struct device *dev)
 
 	dev_err(dev, "failed to initialize channel\n");
 
-	return NULL;
+	return ERR_PTR(err);
 }
 EXPORT_SYMBOL(host1x_channel_request);
diff --git a/drivers/gpu/host1x/channel.h b/drivers/gpu/host1x/channel.h
index e68a8ae9a670..1f5cf8029b62 100644
--- a/drivers/gpu/host1x/channel.h
+++ b/drivers/gpu/host1x/channel.h
@@ -31,6 +31,7 @@ struct host1x_channel_list {
 	struct host1x_channel *channels;
 
 	struct mutex lock;
+	struct semaphore sema;
 	unsigned long *allocated_channels;
 };
 
diff --git a/include/linux/host1x.h b/include/linux/host1x.h
index f931d28a68ff..2a34905d4408 100644
--- a/include/linux/host1x.h
+++ b/include/linux/host1x.h
@@ -171,7 +171,7 @@ u32 host1x_syncpt_base_id(struct host1x_syncpt_base *base);
 struct host1x_channel;
 struct host1x_job;
 
-struct host1x_channel *host1x_channel_request(struct device *dev);
+struct host1x_channel *host1x_channel_request(struct device *dev, bool wait);
 struct host1x_channel *host1x_channel_get(struct host1x_channel *channel);
 void host1x_channel_put(struct host1x_channel *channel);
 int host1x_job_submit(struct host1x_job *job);
-- 
2.14.2

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [PATCH 10/10] gpu: host1x: Optionally block when acquiring channel
@ 2017-11-05 11:01   ` Mikko Perttunen
  0 siblings, 0 replies; 52+ messages in thread
From: Mikko Perttunen @ 2017-11-05 11:01 UTC (permalink / raw)
  To: thierry.reding, jonathanh
  Cc: digetx, dri-devel, linux-tegra, linux-kernel, Mikko Perttunen

Add an option to host1x_channel_request to interruptibly wait for a
free channel. This allows IOCTLs that acquire a channel to block
the userspace.

Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
---
 drivers/gpu/drm/tegra/drm.c  |  9 +++++----
 drivers/gpu/drm/tegra/gr2d.c |  6 +++---
 drivers/gpu/drm/tegra/gr3d.c |  6 +++---
 drivers/gpu/host1x/channel.c | 40 ++++++++++++++++++++++++++++++----------
 drivers/gpu/host1x/channel.h |  1 +
 include/linux/host1x.h       |  2 +-
 6 files changed, 43 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
index 658bc8814f38..19f77c1a76c0 100644
--- a/drivers/gpu/drm/tegra/drm.c
+++ b/drivers/gpu/drm/tegra/drm.c
@@ -389,7 +389,8 @@ static int host1x_waitchk_copy_from_user(struct host1x_waitchk *dest,
  * Request a free hardware host1x channel for this user context, or if the
  * context already has one, bump its refcount.
  *
- * Returns 0 on success, or -EBUSY if there were no free hardware channels.
+ * Returns 0 on success, -EINTR if wait for a free channel was interrupted,
+ * or other error.
  */
 int tegra_drm_context_get_channel(struct tegra_drm_context *context)
 {
@@ -398,10 +399,10 @@ int tegra_drm_context_get_channel(struct tegra_drm_context *context)
 	mutex_lock(&context->lock);
 
 	if (context->pending_jobs == 0) {
-		context->channel = host1x_channel_request(client->dev);
-		if (!context->channel) {
+		context->channel = host1x_channel_request(client->dev, true);
+		if (IS_ERR(context->channel)) {
 			mutex_unlock(&context->lock);
-			return -EBUSY;
+			return PTR_ERR(context->channel);
 		}
 	}
 
diff --git a/drivers/gpu/drm/tegra/gr2d.c b/drivers/gpu/drm/tegra/gr2d.c
index 3db3bcac48b9..c1853402f69b 100644
--- a/drivers/gpu/drm/tegra/gr2d.c
+++ b/drivers/gpu/drm/tegra/gr2d.c
@@ -32,9 +32,9 @@ static int gr2d_init(struct host1x_client *client)
 	unsigned long flags = HOST1X_SYNCPT_HAS_BASE;
 	struct gr2d *gr2d = to_gr2d(drm);
 
-	gr2d->channel = host1x_channel_request(client->dev);
-	if (!gr2d->channel)
-		return -ENOMEM;
+	gr2d->channel = host1x_channel_request(client->dev, false);
+	if (IS_ERR(gr2d->channel))
+		return PTR_ERR(gr2d->channel);
 
 	client->syncpts[0] = host1x_syncpt_request(client->dev, flags);
 	if (!client->syncpts[0]) {
diff --git a/drivers/gpu/drm/tegra/gr3d.c b/drivers/gpu/drm/tegra/gr3d.c
index 279438342c8c..793a91d577cb 100644
--- a/drivers/gpu/drm/tegra/gr3d.c
+++ b/drivers/gpu/drm/tegra/gr3d.c
@@ -42,9 +42,9 @@ static int gr3d_init(struct host1x_client *client)
 	unsigned long flags = HOST1X_SYNCPT_HAS_BASE;
 	struct gr3d *gr3d = to_gr3d(drm);
 
-	gr3d->channel = host1x_channel_request(client->dev);
-	if (!gr3d->channel)
-		return -ENOMEM;
+	gr3d->channel = host1x_channel_request(client->dev, false);
+	if (IS_ERR(gr3d->channel))
+		return PTR_ERR(gr3d->channel);
 
 	client->syncpts[0] = host1x_syncpt_request(client->dev, flags);
 	if (!client->syncpts[0]) {
diff --git a/drivers/gpu/host1x/channel.c b/drivers/gpu/host1x/channel.c
index 9d8cad12f9d8..eebcd51261df 100644
--- a/drivers/gpu/host1x/channel.c
+++ b/drivers/gpu/host1x/channel.c
@@ -43,6 +43,7 @@ int host1x_channel_list_init(struct host1x_channel_list *chlist,
 	bitmap_zero(chlist->allocated_channels, num_channels);
 
 	mutex_init(&chlist->lock);
+	sema_init(&chlist->sema, num_channels);
 
 	return 0;
 }
@@ -99,6 +100,8 @@ static void release_channel(struct kref *kref)
 	host1x_cdma_deinit(&channel->cdma);
 
 	clear_bit(channel->id, chlist->allocated_channels);
+
+	up(&chlist->sema);
 }
 
 void host1x_channel_put(struct host1x_channel *channel)
@@ -107,19 +110,30 @@ void host1x_channel_put(struct host1x_channel *channel)
 }
 EXPORT_SYMBOL(host1x_channel_put);
 
-static struct host1x_channel *acquire_unused_channel(struct host1x *host)
+static struct host1x_channel *acquire_unused_channel(struct host1x *host,
+						     bool wait)
 {
 	struct host1x_channel_list *chlist = &host->channel_list;
 	unsigned int max_channels = host->info->nb_channels;
 	unsigned int index;
+	int err;
+
+	if (wait) {
+		err = down_interruptible(&chlist->sema);
+		if (err)
+			return ERR_PTR(err);
+	} else {
+		if (down_trylock(&chlist->sema))
+			return ERR_PTR(-EBUSY);
+	}
 
 	mutex_lock(&chlist->lock);
 
 	index = find_first_zero_bit(chlist->allocated_channels, max_channels);
-	if (index >= max_channels) {
+	if (WARN(index >= max_channels, "failed to find free channel")) {
 		mutex_unlock(&chlist->lock);
 		dev_err(host->dev, "failed to find free channel\n");
-		return NULL;
+		return ERR_PTR(-EBUSY);
 	}
 
 	chlist->channels[index].id = index;
@@ -134,20 +148,26 @@ static struct host1x_channel *acquire_unused_channel(struct host1x *host)
 /**
  * host1x_channel_request() - Allocate a channel
  * @device: Host1x unit this channel will be used to send commands to
+ * @wait: Whether to wait for a free channels if all are reserved
+ *
+ * Allocates a new host1x channel for @device. If all channels are in use,
+ * and @wait is true, does an interruptible wait until one is available.
  *
- * Allocates a new host1x channel for @device. May return NULL if CDMA
- * initialization fails.
+ * If a channel was acquired, returns a pointer to it. Otherwise returns
+ * an error pointer with -EINTR if the wait was interrupted, -EBUSY
+ * if a channel could not be acquired or another error code if channel
+ * initialization failed.
  */
-struct host1x_channel *host1x_channel_request(struct device *dev)
+struct host1x_channel *host1x_channel_request(struct device *dev, bool wait)
 {
 	struct host1x *host = dev_get_drvdata(dev->parent);
 	struct host1x_channel_list *chlist = &host->channel_list;
 	struct host1x_channel *channel;
 	int err;
 
-	channel = acquire_unused_channel(host);
-	if (!channel)
-		return NULL;
+	channel = acquire_unused_channel(host, wait);
+	if (IS_ERR(channel))
+		return channel;
 
 	kref_init(&channel->refcount);
 	mutex_init(&channel->submitlock);
@@ -168,6 +188,6 @@ struct host1x_channel *host1x_channel_request(struct device *dev)
 
 	dev_err(dev, "failed to initialize channel\n");
 
-	return NULL;
+	return ERR_PTR(err);
 }
 EXPORT_SYMBOL(host1x_channel_request);
diff --git a/drivers/gpu/host1x/channel.h b/drivers/gpu/host1x/channel.h
index e68a8ae9a670..1f5cf8029b62 100644
--- a/drivers/gpu/host1x/channel.h
+++ b/drivers/gpu/host1x/channel.h
@@ -31,6 +31,7 @@ struct host1x_channel_list {
 	struct host1x_channel *channels;
 
 	struct mutex lock;
+	struct semaphore sema;
 	unsigned long *allocated_channels;
 };
 
diff --git a/include/linux/host1x.h b/include/linux/host1x.h
index f931d28a68ff..2a34905d4408 100644
--- a/include/linux/host1x.h
+++ b/include/linux/host1x.h
@@ -171,7 +171,7 @@ u32 host1x_syncpt_base_id(struct host1x_syncpt_base *base);
 struct host1x_channel;
 struct host1x_job;
 
-struct host1x_channel *host1x_channel_request(struct device *dev);
+struct host1x_channel *host1x_channel_request(struct device *dev, bool wait);
 struct host1x_channel *host1x_channel_get(struct host1x_channel *channel);
 void host1x_channel_put(struct host1x_channel *channel);
 int host1x_job_submit(struct host1x_job *job);
-- 
2.14.2

^ permalink raw reply related	[flat|nested] 52+ messages in thread
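
The acquire path in the patch above gates channel allocation on a semaphore (trylock for the non-blocking path, interruptible down for the blocking one) and then scans a bitmap under a mutex. A minimal single-threaded C sketch of that logic follows; the plain counter stands in for the kernel's `chlist->sema`, `MAX_CHANNELS` is a hypothetical channel count, and since nothing can post the counter here, the wait path simply fails with -EBUSY where the kernel version would sleep (or return -EINTR on a signal):

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CHANNELS 4 /* hypothetical channel count for the sketch */

struct channel_list {
	int sema;                     /* counts free channels (kernel: chlist->sema) */
	bool allocated[MAX_CHANNELS]; /* kernel: allocated_channels bitmap */
};

static void channel_list_init(struct channel_list *c)
{
	c->sema = MAX_CHANNELS;
	for (size_t i = 0; i < MAX_CHANNELS; i++)
		c->allocated[i] = false;
}

/*
 * Returns a free channel index or a negative errno. In this
 * single-threaded model a wait can never be satisfied, so both the
 * wait and trylock paths fail with -EBUSY when nothing is free; the
 * kernel version sleeps interruptibly on the semaphore instead.
 */
static int acquire_channel(struct channel_list *c, bool wait)
{
	(void)wait;
	if (c->sema == 0)
		return -EBUSY;
	c->sema--;

	for (size_t i = 0; i < MAX_CHANNELS; i++) {
		if (!c->allocated[i]) {
			c->allocated[i] = true;
			return (int)i;
		}
	}
	return -EBUSY; /* unreachable: the counter said one was free */
}

static void release_channel(struct channel_list *c, int index)
{
	c->allocated[index] = false;
	c->sema++; /* kernel: up(&chlist->sema) wakes one waiter */
}
```

The point of the two-level scheme is that the semaphore makes "is anything free?" cheap and blockable, while the mutex-protected bitmap decides *which* channel is handed out.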

* Re: [PATCH 04/10] gpu: host1x: Lock classes during job submission
  2017-11-05 11:01   ` Mikko Perttunen
@ 2017-11-05 16:46       ` Dmitry Osipenko
  -1 siblings, 0 replies; 52+ messages in thread
From: Dmitry Osipenko @ 2017-11-05 16:46 UTC (permalink / raw)
  To: Mikko Perttunen, thierry.reding, jonathanh
  Cc: dri-devel, linux-tegra, linux-kernel

On 05.11.2017 14:01, Mikko Perttunen wrote:
> Host1x has a feature called MLOCKs which allow a certain class
> (~HW unit) to be locked (in the mutex sense) and unlocked during
> command execution, preventing other channels from accessing the
> class while it is locked. This is necessary to prevent concurrent
> jobs from messing up class state.
> 
> This has not been necessary so far because, under our channel allocation
> model, there has only been a single hardware channel submitting
> commands to each class. Future patches, however, change the channel
> allocation model to allow hardware-scheduled concurrency, and as such
> we need to start locking.
> 
> This patch implements locking on all platforms from Tegra20 to
> Tegra186.
> 
> Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
> ---
>  drivers/gpu/host1x/cdma.c                      |   1 +
>  drivers/gpu/host1x/cdma.h                      |   1 +
>  drivers/gpu/host1x/hw/cdma_hw.c                | 122 +++++++++++++++++++++++++
>  drivers/gpu/host1x/hw/channel_hw.c             |  71 ++++++++++----
>  drivers/gpu/host1x/hw/host1x01_hardware.h      |  10 ++
>  drivers/gpu/host1x/hw/host1x02_hardware.h      |  10 ++
>  drivers/gpu/host1x/hw/host1x04_hardware.h      |  10 ++
>  drivers/gpu/host1x/hw/host1x05_hardware.h      |  10 ++
>  drivers/gpu/host1x/hw/host1x06_hardware.h      |  10 ++
>  drivers/gpu/host1x/hw/hw_host1x01_sync.h       |   6 ++
>  drivers/gpu/host1x/hw/hw_host1x02_sync.h       |   6 ++
>  drivers/gpu/host1x/hw/hw_host1x04_sync.h       |   6 ++
>  drivers/gpu/host1x/hw/hw_host1x05_sync.h       |   6 ++
>  drivers/gpu/host1x/hw/hw_host1x06_hypervisor.h |   5 +
>  14 files changed, 257 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/gpu/host1x/cdma.c b/drivers/gpu/host1x/cdma.c
> index 28541b280739..f787cfe69c11 100644
> --- a/drivers/gpu/host1x/cdma.c
> +++ b/drivers/gpu/host1x/cdma.c
> @@ -232,6 +232,7 @@ static void cdma_start_timer_locked(struct host1x_cdma *cdma,
>  	}
>  
>  	cdma->timeout.client = job->client;
> +	cdma->timeout.class = job->class;
>  	cdma->timeout.syncpt = host1x_syncpt_get(host, job->syncpt_id);
>  	cdma->timeout.syncpt_val = job->syncpt_end;
>  	cdma->timeout.start_ktime = ktime_get();
> diff --git a/drivers/gpu/host1x/cdma.h b/drivers/gpu/host1x/cdma.h
> index 286d49386be9..e72660fc83c9 100644
> --- a/drivers/gpu/host1x/cdma.h
> +++ b/drivers/gpu/host1x/cdma.h
> @@ -59,6 +59,7 @@ struct buffer_timeout {
>  	ktime_t start_ktime;		/* starting time */
>  	/* context timeout information */
>  	int client;
> +	u32 class;
>  };
>  
>  enum cdma_event {
> diff --git a/drivers/gpu/host1x/hw/cdma_hw.c b/drivers/gpu/host1x/hw/cdma_hw.c
> index ce320534cbed..4d5970d863d5 100644
> --- a/drivers/gpu/host1x/hw/cdma_hw.c
> +++ b/drivers/gpu/host1x/hw/cdma_hw.c
> @@ -16,6 +16,7 @@
>   * along with this program.  If not, see <http://www.gnu.org/licenses/>.
>   */
>  
> +#include <linux/iopoll.h>
>  #include <linux/slab.h>
>  #include <linux/scatterlist.h>
>  #include <linux/dma-mapping.h>
> @@ -243,6 +244,125 @@ static void cdma_resume(struct host1x_cdma *cdma, u32 getptr)
>  	cdma_timeout_restart(cdma, getptr);
>  }
>  
> +static int mlock_id_for_class(unsigned int class)
> +{
> +#if HOST1X_HW >= 6
> +	switch (class)
> +	{
> +	case HOST1X_CLASS_HOST1X:
> +		return 0;
> +	case HOST1X_CLASS_VIC:
> +		return 17;

What is the meaning of the returned ID values that you have defined here? Why
should VIC have a different ID on T186?

> +	default:
> +		return -EINVAL;
> +	}
> +#else
> +	switch (class)
> +	{
> +	case HOST1X_CLASS_HOST1X:
> +		return 0;
> +	case HOST1X_CLASS_GR2D:
> +		return 1;
> +	case HOST1X_CLASS_GR2D_SB:
> +		return 2;

Note that we allow switching 2D classes within the same job context, and
currently the job class is somewhat hardcoded to GR2D.

Even though GR2D and GR2D_SB use different register banks, is it okay to
trigger execution of different classes simultaneously? Would the syncpoint
differentiate the classes on the OP_DONE event?

I suppose that MLOCK (the module lock) implies locking the whole module;
wouldn't it make sense to just use the module IDs defined in the TRM?

> +	case HOST1X_CLASS_VIC:
> +		return 3;
> +	case HOST1X_CLASS_GR3D:
> +		return 4;
> +	default:
> +		return -EINVAL;
> +	}
> +#endif
> +}
> +
> +static void timeout_release_mlock(struct host1x_cdma *cdma)
> +{
> +#if HOST1X_HW >= 6
> +	struct host1x_channel *ch = cdma_to_channel(cdma);
> +	struct host1x *host = cdma_to_host1x(cdma);
> +	u32 pb_pos, pb_temp[3], val;
> +	int err, mlock_id;
> +
> +	if (!host->hv_regs)
> +		return;
> +
> +	mlock_id = mlock_id_for_class(cdma->timeout.class);
> +	if (WARN(mlock_id < 0, "Invalid class ID"))
> +		return;
> +
> +	val = host1x_hypervisor_readl(host, HOST1X_HV_MLOCK(mlock_id));
> +	if (!HOST1X_HV_MLOCK_LOCKED_V(val) ||
> +	    HOST1X_HV_MLOCK_CH_V(val) != ch->id)
> +	{
> +		/* Channel is not holding the MLOCK, nothing to release. */
> +		return;
> +	}
> +
> +	/*
> +	 * On Tegra186, there is no register to unlock an MLOCK (don't ask me
> +	 * why). As such, we have to execute a release_mlock instruction to
> +	 * do it. We do this by backing up the first three opcodes of the
> +	 * pushbuffer and replacing them with our own short sequence to do
> +	 * the unlocking. We set the .pos field to 12, which causes DMAEND
> +	 * to be set accordingly such that only the three opcodes we set
> +	 * here are executed before CDMA stops. Finally we restore the value
> +	 * of pos and pushbuffer contents.
> +	 */
> +
> +	pb_pos = cdma->push_buffer.pos;
> +	memcpy(pb_temp, cdma->push_buffer.mapped, ARRAY_SIZE(pb_temp) * 4);
> +
> +	{
> +		u32 *pb = cdma->push_buffer.mapped;
> +		pb[0] = host1x_opcode_acquire_mlock(cdma->timeout.class);
> +		pb[1] = host1x_opcode_setclass(cdma->timeout.class, 0, 0);
> +		pb[2] = host1x_opcode_release_mlock(cdma->timeout.class);
> +	}
> +
> +	/* Flush writecombine buffer */
> +	wmb();
> +
> +	cdma->push_buffer.pos = ARRAY_SIZE(pb_temp) * 4;
> +
> +	cdma_resume(cdma, 0);
> +
> +	/* Wait until the release_mlock opcode has been executed */
> +	err = readl_relaxed_poll_timeout(
> +		host->hv_regs + HOST1X_HV_MLOCK(mlock_id), val,
> +		!HOST1X_HV_MLOCK_LOCKED_V(val) ||
> +			HOST1X_HV_MLOCK_CH_V(val) != ch->id,
> +		10, 10000);
> +	WARN(err, "Failed to unlock mlock %u\n", mlock_id);
> +
> +	cdma_freeze(cdma);
> +
> +	/* Restore original pushbuffer state */
> +	cdma->push_buffer.pos = pb_pos;
> +	memcpy(cdma->push_buffer.mapped, pb_temp, ARRAY_SIZE(pb_temp) * 4);
> +	wmb();
> +#else
> +	struct host1x_channel *ch = cdma_to_channel(cdma);
> +	struct host1x *host = cdma_to_host1x(cdma);
> +	int mlock_id;
> +	u32 val;
> +
> +	mlock_id = mlock_id_for_class(cdma->timeout.class);
> +	if (WARN(mlock_id < 0, "Invalid class ID"))
> +		return;
> +
> +	val = host1x_sync_readl(host, HOST1X_SYNC_MLOCK_OWNER(mlock_id));
> +	if (!HOST1X_SYNC_MLOCK_OWNER_CH_OWNS_V(val) ||
> +	    HOST1X_SYNC_MLOCK_OWNER_CHID_V(val) != ch->id)
> +	{
> +		/* Channel is not holding the MLOCK, nothing to release. */
> +		return;
> +	}
> +
> +	/* Unlock MLOCK */
> +	host1x_sync_writel(host, 0x0, HOST1X_SYNC_MLOCK(mlock_id));
> +#endif
> +}
> +
>  /*
>   * If this timeout fires, it indicates the current sync_queue entry has
>   * exceeded its TTL and the userctx should be timed out and remaining
> @@ -293,6 +413,8 @@ static void cdma_timeout_handler(struct work_struct *work)
>  	/* stop HW, resetting channel/module */
>  	host1x_hw_cdma_freeze(host1x, cdma);
>  
> +	timeout_release_mlock(cdma);
> +
>  	host1x_cdma_update_sync_queue(cdma, ch->dev);
>  	mutex_unlock(&cdma->lock);
>  }
> diff --git a/drivers/gpu/host1x/hw/channel_hw.c b/drivers/gpu/host1x/hw/channel_hw.c
> index 246b78c41281..f80fb8be38e6 100644
> --- a/drivers/gpu/host1x/hw/channel_hw.c
> +++ b/drivers/gpu/host1x/hw/channel_hw.c
> @@ -89,6 +89,34 @@ static inline void synchronize_syncpt_base(struct host1x_job *job)
>  			 HOST1X_UCLASS_LOAD_SYNCPT_BASE_VALUE_F(value));
>  }
>  
> +static void channel_submit_serialize(struct host1x_channel *ch,
> +				     struct host1x_syncpt *sp)
> +{
> +#if HOST1X_HW >= 6
> +	/*
> +	 * On T186, due to a hw issue, we need to issue an mlock acquire
> +	 * for HOST1X class. It is still interpreted as a no-op in
> +	 * hardware.
> +	 */
> +	host1x_cdma_push(&ch->cdma,
> +		host1x_opcode_acquire_mlock(HOST1X_CLASS_HOST1X),
> +		host1x_opcode_setclass(HOST1X_CLASS_HOST1X, 0, 0));
> +	host1x_cdma_push(&ch->cdma,
> +		host1x_opcode_nonincr(host1x_uclass_wait_syncpt_r(), 1),
> +		host1x_class_host_wait_syncpt(host1x_syncpt_id(sp),
> +					      host1x_syncpt_read_max(sp)));
> +	host1x_cdma_push(&ch->cdma,
> +		host1x_opcode_release_mlock(HOST1X_CLASS_HOST1X),
> +		HOST1X_OPCODE_NOP);
> +#else
> +	host1x_cdma_push(&ch->cdma,
> +		host1x_opcode_setclass(HOST1X_CLASS_HOST1X,
> +				       host1x_uclass_wait_syncpt_r(), 1),
> +		host1x_class_host_wait_syncpt(host1x_syncpt_id(sp),
> +					      host1x_syncpt_read_max(sp)));
> +#endif
> +}
> +
>  static int channel_submit(struct host1x_job *job)
>  {
>  	struct host1x_channel *ch = job->channel;
> @@ -96,6 +124,7 @@ static int channel_submit(struct host1x_job *job)
>  	u32 user_syncpt_incrs = job->syncpt_incrs;
>  	u32 prev_max = 0;
>  	u32 syncval;
> +	u32 mlock;
>  	int err;
>  	struct host1x_waitlist *completed_waiter = NULL;
>  	struct host1x *host = dev_get_drvdata(ch->dev->parent);
> @@ -128,17 +157,12 @@ static int channel_submit(struct host1x_job *job)
>  		goto error;
>  	}
>  
> -	if (job->serialize) {
> -		/*
> -		 * Force serialization by inserting a host wait for the
> -		 * previous job to finish before this one can commence.
> -		 */
> -		host1x_cdma_push(&ch->cdma,
> -				 host1x_opcode_setclass(HOST1X_CLASS_HOST1X,
> -					host1x_uclass_wait_syncpt_r(), 1),
> -				 host1x_class_host_wait_syncpt(job->syncpt_id,
> -					host1x_syncpt_read_max(sp)));
> -	}
> +	/*
> +	 * Force serialization by inserting a host wait for the
> +	 * previous job to finish before this one can commence.
> +	 */
> +	if (job->serialize)
> +		channel_submit_serialize(ch, sp);
>  
>  	/* Synchronize base register to allow using it for relative waiting */
>  	if (sp->base)
> @@ -149,16 +173,29 @@ static int channel_submit(struct host1x_job *job)
>  	/* assign syncpoint to channel */
>  	host1x_hw_syncpt_assign_channel(host, sp, ch);
>  
> -	job->syncpt_end = syncval;
> -
> -	/* add a setclass for modules that require it */
> -	if (job->class)
> +	/* acquire MLOCK and set channel class to specified class */
> +	mlock = HOST1X_HW >= 6 ? job->class : mlock_id_for_class(job->class);
> +	if (job->class) {
>  		host1x_cdma_push(&ch->cdma,
> -				 host1x_opcode_setclass(job->class, 0, 0),
> -				 HOST1X_OPCODE_NOP);
> +				 host1x_opcode_acquire_mlock(mlock),
> +				 host1x_opcode_setclass(job->class, 0, 0));
> +	}
>  
>  	submit_gathers(job);
>  
> +	if (job->class) {
> +		/*
> +		 * Push additional increment to catch jobs that crash before
> +		 * finishing their gather, not reaching the unlock opcode.
> +		 */
> +		syncval = host1x_syncpt_incr_max(sp, 1);
> +		host1x_cdma_push(&ch->cdma,
> +			host1x_opcode_imm_incr_syncpt(0, job->syncpt_id),
> +			host1x_opcode_release_mlock(mlock));
> +	}
> +
> +	job->syncpt_end = syncval;
> +
>  	/* end CDMA submit & stash pinned hMems into sync queue */
>  	host1x_cdma_end(&ch->cdma, job);
>  
> diff --git a/drivers/gpu/host1x/hw/host1x01_hardware.h b/drivers/gpu/host1x/hw/host1x01_hardware.h
> index 5f0fb866efa8..6a2c9f905acc 100644
> --- a/drivers/gpu/host1x/hw/host1x01_hardware.h
> +++ b/drivers/gpu/host1x/hw/host1x01_hardware.h
> @@ -138,6 +138,16 @@ static inline u32 host1x_opcode_gather_incr(unsigned offset, unsigned count)
>  	return (6 << 28) | (offset << 16) | BIT(15) | BIT(14) | count;
>  }
>  
> +static inline u32 host1x_opcode_acquire_mlock(unsigned id)
> +{
> +	return (14 << 28) | id;
> +}
> +
> +static inline u32 host1x_opcode_release_mlock(unsigned id)
> +{
> +	return (14 << 28) | (1 << 24) | id;
> +}
> +
>  #define HOST1X_OPCODE_NOP host1x_opcode_nonincr(0, 0)
>  
>  #endif
> diff --git a/drivers/gpu/host1x/hw/host1x02_hardware.h b/drivers/gpu/host1x/hw/host1x02_hardware.h
> index 154901860bc6..c524c6c8d82f 100644
> --- a/drivers/gpu/host1x/hw/host1x02_hardware.h
> +++ b/drivers/gpu/host1x/hw/host1x02_hardware.h
> @@ -137,6 +137,16 @@ static inline u32 host1x_opcode_gather_incr(unsigned offset, unsigned count)
>  	return (6 << 28) | (offset << 16) | BIT(15) | BIT(14) | count;
>  }
>  
> +static inline u32 host1x_opcode_acquire_mlock(unsigned id)
> +{
> +	return (14 << 28) | id;
> +}
> +
> +static inline u32 host1x_opcode_release_mlock(unsigned id)
> +{
> +	return (14 << 28) | (1 << 24) | id;
> +}
> +
>  #define HOST1X_OPCODE_NOP host1x_opcode_nonincr(0, 0)
>  
>  #endif
> diff --git a/drivers/gpu/host1x/hw/host1x04_hardware.h b/drivers/gpu/host1x/hw/host1x04_hardware.h
> index de1a38175328..8dedd77b7a1b 100644
> --- a/drivers/gpu/host1x/hw/host1x04_hardware.h
> +++ b/drivers/gpu/host1x/hw/host1x04_hardware.h
> @@ -137,6 +137,16 @@ static inline u32 host1x_opcode_gather_incr(unsigned offset, unsigned count)
>  	return (6 << 28) | (offset << 16) | BIT(15) | BIT(14) | count;
>  }
>  
> +static inline u32 host1x_opcode_acquire_mlock(unsigned id)
> +{
> +	return (14 << 28) | id;
> +}
> +
> +static inline u32 host1x_opcode_release_mlock(unsigned id)
> +{
> +	return (14 << 28) | (1 << 24) | id;
> +}
> +
>  #define HOST1X_OPCODE_NOP host1x_opcode_nonincr(0, 0)
>  
>  #endif
> diff --git a/drivers/gpu/host1x/hw/host1x05_hardware.h b/drivers/gpu/host1x/hw/host1x05_hardware.h
> index 2937ebb6be11..6aafcb5e97e6 100644
> --- a/drivers/gpu/host1x/hw/host1x05_hardware.h
> +++ b/drivers/gpu/host1x/hw/host1x05_hardware.h
> @@ -137,6 +137,16 @@ static inline u32 host1x_opcode_gather_incr(unsigned offset, unsigned count)
>  	return (6 << 28) | (offset << 16) | BIT(15) | BIT(14) | count;
>  }
>  
> +static inline u32 host1x_opcode_acquire_mlock(unsigned id)
> +{
> +	return (14 << 28) | id;
> +}
> +
> +static inline u32 host1x_opcode_release_mlock(unsigned id)
> +{
> +	return (14 << 28) | (1 << 24) | id;
> +}
> +
>  #define HOST1X_OPCODE_NOP host1x_opcode_nonincr(0, 0)
>  
>  #endif
> diff --git a/drivers/gpu/host1x/hw/host1x06_hardware.h b/drivers/gpu/host1x/hw/host1x06_hardware.h
> index 3039c92ea605..60147d26ad9b 100644
> --- a/drivers/gpu/host1x/hw/host1x06_hardware.h
> +++ b/drivers/gpu/host1x/hw/host1x06_hardware.h
> @@ -137,6 +137,16 @@ static inline u32 host1x_opcode_gather_incr(unsigned offset, unsigned count)
>  	return (6 << 28) | (offset << 16) | BIT(15) | BIT(14) | count;
>  }
>  
> +static inline u32 host1x_opcode_acquire_mlock(unsigned id)
> +{
> +	return (14 << 28) | id;
> +}
> +
> +static inline u32 host1x_opcode_release_mlock(unsigned id)
> +{
> +	return (14 << 28) | (1 << 24) | id;
> +}
> +
>  #define HOST1X_OPCODE_NOP host1x_opcode_nonincr(0, 0)
>  
>  #endif
> diff --git a/drivers/gpu/host1x/hw/hw_host1x01_sync.h b/drivers/gpu/host1x/hw/hw_host1x01_sync.h
> index 31238c285d46..4630fec2237a 100644
> --- a/drivers/gpu/host1x/hw/hw_host1x01_sync.h
> +++ b/drivers/gpu/host1x/hw/hw_host1x01_sync.h
> @@ -125,6 +125,12 @@ static inline u32 host1x_sync_ip_busy_timeout_r(void)
>  }
>  #define HOST1X_SYNC_IP_BUSY_TIMEOUT \
>  	host1x_sync_ip_busy_timeout_r()
> +static inline u32 host1x_sync_mlock_r(unsigned int id)
> +{
> +	return 0x2c0 + id * REGISTER_STRIDE;
> +}
> +#define HOST1X_SYNC_MLOCK(id) \
> +	host1x_sync_mlock_r(id)
>  static inline u32 host1x_sync_mlock_owner_r(unsigned int id)
>  {
>  	return 0x340 + id * REGISTER_STRIDE;
> diff --git a/drivers/gpu/host1x/hw/hw_host1x02_sync.h b/drivers/gpu/host1x/hw/hw_host1x02_sync.h
> index 540c7b65995f..dec8e812a114 100644
> --- a/drivers/gpu/host1x/hw/hw_host1x02_sync.h
> +++ b/drivers/gpu/host1x/hw/hw_host1x02_sync.h
> @@ -125,6 +125,12 @@ static inline u32 host1x_sync_ip_busy_timeout_r(void)
>  }
>  #define HOST1X_SYNC_IP_BUSY_TIMEOUT \
>  	host1x_sync_ip_busy_timeout_r()
> +static inline u32 host1x_sync_mlock_r(unsigned int id)
> +{
> +	return 0x2c0 + id * REGISTER_STRIDE;
> +}
> +#define HOST1X_SYNC_MLOCK(id) \
> +	host1x_sync_mlock_r(id)
>  static inline u32 host1x_sync_mlock_owner_r(unsigned int id)
>  {
>  	return 0x340 + id * REGISTER_STRIDE;
> diff --git a/drivers/gpu/host1x/hw/hw_host1x04_sync.h b/drivers/gpu/host1x/hw/hw_host1x04_sync.h
> index 3d6c8ec65934..9a951a4db07f 100644
> --- a/drivers/gpu/host1x/hw/hw_host1x04_sync.h
> +++ b/drivers/gpu/host1x/hw/hw_host1x04_sync.h
> @@ -125,6 +125,12 @@ static inline u32 host1x_sync_ip_busy_timeout_r(void)
>  }
>  #define HOST1X_SYNC_IP_BUSY_TIMEOUT \
>  	host1x_sync_ip_busy_timeout_r()
> +static inline u32 host1x_sync_mlock_r(unsigned int id)
> +{
> +	return 0x2c0 + id * REGISTER_STRIDE;
> +}
> +#define HOST1X_SYNC_MLOCK(id) \
> +	host1x_sync_mlock_r(id)
>  static inline u32 host1x_sync_mlock_owner_r(unsigned int id)
>  {
>  	return 0x340 + id * REGISTER_STRIDE;
> diff --git a/drivers/gpu/host1x/hw/hw_host1x05_sync.h b/drivers/gpu/host1x/hw/hw_host1x05_sync.h
> index ca10eee5045c..e29e2e19a60b 100644
> --- a/drivers/gpu/host1x/hw/hw_host1x05_sync.h
> +++ b/drivers/gpu/host1x/hw/hw_host1x05_sync.h
> @@ -125,6 +125,12 @@ static inline u32 host1x_sync_ip_busy_timeout_r(void)
>  }
>  #define HOST1X_SYNC_IP_BUSY_TIMEOUT \
>  	host1x_sync_ip_busy_timeout_r()
> +static inline u32 host1x_sync_mlock_r(unsigned int id)
> +{
> +	return 0x2c0 + id * REGISTER_STRIDE;
> +}
> +#define HOST1X_SYNC_MLOCK(id) \
> +	host1x_sync_mlock_r(id)
>  static inline u32 host1x_sync_mlock_owner_r(unsigned int id)
>  {
>  	return 0x340 + id * REGISTER_STRIDE;
> diff --git a/drivers/gpu/host1x/hw/hw_host1x06_hypervisor.h b/drivers/gpu/host1x/hw/hw_host1x06_hypervisor.h
> index c05dab8a178b..be73530c3585 100644
> --- a/drivers/gpu/host1x/hw/hw_host1x06_hypervisor.h
> +++ b/drivers/gpu/host1x/hw/hw_host1x06_hypervisor.h
> @@ -18,6 +18,11 @@
>  #define HOST1X_HV_SYNCPT_PROT_EN			0x1ac4
>  #define HOST1X_HV_SYNCPT_PROT_EN_CH_EN			BIT(1)
>  #define HOST1X_HV_CH_KERNEL_FILTER_GBUFFER(x)		(0x2020 + (x * 4))
> +#define HOST1X_HV_MLOCK(x)				(0x2030 + (x * 4))
> +#define HOST1X_HV_MLOCK_CH(x)				(((x) & 0x3f) << 2)
> +#define HOST1X_HV_MLOCK_CH_V(x)				(((x) >> 2) & 0x3f)
> +#define HOST1X_HV_MLOCK_LOCKED				BIT(0)
> +#define HOST1X_HV_MLOCK_LOCKED_V(x)			((x) & 0x1)
>  #define HOST1X_HV_CMDFIFO_PEEK_CTRL			0x233c
>  #define HOST1X_HV_CMDFIFO_PEEK_CTRL_ADDR(x)		(x)
>  #define HOST1X_HV_CMDFIFO_PEEK_CTRL_CHANNEL(x)		((x) << 16)
> 

^ permalink raw reply	[flat|nested] 52+ messages in thread
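
The opcode and register-field encodings that the patch repeats across the per-version `host1x0x_hardware.h` headers can be checked numerically. This standalone sketch copies the encodings straight from the hunks above: opcode 14 in bits 31:28 with bit 24 clear/set selecting acquire/release, and the Tegra186 `HOST1X_HV_MLOCK` register packing the locked bit in bit 0 and the owning channel in bits 7:2:

```c
#include <assert.h>
#include <stdint.h>

/* From the host1x0x_hardware.h hunks: opcode 14 = mlock op,
 * bit 24 clear = acquire, bit 24 set = release. */
static inline uint32_t host1x_opcode_acquire_mlock(unsigned int id)
{
	return (14u << 28) | id;
}

static inline uint32_t host1x_opcode_release_mlock(unsigned int id)
{
	return (14u << 28) | (1u << 24) | id;
}

/* From hw_host1x06_hypervisor.h: bit 0 = locked,
 * bits 7:2 = ID of the channel holding the lock. */
#define HOST1X_HV_MLOCK_CH(x)		(((x) & 0x3f) << 2)
#define HOST1X_HV_MLOCK_CH_V(x)		(((x) >> 2) & 0x3f)
#define HOST1X_HV_MLOCK_LOCKED		(1u << 0)
#define HOST1X_HV_MLOCK_LOCKED_V(x)	((x) & 0x1)
```

For example, with the T186 MLOCK ID 17 that the patch assigns to VIC, the acquire opcode comes out as 0xe0000011 and the release as 0xe1000011, which is the kind of pair the timeout handler's three-opcode pushbuffer sequence emits.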

>  
>  #endif
> diff --git a/drivers/gpu/host1x/hw/host1x05_hardware.h b/drivers/gpu/host1x/hw/host1x05_hardware.h
> index 2937ebb6be11..6aafcb5e97e6 100644
> --- a/drivers/gpu/host1x/hw/host1x05_hardware.h
> +++ b/drivers/gpu/host1x/hw/host1x05_hardware.h
> @@ -137,6 +137,16 @@ static inline u32 host1x_opcode_gather_incr(unsigned offset, unsigned count)
>  	return (6 << 28) | (offset << 16) | BIT(15) | BIT(14) | count;
>  }
>  
> +static inline u32 host1x_opcode_acquire_mlock(unsigned id)
> +{
> +	return (14 << 28) | id;
> +}
> +
> +static inline u32 host1x_opcode_release_mlock(unsigned id)
> +{
> +	return (14 << 28) | (1 << 24) | id;
> +}
> +
>  #define HOST1X_OPCODE_NOP host1x_opcode_nonincr(0, 0)
>  
>  #endif
> diff --git a/drivers/gpu/host1x/hw/host1x06_hardware.h b/drivers/gpu/host1x/hw/host1x06_hardware.h
> index 3039c92ea605..60147d26ad9b 100644
> --- a/drivers/gpu/host1x/hw/host1x06_hardware.h
> +++ b/drivers/gpu/host1x/hw/host1x06_hardware.h
> @@ -137,6 +137,16 @@ static inline u32 host1x_opcode_gather_incr(unsigned offset, unsigned count)
>  	return (6 << 28) | (offset << 16) | BIT(15) | BIT(14) | count;
>  }
>  
> +static inline u32 host1x_opcode_acquire_mlock(unsigned id)
> +{
> +	return (14 << 28) | id;
> +}
> +
> +static inline u32 host1x_opcode_release_mlock(unsigned id)
> +{
> +	return (14 << 28) | (1 << 24) | id;
> +}
> +
>  #define HOST1X_OPCODE_NOP host1x_opcode_nonincr(0, 0)
>  
>  #endif
> diff --git a/drivers/gpu/host1x/hw/hw_host1x01_sync.h b/drivers/gpu/host1x/hw/hw_host1x01_sync.h
> index 31238c285d46..4630fec2237a 100644
> --- a/drivers/gpu/host1x/hw/hw_host1x01_sync.h
> +++ b/drivers/gpu/host1x/hw/hw_host1x01_sync.h
> @@ -125,6 +125,12 @@ static inline u32 host1x_sync_ip_busy_timeout_r(void)
>  }
>  #define HOST1X_SYNC_IP_BUSY_TIMEOUT \
>  	host1x_sync_ip_busy_timeout_r()
> +static inline u32 host1x_sync_mlock_r(unsigned int id)
> +{
> +	return 0x2c0 + id * REGISTER_STRIDE;
> +}
> +#define HOST1X_SYNC_MLOCK(id) \
> +	host1x_sync_mlock_r(id)
>  static inline u32 host1x_sync_mlock_owner_r(unsigned int id)
>  {
>  	return 0x340 + id * REGISTER_STRIDE;
> diff --git a/drivers/gpu/host1x/hw/hw_host1x02_sync.h b/drivers/gpu/host1x/hw/hw_host1x02_sync.h
> index 540c7b65995f..dec8e812a114 100644
> --- a/drivers/gpu/host1x/hw/hw_host1x02_sync.h
> +++ b/drivers/gpu/host1x/hw/hw_host1x02_sync.h
> @@ -125,6 +125,12 @@ static inline u32 host1x_sync_ip_busy_timeout_r(void)
>  }
>  #define HOST1X_SYNC_IP_BUSY_TIMEOUT \
>  	host1x_sync_ip_busy_timeout_r()
> +static inline u32 host1x_sync_mlock_r(unsigned int id)
> +{
> +	return 0x2c0 + id * REGISTER_STRIDE;
> +}
> +#define HOST1X_SYNC_MLOCK(id) \
> +	host1x_sync_mlock_r(id)
>  static inline u32 host1x_sync_mlock_owner_r(unsigned int id)
>  {
>  	return 0x340 + id * REGISTER_STRIDE;
> diff --git a/drivers/gpu/host1x/hw/hw_host1x04_sync.h b/drivers/gpu/host1x/hw/hw_host1x04_sync.h
> index 3d6c8ec65934..9a951a4db07f 100644
> --- a/drivers/gpu/host1x/hw/hw_host1x04_sync.h
> +++ b/drivers/gpu/host1x/hw/hw_host1x04_sync.h
> @@ -125,6 +125,12 @@ static inline u32 host1x_sync_ip_busy_timeout_r(void)
>  }
>  #define HOST1X_SYNC_IP_BUSY_TIMEOUT \
>  	host1x_sync_ip_busy_timeout_r()
> +static inline u32 host1x_sync_mlock_r(unsigned int id)
> +{
> +	return 0x2c0 + id * REGISTER_STRIDE;
> +}
> +#define HOST1X_SYNC_MLOCK(id) \
> +	host1x_sync_mlock_r(id)
>  static inline u32 host1x_sync_mlock_owner_r(unsigned int id)
>  {
>  	return 0x340 + id * REGISTER_STRIDE;
> diff --git a/drivers/gpu/host1x/hw/hw_host1x05_sync.h b/drivers/gpu/host1x/hw/hw_host1x05_sync.h
> index ca10eee5045c..e29e2e19a60b 100644
> --- a/drivers/gpu/host1x/hw/hw_host1x05_sync.h
> +++ b/drivers/gpu/host1x/hw/hw_host1x05_sync.h
> @@ -125,6 +125,12 @@ static inline u32 host1x_sync_ip_busy_timeout_r(void)
>  }
>  #define HOST1X_SYNC_IP_BUSY_TIMEOUT \
>  	host1x_sync_ip_busy_timeout_r()
> +static inline u32 host1x_sync_mlock_r(unsigned int id)
> +{
> +	return 0x2c0 + id * REGISTER_STRIDE;
> +}
> +#define HOST1X_SYNC_MLOCK(id) \
> +	host1x_sync_mlock_r(id)
>  static inline u32 host1x_sync_mlock_owner_r(unsigned int id)
>  {
>  	return 0x340 + id * REGISTER_STRIDE;
> diff --git a/drivers/gpu/host1x/hw/hw_host1x06_hypervisor.h b/drivers/gpu/host1x/hw/hw_host1x06_hypervisor.h
> index c05dab8a178b..be73530c3585 100644
> --- a/drivers/gpu/host1x/hw/hw_host1x06_hypervisor.h
> +++ b/drivers/gpu/host1x/hw/hw_host1x06_hypervisor.h
> @@ -18,6 +18,11 @@
>  #define HOST1X_HV_SYNCPT_PROT_EN			0x1ac4
>  #define HOST1X_HV_SYNCPT_PROT_EN_CH_EN			BIT(1)
>  #define HOST1X_HV_CH_KERNEL_FILTER_GBUFFER(x)		(0x2020 + (x * 4))
> +#define HOST1X_HV_MLOCK(x)				(0x2030 + (x * 4))
> +#define HOST1X_HV_MLOCK_CH(x)				(((x) & 0x3f) << 2)
> +#define HOST1X_HV_MLOCK_CH_V(x)				(((x) >> 2) & 0x3f)
> +#define HOST1X_HV_MLOCK_LOCKED				BIT(0)
> +#define HOST1X_HV_MLOCK_LOCKED_V(x)			((x) & 0x1)
>  #define HOST1X_HV_CMDFIFO_PEEK_CTRL			0x233c
>  #define HOST1X_HV_CMDFIFO_PEEK_CTRL_ADDR(x)		(x)
>  #define HOST1X_HV_CMDFIFO_PEEK_CTRL_CHANNEL(x)		((x) << 16)
> 

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 10/10] gpu: host1x: Optionally block when acquiring channel
  2017-11-05 11:01   ` Mikko Perttunen
@ 2017-11-05 17:14       ` Dmitry Osipenko
  -1 siblings, 0 replies; 52+ messages in thread
From: Dmitry Osipenko @ 2017-11-05 17:14 UTC (permalink / raw)
  To: Mikko Perttunen, thierry.reding-Re5JQEeQqe8AvxtiuMwx3w,
	jonathanh-DDmLM1+adcrQT0dZR+AlfA
  Cc: dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On 05.11.2017 14:01, Mikko Perttunen wrote:
> Add an option to host1x_channel_request to interruptibly wait for a
> free channel. This allows IOCTLs that acquire a channel to block
> the userspace.
> 

Wouldn't it be more optimal to request the channel and block after the job's
pinning, once all patching and checks are completed? Note that right now we have
locking around submission in DRM, which I suppose should go away by making the
locking fine-grained.

Or maybe it would be more optimal to just iterate over channels, like I
suggested before [0]?

[0]
https://github.com/cyndis/linux/commit/9e6d87f40afb01fbe13ba65c73cb617bdfcd80b2#commitcomment-25012960

> Signed-off-by: Mikko Perttunen <mperttunen-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
> ---
>  drivers/gpu/drm/tegra/drm.c  |  9 +++++----
>  drivers/gpu/drm/tegra/gr2d.c |  6 +++---
>  drivers/gpu/drm/tegra/gr3d.c |  6 +++---
>  drivers/gpu/host1x/channel.c | 40 ++++++++++++++++++++++++++++++----------
>  drivers/gpu/host1x/channel.h |  1 +
>  include/linux/host1x.h       |  2 +-
>  6 files changed, 43 insertions(+), 21 deletions(-)
> 
> diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
> index 658bc8814f38..19f77c1a76c0 100644
> --- a/drivers/gpu/drm/tegra/drm.c
> +++ b/drivers/gpu/drm/tegra/drm.c
> @@ -389,7 +389,8 @@ static int host1x_waitchk_copy_from_user(struct host1x_waitchk *dest,
>   * Request a free hardware host1x channel for this user context, or if the
>   * context already has one, bump its refcount.
>   *
> - * Returns 0 on success, or -EBUSY if there were no free hardware channels.
> + * Returns 0 on success, -EINTR if wait for a free channel was interrupted,
> + * or other error.
>   */
>  int tegra_drm_context_get_channel(struct tegra_drm_context *context)
>  {
> @@ -398,10 +399,10 @@ int tegra_drm_context_get_channel(struct tegra_drm_context *context)
>  	mutex_lock(&context->lock);
>  
>  	if (context->pending_jobs == 0) {
> -		context->channel = host1x_channel_request(client->dev);
> -		if (!context->channel) {
> +		context->channel = host1x_channel_request(client->dev, true);
> +		if (IS_ERR(context->channel)) {
>  			mutex_unlock(&context->lock);
> -			return -EBUSY;
> +			return PTR_ERR(context->channel);
>  		}
>  	}
>  
> diff --git a/drivers/gpu/drm/tegra/gr2d.c b/drivers/gpu/drm/tegra/gr2d.c
> index 3db3bcac48b9..c1853402f69b 100644
> --- a/drivers/gpu/drm/tegra/gr2d.c
> +++ b/drivers/gpu/drm/tegra/gr2d.c
> @@ -32,9 +32,9 @@ static int gr2d_init(struct host1x_client *client)
>  	unsigned long flags = HOST1X_SYNCPT_HAS_BASE;
>  	struct gr2d *gr2d = to_gr2d(drm);
>  
> -	gr2d->channel = host1x_channel_request(client->dev);
> -	if (!gr2d->channel)
> -		return -ENOMEM;
> +	gr2d->channel = host1x_channel_request(client->dev, false);
> +	if (IS_ERR(gr2d->channel))
> +		return PTR_ERR(gr2d->channel);
>  
>  	client->syncpts[0] = host1x_syncpt_request(client->dev, flags);
>  	if (!client->syncpts[0]) {
> diff --git a/drivers/gpu/drm/tegra/gr3d.c b/drivers/gpu/drm/tegra/gr3d.c
> index 279438342c8c..793a91d577cb 100644
> --- a/drivers/gpu/drm/tegra/gr3d.c
> +++ b/drivers/gpu/drm/tegra/gr3d.c
> @@ -42,9 +42,9 @@ static int gr3d_init(struct host1x_client *client)
>  	unsigned long flags = HOST1X_SYNCPT_HAS_BASE;
>  	struct gr3d *gr3d = to_gr3d(drm);
>  
> -	gr3d->channel = host1x_channel_request(client->dev);
> -	if (!gr3d->channel)
> -		return -ENOMEM;
> +	gr3d->channel = host1x_channel_request(client->dev, false);
> +	if (IS_ERR(gr3d->channel))
> +		return PTR_ERR(gr3d->channel);
>  
>  	client->syncpts[0] = host1x_syncpt_request(client->dev, flags);
>  	if (!client->syncpts[0]) {
> diff --git a/drivers/gpu/host1x/channel.c b/drivers/gpu/host1x/channel.c
> index 9d8cad12f9d8..eebcd51261df 100644
> --- a/drivers/gpu/host1x/channel.c
> +++ b/drivers/gpu/host1x/channel.c
> @@ -43,6 +43,7 @@ int host1x_channel_list_init(struct host1x_channel_list *chlist,
>  	bitmap_zero(chlist->allocated_channels, num_channels);
>  
>  	mutex_init(&chlist->lock);
> +	sema_init(&chlist->sema, num_channels);
>  
>  	return 0;
>  }
> @@ -99,6 +100,8 @@ static void release_channel(struct kref *kref)
>  	host1x_cdma_deinit(&channel->cdma);
>  
>  	clear_bit(channel->id, chlist->allocated_channels);
> +
> +	up(&chlist->sema);
>  }
>  
>  void host1x_channel_put(struct host1x_channel *channel)
> @@ -107,19 +110,30 @@ void host1x_channel_put(struct host1x_channel *channel)
>  }
>  EXPORT_SYMBOL(host1x_channel_put);
>  
> -static struct host1x_channel *acquire_unused_channel(struct host1x *host)
> +static struct host1x_channel *acquire_unused_channel(struct host1x *host,
> +						     bool wait)
>  {
>  	struct host1x_channel_list *chlist = &host->channel_list;
>  	unsigned int max_channels = host->info->nb_channels;
>  	unsigned int index;
> +	int err;
> +
> +	if (wait) {
> +		err = down_interruptible(&chlist->sema);
> +		if (err)
> +			return ERR_PTR(err);
> +	} else {
> +		if (down_trylock(&chlist->sema))
> +			return ERR_PTR(-EBUSY);
> +	}
>  
>  	mutex_lock(&chlist->lock);
>  
>  	index = find_first_zero_bit(chlist->allocated_channels, max_channels);
> -	if (index >= max_channels) {
> +	if (WARN(index >= max_channels, "failed to find free channel")) {
>  		mutex_unlock(&chlist->lock);
>  		dev_err(host->dev, "failed to find free channel\n");
> -		return NULL;
> +		return ERR_PTR(-EBUSY);
>  	}
>  
>  	chlist->channels[index].id = index;
> @@ -134,20 +148,26 @@ static struct host1x_channel *acquire_unused_channel(struct host1x *host)
>  /**
>   * host1x_channel_request() - Allocate a channel
>   * @device: Host1x unit this channel will be used to send commands to
> + * @wait: Whether to wait for a free channel if all are reserved
> + *
> + * Allocates a new host1x channel for @device. If all channels are in use,
> + * and @wait is true, does an interruptible wait until one is available.
>   *
> - * Allocates a new host1x channel for @device. May return NULL if CDMA
> - * initialization fails.
> + * If a channel was acquired, returns a pointer to it. Otherwise returns
> + * an error pointer with -EINTR if the wait was interrupted, -EBUSY
> + * if a channel could not be acquired or another error code if channel
> + * initialization failed.
>   */
> -struct host1x_channel *host1x_channel_request(struct device *dev)
> +struct host1x_channel *host1x_channel_request(struct device *dev, bool wait)
>  {
>  	struct host1x *host = dev_get_drvdata(dev->parent);
>  	struct host1x_channel_list *chlist = &host->channel_list;
>  	struct host1x_channel *channel;
>  	int err;
>  
> -	channel = acquire_unused_channel(host);
> -	if (!channel)
> -		return NULL;
> +	channel = acquire_unused_channel(host, wait);
> +	if (IS_ERR(channel))
> +		return channel;
>  
>  	kref_init(&channel->refcount);
>  	mutex_init(&channel->submitlock);
> @@ -168,6 +188,6 @@ struct host1x_channel *host1x_channel_request(struct device *dev)
>  
>  	dev_err(dev, "failed to initialize channel\n");
>  
> -	return NULL;
> +	return ERR_PTR(err);
>  }
>  EXPORT_SYMBOL(host1x_channel_request);
> diff --git a/drivers/gpu/host1x/channel.h b/drivers/gpu/host1x/channel.h
> index e68a8ae9a670..1f5cf8029b62 100644
> --- a/drivers/gpu/host1x/channel.h
> +++ b/drivers/gpu/host1x/channel.h
> @@ -31,6 +31,7 @@ struct host1x_channel_list {
>  	struct host1x_channel *channels;
>  
>  	struct mutex lock;
> +	struct semaphore sema;
>  	unsigned long *allocated_channels;
>  };
>  
> diff --git a/include/linux/host1x.h b/include/linux/host1x.h
> index f931d28a68ff..2a34905d4408 100644
> --- a/include/linux/host1x.h
> +++ b/include/linux/host1x.h
> @@ -171,7 +171,7 @@ u32 host1x_syncpt_base_id(struct host1x_syncpt_base *base);
>  struct host1x_channel;
>  struct host1x_job;
>  
> -struct host1x_channel *host1x_channel_request(struct device *dev);
> +struct host1x_channel *host1x_channel_request(struct device *dev, bool wait);
>  struct host1x_channel *host1x_channel_get(struct host1x_channel *channel);
>  void host1x_channel_put(struct host1x_channel *channel);
>  int host1x_job_submit(struct host1x_job *job);
> 

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 08/10] drm/tegra: Implement dynamic channel allocation model
  2017-11-05 11:01 ` [PATCH 08/10] drm/tegra: Implement dynamic channel allocation model Mikko Perttunen
@ 2017-11-05 17:43       ` Dmitry Osipenko
  0 siblings, 0 replies; 52+ messages in thread
From: Dmitry Osipenko @ 2017-11-05 17:43 UTC (permalink / raw)
  To: Mikko Perttunen, thierry.reding-Re5JQEeQqe8AvxtiuMwx3w,
	jonathanh-DDmLM1+adcrQT0dZR+AlfA
  Cc: dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On 05.11.2017 14:01, Mikko Perttunen wrote:
> In the traditional channel allocation model, a single hardware channel
> was allocated for each client. This is simple from an implementation
> perspective but prevents use of hardware scheduling.
> 
> This patch implements a channel allocation model where when a user
> submits a job for a context, a hardware channel is allocated for
> that context. The same channel is kept for as long as there are
> incomplete jobs for that context. This way we can use hardware
> scheduling and channel isolation between userspace processes, but
> also prevent idling contexts from taking up hardware resources.
> 

Dynamic allocation of channel resources (pushbuffer) is very expensive,
negating all the benefits this model should bring, at least in the non-IOMMU
case. We could statically preallocate the channel resources or defer freeing them.

> For now, this patch only adapts VIC to the new model.
> 

I think VIC's conversion should be a distinct patch.

> Signed-off-by: Mikko Perttunen <mperttunen-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
> ---
>  drivers/gpu/drm/tegra/drm.c | 46 ++++++++++++++++++++++++++
>  drivers/gpu/drm/tegra/drm.h |  7 +++-
>  drivers/gpu/drm/tegra/vic.c | 79 +++++++++++++++++++++++----------------------
>  3 files changed, 92 insertions(+), 40 deletions(-)
> 
> diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
> index b964e18e3058..658bc8814f38 100644
> --- a/drivers/gpu/drm/tegra/drm.c
> +++ b/drivers/gpu/drm/tegra/drm.c
> @@ -382,6 +382,51 @@ static int host1x_waitchk_copy_from_user(struct host1x_waitchk *dest,
>  	return 0;
>  }
>  
> +/**
> + * tegra_drm_context_get_channel() - Get a channel for submissions
> + * @context: Context for which to get a channel for
> + *
> + * Request a free hardware host1x channel for this user context, or if the
> + * context already has one, bump its refcount.
> + *
> + * Returns 0 on success, or -EBUSY if there were no free hardware channels.
> + */
> +int tegra_drm_context_get_channel(struct tegra_drm_context *context)
> +{
> +	struct host1x_client *client = &context->client->base;
> +
> +	mutex_lock(&context->lock);
> +
> +	if (context->pending_jobs == 0) {
> +		context->channel = host1x_channel_request(client->dev);
> +		if (!context->channel) {
> +			mutex_unlock(&context->lock);
> +			return -EBUSY;
> +		}
> +	}
> +
> +	context->pending_jobs++;
> +
> +	mutex_unlock(&context->lock);
> +
> +	return 0;
> +}
> +
> +/**
> + * tegra_drm_context_put_channel() - Put a previously gotten channel
> + * @context: Context which channel is no longer needed
> + *
> + * Decrease the refcount of the channel associated with this context,
> + * freeing it if the refcount drops to zero.
> + */
> +void tegra_drm_context_put_channel(struct tegra_drm_context *context)
> +{
> +	mutex_lock(&context->lock);
> +	if (--context->pending_jobs == 0)
> +		host1x_channel_put(context->channel);
> +	mutex_unlock(&context->lock);
> +}
> +
>  static void tegra_drm_job_done(struct host1x_job *job)
>  {
>  	struct tegra_drm_context *context = job->callback_data;
> @@ -737,6 +782,7 @@ static int tegra_open_channel(struct drm_device *drm, void *data,
>  		kfree(context);
>  
>  	kref_init(&context->ref);
> +	mutex_init(&context->lock);
>  
>  	mutex_unlock(&fpriv->lock);
>  	return err;
> diff --git a/drivers/gpu/drm/tegra/drm.h b/drivers/gpu/drm/tegra/drm.h
> index 11d690846fd0..d0c3f1f779f6 100644
> --- a/drivers/gpu/drm/tegra/drm.h
> +++ b/drivers/gpu/drm/tegra/drm.h
> @@ -78,9 +78,12 @@ struct tegra_drm_context {
>  	struct kref ref;
>  
>  	struct tegra_drm_client *client;
> +	unsigned int id;
> +
> +	struct mutex lock;
>  	struct host1x_channel *channel;
>  	struct host1x_syncpt *syncpt;
> -	unsigned int id;
> +	unsigned int pending_jobs;
>  };
>  
>  struct tegra_drm_client_ops {
> @@ -95,6 +98,8 @@ struct tegra_drm_client_ops {
>  	void (*submit_done)(struct tegra_drm_context *context);
>  };
>  
> +int tegra_drm_context_get_channel(struct tegra_drm_context *context);
> +void tegra_drm_context_put_channel(struct tegra_drm_context *context);
>  int tegra_drm_submit(struct tegra_drm_context *context,
>  		     struct drm_tegra_submit *args, struct drm_device *drm,
>  		     struct drm_file *file);
> diff --git a/drivers/gpu/drm/tegra/vic.c b/drivers/gpu/drm/tegra/vic.c
> index efe5f3af933e..0cacf023a890 100644
> --- a/drivers/gpu/drm/tegra/vic.c
> +++ b/drivers/gpu/drm/tegra/vic.c
> @@ -33,7 +33,6 @@ struct vic {
>  
>  	void __iomem *regs;
>  	struct tegra_drm_client client;
> -	struct host1x_channel *channel;
>  	struct iommu_domain *domain;
>  	struct device *dev;
>  	struct clk *clk;
> @@ -161,28 +160,12 @@ static int vic_init(struct host1x_client *client)
>  			goto detach_device;
>  	}
>  
> -	vic->channel = host1x_channel_request(client->dev);
> -	if (!vic->channel) {
> -		err = -ENOMEM;
> -		goto detach_device;
> -	}
> -
> -	client->syncpts[0] = host1x_syncpt_request(client->dev, 0);
> -	if (!client->syncpts[0]) {
> -		err = -ENOMEM;
> -		goto free_channel;
> -	}
> -
>  	err = tegra_drm_register_client(tegra, drm);
>  	if (err < 0)
> -		goto free_syncpt;
> +		goto detach_device;
>  
>  	return 0;
>  
> -free_syncpt:
> -	host1x_syncpt_free(client->syncpts[0]);
> -free_channel:
> -	host1x_channel_put(vic->channel);
>  detach_device:
>  	if (tegra->domain)
>  		iommu_detach_device(tegra->domain, vic->dev);
> @@ -202,9 +185,6 @@ static int vic_exit(struct host1x_client *client)
>  	if (err < 0)
>  		return err;
>  
> -	host1x_syncpt_free(client->syncpts[0]);
> -	host1x_channel_put(vic->channel);
> -
>  	if (vic->domain) {
>  		iommu_detach_device(vic->domain, vic->dev);
>  		vic->domain = NULL;
> @@ -221,7 +201,24 @@ static const struct host1x_client_ops vic_client_ops = {
>  static int vic_open_channel(struct tegra_drm_client *client,
>  			    struct tegra_drm_context *context)
>  {
> -	struct vic *vic = to_vic(client);
> +	context->syncpt = host1x_syncpt_request(client->base.dev, 0);
> +	if (!context->syncpt)
> +		return -ENOMEM;
> +
> +	return 0;
> +}
> +
> +static void vic_close_channel(struct tegra_drm_context *context)
> +{
> +	host1x_syncpt_free(context->syncpt);
> +}
> +
> +static int vic_submit(struct tegra_drm_context *context,
> +		      struct drm_tegra_submit *args, struct drm_device *drm,
> +		      struct drm_file *file)
> +{
> +	struct host1x_client *client = &context->client->base;
> +	struct vic *vic = dev_get_drvdata(client->dev);
>  	int err;
>  
>  	err = pm_runtime_get_sync(vic->dev);
> @@ -229,35 +226,41 @@ static int vic_open_channel(struct tegra_drm_client *client,
>  		return err;
>  
>  	err = vic_boot(vic);
> -	if (err < 0) {
> -		pm_runtime_put(vic->dev);
> -		return err;
> -	}
> +	if (err < 0)
> +		goto put_vic;
>  
> -	context->channel = host1x_channel_get(vic->channel);
> -	if (!context->channel) {
> -		pm_runtime_put(vic->dev);
> -		return -ENOMEM;
> -	}
> +	err = tegra_drm_context_get_channel(context);
> +	if (err < 0)
> +		goto put_vic;
>  
> -	context->syncpt = client->base.syncpts[0];
> +	err = tegra_drm_submit(context, args, drm, file);
> +	if (err)
> +		goto put_channel;
>  
>  	return 0;
> +
> +put_channel:
> +	tegra_drm_context_put_channel(context);
> +put_vic:
> +	pm_runtime_put(vic->dev);
> +
> +	return err;
>  }
>  
> -static void vic_close_channel(struct tegra_drm_context *context)
> +static void vic_submit_done(struct tegra_drm_context *context)
>  {
> -	struct vic *vic = to_vic(context->client);
> -
> -	host1x_channel_put(context->channel);
> +	struct host1x_client *client = &context->client->base;
> +	struct vic *vic = dev_get_drvdata(client->dev);
>  
> +	tegra_drm_context_put_channel(context);
>  	pm_runtime_put(vic->dev);
>  }
>  
>  static const struct tegra_drm_client_ops vic_ops = {
>  	.open_channel = vic_open_channel,
>  	.close_channel = vic_close_channel,
> -	.submit = tegra_drm_submit,
> +	.submit = vic_submit,
> +	.submit_done = vic_submit_done,
>  };
>  
>  #define NVIDIA_TEGRA_124_VIC_FIRMWARE "nvidia/tegra124/vic03_ucode.bin"
> @@ -340,8 +343,6 @@ static int vic_probe(struct platform_device *pdev)
>  	vic->client.base.ops = &vic_client_ops;
>  	vic->client.base.dev = dev;
>  	vic->client.base.class = HOST1X_CLASS_VIC;
> -	vic->client.base.syncpts = syncpts;
> -	vic->client.base.num_syncpts = 1;
>  	vic->dev = dev;
>  	vic->config = vic_config;
>  
> 

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 08/10] drm/tegra: Implement dynamic channel allocation model
@ 2017-11-05 17:43       ` Dmitry Osipenko
  0 siblings, 0 replies; 52+ messages in thread
From: Dmitry Osipenko @ 2017-11-05 17:43 UTC (permalink / raw)
  To: Mikko Perttunen, thierry.reding, jonathanh
  Cc: dri-devel, linux-tegra, linux-kernel

On 05.11.2017 14:01, Mikko Perttunen wrote:
> In the traditional channel allocation model, a single hardware channel
> was allocated for each client. This is simple from an implementation
> perspective but prevents use of hardware scheduling.
> 
> This patch implements a channel allocation model where when a user
> submits a job for a context, a hardware channel is allocated for
> that context. The same channel is kept for as long as there are
> incomplete jobs for that context. This way we can use hardware
> scheduling and channel isolation between userspace processes, but
> also prevent idling contexts from taking up hardware resources.
> 

Dynamic allocation of channel resources (the pushbuffer) is very expensive,
negating all the benefits that this model should bring, at least in the
non-IOMMU case. We could statically preallocate channel resources or defer
freeing them.
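[Editor's note: the per-context get/put scheme under discussion can be modeled in plain userspace C. Everything below (`context_get_channel`, the single fake hardware channel) is an illustrative stand-in, not the kernel API:]

```c
/* Userspace model of the per-context channel refcounting: the first
 * pending job acquires a hardware channel, later jobs bump a refcount,
 * and the channel is released when the last job completes. */
#include <assert.h>
#include <pthread.h>

struct context {
	pthread_mutex_t lock;
	int channel;             /* -1 = no hardware channel assigned */
	unsigned int pending_jobs;
};

static int hw_channel_busy;      /* stand-in for the real channel pool */

static int channel_request(void)
{
	if (hw_channel_busy)
		return -1;       /* -EBUSY in the kernel version */
	hw_channel_busy = 1;
	return 0;                /* channel id 0 */
}

static void channel_release(void)
{
	hw_channel_busy = 0;
}

int context_get_channel(struct context *ctx)
{
	pthread_mutex_lock(&ctx->lock);
	if (ctx->pending_jobs == 0) {
		ctx->channel = channel_request();
		if (ctx->channel < 0) {
			pthread_mutex_unlock(&ctx->lock);
			return -1;
		}
	}
	ctx->pending_jobs++;
	pthread_mutex_unlock(&ctx->lock);
	return 0;
}

void context_put_channel(struct context *ctx)
{
	pthread_mutex_lock(&ctx->lock);
	if (--ctx->pending_jobs == 0) {
		channel_release();
		ctx->channel = -1;
	}
	pthread_mutex_unlock(&ctx->lock);
}
```

Note how a second submission while jobs are still pending reuses the already-held channel, which is what preserves the per-context serialization behavior.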

> For now, this patch only adapts VIC to the new model.
> 

I think VIC's conversion should be a distinct patch.

> Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
> ---
>  drivers/gpu/drm/tegra/drm.c | 46 ++++++++++++++++++++++++++
>  drivers/gpu/drm/tegra/drm.h |  7 +++-
>  drivers/gpu/drm/tegra/vic.c | 79 +++++++++++++++++++++++----------------------
>  3 files changed, 92 insertions(+), 40 deletions(-)
> 
> diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
> index b964e18e3058..658bc8814f38 100644
> --- a/drivers/gpu/drm/tegra/drm.c
> +++ b/drivers/gpu/drm/tegra/drm.c
> @@ -382,6 +382,51 @@ static int host1x_waitchk_copy_from_user(struct host1x_waitchk *dest,
>  	return 0;
>  }
>  
> +/**
> + * tegra_drm_context_get_channel() - Get a channel for submissions
> + * @context: Context to get a channel for
> + *
> + * Request a free hardware host1x channel for this user context, or if the
> + * context already has one, bump its refcount.
> + *
> + * Returns 0 on success, or -EBUSY if there were no free hardware channels.
> + */
> +int tegra_drm_context_get_channel(struct tegra_drm_context *context)
> +{
> +	struct host1x_client *client = &context->client->base;
> +
> +	mutex_lock(&context->lock);
> +
> +	if (context->pending_jobs == 0) {
> +		context->channel = host1x_channel_request(client->dev);
> +		if (!context->channel) {
> +			mutex_unlock(&context->lock);
> +			return -EBUSY;
> +		}
> +	}
> +
> +	context->pending_jobs++;
> +
> +	mutex_unlock(&context->lock);
> +
> +	return 0;
> +}
> +
> +/**
> + * tegra_drm_context_put_channel() - Put a previously requested channel
> + * @context: Context whose channel is no longer needed
> + *
> + * Decrease the refcount of the channel associated with this context,
> + * freeing it if the refcount drops to zero.
> + */
> +void tegra_drm_context_put_channel(struct tegra_drm_context *context)
> +{
> +	mutex_lock(&context->lock);
> +	if (--context->pending_jobs == 0)
> +		host1x_channel_put(context->channel);
> +	mutex_unlock(&context->lock);
> +}
> +
>  static void tegra_drm_job_done(struct host1x_job *job)
>  {
>  	struct tegra_drm_context *context = job->callback_data;
> @@ -737,6 +782,7 @@ static int tegra_open_channel(struct drm_device *drm, void *data,
>  		kfree(context);
>  
>  	kref_init(&context->ref);
> +	mutex_init(&context->lock);
>  
>  	mutex_unlock(&fpriv->lock);
>  	return err;
> diff --git a/drivers/gpu/drm/tegra/drm.h b/drivers/gpu/drm/tegra/drm.h
> index 11d690846fd0..d0c3f1f779f6 100644
> --- a/drivers/gpu/drm/tegra/drm.h
> +++ b/drivers/gpu/drm/tegra/drm.h
> @@ -78,9 +78,12 @@ struct tegra_drm_context {
>  	struct kref ref;
>  
>  	struct tegra_drm_client *client;
> +	unsigned int id;
> +
> +	struct mutex lock;
>  	struct host1x_channel *channel;
>  	struct host1x_syncpt *syncpt;
> -	unsigned int id;
> +	unsigned int pending_jobs;
>  };
>  
>  struct tegra_drm_client_ops {
> @@ -95,6 +98,8 @@ struct tegra_drm_client_ops {
>  	void (*submit_done)(struct tegra_drm_context *context);
>  };
>  
> +int tegra_drm_context_get_channel(struct tegra_drm_context *context);
> +void tegra_drm_context_put_channel(struct tegra_drm_context *context);
>  int tegra_drm_submit(struct tegra_drm_context *context,
>  		     struct drm_tegra_submit *args, struct drm_device *drm,
>  		     struct drm_file *file);
> diff --git a/drivers/gpu/drm/tegra/vic.c b/drivers/gpu/drm/tegra/vic.c
> index efe5f3af933e..0cacf023a890 100644
> --- a/drivers/gpu/drm/tegra/vic.c
> +++ b/drivers/gpu/drm/tegra/vic.c
> @@ -33,7 +33,6 @@ struct vic {
>  
>  	void __iomem *regs;
>  	struct tegra_drm_client client;
> -	struct host1x_channel *channel;
>  	struct iommu_domain *domain;
>  	struct device *dev;
>  	struct clk *clk;
> @@ -161,28 +160,12 @@ static int vic_init(struct host1x_client *client)
>  			goto detach_device;
>  	}
>  
> -	vic->channel = host1x_channel_request(client->dev);
> -	if (!vic->channel) {
> -		err = -ENOMEM;
> -		goto detach_device;
> -	}
> -
> -	client->syncpts[0] = host1x_syncpt_request(client->dev, 0);
> -	if (!client->syncpts[0]) {
> -		err = -ENOMEM;
> -		goto free_channel;
> -	}
> -
>  	err = tegra_drm_register_client(tegra, drm);
>  	if (err < 0)
> -		goto free_syncpt;
> +		goto detach_device;
>  
>  	return 0;
>  
> -free_syncpt:
> -	host1x_syncpt_free(client->syncpts[0]);
> -free_channel:
> -	host1x_channel_put(vic->channel);
>  detach_device:
>  	if (tegra->domain)
>  		iommu_detach_device(tegra->domain, vic->dev);
> @@ -202,9 +185,6 @@ static int vic_exit(struct host1x_client *client)
>  	if (err < 0)
>  		return err;
>  
> -	host1x_syncpt_free(client->syncpts[0]);
> -	host1x_channel_put(vic->channel);
> -
>  	if (vic->domain) {
>  		iommu_detach_device(vic->domain, vic->dev);
>  		vic->domain = NULL;
> @@ -221,7 +201,24 @@ static const struct host1x_client_ops vic_client_ops = {
>  static int vic_open_channel(struct tegra_drm_client *client,
>  			    struct tegra_drm_context *context)
>  {
> -	struct vic *vic = to_vic(client);
> +	context->syncpt = host1x_syncpt_request(client->base.dev, 0);
> +	if (!context->syncpt)
> +		return -ENOMEM;
> +
> +	return 0;
> +}
> +
> +static void vic_close_channel(struct tegra_drm_context *context)
> +{
> +	host1x_syncpt_free(context->syncpt);
> +}
> +
> +static int vic_submit(struct tegra_drm_context *context,
> +		      struct drm_tegra_submit *args, struct drm_device *drm,
> +		      struct drm_file *file)
> +{
> +	struct host1x_client *client = &context->client->base;
> +	struct vic *vic = dev_get_drvdata(client->dev);
>  	int err;
>  
>  	err = pm_runtime_get_sync(vic->dev);
> @@ -229,35 +226,41 @@ static int vic_open_channel(struct tegra_drm_client *client,
>  		return err;
>  
>  	err = vic_boot(vic);
> -	if (err < 0) {
> -		pm_runtime_put(vic->dev);
> -		return err;
> -	}
> +	if (err < 0)
> +		goto put_vic;
>  
> -	context->channel = host1x_channel_get(vic->channel);
> -	if (!context->channel) {
> -		pm_runtime_put(vic->dev);
> -		return -ENOMEM;
> -	}
> +	err = tegra_drm_context_get_channel(context);
> +	if (err < 0)
> +		goto put_vic;
>  
> -	context->syncpt = client->base.syncpts[0];
> +	err = tegra_drm_submit(context, args, drm, file);
> +	if (err)
> +		goto put_channel;
>  
>  	return 0;
> +
> +put_channel:
> +	tegra_drm_context_put_channel(context);
> +put_vic:
> +	pm_runtime_put(vic->dev);
> +
> +	return err;
>  }
>  
> -static void vic_close_channel(struct tegra_drm_context *context)
> +static void vic_submit_done(struct tegra_drm_context *context)
>  {
> -	struct vic *vic = to_vic(context->client);
> -
> -	host1x_channel_put(context->channel);
> +	struct host1x_client *client = &context->client->base;
> +	struct vic *vic = dev_get_drvdata(client->dev);
>  
> +	tegra_drm_context_put_channel(context);
>  	pm_runtime_put(vic->dev);
>  }
>  
>  static const struct tegra_drm_client_ops vic_ops = {
>  	.open_channel = vic_open_channel,
>  	.close_channel = vic_close_channel,
> -	.submit = tegra_drm_submit,
> +	.submit = vic_submit,
> +	.submit_done = vic_submit_done,
>  };
>  
>  #define NVIDIA_TEGRA_124_VIC_FIRMWARE "nvidia/tegra124/vic03_ucode.bin"
> @@ -340,8 +343,6 @@ static int vic_probe(struct platform_device *pdev)
>  	vic->client.base.ops = &vic_client_ops;
>  	vic->client.base.dev = dev;
>  	vic->client.base.class = HOST1X_CLASS_VIC;
> -	vic->client.base.syncpts = syncpts;
> -	vic->client.base.num_syncpts = 1;
>  	vic->dev = dev;
>  	vic->config = vic_config;
>  
> 


* Re: [PATCH 04/10] gpu: host1x: Lock classes during job submission
  2017-11-05 16:46       ` Dmitry Osipenko
  (?)
@ 2017-11-07 12:28       ` Mikko Perttunen
       [not found]         ` <ef08d3d8-94a7-8804-c339-5310719333f3-/1wQRMveznE@public.gmane.org>
  -1 siblings, 1 reply; 52+ messages in thread
From: Mikko Perttunen @ 2017-11-07 12:28 UTC (permalink / raw)
  To: Dmitry Osipenko, Mikko Perttunen, thierry.reding, jonathanh
  Cc: dri-devel, linux-tegra, linux-kernel

On 05.11.2017 18:46, Dmitry Osipenko wrote:
> On 05.11.2017 14:01, Mikko Perttunen wrote:
>> ...
>>
>> +static int mlock_id_for_class(unsigned int class)
>> +{
>> +#if HOST1X_HW >= 6
>> +	switch (class)
>> +	{
>> +	case HOST1X_CLASS_HOST1X:
>> +		return 0;
>> +	case HOST1X_CLASS_VIC:
>> +		return 17;
>
> What is the meaning of returned ID values that you have defined here? Why VIC
> should have different ID on T186?

On T186, MLOCKs are not "generic" - the HW knows that each MLOCK 
corresponds to a specific class. Therefore we must map that correctly.

>
>> +	default:
>> +		return -EINVAL;
>> +	}
>> +#else
>> +	switch (class)
>> +	{
>> +	case HOST1X_CLASS_HOST1X:
>> +		return 0;
>> +	case HOST1X_CLASS_GR2D:
>> +		return 1;
>> +	case HOST1X_CLASS_GR2D_SB:
>> +		return 2;
>
> Note that we allow switching between the 2D classes within the same job
> context, and currently the job's class is somewhat hardcoded to GR2D.
>
> Even though GR2D and GR2D_SB use different register banks, is it okay to
> trigger execution of different classes simultaneously? Would a syncpoint
> differentiate between classes on an OP_DONE event?

Good point, we might need to use the same lock for these two.
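[Editor's note: a hypothetical sketch of the shared-lock variant discussed here, where GR2D and GR2D_SB map to the same MLOCK ID so jobs in either class serialize on one module lock. The class values are restated from mainline `include/linux/host1x.h` so the snippet builds standalone; the exact IDs returned are illustrative:]

```c
/* Sketch: the two GR2D register banks share one module lock, so a
 * class switch within a job cannot run both banks concurrently. */
#include <assert.h>

#define HOST1X_CLASS_HOST1X  0x01
#define HOST1X_CLASS_GR2D    0x51
#define HOST1X_CLASS_GR2D_SB 0x52

static int mlock_id_for_class(unsigned int class_id)
{
	switch (class_id) {
	case HOST1X_CLASS_HOST1X:
		return 0;
	case HOST1X_CLASS_GR2D:
	case HOST1X_CLASS_GR2D_SB:
		return 1;   /* shared: both banks take the same MLOCK */
	default:
		return -1;  /* -EINVAL in the kernel version */
	}
}
```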

>
> I suppose that an MLOCK (module lock) locks the whole module; wouldn't it
> make sense to just use the module IDs defined in the TRM?

Can you point out where these are defined?

Mikko


* Re: [PATCH 08/10] drm/tegra: Implement dynamic channel allocation model
  2017-11-05 17:43       ` Dmitry Osipenko
  (?)
@ 2017-11-07 12:29       ` Mikko Perttunen
       [not found]         ` <38fcf947-0d5d-e2c7-f49f-9efce5eeb1a3-/1wQRMveznE@public.gmane.org>
  -1 siblings, 1 reply; 52+ messages in thread
From: Mikko Perttunen @ 2017-11-07 12:29 UTC (permalink / raw)
  To: Dmitry Osipenko, Mikko Perttunen, thierry.reding, jonathanh
  Cc: dri-devel, linux-tegra, linux-kernel

On 05.11.2017 19:43, Dmitry Osipenko wrote:
> On 05.11.2017 14:01, Mikko Perttunen wrote:
>> In the traditional channel allocation model, a single hardware channel
>> was allocated for each client. This is simple from an implementation
>> perspective but prevents use of hardware scheduling.
>>
>> This patch implements a channel allocation model where when a user
>> submits a job for a context, a hardware channel is allocated for
>> that context. The same channel is kept for as long as there are
>> incomplete jobs for that context. This way we can use hardware
>> scheduling and channel isolation between userspace processes, but
>> also prevent idling contexts from taking up hardware resources.
>>
>
> Dynamic allocation of channel resources (the pushbuffer) is very expensive,
> negating all the benefits that this model should bring, at least in the
> non-IOMMU case. We could statically preallocate channel resources or defer
> freeing them.

This is true. I'll try to figure out a nice way to keep the pushbuf 
allocations.
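[Editor's note: one way to "defer resources freeing", as suggested above, is a free list in front of the allocator so a released pushbuffer is reused by the next channel request. This is a purely illustrative userspace sketch; `pushbuf_get`/`pushbuf_put` are hypothetical names, and a real cache would also match buffer sizes and take a lock:]

```c
/* Deferred-free cache: put pushes onto a free list, get pops from it
 * before falling back to a fresh (expensive) allocation. */
#include <assert.h>
#include <stdlib.h>

struct pushbuf {
	struct pushbuf *next;
	size_t size;
	void *mem;
};

static struct pushbuf *pb_cache;   /* singly linked free list */

struct pushbuf *pushbuf_get(size_t size)
{
	struct pushbuf *pb = pb_cache;

	if (pb) {                  /* cheap path: reuse a cached buffer */
		pb_cache = pb->next;
		return pb;
	}

	pb = malloc(sizeof(*pb));  /* expensive path: fresh allocation */
	if (!pb)
		return NULL;
	pb->mem = malloc(size);
	if (!pb->mem) {
		free(pb);
		return NULL;
	}
	pb->size = size;
	return pb;
}

void pushbuf_put(struct pushbuf *pb)
{
	pb->next = pb_cache;       /* defer freeing: keep on the cache */
	pb_cache = pb;
}
```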

>
>> For now, this patch only adapts VIC to the new model.
>>
>
> I think VIC's conversion should be a distinct patch.

Sure.

Cheers,
Mikko


* Re: [PATCH 10/10] gpu: host1x: Optionally block when acquiring channel
  2017-11-05 17:14       ` Dmitry Osipenko
@ 2017-11-07 13:11           ` Mikko Perttunen
  -1 siblings, 0 replies; 52+ messages in thread
From: Mikko Perttunen @ 2017-11-07 13:11 UTC (permalink / raw)
  To: Dmitry Osipenko, Mikko Perttunen, thierry.reding, jonathanh
  Cc: dri-devel, linux-tegra, linux-kernel

On 05.11.2017 19:14, Dmitry Osipenko wrote:
> On 05.11.2017 14:01, Mikko Perttunen wrote:
>> Add an option to host1x_channel_request to interruptibly wait for a
>> free channel. This allows IOCTLs that acquire a channel to block
>> the userspace.
>>
>
> Wouldn't it be more optimal to request the channel and block after the job's
> pinning, when all patching and checks are completed? Note that right now we
> have locking around submission in DRM, which I suppose should go away by
> making the locking fine-grained.

That would be possible, but I don't think it should matter much since 
contention here should not be the common case.
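[Editor's note: the semaphore gating this patch adds can be sketched with POSIX semaphores standing in for the kernel's `struct semaphore`; the function names and channel count below are illustrative, not the driver API:]

```c
/* One semaphore unit per hardware channel: trywait models the
 * non-blocking down_trylock() path, wait models down_interruptible(). */
#include <assert.h>
#include <errno.h>
#include <semaphore.h>
#include <stdbool.h>

#define NUM_CHANNELS 3

static sem_t chan_sema;

void channel_list_init(void)
{
	sem_init(&chan_sema, 0, NUM_CHANNELS);
}

/* Returns 0 once a channel slot is reserved, or -EBUSY if wait is
 * false and all channels are taken. */
int channel_acquire(bool wait)
{
	if (wait) {
		sem_wait(&chan_sema);     /* blocks until a slot frees up */
		return 0;
	}
	if (sem_trywait(&chan_sema))      /* fails immediately when full */
		return -EBUSY;
	return 0;
}

void channel_release(void)
{
	sem_post(&chan_sema);             /* up() in the kernel version */
}
```

A blocked `channel_acquire(true)` wakes up as soon as some other context calls `channel_release()`, which is exactly the behavior the IOCTL path wants.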

>
> Or maybe it would be more optimal to just iterate over channels, like I
> suggested before [0]?

Somehow I hadn't noticed this before, but this would break the invariant 
of having one client/class per channel.

In general, since we haven't seen any issues downstream with the model 
implemented here, I'd like to try to go with this, and if we have problems 
with channel allocation we can revisit.

Mikko

>
> [0]
> https://github.com/cyndis/linux/commit/9e6d87f40afb01fbe13ba65c73cb617bdfcd80b2#commitcomment-25012960
>
>> Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
>> ---
>>  drivers/gpu/drm/tegra/drm.c  |  9 +++++----
>>  drivers/gpu/drm/tegra/gr2d.c |  6 +++---
>>  drivers/gpu/drm/tegra/gr3d.c |  6 +++---
>>  drivers/gpu/host1x/channel.c | 40 ++++++++++++++++++++++++++++++----------
>>  drivers/gpu/host1x/channel.h |  1 +
>>  include/linux/host1x.h       |  2 +-
>>  6 files changed, 43 insertions(+), 21 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
>> index 658bc8814f38..19f77c1a76c0 100644
>> --- a/drivers/gpu/drm/tegra/drm.c
>> +++ b/drivers/gpu/drm/tegra/drm.c
>> @@ -389,7 +389,8 @@ static int host1x_waitchk_copy_from_user(struct host1x_waitchk *dest,
>>   * Request a free hardware host1x channel for this user context, or if the
>>   * context already has one, bump its refcount.
>>   *
>> - * Returns 0 on success, or -EBUSY if there were no free hardware channels.
>> + * Returns 0 on success, -EINTR if wait for a free channel was interrupted,
>> + * or other error.
>>   */
>>  int tegra_drm_context_get_channel(struct tegra_drm_context *context)
>>  {
>> @@ -398,10 +399,10 @@ int tegra_drm_context_get_channel(struct tegra_drm_context *context)
>>  	mutex_lock(&context->lock);
>>
>>  	if (context->pending_jobs == 0) {
>> -		context->channel = host1x_channel_request(client->dev);
>> -		if (!context->channel) {
>> +		context->channel = host1x_channel_request(client->dev, true);
>> +		if (IS_ERR(context->channel)) {
>>  			mutex_unlock(&context->lock);
>> -			return -EBUSY;
>> +			return PTR_ERR(context->channel);
>>  		}
>>  	}
>>
>> diff --git a/drivers/gpu/drm/tegra/gr2d.c b/drivers/gpu/drm/tegra/gr2d.c
>> index 3db3bcac48b9..c1853402f69b 100644
>> --- a/drivers/gpu/drm/tegra/gr2d.c
>> +++ b/drivers/gpu/drm/tegra/gr2d.c
>> @@ -32,9 +32,9 @@ static int gr2d_init(struct host1x_client *client)
>>  	unsigned long flags = HOST1X_SYNCPT_HAS_BASE;
>>  	struct gr2d *gr2d = to_gr2d(drm);
>>
>> -	gr2d->channel = host1x_channel_request(client->dev);
>> -	if (!gr2d->channel)
>> -		return -ENOMEM;
>> +	gr2d->channel = host1x_channel_request(client->dev, false);
>> +	if (IS_ERR(gr2d->channel))
>> +		return PTR_ERR(gr2d->channel);
>>
>>  	client->syncpts[0] = host1x_syncpt_request(client->dev, flags);
>>  	if (!client->syncpts[0]) {
>> diff --git a/drivers/gpu/drm/tegra/gr3d.c b/drivers/gpu/drm/tegra/gr3d.c
>> index 279438342c8c..793a91d577cb 100644
>> --- a/drivers/gpu/drm/tegra/gr3d.c
>> +++ b/drivers/gpu/drm/tegra/gr3d.c
>> @@ -42,9 +42,9 @@ static int gr3d_init(struct host1x_client *client)
>>  	unsigned long flags = HOST1X_SYNCPT_HAS_BASE;
>>  	struct gr3d *gr3d = to_gr3d(drm);
>>
>> -	gr3d->channel = host1x_channel_request(client->dev);
>> -	if (!gr3d->channel)
>> -		return -ENOMEM;
>> +	gr3d->channel = host1x_channel_request(client->dev, false);
>> +	if (IS_ERR(gr3d->channel))
>> +		return PTR_ERR(gr3d->channel);
>>
>>  	client->syncpts[0] = host1x_syncpt_request(client->dev, flags);
>>  	if (!client->syncpts[0]) {
>> diff --git a/drivers/gpu/host1x/channel.c b/drivers/gpu/host1x/channel.c
>> index 9d8cad12f9d8..eebcd51261df 100644
>> --- a/drivers/gpu/host1x/channel.c
>> +++ b/drivers/gpu/host1x/channel.c
>> @@ -43,6 +43,7 @@ int host1x_channel_list_init(struct host1x_channel_list *chlist,
>>  	bitmap_zero(chlist->allocated_channels, num_channels);
>>
>>  	mutex_init(&chlist->lock);
>> +	sema_init(&chlist->sema, num_channels);
>>
>>  	return 0;
>>  }
>> @@ -99,6 +100,8 @@ static void release_channel(struct kref *kref)
>>  	host1x_cdma_deinit(&channel->cdma);
>>
>>  	clear_bit(channel->id, chlist->allocated_channels);
>> +
>> +	up(&chlist->sema);
>>  }
>>
>>  void host1x_channel_put(struct host1x_channel *channel)
>> @@ -107,19 +110,30 @@ void host1x_channel_put(struct host1x_channel *channel)
>>  }
>>  EXPORT_SYMBOL(host1x_channel_put);
>>
>> -static struct host1x_channel *acquire_unused_channel(struct host1x *host)
>> +static struct host1x_channel *acquire_unused_channel(struct host1x *host,
>> +						     bool wait)
>>  {
>>  	struct host1x_channel_list *chlist = &host->channel_list;
>>  	unsigned int max_channels = host->info->nb_channels;
>>  	unsigned int index;
>> +	int err;
>> +
>> +	if (wait) {
>> +		err = down_interruptible(&chlist->sema);
>> +		if (err)
>> +			return ERR_PTR(err);
>> +	} else {
>> +		if (down_trylock(&chlist->sema))
>> +			return ERR_PTR(-EBUSY);
>> +	}
>>
>>  	mutex_lock(&chlist->lock);
>>
>>  	index = find_first_zero_bit(chlist->allocated_channels, max_channels);
>> -	if (index >= max_channels) {
>> +	if (WARN(index >= max_channels, "failed to find free channel")) {
>>  		mutex_unlock(&chlist->lock);
>>  		dev_err(host->dev, "failed to find free channel\n");
>> -		return NULL;
>> +		return ERR_PTR(-EBUSY);
>>  	}
>>
>>  	chlist->channels[index].id = index;
>> @@ -134,20 +148,26 @@ static struct host1x_channel *acquire_unused_channel(struct host1x *host)
>>  /**
>>   * host1x_channel_request() - Allocate a channel
>>   * @device: Host1x unit this channel will be used to send commands to
>> + * @wait: Whether to wait for a free channel if all are reserved
>> + *
>> + * Allocates a new host1x channel for @device. If all channels are in use,
>> + * and @wait is true, does an interruptible wait until one is available.
>>   *
>> - * Allocates a new host1x channel for @device. May return NULL if CDMA
>> - * initialization fails.
>> + * If a channel was acquired, returns a pointer to it. Otherwise returns
>> + * an error pointer with -EINTR if the wait was interrupted, -EBUSY
>> + * if a channel could not be acquired or another error code if channel
>> + * initialization failed.
>>   */
>> -struct host1x_channel *host1x_channel_request(struct device *dev)
>> +struct host1x_channel *host1x_channel_request(struct device *dev, bool wait)
>>  {
>>  	struct host1x *host = dev_get_drvdata(dev->parent);
>>  	struct host1x_channel_list *chlist = &host->channel_list;
>>  	struct host1x_channel *channel;
>>  	int err;
>>
>> -	channel = acquire_unused_channel(host);
>> -	if (!channel)
>> -		return NULL;
>> +	channel = acquire_unused_channel(host, wait);
>> +	if (IS_ERR(channel))
>> +		return channel;
>>
>>  	kref_init(&channel->refcount);
>>  	mutex_init(&channel->submitlock);
>> @@ -168,6 +188,6 @@ struct host1x_channel *host1x_channel_request(struct device *dev)
>>
>>  	dev_err(dev, "failed to initialize channel\n");
>>
>> -	return NULL;
>> +	return ERR_PTR(err);
>>  }
>>  EXPORT_SYMBOL(host1x_channel_request);
>> diff --git a/drivers/gpu/host1x/channel.h b/drivers/gpu/host1x/channel.h
>> index e68a8ae9a670..1f5cf8029b62 100644
>> --- a/drivers/gpu/host1x/channel.h
>> +++ b/drivers/gpu/host1x/channel.h
>> @@ -31,6 +31,7 @@ struct host1x_channel_list {
>>  	struct host1x_channel *channels;
>>
>>  	struct mutex lock;
>> +	struct semaphore sema;
>>  	unsigned long *allocated_channels;
>>  };
>>
>> diff --git a/include/linux/host1x.h b/include/linux/host1x.h
>> index f931d28a68ff..2a34905d4408 100644
>> --- a/include/linux/host1x.h
>> +++ b/include/linux/host1x.h
>> @@ -171,7 +171,7 @@ u32 host1x_syncpt_base_id(struct host1x_syncpt_base *base);
>>  struct host1x_channel;
>>  struct host1x_job;
>>
>> -struct host1x_channel *host1x_channel_request(struct device *dev);
>> +struct host1x_channel *host1x_channel_request(struct device *dev, bool wait);
>>  struct host1x_channel *host1x_channel_get(struct host1x_channel *channel);
>>  void host1x_channel_put(struct host1x_channel *channel);
>>  int host1x_job_submit(struct host1x_job *job);
>>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-tegra" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>



* Re: [PATCH 10/10] gpu: host1x: Optionally block when acquiring channel
  2017-11-07 13:11           ` Mikko Perttunen
  (?)
@ 2017-11-07 15:29           ` Dmitry Osipenko
       [not found]             ` <1b35ec93-167b-3436-0ff2-5e2e0886aea7-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  -1 siblings, 1 reply; 52+ messages in thread
From: Dmitry Osipenko @ 2017-11-07 15:29 UTC (permalink / raw)
  To: Mikko Perttunen, Mikko Perttunen, thierry.reding, jonathanh
  Cc: dri-devel, linux-tegra, linux-kernel

On 07.11.2017 16:11, Mikko Perttunen wrote:
> On 05.11.2017 19:14, Dmitry Osipenko wrote:
>> On 05.11.2017 14:01, Mikko Perttunen wrote:
>>> Add an option to host1x_channel_request to interruptibly wait for a
>>> free channel. This allows IOCTLs that acquire a channel to block
>>> the userspace.
>>>
>>
>> Wouldn't it be more optimal to request the channel and block after the job's
>> pinning, when all patching and checks are completed? Note that right now we
>> have locking around submission in DRM, which I suppose should go away by
>> making the locking fine-grained.
> 
> That would be possible, but I don't think it should matter much since contention
> here should not be the common case.
> 
>>
>> Or maybe it would be more optimal to just iterate over channels, like I
>> suggested before [0]?
> 
> Somehow I hadn't noticed this before, but this would break the invariant of
> having one client/class per channel.
> 

Yes, currently there is a weak relation between a channel and the client's
device, but it seems the channel's device is only used for printing dev_*
messages, and the device could be borrowed from the channel's job. I don't see
any real point in hardwiring a channel to a specific device or client.

> In general, since we haven't seen any issues downstream with the model
> implemented here, I'd like to try to go with this, and if we have problems
> with channel allocation we can revisit.
> 

I'd prefer to collect some real numbers first; I will test it with our grate /
mesa stuff. Also, we should have a host1x_test case, maybe something similar to
the submission perf test but using multiple contexts.

> 
>>
>> [0]
>> https://github.com/cyndis/linux/commit/9e6d87f40afb01fbe13ba65c73cb617bdfcd80b2#commitcomment-25012960
>>
>>
>>> Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
>>> ---
>>>  drivers/gpu/drm/tegra/drm.c  |  9 +++++----
>>>  drivers/gpu/drm/tegra/gr2d.c |  6 +++---
>>>  drivers/gpu/drm/tegra/gr3d.c |  6 +++---
>>>  drivers/gpu/host1x/channel.c | 40 ++++++++++++++++++++++++++++++----------
>>>  drivers/gpu/host1x/channel.h |  1 +
>>>  include/linux/host1x.h       |  2 +-
>>>  6 files changed, 43 insertions(+), 21 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
>>> index 658bc8814f38..19f77c1a76c0 100644
>>> --- a/drivers/gpu/drm/tegra/drm.c
>>> +++ b/drivers/gpu/drm/tegra/drm.c
>>> @@ -389,7 +389,8 @@ static int host1x_waitchk_copy_from_user(struct
>>> host1x_waitchk *dest,
>>>   * Request a free hardware host1x channel for this user context, or if the
>>>   * context already has one, bump its refcount.
>>>   *
>>> - * Returns 0 on success, or -EBUSY if there were no free hardware channels.
>>> + * Returns 0 on success, -EINTR if wait for a free channel was interrupted,
>>> + * or other error.
>>>   */
>>>  int tegra_drm_context_get_channel(struct tegra_drm_context *context)
>>>  {
>>> @@ -398,10 +399,10 @@ int tegra_drm_context_get_channel(struct
>>> tegra_drm_context *context)
>>>      mutex_lock(&context->lock);
>>>
>>>      if (context->pending_jobs == 0) {
>>> -        context->channel = host1x_channel_request(client->dev);
>>> -        if (!context->channel) {
>>> +        context->channel = host1x_channel_request(client->dev, true);
>>> +        if (IS_ERR(context->channel)) {
>>>              mutex_unlock(&context->lock);
>>> -            return -EBUSY;
>>> +            return PTR_ERR(context->channel);
>>>          }
>>>      }
>>>
>>> diff --git a/drivers/gpu/drm/tegra/gr2d.c b/drivers/gpu/drm/tegra/gr2d.c
>>> index 3db3bcac48b9..c1853402f69b 100644
>>> --- a/drivers/gpu/drm/tegra/gr2d.c
>>> +++ b/drivers/gpu/drm/tegra/gr2d.c
>>> @@ -32,9 +32,9 @@ static int gr2d_init(struct host1x_client *client)
>>>      unsigned long flags = HOST1X_SYNCPT_HAS_BASE;
>>>      struct gr2d *gr2d = to_gr2d(drm);
>>>
>>> -    gr2d->channel = host1x_channel_request(client->dev);
>>> -    if (!gr2d->channel)
>>> -        return -ENOMEM;
>>> +    gr2d->channel = host1x_channel_request(client->dev, false);
>>> +    if (IS_ERR(gr2d->channel))
>>> +        return PTR_ERR(gr2d->channel);
>>>
>>>      client->syncpts[0] = host1x_syncpt_request(client->dev, flags);
>>>      if (!client->syncpts[0]) {
>>> diff --git a/drivers/gpu/drm/tegra/gr3d.c b/drivers/gpu/drm/tegra/gr3d.c
>>> index 279438342c8c..793a91d577cb 100644
>>> --- a/drivers/gpu/drm/tegra/gr3d.c
>>> +++ b/drivers/gpu/drm/tegra/gr3d.c
>>> @@ -42,9 +42,9 @@ static int gr3d_init(struct host1x_client *client)
>>>      unsigned long flags = HOST1X_SYNCPT_HAS_BASE;
>>>      struct gr3d *gr3d = to_gr3d(drm);
>>>
>>> -    gr3d->channel = host1x_channel_request(client->dev);
>>> -    if (!gr3d->channel)
>>> -        return -ENOMEM;
>>> +    gr3d->channel = host1x_channel_request(client->dev, false);
>>> +    if (IS_ERR(gr3d->channel))
>>> +        return PTR_ERR(gr3d->channel);
>>>
>>>      client->syncpts[0] = host1x_syncpt_request(client->dev, flags);
>>>      if (!client->syncpts[0]) {
>>> diff --git a/drivers/gpu/host1x/channel.c b/drivers/gpu/host1x/channel.c
>>> index 9d8cad12f9d8..eebcd51261df 100644
>>> --- a/drivers/gpu/host1x/channel.c
>>> +++ b/drivers/gpu/host1x/channel.c
>>> @@ -43,6 +43,7 @@ int host1x_channel_list_init(struct host1x_channel_list
>>> *chlist,
>>>      bitmap_zero(chlist->allocated_channels, num_channels);
>>>
>>>      mutex_init(&chlist->lock);
>>> +    sema_init(&chlist->sema, num_channels);
>>>
>>>      return 0;
>>>  }
>>> @@ -99,6 +100,8 @@ static void release_channel(struct kref *kref)
>>>      host1x_cdma_deinit(&channel->cdma);
>>>
>>>      clear_bit(channel->id, chlist->allocated_channels);
>>> +
>>> +    up(&chlist->sema);
>>>  }
>>>
>>>  void host1x_channel_put(struct host1x_channel *channel)
>>> @@ -107,19 +110,30 @@ void host1x_channel_put(struct host1x_channel *channel)
>>>  }
>>>  EXPORT_SYMBOL(host1x_channel_put);
>>>
>>> -static struct host1x_channel *acquire_unused_channel(struct host1x *host)
>>> +static struct host1x_channel *acquire_unused_channel(struct host1x *host,
>>> +                             bool wait)
>>>  {
>>>      struct host1x_channel_list *chlist = &host->channel_list;
>>>      unsigned int max_channels = host->info->nb_channels;
>>>      unsigned int index;
>>> +    int err;
>>> +
>>> +    if (wait) {
>>> +        err = down_interruptible(&chlist->sema);
>>> +        if (err)
>>> +            return ERR_PTR(err);
>>> +    } else {
>>> +        if (down_trylock(&chlist->sema))
>>> +            return ERR_PTR(-EBUSY);
>>> +    }
>>>
>>>      mutex_lock(&chlist->lock);
>>>
>>>      index = find_first_zero_bit(chlist->allocated_channels, max_channels);
>>> -    if (index >= max_channels) {
>>> +    if (WARN(index >= max_channels, "failed to find free channel")) {
>>>          mutex_unlock(&chlist->lock);
>>>          dev_err(host->dev, "failed to find free channel\n");
>>> -        return NULL;
>>> +        return ERR_PTR(-EBUSY);
>>>      }
>>>
>>>      chlist->channels[index].id = index;
>>> @@ -134,20 +148,26 @@ static struct host1x_channel
>>> *acquire_unused_channel(struct host1x *host)
>>>  /**
>>>   * host1x_channel_request() - Allocate a channel
>>>   * @device: Host1x unit this channel will be used to send commands to
>>> + * @wait: Whether to wait for a free channel if all are reserved
>>> + *
>>> + * Allocates a new host1x channel for @device. If all channels are in use,
>>> + * and @wait is true, does an interruptible wait until one is available.
>>>   *
>>> - * Allocates a new host1x channel for @device. May return NULL if CDMA
>>> - * initialization fails.
>>> + * If a channel was acquired, returns a pointer to it. Otherwise returns
>>> + * an error pointer with -EINTR if the wait was interrupted, -EBUSY
>>> + * if a channel could not be acquired or another error code if channel
>>> + * initialization failed.
>>>   */
>>> -struct host1x_channel *host1x_channel_request(struct device *dev)
>>> +struct host1x_channel *host1x_channel_request(struct device *dev, bool wait)
>>>  {
>>>      struct host1x *host = dev_get_drvdata(dev->parent);
>>>      struct host1x_channel_list *chlist = &host->channel_list;
>>>      struct host1x_channel *channel;
>>>      int err;
>>>
>>> -    channel = acquire_unused_channel(host);
>>> -    if (!channel)
>>> -        return NULL;
>>> +    channel = acquire_unused_channel(host, wait);
>>> +    if (IS_ERR(channel))
>>> +        return channel;
>>>
>>>      kref_init(&channel->refcount);
>>>      mutex_init(&channel->submitlock);
>>> @@ -168,6 +188,6 @@ struct host1x_channel *host1x_channel_request(struct
>>> device *dev)
>>>
>>>      dev_err(dev, "failed to initialize channel\n");
>>>
>>> -    return NULL;
>>> +    return ERR_PTR(err);
>>>  }
>>>  EXPORT_SYMBOL(host1x_channel_request);
>>> diff --git a/drivers/gpu/host1x/channel.h b/drivers/gpu/host1x/channel.h
>>> index e68a8ae9a670..1f5cf8029b62 100644
>>> --- a/drivers/gpu/host1x/channel.h
>>> +++ b/drivers/gpu/host1x/channel.h
>>> @@ -31,6 +31,7 @@ struct host1x_channel_list {
>>>      struct host1x_channel *channels;
>>>
>>>      struct mutex lock;
>>> +    struct semaphore sema;
>>>      unsigned long *allocated_channels;
>>>  };
>>>
>>> diff --git a/include/linux/host1x.h b/include/linux/host1x.h
>>> index f931d28a68ff..2a34905d4408 100644
>>> --- a/include/linux/host1x.h
>>> +++ b/include/linux/host1x.h
>>> @@ -171,7 +171,7 @@ u32 host1x_syncpt_base_id(struct host1x_syncpt_base *base);
>>>  struct host1x_channel;
>>>  struct host1x_job;
>>>
>>> -struct host1x_channel *host1x_channel_request(struct device *dev);
>>> +struct host1x_channel *host1x_channel_request(struct device *dev, bool wait);
>>>  struct host1x_channel *host1x_channel_get(struct host1x_channel *channel);
>>>  void host1x_channel_put(struct host1x_channel *channel);
>>>  int host1x_job_submit(struct host1x_job *job);
>>>
>>

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 00/10] Dynamic Host1x channel allocation
  2017-11-05 11:01 [PATCH 00/10] Dynamic Host1x channel allocation Mikko Perttunen
@ 2017-11-07 15:34     ` Dmitry Osipenko
  2017-11-05 11:01 ` [PATCH 03/10] gpu: host1x: Add lock around channel allocation Mikko Perttunen
                       ` (6 subsequent siblings)
  7 siblings, 0 replies; 52+ messages in thread
From: Dmitry Osipenko @ 2017-11-07 15:34 UTC (permalink / raw)
  To: Mikko Perttunen, thierry.reding-Re5JQEeQqe8AvxtiuMwx3w,
	jonathanh-DDmLM1+adcrQT0dZR+AlfA
  Cc: dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On 05.11.2017 14:01, Mikko Perttunen wrote:
> Hi all,
> 
> this adds support for a new model of hardware channel allocation for
> Host1x/TegraDRM. In the current model, one hardware channel is
> allocated for each client device at probe time. This is simple but
> does not allow for optimal use of hardware resources.
> 
> In the new model, we allocate channels dynamically when a
> "userspace channel", opened using the channel open IOCTL, has pending
> jobs. However, each userspace channel can only have one assigned
> channel at a time, ensuring current serialization behavior is kept.
> As such there is no change in programming model for the userspace.
> 
> The patch adapts VIC to use the new model - GR2D and GR3D are not
> modified, as the older Tegra chips they are found on do not have
> a large number of hardware channels and therefore it is not clear
> if the new model is beneficial (and I don't have access to those
> chips to test it out).
> 

I think it should be useful regardless of the number of channels, and the main
benefit is probably that job arbitration is done in hardware, which should
prevent one process from blocking another for a long time.

> Tested using the host1x_test test suite, and also by running
> the performance test of host1x_test in parallel.
> 
> Thanks,
> Mikko
> 
> Mikko Perttunen (10):
>   gpu: host1x: Parameterize channel aperture size
>   gpu: host1x: Print MLOCK state in debug dumps on T186
>   gpu: host1x: Add lock around channel allocation
>   gpu: host1x: Lock classes during job submission
>   gpu: host1x: Add job done callback
>   drm/tegra: Deliver job completion callback to client
>   drm/tegra: Make syncpoints be per-context
>   drm/tegra: Implement dynamic channel allocation model
>   drm/tegra: Boot VIC in runtime resume
>   gpu: host1x: Optionally block when acquiring channel
> 
>  drivers/gpu/drm/tegra/drm.c                    |  82 +++++++++++++++--
>  drivers/gpu/drm/tegra/drm.h                    |  12 ++-
>  drivers/gpu/drm/tegra/gr2d.c                   |   8 +-
>  drivers/gpu/drm/tegra/gr3d.c                   |   8 +-
>  drivers/gpu/drm/tegra/vic.c                    | 120 ++++++++++++------------
>  drivers/gpu/host1x/cdma.c                      |  45 ++++++---
>  drivers/gpu/host1x/cdma.h                      |   1 +
>  drivers/gpu/host1x/channel.c                   |  47 ++++++++--
>  drivers/gpu/host1x/channel.h                   |   3 +
>  drivers/gpu/host1x/hw/cdma_hw.c                | 122 +++++++++++++++++++++++++
>  drivers/gpu/host1x/hw/channel_hw.c             |  74 +++++++++++----
>  drivers/gpu/host1x/hw/debug_hw_1x06.c          |  18 +++-
>  drivers/gpu/host1x/hw/host1x01_hardware.h      |  10 ++
>  drivers/gpu/host1x/hw/host1x02_hardware.h      |  10 ++
>  drivers/gpu/host1x/hw/host1x04_hardware.h      |  10 ++
>  drivers/gpu/host1x/hw/host1x05_hardware.h      |  10 ++
>  drivers/gpu/host1x/hw/host1x06_hardware.h      |  10 ++
>  drivers/gpu/host1x/hw/hw_host1x01_channel.h    |   2 +
>  drivers/gpu/host1x/hw/hw_host1x01_sync.h       |   6 ++
>  drivers/gpu/host1x/hw/hw_host1x02_channel.h    |   2 +
>  drivers/gpu/host1x/hw/hw_host1x02_sync.h       |   6 ++
>  drivers/gpu/host1x/hw/hw_host1x04_channel.h    |   2 +
>  drivers/gpu/host1x/hw/hw_host1x04_sync.h       |   6 ++
>  drivers/gpu/host1x/hw/hw_host1x05_channel.h    |   2 +
>  drivers/gpu/host1x/hw/hw_host1x05_sync.h       |   6 ++
>  drivers/gpu/host1x/hw/hw_host1x06_hypervisor.h |   5 +
>  drivers/gpu/host1x/hw/hw_host1x06_vm.h         |   2 +
>  include/linux/host1x.h                         |   6 +-
>  28 files changed, 517 insertions(+), 118 deletions(-)
> 

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 04/10] gpu: host1x: Lock classes during job submission
  2017-11-07 12:28       ` Mikko Perttunen
@ 2017-11-07 21:23             ` Dmitry Osipenko
  0 siblings, 0 replies; 52+ messages in thread
From: Dmitry Osipenko @ 2017-11-07 21:23 UTC (permalink / raw)
  To: Mikko Perttunen, Mikko Perttunen,
	thierry.reding-Re5JQEeQqe8AvxtiuMwx3w,
	jonathanh-DDmLM1+adcrQT0dZR+AlfA
  Cc: dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On 07.11.2017 15:28, Mikko Perttunen wrote:
> On 05.11.2017 18:46, Dmitry Osipenko wrote:
>> On 05.11.2017 14:01, Mikko Perttunen wrote:
>>> ...
>>>
>>> +static int mlock_id_for_class(unsigned int class)
>>> +{
>>> +#if HOST1X_HW >= 6
>>> +    switch (class)
>>> +    {
>>> +    case HOST1X_CLASS_HOST1X:
>>> +        return 0;
>>> +    case HOST1X_CLASS_VIC:
>>> +        return 17;
>>
>> What is the meaning of returned ID values that you have defined here? Why VIC
>> should have different ID on T186?
> 
> On T186, MLOCKs are not "generic" - the HW knows that each MLOCK corresponds to
> a specific class. Therefore we must map that correctly.
> 

Okay.

>>
>>> +    default:
>>> +        return -EINVAL;
>>> +    }
>>> +#else
>>> +    switch (class)
>>> +    {
>>> +    case HOST1X_CLASS_HOST1X:
>>> +        return 0;
>>> +    case HOST1X_CLASS_GR2D:
>>> +        return 1;
>>> +    case HOST1X_CLASS_GR2D_SB:
>>> +        return 2;
>>
>> Note that we allow switching 2D classes within the same job context, and
>> currently the job's class is somewhat hardcoded to GR2D.
>>
>> Even though GR2D and GR2D_SB use different register banks, is it okay to
>> trigger execution of different classes simultaneously? Would the syncpoint
>> differentiate between classes on the OP_DONE event?
> 
> Good point, we might need to use the same lock for these two.
> 
>>
>> I suppose that MLOCK (the module lock) implies locking of the whole module;
>> wouldn't it make sense to just use the module IDs defined in the TRM?
> 
> Can you point out where these are defined?

See INDMODID / REGF_MODULEID fields of HOST1X_CHANNEL_INDOFF2_0 /
HOST1X_SYNC_REGF_ADDR_0 registers, bit numbers of HOST1X_SYNC_INTSTATUS_0 /
HOST1X_SYNC_INTC0MASK_0 / HOST1X_SYNC_MOD_TEARDOWN_0.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 10/10] gpu: host1x: Optionally block when acquiring channel
  2017-11-07 15:29           ` Dmitry Osipenko
@ 2017-11-10 21:15                 ` Dmitry Osipenko
  0 siblings, 0 replies; 52+ messages in thread
From: Dmitry Osipenko @ 2017-11-10 21:15 UTC (permalink / raw)
  To: Mikko Perttunen, Mikko Perttunen,
	thierry.reding-Re5JQEeQqe8AvxtiuMwx3w,
	jonathanh-DDmLM1+adcrQT0dZR+AlfA
  Cc: dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On 07.11.2017 18:29, Dmitry Osipenko wrote:
> On 07.11.2017 16:11, Mikko Perttunen wrote:
>> On 05.11.2017 19:14, Dmitry Osipenko wrote:
>>> On 05.11.2017 14:01, Mikko Perttunen wrote:
>>>> Add an option to host1x_channel_request to interruptibly wait for a
>>>> free channel. This allows IOCTLs that acquire a channel to block
>>>> the userspace.
>>>>
>>>
>>> Wouldn't it be more optimal to request the channel and block after the job's
>>> pinning, when all patching and checks are completed? Note that right now we
>>> have locking around submission in DRM, which I suppose should go away by
>>> making the locking fine-grained.
>>
>> That would be possible, but I don't think it should matter much since contention
>> here should not be the common case.
>>
>>>
>>> Or maybe it would be more optimal to just iterate over channels, like I
>>> suggested before [0]?
>>
>> Somehow I hadn't noticed this before, but this would break the invariant of
>> having one client/class per channel.
>>
> 
> Yes, currently there is a weak relation between a channel and the client's
> device, but it seems the channel's device is only used for printing dev_*
> messages, and the device could be borrowed from the channel's job. I don't see
> any real point in hardwiring a channel to a specific device or client.

Although, it won't work with syncpoint assignment to channel.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 10/10] gpu: host1x: Optionally block when acquiring channel
  2017-11-10 21:15                 ` Dmitry Osipenko
@ 2017-11-12 11:23                     ` Dmitry Osipenko
  -1 siblings, 0 replies; 52+ messages in thread
From: Dmitry Osipenko @ 2017-11-12 11:23 UTC (permalink / raw)
  To: Mikko Perttunen, Mikko Perttunen,
	thierry.reding-Re5JQEeQqe8AvxtiuMwx3w,
	jonathanh-DDmLM1+adcrQT0dZR+AlfA
  Cc: dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On 11.11.2017 00:15, Dmitry Osipenko wrote:
> On 07.11.2017 18:29, Dmitry Osipenko wrote:
>> On 07.11.2017 16:11, Mikko Perttunen wrote:
>>> On 05.11.2017 19:14, Dmitry Osipenko wrote:
>>>> On 05.11.2017 14:01, Mikko Perttunen wrote:
>>>>> Add an option to host1x_channel_request to interruptibly wait for a
>>>>> free channel. This allows IOCTLs that acquire a channel to block
>>>>> the userspace.
>>>>>
>>>>
>>>> Wouldn't it be more optimal to request the channel and block after the
>>>> job's pinning, when all patching and checks are completed? Note that right
>>>> now we have locking around submission in DRM, which I suppose should go
>>>> away by making the locking fine-grained.
>>>
>>> That would be possible, but I don't think it should matter much since contention
>>> here should not be the common case.
>>>
>>>>
>>>> Or maybe it would be more optimal to just iterate over channels, like I
>>>> suggested before [0]?
>>>
>>> Somehow I hadn't noticed this before, but this would break the invariant of
>>> having one client/class per channel.
>>>
>>
>> Yes, currently there is a weak relation between a channel and the client's
>> device, but it seems the channel's device is only used for printing dev_*
>> messages, and the device could be borrowed from the channel's job. I don't
>> see any real point in hardwiring a channel to a specific device or client.
> 
> Although, it won't work with syncpoint assignment to channel.

On the other hand, it should work if one syncpoint could be assigned to
multiple channels, couldn't it?

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 08/10] drm/tegra: Implement dynamic channel allocation model
  2017-11-07 12:29       ` Mikko Perttunen
@ 2017-11-13 11:49             ` Dmitry Osipenko
  0 siblings, 0 replies; 52+ messages in thread
From: Dmitry Osipenko @ 2017-11-13 11:49 UTC (permalink / raw)
  To: Mikko Perttunen, Mikko Perttunen,
	thierry.reding-Re5JQEeQqe8AvxtiuMwx3w,
	jonathanh-DDmLM1+adcrQT0dZR+AlfA
  Cc: dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On 07.11.2017 15:29, Mikko Perttunen wrote:
> On 05.11.2017 19:43, Dmitry Osipenko wrote:
>> On 05.11.2017 14:01, Mikko Perttunen wrote:
>>> In the traditional channel allocation model, a single hardware channel
>>> was allocated for each client. This is simple from an implementation
>>> perspective but prevents use of hardware scheduling.
>>>
>>> This patch implements a channel allocation model where when a user
>>> submits a job for a context, a hardware channel is allocated for
>>> that context. The same channel is kept for as long as there are
>>> incomplete jobs for that context. This way we can use hardware
>>> scheduling and channel isolation between userspace processes, but
>>> also prevent idling contexts from taking up hardware resources.
>>>
>>
>> The dynamic channels resources (pushbuf) allocation is very expensive,
>> neglecting all benefits that this model should bring at least in non-IOMMU case.
>> We could have statically preallocated channels resources or defer resources
>> freeing.
> 
> This is true. I'll try to figure out a nice way to keep the pushbuf allocations.

One option would be to statically preallocate all channel resources in the
non-IOMMU case, since CMA allocations are expensive. Then measure the
allocation overhead in the IOMMU case and, if it is significant, perhaps
implement Host1x PM autosuspend, releasing all channels once Host1x becomes
idle.

I think the above should be efficient and easy to implement.
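
As a rough illustration of the preallocation idea, here is a user-space model
(not the actual host1x code; all names and sizes are invented): channel push
buffers are carved out of one static pool at init time, so acquiring a channel
at submit time is pure bookkeeping with no allocation on the hot path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_CHANNELS 4
#define PUSHBUF_WORDS 1024

/* One preallocated push buffer per hardware channel (hypothetical model). */
struct channel {
	unsigned int pushbuf[PUSHBUF_WORDS]; /* backing store allocated once */
	bool busy;
};

static struct channel channels[NUM_CHANNELS];

/* Acquire a free channel: no allocation happens here, only bookkeeping. */
static struct channel *channel_request(void)
{
	for (size_t i = 0; i < NUM_CHANNELS; i++) {
		if (!channels[i].busy) {
			channels[i].busy = true;
			return &channels[i];
		}
	}
	return NULL; /* all channels in use; caller may block or retry */
}

static void channel_release(struct channel *ch)
{
	ch->busy = false;
}
```

In the IOMMU case the same structure would work with the pool mapped through
the IOMMU instead of CMA; whether the preallocation is worth the memory there
is exactly the measurement being suggested above.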

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 06/10] drm/tegra: Deliver job completion callback to client
  2017-11-05 11:01     ` Mikko Perttunen
@ 2017-11-16 16:33         ` Dmitry Osipenko
  -1 siblings, 0 replies; 52+ messages in thread
From: Dmitry Osipenko @ 2017-11-16 16:33 UTC (permalink / raw)
  To: Mikko Perttunen, thierry.reding-Re5JQEeQqe8AvxtiuMwx3w,
	jonathanh-DDmLM1+adcrQT0dZR+AlfA
  Cc: dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On 05.11.2017 14:01, Mikko Perttunen wrote:
> To allow client drivers to free resources when jobs have completed,
> deliver job completion callbacks to them. This requires adding
> reference counting to context objects, as job completion can happen
> after the userspace application has closed the context. As such,
> also add kref-based refcounting for contexts.
> 

It's very likely that we will need context kref'ing for other things as well;
it would be nice if you could factor out the kref addition into a standalone
patch.
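
For reference, the lifetime rule the patch implements (open takes the initial
reference, each submit takes one more, job completion and close each drop one)
can be modeled in plain C roughly as follows. This is a user-space sketch with
invented names, not the kernel kref API itself:

```c
#include <assert.h>

/* Minimal stand-in for struct kref: a counter plus a release function. */
struct context {
	int refcount;
	int freed; /* set by the release function, for demonstration */
};

static void context_release(struct context *ctx)
{
	/* The real code would close the channel and kfree() the context. */
	ctx->freed = 1;
}

static void context_get(struct context *ctx)
{
	ctx->refcount++;
}

static void context_put(struct context *ctx)
{
	if (--ctx->refcount == 0)
		context_release(ctx);
}
```

The point of the scheme is that the context is only released once both the
userspace close and the last job completion have dropped their references,
in whichever order they happen.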

> Signed-off-by: Mikko Perttunen <mperttunen-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
> ---
>  drivers/gpu/drm/tegra/drm.c | 27 ++++++++++++++++++++++++---
>  drivers/gpu/drm/tegra/drm.h |  4 ++++
>  2 files changed, 28 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
> index 2cdd054520bf..3e2a4a19412e 100644
> --- a/drivers/gpu/drm/tegra/drm.c
> +++ b/drivers/gpu/drm/tegra/drm.c
> @@ -281,8 +281,11 @@ static int tegra_drm_open(struct drm_device *drm, struct drm_file *filp)
>  	return 0;
>  }
>  
> -static void tegra_drm_context_free(struct tegra_drm_context *context)
> +static void tegra_drm_context_free(struct kref *ref)
>  {
> +	struct tegra_drm_context *context =
> +		container_of(ref, struct tegra_drm_context, ref);
> +
>  	context->client->ops->close_channel(context);
>  	kfree(context);
>  }
> @@ -379,6 +382,16 @@ static int host1x_waitchk_copy_from_user(struct host1x_waitchk *dest,
>  	return 0;
>  }
>  
> +static void tegra_drm_job_done(struct host1x_job *job)
> +{
> +	struct tegra_drm_context *context = job->callback_data;
> +
> +	if (context->client->ops->submit_done)
> +		context->client->ops->submit_done(context);
> +
> +	kref_put(&context->ref, tegra_drm_context_free);
> +}
> +
>  int tegra_drm_submit(struct tegra_drm_context *context,
>  		     struct drm_tegra_submit *args, struct drm_device *drm,
>  		     struct drm_file *file)
> @@ -560,6 +573,9 @@ int tegra_drm_submit(struct tegra_drm_context *context,
>  	job->syncpt_id = syncpt.id;
>  	job->timeout = 10000;
>  
> +	job->done = tegra_drm_job_done;
> +	job->callback_data = context;
> +
>  	if (args->timeout && args->timeout < 10000)
>  		job->timeout = args->timeout;
>  
> @@ -567,8 +583,11 @@ int tegra_drm_submit(struct tegra_drm_context *context,
>  	if (err)
>  		goto fail;
>  
> +	kref_get(&context->ref);
> +
>  	err = host1x_job_submit(job);
>  	if (err) {
> +		kref_put(&context->ref, tegra_drm_context_free);
>  		host1x_job_unpin(job);
>  		goto fail;
>  	}
> @@ -717,6 +736,8 @@ static int tegra_open_channel(struct drm_device *drm, void *data,
>  	if (err < 0)
>  		kfree(context);
>  
> +	kref_init(&context->ref);
> +

It would be a bit cleaner to move kref_init() into tegra_client_open(), where
the rest of the context variables are initialized.

>  	mutex_unlock(&fpriv->lock);
>  	return err;
>  }
> @@ -738,7 +759,7 @@ static int tegra_close_channel(struct drm_device *drm, void *data,
>  	}
>  
>  	idr_remove(&fpriv->contexts, context->id);
> -	tegra_drm_context_free(context);
> +	kref_put(&context->ref, tegra_drm_context_free);
>  
>  unlock:
>  	mutex_unlock(&fpriv->lock);
> @@ -1026,7 +1047,7 @@ static int tegra_drm_context_cleanup(int id, void *p, void *data)
>  {
>  	struct tegra_drm_context *context = p;
>  
> -	tegra_drm_context_free(context);
> +	kref_put(&context->ref, tegra_drm_context_free);
>  
>  	return 0;
>  }
> diff --git a/drivers/gpu/drm/tegra/drm.h b/drivers/gpu/drm/tegra/drm.h
> index 063f5d397526..079aebb3fb38 100644
> --- a/drivers/gpu/drm/tegra/drm.h
> +++ b/drivers/gpu/drm/tegra/drm.h
> @@ -13,6 +13,7 @@
>  #include <uapi/drm/tegra_drm.h>
>  #include <linux/host1x.h>
>  #include <linux/iova.h>
> +#include <linux/kref.h>
>  #include <linux/of_gpio.h>
>  
>  #include <drm/drmP.h>
> @@ -74,6 +75,8 @@ struct tegra_drm {
>  struct tegra_drm_client;
>  
>  struct tegra_drm_context {
> +	struct kref ref;
> +
>  	struct tegra_drm_client *client;
>  	struct host1x_channel *channel;
>  	unsigned int id;
> @@ -88,6 +91,7 @@ struct tegra_drm_client_ops {
>  	int (*submit)(struct tegra_drm_context *context,
>  		      struct drm_tegra_submit *args, struct drm_device *drm,
>  		      struct drm_file *file);
> +	void (*submit_done)(struct tegra_drm_context *context);
>  };
>  
>  int tegra_drm_submit(struct tegra_drm_context *context,
> 

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 06/10] drm/tegra: Deliver job completion callback to client
  2017-11-05 11:01     ` Mikko Perttunen
  (?)
  (?)
@ 2017-11-16 16:40     ` Dmitry Osipenko
       [not found]       ` <1afa1ba9-3103-3672-2e15-fb8c7de2520b-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  -1 siblings, 1 reply; 52+ messages in thread
From: Dmitry Osipenko @ 2017-11-16 16:40 UTC (permalink / raw)
  To: Mikko Perttunen, thierry.reding, jonathanh
  Cc: dri-devel, linux-tegra, linux-kernel

On 05.11.2017 14:01, Mikko Perttunen wrote:
> To allow client drivers to free resources when jobs have completed,
> deliver job completion callbacks to them. This requires adding
> reference counting to context objects, as job completion can happen
> after the userspace application has closed the context. As such,
> also add kref-based refcounting for contexts.
> 
> Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
> ---
>  drivers/gpu/drm/tegra/drm.c | 27 ++++++++++++++++++++++++---
>  drivers/gpu/drm/tegra/drm.h |  4 ++++
>  2 files changed, 28 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
> index 2cdd054520bf..3e2a4a19412e 100644
> --- a/drivers/gpu/drm/tegra/drm.c
> +++ b/drivers/gpu/drm/tegra/drm.c
> @@ -281,8 +281,11 @@ static int tegra_drm_open(struct drm_device *drm, struct drm_file *filp)
>  	return 0;
>  }
>  
> -static void tegra_drm_context_free(struct tegra_drm_context *context)
> +static void tegra_drm_context_free(struct kref *ref)
>  {
> +	struct tegra_drm_context *context =
> +		container_of(ref, struct tegra_drm_context, ref);
> +
>  	context->client->ops->close_channel(context);
>  	kfree(context);
>  }
> @@ -379,6 +382,16 @@ static int host1x_waitchk_copy_from_user(struct host1x_waitchk *dest,
>  	return 0;
>  }
>  
> +static void tegra_drm_job_done(struct host1x_job *job)
> +{
> +	struct tegra_drm_context *context = job->callback_data;
> +
> +	if (context->client->ops->submit_done)
> +		context->client->ops->submit_done(context);
> +
> +	kref_put(&context->ref, tegra_drm_context_free);
> +}
> +
>  int tegra_drm_submit(struct tegra_drm_context *context,
>  		     struct drm_tegra_submit *args, struct drm_device *drm,
>  		     struct drm_file *file)
> @@ -560,6 +573,9 @@ int tegra_drm_submit(struct tegra_drm_context *context,
>  	job->syncpt_id = syncpt.id;
>  	job->timeout = 10000;
>  
> +	job->done = tegra_drm_job_done;
> +	job->callback_data = context;
> +
>  	if (args->timeout && args->timeout < 10000)
>  		job->timeout = args->timeout;
>  
> @@ -567,8 +583,11 @@ int tegra_drm_submit(struct tegra_drm_context *context,
>  	if (err)
>  		goto fail;
>  
> +	kref_get(&context->ref);
> +
>  	err = host1x_job_submit(job);
>  	if (err) {
> +		kref_put(&context->ref, tegra_drm_context_free);
>  		host1x_job_unpin(job);
>  		goto fail;
>  	}
> @@ -717,6 +736,8 @@ static int tegra_open_channel(struct drm_device *drm, void *data,
>  	if (err < 0)
>  		kfree(context);
>  
> +	kref_init(&context->ref);
> +
>  	mutex_unlock(&fpriv->lock);
>  	return err;
>  }
> @@ -738,7 +759,7 @@ static int tegra_close_channel(struct drm_device *drm, void *data,
>  	}
>  
>  	idr_remove(&fpriv->contexts, context->id);
> -	tegra_drm_context_free(context);
> +	kref_put(&context->ref, tegra_drm_context_free);
>  
>  unlock:
>  	mutex_unlock(&fpriv->lock);
> @@ -1026,7 +1047,7 @@ static int tegra_drm_context_cleanup(int id, void *p, void *data)
>  {
>  	struct tegra_drm_context *context = p;
>  
> -	tegra_drm_context_free(context);
> +	kref_put(&context->ref, tegra_drm_context_free);
>  

It probably wouldn't hurt to add and use tegra_drm_context_get()/tegra_drm_context_put().

>  	return 0;
>  }
> diff --git a/drivers/gpu/drm/tegra/drm.h b/drivers/gpu/drm/tegra/drm.h
> index 063f5d397526..079aebb3fb38 100644
> --- a/drivers/gpu/drm/tegra/drm.h
> +++ b/drivers/gpu/drm/tegra/drm.h
> @@ -13,6 +13,7 @@
>  #include <uapi/drm/tegra_drm.h>
>  #include <linux/host1x.h>
>  #include <linux/iova.h>
> +#include <linux/kref.h>
>  #include <linux/of_gpio.h>
>  
>  #include <drm/drmP.h>
> @@ -74,6 +75,8 @@ struct tegra_drm {
>  struct tegra_drm_client;
>  
>  struct tegra_drm_context {
> +	struct kref ref;
> +
>  	struct tegra_drm_client *client;
>  	struct host1x_channel *channel;
>  	unsigned int id;
> @@ -88,6 +91,7 @@ struct tegra_drm_client_ops {
>  	int (*submit)(struct tegra_drm_context *context,
>  		      struct drm_tegra_submit *args, struct drm_device *drm,
>  		      struct drm_file *file);
> +	void (*submit_done)(struct tegra_drm_context *context);
>  };
>  
>  int tegra_drm_submit(struct tegra_drm_context *context,
> 

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 06/10] drm/tegra: Deliver job completion callback to client
  2017-11-16 16:40     ` Dmitry Osipenko
@ 2017-11-29  9:09           ` Mikko Perttunen
  0 siblings, 0 replies; 52+ messages in thread
From: Mikko Perttunen @ 2017-11-29  9:09 UTC (permalink / raw)
  To: Dmitry Osipenko, Mikko Perttunen,
	thierry.reding-Re5JQEeQqe8AvxtiuMwx3w,
	jonathanh-DDmLM1+adcrQT0dZR+AlfA
  Cc: dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On 16.11.2017 18:40, Dmitry Osipenko wrote:
> On 05.11.2017 14:01, Mikko Perttunen wrote:
>> To allow client drivers to free resources when jobs have completed,
>> deliver job completion callbacks to them. This requires adding
>> reference counting to context objects, as job completion can happen
>> after the userspace application has closed the context. As such,
>> also add kref-based refcounting for contexts.
>>
>> Signed-off-by: Mikko Perttunen <mperttunen-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
>> ---
>>  drivers/gpu/drm/tegra/drm.c | 27 ++++++++++++++++++++++++---
>>  drivers/gpu/drm/tegra/drm.h |  4 ++++
>>  2 files changed, 28 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
>> index 2cdd054520bf..3e2a4a19412e 100644
>> --- a/drivers/gpu/drm/tegra/drm.c
>> +++ b/drivers/gpu/drm/tegra/drm.c
>> @@ -281,8 +281,11 @@ static int tegra_drm_open(struct drm_device *drm, struct drm_file *filp)
>>  	return 0;
>>  }
>>
>> -static void tegra_drm_context_free(struct tegra_drm_context *context)
>> +static void tegra_drm_context_free(struct kref *ref)
>>  {
>> +	struct tegra_drm_context *context =
>> +		container_of(ref, struct tegra_drm_context, ref);
>> +
>>  	context->client->ops->close_channel(context);
>>  	kfree(context);
>>  }
>> @@ -379,6 +382,16 @@ static int host1x_waitchk_copy_from_user(struct host1x_waitchk *dest,
>>  	return 0;
>>  }
>>
>> +static void tegra_drm_job_done(struct host1x_job *job)
>> +{
>> +	struct tegra_drm_context *context = job->callback_data;
>> +
>> +	if (context->client->ops->submit_done)
>> +		context->client->ops->submit_done(context);
>> +
>> +	kref_put(&context->ref, tegra_drm_context_free);
>> +}
>> +
>>  int tegra_drm_submit(struct tegra_drm_context *context,
>>  		     struct drm_tegra_submit *args, struct drm_device *drm,
>>  		     struct drm_file *file)
>> @@ -560,6 +573,9 @@ int tegra_drm_submit(struct tegra_drm_context *context,
>>  	job->syncpt_id = syncpt.id;
>>  	job->timeout = 10000;
>>
>> +	job->done = tegra_drm_job_done;
>> +	job->callback_data = context;
>> +
>>  	if (args->timeout && args->timeout < 10000)
>>  		job->timeout = args->timeout;
>>
>> @@ -567,8 +583,11 @@ int tegra_drm_submit(struct tegra_drm_context *context,
>>  	if (err)
>>  		goto fail;
>>
>> +	kref_get(&context->ref);
>> +
>>  	err = host1x_job_submit(job);
>>  	if (err) {
>> +		kref_put(&context->ref, tegra_drm_context_free);
>>  		host1x_job_unpin(job);
>>  		goto fail;
>>  	}
>> @@ -717,6 +736,8 @@ static int tegra_open_channel(struct drm_device *drm, void *data,
>>  	if (err < 0)
>>  		kfree(context);
>>
>> +	kref_init(&context->ref);
>> +
>>  	mutex_unlock(&fpriv->lock);
>>  	return err;
>>  }
>> @@ -738,7 +759,7 @@ static int tegra_close_channel(struct drm_device *drm, void *data,
>>  	}
>>
>>  	idr_remove(&fpriv->contexts, context->id);
>> -	tegra_drm_context_free(context);
>> +	kref_put(&context->ref, tegra_drm_context_free);
>>
>>  unlock:
>>  	mutex_unlock(&fpriv->lock);
>> @@ -1026,7 +1047,7 @@ static int tegra_drm_context_cleanup(int id, void *p, void *data)
>>  {
>>  	struct tegra_drm_context *context = p;
>>
>> -	tegra_drm_context_free(context);
>> +	kref_put(&context->ref, tegra_drm_context_free);
>>
>
> Probably won't hurt to add and use tegra_drm_context_get()/tegra_drm_context_put().
>

Yeah, maybe we have enough places where this is called for it to make sense.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 10/10] gpu: host1x: Optionally block when acquiring channel
  2017-11-12 11:23                     ` Dmitry Osipenko
  (?)
@ 2017-11-29  9:10                     ` Mikko Perttunen
  2017-11-29 12:18                       ` Dmitry Osipenko
  -1 siblings, 1 reply; 52+ messages in thread
From: Mikko Perttunen @ 2017-11-29  9:10 UTC (permalink / raw)
  To: Dmitry Osipenko, Mikko Perttunen, thierry.reding, jonathanh
  Cc: dri-devel, linux-tegra, linux-kernel

On 12.11.2017 13:23, Dmitry Osipenko wrote:
> On 11.11.2017 00:15, Dmitry Osipenko wrote:
>> On 07.11.2017 18:29, Dmitry Osipenko wrote:
>>> On 07.11.2017 16:11, Mikko Perttunen wrote:
>>>> On 05.11.2017 19:14, Dmitry Osipenko wrote:
>>>>> On 05.11.2017 14:01, Mikko Perttunen wrote:
>>>>>> Add an option to host1x_channel_request to interruptibly wait for a
>>>>>> free channel. This allows IOCTLs that acquire a channel to block
>>>>>> the userspace.
>>>>>>
>>>>>
>>>>> Wouldn't it be more optimal to request channel and block after job's pining,
>>>>> when all patching and checks are completed? Note that right now we have locking
>>>>> around submission in DRM, which I suppose should go away by making locking fine
>>>>> grained.
>>>>
>>>> That would be possible, but I don't think it should matter much since contention
>>>> here should not be the common case.
>>>>
>>>>>
>>>>> Or maybe it would be more optimal to just iterate over channels, like I
>>>>> suggested before [0]?
>>>>
>>>> Somehow I hadn't noticed this before, but this would break the invariant of
>>>> having one client/class per channel.
>>>>
>>>
>>> Yes, currently there is a weak relation of channel and clients device, but seems
>>> channels device is only used for printing dev_* messages and device could be
>>> borrowed from the channels job. I don't see any real point of hardwiring channel
>>> to a specific device or client.
>>
>> Although, it won't work with syncpoint assignment to channel.
>
> On the other hand.. it should work if one syncpoint could be assigned to
> multiple channels, couldn't it?

A syncpoint can only be mapped to a single channel, so unfortunately 
this won't work.

Mikko

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 10/10] gpu: host1x: Optionally block when acquiring channel
  2017-11-29  9:10                     ` Mikko Perttunen
@ 2017-11-29 12:18                       ` Dmitry Osipenko
       [not found]                         ` <07e28b40-dd2b-774f-2d07-3b5d6cf08c46-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 52+ messages in thread
From: Dmitry Osipenko @ 2017-11-29 12:18 UTC (permalink / raw)
  To: Mikko Perttunen, Mikko Perttunen, thierry.reding, jonathanh
  Cc: dri-devel, linux-tegra, linux-kernel

On 29.11.2017 12:10, Mikko Perttunen wrote:
> On 12.11.2017 13:23, Dmitry Osipenko wrote:
>> On 11.11.2017 00:15, Dmitry Osipenko wrote:
>>> On 07.11.2017 18:29, Dmitry Osipenko wrote:
>>>> On 07.11.2017 16:11, Mikko Perttunen wrote:
>>>>> On 05.11.2017 19:14, Dmitry Osipenko wrote:
>>>>>> On 05.11.2017 14:01, Mikko Perttunen wrote:
>>>>>>> Add an option to host1x_channel_request to interruptibly wait for a
>>>>>>> free channel. This allows IOCTLs that acquire a channel to block
>>>>>>> the userspace.
>>>>>>>
>>>>>>
>>>>>> Wouldn't it be more optimal to request channel and block after job's pining,
>>>>>> when all patching and checks are completed? Note that right now we have
>>>>>> locking
>>>>>> around submission in DRM, which I suppose should go away by making locking
>>>>>> fine
>>>>>> grained.
>>>>>
>>>>> That would be possible, but I don't think it should matter much since
>>>>> contention
>>>>> here should not be the common case.
>>>>>
>>>>>>
>>>>>> Or maybe it would be more optimal to just iterate over channels, like I
>>>>>> suggested before [0]?
>>>>>
>>>>> Somehow I hadn't noticed this before, but this would break the invariant of
>>>>> having one client/class per channel.
>>>>>
>>>>
>>>> Yes, currently there is a weak relation of channel and clients device, but
>>>> seems
>>>> channels device is only used for printing dev_* messages and device could be
>>>> borrowed from the channels job. I don't see any real point of hardwiring
>>>> channel
>>>> to a specific device or client.
>>>
>>> Although, it won't work with syncpoint assignment to channel.
>>
>> On the other hand.. it should work if one syncpoint could be assigned to
>> multiple channels, couldn't it?
> 
> A syncpoint can only be mapped to a single channel, so unfortunately this won't
> work.

Okay, but in DRM we request the syncpoint when the channel is opened, while
the syncpoint is assigned to a channel at job submission. So the first
submitted job will assign the syncpoint to the first channel, and a second
job would re-assign the syncpoint to a second channel while the first job is
still in progress. How is that going to work?

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 10/10] gpu: host1x: Optionally block when acquiring channel
  2017-11-29 12:18                       ` Dmitry Osipenko
@ 2017-11-29 12:25                             ` Mikko Perttunen
  0 siblings, 0 replies; 52+ messages in thread
From: Mikko Perttunen @ 2017-11-29 12:25 UTC (permalink / raw)
  To: Dmitry Osipenko, Mikko Perttunen,
	thierry.reding-Re5JQEeQqe8AvxtiuMwx3w,
	jonathanh-DDmLM1+adcrQT0dZR+AlfA
  Cc: dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On 29.11.2017 14:18, Dmitry Osipenko wrote:
> On 29.11.2017 12:10, Mikko Perttunen wrote:
>> On 12.11.2017 13:23, Dmitry Osipenko wrote:
>>> On 11.11.2017 00:15, Dmitry Osipenko wrote:
>>>> On 07.11.2017 18:29, Dmitry Osipenko wrote:
>>>>> On 07.11.2017 16:11, Mikko Perttunen wrote:
>>>>>> On 05.11.2017 19:14, Dmitry Osipenko wrote:
>>>>>>> On 05.11.2017 14:01, Mikko Perttunen wrote:
>>>>>>>> Add an option to host1x_channel_request to interruptibly wait for a
>>>>>>>> free channel. This allows IOCTLs that acquire a channel to block
>>>>>>>> the userspace.
>>>>>>>>
>>>>>>>
>>>>>>> Wouldn't it be more optimal to request channel and block after job's pining,
>>>>>>> when all patching and checks are completed? Note that right now we have
>>>>>>> locking
>>>>>>> around submission in DRM, which I suppose should go away by making locking
>>>>>>> fine
>>>>>>> grained.
>>>>>>
>>>>>> That would be possible, but I don't think it should matter much since
>>>>>> contention
>>>>>> here should not be the common case.
>>>>>>
>>>>>>>
>>>>>>> Or maybe it would be more optimal to just iterate over channels, like I
>>>>>>> suggested before [0]?
>>>>>>
>>>>>> Somehow I hadn't noticed this before, but this would break the invariant of
>>>>>> having one client/class per channel.
>>>>>>
>>>>>
>>>>> Yes, currently there is a weak relation between the channel and the
>>>>> client's device, but it seems the channel's device is only used for
>>>>> printing dev_* messages, and the device could be borrowed from the
>>>>> channel's job. I don't see any real point in hardwiring a channel to a
>>>>> specific device or client.
>>>>
>>>> Although, it won't work with syncpoint assignment to channel.
>>>
>>> On the other hand.. it should work if one syncpoint could be assigned to
>>> multiple channels, couldn't it?
>>
>> A syncpoint can only be mapped to a single channel, so unfortunately this won't
>> work.
> Okay, in DRM we request a syncpoint on channel 'open', and syncpoint
> assignment happens on job submission. So the first submitted job will assign
> the syncpoint to the first channel, and a second job would re-assign the
> syncpoint to a second channel while the first job is still in progress; how
> is that going to work?
>

When a context is created, it is assigned both a syncpoint and a channel,
and this pair persists for as long as the context is alive (i.e. as long
as there are jobs). So even if the syncpoint is reassigned to a channel at
every submit, it is always assigned to the same channel, and nothing
breaks. Multiple contexts cannot share syncpoints, so things work out.

Obviously this is not ideal, as we currently never unassign syncpoints,
but at least it is not broken.
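The lifetime rule described above can be sketched in plain C. This is an
illustrative userspace model under the stated assumptions, not the actual
driver code; the names (`context_get_channel`, `submit_job`, `job_done`) and
the pool layout are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

#define NUM_CHANNELS 4

struct channel {
	int busy;
};

struct context {
	struct channel *channel; /* held only while jobs are pending */
	unsigned int syncpoint;  /* fixed for the lifetime of the context */
	unsigned int pending_jobs;
};

static struct channel channels[NUM_CHANNELS];

/* Reuse the already-assigned channel while jobs are pending; otherwise
 * take any free channel from the pool. */
static struct channel *context_get_channel(struct context *ctx)
{
	size_t i;

	if (ctx->pending_jobs > 0 && ctx->channel)
		return ctx->channel;

	for (i = 0; i < NUM_CHANNELS; i++) {
		if (!channels[i].busy) {
			channels[i].busy = 1;
			ctx->channel = &channels[i];
			return ctx->channel;
		}
	}

	return NULL; /* no free channel; real code could block here */
}

static int submit_job(struct context *ctx)
{
	struct channel *ch = context_get_channel(ctx);

	if (!ch)
		return -1;

	/* The syncpoint is (re)assigned to ch on every submit, but since
	 * ch cannot change while pending_jobs > 0, the assignment never
	 * moves under an in-flight job. */
	ctx->pending_jobs++;
	return 0;
}

static void job_done(struct context *ctx)
{
	if (--ctx->pending_jobs == 0) {
		ctx->channel->busy = 0;
		ctx->channel = NULL; /* next submit may pick another channel */
	}
}
```

Two back-to-back submits on one context reuse the same channel, which is why
re-assigning the syncpoint at every submit is harmless in this model.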

Mikko

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 10/10] gpu: host1x: Optionally block when acquiring channel
  2017-11-29 12:25                             ` Mikko Perttunen
@ 2017-11-29 12:37                                 ` Dmitry Osipenko
  -1 siblings, 0 replies; 52+ messages in thread
From: Dmitry Osipenko @ 2017-11-29 12:37 UTC (permalink / raw)
  To: Mikko Perttunen, Mikko Perttunen,
	thierry.reding-Re5JQEeQqe8AvxtiuMwx3w,
	jonathanh-DDmLM1+adcrQT0dZR+AlfA
  Cc: dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On 29.11.2017 15:25, Mikko Perttunen wrote:
> On 29.11.2017 14:18, Dmitry Osipenko wrote:
>> On 29.11.2017 12:10, Mikko Perttunen wrote:
>>> On 12.11.2017 13:23, Dmitry Osipenko wrote:
>>>> On 11.11.2017 00:15, Dmitry Osipenko wrote:
>>>>> On 07.11.2017 18:29, Dmitry Osipenko wrote:
>>>>>> On 07.11.2017 16:11, Mikko Perttunen wrote:
>>>>>>> On 05.11.2017 19:14, Dmitry Osipenko wrote:
>>>>>>>> On 05.11.2017 14:01, Mikko Perttunen wrote:
>>>>>>>>> Add an option to host1x_channel_request to interruptibly wait for a
>>>>>>>>> free channel. This allows IOCTLs that acquire a channel to block
>>>>>>>>> the userspace.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Wouldn't it be more optimal to request the channel and block after
>>>>>>>> the job's pinning, when all patching and checks are completed? Note
>>>>>>>> that right now we have locking around submission in DRM, which I
>>>>>>>> suppose should go away by making the locking fine-grained.
>>>>>>>
>>>>>>> That would be possible, but I don't think it should matter much since
>>>>>>> contention here should not be the common case.
>>>>>>>
>>>>>>>>
>>>>>>>> Or maybe it would be more optimal to just iterate over channels, like I
>>>>>>>> suggested before [0]?
>>>>>>>
>>>>>>> Somehow I hadn't noticed this before, but this would break the invariant of
>>>>>>> having one client/class per channel.
>>>>>>>
>>>>>>
>>>>>> Yes, currently there is a weak relation between the channel and the
>>>>>> client's device, but it seems the channel's device is only used for
>>>>>> printing dev_* messages, and the device could be borrowed from the
>>>>>> channel's job. I don't see any real point in hardwiring a channel to a
>>>>>> specific device or client.
>>>>>
>>>>> Although, it won't work with syncpoint assignment to channel.
>>>>
>>>> On the other hand.. it should work if one syncpoint could be assigned to
>>>> multiple channels, couldn't it?
>>>
>>> A syncpoint can only be mapped to a single channel, so unfortunately this won't
>>> work.
>> Okay, in DRM we request a syncpoint on channel 'open', and syncpoint
>> assignment happens on job submission. So the first submitted job will assign
>> the syncpoint to the first channel, and a second job would re-assign the
>> syncpoint to a second channel while the first job is still in progress; how
>> is that going to work?
>>
> 
> When a context is created, it is assigned both a syncpoint and a channel, and
> this pair persists for as long as the context is alive (i.e. as long as there
> are jobs). So even if the syncpoint is reassigned to a channel at every
> submit, it is always assigned to the same channel, and nothing breaks.
> Multiple contexts cannot share syncpoints, so things work out.
> 
> Obviously this is not ideal, as we currently never unassign syncpoints, but
> at least it is not broken.

Right, I forgot that you made tegra_drm_context_get_channel() re-use the
requested channel if there are pending jobs.

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 04/10] gpu: host1x: Lock classes during job submission
  2017-11-07 21:23             ` Dmitry Osipenko
@ 2017-12-05 13:21                 ` Mikko Perttunen
  -1 siblings, 0 replies; 52+ messages in thread
From: Mikko Perttunen @ 2017-12-05 13:21 UTC (permalink / raw)
  To: Dmitry Osipenko, Mikko Perttunen,
	thierry.reding-Re5JQEeQqe8AvxtiuMwx3w,
	jonathanh-DDmLM1+adcrQT0dZR+AlfA
  Cc: dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On 07.11.2017 23:23, Dmitry Osipenko wrote:
> On 07.11.2017 15:28, Mikko Perttunen wrote:
>> On 05.11.2017 18:46, Dmitry Osipenko wrote:
>>> On 05.11.2017 14:01, Mikko Perttunen wrote:
>>>> ...
>>>>
>>>> +static int mlock_id_for_class(unsigned int class)
>>>> +{
>>>> +#if HOST1X_HW >= 6
>>>> +    switch (class)
>>>> +    {
>>>> +    case HOST1X_CLASS_HOST1X:
>>>> +        return 0;
>>>> +    case HOST1X_CLASS_VIC:
>>>> +        return 17;
>>>
>>> What is the meaning of the returned ID values that you have defined here?
>>> Why should VIC have a different ID on T186?
>>
>> On T186, MLOCKs are not "generic" - the HW knows that each MLOCK corresponds to
>> a specific class. Therefore we must map that correctly.
>>
>
> Okay.
>
>>>
>>>> +    default:
>>>> +        return -EINVAL;
>>>> +    }
>>>> +#else
>>>> +    switch (class)
>>>> +    {
>>>> +    case HOST1X_CLASS_HOST1X:
>>>> +        return 0;
>>>> +    case HOST1X_CLASS_GR2D:
>>>> +        return 1;
>>>> +    case HOST1X_CLASS_GR2D_SB:
>>>> +        return 2;
>>>
>>> Note that we are allowing switching of the 2D classes within the same
>>> job's context, and currently the job's class is somewhat hardcoded to GR2D.
>>>
>>> Even though GR2D and GR2D_SB use different register banks, is it okay to
>>> trigger execution of different classes simultaneously? Would the syncpoint
>>> differentiate classes on the OP_DONE event?
>>
>> Good point, we might need to use the same lock for these two.
>>
>>>
>>> I suppose that MLOCK (the module lock) implies whole-module locking;
>>> wouldn't it make sense to just use the module IDs defined in the TRM?
>>
>> Can you point out where these are defined?
>
> See INDMODID / REGF_MODULEID fields of HOST1X_CHANNEL_INDOFF2_0 /
> HOST1X_SYNC_REGF_ADDR_0 registers, bit numbers of HOST1X_SYNC_INTSTATUS_0 /
> HOST1X_SYNC_INTC0MASK_0 / HOST1X_SYNC_MOD_TEARDOWN_0.

These values look like they would work on T20, but at least on T124 the
module numbering for the modules we want to lock goes above the number of
MLOCKs, so the indexing scheme would not work there.
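The mismatch can be made concrete with a tiny sketch. The module ID 17 used
in the test below is a hypothetical example of a module numbered above the
MLOCK range, not a value taken from the TRM:

```c
#include <assert.h>

#define NUM_MLOCKS 16 /* host1x exposes 16 MLOCKs */

/* If MLOCKs were indexed directly by host1x module ID, any module whose
 * ID is >= NUM_MLOCKS would have no MLOCK it could map to. */
static int mlock_for_module_id(unsigned int module_id)
{
	if (module_id >= NUM_MLOCKS)
		return -1; /* module numbering exceeds the MLOCK count */
	return (int)module_id;
}
```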

Mikko


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 04/10] gpu: host1x: Lock classes during job submission
  2017-12-05 13:21                 ` Mikko Perttunen
@ 2017-12-05 13:43                     ` Dmitry Osipenko
  -1 siblings, 0 replies; 52+ messages in thread
From: Dmitry Osipenko @ 2017-12-05 13:43 UTC (permalink / raw)
  To: Mikko Perttunen, Mikko Perttunen,
	thierry.reding-Re5JQEeQqe8AvxtiuMwx3w,
	jonathanh-DDmLM1+adcrQT0dZR+AlfA
  Cc: dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On 05.12.2017 16:21, Mikko Perttunen wrote:
> On 07.11.2017 23:23, Dmitry Osipenko wrote:
>> On 07.11.2017 15:28, Mikko Perttunen wrote:
>>> On 05.11.2017 18:46, Dmitry Osipenko wrote:
>>>> On 05.11.2017 14:01, Mikko Perttunen wrote:
>>>>> ...
>>>>>
>>>>> +static int mlock_id_for_class(unsigned int class)
>>>>> +{
>>>>> +#if HOST1X_HW >= 6
>>>>> +    switch (class)
>>>>> +    {
>>>>> +    case HOST1X_CLASS_HOST1X:
>>>>> +        return 0;
>>>>> +    case HOST1X_CLASS_VIC:
>>>>> +        return 17;
>>>>
>>>> What is the meaning of the returned ID values that you have defined here?
>>>> Why should VIC have a different ID on T186?
>>>
>>> On T186, MLOCKs are not "generic" - the HW knows that each MLOCK corresponds to
>>> a specific class. Therefore we must map that correctly.
>>>
>>
>> Okay.
>>
>>>>
>>>>> +    default:
>>>>> +        return -EINVAL;
>>>>> +    }
>>>>> +#else
>>>>> +    switch (class)
>>>>> +    {
>>>>> +    case HOST1X_CLASS_HOST1X:
>>>>> +        return 0;
>>>>> +    case HOST1X_CLASS_GR2D:
>>>>> +        return 1;
>>>>> +    case HOST1X_CLASS_GR2D_SB:
>>>>> +        return 2;
>>>>
>>>> Note that we are allowing switching of the 2D classes within the same
>>>> job's context, and currently the job's class is somewhat hardcoded to
>>>> GR2D.
>>>>
>>>> Even though GR2D and GR2D_SB use different register banks, is it okay to
>>>> trigger execution of different classes simultaneously? Would the syncpoint
>>>> differentiate classes on the OP_DONE event?
>>>
>>> Good point, we might need to use the same lock for these two.
>>>
>>>>
>>>> I suppose that MLOCK (the module lock) implies whole-module locking;
>>>> wouldn't it make sense to just use the module IDs defined in the TRM?
>>>
>>> Can you point out where these are defined?
>>
>> See INDMODID / REGF_MODULEID fields of HOST1X_CHANNEL_INDOFF2_0 /
>> HOST1X_SYNC_REGF_ADDR_0 registers, bit numbers of HOST1X_SYNC_INTSTATUS_0 /
>> HOST1X_SYNC_INTC0MASK_0 / HOST1X_SYNC_MOD_TEARDOWN_0.
> 
> These values look like they would work on T20, but at least on T124 the
> module numbering for the modules we want to lock goes above the number of
> MLOCKs, so the indexing scheme would not work there.
> 

Indeed, for some reason I was thinking that there are 32 MLOCKs instead of 16.
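With only 16 MLOCKs available, any scheme that does not hardwire classes to
locks would need a small allocator along these lines. This is a sketch under
that assumption, not the driver's actual code:

```c
#include <assert.h>
#include <stdint.h>

#define NUM_MLOCKS 16

static uint16_t mlock_bitmap; /* one bit per MLOCK, 16 total */

/* Returns a free MLOCK id, or -1 if all 16 are in use (the caller would
 * then have to wait for one to be released). */
static int mlock_alloc(void)
{
	int i;

	for (i = 0; i < NUM_MLOCKS; i++) {
		if (!(mlock_bitmap & (1u << i))) {
			mlock_bitmap |= (uint16_t)(1u << i);
			return i;
		}
	}
	return -1;
}

static void mlock_free(int id)
{
	assert(id >= 0 && id < NUM_MLOCKS);
	mlock_bitmap &= (uint16_t)~(1u << id);
}
```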

^ permalink raw reply	[flat|nested] 52+ messages in thread

end of thread, other threads:[~2017-12-05 13:43 UTC | newest]

Thread overview: 52+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-05 11:01 [PATCH 00/10] Dynamic Host1x channel allocation Mikko Perttunen
2017-11-05 11:01 ` [PATCH 02/10] gpu: host1x: Print MLOCK state in debug dumps on T186 Mikko Perttunen
2017-11-05 11:01 ` [PATCH 03/10] gpu: host1x: Add lock around channel allocation Mikko Perttunen
2017-11-05 11:01 ` [PATCH 04/10] gpu: host1x: Lock classes during job submission Mikko Perttunen
2017-11-05 11:01   ` Mikko Perttunen
     [not found]   ` <20171105110118.15142-5-mperttunen-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
2017-11-05 16:46     ` Dmitry Osipenko
2017-11-05 16:46       ` Dmitry Osipenko
2017-11-07 12:28       ` Mikko Perttunen
     [not found]         ` <ef08d3d8-94a7-8804-c339-5310719333f3-/1wQRMveznE@public.gmane.org>
2017-11-07 21:23           ` Dmitry Osipenko
2017-11-07 21:23             ` Dmitry Osipenko
     [not found]             ` <dc39398b-ea49-6e97-28ba-652f8b49db44-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2017-12-05 13:21               ` Mikko Perttunen
2017-12-05 13:21                 ` Mikko Perttunen
     [not found]                 ` <2b4d9283-dabe-9e1f-f8cb-6ddbc16e3f0f-/1wQRMveznE@public.gmane.org>
2017-12-05 13:43                   ` Dmitry Osipenko
2017-12-05 13:43                     ` Dmitry Osipenko
2017-11-05 11:01 ` [PATCH 05/10] gpu: host1x: Add job done callback Mikko Perttunen
     [not found] ` <20171105110118.15142-1-mperttunen-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
2017-11-05 11:01   ` [PATCH 01/10] gpu: host1x: Parameterize channel aperture size Mikko Perttunen
2017-11-05 11:01     ` Mikko Perttunen
2017-11-05 11:01   ` [PATCH 06/10] drm/tegra: Deliver job completion callback to client Mikko Perttunen
2017-11-05 11:01     ` Mikko Perttunen
     [not found]     ` <20171105110118.15142-7-mperttunen-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
2017-11-16 16:33       ` Dmitry Osipenko
2017-11-16 16:33         ` Dmitry Osipenko
2017-11-16 16:40     ` Dmitry Osipenko
     [not found]       ` <1afa1ba9-3103-3672-2e15-fb8c7de2520b-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2017-11-29  9:09         ` Mikko Perttunen
2017-11-29  9:09           ` Mikko Perttunen
2017-11-05 11:01   ` [PATCH 07/10] drm/tegra: Make syncpoints be per-context Mikko Perttunen
2017-11-05 11:01     ` Mikko Perttunen
2017-11-07 15:34   ` [PATCH 00/10] Dynamic Host1x channel allocation Dmitry Osipenko
2017-11-07 15:34     ` Dmitry Osipenko
2017-11-05 11:01 ` [PATCH 08/10] drm/tegra: Implement dynamic channel allocation model Mikko Perttunen
     [not found]   ` <20171105110118.15142-9-mperttunen-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
2017-11-05 17:43     ` Dmitry Osipenko
2017-11-05 17:43       ` Dmitry Osipenko
2017-11-07 12:29       ` Mikko Perttunen
     [not found]         ` <38fcf947-0d5d-e2c7-f49f-9efce5eeb1a3-/1wQRMveznE@public.gmane.org>
2017-11-13 11:49           ` Dmitry Osipenko
2017-11-13 11:49             ` Dmitry Osipenko
2017-11-05 11:01 ` [PATCH 09/10] drm/tegra: Boot VIC in runtime resume Mikko Perttunen
2017-11-05 11:01 ` [PATCH 10/10] gpu: host1x: Optionally block when acquiring channel Mikko Perttunen
2017-11-05 11:01   ` Mikko Perttunen
     [not found]   ` <20171105110118.15142-11-mperttunen-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
2017-11-05 17:14     ` Dmitry Osipenko
2017-11-05 17:14       ` Dmitry Osipenko
     [not found]       ` <9c5676eb-ba6f-c187-29e4-7b331bd3962f-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2017-11-07 13:11         ` Mikko Perttunen
2017-11-07 13:11           ` Mikko Perttunen
2017-11-07 15:29           ` Dmitry Osipenko
     [not found]             ` <1b35ec93-167b-3436-0ff2-5e2e0886aea7-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2017-11-10 21:15               ` Dmitry Osipenko
2017-11-10 21:15                 ` Dmitry Osipenko
     [not found]                 ` <dcb8c4ef-9eb9-556f-cc96-651a50636afa-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2017-11-12 11:23                   ` Dmitry Osipenko
2017-11-12 11:23                     ` Dmitry Osipenko
2017-11-29  9:10                     ` Mikko Perttunen
2017-11-29 12:18                       ` Dmitry Osipenko
     [not found]                         ` <07e28b40-dd2b-774f-2d07-3b5d6cf08c46-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2017-11-29 12:25                           ` Mikko Perttunen
2017-11-29 12:25                             ` Mikko Perttunen
     [not found]                             ` <a4adb6ac-b72e-3f9b-fc6c-2a56bc6537ce-/1wQRMveznE@public.gmane.org>
2017-11-29 12:37                               ` Dmitry Osipenko
2017-11-29 12:37                                 ` Dmitry Osipenko
