[PATCH v3 00/16] drm/tegra: Fix IOVA space on Tegra186 and later

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH v3 00/16] drm/tegra: Fix IOVA space on Tegra186 and later
@ 2019-02-01 13:28 Thierry Reding
  2019-02-01 13:28 ` [PATCH v3 01/16] gpu: host1x: Set up stream ID table Thierry Reding
                   ` (16 more replies)
  0 siblings, 17 replies; 30+ messages in thread
From: Thierry Reding @ 2019-02-01 13:28 UTC (permalink / raw)
  To: Thierry Reding; +Cc: linux-tegra, Dmitry Osipenko, dri-devel, Mikko Perttunen

From: Thierry Reding <treding@nvidia.com>

Tegra186 and later are different from earlier generations in that they
use an ARM SMMU rather than the Tegra SMMU. The ARM SMMU driver behaves
slightly differently in that the geometry for IOMMU domains is set only
after a device was attached to it. This is to make sure that the SMMU
instance that the domain belongs to is known, because each instance can
have a different input address space (i.e. geometry).

Work around this by moving all IOVA allocations to a point where the
geometry of the domain is properly initialized.

This second version of the series addresses all review comments and adds
a number of patches that will actually allow host1x to work with an SMMU
enabled on Tegra186. The patches also add programming required to
address the full 40 bits of address space.

The third version of the series fixes the 40-bit addressing code by
making sure that wide opcodes are always written atomically to the push
buffer. Another pair of patches are introduced to fix a long-standing
bug where the HOST1X_CHANNEL_DMAEND register wasn't properly programmed
and one push buffer memory optimization that avoid wasting almost one
full memory page per push buffer.

This supersedes the following patch:

	https://patchwork.kernel.org/patch/10775579/

Thierry

Thierry Reding (16):
  gpu: host1x: Set up stream ID table
  gpu: host1x: Program the channel stream ID
  gpu: host1x: Introduce support for wide opcodes
  gpu: host1x: Support 40-bit addressing
  gpu: host1x: Use direct DMA with IOMMU API usage
  gpu: host1x: Restrict IOVA space to DMA mask
  gpu: host1x: Support 40-bit addressing on Tegra186
  gpu: host1x: Use correct semantics for HOST1X_CHANNEL_DMAEND
  gpu: host1x: Optimize CDMA push buffer memory usage
  drm/tegra: Store parent pointer in Tegra DRM clients
  drm/tegra: vic: Load firmware on demand
  drm/tegra: Setup shared IOMMU domain after initialization
  drm/tegra: Restrict IOVA space to DMA mask
  drm/tegra: vic: Do not clear driver data
  drm/tegra: vic: Support stream ID register programming
  arm64: tegra: Enable SMMU for VIC on Tegra186

 arch/arm64/boot/dts/nvidia/tegra186.dtsi    |   1 +
 drivers/gpu/drm/tegra/drm.c                 |  57 +++++----
 drivers/gpu/drm/tegra/drm.h                 |   1 +
 drivers/gpu/drm/tegra/vic.c                 |  75 ++++++++---
 drivers/gpu/drm/tegra/vic.h                 |   9 ++
 drivers/gpu/host1x/cdma.c                   | 132 ++++++++++++++++++--
 drivers/gpu/host1x/cdma.h                   |   2 +
 drivers/gpu/host1x/dev.c                    |  49 +++++++-
 drivers/gpu/host1x/dev.h                    |   8 ++
 drivers/gpu/host1x/hw/cdma_hw.c             |  32 ++++-
 drivers/gpu/host1x/hw/channel_hw.c          |  42 ++++++-
 drivers/gpu/host1x/hw/host1x06_hardware.h   |   6 +
 drivers/gpu/host1x/hw/host1x07_hardware.h   |   6 +
 drivers/gpu/host1x/hw/hw_host1x06_channel.h |  11 ++
 drivers/gpu/host1x/hw/hw_host1x07_channel.h |  11 ++
 include/trace/events/host1x.h               |  26 ++++
 16 files changed, 404 insertions(+), 64 deletions(-)
 create mode 100644 drivers/gpu/host1x/hw/hw_host1x06_channel.h
 create mode 100644 drivers/gpu/host1x/hw/hw_host1x07_channel.h

-- 
2.19.1

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH v3 01/16] gpu: host1x: Set up stream ID table
  2019-02-01 13:28 [PATCH v3 00/16] drm/tegra: Fix IOVA space on Tegra186 and later Thierry Reding
@ 2019-02-01 13:28 ` Thierry Reding
  2019-02-01 13:28 ` [PATCH v3 02/16] gpu: host1x: Program the channel stream ID Thierry Reding
                   ` (15 subsequent siblings)
  16 siblings, 0 replies; 30+ messages in thread
From: Thierry Reding @ 2019-02-01 13:28 UTC (permalink / raw)
  To: Thierry Reding; +Cc: linux-tegra, Dmitry Osipenko, dri-devel, Mikko Perttunen

From: Thierry Reding <treding@nvidia.com>

In order to enable the MMIO path stream ID protection provided by the
incarnation of host1x found in Tegra186 and later, the host1x must be
provided with the list of stream ID register offsets for each of its
clients. Some clients (such as VIC) have multiple stream ID registers
that are assumed to be contiguous. The host1x is programmed with the
base offset and a limit which provide the range of registers that the
host1x needs to monitor for writes.

Signed-off-by: Thierry Reding <treding@nvidia.com>
---
 drivers/gpu/host1x/dev.c | 38 ++++++++++++++++++++++++++++++++++++++
 drivers/gpu/host1x/dev.h |  8 ++++++++
 2 files changed, 46 insertions(+)

diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c
index 419d8929a98f..4c044ee54fe6 100644
--- a/drivers/gpu/host1x/dev.c
+++ b/drivers/gpu/host1x/dev.c
@@ -120,6 +120,15 @@ static const struct host1x_info host1x05_info = {
 	.dma_mask = DMA_BIT_MASK(34),
 };
 
+static const struct host1x_sid_entry tegra186_sid_table[] = {
+	{
+		/* VIC */
+		.base = 0x1af0,
+		.offset = 0x30,
+		.limit = 0x34
+	},
+};
+
 static const struct host1x_info host1x06_info = {
 	.nb_channels = 63,
 	.nb_pts = 576,
@@ -129,6 +138,17 @@ static const struct host1x_info host1x06_info = {
 	.sync_offset = 0x0,
 	.dma_mask = DMA_BIT_MASK(34),
 	.has_hypervisor = true,
+	.num_sid_entries = ARRAY_SIZE(tegra186_sid_table),
+	.sid_table = tegra186_sid_table,
+};
+
+static const struct host1x_sid_entry tegra194_sid_table[] = {
+	{
+		/* VIC */
+		.base = 0x1af0,
+		.offset = 0x30,
+		.limit = 0x34
+	},
 };
 
 static const struct host1x_info host1x07_info = {
@@ -140,6 +160,8 @@ static const struct host1x_info host1x07_info = {
 	.sync_offset = 0x0,
 	.dma_mask = DMA_BIT_MASK(40),
 	.has_hypervisor = true,
+	.num_sid_entries = ARRAY_SIZE(tegra194_sid_table),
+	.sid_table = tegra194_sid_table,
 };
 
 static const struct of_device_id host1x_of_match[] = {
@@ -154,6 +176,19 @@ static const struct of_device_id host1x_of_match[] = {
 };
 MODULE_DEVICE_TABLE(of, host1x_of_match);
 
+static void host1x_setup_sid_table(struct host1x *host)
+{
+	const struct host1x_info *info = host->info;
+	unsigned int i;
+
+	for (i = 0; i < info->num_sid_entries; i++) {
+		const struct host1x_sid_entry *entry = &info->sid_table[i];
+
+		host1x_hypervisor_writel(host, entry->offset, entry->base);
+		host1x_hypervisor_writel(host, entry->limit, entry->base + 4);
+	}
+}
+
 static int host1x_probe(struct platform_device *pdev)
 {
 	struct host1x *host;
@@ -316,6 +351,9 @@ static int host1x_probe(struct platform_device *pdev)
 
 	host1x_debug_init(host);
 
+	if (host->info->has_hypervisor)
+		host1x_setup_sid_table(host);
+
 	err = host1x_register(host);
 	if (err < 0)
 		goto fail_deinit_intr;
diff --git a/drivers/gpu/host1x/dev.h b/drivers/gpu/host1x/dev.h
index 36f44ffebe73..05216a7e4830 100644
--- a/drivers/gpu/host1x/dev.h
+++ b/drivers/gpu/host1x/dev.h
@@ -94,6 +94,12 @@ struct host1x_intr_ops {
 	int (*free_syncpt_irq)(struct host1x *host);
 };
 
+struct host1x_sid_entry {
+	unsigned int base;
+	unsigned int offset;
+	unsigned int limit;
+};
+
 struct host1x_info {
 	unsigned int nb_channels; /* host1x: number of channels supported */
 	unsigned int nb_pts; /* host1x: number of syncpoints supported */
@@ -103,6 +109,8 @@ struct host1x_info {
 	unsigned int sync_offset; /* offset of syncpoint registers */
 	u64 dma_mask; /* mask of addressable memory */
 	bool has_hypervisor; /* has hypervisor registers */
+	unsigned int num_sid_entries;
+	const struct host1x_sid_entry *sid_table;
 };
 
 struct host1x {
-- 
2.19.1

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v3 02/16] gpu: host1x: Program the channel stream ID
  2019-02-01 13:28 [PATCH v3 00/16] drm/tegra: Fix IOVA space on Tegra186 and later Thierry Reding
  2019-02-01 13:28 ` [PATCH v3 01/16] gpu: host1x: Set up stream ID table Thierry Reding
@ 2019-02-01 13:28 ` Thierry Reding
  2019-02-01 13:28 ` [PATCH v3 03/16] gpu: host1x: Introduce support for wide opcodes Thierry Reding
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 30+ messages in thread
From: Thierry Reding @ 2019-02-01 13:28 UTC (permalink / raw)
  To: Thierry Reding; +Cc: linux-tegra, Dmitry Osipenko, dri-devel, Mikko Perttunen

From: Thierry Reding <treding@nvidia.com>

When processing command streams, make sure the host1x's stream ID is
programmed for the channel so that addresses are properly translated
through the SMMU.

Signed-off-by: Thierry Reding <treding@nvidia.com>
---
 drivers/gpu/host1x/hw/channel_hw.c          | 12 ++++++++++++
 drivers/gpu/host1x/hw/host1x06_hardware.h   |  1 +
 drivers/gpu/host1x/hw/host1x07_hardware.h   |  1 +
 drivers/gpu/host1x/hw/hw_host1x06_channel.h | 11 +++++++++++
 drivers/gpu/host1x/hw/hw_host1x07_channel.h | 11 +++++++++++
 5 files changed, 36 insertions(+)
 create mode 100644 drivers/gpu/host1x/hw/hw_host1x06_channel.h
 create mode 100644 drivers/gpu/host1x/hw/hw_host1x07_channel.h

diff --git a/drivers/gpu/host1x/hw/channel_hw.c b/drivers/gpu/host1x/hw/channel_hw.c
index 95ea81172a83..384f6ac91afa 100644
--- a/drivers/gpu/host1x/hw/channel_hw.c
+++ b/drivers/gpu/host1x/hw/channel_hw.c
@@ -89,6 +89,16 @@ static inline void synchronize_syncpt_base(struct host1x_job *job)
 			 HOST1X_UCLASS_LOAD_SYNCPT_BASE_VALUE_F(value));
 }
 
+static void host1x_channel_set_streamid(struct host1x_channel *channel)
+{
+#if HOST1X_HW >= 6
+	struct iommu_fwspec *spec = dev_iommu_fwspec_get(channel->dev->parent);
+	u32 sid = spec->ids[0] & 0xffff;
+
+	host1x_ch_writel(channel, sid, HOST1X_CHANNEL_SMMU_STREAMID);
+#endif
+}
+
 static int channel_submit(struct host1x_job *job)
 {
 	struct host1x_channel *ch = job->channel;
@@ -120,6 +130,8 @@ static int channel_submit(struct host1x_job *job)
 		goto error;
 	}
 
+	host1x_channel_set_streamid(ch);
+
 	/* begin a CDMA submit */
 	err = host1x_cdma_begin(&ch->cdma, job);
 	if (err) {
diff --git a/drivers/gpu/host1x/hw/host1x06_hardware.h b/drivers/gpu/host1x/hw/host1x06_hardware.h
index 3039c92ea605..eab753b91f24 100644
--- a/drivers/gpu/host1x/hw/host1x06_hardware.h
+++ b/drivers/gpu/host1x/hw/host1x06_hardware.h
@@ -22,6 +22,7 @@
 #include <linux/types.h>
 #include <linux/bitops.h>
 
+#include "hw_host1x06_channel.h"
 #include "hw_host1x06_uclass.h"
 #include "hw_host1x06_vm.h"
 #include "hw_host1x06_hypervisor.h"
diff --git a/drivers/gpu/host1x/hw/host1x07_hardware.h b/drivers/gpu/host1x/hw/host1x07_hardware.h
index 1353e7ab71dd..a79f57dc87bb 100644
--- a/drivers/gpu/host1x/hw/host1x07_hardware.h
+++ b/drivers/gpu/host1x/hw/host1x07_hardware.h
@@ -22,6 +22,7 @@
 #include <linux/types.h>
 #include <linux/bitops.h>
 
+#include "hw_host1x07_channel.h"
 #include "hw_host1x07_uclass.h"
 #include "hw_host1x07_vm.h"
 #include "hw_host1x07_hypervisor.h"
diff --git a/drivers/gpu/host1x/hw/hw_host1x06_channel.h b/drivers/gpu/host1x/hw/hw_host1x06_channel.h
new file mode 100644
index 000000000000..18ae1c57bbea
--- /dev/null
+++ b/drivers/gpu/host1x/hw/hw_host1x06_channel.h
@@ -0,0 +1,11 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019 NVIDIA Corporation.
+ */
+
+#ifndef HOST1X_HW_HOST1X06_CHANNEL_H
+#define HOST1X_HW_HOST1X06_CHANNEL_H
+
+#define HOST1X_CHANNEL_SMMU_STREAMID 0x084
+
+#endif
diff --git a/drivers/gpu/host1x/hw/hw_host1x07_channel.h b/drivers/gpu/host1x/hw/hw_host1x07_channel.h
new file mode 100644
index 000000000000..96fa72bbd7ab
--- /dev/null
+++ b/drivers/gpu/host1x/hw/hw_host1x07_channel.h
@@ -0,0 +1,11 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019 NVIDIA Corporation.
+ */
+
+#ifndef HOST1X_HW_HOST1X07_CHANNEL_H
+#define HOST1X_HW_HOST1X07_CHANNEL_H
+
+#define HOST1X_CHANNEL_SMMU_STREAMID 0x084
+
+#endif
-- 
2.19.1

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v3 03/16] gpu: host1x: Introduce support for wide opcodes
  2019-02-01 13:28 [PATCH v3 00/16] drm/tegra: Fix IOVA space on Tegra186 and later Thierry Reding
  2019-02-01 13:28 ` [PATCH v3 01/16] gpu: host1x: Set up stream ID table Thierry Reding
  2019-02-01 13:28 ` [PATCH v3 02/16] gpu: host1x: Program the channel stream ID Thierry Reding
@ 2019-02-01 13:28 ` Thierry Reding
  2019-02-01 13:28 ` [PATCH v3 04/16] gpu: host1x: Support 40-bit addressing Thierry Reding
                   ` (13 subsequent siblings)
  16 siblings, 0 replies; 30+ messages in thread
From: Thierry Reding @ 2019-02-01 13:28 UTC (permalink / raw)
  To: Thierry Reding; +Cc: linux-tegra, Dmitry Osipenko, dri-devel, Mikko Perttunen

From: Thierry Reding <treding@nvidia.com>

The CDMA push buffer can currently only handle opcodes that take a
single word parameter. However, the host1x implementation on Tegra186
and later supports opcodes that require multiple words as parameters.

Unfortunately the way the push buffer is structured, these wide opcodes
cannot simply be composed of two regular opcodes because that could
result in the wide opcode being split across the end of the push buffer
and the final RESTART opcode required to wrap the push buffer around
would break the wide opcode.

One way to fix this would be to remove the concept of slots to simplify
push buffer operations. However, that's not entirely trivial and should
be done in a separate patch. For now, simply use a different function
to push four-word opcodes into the push buffer. Technically only three
words are pushed, with the fourth word used as padding to preserve the
2-word alignment required by the slots abstraction. The fourth word is
always a NOP opcode.

Additional care must be taken when the end of the push buffer is
reached. If a four-word opcode doesn't fit into the push buffer without
being split by the boundary, NOP opcodes will be introduced and the new
wide opcode placed at the beginning of the push buffer.

Signed-off-by: Thierry Reding <treding@nvidia.com>
---
 drivers/gpu/host1x/cdma.c     | 92 +++++++++++++++++++++++++++++++++++
 drivers/gpu/host1x/cdma.h     |  2 +
 include/trace/events/host1x.h | 26 ++++++++++
 3 files changed, 120 insertions(+)

diff --git a/drivers/gpu/host1x/cdma.c b/drivers/gpu/host1x/cdma.c
index 91df51e631b2..e2d106fa3c0b 100644
--- a/drivers/gpu/host1x/cdma.c
+++ b/drivers/gpu/host1x/cdma.c
@@ -217,6 +217,45 @@ unsigned int host1x_cdma_wait_locked(struct host1x_cdma *cdma,
 	return 0;
 }
 
+/*
+ * Sleep (if necessary) until the push buffer has enough free space.
+ *
+ * Must be called with the cdma lock held.
+ */
+int host1x_cdma_wait_pushbuffer_space(struct host1x *host1x,
+				      struct host1x_cdma *cdma,
+				      unsigned int needed)
+{
+	while (true) {
+		struct push_buffer *pb = &cdma->push_buffer;
+		unsigned int space;
+
+		space = host1x_pushbuffer_space(pb);
+		if (space >= needed)
+			break;
+
+		trace_host1x_wait_cdma(dev_name(cdma_to_channel(cdma)->dev),
+				       CDMA_EVENT_PUSH_BUFFER_SPACE);
+
+		host1x_hw_cdma_flush(host1x, cdma);
+
+		/* If somebody has managed to already start waiting, yield */
+		if (cdma->event != CDMA_EVENT_NONE) {
+			mutex_unlock(&cdma->lock);
+			schedule();
+			mutex_lock(&cdma->lock);
+			continue;
+		}
+
+		cdma->event = CDMA_EVENT_PUSH_BUFFER_SPACE;
+
+		mutex_unlock(&cdma->lock);
+		down(&cdma->sem);
+		mutex_lock(&cdma->lock);
+	}
+
+	return 0;
+}
 /*
  * Start timer that tracks the time spent by the job.
  * Must be called with the cdma lock held.
@@ -509,6 +548,59 @@ void host1x_cdma_push(struct host1x_cdma *cdma, u32 op1, u32 op2)
 	host1x_pushbuffer_push(pb, op1, op2);
 }
 
+/*
+ * Push four words into two consecutive push buffer slots. Note that extra
+ * care needs to be taken not to split the two slots across the end of the
+ * push buffer. Otherwise the RESTART opcode at the end of the push buffer
+ * that ensures processing will restart at the beginning will break up the
+ * four words.
+ *
+ * Blocks as necessary if the push buffer is full.
+ */
+void host1x_cdma_push_wide(struct host1x_cdma *cdma, u32 op1, u32 op2,
+			   u32 op3, u32 op4)
+{
+	struct host1x_channel *channel = cdma_to_channel(cdma);
+	struct host1x *host1x = cdma_to_host1x(cdma);
+	struct push_buffer *pb = &cdma->push_buffer;
+	unsigned int needed = 2, extra = 0, i;
+	unsigned int space = cdma->slots_free;
+
+	if (host1x_debug_trace_cmdbuf)
+		trace_host1x_cdma_push_wide(dev_name(channel->dev), op1, op2,
+					    op3, op4);
+
+	/* compute number of extra slots needed for padding */
+	if (pb->pos + 16 > pb->size) {
+		extra = (pb->size - pb->pos) / 8;
+		needed += extra;
+	}
+
+	host1x_cdma_wait_pushbuffer_space(host1x, cdma, needed);
+	space = host1x_pushbuffer_space(pb);
+
+	cdma->slots_free = space - needed;
+	cdma->slots_used += needed;
+
+	/*
+	 * Note that we rely on the fact that this is only used to submit wide
+	 * gather opcodes, which consist of 3 words, and they are padded with
+	 * a NOP to avoid having to deal with fractional slots (a slot always
+	 * represents 2 words). The fourth opcode passed to this function will
+	 * therefore always be a NOP.
+	 *
+	 * This works around a slight ambiguity when it comes to opcodes. For
+	 * all current host1x incarnations the NOP opcode uses the exact same
+	 * encoding (0x20000000), so we could hard-code the value here, but a
+	 * new incarnation may change it and break that assumption.
+	 */
+	for (i = 0; i < extra; i++)
+		host1x_pushbuffer_push(pb, op4, op4);
+
+	host1x_pushbuffer_push(pb, op1, op2);
+	host1x_pushbuffer_push(pb, op3, op4);
+}
+
 /*
  * End a cdma submit
  * Kick off DMA, add job to the sync queue, and a number of slots to be freed
diff --git a/drivers/gpu/host1x/cdma.h b/drivers/gpu/host1x/cdma.h
index e97e17b82370..ebf6c6a06541 100644
--- a/drivers/gpu/host1x/cdma.h
+++ b/drivers/gpu/host1x/cdma.h
@@ -90,6 +90,8 @@ int host1x_cdma_init(struct host1x_cdma *cdma);
 int host1x_cdma_deinit(struct host1x_cdma *cdma);
 int host1x_cdma_begin(struct host1x_cdma *cdma, struct host1x_job *job);
 void host1x_cdma_push(struct host1x_cdma *cdma, u32 op1, u32 op2);
+void host1x_cdma_push_wide(struct host1x_cdma *cdma, u32 op1, u32 op2,
+			   u32 op3, u32 op4);
 void host1x_cdma_end(struct host1x_cdma *cdma, struct host1x_job *job);
 void host1x_cdma_update(struct host1x_cdma *cdma);
 void host1x_cdma_peek(struct host1x_cdma *cdma, u32 dmaget, int slot,
diff --git a/include/trace/events/host1x.h b/include/trace/events/host1x.h
index a37ef73092e5..3d340b6f1ea3 100644
--- a/include/trace/events/host1x.h
+++ b/include/trace/events/host1x.h
@@ -80,6 +80,32 @@ TRACE_EVENT(host1x_cdma_push,
 		__entry->name, __entry->op1, __entry->op2)
 );
 
+TRACE_EVENT(host1x_cdma_push_wide,
+	TP_PROTO(const char *name, u32 op1, u32 op2, u32 op3, u32 op4),
+
+	TP_ARGS(name, op1, op2, op3, op4),
+
+	TP_STRUCT__entry(
+		__field(const char *, name)
+		__field(u32, op1)
+		__field(u32, op2)
+		__field(u32, op3)
+		__field(u32, op4)
+	),
+
+	TP_fast_assign(
+		__entry->name = name;
+		__entry->op1 = op1;
+		__entry->op2 = op2;
+		__entry->op3 = op3;
+		__entry->op4 = op4;
+	),
+
+	TP_printk("name=%s, op1=%08x, op2=%08x, op3=%08x op4=%08x",
+		__entry->name, __entry->op1, __entry->op2, __entry->op3,
+		__entry->op4)
+);
+
 TRACE_EVENT(host1x_cdma_push_gather,
 	TP_PROTO(const char *name, struct host1x_bo *bo,
 			u32 words, u32 offset, void *cmdbuf),
-- 
2.19.1

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v3 04/16] gpu: host1x: Support 40-bit addressing
  2019-02-01 13:28 [PATCH v3 00/16] drm/tegra: Fix IOVA space on Tegra186 and later Thierry Reding
                   ` (2 preceding siblings ...)
  2019-02-01 13:28 ` [PATCH v3 03/16] gpu: host1x: Introduce support for wide opcodes Thierry Reding
@ 2019-02-01 13:28 ` Thierry Reding
  2019-02-01 13:28 ` [PATCH v3 05/16] gpu: host1x: Use direct DMA with IOMMU API usage Thierry Reding
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 30+ messages in thread
From: Thierry Reding @ 2019-02-01 13:28 UTC (permalink / raw)
  To: Thierry Reding; +Cc: linux-tegra, Dmitry Osipenko, dri-devel, Mikko Perttunen

From: Thierry Reding <treding@nvidia.com>

Tegra186 and later support 40 bits of address space. Additional
registers need to be programmed to store the full 40 bits of push
buffer addresses.

Since command stream gathers can also reside in buffers in a 40-bit
address space, a new variant of the GATHER opcode is also introduced.
It takes two parameters: the first parameter contains the lower 32
bits of the address and the second parameter contains bits 32 to 39.

Signed-off-by: Thierry Reding <treding@nvidia.com>
---
 drivers/gpu/host1x/hw/cdma_hw.c           | 32 ++++++++++++++++++-----
 drivers/gpu/host1x/hw/channel_hw.c        | 30 ++++++++++++++++++---
 drivers/gpu/host1x/hw/host1x06_hardware.h |  5 ++++
 drivers/gpu/host1x/hw/host1x07_hardware.h |  5 ++++
 4 files changed, 62 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/host1x/hw/cdma_hw.c b/drivers/gpu/host1x/hw/cdma_hw.c
index ce320534cbed..485aef5761af 100644
--- a/drivers/gpu/host1x/hw/cdma_hw.c
+++ b/drivers/gpu/host1x/hw/cdma_hw.c
@@ -68,20 +68,31 @@ static void cdma_timeout_cpu_incr(struct host1x_cdma *cdma, u32 getptr,
 static void cdma_start(struct host1x_cdma *cdma)
 {
 	struct host1x_channel *ch = cdma_to_channel(cdma);
+	u64 start, end;
 
 	if (cdma->running)
 		return;
 
 	cdma->last_pos = cdma->push_buffer.pos;
+	start = cdma->push_buffer.dma;
+	end = start + cdma->push_buffer.size + 4;
 
 	host1x_ch_writel(ch, HOST1X_CHANNEL_DMACTRL_DMASTOP,
 			 HOST1X_CHANNEL_DMACTRL);
 
 	/* set base, put and end pointer */
-	host1x_ch_writel(ch, cdma->push_buffer.dma, HOST1X_CHANNEL_DMASTART);
+	host1x_ch_writel(ch, lower_32_bits(start), HOST1X_CHANNEL_DMASTART);
+#if HOST1X_HW >= 6
+	host1x_ch_writel(ch, upper_32_bits(start), HOST1X_CHANNEL_DMASTART_HI);
+#endif
 	host1x_ch_writel(ch, cdma->push_buffer.pos, HOST1X_CHANNEL_DMAPUT);
-	host1x_ch_writel(ch, cdma->push_buffer.dma + cdma->push_buffer.size + 4,
-			 HOST1X_CHANNEL_DMAEND);
+#if HOST1X_HW >= 6
+	host1x_ch_writel(ch, 0, HOST1X_CHANNEL_DMAPUT_HI);
+#endif
+	host1x_ch_writel(ch, lower_32_bits(end), HOST1X_CHANNEL_DMAEND);
+#if HOST1X_HW >= 6
+	host1x_ch_writel(ch, upper_32_bits(end), HOST1X_CHANNEL_DMAEND_HI);
+#endif
 
 	/* reset GET */
 	host1x_ch_writel(ch, HOST1X_CHANNEL_DMACTRL_DMASTOP |
@@ -104,6 +115,7 @@ static void cdma_timeout_restart(struct host1x_cdma *cdma, u32 getptr)
 {
 	struct host1x *host1x = cdma_to_host1x(cdma);
 	struct host1x_channel *ch = cdma_to_channel(cdma);
+	u64 start, end;
 
 	if (cdma->running)
 		return;
@@ -113,10 +125,18 @@ static void cdma_timeout_restart(struct host1x_cdma *cdma, u32 getptr)
 	host1x_ch_writel(ch, HOST1X_CHANNEL_DMACTRL_DMASTOP,
 			 HOST1X_CHANNEL_DMACTRL);
 
+	start = cdma->push_buffer.dma;
+	end = start + cdma->push_buffer.size + 4;
+
 	/* set base, end pointer (all of memory) */
-	host1x_ch_writel(ch, cdma->push_buffer.dma, HOST1X_CHANNEL_DMASTART);
-	host1x_ch_writel(ch, cdma->push_buffer.dma + cdma->push_buffer.size,
-			 HOST1X_CHANNEL_DMAEND);
+	host1x_ch_writel(ch, lower_32_bits(start), HOST1X_CHANNEL_DMASTART);
+#if HOST1X_HW >= 6
+	host1x_ch_writel(ch, upper_32_bits(start), HOST1X_CHANNEL_DMASTART_HI);
+#endif
+	host1x_ch_writel(ch, lower_32_bits(end), HOST1X_CHANNEL_DMAEND);
+#if HOST1X_HW >= 6
+	host1x_ch_writel(ch, upper_32_bits(end), HOST1X_CHANNEL_DMAEND_HI);
+#endif
 
 	/* set GET, by loading the value in PUT (then reset GET) */
 	host1x_ch_writel(ch, getptr, HOST1X_CHANNEL_DMAPUT);
diff --git a/drivers/gpu/host1x/hw/channel_hw.c b/drivers/gpu/host1x/hw/channel_hw.c
index 384f6ac91afa..73f11a418ee6 100644
--- a/drivers/gpu/host1x/hw/channel_hw.c
+++ b/drivers/gpu/host1x/hw/channel_hw.c
@@ -60,15 +60,37 @@ static void trace_write_gather(struct host1x_cdma *cdma, struct host1x_bo *bo,
 static void submit_gathers(struct host1x_job *job)
 {
 	struct host1x_cdma *cdma = &job->channel->cdma;
+#if HOST1X_HW < 6
+	struct device *dev = job->channel->dev;
+#endif
 	unsigned int i;
 
 	for (i = 0; i < job->num_gathers; i++) {
 		struct host1x_job_gather *g = &job->gathers[i];
-		u32 op1 = host1x_opcode_gather(g->words);
-		u32 op2 = g->base + g->offset;
+		dma_addr_t addr = g->base + g->offset;
+		u32 op2, op3;
+
+		op2 = lower_32_bits(addr);
+		op3 = upper_32_bits(addr);
+
+		trace_write_gather(cdma, g->bo, g->offset, g->words);
+
+		if (op3 != 0) {
+#if HOST1X_HW >= 6
+			u32 op1 = host1x_opcode_gather_wide(g->words);
+			u32 op4 = HOST1X_OPCODE_NOP;
 
-		trace_write_gather(cdma, g->bo, g->offset, op1 & 0xffff);
-		host1x_cdma_push(cdma, op1, op2);
+			host1x_cdma_push_wide(cdma, op1, op2, op3, op4);
+#else
+			dev_err(dev, "invalid gather for push buffer %pad\n",
+				&addr);
+			continue;
+#endif
+		} else {
+			u32 op1 = host1x_opcode_gather(g->words);
+
+			host1x_cdma_push(cdma, op1, op2);
+		}
 	}
 }
 
diff --git a/drivers/gpu/host1x/hw/host1x06_hardware.h b/drivers/gpu/host1x/hw/host1x06_hardware.h
index eab753b91f24..dd37b10c8d04 100644
--- a/drivers/gpu/host1x/hw/host1x06_hardware.h
+++ b/drivers/gpu/host1x/hw/host1x06_hardware.h
@@ -138,6 +138,11 @@ static inline u32 host1x_opcode_gather_incr(unsigned offset, unsigned count)
 	return (6 << 28) | (offset << 16) | BIT(15) | BIT(14) | count;
 }
 
+static inline u32 host1x_opcode_gather_wide(unsigned count)
+{
+	return (12 << 28) | count;
+}
+
 #define HOST1X_OPCODE_NOP host1x_opcode_nonincr(0, 0)
 
 #endif
diff --git a/drivers/gpu/host1x/hw/host1x07_hardware.h b/drivers/gpu/host1x/hw/host1x07_hardware.h
index a79f57dc87bb..9f6da4ee5443 100644
--- a/drivers/gpu/host1x/hw/host1x07_hardware.h
+++ b/drivers/gpu/host1x/hw/host1x07_hardware.h
@@ -138,6 +138,11 @@ static inline u32 host1x_opcode_gather_incr(unsigned offset, unsigned count)
 	return (6 << 28) | (offset << 16) | BIT(15) | BIT(14) | count;
 }
 
+static inline u32 host1x_opcode_gather_wide(unsigned count)
+{
+	return (12 << 28) | count;
+}
+
 #define HOST1X_OPCODE_NOP host1x_opcode_nonincr(0, 0)
 
 #endif
-- 
2.19.1

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v3 05/16] gpu: host1x: Use direct DMA with IOMMU API usage
  2019-02-01 13:28 [PATCH v3 00/16] drm/tegra: Fix IOVA space on Tegra186 and later Thierry Reding
                   ` (3 preceding siblings ...)
  2019-02-01 13:28 ` [PATCH v3 04/16] gpu: host1x: Support 40-bit addressing Thierry Reding
@ 2019-02-01 13:28 ` Thierry Reding
  2019-02-01 14:46   ` Dmitry Osipenko
  2019-02-01 13:28 ` [PATCH v3 06/16] gpu: host1x: Restrict IOVA space to DMA mask Thierry Reding
                   ` (11 subsequent siblings)
  16 siblings, 1 reply; 30+ messages in thread
From: Thierry Reding @ 2019-02-01 13:28 UTC (permalink / raw)
  To: Thierry Reding; +Cc: linux-tegra, Dmitry Osipenko, dri-devel, Mikko Perttunen

From: Thierry Reding <treding@nvidia.com>

If we use the IOMMU API directly to map buffers into host1x' IOVA space,
we must make sure that the DMA API doesn't already set up a mapping, or
else translation will fail.

The direct DMA API allows us to allocate memory that will not be mapped
through an IOMMU automatically.

Reviewed-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
---
 drivers/gpu/host1x/cdma.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/host1x/cdma.c b/drivers/gpu/host1x/cdma.c
index e2d106fa3c0b..a96c4dd1e449 100644
--- a/drivers/gpu/host1x/cdma.c
+++ b/drivers/gpu/host1x/cdma.c
@@ -19,6 +19,7 @@
 
 #include <asm/cacheflush.h>
 #include <linux/device.h>
+#include <linux/dma-direct.h>
 #include <linux/dma-mapping.h>
 #include <linux/host1x.h>
 #include <linux/interrupt.h>
@@ -70,6 +71,7 @@ static void host1x_pushbuffer_destroy(struct push_buffer *pb)
  */
 static int host1x_pushbuffer_init(struct push_buffer *pb)
 {
+	unsigned long attrs = DMA_ATTR_WRITE_COMBINE;
 	struct host1x_cdma *cdma = pb_to_cdma(pb);
 	struct host1x *host1x = cdma_to_host1x(cdma);
 	struct iova *alloc;
@@ -91,8 +93,8 @@ static int host1x_pushbuffer_init(struct push_buffer *pb)
 
 		size = iova_align(&host1x->iova, size);
 
-		pb->mapped = dma_alloc_wc(host1x->dev, size, &pb->phys,
-					  GFP_KERNEL);
+		pb->mapped = dma_direct_alloc(host1x->dev, size, &pb->phys,
+					      GFP_KERNEL, attrs);
 		if (!pb->mapped)
 			return -ENOMEM;
 
@@ -127,7 +129,10 @@ static int host1x_pushbuffer_init(struct push_buffer *pb)
 iommu_free_iova:
 	__free_iova(&host1x->iova, alloc);
 iommu_free_mem:
-	dma_free_wc(host1x->dev, size, pb->mapped, pb->phys);
+	if (host1x->domain)
+		dma_direct_free(host1x->dev, size, pb->mapped, pb->phys, attrs);
+	else
+		dma_free_wc(host1x->dev, size, pb->mapped, pb->phys);
 
 	return err;
 }
-- 
2.19.1

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v3 06/16] gpu: host1x: Restrict IOVA space to DMA mask
  2019-02-01 13:28 [PATCH v3 00/16] drm/tegra: Fix IOVA space on Tegra186 and later Thierry Reding
                   ` (4 preceding siblings ...)
  2019-02-01 13:28 ` [PATCH v3 05/16] gpu: host1x: Use direct DMA with IOMMU API usage Thierry Reding
@ 2019-02-01 13:28 ` Thierry Reding
  2019-02-01 14:47   ` Dmitry Osipenko
  2019-02-01 13:28 ` [PATCH v3 07/16] gpu: host1x: Support 40-bit addressing on Tegra186 Thierry Reding
                   ` (10 subsequent siblings)
  16 siblings, 1 reply; 30+ messages in thread
From: Thierry Reding @ 2019-02-01 13:28 UTC (permalink / raw)
  To: Thierry Reding; +Cc: linux-tegra, Dmitry Osipenko, dri-devel, Mikko Perttunen

From: Thierry Reding <treding@nvidia.com>

On Tegra186 and later, the ARM SMMU provides an input address space that
is 48 bits wide. However, memory clients can only address up to 40 bits.
If the geometry is used as-is, allocations of IOVA space can end up in a
region that is not addressable by the memory clients.

To fix this, restrict the IOVA space to the DMA mask of the host1x
device.

Signed-off-by: Thierry Reding <treding@nvidia.com>
---
 drivers/gpu/host1x/dev.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c
index 4c044ee54fe6..544b67f2b3ff 100644
--- a/drivers/gpu/host1x/dev.c
+++ b/drivers/gpu/host1x/dev.c
@@ -283,6 +283,8 @@ static int host1x_probe(struct platform_device *pdev)
 	host->group = iommu_group_get(&pdev->dev);
 	if (host->group) {
 		struct iommu_domain_geometry *geometry;
+		u64 mask = dma_get_mask(host->dev);
+		dma_addr_t start, end;
 		unsigned long order;
 
 		err = iova_cache_get();
@@ -310,11 +312,12 @@ static int host1x_probe(struct platform_device *pdev)
 		}
 
 		geometry = &host->domain->geometry;
+		start = geometry->aperture_start & mask;
+		end = geometry->aperture_end & mask;
 
 		order = __ffs(host->domain->pgsize_bitmap);
-		init_iova_domain(&host->iova, 1UL << order,
-				 geometry->aperture_start >> order);
-		host->iova_end = geometry->aperture_end;
+		init_iova_domain(&host->iova, 1UL << order, start >> order);
+		host->iova_end = end;
 	}
 
 skip_iommu:
-- 
2.19.1

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v3 07/16] gpu: host1x: Support 40-bit addressing on Tegra186
  2019-02-01 13:28 [PATCH v3 00/16] drm/tegra: Fix IOVA space on Tegra186 and later Thierry Reding
                   ` (5 preceding siblings ...)
  2019-02-01 13:28 ` [PATCH v3 06/16] gpu: host1x: Restrict IOVA space to DMA mask Thierry Reding
@ 2019-02-01 13:28 ` Thierry Reding
  2019-02-01 13:28 ` [PATCH v3 08/16] gpu: host1x: Use correct semantics for HOST1X_CHANNEL_DMAEND Thierry Reding
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 30+ messages in thread
From: Thierry Reding @ 2019-02-01 13:28 UTC (permalink / raw)
  To: Thierry Reding; +Cc: linux-tegra, Dmitry Osipenko, dri-devel, Mikko Perttunen

From: Thierry Reding <treding@nvidia.com>

The host1x and clients instantiated on Tegra186 support addressing 40
bits of memory.

Signed-off-by: Thierry Reding <treding@nvidia.com>
---
 drivers/gpu/host1x/dev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c
index 544b67f2b3ff..ee3c7b81a29d 100644
--- a/drivers/gpu/host1x/dev.c
+++ b/drivers/gpu/host1x/dev.c
@@ -136,7 +136,7 @@ static const struct host1x_info host1x06_info = {
 	.nb_bases = 16,
 	.init = host1x06_init,
 	.sync_offset = 0x0,
-	.dma_mask = DMA_BIT_MASK(34),
+	.dma_mask = DMA_BIT_MASK(40),
 	.has_hypervisor = true,
 	.num_sid_entries = ARRAY_SIZE(tegra186_sid_table),
 	.sid_table = tegra186_sid_table,
-- 
2.19.1

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v3 08/16] gpu: host1x: Use correct semantics for HOST1X_CHANNEL_DMAEND
  2019-02-01 13:28 [PATCH v3 00/16] drm/tegra: Fix IOVA space on Tegra186 and later Thierry Reding
                   ` (6 preceding siblings ...)
  2019-02-01 13:28 ` [PATCH v3 07/16] gpu: host1x: Support 40-bit addressing on Tegra186 Thierry Reding
@ 2019-02-01 13:28 ` Thierry Reding
  2019-02-01 13:37   ` Dmitry Osipenko
  2019-02-01 14:47   ` Dmitry Osipenko
  2019-02-01 13:28 ` [PATCH v3 09/16] gpu: host1x: Optimize CDMA push buffer memory usage Thierry Reding
                   ` (8 subsequent siblings)
  16 siblings, 2 replies; 30+ messages in thread
From: Thierry Reding @ 2019-02-01 13:28 UTC (permalink / raw)
  To: Thierry Reding; +Cc: linux-tegra, Dmitry Osipenko, dri-devel, Mikko Perttunen

From: Thierry Reding <treding@nvidia.com>

The HOST1X_CHANNEL_DMAEND is an offset relative to the value written to
the HOST1X_CHANNEL_DMASTART register, but it is currently treated as an
absolute address. This can cause SMMU faults if the CDMA fetches past a
pushbuffer's IOMMU mapping.

Properly setting the DMAEND prevents the CDMA from fetching beyond that
address and avoid such issues. This is currently not observed because a
whole (almost) page of essentially scratch space absorbs any excessive
prefetching by CDMA. However, changing the number of slots in the push
buffer can trigger these SMMU faults.

Signed-off-by: Thierry Reding <treding@nvidia.com>
---
 drivers/gpu/host1x/hw/cdma_hw.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/host1x/hw/cdma_hw.c b/drivers/gpu/host1x/hw/cdma_hw.c
index 485aef5761af..a24c090ac96f 100644
--- a/drivers/gpu/host1x/hw/cdma_hw.c
+++ b/drivers/gpu/host1x/hw/cdma_hw.c
@@ -75,7 +75,7 @@ static void cdma_start(struct host1x_cdma *cdma)
 
 	cdma->last_pos = cdma->push_buffer.pos;
 	start = cdma->push_buffer.dma;
-	end = start + cdma->push_buffer.size + 4;
+	end = cdma->push_buffer.size + 4;
 
 	host1x_ch_writel(ch, HOST1X_CHANNEL_DMACTRL_DMASTOP,
 			 HOST1X_CHANNEL_DMACTRL);
@@ -126,7 +126,7 @@ static void cdma_timeout_restart(struct host1x_cdma *cdma, u32 getptr)
 			 HOST1X_CHANNEL_DMACTRL);
 
 	start = cdma->push_buffer.dma;
-	end = start + cdma->push_buffer.size + 4;
+	end = cdma->push_buffer.size + 4;
 
 	/* set base, end pointer (all of memory) */
 	host1x_ch_writel(ch, lower_32_bits(start), HOST1X_CHANNEL_DMASTART);
-- 
2.19.1

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v3 09/16] gpu: host1x: Optimize CDMA push buffer memory usage
  2019-02-01 13:28 [PATCH v3 00/16] drm/tegra: Fix IOVA space on Tegra186 and later Thierry Reding
                   ` (7 preceding siblings ...)
  2019-02-01 13:28 ` [PATCH v3 08/16] gpu: host1x: Use correct semantics for HOST1X_CHANNEL_DMAEND Thierry Reding
@ 2019-02-01 13:28 ` Thierry Reding
  2019-02-01 14:48   ` Dmitry Osipenko
  2019-02-01 13:28 ` [PATCH v3 10/16] drm/tegra: Store parent pointer in Tegra DRM clients Thierry Reding
                   ` (7 subsequent siblings)
  16 siblings, 1 reply; 30+ messages in thread
From: Thierry Reding @ 2019-02-01 13:28 UTC (permalink / raw)
  To: Thierry Reding; +Cc: linux-tegra, Dmitry Osipenko, dri-devel, Mikko Perttunen

From: Thierry Reding <treding@nvidia.com>

The host1x CDMA push buffer is terminated by a special opcode (RESTART)
that tells the CDMA to wrap around to the beginning of the push buffer.
To accomodate the RESTART opcode, an extra 4 bytes are allocated on top
of the 512 * 8 = 4096 bytes needed for the 512 slots (1 slot = 2 words)
that are used for other commands passed to CDMA. This requires that two
memory pages are allocated, but most of the second page (4092 bytes) is
never used.

Decrease the number of slots to 511 so that the RESTART opcode fits
within the page. Adjust the push buffer wraparound code to take into
account push buffer sizes that are not a power of two.

Signed-off-by: Thierry Reding <treding@nvidia.com>
---
 drivers/gpu/host1x/cdma.c | 29 +++++++++++++++++++++++++----
 1 file changed, 25 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/host1x/cdma.c b/drivers/gpu/host1x/cdma.c
index a96c4dd1e449..50c1370b56c7 100644
--- a/drivers/gpu/host1x/cdma.c
+++ b/drivers/gpu/host1x/cdma.c
@@ -42,7 +42,17 @@
  * means that the push buffer is full, not empty.
  */
 
-#define HOST1X_PUSHBUFFER_SLOTS	512
+/*
+ * Typically the commands written into the push buffer are a pair of words. We
+ * use slots to represent each of these pairs and to simplify things. Note the
+ * strange number of slots allocated here. 512 slots will fit exactly within a
+ * single memory page. We also need one additional word at the end of the push
+ * buffer for the RESTART opcode that will instruct the CDMA to jump back to
+ * the beginning of the push buffer. With 512 slots, this means that we'll use
+ * 2 memory pages and waste 4092 bytes of the second page that will never be
+ * used.
+ */
+#define HOST1X_PUSHBUFFER_SLOTS	511
 
 /*
  * Clean up push buffer resources
@@ -148,7 +158,10 @@ static void host1x_pushbuffer_push(struct push_buffer *pb, u32 op1, u32 op2)
 	WARN_ON(pb->pos == pb->fence);
 	*(p++) = op1;
 	*(p++) = op2;
-	pb->pos = (pb->pos + 8) & (pb->size - 1);
+	pb->pos += 8;
+
+	if (pb->pos >= pb->size)
+		pb->pos -= pb->size;
 }
 
 /*
@@ -158,7 +171,10 @@ static void host1x_pushbuffer_push(struct push_buffer *pb, u32 op1, u32 op2)
 static void host1x_pushbuffer_pop(struct push_buffer *pb, unsigned int slots)
 {
 	/* Advance the next write position */
-	pb->fence = (pb->fence + slots * 8) & (pb->size - 1);
+	pb->fence += slots * 8;
+
+	if (pb->fence >= pb->size)
+		pb->fence -= pb->size;
 }
 
 /*
@@ -166,7 +182,12 @@ static void host1x_pushbuffer_pop(struct push_buffer *pb, unsigned int slots)
  */
 static u32 host1x_pushbuffer_space(struct push_buffer *pb)
 {
-	return ((pb->fence - pb->pos) & (pb->size - 1)) / 8;
+	unsigned int fence = pb->fence;
+
+	if (pb->fence < pb->pos)
+		fence += pb->size;
+
+	return (fence - pb->pos) / 8;
 }
 
 /*
-- 
2.19.1

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v3 10/16] drm/tegra: Store parent pointer in Tegra DRM clients
  2019-02-01 13:28 [PATCH v3 00/16] drm/tegra: Fix IOVA space on Tegra186 and later Thierry Reding
                   ` (8 preceding siblings ...)
  2019-02-01 13:28 ` [PATCH v3 09/16] gpu: host1x: Optimize CDMA push buffer memory usage Thierry Reding
@ 2019-02-01 13:28 ` Thierry Reding
  2019-02-01 14:48   ` Dmitry Osipenko
  2019-02-01 13:28 ` [PATCH v3 11/16] drm/tegra: vic: Load firmware on demand Thierry Reding
                   ` (6 subsequent siblings)
  16 siblings, 1 reply; 30+ messages in thread
From: Thierry Reding @ 2019-02-01 13:28 UTC (permalink / raw)
  To: Thierry Reding; +Cc: linux-tegra, Dmitry Osipenko, dri-devel, Mikko Perttunen

From: Thierry Reding <treding@nvidia.com>

Tegra DRM clients need access to their parent, so store a pointer to it
upon registration. It's technically possible to get at this by going via
the host1x client's parent and getting the driver data, but that's quite
complicated and not very transparent. It's much more straightforward and
natural to let the children know about their parent.

Signed-off-by: Thierry Reding <treding@nvidia.com>
Reviewed-by: Dmitry Osipenko <digetx@gmail.com>
---
 drivers/gpu/drm/tegra/drm.c | 2 ++
 drivers/gpu/drm/tegra/drm.h | 1 +
 2 files changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
index 4b70ce664c41..61dcbd218ffc 100644
--- a/drivers/gpu/drm/tegra/drm.c
+++ b/drivers/gpu/drm/tegra/drm.c
@@ -1041,6 +1041,7 @@ int tegra_drm_register_client(struct tegra_drm *tegra,
 {
 	mutex_lock(&tegra->clients_lock);
 	list_add_tail(&client->list, &tegra->clients);
+	client->drm = tegra;
 	mutex_unlock(&tegra->clients_lock);
 
 	return 0;
@@ -1051,6 +1052,7 @@ int tegra_drm_unregister_client(struct tegra_drm *tegra,
 {
 	mutex_lock(&tegra->clients_lock);
 	list_del_init(&client->list);
+	client->drm = NULL;
 	mutex_unlock(&tegra->clients_lock);
 
 	return 0;
diff --git a/drivers/gpu/drm/tegra/drm.h b/drivers/gpu/drm/tegra/drm.h
index 7370f7a0fdb8..70154c253d45 100644
--- a/drivers/gpu/drm/tegra/drm.h
+++ b/drivers/gpu/drm/tegra/drm.h
@@ -88,6 +88,7 @@ int tegra_drm_submit(struct tegra_drm_context *context,
 struct tegra_drm_client {
 	struct host1x_client base;
 	struct list_head list;
+	struct tegra_drm *drm;
 
 	unsigned int version;
 	const struct tegra_drm_client_ops *ops;
-- 
2.19.1

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v3 11/16] drm/tegra: vic: Load firmware on demand
  2019-02-01 13:28 [PATCH v3 00/16] drm/tegra: Fix IOVA space on Tegra186 and later Thierry Reding
                   ` (9 preceding siblings ...)
  2019-02-01 13:28 ` [PATCH v3 10/16] drm/tegra: Store parent pointer in Tegra DRM clients Thierry Reding
@ 2019-02-01 13:28 ` Thierry Reding
  2019-02-01 13:28 ` [PATCH v3 12/16] drm/tegra: Setup shared IOMMU domain after initialization Thierry Reding
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 30+ messages in thread
From: Thierry Reding @ 2019-02-01 13:28 UTC (permalink / raw)
  To: Thierry Reding; +Cc: linux-tegra, Dmitry Osipenko, dri-devel, Mikko Perttunen

From: Thierry Reding <treding@nvidia.com>

Loading the firmware requires an allocation of IOVA space to make sure
that the VIC's Falcon microcontroller can read the firmware if address
translation via the SMMU is enabled.

However, the allocation currently happens at a time where the geometry
of an IOMMU domain may not have been initialized yet. This happens for
example on Tegra186 and later where an ARM SMMU is used. Domains which
are created by the ARM SMMU driver postpone the geometry setup until a
device is attached to the domain. This is because IOMMU domains aren't
attached to a specific IOMMU instance at allocation time and hence the
input address space, which defines the geometry, is not known yet.

Work around this by postponing the firmware load until it is needed at
the time where a channel is opened to the VIC. At this time the shared
IOMMU domain's geometry has been properly initialized.

As a byproduct this allows the Tegra DRM to be created in the absence
of VIC firmware, since the VIC initialization no longer fails if the
firmware can't be found.

Based on an earlier patch by Dmitry Osipenko <digetx@gmail.com>.

Signed-off-by: Thierry Reding <treding@nvidia.com>
Reviewed-by: Dmitry Osipenko <digetx@gmail.com>
---
 drivers/gpu/drm/tegra/vic.c | 53 +++++++++++++++++++++++++------------
 1 file changed, 36 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/tegra/vic.c b/drivers/gpu/drm/tegra/vic.c
index d47983deb1cf..739b894661ab 100644
--- a/drivers/gpu/drm/tegra/vic.c
+++ b/drivers/gpu/drm/tegra/vic.c
@@ -181,13 +181,6 @@ static int vic_init(struct host1x_client *client)
 		vic->domain = tegra->domain;
 	}
 
-	if (!vic->falcon.data) {
-		vic->falcon.data = tegra;
-		err = falcon_load_firmware(&vic->falcon);
-		if (err < 0)
-			goto detach;
-	}
-
 	vic->channel = host1x_channel_request(client->dev);
 	if (!vic->channel) {
 		err = -ENOMEM;
@@ -246,6 +239,30 @@ static const struct host1x_client_ops vic_client_ops = {
 	.exit = vic_exit,
 };
 
+static int vic_load_firmware(struct vic *vic)
+{
+	int err;
+
+	if (vic->falcon.data)
+		return 0;
+
+	vic->falcon.data = vic->client.drm;
+
+	err = falcon_read_firmware(&vic->falcon, vic->config->firmware);
+	if (err < 0)
+		goto cleanup;
+
+	err = falcon_load_firmware(&vic->falcon);
+	if (err < 0)
+		goto cleanup;
+
+	return 0;
+
+cleanup:
+	vic->falcon.data = NULL;
+	return err;
+}
+
 static int vic_open_channel(struct tegra_drm_client *client,
 			    struct tegra_drm_context *context)
 {
@@ -256,19 +273,25 @@ static int vic_open_channel(struct tegra_drm_client *client,
 	if (err < 0)
 		return err;
 
+	err = vic_load_firmware(vic);
+	if (err < 0)
+		goto rpm_put;
+
 	err = vic_boot(vic);
-	if (err < 0) {
-		pm_runtime_put(vic->dev);
-		return err;
-	}
+	if (err < 0)
+		goto rpm_put;
 
 	context->channel = host1x_channel_get(vic->channel);
 	if (!context->channel) {
-		pm_runtime_put(vic->dev);
-		return -ENOMEM;
+		err = -ENOMEM;
+		goto rpm_put;
 	}
 
 	return 0;
+
+rpm_put:
+	pm_runtime_put(vic->dev);
+	return err;
 }
 
 static void vic_close_channel(struct tegra_drm_context *context)
@@ -372,10 +395,6 @@ static int vic_probe(struct platform_device *pdev)
 	if (err < 0)
 		return err;
 
-	err = falcon_read_firmware(&vic->falcon, vic->config->firmware);
-	if (err < 0)
-		goto exit_falcon;
-
 	platform_set_drvdata(pdev, vic);
 
 	INIT_LIST_HEAD(&vic->client.base.list);
-- 
2.19.1

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v3 12/16] drm/tegra: Setup shared IOMMU domain after initialization
  2019-02-01 13:28 [PATCH v3 00/16] drm/tegra: Fix IOVA space on Tegra186 and later Thierry Reding
                   ` (10 preceding siblings ...)
  2019-02-01 13:28 ` [PATCH v3 11/16] drm/tegra: vic: Load firmware on demand Thierry Reding
@ 2019-02-01 13:28 ` Thierry Reding
  2019-02-01 14:48   ` Dmitry Osipenko
  2019-02-01 13:28 ` [PATCH v3 13/16] drm/tegra: Restrict IOVA space to DMA mask Thierry Reding
                   ` (4 subsequent siblings)
  16 siblings, 1 reply; 30+ messages in thread
From: Thierry Reding @ 2019-02-01 13:28 UTC (permalink / raw)
  To: Thierry Reding; +Cc: linux-tegra, Dmitry Osipenko, dri-devel, Mikko Perttunen

From: Thierry Reding <treding@nvidia.com>

Move initialization of the shared IOMMU domain after the host1x device
has been initialized. At this point all the Tegra DRM clients have been
attached to the shared IOMMU domain.

This is important because Tegra186 and later use an ARM SMMU, for which
the driver defers setting up the geometry for a domain until a device is
attached to it. This is to ensure that the domain is properly set up for
a specific ARM SMMU instance, which is unknown at allocation time.

Reviewed-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
---
 drivers/gpu/drm/tegra/drm.c | 54 ++++++++++++++++++++-----------------
 1 file changed, 29 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
index 61dcbd218ffc..271c7a5fc954 100644
--- a/drivers/gpu/drm/tegra/drm.c
+++ b/drivers/gpu/drm/tegra/drm.c
@@ -92,10 +92,6 @@ static int tegra_drm_load(struct drm_device *drm, unsigned long flags)
 		return -ENOMEM;
 
 	if (iommu_present(&platform_bus_type)) {
-		u64 carveout_start, carveout_end, gem_start, gem_end;
-		struct iommu_domain_geometry *geometry;
-		unsigned long order;
-
 		tegra->domain = iommu_domain_alloc(&platform_bus_type);
 		if (!tegra->domain) {
 			err = -ENOMEM;
@@ -105,27 +101,6 @@ static int tegra_drm_load(struct drm_device *drm, unsigned long flags)
 		err = iova_cache_get();
 		if (err < 0)
 			goto domain;
-
-		geometry = &tegra->domain->geometry;
-		gem_start = geometry->aperture_start;
-		gem_end = geometry->aperture_end - CARVEOUT_SZ;
-		carveout_start = gem_end + 1;
-		carveout_end = geometry->aperture_end;
-
-		order = __ffs(tegra->domain->pgsize_bitmap);
-		init_iova_domain(&tegra->carveout.domain, 1UL << order,
-				 carveout_start >> order);
-
-		tegra->carveout.shift = iova_shift(&tegra->carveout.domain);
-		tegra->carveout.limit = carveout_end >> tegra->carveout.shift;
-
-		drm_mm_init(&tegra->mm, gem_start, gem_end - gem_start + 1);
-		mutex_init(&tegra->mm_lock);
-
-		DRM_DEBUG("IOMMU apertures:\n");
-		DRM_DEBUG("  GEM: %#llx-%#llx\n", gem_start, gem_end);
-		DRM_DEBUG("  Carveout: %#llx-%#llx\n", carveout_start,
-			  carveout_end);
 	}
 
 	mutex_init(&tegra->clients_lock);
@@ -159,6 +134,35 @@ static int tegra_drm_load(struct drm_device *drm, unsigned long flags)
 	if (err < 0)
 		goto fbdev;
 
+	if (tegra->domain) {
+		u64 carveout_start, carveout_end, gem_start, gem_end;
+		dma_addr_t start, end;
+		unsigned long order;
+
+		start = tegra->domain->geometry.aperture_start;
+		end = tegra->domain->geometry.aperture_end;
+
+		gem_start = start;
+		gem_end = end - CARVEOUT_SZ;
+		carveout_start = gem_end + 1;
+		carveout_end = end;
+
+		order = __ffs(tegra->domain->pgsize_bitmap);
+		init_iova_domain(&tegra->carveout.domain, 1UL << order,
+				 carveout_start >> order);
+
+		tegra->carveout.shift = iova_shift(&tegra->carveout.domain);
+		tegra->carveout.limit = carveout_end >> tegra->carveout.shift;
+
+		drm_mm_init(&tegra->mm, gem_start, gem_end - gem_start + 1);
+		mutex_init(&tegra->mm_lock);
+
+		DRM_DEBUG("IOMMU apertures:\n");
+		DRM_DEBUG("  GEM: %#llx-%#llx\n", gem_start, gem_end);
+		DRM_DEBUG("  Carveout: %#llx-%#llx\n", carveout_start,
+			  carveout_end);
+	}
+
 	if (tegra->hub) {
 		err = tegra_display_hub_prepare(tegra->hub);
 		if (err < 0)
-- 
2.19.1

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v3 13/16] drm/tegra: Restrict IOVA space to DMA mask
  2019-02-01 13:28 [PATCH v3 00/16] drm/tegra: Fix IOVA space on Tegra186 and later Thierry Reding
                   ` (11 preceding siblings ...)
  2019-02-01 13:28 ` [PATCH v3 12/16] drm/tegra: Setup shared IOMMU domain after initialization Thierry Reding
@ 2019-02-01 13:28 ` Thierry Reding
  2019-02-01 14:49   ` Dmitry Osipenko
  2019-02-01 13:28 ` [PATCH v3 14/16] drm/tegra: vic: Do not clear driver data Thierry Reding
                   ` (3 subsequent siblings)
  16 siblings, 1 reply; 30+ messages in thread
From: Thierry Reding @ 2019-02-01 13:28 UTC (permalink / raw)
  To: Thierry Reding; +Cc: linux-tegra, Dmitry Osipenko, dri-devel, Mikko Perttunen

From: Thierry Reding <treding@nvidia.com>

On Tegra186 and later, the ARM SMMU provides an input address space that
is 48 bits wide. However, memory clients can only address up to 40 bits.
If the geometry is used as-is, allocations of IOVA space can end up in a
region that cannot be addressed by the memory clients.

To fix this, restrict the IOVA space to the DMA mask of the host1x
device. Note that, technically, the IOVA space needs to be restricted to
the intersection of the DMA masks for all clients that are attached to
the IOMMU domain. In practice using the DMA mask of the host1x device is
sufficient because all host1x clients share the same DMA mask.

Signed-off-by: Thierry Reding <treding@nvidia.com>
---
 drivers/gpu/drm/tegra/drm.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
index 271c7a5fc954..0c5f1e6a0446 100644
--- a/drivers/gpu/drm/tegra/drm.c
+++ b/drivers/gpu/drm/tegra/drm.c
@@ -136,11 +136,12 @@ static int tegra_drm_load(struct drm_device *drm, unsigned long flags)
 
 	if (tegra->domain) {
 		u64 carveout_start, carveout_end, gem_start, gem_end;
+		u64 dma_mask = dma_get_mask(&device->dev);
 		dma_addr_t start, end;
 		unsigned long order;
 
-		start = tegra->domain->geometry.aperture_start;
-		end = tegra->domain->geometry.aperture_end;
+		start = tegra->domain->geometry.aperture_start & dma_mask;
+		end = tegra->domain->geometry.aperture_end & dma_mask;
 
 		gem_start = start;
 		gem_end = end - CARVEOUT_SZ;
-- 
2.19.1

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v3 14/16] drm/tegra: vic: Do not clear driver data
  2019-02-01 13:28 [PATCH v3 00/16] drm/tegra: Fix IOVA space on Tegra186 and later Thierry Reding
                   ` (12 preceding siblings ...)
  2019-02-01 13:28 ` [PATCH v3 13/16] drm/tegra: Restrict IOVA space to DMA mask Thierry Reding
@ 2019-02-01 13:28 ` Thierry Reding
  2019-02-01 13:28 ` [PATCH v3 15/16] drm/tegra: vic: Support stream ID register programming Thierry Reding
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 30+ messages in thread
From: Thierry Reding @ 2019-02-01 13:28 UTC (permalink / raw)
  To: Thierry Reding; +Cc: linux-tegra, Dmitry Osipenko, dri-devel, Mikko Perttunen

From: Thierry Reding <treding@nvidia.com>

Upon driver failure, the driver core will take care of clearing the
driver data, so there's no need to do so explicitly in the driver.

Reviewed-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
---
 drivers/gpu/drm/tegra/vic.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/tegra/vic.c b/drivers/gpu/drm/tegra/vic.c
index 739b894661ab..55a8cc162e9d 100644
--- a/drivers/gpu/drm/tegra/vic.c
+++ b/drivers/gpu/drm/tegra/vic.c
@@ -412,7 +412,6 @@ static int vic_probe(struct platform_device *pdev)
 	err = host1x_client_register(&vic->client.base);
 	if (err < 0) {
 		dev_err(dev, "failed to register host1x client: %d\n", err);
-		platform_set_drvdata(pdev, NULL);
 		goto exit_falcon;
 	}
 
-- 
2.19.1

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v3 15/16] drm/tegra: vic: Support stream ID register programming
  2019-02-01 13:28 [PATCH v3 00/16] drm/tegra: Fix IOVA space on Tegra186 and later Thierry Reding
                   ` (13 preceding siblings ...)
  2019-02-01 13:28 ` [PATCH v3 14/16] drm/tegra: vic: Do not clear driver data Thierry Reding
@ 2019-02-01 13:28 ` Thierry Reding
  2019-02-01 13:28 ` [PATCH v3 16/16] arm64: tegra: Enable SMMU for VIC on Tegra186 Thierry Reding
  2019-02-01 14:46 ` [PATCH v3 00/16] drm/tegra: Fix IOVA space on Tegra186 and later Dmitry Osipenko
  16 siblings, 0 replies; 30+ messages in thread
From: Thierry Reding @ 2019-02-01 13:28 UTC (permalink / raw)
  To: Thierry Reding; +Cc: linux-tegra, Dmitry Osipenko, dri-devel, Mikko Perttunen

From: Thierry Reding <treding@nvidia.com>

The version of VIC found in Tegra186 and later incorporates improvements
with regards to context isolation. As part of those improvements, stream
ID registers were added that allow to specify separate stream IDs for
the Falcon microcontroller and the VIC memory interface.

While it is possible to also set the stream ID dynamically at runtime to
allow userspace contexts to be completely separated, this commit doesn't
implement that yet. Instead, the static VIC stream ID is programmed when
the Falcon is booted. This ensures that memory accesses by the Falcon or
the VIC are properly translated via the SMMU.

Signed-off-by: Thierry Reding <treding@nvidia.com>
---
 drivers/gpu/drm/tegra/vic.c | 21 +++++++++++++++++++++
 drivers/gpu/drm/tegra/vic.h |  9 +++++++++
 2 files changed, 30 insertions(+)

diff --git a/drivers/gpu/drm/tegra/vic.c b/drivers/gpu/drm/tegra/vic.c
index 55a8cc162e9d..aef8c16bfbee 100644
--- a/drivers/gpu/drm/tegra/vic.c
+++ b/drivers/gpu/drm/tegra/vic.c
@@ -26,6 +26,7 @@
 struct vic_config {
 	const char *firmware;
 	unsigned int version;
+	bool supports_sid;
 };
 
 struct vic {
@@ -105,6 +106,22 @@ static int vic_boot(struct vic *vic)
 	if (vic->booted)
 		return 0;
 
+	if (vic->config->supports_sid) {
+		struct iommu_fwspec *spec = dev_iommu_fwspec_get(vic->dev);
+		u32 value;
+
+		value = TRANSCFG_ATT(1, TRANSCFG_SID_FALCON) |
+			TRANSCFG_ATT(0, TRANSCFG_SID_HW);
+		vic_writel(vic, value, VIC_TFBIF_TRANSCFG);
+
+		if (spec->num_ids > 0) {
+			value = spec->ids[0] & 0xffff;
+
+			vic_writel(vic, value, VIC_THI_STREAMID0);
+			vic_writel(vic, value, VIC_THI_STREAMID1);
+		}
+	}
+
 	/* setup clockgating registers */
 	vic_writel(vic, CG_IDLE_CG_DLY_CNT(4) |
 			CG_IDLE_CG_EN |
@@ -314,6 +331,7 @@ static const struct tegra_drm_client_ops vic_ops = {
 static const struct vic_config vic_t124_config = {
 	.firmware = NVIDIA_TEGRA_124_VIC_FIRMWARE,
 	.version = 0x40,
+	.supports_sid = false,
 };
 
 #define NVIDIA_TEGRA_210_VIC_FIRMWARE "nvidia/tegra210/vic04_ucode.bin"
@@ -321,6 +339,7 @@ static const struct vic_config vic_t124_config = {
 static const struct vic_config vic_t210_config = {
 	.firmware = NVIDIA_TEGRA_210_VIC_FIRMWARE,
 	.version = 0x21,
+	.supports_sid = false,
 };
 
 #define NVIDIA_TEGRA_186_VIC_FIRMWARE "nvidia/tegra186/vic04_ucode.bin"
@@ -328,6 +347,7 @@ static const struct vic_config vic_t210_config = {
 static const struct vic_config vic_t186_config = {
 	.firmware = NVIDIA_TEGRA_186_VIC_FIRMWARE,
 	.version = 0x18,
+	.supports_sid = true,
 };
 
 #define NVIDIA_TEGRA_194_VIC_FIRMWARE "nvidia/tegra194/vic.bin"
@@ -335,6 +355,7 @@ static const struct vic_config vic_t186_config = {
 static const struct vic_config vic_t194_config = {
 	.firmware = NVIDIA_TEGRA_194_VIC_FIRMWARE,
 	.version = 0x19,
+	.supports_sid = true,
 };
 
 static const struct of_device_id vic_match[] = {
diff --git a/drivers/gpu/drm/tegra/vic.h b/drivers/gpu/drm/tegra/vic.h
index 21844817a7e1..017584340dd6 100644
--- a/drivers/gpu/drm/tegra/vic.h
+++ b/drivers/gpu/drm/tegra/vic.h
@@ -17,11 +17,20 @@
 
 /* VIC registers */
 
+#define VIC_THI_STREAMID0	0x00000030
+#define VIC_THI_STREAMID1	0x00000034
+
 #define NV_PVIC_MISC_PRI_VIC_CG			0x000016d0
 #define CG_IDLE_CG_DLY_CNT(val)			((val & 0x3f) << 0)
 #define CG_IDLE_CG_EN				(1 << 6)
 #define CG_WAKEUP_DLY_CNT(val)			((val & 0xf) << 16)
 
+#define VIC_TFBIF_TRANSCFG	0x00002044
+#define  TRANSCFG_ATT(i, v)	(((v) & 0x3) << (i * 4))
+#define  TRANSCFG_SID_HW	0
+#define  TRANSCFG_SID_PHY	1
+#define  TRANSCFG_SID_FALCON	2
+
 /* Firmware offsets */
 
 #define VIC_UCODE_FCE_HEADER_OFFSET		(6*4)
-- 
2.19.1

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH v3 16/16] arm64: tegra: Enable SMMU for VIC on Tegra186
  2019-02-01 13:28 [PATCH v3 00/16] drm/tegra: Fix IOVA space on Tegra186 and later Thierry Reding
                   ` (14 preceding siblings ...)
  2019-02-01 13:28 ` [PATCH v3 15/16] drm/tegra: vic: Support stream ID register programming Thierry Reding
@ 2019-02-01 13:28 ` Thierry Reding
  2019-02-01 14:46 ` [PATCH v3 00/16] drm/tegra: Fix IOVA space on Tegra186 and later Dmitry Osipenko
  16 siblings, 0 replies; 30+ messages in thread
From: Thierry Reding @ 2019-02-01 13:28 UTC (permalink / raw)
  To: Thierry Reding; +Cc: linux-tegra, Dmitry Osipenko, dri-devel, Mikko Perttunen

From: Thierry Reding <treding@nvidia.com>

Enable address translation for VIC via the SMMU on Tegra186.

Signed-off-by: Thierry Reding <treding@nvidia.com>
---
 arch/arm64/boot/dts/nvidia/tegra186.dtsi | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/boot/dts/nvidia/tegra186.dtsi b/arch/arm64/boot/dts/nvidia/tegra186.dtsi
index 66ea7e7c79f5..67b9f8ed3c9b 100644
--- a/arch/arm64/boot/dts/nvidia/tegra186.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra186.dtsi
@@ -799,6 +799,7 @@
 			reset-names = "vic";
 
 			power-domains = <&bpmp TEGRA186_POWER_DOMAIN_VIC>;
+			iommus = <&smmu TEGRA186_SID_VIC>;
 		};
 
 		dsib: dsi@15400000 {
-- 
2.19.1

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 08/16] gpu: host1x: Use correct semantics for HOST1X_CHANNEL_DMAEND
  2019-02-01 13:28 ` [PATCH v3 08/16] gpu: host1x: Use correct semantics for HOST1X_CHANNEL_DMAEND Thierry Reding
@ 2019-02-01 13:37   ` Dmitry Osipenko
  2019-02-01 13:41     ` Thierry Reding
  2019-02-01 14:47   ` Dmitry Osipenko
  1 sibling, 1 reply; 30+ messages in thread
From: Dmitry Osipenko @ 2019-02-01 13:37 UTC (permalink / raw)
  To: Thierry Reding; +Cc: linux-tegra, dri-devel, Mikko Perttunen

01.02.2019 16:28, Thierry Reding пишет:
> From: Thierry Reding <treding@nvidia.com>
> 
> The HOST1X_CHANNEL_DMAEND is an offset relative to the value written to
> the HOST1X_CHANNEL_DMASTART register, but it is currently treated as an
> absolute address. This can cause SMMU faults if the CDMA fetches past a
> pushbuffer's IOMMU mapping.
> 
> Properly setting the DMAEND prevents the CDMA from fetching beyond that
> address and avoid such issues. This is currently not observed because a
> whole (almost) page of essentially scratch space absorbs any excessive
> prefetching by CDMA. However, changing the number of slots in the push
> buffer can trigger these SMMU faults.
> 
> Signed-off-by: Thierry Reding <treding@nvidia.com>
> ---
>  drivers/gpu/host1x/hw/cdma_hw.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/host1x/hw/cdma_hw.c b/drivers/gpu/host1x/hw/cdma_hw.c
> index 485aef5761af..a24c090ac96f 100644
> --- a/drivers/gpu/host1x/hw/cdma_hw.c
> +++ b/drivers/gpu/host1x/hw/cdma_hw.c
> @@ -75,7 +75,7 @@ static void cdma_start(struct host1x_cdma *cdma)
>  
>  	cdma->last_pos = cdma->push_buffer.pos;
>  	start = cdma->push_buffer.dma;
> -	end = start + cdma->push_buffer.size + 4;
> +	end = cdma->push_buffer.size + 4;
>  
>  	host1x_ch_writel(ch, HOST1X_CHANNEL_DMACTRL_DMASTOP,
>  			 HOST1X_CHANNEL_DMACTRL);
> @@ -126,7 +126,7 @@ static void cdma_timeout_restart(struct host1x_cdma *cdma, u32 getptr)
>  			 HOST1X_CHANNEL_DMACTRL);
>  
>  	start = cdma->push_buffer.dma;
> -	end = start + cdma->push_buffer.size + 4;
> +	end = cdma->push_buffer.size + 4;
>  
>  	/* set base, end pointer (all of memory) */
>  	host1x_ch_writel(ch, lower_32_bits(start), HOST1X_CHANNEL_DMASTART);
> 

This seems fixes problem that was added by a previous patch in this series, "gpu: host1x: Support 40-bit addressing". What about to just squash the patches? 
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 08/16] gpu: host1x: Use correct semantics for HOST1X_CHANNEL_DMAEND
  2019-02-01 13:37   ` Dmitry Osipenko
@ 2019-02-01 13:41     ` Thierry Reding
  2019-02-01 13:48       ` Dmitry Osipenko
  0 siblings, 1 reply; 30+ messages in thread
From: Thierry Reding @ 2019-02-01 13:41 UTC (permalink / raw)
  To: Dmitry Osipenko; +Cc: linux-tegra, dri-devel, Mikko Perttunen


[-- Attachment #1.1: Type: text/plain, Size: 2302 bytes --]

On Fri, Feb 01, 2019 at 04:37:59PM +0300, Dmitry Osipenko wrote:
> 01.02.2019 16:28, Thierry Reding пишет:
> > From: Thierry Reding <treding@nvidia.com>
> > 
> > The HOST1X_CHANNEL_DMAEND is an offset relative to the value written to
> > the HOST1X_CHANNEL_DMASTART register, but it is currently treated as an
> > absolute address. This can cause SMMU faults if the CDMA fetches past a
> > pushbuffer's IOMMU mapping.
> > 
> > Properly setting the DMAEND prevents the CDMA from fetching beyond that
> > address and avoid such issues. This is currently not observed because a
> > whole (almost) page of essentially scratch space absorbs any excessive
> > prefetching by CDMA. However, changing the number of slots in the push
> > buffer can trigger these SMMU faults.
> > 
> > Signed-off-by: Thierry Reding <treding@nvidia.com>
> > ---
> >  drivers/gpu/host1x/hw/cdma_hw.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/gpu/host1x/hw/cdma_hw.c b/drivers/gpu/host1x/hw/cdma_hw.c
> > index 485aef5761af..a24c090ac96f 100644
> > --- a/drivers/gpu/host1x/hw/cdma_hw.c
> > +++ b/drivers/gpu/host1x/hw/cdma_hw.c
> > @@ -75,7 +75,7 @@ static void cdma_start(struct host1x_cdma *cdma)
> >  
> >  	cdma->last_pos = cdma->push_buffer.pos;
> >  	start = cdma->push_buffer.dma;
> > -	end = start + cdma->push_buffer.size + 4;
> > +	end = cdma->push_buffer.size + 4;
> >  
> >  	host1x_ch_writel(ch, HOST1X_CHANNEL_DMACTRL_DMASTOP,
> >  			 HOST1X_CHANNEL_DMACTRL);
> > @@ -126,7 +126,7 @@ static void cdma_timeout_restart(struct host1x_cdma *cdma, u32 getptr)
> >  			 HOST1X_CHANNEL_DMACTRL);
> >  
> >  	start = cdma->push_buffer.dma;
> > -	end = start + cdma->push_buffer.size + 4;
> > +	end = cdma->push_buffer.size + 4;
> >  
> >  	/* set base, end pointer (all of memory) */
> >  	host1x_ch_writel(ch, lower_32_bits(start), HOST1X_CHANNEL_DMASTART);
> > 
> 
> This seems fixes problem that was added by a previous patch in this
> series, "gpu: host1x: Support 40-bit addressing". What about to just
> squash the patches? 

This actually fixes a bug that's always been there. This just happens to
touch the same lines as an earlier patch as a result of some refactoring
that the earlier patch did.

Thierry

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 08/16] gpu: host1x: Use correct semantics for HOST1X_CHANNEL_DMAEND
  2019-02-01 13:41     ` Thierry Reding
@ 2019-02-01 13:48       ` Dmitry Osipenko
  2019-02-01 14:10         ` Thierry Reding
  0 siblings, 1 reply; 30+ messages in thread
From: Dmitry Osipenko @ 2019-02-01 13:48 UTC (permalink / raw)
  To: Thierry Reding; +Cc: linux-tegra, dri-devel, Mikko Perttunen

01.02.2019 16:41, Thierry Reding пишет:
> On Fri, Feb 01, 2019 at 04:37:59PM +0300, Dmitry Osipenko wrote:
>> 01.02.2019 16:28, Thierry Reding пишет:
>>> From: Thierry Reding <treding@nvidia.com>
>>>
>>> The HOST1X_CHANNEL_DMAEND is an offset relative to the value written to
>>> the HOST1X_CHANNEL_DMASTART register, but it is currently treated as an
>>> absolute address. This can cause SMMU faults if the CDMA fetches past a
>>> pushbuffer's IOMMU mapping.
>>>
>>> Properly setting the DMAEND prevents the CDMA from fetching beyond that
>>> address and avoid such issues. This is currently not observed because a
>>> whole (almost) page of essentially scratch space absorbs any excessive
>>> prefetching by CDMA. However, changing the number of slots in the push
>>> buffer can trigger these SMMU faults.
>>>
>>> Signed-off-by: Thierry Reding <treding@nvidia.com>
>>> ---
>>>  drivers/gpu/host1x/hw/cdma_hw.c | 4 ++--
>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/host1x/hw/cdma_hw.c b/drivers/gpu/host1x/hw/cdma_hw.c
>>> index 485aef5761af..a24c090ac96f 100644
>>> --- a/drivers/gpu/host1x/hw/cdma_hw.c
>>> +++ b/drivers/gpu/host1x/hw/cdma_hw.c
>>> @@ -75,7 +75,7 @@ static void cdma_start(struct host1x_cdma *cdma)
>>>  
>>>  	cdma->last_pos = cdma->push_buffer.pos;
>>>  	start = cdma->push_buffer.dma;
>>> -	end = start + cdma->push_buffer.size + 4;
>>> +	end = cdma->push_buffer.size + 4;
>>>  
>>>  	host1x_ch_writel(ch, HOST1X_CHANNEL_DMACTRL_DMASTOP,
>>>  			 HOST1X_CHANNEL_DMACTRL);
>>> @@ -126,7 +126,7 @@ static void cdma_timeout_restart(struct host1x_cdma *cdma, u32 getptr)
>>>  			 HOST1X_CHANNEL_DMACTRL);
>>>  
>>>  	start = cdma->push_buffer.dma;
>>> -	end = start + cdma->push_buffer.size + 4;
>>> +	end = cdma->push_buffer.size + 4;
>>>  
>>>  	/* set base, end pointer (all of memory) */
>>>  	host1x_ch_writel(ch, lower_32_bits(start), HOST1X_CHANNEL_DMASTART);
>>>
>>
>> This seems fixes problem that was added by a previous patch in this
>> series, "gpu: host1x: Support 40-bit addressing". What about to just
>> squash the patches? 
> 
> This actually fixes a bug that's always been there. This just happens to
> touch the same lines as an earlier patch as a result of some refactoring
> that the earlier patch did.

Oh, wow. Indeed! That's a bit unfortunate :) Though it's quite difficult to spot that bug by looking at the code, good that it got caught. Was this bug triggered by trying to move IOVA allocation to the end of the AS?

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 08/16] gpu: host1x: Use correct semantics for HOST1X_CHANNEL_DMAEND
  2019-02-01 13:48       ` Dmitry Osipenko
@ 2019-02-01 14:10         ` Thierry Reding
  2019-02-01 14:40           ` Dmitry Osipenko
  0 siblings, 1 reply; 30+ messages in thread
From: Thierry Reding @ 2019-02-01 14:10 UTC (permalink / raw)
  To: Dmitry Osipenko; +Cc: linux-tegra, dri-devel, Mikko Perttunen


[-- Attachment #1.1: Type: text/plain, Size: 4119 bytes --]

On Fri, Feb 01, 2019 at 04:48:56PM +0300, Dmitry Osipenko wrote:
> 01.02.2019 16:41, Thierry Reding пишет:
> > On Fri, Feb 01, 2019 at 04:37:59PM +0300, Dmitry Osipenko wrote:
> >> 01.02.2019 16:28, Thierry Reding пишет:
> >>> From: Thierry Reding <treding@nvidia.com>
> >>>
> >>> The HOST1X_CHANNEL_DMAEND is an offset relative to the value written to
> >>> the HOST1X_CHANNEL_DMASTART register, but it is currently treated as an
> >>> absolute address. This can cause SMMU faults if the CDMA fetches past a
> >>> pushbuffer's IOMMU mapping.
> >>>
> >>> Properly setting the DMAEND prevents the CDMA from fetching beyond that
> >>> address and avoid such issues. This is currently not observed because a
> >>> whole (almost) page of essentially scratch space absorbs any excessive
> >>> prefetching by CDMA. However, changing the number of slots in the push
> >>> buffer can trigger these SMMU faults.
> >>>
> >>> Signed-off-by: Thierry Reding <treding@nvidia.com>
> >>> ---
> >>>  drivers/gpu/host1x/hw/cdma_hw.c | 4 ++--
> >>>  1 file changed, 2 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/host1x/hw/cdma_hw.c b/drivers/gpu/host1x/hw/cdma_hw.c
> >>> index 485aef5761af..a24c090ac96f 100644
> >>> --- a/drivers/gpu/host1x/hw/cdma_hw.c
> >>> +++ b/drivers/gpu/host1x/hw/cdma_hw.c
> >>> @@ -75,7 +75,7 @@ static void cdma_start(struct host1x_cdma *cdma)
> >>>  
> >>>  	cdma->last_pos = cdma->push_buffer.pos;
> >>>  	start = cdma->push_buffer.dma;
> >>> -	end = start + cdma->push_buffer.size + 4;
> >>> +	end = cdma->push_buffer.size + 4;
> >>>  
> >>>  	host1x_ch_writel(ch, HOST1X_CHANNEL_DMACTRL_DMASTOP,
> >>>  			 HOST1X_CHANNEL_DMACTRL);
> >>> @@ -126,7 +126,7 @@ static void cdma_timeout_restart(struct host1x_cdma *cdma, u32 getptr)
> >>>  			 HOST1X_CHANNEL_DMACTRL);
> >>>  
> >>>  	start = cdma->push_buffer.dma;
> >>> -	end = start + cdma->push_buffer.size + 4;
> >>> +	end = cdma->push_buffer.size + 4;
> >>>  
> >>>  	/* set base, end pointer (all of memory) */
> >>>  	host1x_ch_writel(ch, lower_32_bits(start), HOST1X_CHANNEL_DMASTART);
> >>>
> >>
> >> This seems fixes problem that was added by a previous patch in this
> >> series, "gpu: host1x: Support 40-bit addressing". What about to just
> >> squash the patches? 
> > 
> > This actually fixes a bug that's always been there. This just happens to
> > touch the same lines as an earlier patch as a result of some refactoring
> > that the earlier patch did.
> 
> Oh, wow. Indeed! That's a bit unfortunate :) Though it's quite
> difficult to spot that bug by looking at the code, good that it got
> caught. Was this bug triggered by trying to move IOVA allocation to
> the end of the AS?

This was actually triggered because I noticed that we were using 512
slots in the push buffer, which means 4096 bytes, but then we needed
that extra 4 bytes for the RESTART opcode, which means that we're
currently allocating 8192 bytes of which only 4092 are being used.

So I thought: "Well, we should be able to live with 511 slots per push
buffer and save a full memory page. This is an easy optimization!" Boy
was I wrong... After making that change I started to see SMMU faults
for the address immediately after the push buffer mapping. I think the
same error happens regardless of where the push buffer is located. The
reason for the SMMU faults seem to be that CDMA happily keeps on pre-
fetching (a lot of) commands before it wraps around because of the
RESTART opcode. DMAEND is designed as a mechanism to prevent CDMA from
excessively fetching commands, but if you program a really large value
into it, it'll add that really large value to the DMASTART as offset
and the mechanism doesn't work anymore.

We don't currently see this because the 4092 bytes in the "scratch"
page of the push buffer allocation are enough to absorb the prefetching
that CDMA does.

We would also likely never see it happen in non-SMMU cases because the
CDMA would just keep on prefetching whatever memory happened to be after
the push buffer.

Thierry

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 08/16] gpu: host1x: Use correct semantics for HOST1X_CHANNEL_DMAEND
  2019-02-01 14:10         ` Thierry Reding
@ 2019-02-01 14:40           ` Dmitry Osipenko
  0 siblings, 0 replies; 30+ messages in thread
From: Dmitry Osipenko @ 2019-02-01 14:40 UTC (permalink / raw)
  To: Thierry Reding; +Cc: linux-tegra, dri-devel, Mikko Perttunen

01.02.2019 17:10, Thierry Reding пишет:
> On Fri, Feb 01, 2019 at 04:48:56PM +0300, Dmitry Osipenko wrote:
>> 01.02.2019 16:41, Thierry Reding пишет:
>>> On Fri, Feb 01, 2019 at 04:37:59PM +0300, Dmitry Osipenko wrote:
>>>> 01.02.2019 16:28, Thierry Reding пишет:
>>>>> From: Thierry Reding <treding@nvidia.com>
>>>>>
>>>>> The HOST1X_CHANNEL_DMAEND is an offset relative to the value written to
>>>>> the HOST1X_CHANNEL_DMASTART register, but it is currently treated as an
>>>>> absolute address. This can cause SMMU faults if the CDMA fetches past a
>>>>> pushbuffer's IOMMU mapping.
>>>>>
>>>>> Properly setting the DMAEND prevents the CDMA from fetching beyond that
>>>>> address and avoid such issues. This is currently not observed because a
>>>>> whole (almost) page of essentially scratch space absorbs any excessive
>>>>> prefetching by CDMA. However, changing the number of slots in the push
>>>>> buffer can trigger these SMMU faults.
>>>>>
>>>>> Signed-off-by: Thierry Reding <treding@nvidia.com>
>>>>> ---
>>>>>  drivers/gpu/host1x/hw/cdma_hw.c | 4 ++--
>>>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/host1x/hw/cdma_hw.c b/drivers/gpu/host1x/hw/cdma_hw.c
>>>>> index 485aef5761af..a24c090ac96f 100644
>>>>> --- a/drivers/gpu/host1x/hw/cdma_hw.c
>>>>> +++ b/drivers/gpu/host1x/hw/cdma_hw.c
>>>>> @@ -75,7 +75,7 @@ static void cdma_start(struct host1x_cdma *cdma)
>>>>>  
>>>>>  	cdma->last_pos = cdma->push_buffer.pos;
>>>>>  	start = cdma->push_buffer.dma;
>>>>> -	end = start + cdma->push_buffer.size + 4;
>>>>> +	end = cdma->push_buffer.size + 4;
>>>>>  
>>>>>  	host1x_ch_writel(ch, HOST1X_CHANNEL_DMACTRL_DMASTOP,
>>>>>  			 HOST1X_CHANNEL_DMACTRL);
>>>>> @@ -126,7 +126,7 @@ static void cdma_timeout_restart(struct host1x_cdma *cdma, u32 getptr)
>>>>>  			 HOST1X_CHANNEL_DMACTRL);
>>>>>  
>>>>>  	start = cdma->push_buffer.dma;
>>>>> -	end = start + cdma->push_buffer.size + 4;
>>>>> +	end = cdma->push_buffer.size + 4;
>>>>>  
>>>>>  	/* set base, end pointer (all of memory) */
>>>>>  	host1x_ch_writel(ch, lower_32_bits(start), HOST1X_CHANNEL_DMASTART);
>>>>>
>>>>
>>>> This seems fixes problem that was added by a previous patch in this
>>>> series, "gpu: host1x: Support 40-bit addressing". What about to just
>>>> squash the patches? 
>>>
>>> This actually fixes a bug that's always been there. This just happens to
>>> touch the same lines as an earlier patch as a result of some refactoring
>>> that the earlier patch did.
>>
>> Oh, wow. Indeed! That's a bit unfortunate :) Though it's quite
>> difficult to spot that bug by looking at the code, good that it got
>> caught. Was this bug triggered by trying to move IOVA allocation to
>> the end of the AS?
> 
> This was actually triggered because I noticed that we were using 512
> slots in the push buffer, which means 4096 bytes, but then we needed
> that extra 4 bytes for the RESTART opcode, which means that we're
> currently allocating 8192 bytes of which only 4092 are being used.
> 
> So I thought: "Well, we should be able to live with 511 slots per push
> buffer and save a full memory page. This is an easy optimization!" Boy
> was I wrong... After making that change I started to see SMMU faults
> for the address immediately after the push buffer mapping. I think the
> same error happens regardless of where the push buffer is located. The
> reason for the SMMU faults seem to be that CDMA happily keeps on pre-
> fetching (a lot of) commands before it wraps around because of the
> RESTART opcode. DMAEND is designed as a mechanism to prevent CDMA from
> excessively fetching commands, but if you program a really large value
> into it, it'll add that really large value to the DMASTART as offset
> and the mechanism doesn't work anymore.
> 
> We don't currently see this because the 4092 bytes in the "scratch"
> page of the push buffer allocation are enough to absorb the prefetching
> that CDMA does.
> 
> We would also likely never see it happen in non-SMMU cases because the
> CDMA would just keep on prefetching whatever memory happened to be after
> the push buffer.

Yeah, that's what usually happens when the code is getting improved. Good job! :)
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 00/16] drm/tegra: Fix IOVA space on Tegra186 and later
  2019-02-01 13:28 [PATCH v3 00/16] drm/tegra: Fix IOVA space on Tegra186 and later Thierry Reding
                   ` (15 preceding siblings ...)
  2019-02-01 13:28 ` [PATCH v3 16/16] arm64: tegra: Enable SMMU for VIC on Tegra186 Thierry Reding
@ 2019-02-01 14:46 ` Dmitry Osipenko
  16 siblings, 0 replies; 30+ messages in thread
From: Dmitry Osipenko @ 2019-02-01 14:46 UTC (permalink / raw)
  To: Thierry Reding; +Cc: linux-tegra, dri-devel, Mikko Perttunen

01.02.2019 16:28, Thierry Reding пишет:
> From: Thierry Reding <treding@nvidia.com>
> 
> Tegra186 and later are different from earlier generations in that they
> use an ARM SMMU rather than the Tegra SMMU. The ARM SMMU driver behaves
> slightly differently in that the geometry for IOMMU domains is set only
> after a device was attached to it. This is to make sure that the SMMU
> instance that the domain belongs to is known, because each instance can
> have a different input address space (i.e. geometry).
> 
> Work around this by moving all IOVA allocations to a point where the
> geometry of the domain is properly initialized.
> 
> This second version of the series addresses all review comments and adds
> a number of patches that will actually allow host1x to work with an SMMU
> enabled on Tegra186. The patches also add programming required to
> address the full 40 bits of address space.
> 
> The third version of the series fixes the 40-bit addressing code by
> making sure that wide opcodes are always written atomically to the push
> buffer. Another pair of patches are introduced to fix a long-standing
> bug where the HOST1X_CHANNEL_DMAEND register wasn't properly programmed
> and one push buffer memory optimization that avoid wasting almost one
> full memory page per push buffer.
> 
> This supersedes the following patch:
> 
> 	https://patchwork.kernel.org/patch/10775579/
> 
> Thierry
> 
> Thierry Reding (16):
>   gpu: host1x: Set up stream ID table
>   gpu: host1x: Program the channel stream ID
>   gpu: host1x: Introduce support for wide opcodes
>   gpu: host1x: Support 40-bit addressing
>   gpu: host1x: Use direct DMA with IOMMU API usage
>   gpu: host1x: Restrict IOVA space to DMA mask
>   gpu: host1x: Support 40-bit addressing on Tegra186
>   gpu: host1x: Use correct semantics for HOST1X_CHANNEL_DMAEND
>   gpu: host1x: Optimize CDMA push buffer memory usage
>   drm/tegra: Store parent pointer in Tegra DRM clients
>   drm/tegra: vic: Load firmware on demand
>   drm/tegra: Setup shared IOMMU domain after initialization
>   drm/tegra: Restrict IOVA space to DMA mask
>   drm/tegra: vic: Do not clear driver data
>   drm/tegra: vic: Support stream ID register programming
>   arm64: tegra: Enable SMMU for VIC on Tegra186
> 
>  arch/arm64/boot/dts/nvidia/tegra186.dtsi    |   1 +
>  drivers/gpu/drm/tegra/drm.c                 |  57 +++++----
>  drivers/gpu/drm/tegra/drm.h                 |   1 +
>  drivers/gpu/drm/tegra/vic.c                 |  75 ++++++++---
>  drivers/gpu/drm/tegra/vic.h                 |   9 ++
>  drivers/gpu/host1x/cdma.c                   | 132 ++++++++++++++++++--
>  drivers/gpu/host1x/cdma.h                   |   2 +
>  drivers/gpu/host1x/dev.c                    |  49 +++++++-
>  drivers/gpu/host1x/dev.h                    |   8 ++
>  drivers/gpu/host1x/hw/cdma_hw.c             |  32 ++++-
>  drivers/gpu/host1x/hw/channel_hw.c          |  42 ++++++-
>  drivers/gpu/host1x/hw/host1x06_hardware.h   |   6 +
>  drivers/gpu/host1x/hw/host1x07_hardware.h   |   6 +
>  drivers/gpu/host1x/hw/hw_host1x06_channel.h |  11 ++
>  drivers/gpu/host1x/hw/hw_host1x07_channel.h |  11 ++
>  include/trace/events/host1x.h               |  26 ++++
>  16 files changed, 404 insertions(+), 64 deletions(-)
>  create mode 100644 drivers/gpu/host1x/hw/hw_host1x06_channel.h
>  create mode 100644 drivers/gpu/host1x/hw/hw_host1x07_channel.h
> 

I gave a test to this series on T20 and T30. Opentegra works, grate tests work, glxgears are spinning.. everything working fine.
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 05/16] gpu: host1x: Use direct DMA with IOMMU API usage
  2019-02-01 13:28 ` [PATCH v3 05/16] gpu: host1x: Use direct DMA with IOMMU API usage Thierry Reding
@ 2019-02-01 14:46   ` Dmitry Osipenko
  0 siblings, 0 replies; 30+ messages in thread
From: Dmitry Osipenko @ 2019-02-01 14:46 UTC (permalink / raw)
  To: Thierry Reding; +Cc: linux-tegra, dri-devel, Mikko Perttunen

01.02.2019 16:28, Thierry Reding пишет:
> From: Thierry Reding <treding@nvidia.com>
> 
> If we use the IOMMU API directly to map buffers into host1x' IOVA space,
> we must make sure that the DMA API doesn't already set up a mapping, or
> else translation will fail.
> 
> The direct DMA API allows us to allocate memory that will not be mapped
> through an IOMMU automatically.
> 
> Reviewed-by: Dmitry Osipenko <digetx@gmail.com>
> Signed-off-by: Thierry Reding <treding@nvidia.com>
> ---
>  drivers/gpu/host1x/cdma.c | 11 ++++++++---
>  1 file changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/host1x/cdma.c b/drivers/gpu/host1x/cdma.c
> index e2d106fa3c0b..a96c4dd1e449 100644
> --- a/drivers/gpu/host1x/cdma.c
> +++ b/drivers/gpu/host1x/cdma.c
> @@ -19,6 +19,7 @@
>  
>  #include <asm/cacheflush.h>
>  #include <linux/device.h>
> +#include <linux/dma-direct.h>
>  #include <linux/dma-mapping.h>
>  #include <linux/host1x.h>
>  #include <linux/interrupt.h>
> @@ -70,6 +71,7 @@ static void host1x_pushbuffer_destroy(struct push_buffer *pb)
>   */
>  static int host1x_pushbuffer_init(struct push_buffer *pb)
>  {
> +	unsigned long attrs = DMA_ATTR_WRITE_COMBINE;
>  	struct host1x_cdma *cdma = pb_to_cdma(pb);
>  	struct host1x *host1x = cdma_to_host1x(cdma);
>  	struct iova *alloc;
> @@ -91,8 +93,8 @@ static int host1x_pushbuffer_init(struct push_buffer *pb)
>  
>  		size = iova_align(&host1x->iova, size);
>  
> -		pb->mapped = dma_alloc_wc(host1x->dev, size, &pb->phys,
> -					  GFP_KERNEL);
> +		pb->mapped = dma_direct_alloc(host1x->dev, size, &pb->phys,
> +					      GFP_KERNEL, attrs);
>  		if (!pb->mapped)
>  			return -ENOMEM;
>  
> @@ -127,7 +129,10 @@ static int host1x_pushbuffer_init(struct push_buffer *pb)
>  iommu_free_iova:
>  	__free_iova(&host1x->iova, alloc);
>  iommu_free_mem:
> -	dma_free_wc(host1x->dev, size, pb->mapped, pb->phys);
> +	if (host1x->domain)
> +		dma_direct_free(host1x->dev, size, pb->mapped, pb->phys, attrs);
> +	else
> +		dma_free_wc(host1x->dev, size, pb->mapped, pb->phys);
>  
>  	return err;
>  }
> 

Reviewed-by: Dmitry Osipenko <digetx@gmail.com>
Tested-by: Dmitry Osipenko <digetx@gmail.com>
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 06/16] gpu: host1x: Restrict IOVA space to DMA mask
  2019-02-01 13:28 ` [PATCH v3 06/16] gpu: host1x: Restrict IOVA space to DMA mask Thierry Reding
@ 2019-02-01 14:47   ` Dmitry Osipenko
  0 siblings, 0 replies; 30+ messages in thread
From: Dmitry Osipenko @ 2019-02-01 14:47 UTC (permalink / raw)
  To: Thierry Reding; +Cc: linux-tegra, dri-devel, Mikko Perttunen

01.02.2019 16:28, Thierry Reding пишет:
> From: Thierry Reding <treding@nvidia.com>
> 
> On Tegra186 and later, the ARM SMMU provides an input address space that
> is 48 bits wide. However, memory clients can only address up to 40 bits.
> If the geometry is used as-is, allocations of IOVA space can end up in a
> region that is not addressable by the memory clients.
> 
> To fix this, restrict the IOVA space to the DMA mask of the host1x
> device.
> 
> Signed-off-by: Thierry Reding <treding@nvidia.com>
> ---
>  drivers/gpu/host1x/dev.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c
> index 4c044ee54fe6..544b67f2b3ff 100644
> --- a/drivers/gpu/host1x/dev.c
> +++ b/drivers/gpu/host1x/dev.c
> @@ -283,6 +283,8 @@ static int host1x_probe(struct platform_device *pdev)
>  	host->group = iommu_group_get(&pdev->dev);
>  	if (host->group) {
>  		struct iommu_domain_geometry *geometry;
> +		u64 mask = dma_get_mask(host->dev);
> +		dma_addr_t start, end;
>  		unsigned long order;
>  
>  		err = iova_cache_get();
> @@ -310,11 +312,12 @@ static int host1x_probe(struct platform_device *pdev)
>  		}
>  
>  		geometry = &host->domain->geometry;
> +		start = geometry->aperture_start & mask;
> +		end = geometry->aperture_end & mask;
>  
>  		order = __ffs(host->domain->pgsize_bitmap);
> -		init_iova_domain(&host->iova, 1UL << order,
> -				 geometry->aperture_start >> order);
> -		host->iova_end = geometry->aperture_end;
> +		init_iova_domain(&host->iova, 1UL << order, start >> order);
> +		host->iova_end = end;
>  	}
>  
>  skip_iommu:
> 

For older Tegra's:

Reviewed-by: Dmitry Osipenko <digetx@gmail.com>
Tested-by: Dmitry Osipenko <digetx@gmail.com>
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 08/16] gpu: host1x: Use correct semantics for HOST1X_CHANNEL_DMAEND
  2019-02-01 13:28 ` [PATCH v3 08/16] gpu: host1x: Use correct semantics for HOST1X_CHANNEL_DMAEND Thierry Reding
  2019-02-01 13:37   ` Dmitry Osipenko
@ 2019-02-01 14:47   ` Dmitry Osipenko
  1 sibling, 0 replies; 30+ messages in thread
From: Dmitry Osipenko @ 2019-02-01 14:47 UTC (permalink / raw)
  To: Thierry Reding; +Cc: linux-tegra, dri-devel, Mikko Perttunen

01.02.2019 16:28, Thierry Reding пишет:
> From: Thierry Reding <treding@nvidia.com>
> 
> The HOST1X_CHANNEL_DMAEND is an offset relative to the value written to
> the HOST1X_CHANNEL_DMASTART register, but it is currently treated as an
> absolute address. This can cause SMMU faults if the CDMA fetches past a
> pushbuffer's IOMMU mapping.
> 
> Properly setting the DMAEND prevents the CDMA from fetching beyond that
> address and avoid such issues. This is currently not observed because a
> whole (almost) page of essentially scratch space absorbs any excessive
> prefetching by CDMA. However, changing the number of slots in the push
> buffer can trigger these SMMU faults.
> 
> Signed-off-by: Thierry Reding <treding@nvidia.com>
> ---
>  drivers/gpu/host1x/hw/cdma_hw.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/host1x/hw/cdma_hw.c b/drivers/gpu/host1x/hw/cdma_hw.c
> index 485aef5761af..a24c090ac96f 100644
> --- a/drivers/gpu/host1x/hw/cdma_hw.c
> +++ b/drivers/gpu/host1x/hw/cdma_hw.c
> @@ -75,7 +75,7 @@ static void cdma_start(struct host1x_cdma *cdma)
>  
>  	cdma->last_pos = cdma->push_buffer.pos;
>  	start = cdma->push_buffer.dma;
> -	end = start + cdma->push_buffer.size + 4;
> +	end = cdma->push_buffer.size + 4;
>  
>  	host1x_ch_writel(ch, HOST1X_CHANNEL_DMACTRL_DMASTOP,
>  			 HOST1X_CHANNEL_DMACTRL);
> @@ -126,7 +126,7 @@ static void cdma_timeout_restart(struct host1x_cdma *cdma, u32 getptr)
>  			 HOST1X_CHANNEL_DMACTRL);
>  
>  	start = cdma->push_buffer.dma;
> -	end = start + cdma->push_buffer.size + 4;
> +	end = cdma->push_buffer.size + 4;
>  
>  	/* set base, end pointer (all of memory) */
>  	host1x_ch_writel(ch, lower_32_bits(start), HOST1X_CHANNEL_DMASTART);
> 

Reviewed-by: Dmitry Osipenko <digetx@gmail.com>
Tested-by: Dmitry Osipenko <digetx@gmail.com>
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 09/16] gpu: host1x: Optimize CDMA push buffer memory usage
  2019-02-01 13:28 ` [PATCH v3 09/16] gpu: host1x: Optimize CDMA push buffer memory usage Thierry Reding
@ 2019-02-01 14:48   ` Dmitry Osipenko
  0 siblings, 0 replies; 30+ messages in thread
From: Dmitry Osipenko @ 2019-02-01 14:48 UTC (permalink / raw)
  To: Thierry Reding; +Cc: linux-tegra, dri-devel, Mikko Perttunen

01.02.2019 16:28, Thierry Reding пишет:
> From: Thierry Reding <treding@nvidia.com>
> 
> The host1x CDMA push buffer is terminated by a special opcode (RESTART)
> that tells the CDMA to wrap around to the beginning of the push buffer.
> To accomodate the RESTART opcode, an extra 4 bytes are allocated on top
> of the 512 * 8 = 4096 bytes needed for the 512 slots (1 slot = 2 words)
> that are used for other commands passed to CDMA. This requires that two
> memory pages are allocated, but most of the second page (4092 bytes) is
> never used.
> 
> Decrease the number of slots to 511 so that the RESTART opcode fits
> within the page. Adjust the push buffer wraparound code to take into
> account push buffer sizes that are not a power of two.
> 
> Signed-off-by: Thierry Reding <treding@nvidia.com>
> ---
>  drivers/gpu/host1x/cdma.c | 29 +++++++++++++++++++++++++----
>  1 file changed, 25 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/host1x/cdma.c b/drivers/gpu/host1x/cdma.c
> index a96c4dd1e449..50c1370b56c7 100644
> --- a/drivers/gpu/host1x/cdma.c
> +++ b/drivers/gpu/host1x/cdma.c
> @@ -42,7 +42,17 @@
>   * means that the push buffer is full, not empty.
>   */
>  
> -#define HOST1X_PUSHBUFFER_SLOTS	512
> +/*
> + * Typically the commands written into the push buffer are a pair of words. We
> + * use slots to represent each of these pairs and to simplify things. Note the
> + * strange number of slots allocated here. 512 slots will fit exactly within a
> + * single memory page. We also need one additional word at the end of the push
> + * buffer for the RESTART opcode that will instruct the CDMA to jump back to
> + * the beginning of the push buffer. With 512 slots, this means that we'll use
> + * 2 memory pages and waste 4092 bytes of the second page that will never be
> + * used.
> + */
> +#define HOST1X_PUSHBUFFER_SLOTS	511
>  
>  /*
>   * Clean up push buffer resources
> @@ -148,7 +158,10 @@ static void host1x_pushbuffer_push(struct push_buffer *pb, u32 op1, u32 op2)
>  	WARN_ON(pb->pos == pb->fence);
>  	*(p++) = op1;
>  	*(p++) = op2;
> -	pb->pos = (pb->pos + 8) & (pb->size - 1);
> +	pb->pos += 8;
> +
> +	if (pb->pos >= pb->size)
> +		pb->pos -= pb->size;
>  }
>  
>  /*
> @@ -158,7 +171,10 @@ static void host1x_pushbuffer_push(struct push_buffer *pb, u32 op1, u32 op2)
>  static void host1x_pushbuffer_pop(struct push_buffer *pb, unsigned int slots)
>  {
>  	/* Advance the next write position */
> -	pb->fence = (pb->fence + slots * 8) & (pb->size - 1);
> +	pb->fence += slots * 8;
> +
> +	if (pb->fence >= pb->size)
> +		pb->fence -= pb->size;
>  }
>  
>  /*
> @@ -166,7 +182,12 @@ static void host1x_pushbuffer_pop(struct push_buffer *pb, unsigned int slots)
>   */
>  static u32 host1x_pushbuffer_space(struct push_buffer *pb)
>  {
> -	return ((pb->fence - pb->pos) & (pb->size - 1)) / 8;
> +	unsigned int fence = pb->fence;
> +
> +	if (pb->fence < pb->pos)
> +		fence += pb->size;
> +
> +	return (fence - pb->pos) / 8;
>  }
>  
>  /*
> 

Reviewed-by: Dmitry Osipenko <digetx@gmail.com>
Tested-by: Dmitry Osipenko <digetx@gmail.com>
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 10/16] drm/tegra: Store parent pointer in Tegra DRM clients
  2019-02-01 13:28 ` [PATCH v3 10/16] drm/tegra: Store parent pointer in Tegra DRM clients Thierry Reding
@ 2019-02-01 14:48   ` Dmitry Osipenko
  0 siblings, 0 replies; 30+ messages in thread
From: Dmitry Osipenko @ 2019-02-01 14:48 UTC (permalink / raw)
  To: Thierry Reding; +Cc: linux-tegra, dri-devel, Mikko Perttunen

01.02.2019 16:28, Thierry Reding пишет:
> From: Thierry Reding <treding@nvidia.com>
> 
> Tegra DRM clients need access to their parent, so store a pointer to it
> upon registration. It's technically possible to get at this by going via
> the host1x client's parent and getting the driver data, but that's quite
> complicated and not very transparent. It's much more straightforward and
> natural to let the children know about their parent.
> 
> Signed-off-by: Thierry Reding <treding@nvidia.com>
> Reviewed-by: Dmitry Osipenko <digetx@gmail.com>
> ---
>  drivers/gpu/drm/tegra/drm.c | 2 ++
>  drivers/gpu/drm/tegra/drm.h | 1 +
>  2 files changed, 3 insertions(+)
> 
> diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
> index 4b70ce664c41..61dcbd218ffc 100644
> --- a/drivers/gpu/drm/tegra/drm.c
> +++ b/drivers/gpu/drm/tegra/drm.c
> @@ -1041,6 +1041,7 @@ int tegra_drm_register_client(struct tegra_drm *tegra,
>  {
>  	mutex_lock(&tegra->clients_lock);
>  	list_add_tail(&client->list, &tegra->clients);
> +	client->drm = tegra;
>  	mutex_unlock(&tegra->clients_lock);
>  
>  	return 0;
> @@ -1051,6 +1052,7 @@ int tegra_drm_unregister_client(struct tegra_drm *tegra,
>  {
>  	mutex_lock(&tegra->clients_lock);
>  	list_del_init(&client->list);
> +	client->drm = NULL;
>  	mutex_unlock(&tegra->clients_lock);
>  
>  	return 0;
> diff --git a/drivers/gpu/drm/tegra/drm.h b/drivers/gpu/drm/tegra/drm.h
> index 7370f7a0fdb8..70154c253d45 100644
> --- a/drivers/gpu/drm/tegra/drm.h
> +++ b/drivers/gpu/drm/tegra/drm.h
> @@ -88,6 +88,7 @@ int tegra_drm_submit(struct tegra_drm_context *context,
>  struct tegra_drm_client {
>  	struct host1x_client base;
>  	struct list_head list;
> +	struct tegra_drm *drm;
>  
>  	unsigned int version;
>  	const struct tegra_drm_client_ops *ops;
> 

Reviewed-by: Dmitry Osipenko <digetx@gmail.com>
Tested-by: Dmitry Osipenko <digetx@gmail.com>
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 12/16] drm/tegra: Setup shared IOMMU domain after initialization
  2019-02-01 13:28 ` [PATCH v3 12/16] drm/tegra: Setup shared IOMMU domain after initialization Thierry Reding
@ 2019-02-01 14:48   ` Dmitry Osipenko
  0 siblings, 0 replies; 30+ messages in thread
From: Dmitry Osipenko @ 2019-02-01 14:48 UTC (permalink / raw)
  To: Thierry Reding; +Cc: linux-tegra, dri-devel, Mikko Perttunen

01.02.2019 16:28, Thierry Reding пишет:
> From: Thierry Reding <treding@nvidia.com>
> 
> Move initialization of the shared IOMMU domain after the host1x device
> has been initialized. At this point all the Tegra DRM clients have been
> attached to the shared IOMMU domain.
> 
> This is important because Tegra186 and later use an ARM SMMU, for which
> the driver defers setting up the geometry for a domain until a device is
> attached to it. This is to ensure that the domain is properly set up for
> a specific ARM SMMU instance, which is unknown at allocation time.
> 
> Reviewed-by: Dmitry Osipenko <digetx@gmail.com>
> Signed-off-by: Thierry Reding <treding@nvidia.com>
> ---
>  drivers/gpu/drm/tegra/drm.c | 54 ++++++++++++++++++++-----------------
>  1 file changed, 29 insertions(+), 25 deletions(-)
> 
> diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
> index 61dcbd218ffc..271c7a5fc954 100644
> --- a/drivers/gpu/drm/tegra/drm.c
> +++ b/drivers/gpu/drm/tegra/drm.c
> @@ -92,10 +92,6 @@ static int tegra_drm_load(struct drm_device *drm, unsigned long flags)
>  		return -ENOMEM;
>  
>  	if (iommu_present(&platform_bus_type)) {
> -		u64 carveout_start, carveout_end, gem_start, gem_end;
> -		struct iommu_domain_geometry *geometry;
> -		unsigned long order;
> -
>  		tegra->domain = iommu_domain_alloc(&platform_bus_type);
>  		if (!tegra->domain) {
>  			err = -ENOMEM;
> @@ -105,27 +101,6 @@ static int tegra_drm_load(struct drm_device *drm, unsigned long flags)
>  		err = iova_cache_get();
>  		if (err < 0)
>  			goto domain;
> -
> -		geometry = &tegra->domain->geometry;
> -		gem_start = geometry->aperture_start;
> -		gem_end = geometry->aperture_end - CARVEOUT_SZ;
> -		carveout_start = gem_end + 1;
> -		carveout_end = geometry->aperture_end;
> -
> -		order = __ffs(tegra->domain->pgsize_bitmap);
> -		init_iova_domain(&tegra->carveout.domain, 1UL << order,
> -				 carveout_start >> order);
> -
> -		tegra->carveout.shift = iova_shift(&tegra->carveout.domain);
> -		tegra->carveout.limit = carveout_end >> tegra->carveout.shift;
> -
> -		drm_mm_init(&tegra->mm, gem_start, gem_end - gem_start + 1);
> -		mutex_init(&tegra->mm_lock);
> -
> -		DRM_DEBUG("IOMMU apertures:\n");
> -		DRM_DEBUG("  GEM: %#llx-%#llx\n", gem_start, gem_end);
> -		DRM_DEBUG("  Carveout: %#llx-%#llx\n", carveout_start,
> -			  carveout_end);
>  	}
>  
>  	mutex_init(&tegra->clients_lock);
> @@ -159,6 +134,35 @@ static int tegra_drm_load(struct drm_device *drm, unsigned long flags)
>  	if (err < 0)
>  		goto fbdev;
>  
> +	if (tegra->domain) {
> +		u64 carveout_start, carveout_end, gem_start, gem_end;
> +		dma_addr_t start, end;
> +		unsigned long order;
> +
> +		start = tegra->domain->geometry.aperture_start;
> +		end = tegra->domain->geometry.aperture_end;
> +
> +		gem_start = start;
> +		gem_end = end - CARVEOUT_SZ;
> +		carveout_start = gem_end + 1;
> +		carveout_end = end;
> +
> +		order = __ffs(tegra->domain->pgsize_bitmap);
> +		init_iova_domain(&tegra->carveout.domain, 1UL << order,
> +				 carveout_start >> order);
> +
> +		tegra->carveout.shift = iova_shift(&tegra->carveout.domain);
> +		tegra->carveout.limit = carveout_end >> tegra->carveout.shift;
> +
> +		drm_mm_init(&tegra->mm, gem_start, gem_end - gem_start + 1);
> +		mutex_init(&tegra->mm_lock);
> +
> +		DRM_DEBUG("IOMMU apertures:\n");
> +		DRM_DEBUG("  GEM: %#llx-%#llx\n", gem_start, gem_end);
> +		DRM_DEBUG("  Carveout: %#llx-%#llx\n", carveout_start,
> +			  carveout_end);
> +	}
> +
>  	if (tegra->hub) {
>  		err = tegra_display_hub_prepare(tegra->hub);
>  		if (err < 0)
> 

Reviewed-by: Dmitry Osipenko <digetx@gmail.com>
Tested-by: Dmitry Osipenko <digetx@gmail.com>
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v3 13/16] drm/tegra: Restrict IOVA space to DMA mask
  2019-02-01 13:28 ` [PATCH v3 13/16] drm/tegra: Restrict IOVA space to DMA mask Thierry Reding
@ 2019-02-01 14:49   ` Dmitry Osipenko
  0 siblings, 0 replies; 30+ messages in thread
From: Dmitry Osipenko @ 2019-02-01 14:49 UTC (permalink / raw)
  To: Thierry Reding; +Cc: linux-tegra, dri-devel, Mikko Perttunen

01.02.2019 16:28, Thierry Reding пишет:
> From: Thierry Reding <treding@nvidia.com>
> 
> On Tegra186 and later, the ARM SMMU provides an input address space that
> is 48 bits wide. However, memory clients can only address up to 40 bits.
> If the geometry is used as-is, allocations of IOVA space can end up in a
> region that cannot be addressed by the memory clients.
> 
> To fix this, restrict the IOVA space to the DMA mask of the host1x
> device. Note that, technically, the IOVA space needs to be restricted to
> the intersection of the DMA masks for all clients that are attached to
> the IOMMU domain. In practice using the DMA mask of the host1x device is
> sufficient because all host1x clients share the same DMA mask.
> 
> Signed-off-by: Thierry Reding <treding@nvidia.com>
> ---
>  drivers/gpu/drm/tegra/drm.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
> index 271c7a5fc954..0c5f1e6a0446 100644
> --- a/drivers/gpu/drm/tegra/drm.c
> +++ b/drivers/gpu/drm/tegra/drm.c
> @@ -136,11 +136,12 @@ static int tegra_drm_load(struct drm_device *drm, unsigned long flags)
>  
>  	if (tegra->domain) {
>  		u64 carveout_start, carveout_end, gem_start, gem_end;
> +		u64 dma_mask = dma_get_mask(&device->dev);
>  		dma_addr_t start, end;
>  		unsigned long order;
>  
> -		start = tegra->domain->geometry.aperture_start;
> -		end = tegra->domain->geometry.aperture_end;
> +		start = tegra->domain->geometry.aperture_start & dma_mask;
> +		end = tegra->domain->geometry.aperture_end & dma_mask;
>  
>  		gem_start = start;
>  		gem_end = end - CARVEOUT_SZ;
> 

For older Tegra's:

Reviewed-by: Dmitry Osipenko <digetx@gmail.com>
Tested-by: Dmitry Osipenko <digetx@gmail.com>
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2019-02-01 14:49 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-01 13:28 [PATCH v3 00/16] drm/tegra: Fix IOVA space on Tegra186 and later Thierry Reding
2019-02-01 13:28 ` [PATCH v3 01/16] gpu: host1x: Set up stream ID table Thierry Reding
2019-02-01 13:28 ` [PATCH v3 02/16] gpu: host1x: Program the channel stream ID Thierry Reding
2019-02-01 13:28 ` [PATCH v3 03/16] gpu: host1x: Introduce support for wide opcodes Thierry Reding
2019-02-01 13:28 ` [PATCH v3 04/16] gpu: host1x: Support 40-bit addressing Thierry Reding
2019-02-01 13:28 ` [PATCH v3 05/16] gpu: host1x: Use direct DMA with IOMMU API usage Thierry Reding
2019-02-01 14:46   ` Dmitry Osipenko
2019-02-01 13:28 ` [PATCH v3 06/16] gpu: host1x: Restrict IOVA space to DMA mask Thierry Reding
2019-02-01 14:47   ` Dmitry Osipenko
2019-02-01 13:28 ` [PATCH v3 07/16] gpu: host1x: Support 40-bit addressing on Tegra186 Thierry Reding
2019-02-01 13:28 ` [PATCH v3 08/16] gpu: host1x: Use correct semantics for HOST1X_CHANNEL_DMAEND Thierry Reding
2019-02-01 13:37   ` Dmitry Osipenko
2019-02-01 13:41     ` Thierry Reding
2019-02-01 13:48       ` Dmitry Osipenko
2019-02-01 14:10         ` Thierry Reding
2019-02-01 14:40           ` Dmitry Osipenko
2019-02-01 14:47   ` Dmitry Osipenko
2019-02-01 13:28 ` [PATCH v3 09/16] gpu: host1x: Optimize CDMA push buffer memory usage Thierry Reding
2019-02-01 14:48   ` Dmitry Osipenko
2019-02-01 13:28 ` [PATCH v3 10/16] drm/tegra: Store parent pointer in Tegra DRM clients Thierry Reding
2019-02-01 14:48   ` Dmitry Osipenko
2019-02-01 13:28 ` [PATCH v3 11/16] drm/tegra: vic: Load firmware on demand Thierry Reding
2019-02-01 13:28 ` [PATCH v3 12/16] drm/tegra: Setup shared IOMMU domain after initialization Thierry Reding
2019-02-01 14:48   ` Dmitry Osipenko
2019-02-01 13:28 ` [PATCH v3 13/16] drm/tegra: Restrict IOVA space to DMA mask Thierry Reding
2019-02-01 14:49   ` Dmitry Osipenko
2019-02-01 13:28 ` [PATCH v3 14/16] drm/tegra: vic: Do not clear driver data Thierry Reding
2019-02-01 13:28 ` [PATCH v3 15/16] drm/tegra: vic: Support stream ID register programming Thierry Reding
2019-02-01 13:28 ` [PATCH v3 16/16] arm64: tegra: Enable SMMU for VIC on Tegra186 Thierry Reding
2019-02-01 14:46 ` [PATCH v3 00/16] drm/tegra: Fix IOVA space on Tegra186 and later Dmitry Osipenko

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.