All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 00/13] drm/tegra: Fix IOVA space on Tegra186 and later
@ 2019-01-24 18:02 Thierry Reding
  2019-01-24 18:02 ` [PATCH v2 01/13] gpu: host1x: Set up stream ID table Thierry Reding
                   ` (14 more replies)
  0 siblings, 15 replies; 27+ messages in thread
From: Thierry Reding @ 2019-01-24 18:02 UTC (permalink / raw)
  To: Thierry Reding; +Cc: linux-tegra, Dmitry Osipenko, dri-devel, Mikko Perttunen

From: Thierry Reding <treding@nvidia.com>

Tegra186 and later are different from earlier generations in that they
use an ARM SMMU rather than the Tegra SMMU. The ARM SMMU driver behaves
slightly differently in that the geometry for IOMMU domains is set only
after a device was attached to it. This is to make sure that the SMMU
instance that the domain belongs to is known, because each instance can
have a different input address space (i.e. geometry).

Work around this by moving all IOVA allocations to a point where the
geometry of the domain is properly initialized.

This second version of the series addresses all review comments and adds
a number of patches that will actually allow host1x to work with an SMMU
enabled on Tegra186. The patches also add programming required to
address the full 40 bits of address space.

This supersedes the following patch:

    https://patchwork.kernel.org/patch/10775579/

Thierry

Thierry Reding (13):
  gpu: host1x: Set up stream ID table
  gpu: host1x: Program the channel stream ID
  gpu: host1x: Support 40-bit addressing
  gpu: host1x: Use direct DMA with IOMMU API usage
  gpu: host1x: Restrict IOVA space to DMA mask
  gpu: host1x: Supports 40-bit addressing on Tegra186
  drm/tegra: Store parent pointer in Tegra DRM clients
  drm/tegra: vic: Load firmware on demand
  drm/tegra: Setup shared IOMMU domain after initialization
  drm/tegra: Restrict IOVA space to DMA mask
  drm/tegra: vic: Do not clear driver data
  drm/tegra: vic: Support stream ID register programming
  arm64: tegra: Enable SMMU for VIC on Tegra186

 arch/arm64/boot/dts/nvidia/tegra186.dtsi    |  1 +
 drivers/gpu/drm/tegra/drm.c                 | 57 +++++++++-------
 drivers/gpu/drm/tegra/drm.h                 |  1 +
 drivers/gpu/drm/tegra/vic.c                 | 75 ++++++++++++++++-----
 drivers/gpu/drm/tegra/vic.h                 |  9 +++
 drivers/gpu/host1x/cdma.c                   | 11 ++-
 drivers/gpu/host1x/dev.c                    | 49 ++++++++++++--
 drivers/gpu/host1x/dev.h                    |  8 +++
 drivers/gpu/host1x/hw/cdma_hw.c             | 13 +++-
 drivers/gpu/host1x/hw/channel_hw.c          | 40 +++++++++--
 drivers/gpu/host1x/hw/host1x06_hardware.h   |  6 ++
 drivers/gpu/host1x/hw/host1x07_hardware.h   |  6 ++
 drivers/gpu/host1x/hw/hw_host1x06_channel.h | 11 +++
 drivers/gpu/host1x/hw/hw_host1x07_channel.h | 11 +++
 14 files changed, 241 insertions(+), 57 deletions(-)
 create mode 100644 drivers/gpu/host1x/hw/hw_host1x06_channel.h
 create mode 100644 drivers/gpu/host1x/hw/hw_host1x07_channel.h

-- 
2.19.1

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH v2 01/13] gpu: host1x: Set up stream ID table
  2019-01-24 18:02 [PATCH v2 00/13] drm/tegra: Fix IOVA space on Tegra186 and later Thierry Reding
@ 2019-01-24 18:02 ` Thierry Reding
  2019-01-24 18:02 ` [PATCH v2 02/13] gpu: host1x: Program the channel stream ID Thierry Reding
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 27+ messages in thread
From: Thierry Reding @ 2019-01-24 18:02 UTC (permalink / raw)
  To: Thierry Reding; +Cc: linux-tegra, Dmitry Osipenko, dri-devel, Mikko Perttunen

From: Thierry Reding <treding@nvidia.com>

In order to enable the MMIO path stream ID protection provided by the
incarnation of host1x found in Tegra186 and later, the host1x must be
provided with the list of stream ID register offsets for each of its
clients. Some clients (such as VIC) have multiple stream ID registers
that are assumed to be contiguous. The host1x is programmed with the
base offset and a limit which provide the range of registers that the
host1x needs to monitor for writes.

Signed-off-by: Thierry Reding <treding@nvidia.com>
---
 drivers/gpu/host1x/dev.c | 38 ++++++++++++++++++++++++++++++++++++++
 drivers/gpu/host1x/dev.h |  8 ++++++++
 2 files changed, 46 insertions(+)

diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c
index 419d8929a98f..4c044ee54fe6 100644
--- a/drivers/gpu/host1x/dev.c
+++ b/drivers/gpu/host1x/dev.c
@@ -120,6 +120,15 @@ static const struct host1x_info host1x05_info = {
 	.dma_mask = DMA_BIT_MASK(34),
 };
 
+static const struct host1x_sid_entry tegra186_sid_table[] = {
+	{
+		/* VIC */
+		.base = 0x1af0,
+		.offset = 0x30,
+		.limit = 0x34
+	},
+};
+
 static const struct host1x_info host1x06_info = {
 	.nb_channels = 63,
 	.nb_pts = 576,
@@ -129,6 +138,17 @@ static const struct host1x_info host1x06_info = {
 	.sync_offset = 0x0,
 	.dma_mask = DMA_BIT_MASK(34),
 	.has_hypervisor = true,
+	.num_sid_entries = ARRAY_SIZE(tegra186_sid_table),
+	.sid_table = tegra186_sid_table,
+};
+
+static const struct host1x_sid_entry tegra194_sid_table[] = {
+	{
+		/* VIC */
+		.base = 0x1af0,
+		.offset = 0x30,
+		.limit = 0x34
+	},
 };
 
 static const struct host1x_info host1x07_info = {
@@ -140,6 +160,8 @@ static const struct host1x_info host1x07_info = {
 	.sync_offset = 0x0,
 	.dma_mask = DMA_BIT_MASK(40),
 	.has_hypervisor = true,
+	.num_sid_entries = ARRAY_SIZE(tegra194_sid_table),
+	.sid_table = tegra194_sid_table,
 };
 
 static const struct of_device_id host1x_of_match[] = {
@@ -154,6 +176,19 @@ static const struct of_device_id host1x_of_match[] = {
 };
 MODULE_DEVICE_TABLE(of, host1x_of_match);
 
+static void host1x_setup_sid_table(struct host1x *host)
+{
+	const struct host1x_info *info = host->info;
+	unsigned int i;
+
+	for (i = 0; i < info->num_sid_entries; i++) {
+		const struct host1x_sid_entry *entry = &info->sid_table[i];
+
+		host1x_hypervisor_writel(host, entry->offset, entry->base);
+		host1x_hypervisor_writel(host, entry->limit, entry->base + 4);
+	}
+}
+
 static int host1x_probe(struct platform_device *pdev)
 {
 	struct host1x *host;
@@ -316,6 +351,9 @@ static int host1x_probe(struct platform_device *pdev)
 
 	host1x_debug_init(host);
 
+	if (host->info->has_hypervisor)
+		host1x_setup_sid_table(host);
+
 	err = host1x_register(host);
 	if (err < 0)
 		goto fail_deinit_intr;
diff --git a/drivers/gpu/host1x/dev.h b/drivers/gpu/host1x/dev.h
index 36f44ffebe73..05216a7e4830 100644
--- a/drivers/gpu/host1x/dev.h
+++ b/drivers/gpu/host1x/dev.h
@@ -94,6 +94,12 @@ struct host1x_intr_ops {
 	int (*free_syncpt_irq)(struct host1x *host);
 };
 
+struct host1x_sid_entry {
+	unsigned int base;
+	unsigned int offset;
+	unsigned int limit;
+};
+
 struct host1x_info {
 	unsigned int nb_channels; /* host1x: number of channels supported */
 	unsigned int nb_pts; /* host1x: number of syncpoints supported */
@@ -103,6 +109,8 @@ struct host1x_info {
 	unsigned int sync_offset; /* offset of syncpoint registers */
 	u64 dma_mask; /* mask of addressable memory */
 	bool has_hypervisor; /* has hypervisor registers */
+	unsigned int num_sid_entries;
+	const struct host1x_sid_entry *sid_table;
 };
 
 struct host1x {
-- 
2.19.1

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 02/13] gpu: host1x: Program the channel stream ID
  2019-01-24 18:02 [PATCH v2 00/13] drm/tegra: Fix IOVA space on Tegra186 and later Thierry Reding
  2019-01-24 18:02 ` [PATCH v2 01/13] gpu: host1x: Set up stream ID table Thierry Reding
@ 2019-01-24 18:02 ` Thierry Reding
  2019-01-24 18:02 ` [PATCH v2 03/13] gpu: host1x: Support 40-bit addressing Thierry Reding
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 27+ messages in thread
From: Thierry Reding @ 2019-01-24 18:02 UTC (permalink / raw)
  To: Thierry Reding; +Cc: linux-tegra, Dmitry Osipenko, dri-devel, Mikko Perttunen

From: Thierry Reding <treding@nvidia.com>

When processing command streams, make sure the host1x's stream ID is
programmed for the channel so that addresses are properly translated
through the SMMU.

Signed-off-by: Thierry Reding <treding@nvidia.com>
---
 drivers/gpu/host1x/hw/channel_hw.c          | 12 ++++++++++++
 drivers/gpu/host1x/hw/host1x06_hardware.h   |  1 +
 drivers/gpu/host1x/hw/host1x07_hardware.h   |  1 +
 drivers/gpu/host1x/hw/hw_host1x06_channel.h | 11 +++++++++++
 drivers/gpu/host1x/hw/hw_host1x07_channel.h | 11 +++++++++++
 5 files changed, 36 insertions(+)
 create mode 100644 drivers/gpu/host1x/hw/hw_host1x06_channel.h
 create mode 100644 drivers/gpu/host1x/hw/hw_host1x07_channel.h

diff --git a/drivers/gpu/host1x/hw/channel_hw.c b/drivers/gpu/host1x/hw/channel_hw.c
index 95ea81172a83..ff137fe72d34 100644
--- a/drivers/gpu/host1x/hw/channel_hw.c
+++ b/drivers/gpu/host1x/hw/channel_hw.c
@@ -89,6 +89,16 @@ static inline void synchronize_syncpt_base(struct host1x_job *job)
 			 HOST1X_UCLASS_LOAD_SYNCPT_BASE_VALUE_F(value));
 }
 
+static void host1x_channel_set_streamid(struct host1x_channel *channel)
+{
+#if HOST1X_HW >= 6
+	struct iommu_fwspec *spec = dev_iommu_fwspec_get(channel->dev->parent);
+	u32 sid = spec->ids[0] & 0xffff;
+
+	host1x_ch_writel(channel, sid, HOST1X_CHANNEL_SMMU_STREAMID);
+#endif
+}
+
 static int channel_submit(struct host1x_job *job)
 {
 	struct host1x_channel *ch = job->channel;
@@ -120,6 +130,8 @@ static int channel_submit(struct host1x_job *job)
 		goto error;
 	}
 
+	host1x_channel_set_streamid(channel);
+
 	/* begin a CDMA submit */
 	err = host1x_cdma_begin(&ch->cdma, job);
 	if (err) {
diff --git a/drivers/gpu/host1x/hw/host1x06_hardware.h b/drivers/gpu/host1x/hw/host1x06_hardware.h
index 3039c92ea605..eab753b91f24 100644
--- a/drivers/gpu/host1x/hw/host1x06_hardware.h
+++ b/drivers/gpu/host1x/hw/host1x06_hardware.h
@@ -22,6 +22,7 @@
 #include <linux/types.h>
 #include <linux/bitops.h>
 
+#include "hw_host1x06_channel.h"
 #include "hw_host1x06_uclass.h"
 #include "hw_host1x06_vm.h"
 #include "hw_host1x06_hypervisor.h"
diff --git a/drivers/gpu/host1x/hw/host1x07_hardware.h b/drivers/gpu/host1x/hw/host1x07_hardware.h
index 1353e7ab71dd..a79f57dc87bb 100644
--- a/drivers/gpu/host1x/hw/host1x07_hardware.h
+++ b/drivers/gpu/host1x/hw/host1x07_hardware.h
@@ -22,6 +22,7 @@
 #include <linux/types.h>
 #include <linux/bitops.h>
 
+#include "hw_host1x07_channel.h"
 #include "hw_host1x07_uclass.h"
 #include "hw_host1x07_vm.h"
 #include "hw_host1x07_hypervisor.h"
diff --git a/drivers/gpu/host1x/hw/hw_host1x06_channel.h b/drivers/gpu/host1x/hw/hw_host1x06_channel.h
new file mode 100644
index 000000000000..18ae1c57bbea
--- /dev/null
+++ b/drivers/gpu/host1x/hw/hw_host1x06_channel.h
@@ -0,0 +1,11 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019 NVIDIA Corporation.
+ */
+
+#ifndef HOST1X_HW_HOST1X06_CHANNEL_H
+#define HOST1X_HW_HOST1X06_CHANNEL_H
+
+#define HOST1X_CHANNEL_SMMU_STREAMID 0x084
+
+#endif
diff --git a/drivers/gpu/host1x/hw/hw_host1x07_channel.h b/drivers/gpu/host1x/hw/hw_host1x07_channel.h
new file mode 100644
index 000000000000..96fa72bbd7ab
--- /dev/null
+++ b/drivers/gpu/host1x/hw/hw_host1x07_channel.h
@@ -0,0 +1,11 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019 NVIDIA Corporation.
+ */
+
+#ifndef HOST1X_HW_HOST1X07_CHANNEL_H
+#define HOST1X_HW_HOST1X07_CHANNEL_H
+
+#define HOST1X_CHANNEL_SMMU_STREAMID 0x084
+
+#endif
-- 
2.19.1

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 03/13] gpu: host1x: Support 40-bit addressing
  2019-01-24 18:02 [PATCH v2 00/13] drm/tegra: Fix IOVA space on Tegra186 and later Thierry Reding
  2019-01-24 18:02 ` [PATCH v2 01/13] gpu: host1x: Set up stream ID table Thierry Reding
  2019-01-24 18:02 ` [PATCH v2 02/13] gpu: host1x: Program the channel stream ID Thierry Reding
@ 2019-01-24 18:02 ` Thierry Reding
  2019-01-25  9:13   ` Mikko Perttunen
  2019-01-24 18:02 ` [PATCH v2 04/13] gpu: host1x: Use direct DMA with IOMMU API usage Thierry Reding
                   ` (11 subsequent siblings)
  14 siblings, 1 reply; 27+ messages in thread
From: Thierry Reding @ 2019-01-24 18:02 UTC (permalink / raw)
  To: Thierry Reding; +Cc: linux-tegra, Dmitry Osipenko, dri-devel, Mikko Perttunen

From: Thierry Reding <treding@nvidia.com>

Tegra186 and later support 40 bits of address space. Additional
registers need to be programmed to store the full 40 bits of push
buffer addresses.

Since command stream gathers can also reside in buffers in a 40-bit
address space, a new variant of the GATHER opcode is also introduced.
It takes two parameters: the first parameter contains the lower 32
bits of the address and the second parameter contains bits 32 to 39.

Signed-off-by: Thierry Reding <treding@nvidia.com>
---
 drivers/gpu/host1x/hw/cdma_hw.c           | 13 ++++++++---
 drivers/gpu/host1x/hw/channel_hw.c        | 28 +++++++++++++++++++----
 drivers/gpu/host1x/hw/host1x06_hardware.h |  5 ++++
 drivers/gpu/host1x/hw/host1x07_hardware.h |  5 ++++
 4 files changed, 44 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/host1x/hw/cdma_hw.c b/drivers/gpu/host1x/hw/cdma_hw.c
index ce320534cbed..6d2b7af2af89 100644
--- a/drivers/gpu/host1x/hw/cdma_hw.c
+++ b/drivers/gpu/host1x/hw/cdma_hw.c
@@ -68,20 +68,27 @@ static void cdma_timeout_cpu_incr(struct host1x_cdma *cdma, u32 getptr,
 static void cdma_start(struct host1x_cdma *cdma)
 {
 	struct host1x_channel *ch = cdma_to_channel(cdma);
+	u64 start, end;
 
 	if (cdma->running)
 		return;
 
 	cdma->last_pos = cdma->push_buffer.pos;
+	start = cdma->push_buffer.dma;
+	end = cdma->push_buffer.dma + cdma->push_buffer.size + 4;
 
 	host1x_ch_writel(ch, HOST1X_CHANNEL_DMACTRL_DMASTOP,
 			 HOST1X_CHANNEL_DMACTRL);
 
 	/* set base, put and end pointer */
-	host1x_ch_writel(ch, cdma->push_buffer.dma, HOST1X_CHANNEL_DMASTART);
+	host1x_ch_writel(ch, lower_32_bits(start), HOST1X_CHANNEL_DMASTART);
 	host1x_ch_writel(ch, cdma->push_buffer.pos, HOST1X_CHANNEL_DMAPUT);
-	host1x_ch_writel(ch, cdma->push_buffer.dma + cdma->push_buffer.size + 4,
-			 HOST1X_CHANNEL_DMAEND);
+	host1x_ch_writel(ch, lower_32_bits(end), HOST1X_CHANNEL_DMAEND);
+
+#if HOST1X_HW >= 6
+	host1x_ch_writel(ch, upper_32_bits(start), HOST1X_CHANNEL_DMASTART_HI);
+	host1x_ch_writel(ch, upper_32_bits(end), HOST1X_CHANNEL_DMAEND_HI);
+#endif
 
 	/* reset GET */
 	host1x_ch_writel(ch, HOST1X_CHANNEL_DMACTRL_DMASTOP |
diff --git a/drivers/gpu/host1x/hw/channel_hw.c b/drivers/gpu/host1x/hw/channel_hw.c
index ff137fe72d34..78fb49539e8c 100644
--- a/drivers/gpu/host1x/hw/channel_hw.c
+++ b/drivers/gpu/host1x/hw/channel_hw.c
@@ -64,11 +64,31 @@ static void submit_gathers(struct host1x_job *job)
 
 	for (i = 0; i < job->num_gathers; i++) {
 		struct host1x_job_gather *g = &job->gathers[i];
-		u32 op1 = host1x_opcode_gather(g->words);
-		u32 op2 = g->base + g->offset;
+		dma_addr_t addr = g->base + g->offset;
+		u32 op2, op3;
 
-		trace_write_gather(cdma, g->bo, g->offset, op1 & 0xffff);
-		host1x_cdma_push(cdma, op1, op2);
+		op2 = lower_32_bits(addr);
+		op3 = upper_32_bits(addr);
+
+		trace_write_gather(cdma, g->bo, g->offset, g->words);
+
+		if (op3 != 0) {
+#if HOST1X_HW >= 6
+			u32 op1 = host1x_opcode_gather_wide(g->words);
+			u32 op4 = HOST1X_OPCODE_NOP;
+
+			host1x_cdma_push(cdma, op1, op2);
+			host1x_cdma_push(cdma, op3, op4);
+#else
+			dev_err(dev, "invalid gather for push buffer %pad\n",
+				&addr);
+			continue;
+#endif
+		} else {
+			u32 op1 = host1x_opcode_gather(g->words);
+
+			host1x_cdma_push(cdma, op1, op2);
+		}
 	}
 }
 
diff --git a/drivers/gpu/host1x/hw/host1x06_hardware.h b/drivers/gpu/host1x/hw/host1x06_hardware.h
index eab753b91f24..dd37b10c8d04 100644
--- a/drivers/gpu/host1x/hw/host1x06_hardware.h
+++ b/drivers/gpu/host1x/hw/host1x06_hardware.h
@@ -138,6 +138,11 @@ static inline u32 host1x_opcode_gather_incr(unsigned offset, unsigned count)
 	return (6 << 28) | (offset << 16) | BIT(15) | BIT(14) | count;
 }
 
+static inline u32 host1x_opcode_gather_wide(unsigned count)
+{
+	return (12 << 28) | count;
+}
+
 #define HOST1X_OPCODE_NOP host1x_opcode_nonincr(0, 0)
 
 #endif
diff --git a/drivers/gpu/host1x/hw/host1x07_hardware.h b/drivers/gpu/host1x/hw/host1x07_hardware.h
index a79f57dc87bb..9f6da4ee5443 100644
--- a/drivers/gpu/host1x/hw/host1x07_hardware.h
+++ b/drivers/gpu/host1x/hw/host1x07_hardware.h
@@ -138,6 +138,11 @@ static inline u32 host1x_opcode_gather_incr(unsigned offset, unsigned count)
 	return (6 << 28) | (offset << 16) | BIT(15) | BIT(14) | count;
 }
 
+static inline u32 host1x_opcode_gather_wide(unsigned count)
+{
+	return (12 << 28) | count;
+}
+
 #define HOST1X_OPCODE_NOP host1x_opcode_nonincr(0, 0)
 
 #endif
-- 
2.19.1

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 04/13] gpu: host1x: Use direct DMA with IOMMU API usage
  2019-01-24 18:02 [PATCH v2 00/13] drm/tegra: Fix IOVA space on Tegra186 and later Thierry Reding
                   ` (2 preceding siblings ...)
  2019-01-24 18:02 ` [PATCH v2 03/13] gpu: host1x: Support 40-bit addressing Thierry Reding
@ 2019-01-24 18:02 ` Thierry Reding
  2019-01-28 14:30   ` Dmitry Osipenko
  2019-01-24 18:02 ` [PATCH v2 05/13] gpu: host1x: Restrict IOVA space to DMA mask Thierry Reding
                   ` (10 subsequent siblings)
  14 siblings, 1 reply; 27+ messages in thread
From: Thierry Reding @ 2019-01-24 18:02 UTC (permalink / raw)
  To: Thierry Reding; +Cc: linux-tegra, Dmitry Osipenko, dri-devel, Mikko Perttunen

From: Thierry Reding <treding@nvidia.com>

If we use the IOMMU API directly to map buffers into host1x' IOVA space,
we must make sure that the DMA API doesn't already set up a mapping, or
else translation will fail.

The direct DMA API allows us to allocate memory that will not be mapped
through an IOMMU automatically.

Signed-off-by: Thierry Reding <treding@nvidia.com>
---
 drivers/gpu/host1x/cdma.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/host1x/cdma.c b/drivers/gpu/host1x/cdma.c
index 91df51e631b2..ccde23a0109c 100644
--- a/drivers/gpu/host1x/cdma.c
+++ b/drivers/gpu/host1x/cdma.c
@@ -19,6 +19,7 @@
 
 #include <asm/cacheflush.h>
 #include <linux/device.h>
+#include <linux/dma-direct.h>
 #include <linux/dma-mapping.h>
 #include <linux/host1x.h>
 #include <linux/interrupt.h>
@@ -70,6 +71,7 @@ static void host1x_pushbuffer_destroy(struct push_buffer *pb)
  */
 static int host1x_pushbuffer_init(struct push_buffer *pb)
 {
+	unsigned long attrs = DMA_ATTR_WRITE_COMBINE;
 	struct host1x_cdma *cdma = pb_to_cdma(pb);
 	struct host1x *host1x = cdma_to_host1x(cdma);
 	struct iova *alloc;
@@ -91,8 +93,8 @@ static int host1x_pushbuffer_init(struct push_buffer *pb)
 
 		size = iova_align(&host1x->iova, size);
 
-		pb->mapped = dma_alloc_wc(host1x->dev, size, &pb->phys,
-					  GFP_KERNEL);
+		pb->mapped = dma_direct_alloc(host1x->dev, size, &pb->phys,
+					      GFP_KERNEL, attrs);
 		if (!pb->mapped)
 			return -ENOMEM;
 
@@ -127,7 +129,10 @@ static int host1x_pushbuffer_init(struct push_buffer *pb)
 iommu_free_iova:
 	__free_iova(&host1x->iova, alloc);
 iommu_free_mem:
-	dma_free_wc(host1x->dev, size, pb->mapped, pb->phys);
+	if (host1x->domain)
+		dma_direct_free(host1x->dev, size, pb->mapped, pb->phys, attrs);
+	else
+		dma_free_wc(host1x->dev, size, pb->mapped, pb->phys);
 
 	return err;
 }
-- 
2.19.1

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 05/13] gpu: host1x: Restrict IOVA space to DMA mask
  2019-01-24 18:02 [PATCH v2 00/13] drm/tegra: Fix IOVA space on Tegra186 and later Thierry Reding
                   ` (3 preceding siblings ...)
  2019-01-24 18:02 ` [PATCH v2 04/13] gpu: host1x: Use direct DMA with IOMMU API usage Thierry Reding
@ 2019-01-24 18:02 ` Thierry Reding
  2019-01-24 18:02 ` [PATCH v2 06/13] gpu: host1x: Support 40-bit addressing on Tegra186 Thierry Reding
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 27+ messages in thread
From: Thierry Reding @ 2019-01-24 18:02 UTC (permalink / raw)
  To: Thierry Reding; +Cc: linux-tegra, Dmitry Osipenko, dri-devel, Mikko Perttunen

From: Thierry Reding <treding@nvidia.com>

On Tegra186 and later, the ARM SMMU provides an input address space that
is 48 bits wide. However, memory clients can only address up to 40 bits.
If the geometry is used as-is, allocations of IOVA space can end up in a
region that is not addressable by the memory clients.

To fix this, restrict the IOVA space to the DMA mask of the host1x
device.

Signed-off-by: Thierry Reding <treding@nvidia.com>
---
 drivers/gpu/host1x/dev.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c
index 4c044ee54fe6..544b67f2b3ff 100644
--- a/drivers/gpu/host1x/dev.c
+++ b/drivers/gpu/host1x/dev.c
@@ -283,6 +283,8 @@ static int host1x_probe(struct platform_device *pdev)
 	host->group = iommu_group_get(&pdev->dev);
 	if (host->group) {
 		struct iommu_domain_geometry *geometry;
+		u64 mask = dma_get_mask(host->dev);
+		dma_addr_t start, end;
 		unsigned long order;
 
 		err = iova_cache_get();
@@ -310,11 +312,12 @@ static int host1x_probe(struct platform_device *pdev)
 		}
 
 		geometry = &host->domain->geometry;
+		start = geometry->aperture_start & mask;
+		end = geometry->aperture_end & mask;
 
 		order = __ffs(host->domain->pgsize_bitmap);
-		init_iova_domain(&host->iova, 1UL << order,
-				 geometry->aperture_start >> order);
-		host->iova_end = geometry->aperture_end;
+		init_iova_domain(&host->iova, 1UL << order, start >> order);
+		host->iova_end = end;
 	}
 
 skip_iommu:
-- 
2.19.1

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 06/13] gpu: host1x: Support 40-bit addressing on Tegra186
  2019-01-24 18:02 [PATCH v2 00/13] drm/tegra: Fix IOVA space on Tegra186 and later Thierry Reding
                   ` (4 preceding siblings ...)
  2019-01-24 18:02 ` [PATCH v2 05/13] gpu: host1x: Restrict IOVA space to DMA mask Thierry Reding
@ 2019-01-24 18:02 ` Thierry Reding
  2019-01-24 18:02 ` [PATCH v2 07/13] drm/tegra: Store parent pointer in Tegra DRM clients Thierry Reding
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 27+ messages in thread
From: Thierry Reding @ 2019-01-24 18:02 UTC (permalink / raw)
  To: Thierry Reding; +Cc: linux-tegra, Dmitry Osipenko, dri-devel, Mikko Perttunen

From: Thierry Reding <treding@nvidia.com>

The host1x and clients instantiated on Tegra186 support addressing 40
bits of memory.

Signed-off-by: Thierry Reding <treding@nvidia.com>
---
 drivers/gpu/host1x/dev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c
index 544b67f2b3ff..ee3c7b81a29d 100644
--- a/drivers/gpu/host1x/dev.c
+++ b/drivers/gpu/host1x/dev.c
@@ -136,7 +136,7 @@ static const struct host1x_info host1x06_info = {
 	.nb_bases = 16,
 	.init = host1x06_init,
 	.sync_offset = 0x0,
-	.dma_mask = DMA_BIT_MASK(34),
+	.dma_mask = DMA_BIT_MASK(40),
 	.has_hypervisor = true,
 	.num_sid_entries = ARRAY_SIZE(tegra186_sid_table),
 	.sid_table = tegra186_sid_table,
-- 
2.19.1

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 07/13] drm/tegra: Store parent pointer in Tegra DRM clients
  2019-01-24 18:02 [PATCH v2 00/13] drm/tegra: Fix IOVA space on Tegra186 and later Thierry Reding
                   ` (5 preceding siblings ...)
  2019-01-24 18:02 ` [PATCH v2 06/13] gpu: host1x: Support 40-bit addressing on Tegra186 Thierry Reding
@ 2019-01-24 18:02 ` Thierry Reding
  2019-01-28 14:10   ` Dmitry Osipenko
  2019-01-24 18:02 ` [PATCH v2 08/13] drm/tegra: vic: Load firmware on demand Thierry Reding
                   ` (7 subsequent siblings)
  14 siblings, 1 reply; 27+ messages in thread
From: Thierry Reding @ 2019-01-24 18:02 UTC (permalink / raw)
  To: Thierry Reding; +Cc: linux-tegra, Dmitry Osipenko, dri-devel, Mikko Perttunen

From: Thierry Reding <treding@nvidia.com>

Tegra DRM clients need access to their parent, so store a pointer to it
upon registration. It's technically possible to get at this by going via
the host1x client's parent and getting the driver data, but that's quite
complicated and not very transparent. It's much more straightforward and
natural to let the children know about their parent.

Signed-off-by: Thierry Reding <treding@nvidia.com>
---
Changes in v2:
- clarify rationale in commit message

 drivers/gpu/drm/tegra/drm.c | 2 ++
 drivers/gpu/drm/tegra/drm.h | 1 +
 2 files changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
index 4b70ce664c41..61dcbd218ffc 100644
--- a/drivers/gpu/drm/tegra/drm.c
+++ b/drivers/gpu/drm/tegra/drm.c
@@ -1041,6 +1041,7 @@ int tegra_drm_register_client(struct tegra_drm *tegra,
 {
 	mutex_lock(&tegra->clients_lock);
 	list_add_tail(&client->list, &tegra->clients);
+	client->drm = tegra;
 	mutex_unlock(&tegra->clients_lock);
 
 	return 0;
@@ -1051,6 +1052,7 @@ int tegra_drm_unregister_client(struct tegra_drm *tegra,
 {
 	mutex_lock(&tegra->clients_lock);
 	list_del_init(&client->list);
+	client->drm = NULL;
 	mutex_unlock(&tegra->clients_lock);
 
 	return 0;
diff --git a/drivers/gpu/drm/tegra/drm.h b/drivers/gpu/drm/tegra/drm.h
index f1763b4d5b5f..2c809755bca7 100644
--- a/drivers/gpu/drm/tegra/drm.h
+++ b/drivers/gpu/drm/tegra/drm.h
@@ -88,6 +88,7 @@ int tegra_drm_submit(struct tegra_drm_context *context,
 struct tegra_drm_client {
 	struct host1x_client base;
 	struct list_head list;
+	struct tegra_drm *drm;
 
 	unsigned int version;
 	const struct tegra_drm_client_ops *ops;
-- 
2.19.1

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 08/13] drm/tegra: vic: Load firmware on demand
  2019-01-24 18:02 [PATCH v2 00/13] drm/tegra: Fix IOVA space on Tegra186 and later Thierry Reding
                   ` (6 preceding siblings ...)
  2019-01-24 18:02 ` [PATCH v2 07/13] drm/tegra: Store parent pointer in Tegra DRM clients Thierry Reding
@ 2019-01-24 18:02 ` Thierry Reding
  2019-01-28 14:12   ` Dmitry Osipenko
  2019-01-24 18:02 ` [PATCH v2 09/13] drm/tegra: Setup shared IOMMU domain after initialization Thierry Reding
                   ` (6 subsequent siblings)
  14 siblings, 1 reply; 27+ messages in thread
From: Thierry Reding @ 2019-01-24 18:02 UTC (permalink / raw)
  To: Thierry Reding; +Cc: linux-tegra, Dmitry Osipenko, dri-devel, Mikko Perttunen

From: Thierry Reding <treding@nvidia.com>

Loading the firmware requires an allocation of IOVA space to make sure
that the VIC's Falcon microcontroller can read the firmware if address
translation via the SMMU is enabled.

However, the allocation currently happens at a time where the geometry
of an IOMMU domain may not have been initialized yet. This happens for
example on Tegra186 and later where an ARM SMMU is used. Domains which
are created by the ARM SMMU driver postpone the geometry setup until a
device is attached to the domain. This is because IOMMU domains aren't
attached to a specific IOMMU instance at allocation time and hence the
input address space, which defines the geometry, is not known yet.

Work around this by postponing the firmware load until it is needed at
the time where a channel is opened to the VIC. At this time the shared
IOMMU domain's geometry has been properly initialized.

As a byproduct this allows the Tegra DRM to be created in the absence
of VIC firmware, since the VIC initialization no longer fails if the
firmware can't be found.

Based on an earlier patch by Dmitry Osipenko <digetx@gmail.com>.

Signed-off-by: Thierry Reding <treding@nvidia.com>
---
Changes in v2:
- actually request firmware on demand

 drivers/gpu/drm/tegra/vic.c | 53 +++++++++++++++++++++++++------------
 1 file changed, 36 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/tegra/vic.c b/drivers/gpu/drm/tegra/vic.c
index d47983deb1cf..9cb10d1e923a 100644
--- a/drivers/gpu/drm/tegra/vic.c
+++ b/drivers/gpu/drm/tegra/vic.c
@@ -181,13 +181,6 @@ static int vic_init(struct host1x_client *client)
 		vic->domain = tegra->domain;
 	}
 
-	if (!vic->falcon.data) {
-		vic->falcon.data = tegra;
-		err = falcon_load_firmware(&vic->falcon);
-		if (err < 0)
-			goto detach;
-	}
-
 	vic->channel = host1x_channel_request(client->dev);
 	if (!vic->channel) {
 		err = -ENOMEM;
@@ -246,6 +239,30 @@ static const struct host1x_client_ops vic_client_ops = {
 	.exit = vic_exit,
 };
 
+static int vic_load_firmware(struct vic *vic)
+{
+	int err;
+
+	if (vic->falcon.data)
+		return 0;
+
+	vic->falcon.data = vic->client->drm;
+
+	err = falcon_read_firmware(&vic->falcon, vic->config->firmware);
+	if (err < 0)
+		goto cleanup;
+
+	err = falcon_load_firmware(&vic->falcon);
+	if (err < 0)
+		goto cleanup;
+
+	return 0;
+
+cleanup:
+	vic->falcon.data = NULL;
+	return err;
+}
+
 static int vic_open_channel(struct tegra_drm_client *client,
 			    struct tegra_drm_context *context)
 {
@@ -256,19 +273,25 @@ static int vic_open_channel(struct tegra_drm_client *client,
 	if (err < 0)
 		return err;
 
+	err = vic_load_firmware(vic);
+	if (err < 0)
+		goto rpm_put;
+
 	err = vic_boot(vic);
-	if (err < 0) {
-		pm_runtime_put(vic->dev);
-		return err;
-	}
+	if (err < 0)
+		goto rpm_put;
 
 	context->channel = host1x_channel_get(vic->channel);
 	if (!context->channel) {
-		pm_runtime_put(vic->dev);
-		return -ENOMEM;
+		err = -ENOMEM;
+		goto rpm_put;
 	}
 
 	return 0;
+
+rpm_put:
+	pm_runtime_put(vic->dev);
+	return err;
 }
 
 static void vic_close_channel(struct tegra_drm_context *context)
@@ -372,10 +395,6 @@ static int vic_probe(struct platform_device *pdev)
 	if (err < 0)
 		return err;
 
-	err = falcon_read_firmware(&vic->falcon, vic->config->firmware);
-	if (err < 0)
-		goto exit_falcon;
-
 	platform_set_drvdata(pdev, vic);
 
 	INIT_LIST_HEAD(&vic->client.base.list);
-- 
2.19.1

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 09/13] drm/tegra: Setup shared IOMMU domain after initialization
  2019-01-24 18:02 [PATCH v2 00/13] drm/tegra: Fix IOVA space on Tegra186 and later Thierry Reding
                   ` (7 preceding siblings ...)
  2019-01-24 18:02 ` [PATCH v2 08/13] drm/tegra: vic: Load firmware on demand Thierry Reding
@ 2019-01-24 18:02 ` Thierry Reding
  2019-01-24 18:02 ` [PATCH v2 10/13] drm/tegra: Restrict IOVA space to DMA mask Thierry Reding
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 27+ messages in thread
From: Thierry Reding @ 2019-01-24 18:02 UTC (permalink / raw)
  To: Thierry Reding; +Cc: linux-tegra, Dmitry Osipenko, dri-devel, Mikko Perttunen

From: Thierry Reding <treding@nvidia.com>

Move initialization of the shared IOMMU domain after the host1x device
has been initialized. At this point all the Tegra DRM clients have been
attached to the shared IOMMU domain.

This is important because Tegra186 and later use an ARM SMMU, for which
the driver defers setting up the geometry for a domain until a device is
attached to it. This is to ensure that the domain is properly set up for
a specific ARM SMMU instance, which is unknown at allocation time.

Reviewed-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
---
 drivers/gpu/drm/tegra/drm.c | 54 ++++++++++++++++++++-----------------
 1 file changed, 29 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
index 61dcbd218ffc..271c7a5fc954 100644
--- a/drivers/gpu/drm/tegra/drm.c
+++ b/drivers/gpu/drm/tegra/drm.c
@@ -92,10 +92,6 @@ static int tegra_drm_load(struct drm_device *drm, unsigned long flags)
 		return -ENOMEM;
 
 	if (iommu_present(&platform_bus_type)) {
-		u64 carveout_start, carveout_end, gem_start, gem_end;
-		struct iommu_domain_geometry *geometry;
-		unsigned long order;
-
 		tegra->domain = iommu_domain_alloc(&platform_bus_type);
 		if (!tegra->domain) {
 			err = -ENOMEM;
@@ -105,27 +101,6 @@ static int tegra_drm_load(struct drm_device *drm, unsigned long flags)
 		err = iova_cache_get();
 		if (err < 0)
 			goto domain;
-
-		geometry = &tegra->domain->geometry;
-		gem_start = geometry->aperture_start;
-		gem_end = geometry->aperture_end - CARVEOUT_SZ;
-		carveout_start = gem_end + 1;
-		carveout_end = geometry->aperture_end;
-
-		order = __ffs(tegra->domain->pgsize_bitmap);
-		init_iova_domain(&tegra->carveout.domain, 1UL << order,
-				 carveout_start >> order);
-
-		tegra->carveout.shift = iova_shift(&tegra->carveout.domain);
-		tegra->carveout.limit = carveout_end >> tegra->carveout.shift;
-
-		drm_mm_init(&tegra->mm, gem_start, gem_end - gem_start + 1);
-		mutex_init(&tegra->mm_lock);
-
-		DRM_DEBUG("IOMMU apertures:\n");
-		DRM_DEBUG("  GEM: %#llx-%#llx\n", gem_start, gem_end);
-		DRM_DEBUG("  Carveout: %#llx-%#llx\n", carveout_start,
-			  carveout_end);
 	}
 
 	mutex_init(&tegra->clients_lock);
@@ -159,6 +134,35 @@ static int tegra_drm_load(struct drm_device *drm, unsigned long flags)
 	if (err < 0)
 		goto fbdev;
 
+	if (tegra->domain) {
+		u64 carveout_start, carveout_end, gem_start, gem_end;
+		dma_addr_t start, end;
+		unsigned long order;
+
+		start = tegra->domain->geometry.aperture_start;
+		end = tegra->domain->geometry.aperture_end;
+
+		gem_start = start;
+		gem_end = end - CARVEOUT_SZ;
+		carveout_start = gem_end + 1;
+		carveout_end = end;
+
+		order = __ffs(tegra->domain->pgsize_bitmap);
+		init_iova_domain(&tegra->carveout.domain, 1UL << order,
+				 carveout_start >> order);
+
+		tegra->carveout.shift = iova_shift(&tegra->carveout.domain);
+		tegra->carveout.limit = carveout_end >> tegra->carveout.shift;
+
+		drm_mm_init(&tegra->mm, gem_start, gem_end - gem_start + 1);
+		mutex_init(&tegra->mm_lock);
+
+		DRM_DEBUG("IOMMU apertures:\n");
+		DRM_DEBUG("  GEM: %#llx-%#llx\n", gem_start, gem_end);
+		DRM_DEBUG("  Carveout: %#llx-%#llx\n", carveout_start,
+			  carveout_end);
+	}
+
 	if (tegra->hub) {
 		err = tegra_display_hub_prepare(tegra->hub);
 		if (err < 0)
-- 
2.19.1

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 10/13] drm/tegra: Restrict IOVA space to DMA mask
  2019-01-24 18:02 [PATCH v2 00/13] drm/tegra: Fix IOVA space on Tegra186 and later Thierry Reding
                   ` (8 preceding siblings ...)
  2019-01-24 18:02 ` [PATCH v2 09/13] drm/tegra: Setup shared IOMMU domain after initialization Thierry Reding
@ 2019-01-24 18:02 ` Thierry Reding
  2019-01-24 18:02 ` [PATCH v2 11/13] drm/tegra: vic: Do not clear driver data Thierry Reding
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 27+ messages in thread
From: Thierry Reding @ 2019-01-24 18:02 UTC (permalink / raw)
  To: Thierry Reding; +Cc: linux-tegra, Dmitry Osipenko, dri-devel, Mikko Perttunen

From: Thierry Reding <treding@nvidia.com>

On Tegra186 and later, the ARM SMMU provides an input address space that
is 48 bits wide. However, memory clients can only address up to 40 bits.
If the geometry is used as-is, allocations of IOVA space can end up in a
region that cannot be addressed by the memory clients.

To fix this, restrict the IOVA space to the DMA mask of the host1x
device. Note that, technically, the IOVA space needs to be restricted to
the intersection of the DMA masks for all clients that are attached to
the IOMMU domain. In practice using the DMA mask of the host1x device is
sufficient because all host1x clients share the same DMA mask.

Signed-off-by: Thierry Reding <treding@nvidia.com>
---
 drivers/gpu/drm/tegra/drm.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
index 271c7a5fc954..0c5f1e6a0446 100644
--- a/drivers/gpu/drm/tegra/drm.c
+++ b/drivers/gpu/drm/tegra/drm.c
@@ -136,11 +136,12 @@ static int tegra_drm_load(struct drm_device *drm, unsigned long flags)
 
 	if (tegra->domain) {
 		u64 carveout_start, carveout_end, gem_start, gem_end;
+		u64 dma_mask = dma_get_mask(&device->dev);
 		dma_addr_t start, end;
 		unsigned long order;
 
-		start = tegra->domain->geometry.aperture_start;
-		end = tegra->domain->geometry.aperture_end;
+		start = tegra->domain->geometry.aperture_start & dma_mask;
+		end = tegra->domain->geometry.aperture_end & dma_mask;
 
 		gem_start = start;
 		gem_end = end - CARVEOUT_SZ;
-- 
2.19.1

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 11/13] drm/tegra: vic: Do not clear driver data
  2019-01-24 18:02 [PATCH v2 00/13] drm/tegra: Fix IOVA space on Tegra186 and later Thierry Reding
                   ` (9 preceding siblings ...)
  2019-01-24 18:02 ` [PATCH v2 10/13] drm/tegra: Restrict IOVA space to DMA mask Thierry Reding
@ 2019-01-24 18:02 ` Thierry Reding
  2019-01-24 18:02 ` [PATCH v2 12/13] drm/tegra: vic: Support stream ID register programming Thierry Reding
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 27+ messages in thread
From: Thierry Reding @ 2019-01-24 18:02 UTC (permalink / raw)
  To: Thierry Reding; +Cc: linux-tegra, Dmitry Osipenko, dri-devel, Mikko Perttunen

From: Thierry Reding <treding@nvidia.com>

Upon driver failure, the driver core will take care of clearing the
driver data, so there's no need to do so explicitly in the driver.

Reviewed-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
---
 drivers/gpu/drm/tegra/vic.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/tegra/vic.c b/drivers/gpu/drm/tegra/vic.c
index 9cb10d1e923a..d20fcbf6196d 100644
--- a/drivers/gpu/drm/tegra/vic.c
+++ b/drivers/gpu/drm/tegra/vic.c
@@ -412,7 +412,6 @@ static int vic_probe(struct platform_device *pdev)
 	err = host1x_client_register(&vic->client.base);
 	if (err < 0) {
 		dev_err(dev, "failed to register host1x client: %d\n", err);
-		platform_set_drvdata(pdev, NULL);
 		goto exit_falcon;
 	}
 
-- 
2.19.1

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 12/13] drm/tegra: vic: Support stream ID register programming
  2019-01-24 18:02 [PATCH v2 00/13] drm/tegra: Fix IOVA space on Tegra186 and later Thierry Reding
                   ` (10 preceding siblings ...)
  2019-01-24 18:02 ` [PATCH v2 11/13] drm/tegra: vic: Do not clear driver data Thierry Reding
@ 2019-01-24 18:02 ` Thierry Reding
  2019-01-24 18:02 ` [PATCH v2 13/13] arm64: tegra: Enable SMMU for VIC on Tegra186 Thierry Reding
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 27+ messages in thread
From: Thierry Reding @ 2019-01-24 18:02 UTC (permalink / raw)
  To: Thierry Reding; +Cc: linux-tegra, Dmitry Osipenko, dri-devel, Mikko Perttunen

From: Thierry Reding <treding@nvidia.com>

The version of VIC found in Tegra186 and later incorporates improvements
with regards to context isolation. As part of those improvements, stream
ID registers were added that allow to specify separate stream IDs for
the Falcon microcontroller and the VIC memory interface.

While it is possible to also set the stream ID dynamically at runtime to
allow userspace contexts to be completely separated, this commit doesn't
implement that yet. Instead, the static VIC stream ID is programmed when
the Falcon is booted. This ensures that memory accesses by the Falcon or
the VIC are properly translated via the SMMU.

Signed-off-by: Thierry Reding <treding@nvidia.com>
---
 drivers/gpu/drm/tegra/vic.c | 21 +++++++++++++++++++++
 drivers/gpu/drm/tegra/vic.h |  9 +++++++++
 2 files changed, 30 insertions(+)

diff --git a/drivers/gpu/drm/tegra/vic.c b/drivers/gpu/drm/tegra/vic.c
index d20fcbf6196d..a615be2c02e1 100644
--- a/drivers/gpu/drm/tegra/vic.c
+++ b/drivers/gpu/drm/tegra/vic.c
@@ -26,6 +26,7 @@
 struct vic_config {
 	const char *firmware;
 	unsigned int version;
+	bool supports_sid;
 };
 
 struct vic {
@@ -105,6 +106,22 @@ static int vic_boot(struct vic *vic)
 	if (vic->booted)
 		return 0;
 
+	if (vic->config->supports_sid) {
+		struct iommu_fwspec *spec = dev_iommu_fwspec_get(vic->dev);
+		u32 value;
+
+		value = TRANSCFG_ATT(1, TRANSCFG_SID_FALCON) |
+			TRANSCFG_ATT(0, TRANSCFG_SID_HW);
+		vic_writel(vic, value, VIC_TFBIF_TRANSCFG);
+
+		if (spec->num_ids > 0) {
+			value = spec->ids[0] & 0xffff;
+
+			vic_writel(vic, value, VIC_THI_STREAMID0);
+			vic_writel(vic, value, VIC_THI_STREAMID1);
+		}
+	}
+
 	/* setup clockgating registers */
 	vic_writel(vic, CG_IDLE_CG_DLY_CNT(4) |
 			CG_IDLE_CG_EN |
@@ -314,6 +331,7 @@ static const struct tegra_drm_client_ops vic_ops = {
 static const struct vic_config vic_t124_config = {
 	.firmware = NVIDIA_TEGRA_124_VIC_FIRMWARE,
 	.version = 0x40,
+	.supports_sid = false,
 };
 
 #define NVIDIA_TEGRA_210_VIC_FIRMWARE "nvidia/tegra210/vic04_ucode.bin"
@@ -321,6 +339,7 @@ static const struct vic_config vic_t124_config = {
 static const struct vic_config vic_t210_config = {
 	.firmware = NVIDIA_TEGRA_210_VIC_FIRMWARE,
 	.version = 0x21,
+	.supports_sid = false,
 };
 
 #define NVIDIA_TEGRA_186_VIC_FIRMWARE "nvidia/tegra186/vic04_ucode.bin"
@@ -328,6 +347,7 @@ static const struct vic_config vic_t210_config = {
 static const struct vic_config vic_t186_config = {
 	.firmware = NVIDIA_TEGRA_186_VIC_FIRMWARE,
 	.version = 0x18,
+	.supports_sid = true,
 };
 
 #define NVIDIA_TEGRA_194_VIC_FIRMWARE "nvidia/tegra194/vic.bin"
@@ -335,6 +355,7 @@ static const struct vic_config vic_t186_config = {
 static const struct vic_config vic_t194_config = {
 	.firmware = NVIDIA_TEGRA_194_VIC_FIRMWARE,
 	.version = 0x19,
+	.supports_sid = true,
 };
 
 static const struct of_device_id vic_match[] = {
diff --git a/drivers/gpu/drm/tegra/vic.h b/drivers/gpu/drm/tegra/vic.h
index 21844817a7e1..017584340dd6 100644
--- a/drivers/gpu/drm/tegra/vic.h
+++ b/drivers/gpu/drm/tegra/vic.h
@@ -17,11 +17,20 @@
 
 /* VIC registers */
 
+#define VIC_THI_STREAMID0	0x00000030
+#define VIC_THI_STREAMID1	0x00000034
+
 #define NV_PVIC_MISC_PRI_VIC_CG			0x000016d0
 #define CG_IDLE_CG_DLY_CNT(val)			((val & 0x3f) << 0)
 #define CG_IDLE_CG_EN				(1 << 6)
 #define CG_WAKEUP_DLY_CNT(val)			((val & 0xf) << 16)
 
+#define VIC_TFBIF_TRANSCFG	0x00002044
+#define  TRANSCFG_ATT(i, v)	(((v) & 0x3) << (i * 4))
+#define  TRANSCFG_SID_HW	0
+#define  TRANSCFG_SID_PHY	1
+#define  TRANSCFG_SID_FALCON	2
+
 /* Firmware offsets */
 
 #define VIC_UCODE_FCE_HEADER_OFFSET		(6*4)
-- 
2.19.1

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v2 13/13] arm64: tegra: Enable SMMU for VIC on Tegra186
  2019-01-24 18:02 [PATCH v2 00/13] drm/tegra: Fix IOVA space on Tegra186 and later Thierry Reding
                   ` (11 preceding siblings ...)
  2019-01-24 18:02 ` [PATCH v2 12/13] drm/tegra: vic: Support stream ID register programming Thierry Reding
@ 2019-01-24 18:02 ` Thierry Reding
  2019-01-24 21:38 ` [PATCH v2 00/13] drm/tegra: Fix IOVA space on Tegra186 and later Dmitry Osipenko
  2019-01-24 21:53 ` Dmitry Osipenko
  14 siblings, 0 replies; 27+ messages in thread
From: Thierry Reding @ 2019-01-24 18:02 UTC (permalink / raw)
  To: Thierry Reding; +Cc: linux-tegra, Dmitry Osipenko, dri-devel, Mikko Perttunen

From: Thierry Reding <treding@nvidia.com>

Enable address translation for VIC via the SMMU on Tegra186.

Signed-off-by: Thierry Reding <treding@nvidia.com>
---
 arch/arm64/boot/dts/nvidia/tegra186.dtsi | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/boot/dts/nvidia/tegra186.dtsi b/arch/arm64/boot/dts/nvidia/tegra186.dtsi
index 22815db4a3ed..c5cda13c8195 100644
--- a/arch/arm64/boot/dts/nvidia/tegra186.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra186.dtsi
@@ -799,6 +799,7 @@
 			reset-names = "vic";
 
 			power-domains = <&bpmp TEGRA186_POWER_DOMAIN_VIC>;
+			iommus = <&smmu TEGRA186_SID_VIC>;
 		};
 
 		dsib: dsi@15400000 {
-- 
2.19.1

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 00/13] drm/tegra: Fix IOVA space on Tegra186 and later
  2019-01-24 18:02 [PATCH v2 00/13] drm/tegra: Fix IOVA space on Tegra186 and later Thierry Reding
                   ` (12 preceding siblings ...)
  2019-01-24 18:02 ` [PATCH v2 13/13] arm64: tegra: Enable SMMU for VIC on Tegra186 Thierry Reding
@ 2019-01-24 21:38 ` Dmitry Osipenko
  2019-01-25  9:23   ` Thierry Reding
  2019-01-24 21:53 ` Dmitry Osipenko
  14 siblings, 1 reply; 27+ messages in thread
From: Dmitry Osipenko @ 2019-01-24 21:38 UTC (permalink / raw)
  To: Thierry Reding, Mikko Perttunen; +Cc: linux-tegra, dri-devel

24.01.2019 21:02, Thierry Reding пишет:
> From: Thierry Reding <treding@nvidia.com>
> 
> Tegra186 and later are different from earlier generations in that they
> use an ARM SMMU rather than the Tegra SMMU. The ARM SMMU driver behaves
> slightly differently in that the geometry for IOMMU domains is set only
> after a device was attached to it. This is to make sure that the SMMU
> instance that the domain belongs to is known, because each instance can
> have a different input address space (i.e. geometry).
> 
> Work around this by moving all IOVA allocations to a point where the
> geometry of the domain is properly initialized.
> 
> This second version of the series addresses all review comments and adds
> a number of patches that will actually allow host1x to work with an SMMU
> enabled on Tegra186. The patches also add programming required to
> address the full 40 bits of address space.
> 
> This supersedes the following patch:
> 
>     https://patchwork.kernel.org/patch/10775579/

My understanding is that falcon won't boot because source DMA address of the firmware isn't set up correctly if the address is >32bit. Please correct me if I'm wrong, otherwise that need to be addressed in this series as well for completeness.
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 00/13] drm/tegra: Fix IOVA space on Tegra186 and later
  2019-01-24 18:02 [PATCH v2 00/13] drm/tegra: Fix IOVA space on Tegra186 and later Thierry Reding
                   ` (13 preceding siblings ...)
  2019-01-24 21:38 ` [PATCH v2 00/13] drm/tegra: Fix IOVA space on Tegra186 and later Dmitry Osipenko
@ 2019-01-24 21:53 ` Dmitry Osipenko
  2019-01-25  8:57   ` Mikko Perttunen
  14 siblings, 1 reply; 27+ messages in thread
From: Dmitry Osipenko @ 2019-01-24 21:53 UTC (permalink / raw)
  To: Thierry Reding, Mikko Perttunen; +Cc: linux-tegra, dri-devel

24.01.2019 21:02, Thierry Reding пишет:
> From: Thierry Reding <treding@nvidia.com>
> 
> Tegra186 and later are different from earlier generations in that they
> use an ARM SMMU rather than the Tegra SMMU. The ARM SMMU driver behaves
> slightly differently in that the geometry for IOMMU domains is set only
> after a device was attached to it. This is to make sure that the SMMU
> instance that the domain belongs to is known, because each instance can
> have a different input address space (i.e. geometry).
> 
> Work around this by moving all IOVA allocations to a point where the
> geometry of the domain is properly initialized.
> 
> This second version of the series addresses all review comments and adds
> a number of patches that will actually allow host1x to work with an SMMU
> enabled on Tegra186. The patches also add programming required to
> address the full 40 bits of address space.
> 
> This supersedes the following patch:
> 
>     https://patchwork.kernel.org/patch/10775579/

Secondly, seems there are additional restrictions for the host1x jobs on T186, at least T186 TRM suggests so. In particular looks like each client is hardwired to a specific sync point and to a specific channel. Or maybe there is assumption that upstream kernel could work only in a hypervisor mode or with all protections disable. Could you please clarify?
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 00/13] drm/tegra: Fix IOVA space on Tegra186 and later
  2019-01-24 21:53 ` Dmitry Osipenko
@ 2019-01-25  8:57   ` Mikko Perttunen
  2019-01-25 13:18     ` Dmitry Osipenko
  0 siblings, 1 reply; 27+ messages in thread
From: Mikko Perttunen @ 2019-01-25  8:57 UTC (permalink / raw)
  To: Dmitry Osipenko, Thierry Reding, Mikko Perttunen; +Cc: linux-tegra, dri-devel

On 24.1.2019 23.53, Dmitry Osipenko wrote:
> 24.01.2019 21:02, Thierry Reding пишет:
>> From: Thierry Reding <treding@nvidia.com>
>>
>> Tegra186 and later are different from earlier generations in that they
>> use an ARM SMMU rather than the Tegra SMMU. The ARM SMMU driver behaves
>> slightly differently in that the geometry for IOMMU domains is set only
>> after a device was attached to it. This is to make sure that the SMMU
>> instance that the domain belongs to is known, because each instance can
>> have a different input address space (i.e. geometry).
>>
>> Work around this by moving all IOVA allocations to a point where the
>> geometry of the domain is properly initialized.
>>
>> This second version of the series addresses all review comments and adds
>> a number of patches that will actually allow host1x to work with an SMMU
>> enabled on Tegra186. The patches also add programming required to
>> address the full 40 bits of address space.
>>
>> This supersedes the following patch:
>>
>>      https://patchwork.kernel.org/patch/10775579/
> 
> Secondly, seems there are additional restrictions for the host1x jobs on T186, at least T186 TRM suggests so. In particular looks like each client is hardwired to a specific sync point and to a specific channel. Or maybe there is assumption that upstream kernel could work only in a hypervisor mode or with all protections disable. Could you please clarify?
> 

There are no such syncpoint/channel restrictions. The upstream driver 
indeed currently only supports the case where there is no "hypervisor" 
(that is, server process that allocates host1x resources) running and 
the kernel has access to the Host1x COMMON/"hypervisor" register aperture.

Adding support for the situation where this is not the case shouldn't be 
very difficult, but we currently don't have any upstream platforms where 
the Host1x server exists (it's only there on automotive platforms).

Mikko
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 03/13] gpu: host1x: Support 40-bit addressing
  2019-01-24 18:02 ` [PATCH v2 03/13] gpu: host1x: Support 40-bit addressing Thierry Reding
@ 2019-01-25  9:13   ` Mikko Perttunen
  2019-01-25  9:20     ` Thierry Reding
  0 siblings, 1 reply; 27+ messages in thread
From: Mikko Perttunen @ 2019-01-25  9:13 UTC (permalink / raw)
  To: Thierry Reding; +Cc: linux-tegra, Dmitry Osipenko, dri-devel, Mikko Perttunen

On 24.1.2019 20.02, Thierry Reding wrote:
> From: Thierry Reding <treding@nvidia.com>
> 
> Tegra186 and later support 40 bits of address space. Additional
> registers need to be programmed to store the full 40 bits of push
> buffer addresses.
> 
> Since command stream gathers can also reside in buffers in a 40-bit
> address space, a new variant of the GATHER opcode is also introduced.
> It takes two parameters: the first parameter contains the lower 32
> bits of the address and the second parameter contains bits 32 to 39.
> 
> Signed-off-by: Thierry Reding <treding@nvidia.com>
> ---
>   drivers/gpu/host1x/hw/cdma_hw.c           | 13 ++++++++---
>   drivers/gpu/host1x/hw/channel_hw.c        | 28 +++++++++++++++++++----
>   drivers/gpu/host1x/hw/host1x06_hardware.h |  5 ++++
>   drivers/gpu/host1x/hw/host1x07_hardware.h |  5 ++++
>   4 files changed, 44 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/gpu/host1x/hw/cdma_hw.c b/drivers/gpu/host1x/hw/cdma_hw.c
> index ce320534cbed..6d2b7af2af89 100644
> --- a/drivers/gpu/host1x/hw/cdma_hw.c
> +++ b/drivers/gpu/host1x/hw/cdma_hw.c
> @@ -68,20 +68,27 @@ static void cdma_timeout_cpu_incr(struct host1x_cdma *cdma, u32 getptr,
>   static void cdma_start(struct host1x_cdma *cdma)
>   {
>   	struct host1x_channel *ch = cdma_to_channel(cdma);
> +	u64 start, end;
>   
>   	if (cdma->running)
>   		return;
>   
>   	cdma->last_pos = cdma->push_buffer.pos;
> +	start = cdma->push_buffer.dma;
> +	end = cdma->push_buffer.dma + cdma->push_buffer.size + 4;
>   
>   	host1x_ch_writel(ch, HOST1X_CHANNEL_DMACTRL_DMASTOP,
>   			 HOST1X_CHANNEL_DMACTRL);
>   
>   	/* set base, put and end pointer */
> -	host1x_ch_writel(ch, cdma->push_buffer.dma, HOST1X_CHANNEL_DMASTART);
> +	host1x_ch_writel(ch, lower_32_bits(start), HOST1X_CHANNEL_DMASTART);
>   	host1x_ch_writel(ch, cdma->push_buffer.pos, HOST1X_CHANNEL_DMAPUT);
> -	host1x_ch_writel(ch, cdma->push_buffer.dma + cdma->push_buffer.size + 4,
> -			 HOST1X_CHANNEL_DMAEND);
> +	host1x_ch_writel(ch, lower_32_bits(end), HOST1X_CHANNEL_DMAEND);
> +
> +#if HOST1X_HW >= 6
> +	host1x_ch_writel(ch, upper_32_bits(start), HOST1X_CHANNEL_DMASTART_HI);
> +	host1x_ch_writel(ch, upper_32_bits(end), HOST1X_CHANNEL_DMAEND_HI);
> +#endif
>   
>   	/* reset GET */
>   	host1x_ch_writel(ch, HOST1X_CHANNEL_DMACTRL_DMASTOP |
> diff --git a/drivers/gpu/host1x/hw/channel_hw.c b/drivers/gpu/host1x/hw/channel_hw.c
> index ff137fe72d34..78fb49539e8c 100644
> --- a/drivers/gpu/host1x/hw/channel_hw.c
> +++ b/drivers/gpu/host1x/hw/channel_hw.c
> @@ -64,11 +64,31 @@ static void submit_gathers(struct host1x_job *job)
>   
>   	for (i = 0; i < job->num_gathers; i++) {
>   		struct host1x_job_gather *g = &job->gathers[i];
> -		u32 op1 = host1x_opcode_gather(g->words);
> -		u32 op2 = g->base + g->offset;
> +		dma_addr_t addr = g->base + g->offset;
> +		u32 op2, op3;
>   
> -		trace_write_gather(cdma, g->bo, g->offset, op1 & 0xffff);
> -		host1x_cdma_push(cdma, op1, op2);
> +		op2 = lower_32_bits(addr);
> +		op3 = upper_32_bits(addr);
> +
> +		trace_write_gather(cdma, g->bo, g->offset, g->words);
> +
> +		if (op3 != 0) {
> +#if HOST1X_HW >= 6
> +			u32 op1 = host1x_opcode_gather_wide(g->words);
> +			u32 op4 = HOST1X_OPCODE_NOP;
> +
> +			host1x_cdma_push(cdma, op1, op2);
> +			host1x_cdma_push(cdma, op3, op4);

This will break if the first push goes as the last slot of the 
ringbuffer and the second push goes as the first slot of the ringbuffer.

Otherwise looks good to me.

> +#else
> +			dev_err(dev, "invalid gather for push buffer %pad\n",
> +				&addr);
> +			continue;
> +#endif
> +		} else {
> +			u32 op1 = host1x_opcode_gather(g->words);
> +
> +			host1x_cdma_push(cdma, op1, op2);
> +		}
>   	}
>   }
>   
> diff --git a/drivers/gpu/host1x/hw/host1x06_hardware.h b/drivers/gpu/host1x/hw/host1x06_hardware.h
> index eab753b91f24..dd37b10c8d04 100644
> --- a/drivers/gpu/host1x/hw/host1x06_hardware.h
> +++ b/drivers/gpu/host1x/hw/host1x06_hardware.h
> @@ -138,6 +138,11 @@ static inline u32 host1x_opcode_gather_incr(unsigned offset, unsigned count)
>   	return (6 << 28) | (offset << 16) | BIT(15) | BIT(14) | count;
>   }
>   
> +static inline u32 host1x_opcode_gather_wide(unsigned count)
> +{
> +	return (12 << 28) | count;
> +}
> +
>   #define HOST1X_OPCODE_NOP host1x_opcode_nonincr(0, 0)
>   
>   #endif
> diff --git a/drivers/gpu/host1x/hw/host1x07_hardware.h b/drivers/gpu/host1x/hw/host1x07_hardware.h
> index a79f57dc87bb..9f6da4ee5443 100644
> --- a/drivers/gpu/host1x/hw/host1x07_hardware.h
> +++ b/drivers/gpu/host1x/hw/host1x07_hardware.h
> @@ -138,6 +138,11 @@ static inline u32 host1x_opcode_gather_incr(unsigned offset, unsigned count)
>   	return (6 << 28) | (offset << 16) | BIT(15) | BIT(14) | count;
>   }
>   
> +static inline u32 host1x_opcode_gather_wide(unsigned count)
> +{
> +	return (12 << 28) | count;
> +}
> +
>   #define HOST1X_OPCODE_NOP host1x_opcode_nonincr(0, 0)
>   
>   #endif
> 
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 03/13] gpu: host1x: Support 40-bit addressing
  2019-01-25  9:13   ` Mikko Perttunen
@ 2019-01-25  9:20     ` Thierry Reding
  2019-01-25  9:32       ` Mikko Perttunen
  0 siblings, 1 reply; 27+ messages in thread
From: Thierry Reding @ 2019-01-25  9:20 UTC (permalink / raw)
  To: Mikko Perttunen; +Cc: linux-tegra, Dmitry Osipenko, dri-devel, Mikko Perttunen


[-- Attachment #1.1: Type: text/plain, Size: 3992 bytes --]

On Fri, Jan 25, 2019 at 11:13:41AM +0200, Mikko Perttunen wrote:
> On 24.1.2019 20.02, Thierry Reding wrote:
> > From: Thierry Reding <treding@nvidia.com>
> > 
> > Tegra186 and later support 40 bits of address space. Additional
> > registers need to be programmed to store the full 40 bits of push
> > buffer addresses.
> > 
> > Since command stream gathers can also reside in buffers in a 40-bit
> > address space, a new variant of the GATHER opcode is also introduced.
> > It takes two parameters: the first parameter contains the lower 32
> > bits of the address and the second parameter contains bits 32 to 39.
> > 
> > Signed-off-by: Thierry Reding <treding@nvidia.com>
> > ---
> >   drivers/gpu/host1x/hw/cdma_hw.c           | 13 ++++++++---
> >   drivers/gpu/host1x/hw/channel_hw.c        | 28 +++++++++++++++++++----
> >   drivers/gpu/host1x/hw/host1x06_hardware.h |  5 ++++
> >   drivers/gpu/host1x/hw/host1x07_hardware.h |  5 ++++
> >   4 files changed, 44 insertions(+), 7 deletions(-)
> > 
> > diff --git a/drivers/gpu/host1x/hw/cdma_hw.c b/drivers/gpu/host1x/hw/cdma_hw.c
> > index ce320534cbed..6d2b7af2af89 100644
> > --- a/drivers/gpu/host1x/hw/cdma_hw.c
> > +++ b/drivers/gpu/host1x/hw/cdma_hw.c
> > @@ -68,20 +68,27 @@ static void cdma_timeout_cpu_incr(struct host1x_cdma *cdma, u32 getptr,
> >   static void cdma_start(struct host1x_cdma *cdma)
> >   {
> >   	struct host1x_channel *ch = cdma_to_channel(cdma);
> > +	u64 start, end;
> >   	if (cdma->running)
> >   		return;
> >   	cdma->last_pos = cdma->push_buffer.pos;
> > +	start = cdma->push_buffer.dma;
> > +	end = cdma->push_buffer.dma + cdma->push_buffer.size + 4;
> >   	host1x_ch_writel(ch, HOST1X_CHANNEL_DMACTRL_DMASTOP,
> >   			 HOST1X_CHANNEL_DMACTRL);
> >   	/* set base, put and end pointer */
> > -	host1x_ch_writel(ch, cdma->push_buffer.dma, HOST1X_CHANNEL_DMASTART);
> > +	host1x_ch_writel(ch, lower_32_bits(start), HOST1X_CHANNEL_DMASTART);
> >   	host1x_ch_writel(ch, cdma->push_buffer.pos, HOST1X_CHANNEL_DMAPUT);
> > -	host1x_ch_writel(ch, cdma->push_buffer.dma + cdma->push_buffer.size + 4,
> > -			 HOST1X_CHANNEL_DMAEND);
> > +	host1x_ch_writel(ch, lower_32_bits(end), HOST1X_CHANNEL_DMAEND);
> > +
> > +#if HOST1X_HW >= 6
> > +	host1x_ch_writel(ch, upper_32_bits(start), HOST1X_CHANNEL_DMASTART_HI);
> > +	host1x_ch_writel(ch, upper_32_bits(end), HOST1X_CHANNEL_DMAEND_HI);
> > +#endif
> >   	/* reset GET */
> >   	host1x_ch_writel(ch, HOST1X_CHANNEL_DMACTRL_DMASTOP |
> > diff --git a/drivers/gpu/host1x/hw/channel_hw.c b/drivers/gpu/host1x/hw/channel_hw.c
> > index ff137fe72d34..78fb49539e8c 100644
> > --- a/drivers/gpu/host1x/hw/channel_hw.c
> > +++ b/drivers/gpu/host1x/hw/channel_hw.c
> > @@ -64,11 +64,31 @@ static void submit_gathers(struct host1x_job *job)
> >   	for (i = 0; i < job->num_gathers; i++) {
> >   		struct host1x_job_gather *g = &job->gathers[i];
> > -		u32 op1 = host1x_opcode_gather(g->words);
> > -		u32 op2 = g->base + g->offset;
> > +		dma_addr_t addr = g->base + g->offset;
> > +		u32 op2, op3;
> > -		trace_write_gather(cdma, g->bo, g->offset, op1 & 0xffff);
> > -		host1x_cdma_push(cdma, op1, op2);
> > +		op2 = lower_32_bits(addr);
> > +		op3 = upper_32_bits(addr);
> > +
> > +		trace_write_gather(cdma, g->bo, g->offset, g->words);
> > +
> > +		if (op3 != 0) {
> > +#if HOST1X_HW >= 6
> > +			u32 op1 = host1x_opcode_gather_wide(g->words);
> > +			u32 op4 = HOST1X_OPCODE_NOP;
> > +
> > +			host1x_cdma_push(cdma, op1, op2);
> > +			host1x_cdma_push(cdma, op3, op4);
> 
> This will break if the first push goes as the last slot of the ringbuffer
> and the second push goes as the first slot of the ringbuffer.
> 
> Otherwise looks good to me.

Why would that break? Isn't the purpose of a ringbuffer to behave as if
it was infinitely sequential? If this really is a problem, how do we fix
it? Would we have to stash NOPs into the pushbuffer until we wrap
around?

Thierry

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 00/13] drm/tegra: Fix IOVA space on Tegra186 and later
  2019-01-24 21:38 ` [PATCH v2 00/13] drm/tegra: Fix IOVA space on Tegra186 and later Dmitry Osipenko
@ 2019-01-25  9:23   ` Thierry Reding
  2019-01-25 13:14     ` Dmitry Osipenko
  0 siblings, 1 reply; 27+ messages in thread
From: Thierry Reding @ 2019-01-25  9:23 UTC (permalink / raw)
  To: Dmitry Osipenko; +Cc: linux-tegra, dri-devel, Mikko Perttunen


[-- Attachment #1.1: Type: text/plain, Size: 1743 bytes --]

On Fri, Jan 25, 2019 at 12:38:01AM +0300, Dmitry Osipenko wrote:
> 24.01.2019 21:02, Thierry Reding пишет:
> > From: Thierry Reding <treding@nvidia.com>
> > 
> > Tegra186 and later are different from earlier generations in that they
> > use an ARM SMMU rather than the Tegra SMMU. The ARM SMMU driver behaves
> > slightly differently in that the geometry for IOMMU domains is set only
> > after a device was attached to it. This is to make sure that the SMMU
> > instance that the domain belongs to is known, because each instance can
> > have a different input address space (i.e. geometry).
> > 
> > Work around this by moving all IOVA allocations to a point where the
> > geometry of the domain is properly initialized.
> > 
> > This second version of the series addresses all review comments and adds
> > a number of patches that will actually allow host1x to work with an SMMU
> > enabled on Tegra186. The patches also add programming required to
> > address the full 40 bits of address space.
> > 
> > This supersedes the following patch:
> > 
> >     https://patchwork.kernel.org/patch/10775579/
> 
> My understanding is that falcon won't boot because source DMA address
> of the firmware isn't set up correctly if the address is >32bit.
> Please correct me if I'm wrong, otherwise that need to be addressed in
> this series as well for completeness.

What makes you say so? I was runtime testing this series as I was
developing these patches and this works properly on Tegra186 with a 40
bit address space. Since the carveout is allocated from the top of the
IOMMU aperture, the addresses that the Falcon sees are always from the
top of the 40 bit IOVA space and this works flawlessly.

Thierry

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 03/13] gpu: host1x: Support 40-bit addressing
  2019-01-25  9:20     ` Thierry Reding
@ 2019-01-25  9:32       ` Mikko Perttunen
  2019-01-25  9:34         ` Mikko Perttunen
  0 siblings, 1 reply; 27+ messages in thread
From: Mikko Perttunen @ 2019-01-25  9:32 UTC (permalink / raw)
  To: Thierry Reding; +Cc: linux-tegra, Dmitry Osipenko, dri-devel, Mikko Perttunen



On 25.1.2019 11.20, Thierry Reding wrote:
> On Fri, Jan 25, 2019 at 11:13:41AM +0200, Mikko Perttunen wrote:
>> On 24.1.2019 20.02, Thierry Reding wrote:
>>> From: Thierry Reding <treding@nvidia.com>
>>>
>>> Tegra186 and later support 40 bits of address space. Additional
>>> registers need to be programmed to store the full 40 bits of push
>>> buffer addresses.
>>>
>>> Since command stream gathers can also reside in buffers in a 40-bit
>>> address space, a new variant of the GATHER opcode is also introduced.
>>> It takes two parameters: the first parameter contains the lower 32
>>> bits of the address and the second parameter contains bits 32 to 39.
>>>
>>> Signed-off-by: Thierry Reding <treding@nvidia.com>
>>> ---
>>>    drivers/gpu/host1x/hw/cdma_hw.c           | 13 ++++++++---
>>>    drivers/gpu/host1x/hw/channel_hw.c        | 28 +++++++++++++++++++----
>>>    drivers/gpu/host1x/hw/host1x06_hardware.h |  5 ++++
>>>    drivers/gpu/host1x/hw/host1x07_hardware.h |  5 ++++
>>>    4 files changed, 44 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/drivers/gpu/host1x/hw/cdma_hw.c b/drivers/gpu/host1x/hw/cdma_hw.c
>>> index ce320534cbed..6d2b7af2af89 100644
>>> --- a/drivers/gpu/host1x/hw/cdma_hw.c
>>> +++ b/drivers/gpu/host1x/hw/cdma_hw.c
>>> @@ -68,20 +68,27 @@ static void cdma_timeout_cpu_incr(struct host1x_cdma *cdma, u32 getptr,
>>>    static void cdma_start(struct host1x_cdma *cdma)
>>>    {
>>>    	struct host1x_channel *ch = cdma_to_channel(cdma);
>>> +	u64 start, end;
>>>    	if (cdma->running)
>>>    		return;
>>>    	cdma->last_pos = cdma->push_buffer.pos;
>>> +	start = cdma->push_buffer.dma;
>>> +	end = cdma->push_buffer.dma + cdma->push_buffer.size + 4;
>>>    	host1x_ch_writel(ch, HOST1X_CHANNEL_DMACTRL_DMASTOP,
>>>    			 HOST1X_CHANNEL_DMACTRL);
>>>    	/* set base, put and end pointer */
>>> -	host1x_ch_writel(ch, cdma->push_buffer.dma, HOST1X_CHANNEL_DMASTART);
>>> +	host1x_ch_writel(ch, lower_32_bits(start), HOST1X_CHANNEL_DMASTART);
>>>    	host1x_ch_writel(ch, cdma->push_buffer.pos, HOST1X_CHANNEL_DMAPUT);
>>> -	host1x_ch_writel(ch, cdma->push_buffer.dma + cdma->push_buffer.size + 4,
>>> -			 HOST1X_CHANNEL_DMAEND);
>>> +	host1x_ch_writel(ch, lower_32_bits(end), HOST1X_CHANNEL_DMAEND);
>>> +
>>> +#if HOST1X_HW >= 6
>>> +	host1x_ch_writel(ch, upper_32_bits(start), HOST1X_CHANNEL_DMASTART_HI);
>>> +	host1x_ch_writel(ch, upper_32_bits(end), HOST1X_CHANNEL_DMAEND_HI);
>>> +#endif
>>>    	/* reset GET */
>>>    	host1x_ch_writel(ch, HOST1X_CHANNEL_DMACTRL_DMASTOP |
>>> diff --git a/drivers/gpu/host1x/hw/channel_hw.c b/drivers/gpu/host1x/hw/channel_hw.c
>>> index ff137fe72d34..78fb49539e8c 100644
>>> --- a/drivers/gpu/host1x/hw/channel_hw.c
>>> +++ b/drivers/gpu/host1x/hw/channel_hw.c
>>> @@ -64,11 +64,31 @@ static void submit_gathers(struct host1x_job *job)
>>>    	for (i = 0; i < job->num_gathers; i++) {
>>>    		struct host1x_job_gather *g = &job->gathers[i];
>>> -		u32 op1 = host1x_opcode_gather(g->words);
>>> -		u32 op2 = g->base + g->offset;
>>> +		dma_addr_t addr = g->base + g->offset;
>>> +		u32 op2, op3;
>>> -		trace_write_gather(cdma, g->bo, g->offset, op1 & 0xffff);
>>> -		host1x_cdma_push(cdma, op1, op2);
>>> +		op2 = lower_32_bits(addr);
>>> +		op3 = upper_32_bits(addr);
>>> +
>>> +		trace_write_gather(cdma, g->bo, g->offset, g->words);
>>> +
>>> +		if (op3 != 0) {
>>> +#if HOST1X_HW >= 6
>>> +			u32 op1 = host1x_opcode_gather_wide(g->words);
>>> +			u32 op4 = HOST1X_OPCODE_NOP;
>>> +
>>> +			host1x_cdma_push(cdma, op1, op2);
>>> +			host1x_cdma_push(cdma, op3, op4);
>>
>> This will break if the first push goes as the last slot of the ringbuffer
>> and the second push goes as the first slot of the ringbuffer.
>>
>> Otherwise looks good to me.
> 
> Why would that break? Isn't the purpose of a ringbuffer to behave as if
> it was infinitely sequential? If this really is a problem, how do we fix
> it? Would we have to stash NOPs into the pushbuffer until we wrap
> around?

That's because it's not actually a ringbuffer. It's actually just a 
buffer with a 'RESTART 0' opcode in the last slot. As such, the GATHER_W 
would incorrectly use the 'RESTART 0' opcode as its third word.

Cheers,
Mikko

> 
> Thierry
> 
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 03/13] gpu: host1x: Support 40-bit addressing
  2019-01-25  9:32       ` Mikko Perttunen
@ 2019-01-25  9:34         ` Mikko Perttunen
  0 siblings, 0 replies; 27+ messages in thread
From: Mikko Perttunen @ 2019-01-25  9:34 UTC (permalink / raw)
  To: Thierry Reding; +Cc: linux-tegra, Dmitry Osipenko, dri-devel, Mikko Perttunen

On 25.1.2019 11.32, Mikko Perttunen wrote:
> 
> 
> On 25.1.2019 11.20, Thierry Reding wrote:
>> On Fri, Jan 25, 2019 at 11:13:41AM +0200, Mikko Perttunen wrote:
>>> On 24.1.2019 20.02, Thierry Reding wrote:
>>>> From: Thierry Reding <treding@nvidia.com>
>>>>
>>>> Tegra186 and later support 40 bits of address space. Additional
>>>> registers need to be programmed to store the full 40 bits of push
>>>> buffer addresses.
>>>>
>>>> Since command stream gathers can also reside in buffers in a 40-bit
>>>> address space, a new variant of the GATHER opcode is also introduced.
>>>> It takes two parameters: the first parameter contains the lower 32
>>>> bits of the address and the second parameter contains bits 32 to 39.
>>>>
>>>> Signed-off-by: Thierry Reding <treding@nvidia.com>
>>>> ---
>>>>    drivers/gpu/host1x/hw/cdma_hw.c           | 13 ++++++++---
>>>>    drivers/gpu/host1x/hw/channel_hw.c        | 28 
>>>> +++++++++++++++++++----
>>>>    drivers/gpu/host1x/hw/host1x06_hardware.h |  5 ++++
>>>>    drivers/gpu/host1x/hw/host1x07_hardware.h |  5 ++++
>>>>    4 files changed, 44 insertions(+), 7 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/host1x/hw/cdma_hw.c 
>>>> b/drivers/gpu/host1x/hw/cdma_hw.c
>>>> index ce320534cbed..6d2b7af2af89 100644
>>>> --- a/drivers/gpu/host1x/hw/cdma_hw.c
>>>> +++ b/drivers/gpu/host1x/hw/cdma_hw.c
>>>> @@ -68,20 +68,27 @@ static void cdma_timeout_cpu_incr(struct 
>>>> host1x_cdma *cdma, u32 getptr,
>>>>    static void cdma_start(struct host1x_cdma *cdma)
>>>>    {
>>>>        struct host1x_channel *ch = cdma_to_channel(cdma);
>>>> +    u64 start, end;
>>>>        if (cdma->running)
>>>>            return;
>>>>        cdma->last_pos = cdma->push_buffer.pos;
>>>> +    start = cdma->push_buffer.dma;
>>>> +    end = cdma->push_buffer.dma + cdma->push_buffer.size + 4;
>>>>        host1x_ch_writel(ch, HOST1X_CHANNEL_DMACTRL_DMASTOP,
>>>>                 HOST1X_CHANNEL_DMACTRL);
>>>>        /* set base, put and end pointer */
>>>> -    host1x_ch_writel(ch, cdma->push_buffer.dma, 
>>>> HOST1X_CHANNEL_DMASTART);
>>>> +    host1x_ch_writel(ch, lower_32_bits(start), 
>>>> HOST1X_CHANNEL_DMASTART);
>>>>        host1x_ch_writel(ch, cdma->push_buffer.pos, 
>>>> HOST1X_CHANNEL_DMAPUT);
>>>> -    host1x_ch_writel(ch, cdma->push_buffer.dma + 
>>>> cdma->push_buffer.size + 4,
>>>> -             HOST1X_CHANNEL_DMAEND);
>>>> +    host1x_ch_writel(ch, lower_32_bits(end), HOST1X_CHANNEL_DMAEND);
>>>> +
>>>> +#if HOST1X_HW >= 6
>>>> +    host1x_ch_writel(ch, upper_32_bits(start), 
>>>> HOST1X_CHANNEL_DMASTART_HI);
>>>> +    host1x_ch_writel(ch, upper_32_bits(end), 
>>>> HOST1X_CHANNEL_DMAEND_HI);
>>>> +#endif
>>>>        /* reset GET */
>>>>        host1x_ch_writel(ch, HOST1X_CHANNEL_DMACTRL_DMASTOP |
>>>> diff --git a/drivers/gpu/host1x/hw/channel_hw.c 
>>>> b/drivers/gpu/host1x/hw/channel_hw.c
>>>> index ff137fe72d34..78fb49539e8c 100644
>>>> --- a/drivers/gpu/host1x/hw/channel_hw.c
>>>> +++ b/drivers/gpu/host1x/hw/channel_hw.c
>>>> @@ -64,11 +64,31 @@ static void submit_gathers(struct host1x_job *job)
>>>>        for (i = 0; i < job->num_gathers; i++) {
>>>>            struct host1x_job_gather *g = &job->gathers[i];
>>>> -        u32 op1 = host1x_opcode_gather(g->words);
>>>> -        u32 op2 = g->base + g->offset;
>>>> +        dma_addr_t addr = g->base + g->offset;
>>>> +        u32 op2, op3;
>>>> -        trace_write_gather(cdma, g->bo, g->offset, op1 & 0xffff);
>>>> -        host1x_cdma_push(cdma, op1, op2);
>>>> +        op2 = lower_32_bits(addr);
>>>> +        op3 = upper_32_bits(addr);
>>>> +
>>>> +        trace_write_gather(cdma, g->bo, g->offset, g->words);
>>>> +
>>>> +        if (op3 != 0) {
>>>> +#if HOST1X_HW >= 6
>>>> +            u32 op1 = host1x_opcode_gather_wide(g->words);
>>>> +            u32 op4 = HOST1X_OPCODE_NOP;
>>>> +
>>>> +            host1x_cdma_push(cdma, op1, op2);
>>>> +            host1x_cdma_push(cdma, op3, op4);
>>>
>>> This will break if the first push goes as the last slot of the 
>>> ringbuffer
>>> and the second push goes as the first slot of the ringbuffer.
>>>
>>> Otherwise looks good to me.
>>
>> Why would that break? Isn't the purpose of a ringbuffer to behave as if
>> it was infinitely sequential? If this really is a problem, how do we fix
>> it? Would we have to stash NOPs into the pushbuffer until we wrap
>> around?
> 
> That's because it's not actually a ringbuffer. It's actually just a 
> buffer with a 'RESTART 0' opcode in the last slot. As such, the GATHER_W 
> would incorrectly use the 'RESTART 0' opcode as its third word.

And yes, detecting the situation and filling with NOP is one solution. 
The reason cdma_push currently takes two words is to guarantee that 
two-word opcodes can always fit in the buffer without splitting.

> 
> Cheers,
> Mikko
> 
>>
>> Thierry
>>
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 00/13] drm/tegra: Fix IOVA space on Tegra186 and later
  2019-01-25  9:23   ` Thierry Reding
@ 2019-01-25 13:14     ` Dmitry Osipenko
  0 siblings, 0 replies; 27+ messages in thread
From: Dmitry Osipenko @ 2019-01-25 13:14 UTC (permalink / raw)
  To: Thierry Reding; +Cc: linux-tegra, dri-devel, Mikko Perttunen

25.01.2019 12:23, Thierry Reding пишет:
> On Fri, Jan 25, 2019 at 12:38:01AM +0300, Dmitry Osipenko wrote:
>> 24.01.2019 21:02, Thierry Reding пишет:
>>> From: Thierry Reding <treding@nvidia.com>
>>>
>>> Tegra186 and later are different from earlier generations in that they
>>> use an ARM SMMU rather than the Tegra SMMU. The ARM SMMU driver behaves
>>> slightly differently in that the geometry for IOMMU domains is set only
>>> after a device was attached to it. This is to make sure that the SMMU
>>> instance that the domain belongs to is known, because each instance can
>>> have a different input address space (i.e. geometry).
>>>
>>> Work around this by moving all IOVA allocations to a point where the
>>> geometry of the domain is properly initialized.
>>>
>>> This second version of the series addresses all review comments and adds
>>> a number of patches that will actually allow host1x to work with an SMMU
>>> enabled on Tegra186. The patches also add programming required to
>>> address the full 40 bits of address space.
>>>
>>> This supersedes the following patch:
>>>
>>>     https://patchwork.kernel.org/patch/10775579/
>>
>> My understanding is that falcon won't boot because source DMA address
>> of the firmware isn't set up correctly if the address is >32bit.
>> Please correct me if I'm wrong, otherwise that need to be addressed in
>> this series as well for completeness.
> 
> What makes you say so? I was runtime testing this series as I was
> developing these patches and this works properly on Tegra186 with a 40
> bit address space. Since the carveout is allocated from the top of the
> IOMMU aperture, the addresses that the Falcon sees are always from the
> top of the 40 bit IOVA space and this works flawlessly.

I looked again at the code and noticed this time that the address is *shifted* by 256 bytes. Mikko told yesterday about the alignment requirement, but I just wasn't seeing the shifting in the code. Looks good then! 
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 00/13] drm/tegra: Fix IOVA space on Tegra186 and later
  2019-01-25  8:57   ` Mikko Perttunen
@ 2019-01-25 13:18     ` Dmitry Osipenko
  0 siblings, 0 replies; 27+ messages in thread
From: Dmitry Osipenko @ 2019-01-25 13:18 UTC (permalink / raw)
  To: Mikko Perttunen, Thierry Reding, Mikko Perttunen; +Cc: linux-tegra, dri-devel

25.01.2019 11:57, Mikko Perttunen пишет:
> On 24.1.2019 23.53, Dmitry Osipenko wrote:
>> 24.01.2019 21:02, Thierry Reding пишет:
>>> From: Thierry Reding <treding@nvidia.com>
>>>
>>> Tegra186 and later are different from earlier generations in that they
>>> use an ARM SMMU rather than the Tegra SMMU. The ARM SMMU driver behaves
>>> slightly differently in that the geometry for IOMMU domains is set only
>>> after a device was attached to it. This is to make sure that the SMMU
>>> instance that the domain belongs to is known, because each instance can
>>> have a different input address space (i.e. geometry).
>>>
>>> Work around this by moving all IOVA allocations to a point where the
>>> geometry of the domain is properly initialized.
>>>
>>> This second version of the series addresses all review comments and adds
>>> a number of patches that will actually allow host1x to work with an SMMU
>>> enabled on Tegra186. The patches also add programming required to
>>> address the full 40 bits of address space.
>>>
>>> This supersedes the following patch:
>>>
>>>      https://patchwork.kernel.org/patch/10775579/
>>
>> Secondly, seems there are additional restrictions for the host1x jobs on T186, at least T186 TRM suggests so. In particular looks like each client is hardwired to a specific sync point and to a specific channel. Or maybe there is assumption that upstream kernel could work only in a hypervisor mode or with all protections disable. Could you please clarify?
>>
> 
> There are no such syncpoint/channel restrictions. The upstream driver indeed currently only supports the case where there is no "hypervisor" (that is, server process that allocates host1x resources) running and the kernel has access to the Host1x COMMON/"hypervisor" register aperture.
> 
> Adding support for the situation where this is not the case shouldn't be very difficult, but we currently don't have any upstream platforms where the Host1x server exists (it's only there on automotive platforms).

Thank you very much for the clarification!
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 07/13] drm/tegra: Store parent pointer in Tegra DRM clients
  2019-01-24 18:02 ` [PATCH v2 07/13] drm/tegra: Store parent pointer in Tegra DRM clients Thierry Reding
@ 2019-01-28 14:10   ` Dmitry Osipenko
  0 siblings, 0 replies; 27+ messages in thread
From: Dmitry Osipenko @ 2019-01-28 14:10 UTC (permalink / raw)
  To: Thierry Reding; +Cc: linux-tegra, dri-devel, Mikko Perttunen

24.01.2019 21:02, Thierry Reding пишет:
> From: Thierry Reding <treding@nvidia.com>
> 
> Tegra DRM clients need access to their parent, so store a pointer to it
> upon registration. It's technically possible to get at this by going via
> the host1x client's parent and getting the driver data, but that's quite
> complicated and not very transparent. It's much more straightforward and
> natural to let the children know about their parent.
> 
> Signed-off-by: Thierry Reding <treding@nvidia.com>
> ---
> Changes in v2:
> - clarify rationale in commit message
> 
>  drivers/gpu/drm/tegra/drm.c | 2 ++
>  drivers/gpu/drm/tegra/drm.h | 1 +
>  2 files changed, 3 insertions(+)
> 
> diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
> index 4b70ce664c41..61dcbd218ffc 100644
> --- a/drivers/gpu/drm/tegra/drm.c
> +++ b/drivers/gpu/drm/tegra/drm.c
> @@ -1041,6 +1041,7 @@ int tegra_drm_register_client(struct tegra_drm *tegra,
>  {
>  	mutex_lock(&tegra->clients_lock);
>  	list_add_tail(&client->list, &tegra->clients);
> +	client->drm = tegra;
>  	mutex_unlock(&tegra->clients_lock);
>  
>  	return 0;
> @@ -1051,6 +1052,7 @@ int tegra_drm_unregister_client(struct tegra_drm *tegra,
>  {
>  	mutex_lock(&tegra->clients_lock);
>  	list_del_init(&client->list);
> +	client->drm = NULL;
>  	mutex_unlock(&tegra->clients_lock);
>  
>  	return 0;
> diff --git a/drivers/gpu/drm/tegra/drm.h b/drivers/gpu/drm/tegra/drm.h
> index f1763b4d5b5f..2c809755bca7 100644
> --- a/drivers/gpu/drm/tegra/drm.h
> +++ b/drivers/gpu/drm/tegra/drm.h
> @@ -88,6 +88,7 @@ int tegra_drm_submit(struct tegra_drm_context *context,
>  struct tegra_drm_client {
>  	struct host1x_client base;
>  	struct list_head list;
> +	struct tegra_drm *drm;
>  
>  	unsigned int version;
>  	const struct tegra_drm_client_ops *ops;
> 

Thanks for improving the commit message! Looks good.

Reviewed-by: Dmitry Osipenko <digetx@gmail.com>
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 08/13] drm/tegra: vic: Load firmware on demand
  2019-01-24 18:02 ` [PATCH v2 08/13] drm/tegra: vic: Load firmware on demand Thierry Reding
@ 2019-01-28 14:12   ` Dmitry Osipenko
  0 siblings, 0 replies; 27+ messages in thread
From: Dmitry Osipenko @ 2019-01-28 14:12 UTC (permalink / raw)
  To: Thierry Reding; +Cc: linux-tegra, dri-devel, Mikko Perttunen

24.01.2019 21:02, Thierry Reding пишет:
> From: Thierry Reding <treding@nvidia.com>
> 
> Loading the firmware requires an allocation of IOVA space to make sure
> that the VIC's Falcon microcontroller can read the firmware if address
> translation via the SMMU is enabled.
> 
> However, the allocation currently happens at a time where the geometry
> of an IOMMU domain may not have been initialized yet. This happens for
> example on Tegra186 and later where an ARM SMMU is used. Domains which
> are created by the ARM SMMU driver postpone the geometry setup until a
> device is attached to the domain. This is because IOMMU domains aren't
> attached to a specific IOMMU instance at allocation time and hence the
> input address space, which defines the geometry, is not known yet.
> 
> Work around this by postponing the firmware load until it is needed at
> the time where a channel is opened to the VIC. At this time the shared
> IOMMU domain's geometry has been properly initialized.
> 
> As a byproduct this allows the Tegra DRM to be created in the absence
> of VIC firmware, since the VIC initialization no longer fails if the
> firmware can't be found.
> 
> Based on an earlier patch by Dmitry Osipenko <digetx@gmail.com>.
> 
> Signed-off-by: Thierry Reding <treding@nvidia.com>
> ---
> Changes in v2:
> - actually request firmware on demand
> 
>  drivers/gpu/drm/tegra/vic.c | 53 +++++++++++++++++++++++++------------
>  1 file changed, 36 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/gpu/drm/tegra/vic.c b/drivers/gpu/drm/tegra/vic.c
> index d47983deb1cf..9cb10d1e923a 100644
> --- a/drivers/gpu/drm/tegra/vic.c
> +++ b/drivers/gpu/drm/tegra/vic.c
> @@ -181,13 +181,6 @@ static int vic_init(struct host1x_client *client)
>  		vic->domain = tegra->domain;
>  	}
>  
> -	if (!vic->falcon.data) {
> -		vic->falcon.data = tegra;
> -		err = falcon_load_firmware(&vic->falcon);
> -		if (err < 0)
> -			goto detach;
> -	}
> -
>  	vic->channel = host1x_channel_request(client->dev);
>  	if (!vic->channel) {
>  		err = -ENOMEM;
> @@ -246,6 +239,30 @@ static const struct host1x_client_ops vic_client_ops = {
>  	.exit = vic_exit,
>  };
>  
> +static int vic_load_firmware(struct vic *vic)
> +{
> +	int err;
> +
> +	if (vic->falcon.data)
> +		return 0;
> +
> +	vic->falcon.data = vic->client->drm;
> +
> +	err = falcon_read_firmware(&vic->falcon, vic->config->firmware);
> +	if (err < 0)
> +		goto cleanup;
> +
> +	err = falcon_load_firmware(&vic->falcon);
> +	if (err < 0)
> +		goto cleanup;
> +
> +	return 0;
> +
> +cleanup:
> +	vic->falcon.data = NULL;
> +	return err;
> +}
> +
>  static int vic_open_channel(struct tegra_drm_client *client,
>  			    struct tegra_drm_context *context)
>  {
> @@ -256,19 +273,25 @@ static int vic_open_channel(struct tegra_drm_client *client,
>  	if (err < 0)
>  		return err;
>  
> +	err = vic_load_firmware(vic);
> +	if (err < 0)
> +		goto rpm_put;
> +
>  	err = vic_boot(vic);
> -	if (err < 0) {
> -		pm_runtime_put(vic->dev);
> -		return err;
> -	}
> +	if (err < 0)
> +		goto rpm_put;
>  
>  	context->channel = host1x_channel_get(vic->channel);
>  	if (!context->channel) {
> -		pm_runtime_put(vic->dev);
> -		return -ENOMEM;
> +		err = -ENOMEM;
> +		goto rpm_put;
>  	}
>  
>  	return 0;
> +
> +rpm_put:
> +	pm_runtime_put(vic->dev);
> +	return err;
>  }
>  
>  static void vic_close_channel(struct tegra_drm_context *context)
> @@ -372,10 +395,6 @@ static int vic_probe(struct platform_device *pdev)
>  	if (err < 0)
>  		return err;
>  
> -	err = falcon_read_firmware(&vic->falcon, vic->config->firmware);
> -	if (err < 0)
> -		goto exit_falcon;
> -
>  	platform_set_drvdata(pdev, vic);
>  
>  	INIT_LIST_HEAD(&vic->client.base.list);
> 

Looks good this time!

Reviewed-by: Dmitry Osipenko <digetx@gmail.com>
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v2 04/13] gpu: host1x: Use direct DMA with IOMMU API usage
  2019-01-24 18:02 ` [PATCH v2 04/13] gpu: host1x: Use direct DMA with IOMMU API usage Thierry Reding
@ 2019-01-28 14:30   ` Dmitry Osipenko
  0 siblings, 0 replies; 27+ messages in thread
From: Dmitry Osipenko @ 2019-01-28 14:30 UTC (permalink / raw)
  To: Thierry Reding; +Cc: linux-tegra, dri-devel, Mikko Perttunen

24.01.2019 21:02, Thierry Reding пишет:
> From: Thierry Reding <treding@nvidia.com>
> 
> If we use the IOMMU API directly to map buffers into host1x' IOVA space,
> we must make sure that the DMA API doesn't already set up a mapping, or
> else translation will fail.
> 
> The direct DMA API allows us to allocate memory that will not be mapped
> through an IOMMU automatically.
> 
> Signed-off-by: Thierry Reding <treding@nvidia.com>
> ---
>  drivers/gpu/host1x/cdma.c | 11 ++++++++---
>  1 file changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/host1x/cdma.c b/drivers/gpu/host1x/cdma.c
> index 91df51e631b2..ccde23a0109c 100644
> --- a/drivers/gpu/host1x/cdma.c
> +++ b/drivers/gpu/host1x/cdma.c
> @@ -19,6 +19,7 @@
>  
>  #include <asm/cacheflush.h>
>  #include <linux/device.h>
> +#include <linux/dma-direct.h>
>  #include <linux/dma-mapping.h>
>  #include <linux/host1x.h>
>  #include <linux/interrupt.h>
> @@ -70,6 +71,7 @@ static void host1x_pushbuffer_destroy(struct push_buffer *pb)
>   */
>  static int host1x_pushbuffer_init(struct push_buffer *pb)
>  {
> +	unsigned long attrs = DMA_ATTR_WRITE_COMBINE;
>  	struct host1x_cdma *cdma = pb_to_cdma(pb);
>  	struct host1x *host1x = cdma_to_host1x(cdma);
>  	struct iova *alloc;
> @@ -91,8 +93,8 @@ static int host1x_pushbuffer_init(struct push_buffer *pb)
>  
>  		size = iova_align(&host1x->iova, size);
>  
> -		pb->mapped = dma_alloc_wc(host1x->dev, size, &pb->phys,
> -					  GFP_KERNEL);
> +		pb->mapped = dma_direct_alloc(host1x->dev, size, &pb->phys,
> +					      GFP_KERNEL, attrs);
>  		if (!pb->mapped)
>  			return -ENOMEM;
>  
> @@ -127,7 +129,10 @@ static int host1x_pushbuffer_init(struct push_buffer *pb)
>  iommu_free_iova:
>  	__free_iova(&host1x->iova, alloc);
>  iommu_free_mem:
> -	dma_free_wc(host1x->dev, size, pb->mapped, pb->phys);
> +	if (host1x->domain)
> +		dma_direct_free(host1x->dev, size, pb->mapped, pb->phys, attrs);
> +	else
> +		dma_free_wc(host1x->dev, size, pb->mapped, pb->phys);
>  
>  	return err;
>  }
> 

This variant looks good for the current upstream code. Alternatively we could check the domain->type==IOMMU_DOMAIN_DMA and skip IOMMU initialization on probe.

I may test that nothing breaks with this series for older Tegra's once it all will appear in -next. Please address Mikko's review-comments and push the patches once ready.

Reviewed-by: Dmitry Osipenko <digetx@gmail.com>
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2019-01-28 14:30 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-24 18:02 [PATCH v2 00/13] drm/tegra: Fix IOVA space on Tegra186 and later Thierry Reding
2019-01-24 18:02 ` [PATCH v2 01/13] gpu: host1x: Set up stream ID table Thierry Reding
2019-01-24 18:02 ` [PATCH v2 02/13] gpu: host1x: Program the channel stream ID Thierry Reding
2019-01-24 18:02 ` [PATCH v2 03/13] gpu: host1x: Support 40-bit addressing Thierry Reding
2019-01-25  9:13   ` Mikko Perttunen
2019-01-25  9:20     ` Thierry Reding
2019-01-25  9:32       ` Mikko Perttunen
2019-01-25  9:34         ` Mikko Perttunen
2019-01-24 18:02 ` [PATCH v2 04/13] gpu: host1x: Use direct DMA with IOMMU API usage Thierry Reding
2019-01-28 14:30   ` Dmitry Osipenko
2019-01-24 18:02 ` [PATCH v2 05/13] gpu: host1x: Restrict IOVA space to DMA mask Thierry Reding
2019-01-24 18:02 ` [PATCH v2 06/13] gpu: host1x: Support 40-bit addressing on Tegra186 Thierry Reding
2019-01-24 18:02 ` [PATCH v2 07/13] drm/tegra: Store parent pointer in Tegra DRM clients Thierry Reding
2019-01-28 14:10   ` Dmitry Osipenko
2019-01-24 18:02 ` [PATCH v2 08/13] drm/tegra: vic: Load firmware on demand Thierry Reding
2019-01-28 14:12   ` Dmitry Osipenko
2019-01-24 18:02 ` [PATCH v2 09/13] drm/tegra: Setup shared IOMMU domain after initialization Thierry Reding
2019-01-24 18:02 ` [PATCH v2 10/13] drm/tegra: Restrict IOVA space to DMA mask Thierry Reding
2019-01-24 18:02 ` [PATCH v2 11/13] drm/tegra: vic: Do not clear driver data Thierry Reding
2019-01-24 18:02 ` [PATCH v2 12/13] drm/tegra: vic: Support stream ID register programming Thierry Reding
2019-01-24 18:02 ` [PATCH v2 13/13] arm64: tegra: Enable SMMU for VIC on Tegra186 Thierry Reding
2019-01-24 21:38 ` [PATCH v2 00/13] drm/tegra: Fix IOVA space on Tegra186 and later Dmitry Osipenko
2019-01-25  9:23   ` Thierry Reding
2019-01-25 13:14     ` Dmitry Osipenko
2019-01-24 21:53 ` Dmitry Osipenko
2019-01-25  8:57   ` Mikko Perttunen
2019-01-25 13:18     ` Dmitry Osipenko

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.