* [RFC 00/14] Per-instance pagetables for MSM GPUs
From: Jordan Crouse @ 2018-02-21 22:59 UTC
  To: freedreno@lists.freedesktop.org
  Cc: jean-philippe.brucker@arm.com, linux-arm-msm@vger.kernel.org,
	dri-devel@lists.freedesktop.org, tfiga@chromium.org,
	iommu@lists.linux-foundation.org, vivek.gautam@codeaurora.org,
	linux-arm-kernel@lists.infradead.org

This is a request for comments on adding support to the IOMMU core,
arm-smmu and the MSM GPU driver for per-instance (GPU) pagetables.

The general idea behind per-instance pagetables is that each GPU
client can have its own pagetable and virtual memory space, which
prevents one client from maliciously or accidentally corrupting or
copying another client's buffers.  We say per-instance because the
pagetables are unique to each DRM file handle, and there can be
multiple instances per process.

In newer arm-smmu implementations this behavior could be managed
with hardware based PASIDs (see Jean-Philippe's epic SVA
stack https://patchwork.kernel.org/patch/10214963/),
but all the MSM GPU implementations in existence use
arm-smmu-v2, which doesn't have the ability to switch
pagetables in hardware. As a result the vendor has added a bit
of hardware-specific glue to allow the GPU microcode to switch
the pagetable asynchronously during execution (basically it
reaches out and reprograms some of the context bank registers).

To support all of this we need a handful of changes that allow
the client driver to create a properly formatted pagetable,
directly map and unmap buffers inside that pagetable, and
retrieve the parameters (i.e. the physical address of the
pagetable) to program into the GPU at the appropriate time.
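
As a rough sketch of that flow (the gpu_pasid_* names below are
placeholders for illustration, not the actual API added by this
series; assume domain, dev and a buffer (iova, paddr, size) are
already in hand):

	int enable = 1;
	int pasid;
	u64 ttbr;
	u32 asid;

	/* Opt the domain into the split TTBR0/TTBR1 scheme (patches 1-2) */
	iommu_domain_set_attr(domain, DOMAIN_ATTR_ENABLE_TTBR1, &enable);

	/* Allocate a per-instance pagetable, indexed by a pasid (patches 3-5) */
	pasid = gpu_pasid_alloc(domain, dev);

	/* Map and unmap buffers directly inside that pagetable */
	gpu_pasid_map(pasid, iova, paddr, size, IOMMU_READ | IOMMU_WRITE);

	/* Get the values the GPU programs when switching pagetables (patch 6) */
	gpu_pasid_get_config(pasid, &ttbr, &asid);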

This stack builds on the aforementioned code from Jean-Philippe to
add the needed support to the IOMMU core and arm-smmu, and then
implements per-instance pagetables in the MSM DRM GPU driver for
the a5xx family (tested on the db820c).

The first two patches add support to create and enable a TTBR1
pagetable for arm-smmu-v2 if the appropriate domain attribute is
selected.  They create a pagetable and program the appropriate
registers to enable TTBR1.  The sign extension bit is programmed as
the highest bit in the ias region (or the special 48th bit when the
UBS is 49 bits).  The correct pagetable is automatically selected for
map/unmap based on the sign extension bit.
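
The selection logic itself is small; a sketch of what patch 2 does in
arm_lpae_get_table() when picking the pagetable for a given iova:

	static bool iova_uses_ttbr1(unsigned long iova, unsigned int ias)
	{
		/*
		 * ias == 48 is the special UBS == 49 case, where bit 48
		 * is the sign extension bit; otherwise the bit is ias - 1
		 * (for example, bit 31 for ias 32).
		 */
		unsigned long mask = (ias == 48) ?
			(1UL << 48) : (1UL << (ias - 1));

		return !!(iova & mask);
	}

So with ias = 32, an iova of 0x80001000 maps through the TTBR1
pagetable while 0x00001000 maps through TTBR0.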

The next three patches add "virtual" pasid support. This allows
a client to allocate a pasid which is an index to a pagetable
structure. The pasid token is used to map and unmap entries
in that pagetable structure.  The existing pasid idr for SVA
is reused so that clients that also support hardware PASID
entries can use both types if they wish.
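
Very roughly, the intended usage looks like this (placeholder names
and signatures; the real functions are introduced in patches 4 and 5):

	/* Allocate a pasid backed by an io-pgtable instead of a shared mm */
	pasid = iommu_sva_alloc_pasid(domain, dev);

	/* The pasid token indexes the pagetable for map/unmap */
	iommu_sva_map(pasid, iova, paddr, size, IOMMU_READ | IOMMU_WRITE);
	iommu_sva_unmap(pasid, iova, size);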

The next patch adds a special side-band function for arm-smmu
that registers two callbacks to inform the client driver when
a new pasid is created/destroyed. This allows the arm-smmu
driver to pass the pagetable information (ttbr and asid) to
the client without needing changes to the IOMMU core.
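
Sketched with made-up names (the real interface is in patch 6), the
side-band registration would look something like:

	/* Called when a pasid is created: stash the pagetable parameters */
	static void msm_install_pasid(int pasid, u64 ttbr, u32 asid,
			void *priv)
	{
		/* save ttbr/asid so the GPU microcode can program the switch */
	}

	/* Called when the pasid is destroyed */
	static void msm_remove_pasid(int pasid, void *priv)
	{
	}

	arm_smmu_add_pasid_callbacks(dev, msm_install_pasid,
		msm_remove_pasid, priv);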

All the following patches are for the DRM/GPU driver.  The
first enables 64 bit mode for a5xx, which lets us use all
48 bits of the address space.  The next five patches are
infrastructure patches to clean up address spaces and prepare
for having a per-instance address space, and the final patch
enables per-instance pagetables for a5xx and implements the PM4
commands to switch the pagetable at create time.

Please know that nearly all of this is up for discussion. In particular,
since we all know that the two hardest problems in computer science are
caches, naming, and off-by-one errors, I am open to fixing the vocabulary
and terminology for "pasid", "per-instance" and whatever else - please
paint that bikeshed if you feel so inclined. Thanks for reading this
far. On with the code.

Applies against git://linux-arm.org/linux-jpb.git sva/v1

Jordan Crouse (14):
  iommu: Add DOMAIN_ATTR_ENABLE_TTBR1
  iommu/arm-smmu: Add support for TTBR1
  iommu: Create a base struct for io_mm
  iommu: sva: Add support for pasid allocation
  iommu: arm-smmu: Add pasid implementation
  iommu: arm-smmu: Add side-band function to specify pasid callbacks
  drm/msm: Enable 64 bit mode by default
  drm/msm: Pass the MMU domain index in struct msm_file_private
  drm/msm/gpu: Support using TTBR1 for kernel buffer objects
  drm/msm: Add msm_mmu features
  drm/msm: Add support for iommu-sva PASIDs
  drm/msm: Add support for per-instance address spaces
  drm/msm: Support per-instance address spaces
  drm/msm/a5xx: Support per-instance pagetables

 drivers/gpu/drm/msm/adreno/a5xx_gpu.c     |  69 +++++++
 drivers/gpu/drm/msm/adreno/a5xx_gpu.h     |  17 ++
 drivers/gpu/drm/msm/adreno/a5xx_preempt.c |  76 ++++++--
 drivers/gpu/drm/msm/adreno/adreno_gpu.c   |  11 ++
 drivers/gpu/drm/msm/adreno/adreno_gpu.h   |   5 +
 drivers/gpu/drm/msm/msm_drv.c             |  45 +++--
 drivers/gpu/drm/msm/msm_drv.h             |   4 +
 drivers/gpu/drm/msm/msm_gem.h             |   1 +
 drivers/gpu/drm/msm/msm_gem_submit.c      |  13 +-
 drivers/gpu/drm/msm/msm_gem_vma.c         |  36 +++-
 drivers/gpu/drm/msm/msm_gpu.c             |  25 ++-
 drivers/gpu/drm/msm/msm_gpu.h             |   4 +-
 drivers/gpu/drm/msm/msm_iommu.c           | 186 ++++++++++++++++++-
 drivers/gpu/drm/msm/msm_mmu.h             |  19 ++
 drivers/gpu/drm/msm/msm_ringbuffer.h      |   1 +
 drivers/iommu/arm-smmu-regs.h             |   2 -
 drivers/iommu/arm-smmu-v3.c               |   8 +-
 drivers/iommu/arm-smmu.c                  | 210 +++++++++++++++++++++-
 drivers/iommu/io-pgtable-arm.c            | 160 +++++++++++++++--
 drivers/iommu/io-pgtable-arm.h            |  20 +++
 drivers/iommu/io-pgtable.h                |  16 +-
 drivers/iommu/iommu-sva.c                 | 289 ++++++++++++++++++++++++++++--
 drivers/iommu/iommu.c                     |   3 +-
 include/linux/arm-smmu.h                  |  18 ++
 include/linux/iommu.h                     |  68 ++++++-
 25 files changed, 1205 insertions(+), 101 deletions(-)
 create mode 100644 include/linux/arm-smmu.h

-- 
2.16.1

* [PATCH 01/14] iommu: Add DOMAIN_ATTR_ENABLE_TTBR1
From: Jordan Crouse @ 2018-02-21 22:59 UTC
  To: freedreno@lists.freedesktop.org
  Cc: linux-arm-msm@vger.kernel.org, dri-devel@lists.freedesktop.org,
	iommu@lists.linux-foundation.org, linux-arm-kernel@lists.infradead.org

Add a new domain attribute to enable the TTBR1 pagetable for drivers
and devices that support it.  This enables using a TTBR1 (otherwise
known as a "global" or "system") pagetable for devices that support a
split pagetable scheme for switching pagetables quickly and safely.
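
A hedged usage sketch: a client driver would set the attribute on its
domain before the domain is attached and initialized, and can read it
back the same way (only drivers that handle the attribute, like
arm-smmu in the next patch, will accept it; others return -ENODEV):

	int enable = 1, enabled = 0;

	/* Must happen before the domain context is initialized */
	ret = iommu_domain_set_attr(domain, DOMAIN_ATTR_ENABLE_TTBR1,
		&enable);
	if (!ret)
		iommu_domain_get_attr(domain, DOMAIN_ATTR_ENABLE_TTBR1,
			&enabled);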

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
---
 include/linux/iommu.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 641aaf0f1b81..e2c49e583d8d 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -153,6 +153,7 @@ enum iommu_attr {
 	DOMAIN_ATTR_FSL_PAMU_ENABLE,
 	DOMAIN_ATTR_FSL_PAMUV1,
 	DOMAIN_ATTR_NESTING,	/* two stages of translation */
+	DOMAIN_ATTR_ENABLE_TTBR1,
 	DOMAIN_ATTR_MAX,
 };
 
-- 
2.16.1

* [PATCH 02/14] iommu/arm-smmu: Add support for TTBR1
From: Jordan Crouse @ 2018-02-21 22:59 UTC
  To: freedreno@lists.freedesktop.org
  Cc: jean-philippe.brucker@arm.com, linux-arm-msm@vger.kernel.org,
	dri-devel@lists.freedesktop.org, tfiga@chromium.org,
	iommu@lists.linux-foundation.org, vivek.gautam@codeaurora.org,
	linux-arm-kernel@lists.infradead.org

Allow an SMMU device to opt into allocating a TTBR1 pagetable.

The TTBR1 region will be the same size as the TTBR0 region, with
the sign extension bit set on the highest bit in the region, unless
the upstream bus size is 49 bits, in which case the sign extension
bit is set on bit 48.

The map/unmap operations will automatically use the appropriate
pagetable based on the specified iova and the existing mask.
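
As a worked example of the configuration this produces (values taken
from the io-pgtable changes below): with ias = 48, the code programs
T1SZ = 64 - 48 = 16 into the TCR, picks the TG1 field from the page
size, and sets the SEP field so that bit 48 of the iova steers the
walk; an iova such as 0x0001000000000000 then resolves through TTBR1
while 0x0000ffffffffffff resolves through TTBR0.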

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
---
 drivers/iommu/arm-smmu-regs.h  |   2 -
 drivers/iommu/arm-smmu.c       |  22 ++++--
 drivers/iommu/io-pgtable-arm.c | 160 ++++++++++++++++++++++++++++++++++++-----
 drivers/iommu/io-pgtable-arm.h |  20 ++++++
 drivers/iommu/io-pgtable.h     |  16 ++++-
 5 files changed, 192 insertions(+), 28 deletions(-)

diff --git a/drivers/iommu/arm-smmu-regs.h b/drivers/iommu/arm-smmu-regs.h
index a1226e4ab5f8..0ce85d5b22e9 100644
--- a/drivers/iommu/arm-smmu-regs.h
+++ b/drivers/iommu/arm-smmu-regs.h
@@ -193,8 +193,6 @@ enum arm_smmu_s2cr_privcfg {
 #define RESUME_RETRY			(0 << 0)
 #define RESUME_TERMINATE		(1 << 0)
 
-#define TTBCR2_SEP_SHIFT		15
-#define TTBCR2_SEP_UPSTREAM		(0x7 << TTBCR2_SEP_SHIFT)
 #define TTBCR2_AS			(1 << 4)
 
 #define TTBRn_ASID_SHIFT		48
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 69e7c60792a8..ebfa59b59622 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -248,6 +248,7 @@ struct arm_smmu_domain {
 	enum arm_smmu_domain_stage	stage;
 	struct mutex			init_mutex; /* Protects smmu pointer */
 	spinlock_t			cb_lock; /* Serialises ATS1* ops and TLB syncs */
+	u32 attributes;
 	struct iommu_domain		domain;
 };
 
@@ -598,7 +599,6 @@ static void arm_smmu_init_context_bank(struct arm_smmu_domain *smmu_domain,
 		} else {
 			cb->tcr[0] = pgtbl_cfg->arm_lpae_s1_cfg.tcr;
 			cb->tcr[1] = pgtbl_cfg->arm_lpae_s1_cfg.tcr >> 32;
-			cb->tcr[1] |= TTBCR2_SEP_UPSTREAM;
 			if (cfg->fmt == ARM_SMMU_CTX_FMT_AARCH64)
 				cb->tcr[1] |= TTBCR2_AS;
 		}
@@ -729,6 +729,9 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
 	enum io_pgtable_fmt fmt;
 	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
 	struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
+	unsigned int quirks =
+		smmu_domain->attributes & (1 << DOMAIN_ATTR_ENABLE_TTBR1) ?
+			IO_PGTABLE_QUIRK_ARM_TTBR1 : 0;
 
 	mutex_lock(&smmu_domain->init_mutex);
 	if (smmu_domain->smmu)
@@ -852,7 +855,11 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
 	else
 		cfg->asid = cfg->cbndx + smmu->cavium_id_base;
 
+	if (smmu->features & ARM_SMMU_FEAT_COHERENT_WALK)
+		quirks |= IO_PGTABLE_QUIRK_NO_DMA;
+
 	pgtbl_cfg = (struct io_pgtable_cfg) {
+		.quirks		= quirks,
 		.pgsize_bitmap	= smmu->pgsize_bitmap,
 		.ias		= ias,
 		.oas		= oas,
@@ -860,9 +867,6 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
 		.iommu_dev	= smmu->dev,
 	};
 
-	if (smmu->features & ARM_SMMU_FEAT_COHERENT_WALK)
-		pgtbl_cfg.quirks = IO_PGTABLE_QUIRK_NO_DMA;
-
 	smmu_domain->smmu = smmu;
 	pgtbl_ops = alloc_io_pgtable_ops(fmt, &pgtbl_cfg, smmu_domain);
 	if (!pgtbl_ops) {
@@ -1477,6 +1481,10 @@ static int arm_smmu_domain_get_attr(struct iommu_domain *domain,
 	case DOMAIN_ATTR_NESTING:
 		*(int *)data = (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED);
 		return 0;
+	case DOMAIN_ATTR_ENABLE_TTBR1:
+		*((int *)data) = !!(smmu_domain->attributes
+					& (1 << DOMAIN_ATTR_ENABLE_TTBR1));
+		return 0;
 	default:
 		return -ENODEV;
 	}
@@ -1505,6 +1513,12 @@ static int arm_smmu_domain_set_attr(struct iommu_domain *domain,
 		else
 			smmu_domain->stage = ARM_SMMU_DOMAIN_S1;
 
+		break;
+	case DOMAIN_ATTR_ENABLE_TTBR1:
+		if (*((int *)data))
+			smmu_domain->attributes |=
+				1 << DOMAIN_ATTR_ENABLE_TTBR1;
+		ret = 0;
 		break;
 	default:
 		ret = -ENODEV;
diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index fff0b6ba0a69..1bd0045f2cb7 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -152,7 +152,7 @@ struct arm_lpae_io_pgtable {
 	unsigned long		pg_shift;
 	unsigned long		bits_per_level;
 
-	void			*pgd;
+	void			*pgd[2];
 };
 
 typedef u64 arm_lpae_iopte;
@@ -394,20 +394,48 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,
 	return pte;
 }
 
+static inline arm_lpae_iopte *
+arm_lpae_get_table(struct arm_lpae_io_pgtable *data, unsigned long iova)
+{
+	struct io_pgtable_cfg *cfg = &data->iop.cfg;
+
+	if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1)  {
+		unsigned long mask;
+
+		/*
+		 * if ias is 48 it really means that bit 48 is the sign
+		 * extension bit, otherwise the sign extension bit is ias - 1
+		 * (for example, bit 31 for ias 32)
+		 */
+		mask = (cfg->ias == 48) ? (1UL << 48) :
+			(1UL << (cfg->ias - 1));
+
+		if (iova & mask)
+			return data->pgd[1];
+	}
+
+	return data->pgd[0];
+}
+
 static int arm_lpae_map(struct io_pgtable_ops *ops, unsigned long iova,
 			phys_addr_t paddr, size_t size, int iommu_prot)
 {
 	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
-	arm_lpae_iopte *ptep = data->pgd;
+	arm_lpae_iopte *ptep;
 	int ret, lvl = ARM_LPAE_START_LVL(data);
 	arm_lpae_iopte prot;
 
+	ptep = arm_lpae_get_table(data, iova);
+
 	/* If no access, then nothing to do */
 	if (!(iommu_prot & (IOMMU_READ | IOMMU_WRITE)))
 		return 0;
 
-	if (WARN_ON(iova >= (1ULL << data->iop.cfg.ias) ||
-		    paddr >= (1ULL << data->iop.cfg.oas)))
+	if (WARN_ON(paddr >= (1ULL << data->iop.cfg.oas)))
+		return -ERANGE;
+
+	if (WARN_ON(!(data->iop.cfg.quirks & IO_PGTABLE_QUIRK_ARM_TTBR1) &&
+		    iova >= (1ULL << data->iop.cfg.ias)))
 		return -ERANGE;
 
 	prot = arm_lpae_prot_to_pte(data, iommu_prot);
@@ -456,7 +484,10 @@ static void arm_lpae_free_pgtable(struct io_pgtable *iop)
 {
 	struct arm_lpae_io_pgtable *data = io_pgtable_to_data(iop);
 
-	__arm_lpae_free_pgtable(data, ARM_LPAE_START_LVL(data), data->pgd);
+	__arm_lpae_free_pgtable(data, ARM_LPAE_START_LVL(data), data->pgd[0]);
+	if (data->pgd[1])
+		__arm_lpae_free_pgtable(data, ARM_LPAE_START_LVL(data),
+			data->pgd[1]);
 	kfree(data);
 }
 
@@ -564,10 +595,13 @@ static int arm_lpae_unmap(struct io_pgtable_ops *ops, unsigned long iova,
 			  size_t size)
 {
 	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
-	arm_lpae_iopte *ptep = data->pgd;
+	arm_lpae_iopte *ptep;
 	int lvl = ARM_LPAE_START_LVL(data);
 
-	if (WARN_ON(iova >= (1ULL << data->iop.cfg.ias)))
+	ptep = arm_lpae_get_table(data, iova);
+
+	if (WARN_ON(!(data->iop.cfg.quirks & IO_PGTABLE_QUIRK_ARM_TTBR1) &&
+		    iova >= (1ULL << data->iop.cfg.ias)))
 		return 0;
 
 	return __arm_lpae_unmap(data, iova, size, lvl, ptep);
@@ -577,9 +611,11 @@ static phys_addr_t arm_lpae_iova_to_phys(struct io_pgtable_ops *ops,
 					 unsigned long iova)
 {
 	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
-	arm_lpae_iopte pte, *ptep = data->pgd;
+	arm_lpae_iopte pte, *ptep;
 	int lvl = ARM_LPAE_START_LVL(data);
 
+	ptep = arm_lpae_get_table(data, iova);
+
 	do {
 		/* Valid IOPTE pointer? */
 		if (!ptep)
@@ -689,13 +725,82 @@ arm_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg)
 	return data;
 }
 
+static u64 arm_64_lpae_setup_ttbr1(struct io_pgtable_cfg *cfg,
+		struct arm_lpae_io_pgtable *data)
+
+{
+	u64 reg;
+
+	/* If TTBR1 is disabled, disable speculative walks through the TTBR1 */
+	if (!(cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1)) {
+		reg = ARM_LPAE_TCR_EPD1;
+		reg |= (ARM_LPAE_TCR_SEP_UPSTREAM << ARM_LPAE_TCR_SEP_SHIFT);
+		return reg;
+	}
+
+	reg = (ARM_LPAE_TCR_SH_IS << ARM_LPAE_TCR_SH1_SHIFT) |
+	      (ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_IRGN1_SHIFT) |
+	      (ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_ORGN1_SHIFT);
+
+	switch (1 << data->pg_shift) {
+	case SZ_4K:
+		reg |= ARM_LPAE_TCR_TG1_4K;
+		break;
+	case SZ_16K:
+		reg |= ARM_LPAE_TCR_TG1_16K;
+		break;
+	case SZ_64K:
+		reg |= ARM_LPAE_TCR_TG1_64K;
+		break;
+	}
+
+	/* Set T1SZ */
+	reg |= (64ULL - cfg->ias) << ARM_LPAE_TCR_T1SZ_SHIFT;
+
+	/* Set the SEP bit based on the size */
+	switch (cfg->ias) {
+	case 32:
+		reg |= (ARM_LPAE_TCR_SEP_31 << ARM_LPAE_TCR_SEP_SHIFT);
+		break;
+	case 36:
+		reg |= (ARM_LPAE_TCR_SEP_35 << ARM_LPAE_TCR_SEP_SHIFT);
+		break;
+	case 40:
+		reg |= (ARM_LPAE_TCR_SEP_39 << ARM_LPAE_TCR_SEP_SHIFT);
+		break;
+	case 42:
+		reg |= (ARM_LPAE_TCR_SEP_41 << ARM_LPAE_TCR_SEP_SHIFT);
+		break;
+	case 44:
+		reg |= (ARM_LPAE_TCR_SEP_43 << ARM_LPAE_TCR_SEP_SHIFT);
+		break;
+	case 48:
+		/*
+		 * If ias is 48 then that probably means that the UBS on the
+		 * device was 0101b (49) which is a special case that assumes
+		 * bit 48 is the sign extension bit. In this case we are
+		 * expected to use ARM_LPAE_TCR_SEP_UPSTREAM to use bit 48 as
+		 * the extension bit. One might be confused because there is
+		 * also an option to set the SEP to bit 47 but this is probably
+		 * not what the arm-smmu driver intended.
+		 */
+	default:
+		reg |= (ARM_LPAE_TCR_SEP_UPSTREAM << ARM_LPAE_TCR_SEP_SHIFT);
+		break;
+	}
+
+	return reg;
+}
+
 static struct io_pgtable *
 arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
 {
 	u64 reg;
 	struct arm_lpae_io_pgtable *data;
 
-	if (cfg->quirks & ~(IO_PGTABLE_QUIRK_ARM_NS | IO_PGTABLE_QUIRK_NO_DMA))
+	if (cfg->quirks & ~(IO_PGTABLE_QUIRK_ARM_NS |
+			IO_PGTABLE_QUIRK_NO_DMA |
+			IO_PGTABLE_QUIRK_ARM_TTBR1))
 		return NULL;
 
 	data = arm_lpae_alloc_pgtable(cfg);
@@ -744,8 +849,9 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
 
 	reg |= (64ULL - cfg->ias) << ARM_LPAE_TCR_T0SZ_SHIFT;
 
-	/* Disable speculative walks through TTBR1 */
-	reg |= ARM_LPAE_TCR_EPD1;
+	/* Bring in the TTBR1 configuration */
+	reg |= arm_64_lpae_setup_ttbr1(cfg, data);
+
 	cfg->arm_lpae_s1_cfg.tcr = reg;
 
 	/* MAIRs */
@@ -760,16 +866,32 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
 	cfg->arm_lpae_s1_cfg.mair[1] = 0;
 
 	/* Looking good; allocate a pgd */
-	data->pgd = __arm_lpae_alloc_pages(data->pgd_size, GFP_KERNEL, cfg);
-	if (!data->pgd)
+	data->pgd[0] = __arm_lpae_alloc_pages(data->pgd_size, GFP_KERNEL, cfg);
+	if (!data->pgd[0])
 		goto out_free_data;
 
+
+	if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1) {
+		data->pgd[1] = __arm_lpae_alloc_pages(data->pgd_size,
+			GFP_KERNEL, cfg);
+		if (!data->pgd[1]) {
+			__arm_lpae_free_pages(data->pgd[0], data->pgd_size,
+				cfg);
+			goto out_free_data;
+		}
+	} else {
+		data->pgd[1] = NULL;
+	}
+
 	/* Ensure the empty pgd is visible before any actual TTBR write */
 	wmb();
 
 	/* TTBRs */
-	cfg->arm_lpae_s1_cfg.ttbr[0] = virt_to_phys(data->pgd);
-	cfg->arm_lpae_s1_cfg.ttbr[1] = 0;
+	cfg->arm_lpae_s1_cfg.ttbr[0] = virt_to_phys(data->pgd[0]);
+
+	if (data->pgd[1])
+		cfg->arm_lpae_s1_cfg.ttbr[1] = virt_to_phys(data->pgd[1]);
+
 	return &data->iop;
 
 out_free_data:
@@ -854,15 +976,15 @@ arm_64_lpae_alloc_pgtable_s2(struct io_pgtable_cfg *cfg, void *cookie)
 	cfg->arm_lpae_s2_cfg.vtcr = reg;
 
 	/* Allocate pgd pages */
-	data->pgd = __arm_lpae_alloc_pages(data->pgd_size, GFP_KERNEL, cfg);
-	if (!data->pgd)
+	data->pgd[0] = __arm_lpae_alloc_pages(data->pgd_size, GFP_KERNEL, cfg);
+	if (!data->pgd[0])
 		goto out_free_data;
 
 	/* Ensure the empty pgd is visible before any actual TTBR write */
 	wmb();
 
 	/* VTTBR */
-	cfg->arm_lpae_s2_cfg.vttbr = virt_to_phys(data->pgd);
+	cfg->arm_lpae_s2_cfg.vttbr = virt_to_phys(data->pgd[0]);
 	return &data->iop;
 
 out_free_data:
@@ -960,7 +1082,7 @@ static void __init arm_lpae_dump_ops(struct io_pgtable_ops *ops)
 		cfg->pgsize_bitmap, cfg->ias);
 	pr_err("data: %d levels, 0x%zx pgd_size, %lu pg_shift, %lu bits_per_level, pgd @ %p\n",
 		data->levels, data->pgd_size, data->pg_shift,
-		data->bits_per_level, data->pgd);
+		data->bits_per_level, data->pgd[0]);
 }
 
 #define __FAIL(ops, i)	({						\
diff --git a/drivers/iommu/io-pgtable-arm.h b/drivers/iommu/io-pgtable-arm.h
index cb31314971ac..6344b1d359a5 100644
--- a/drivers/iommu/io-pgtable-arm.h
+++ b/drivers/iommu/io-pgtable-arm.h
@@ -25,14 +25,21 @@
 #define ARM_LPAE_TCR_TG0_64K		(1 << 14)
 #define ARM_LPAE_TCR_TG0_16K		(2 << 14)
 
+#define ARM_LPAE_TCR_TG1_16K            (1 << 30)
+#define ARM_LPAE_TCR_TG1_4K             (2 << 30)
+#define ARM_LPAE_TCR_TG1_64K            (3 << 30)
+
 #define ARM_LPAE_TCR_SH0_SHIFT		12
+#define ARM_LPAE_TCR_SH1_SHIFT		28
 #define ARM_LPAE_TCR_SH0_MASK		0x3
 #define ARM_LPAE_TCR_SH_NS		0
 #define ARM_LPAE_TCR_SH_OS		2
 #define ARM_LPAE_TCR_SH_IS		3
 
 #define ARM_LPAE_TCR_ORGN0_SHIFT	10
+#define ARM_LPAE_TCR_ORGN1_SHIFT	26
 #define ARM_LPAE_TCR_IRGN0_SHIFT	8
+#define ARM_LPAE_TCR_IRGN1_SHIFT	24
 #define ARM_LPAE_TCR_RGN_MASK		0x3
 #define ARM_LPAE_TCR_RGN_NC		0
 #define ARM_LPAE_TCR_RGN_WBWA		1
@@ -45,6 +52,9 @@
 #define ARM_LPAE_TCR_T0SZ_SHIFT		0
 #define ARM_LPAE_TCR_SZ_MASK		0x3f
 
+#define ARM_LPAE_TCR_T1SZ_SHIFT         16
+#define ARM_LPAE_TCR_T1SZ_MASK          0x3f
+
 #define ARM_LPAE_TCR_PS_SHIFT		16
 #define ARM_LPAE_TCR_PS_MASK		0x7
 
@@ -58,6 +68,16 @@
 #define ARM_LPAE_TCR_PS_44_BIT		0x4ULL
 #define ARM_LPAE_TCR_PS_48_BIT		0x5ULL
 
+#define ARM_LPAE_TCR_SEP_SHIFT		(15 + 32)
+
+#define ARM_LPAE_TCR_SEP_31		0x0ULL
+#define ARM_LPAE_TCR_SEP_35		0x1ULL
+#define ARM_LPAE_TCR_SEP_39		0x2ULL
+#define ARM_LPAE_TCR_SEP_41		0x3ULL
+#define ARM_LPAE_TCR_SEP_43		0x4ULL
+#define ARM_LPAE_TCR_SEP_47		0x5ULL
+#define ARM_LPAE_TCR_SEP_UPSTREAM	0x7ULL
+
 #define ARM_LPAE_MAIR_ATTR_SHIFT(n)	((n) << 3)
 #define ARM_LPAE_MAIR_ATTR_MASK		0xff
 #define ARM_LPAE_MAIR_ATTR_DEVICE	0x04
diff --git a/drivers/iommu/io-pgtable.h b/drivers/iommu/io-pgtable.h
index cd2e1eafffe6..55f7b60cc44d 100644
--- a/drivers/iommu/io-pgtable.h
+++ b/drivers/iommu/io-pgtable.h
@@ -71,12 +71,18 @@ struct io_pgtable_cfg {
 	 *	be accessed by a fully cache-coherent IOMMU or CPU (e.g. for a
 	 *	software-emulated IOMMU), such that pagetable updates need not
 	 *	be treated as explicit DMA data.
+	 *
+	 * IO_PGTABLE_QUIRK_ARM_TTBR1: Specifies that TTBR1 has been enabled on
+	 *	this domain. Set up the configuration registers and dynamically
+	 *	choose which pagetable (TTBR0 or TTBR1) a mapping should go into
+	 *	based on the address.
 	 */
 	#define IO_PGTABLE_QUIRK_ARM_NS		BIT(0)
 	#define IO_PGTABLE_QUIRK_NO_PERMS	BIT(1)
 	#define IO_PGTABLE_QUIRK_TLBI_ON_MAP	BIT(2)
 	#define IO_PGTABLE_QUIRK_ARM_MTK_4GB	BIT(3)
 	#define IO_PGTABLE_QUIRK_NO_DMA		BIT(4)
+	#define IO_PGTABLE_QUIRK_ARM_TTBR1      BIT(5)
 	unsigned long			quirks;
 	unsigned long			pgsize_bitmap;
 	unsigned int			ias;
@@ -173,18 +179,22 @@ struct io_pgtable {
 
 static inline void io_pgtable_tlb_flush_all(struct io_pgtable *iop)
 {
-	iop->cfg.tlb->tlb_flush_all(iop->cookie);
+	if (iop->cfg.tlb)
+		iop->cfg.tlb->tlb_flush_all(iop->cookie);
 }
 
 static inline void io_pgtable_tlb_add_flush(struct io_pgtable *iop,
 		unsigned long iova, size_t size, size_t granule, bool leaf)
 {
-	iop->cfg.tlb->tlb_add_flush(iova, size, granule, leaf, iop->cookie);
+	if (iop->cfg.tlb)
+		iop->cfg.tlb->tlb_add_flush(iova, size, granule, leaf,
+			iop->cookie);
 }
 
 static inline void io_pgtable_tlb_sync(struct io_pgtable *iop)
 {
-	iop->cfg.tlb->tlb_sync(iop->cookie);
+	if (iop->cfg.tlb)
+		iop->cfg.tlb->tlb_sync(iop->cookie);
 }
 
 /**
-- 
2.16.1

* [PATCH 03/14] iommu: Create a base struct for io_mm
From: Jordan Crouse @ 2018-02-21 22:59 UTC
  To: freedreno@lists.freedesktop.org
  Cc: jean-philippe.brucker@arm.com, linux-arm-msm@vger.kernel.org,
	dri-devel@lists.freedesktop.org, tfiga@chromium.org,
	iommu@lists.linux-foundation.org, vivek.gautam@codeaurora.org,
	linux-arm-kernel@lists.infradead.org

In order to support both shared-mm SVA pagetables and io-pgtable
backed tables, add a base structure to io_mm so that the two styles
can share the same idr.
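
For illustration, a second user of the shared idr would embed the same
base struct and disambiguate by type before using container_of() (the
io_pgtable-backed type below is hypothetical; this series adds its own
in a later patch):

	struct io_pgtable_mm {			/* hypothetical */
		struct io_base		base;	/* base.type != IO_TYPE_MM */
		struct io_pgtable_ops	*ops;
	};

	struct io_base *io_base = idr_find(&iommu_pasid_idr, pasid);

	if (io_base && io_base->type == IO_TYPE_MM)
		io_mm = container_of(io_base, struct io_mm, base);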

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
---
 drivers/iommu/arm-smmu-v3.c |  8 ++++----
 drivers/iommu/iommu-sva.c   | 50 ++++++++++++++++++++++++++++++---------------
 include/linux/iommu.h       | 11 +++++++++-
 3 files changed, 47 insertions(+), 22 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 26935a9a5a97..4736a2bf39cf 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -2454,7 +2454,7 @@ static int arm_smmu_mm_attach(struct iommu_domain *domain, struct device *dev,
 	if (!attach_domain)
 		return 0;
 
-	return ops->set_entry(ops, io_mm->pasid, smmu_mm->cd);
+	return ops->set_entry(ops, io_mm->base.pasid, smmu_mm->cd);
 }
 
 static void arm_smmu_mm_detach(struct iommu_domain *domain, struct device *dev,
@@ -2466,9 +2466,9 @@ static void arm_smmu_mm_detach(struct iommu_domain *domain, struct device *dev,
 	struct arm_smmu_master_data *master = dev->iommu_fwspec->iommu_priv;
 
 	if (detach_domain)
-		ops->clear_entry(ops, io_mm->pasid, smmu_mm->cd);
+		ops->clear_entry(ops, io_mm->base.pasid, smmu_mm->cd);
 
-	arm_smmu_atc_inv_master_all(master, io_mm->pasid);
+	arm_smmu_atc_inv_master_all(master, io_mm->base.pasid);
 	/* TODO: Invalidate all mappings if last and not DVM. */
 }
 
@@ -2478,7 +2478,7 @@ static void arm_smmu_mm_invalidate(struct iommu_domain *domain,
 {
 	struct arm_smmu_master_data *master = dev->iommu_fwspec->iommu_priv;
 
-	arm_smmu_atc_inv_master_range(master, io_mm->pasid, iova, size);
+	arm_smmu_atc_inv_master_range(master, io_mm->base.pasid, iova, size);
 	/*
 	 * TODO: Invalidate mapping if not DVM
 	 */
diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
index d7b231cd7355..5fc689b1ef72 100644
--- a/drivers/iommu/iommu-sva.c
+++ b/drivers/iommu/iommu-sva.c
@@ -161,13 +161,15 @@ io_mm_alloc(struct iommu_domain *domain, struct device *dev,
 	io_mm->mm		= mm;
 	io_mm->notifier.ops	= &iommu_mmu_notifier;
 	io_mm->release		= domain->ops->mm_free;
+	io_mm->base.type	= IO_TYPE_MM;
+
 	INIT_LIST_HEAD(&io_mm->devices);
 
 	idr_preload(GFP_KERNEL);
 	spin_lock(&iommu_sva_lock);
-	pasid = idr_alloc_cyclic(&iommu_pasid_idr, io_mm, dev_param->min_pasid,
-				 dev_param->max_pasid + 1, GFP_ATOMIC);
-	io_mm->pasid = pasid;
+	pasid = idr_alloc_cyclic(&iommu_pasid_idr, &io_mm->base,
+		dev_param->min_pasid, dev_param->max_pasid + 1, GFP_ATOMIC);
+	io_mm->base.pasid = pasid;
 	spin_unlock(&iommu_sva_lock);
 	idr_preload_end();
 
@@ -200,7 +202,7 @@ io_mm_alloc(struct iommu_domain *domain, struct device *dev,
 	 * 0 so no user could get a reference to it. Free it manually.
 	 */
 	spin_lock(&iommu_sva_lock);
-	idr_remove(&iommu_pasid_idr, io_mm->pasid);
+	idr_remove(&iommu_pasid_idr, io_mm->base.pasid);
 	spin_unlock(&iommu_sva_lock);
 
 err_free_mm:
@@ -231,7 +233,7 @@ static void io_mm_release(struct kref *kref)
 	io_mm = container_of(kref, struct io_mm, kref);
 	WARN_ON(!list_empty(&io_mm->devices));
 
-	idr_remove(&iommu_pasid_idr, io_mm->pasid);
+	idr_remove(&iommu_pasid_idr, io_mm->base.pasid);
 
 	/*
 	 * If we're being released from mm exit, the notifier callback ->release
@@ -286,7 +288,7 @@ static int io_mm_attach(struct iommu_domain *domain, struct device *dev,
 {
 	int ret;
 	bool attach_domain = true;
-	int pasid = io_mm->pasid;
+	int pasid = io_mm->base.pasid;
 	struct iommu_bond *bond, *tmp;
 	struct iommu_param *dev_param = dev->iommu_param;
 
@@ -378,7 +380,7 @@ static int iommu_signal_mm_exit(struct iommu_bond *bond)
 	if (!dev->iommu_param || !dev->iommu_param->mm_exit)
 		return 0;
 
-	return dev->iommu_param->mm_exit(dev, io_mm->pasid, bond->drvdata);
+	return dev->iommu_param->mm_exit(dev, io_mm->base.pasid, bond->drvdata);
 }
 
 /*
@@ -410,7 +412,7 @@ static void iommu_notifier_release(struct mmu_notifier *mn, struct mm_struct *mm
 	list_for_each_entry_safe(bond, next, &io_mm->devices, mm_head) {
 		if (iommu_signal_mm_exit(bond))
 			dev_WARN(bond->dev, "possible leak of PASID %u",
-				 io_mm->pasid);
+				 io_mm->base.pasid);
 
 		io_mm_detach_all_locked(bond);
 	}
@@ -585,6 +587,7 @@ int iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, int *pasid,
 			  unsigned long flags, void *drvdata)
 {
 	int i, ret;
+	struct io_base *io_base = NULL;
 	struct io_mm *io_mm = NULL;
 	struct iommu_domain *domain;
 	struct iommu_bond *bond = NULL, *tmp;
@@ -605,7 +608,12 @@ int iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, int *pasid,
 
 	/* If an io_mm already exists, use it */
 	spin_lock(&iommu_sva_lock);
-	idr_for_each_entry(&iommu_pasid_idr, io_mm, i) {
+	idr_for_each_entry(&iommu_pasid_idr, io_base, i) {
+		if (io_base->type != IO_TYPE_MM)
+			continue;
+
+		io_mm = container_of(io_base, struct io_mm, base);
+
 		if (io_mm->mm != mm || !io_mm_get_locked(io_mm))
 			continue;
 
@@ -636,7 +644,7 @@ int iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, int *pasid,
 	if (ret)
 		io_mm_put(io_mm);
 	else
-		*pasid = io_mm->pasid;
+		*pasid = io_mm->base.pasid;
 
 	return ret;
 }
@@ -659,6 +667,7 @@ EXPORT_SYMBOL_GPL(iommu_sva_bind_device);
 int iommu_sva_unbind_device(struct device *dev, int pasid)
 {
 	int ret = -ESRCH;
+	struct io_base *io_base;
 	struct io_mm *io_mm;
 	struct iommu_domain *domain;
 	struct iommu_bond *bond = NULL;
@@ -674,12 +683,14 @@ int iommu_sva_unbind_device(struct device *dev, int pasid)
 	iommu_fault_queue_flush(dev);
 
 	spin_lock(&iommu_sva_lock);
-	io_mm = idr_find(&iommu_pasid_idr, pasid);
-	if (!io_mm) {
+	io_base = idr_find(&iommu_pasid_idr, pasid);
+	if (!io_base || io_base->type != IO_TYPE_MM) {
 		spin_unlock(&iommu_sva_lock);
 		return -ESRCH;
 	}
 
+	io_mm = container_of(io_base, struct io_mm, base);
+
 	list_for_each_entry(bond, &io_mm->devices, mm_head) {
 		if (bond->dev == dev) {
 			io_mm_detach_locked(bond);
@@ -777,16 +788,21 @@ EXPORT_SYMBOL_GPL(iommu_unregister_mm_exit_handler);
  */
 struct mm_struct *iommu_sva_find(int pasid)
 {
+	struct io_base *io_base;
 	struct io_mm *io_mm;
 	struct mm_struct *mm = NULL;
 
 	spin_lock(&iommu_sva_lock);
-	io_mm = idr_find(&iommu_pasid_idr, pasid);
-	if (io_mm && io_mm_get_locked(io_mm)) {
-		if (mmget_not_zero(io_mm->mm))
-			mm = io_mm->mm;
+	io_base = idr_find(&iommu_pasid_idr, pasid);
+	if (io_base && io_base->type == IO_TYPE_MM) {
+		io_mm = container_of(io_base, struct io_mm, base);
+
+		if (io_mm_get_locked(io_mm)) {
+			if (mmget_not_zero(io_mm->mm))
+				mm = io_mm->mm;
 
-		io_mm_put_locked(io_mm);
+			io_mm_put_locked(io_mm);
+		}
 	}
 	spin_unlock(&iommu_sva_lock);
 
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index e2c49e583d8d..e998389cf195 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -110,8 +110,17 @@ struct iommu_domain {
 	struct list_head mm_list;
 };
 
+enum iommu_io_type {
+	IO_TYPE_MM,
+};
+
+struct io_base {
+	int type;
+	int pasid;
+};
+
 struct io_mm {
-	int			pasid;
+	struct io_base		base;
 	struct list_head	devices;
 	struct kref		kref;
 #if defined(CONFIG_MMU_NOTIFIER)
-- 
2.16.1

_______________________________________________
Freedreno mailing list
Freedreno@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/freedreno

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 04/14] iommu: sva: Add support for pasid allocation
  2018-02-21 22:59 ` Jordan Crouse
@ 2018-02-21 22:59     ` Jordan Crouse
  -1 siblings, 0 replies; 46+ messages in thread
From: Jordan Crouse @ 2018-02-21 22:59 UTC (permalink / raw)
  To: freedreno-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: jean-philippe.brucker-5wv7dgnIgG8,
	linux-arm-msm-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	tfiga-F7+t8E8rja9g9hUCZPvPmw,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	vivek.gautam-sgV2jX0FEOL9JmXXK+q4OQ,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

Some older SMMU implementations that do not have a fully featured
PASID model have alternate workarounds for using multiple pagetables.
For example, MSM GPUs have logic to switch the user pagetable from
hardware by writing the context bank registers directly.

Instead of binding and sharing CPU pagetables, these implementations
need to create a new pagetable structure and populate it manually.
Add a new set of API functions to create and populate a pagetable
structure identified by a pasid.
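
As a hypothetical usage sketch (domain, dev, iova and paddr are
placeholders and SZ_4K is just an example size; none of this is part
of the patch itself):

	int pasid = iommu_sva_alloc_pasid(domain, dev);

	if (pasid < 0)
		return pasid;

	ret = iommu_sva_map(pasid, iova, paddr, SZ_4K,
			    IOMMU_READ | IOMMU_WRITE);
	if (ret) {
		iommu_sva_free_pasid(pasid);
		return ret;
	}

	/* ... the device walks the pasid's pagetable here ... */

	iommu_sva_unmap(pasid, iova, SZ_4K);
	iommu_sva_free_pasid(pasid);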

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
---
 drivers/iommu/iommu-sva.c | 239 ++++++++++++++++++++++++++++++++++++++++++++++
 drivers/iommu/iommu.c     |   3 +-
 include/linux/iommu.h     |  56 +++++++++++
 3 files changed, 297 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/iommu-sva.c b/drivers/iommu/iommu-sva.c
index 5fc689b1ef72..c48fde5b0bbd 100644
--- a/drivers/iommu/iommu-sva.c
+++ b/drivers/iommu/iommu-sva.c
@@ -809,3 +809,242 @@ struct mm_struct *iommu_sva_find(int pasid)
 	return mm;
 }
 EXPORT_SYMBOL_GPL(iommu_sva_find);
+
+int iommu_sva_alloc_pasid(struct iommu_domain *domain, struct device *dev)
+{
+	int ret, pasid;
+	struct io_pasid *io_pasid;
+
+	if (!domain->ops->pasid_alloc || !domain->ops->pasid_free)
+		return -ENODEV;
+
+	io_pasid = kzalloc(sizeof(*io_pasid), GFP_KERNEL);
+	if (!io_pasid)
+		return -ENOMEM;
+
+	io_pasid->domain = domain;
+	io_pasid->base.type = IO_TYPE_PASID;
+
+	idr_preload(GFP_KERNEL);
+	spin_lock(&iommu_sva_lock);
+	pasid = idr_alloc_cyclic(&iommu_pasid_idr, &io_pasid->base,
+		1, (1 << 31), GFP_ATOMIC);
+	io_pasid->base.pasid = pasid;
+	spin_unlock(&iommu_sva_lock);
+	idr_preload_end();
+
+	if (pasid < 0) {
+		kfree(io_pasid);
+		return pasid;
+	}
+
+	ret = domain->ops->pasid_alloc(domain, dev, pasid);
+	if (!ret)
+		return pasid;
+
+	spin_lock(&iommu_sva_lock);
+	idr_remove(&iommu_pasid_idr, io_pasid->base.pasid);
+	spin_unlock(&iommu_sva_lock);
+
+	kfree(io_pasid);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_sva_alloc_pasid);
+
+static struct io_pasid *get_io_pasid(int pasid)
+{
+	struct io_base *io_base;
+	struct io_pasid *io_pasid = NULL;
+
+	spin_lock(&iommu_sva_lock);
+	io_base = idr_find(&iommu_pasid_idr, pasid);
+	if (io_base && io_base->type == IO_TYPE_PASID)
+		io_pasid = container_of(io_base, struct io_pasid, base);
+	spin_unlock(&iommu_sva_lock);
+
+	return io_pasid;
+}
+
+int iommu_sva_map(int pasid, unsigned long iova,
+	      phys_addr_t paddr, size_t size, int prot)
+{
+	unsigned long orig_iova = iova;
+	unsigned int min_pagesz;
+	size_t orig_size = size;
+	struct io_pasid *io_pasid;
+	struct iommu_domain *domain;
+	int ret = 0;
+
+	io_pasid = get_io_pasid(pasid);
+	if (!io_pasid)
+		return -ENODEV;
+
+	domain = io_pasid->domain;
+
+	if (unlikely(domain->ops->sva_map == NULL ||
+		     domain->pgsize_bitmap == 0UL))
+		return -ENODEV;
+
+	/* find out the minimum page size supported */
+	min_pagesz = 1 << __ffs(domain->pgsize_bitmap);
+
+	/*
+	 * both the virtual address and the physical one, as well as
+	 * the size of the mapping, must be aligned (at least) to the
+	 * size of the smallest page supported by the hardware
+	 */
+	if (!IS_ALIGNED(iova | paddr | size, min_pagesz)) {
+		pr_err("unaligned: iova 0x%lx pa %pa size 0x%zx min_pagesz 0x%x\n",
+		       iova, &paddr, size, min_pagesz);
+		return -EINVAL;
+	}
+
+	while (size) {
+		size_t pgsize = iommu_pgsize(domain, iova | paddr, size);
+
+		pr_debug("mapping: iova 0x%lx pa %pa pgsize 0x%zx\n",
+			 iova, &paddr, pgsize);
+
+		ret = domain->ops->sva_map(domain, pasid, iova, paddr, pgsize,
+			prot);
+		if (ret)
+			break;
+
+		iova += pgsize;
+		paddr += pgsize;
+		size -= pgsize;
+	}
+
+	/* unroll mapping in case something went wrong */
+	if (ret)
+		iommu_sva_unmap(pasid, orig_iova, orig_size - size);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_sva_map);
+
+size_t iommu_sva_map_sg(int pasid, unsigned long iova,
+		struct scatterlist *sg, unsigned int nents, int prot)
+{
+	struct io_pasid *io_pasid;
+	struct iommu_domain *domain;
+	struct scatterlist *s;
+	size_t mapped = 0;
+	unsigned int i, min_pagesz;
+	int ret;
+
+	io_pasid = get_io_pasid(pasid);
+	if (!io_pasid)
+		return 0;	/* size_t return: signal failure with 0 */
+
+	domain = io_pasid->domain;
+
+	if (unlikely(domain->pgsize_bitmap == 0UL))
+		return 0;
+
+	min_pagesz = 1 << __ffs(domain->pgsize_bitmap);
+
+	for_each_sg(sg, s, nents, i) {
+		phys_addr_t phys = page_to_phys(sg_page(s)) + s->offset;
+
+		/*
+		 * We are mapping on IOMMU page boundaries, so offset within
+		 * the page must be 0. However, the IOMMU may support pages
+		 * smaller than PAGE_SIZE, so s->offset may still represent
+		 * an offset of that boundary within the CPU page.
+		 */
+		if (!IS_ALIGNED(s->offset, min_pagesz))
+			goto out_err;
+
+		ret = iommu_sva_map(pasid, iova + mapped, phys, s->length,
+			prot);
+		if (ret)
+			goto out_err;
+
+		mapped += s->length;
+	}
+
+	return mapped;
+
+out_err:
+	/* undo mappings already done */
+	iommu_sva_unmap(pasid, iova, mapped);
+
+	return 0;
+
+}
+EXPORT_SYMBOL_GPL(iommu_sva_map_sg);
+
+size_t iommu_sva_unmap(int pasid, unsigned long iova, size_t size)
+{
+	const struct iommu_ops *ops;
+	struct io_pasid *io_pasid;
+	struct iommu_domain *domain;
+	size_t unmapped_page, unmapped = 0;
+	unsigned int min_pagesz;
+
+	io_pasid = get_io_pasid(pasid);
+	if (!io_pasid)
+		return 0;	/* size_t return: signal failure with 0 */
+
+	domain = io_pasid->domain;
+	ops = domain->ops;
+
+	if (unlikely(ops->sva_unmap == NULL ||
+		     domain->pgsize_bitmap == 0UL))
+		return 0;
+
+	/* find out the minimum page size supported */
+	min_pagesz = 1 << __ffs(domain->pgsize_bitmap);
+
+	/*
+	 * The virtual address, as well as the size of the mapping, must be
+	 * aligned (at least) to the size of the smallest page supported
+	 * by the hardware
+	 */
+	if (!IS_ALIGNED(iova | size, min_pagesz)) {
+		pr_err("unaligned: iova 0x%lx size 0x%zx min_pagesz 0x%x\n",
+		       iova, size, min_pagesz);
+		return -EINVAL;
+	}
+
+	/*
+	 * Keep iterating until we either unmap 'size' bytes (or more)
+	 * or we hit an area that isn't mapped.
+	 */
+	while (unmapped < size) {
+		size_t pgsize = iommu_pgsize(domain, iova, size - unmapped);
+
+		unmapped_page = ops->sva_unmap(domain, pasid, iova, pgsize);
+		if (!unmapped_page)
+			break;
+
+		iova += unmapped_page;
+		unmapped += unmapped_page;
+	}
+
+	return unmapped;
+}
+EXPORT_SYMBOL_GPL(iommu_sva_unmap);
+
+void iommu_sva_free_pasid(int pasid)
+{
+	struct io_pasid *io_pasid;
+	struct iommu_domain *domain;
+
+	io_pasid = get_io_pasid(pasid);
+	if (!io_pasid)
+		return;
+
+	domain = io_pasid->domain;
+
+	domain->ops->pasid_free(domain, pasid);
+
+	spin_lock(&iommu_sva_lock);
+	idr_remove(&iommu_pasid_idr, io_pasid->base.pasid);
+	spin_unlock(&iommu_sva_lock);
+
+	kfree(io_pasid);
+}
+EXPORT_SYMBOL_GPL(iommu_sva_free_pasid);
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 7f8395b620b1..da3728388069 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1627,7 +1627,7 @@ phys_addr_t iommu_iova_to_phys(struct iommu_domain *domain, dma_addr_t iova)
 }
 EXPORT_SYMBOL_GPL(iommu_iova_to_phys);
 
-static size_t iommu_pgsize(struct iommu_domain *domain,
+size_t iommu_pgsize(struct iommu_domain *domain,
 			   unsigned long addr_merge, size_t size)
 {
 	unsigned int pgsize_idx;
@@ -1658,6 +1658,7 @@ static size_t iommu_pgsize(struct iommu_domain *domain,
 
 	return pgsize;
 }
+EXPORT_SYMBOL_GPL(iommu_pgsize);
 
 int iommu_map(struct iommu_domain *domain, unsigned long iova,
 	      phys_addr_t paddr, size_t size, int prot)
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index e998389cf195..6e34b87655a7 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -112,6 +112,7 @@ struct iommu_domain {
 
 enum iommu_io_type {
 	IO_TYPE_MM,
+	IO_TYPE_PASID,
 };
 
 struct io_base {
@@ -134,6 +135,11 @@ struct io_mm {
 	struct rcu_head		rcu;
 };
 
+struct io_pasid {
+	struct io_base		base;
+	struct iommu_domain	*domain;
+};
+
 enum iommu_cap {
 	IOMMU_CAP_CACHE_COHERENCY,	/* IOMMU can enforce cache coherent DMA
 					   transactions */
@@ -347,6 +353,15 @@ struct iommu_ops {
 	int (*page_response)(struct iommu_domain *domain, struct device *dev,
 			     struct page_response_msg *msg);
 
+	int (*pasid_alloc)(struct iommu_domain *domain, struct device *dev,
+		int pasid);
+	int (*sva_map)(struct iommu_domain *domain, int pasid,
+		       unsigned long iova, phys_addr_t paddr, size_t size,
+		       int prot);
+	size_t (*sva_unmap)(struct iommu_domain *domain, int pasid,
+			    unsigned long iova, size_t size);
+	void (*pasid_free)(struct iommu_domain *domain, int pasid);
+
 	unsigned long pgsize_bitmap;
 };
 
@@ -577,6 +592,9 @@ extern int iommu_domain_get_attr(struct iommu_domain *domain, enum iommu_attr,
 extern int iommu_domain_set_attr(struct iommu_domain *domain, enum iommu_attr,
 				 void *data);
 
+size_t iommu_pgsize(struct iommu_domain *domain,
+			   unsigned long addr_merge, size_t size);
+
 /* Window handling function prototypes */
 extern int iommu_domain_window_enable(struct iommu_domain *domain, u32 wnd_nr,
 				      phys_addr_t offset, u64 size,
@@ -1002,6 +1020,16 @@ extern int iommu_register_mm_exit_handler(struct device *dev,
 extern int iommu_unregister_mm_exit_handler(struct device *dev);
 
 extern struct mm_struct *iommu_sva_find(int pasid);
+
+extern int iommu_sva_alloc_pasid(struct iommu_domain *domain,
+		struct device *dev);
+extern int iommu_sva_map(int pasid, unsigned long iova, phys_addr_t physaddr,
+		size_t size, int prot);
+extern size_t iommu_sva_map_sg(int pasid, unsigned long iova,
+		struct scatterlist *sg, unsigned int nents, int prot);
+extern size_t iommu_sva_unmap(int pasid, unsigned long iova, size_t size);
+extern void iommu_sva_free_pasid(int pasid);
+
 #else /* CONFIG_IOMMU_SVA */
 static inline int iommu_sva_device_init(struct device *dev,
 					unsigned long features,
@@ -1046,6 +1074,34 @@ static inline struct mm_struct *iommu_sva_find(int pasid)
 {
 	return NULL;
 }
+
+static inline int iommu_sva_alloc_pasid(struct iommu_domain *domain,
+		struct device *dev)
+{
+	return -EOPNOTSUPP;
+}
+
+static inline int iommu_sva_map(int pasid, unsigned long iova,
+		phys_addr_t physaddr, size_t size, int prot)
+{
+	return -ENODEV;
+}
+
+
+static inline size_t iommu_sva_map_sg(int pasid, unsigned long iova,
+		struct scatterlist *sg, unsigned int nents, int prot)
+{
+	return 0;
+}
+
+static inline size_t iommu_sva_unmap(int pasid, unsigned long iova, size_t size)
+{
+	return size;
+}
+
+static inline void iommu_sva_free_pasid(int pasid) { }
+
+
 #endif /* CONFIG_IOMMU_SVA */
 
 #ifdef CONFIG_IOMMU_FAULT
-- 
2.16.1

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 05/14] iommu: arm-smmu: Add pasid implementation
  2018-02-21 22:59 ` Jordan Crouse
@ 2018-02-21 22:59     ` Jordan Crouse
  -1 siblings, 0 replies; 46+ messages in thread
From: Jordan Crouse @ 2018-02-21 22:59 UTC (permalink / raw)
  To: freedreno-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: jean-philippe.brucker-5wv7dgnIgG8,
	linux-arm-msm-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	tfiga-F7+t8E8rja9g9hUCZPvPmw,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	vivek.gautam-sgV2jX0FEOL9JmXXK+q4OQ,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

Add support for allocating and populating pagetables
indexed by pasid. Each new pasid is allocated a pagetable
with the same parameters and format as the parent domain.
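
The resulting call flow for a map operation is then (sketch):

	iommu_sva_map(pasid, iova, paddr, size, prot)
	    -> domain->ops->sva_map()	    /* arm_smmu_sva_map() */
		-> arm_smmu_get_pasid(smmu_domain, pasid)
		-> obj->pgtbl_ops->map(obj->pgtbl_ops, iova, paddr,
				       size, prot)

so each pasid is backed by its own io-pgtable instance rather than the
pagetable attached to the context bank.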

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
---
 drivers/iommu/arm-smmu.c | 148 +++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 143 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index ebfa59b59622..42f5bfa3e26e 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -250,6 +250,9 @@ struct arm_smmu_domain {
 	spinlock_t			cb_lock; /* Serialises ATS1* ops and TLB syncs */
 	u32 attributes;
 	struct iommu_domain		domain;
+
+	spinlock_t			pasid_lock;
+	struct list_head		pasid_list;
 };
 
 struct arm_smmu_option_prop {
@@ -257,6 +260,139 @@ struct arm_smmu_option_prop {
 	const char *prop;
 };
 
+static struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom)
+{
+	return container_of(dom, struct arm_smmu_domain, domain);
+}
+
+struct arm_smmu_pasid {
+	struct iommu_domain *domain;
+	struct io_pgtable_ops		*pgtbl_ops;
+	struct list_head node;
+	int pasid;
+};
+
+static struct arm_smmu_pasid *arm_smmu_get_pasid(struct arm_smmu_domain *smmu_domain,
+		int pasid)
+{
+	struct arm_smmu_pasid *node, *obj = NULL;
+
+	spin_lock(&smmu_domain->pasid_lock);
+	list_for_each_entry(node, &smmu_domain->pasid_list, node) {
+		if (node->pasid == pasid) {
+			obj = node;
+			break;
+		}
+	}
+	spin_unlock(&smmu_domain->pasid_lock);
+
+	return obj;
+}
+
+static void arm_smmu_pasid_free(struct iommu_domain *domain, int pasid)
+{
+	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+	struct arm_smmu_pasid *node, *obj = NULL;
+
+	spin_lock(&smmu_domain->pasid_lock);
+	list_for_each_entry(node, &smmu_domain->pasid_list, node) {
+		if (node->pasid == pasid) {
+			obj = node;
+			list_del(&obj->node);
+			break;
+		}
+	}
+	spin_unlock(&smmu_domain->pasid_lock);
+
+	if (obj)
+		free_io_pgtable_ops(obj->pgtbl_ops);
+
+	kfree(obj);
+}
+
+static size_t arm_smmu_sva_unmap(struct iommu_domain *domain, int pasid,
+		unsigned long iova, size_t size)
+{
+	struct arm_smmu_pasid *obj =
+		arm_smmu_get_pasid(to_smmu_domain(domain), pasid);
+
+	if (!obj)
+		return 0;	/* size_t return: signal failure with 0 */
+
+	return obj->pgtbl_ops->unmap(obj->pgtbl_ops, iova, size);
+}
+
+
+static int arm_smmu_sva_map(struct iommu_domain *domain, int pasid,
+		unsigned long iova, phys_addr_t paddr, size_t size, int prot)
+{
+	struct arm_smmu_pasid *obj =
+		arm_smmu_get_pasid(to_smmu_domain(domain), pasid);
+
+	if (!obj)
+		return -ENODEV;
+
+	return obj->pgtbl_ops->map(obj->pgtbl_ops, iova, paddr, size, prot);
+}
+
+static int arm_smmu_pasid_alloc(struct iommu_domain *domain, struct device *dev,
+		int pasid)
+{
+	struct arm_smmu_pasid *obj;
+	struct io_pgtable_cfg pgtbl_cfg;
+	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+	struct arm_smmu_device *smmu = smmu_domain->smmu;
+	enum io_pgtable_fmt fmt;
+	unsigned long ias, oas;
+
+	/* Only allow pasid backed tables to be created on S1 domains */
+	if (smmu_domain->stage != ARM_SMMU_DOMAIN_S1)
+		return -EINVAL;
+
+	obj = kzalloc(sizeof(*obj), GFP_KERNEL);
+	if (!obj)
+		return -ENOMEM;
+
+	/* Get the same exact format as the parent domain */
+	ias = smmu->va_size;
+	oas = smmu->ipa_size;
+
+	if (smmu_domain->cfg.fmt == ARM_SMMU_CTX_FMT_AARCH64)
+		fmt = ARM_64_LPAE_S1;
+	else if (smmu_domain->cfg.fmt == ARM_SMMU_CTX_FMT_AARCH32_L) {
+		fmt = ARM_32_LPAE_S1;
+		ias = min(ias, 32UL);
+		oas = min(oas, 40UL);
+	} else {
+		fmt = ARM_V7S;
+		ias = min(ias, 32UL);
+		oas = min(oas, 32UL);
+	}
+
+	pgtbl_cfg = (struct io_pgtable_cfg) {
+		.pgsize_bitmap = smmu->pgsize_bitmap,
+		.ias = ias,
+		.oas = oas,
+		.tlb = NULL,
+		.iommu_dev = smmu->dev
+	};
+
+	obj->pgtbl_ops = alloc_io_pgtable_ops(fmt, &pgtbl_cfg, smmu_domain);
+	if (!obj->pgtbl_ops) {
+		kfree(obj);
+		return -ENOMEM;
+	}
+
+	obj->domain = domain;
+	obj->pasid = pasid;
+
+	spin_lock(&smmu_domain->pasid_lock);
+	list_add_tail(&obj->node, &smmu_domain->pasid_list);
+	spin_unlock(&smmu_domain->pasid_lock);
+
+	return 0;
+}
+
 static atomic_t cavium_smmu_context_count = ATOMIC_INIT(0);
 
 static bool using_legacy_binding, using_generic_binding;
@@ -266,11 +402,6 @@ static struct arm_smmu_option_prop arm_smmu_options[] = {
 	{ 0, NULL},
 };
 
-static struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom)
-{
-	return container_of(dom, struct arm_smmu_domain, domain);
-}
-
 static void parse_driver_options(struct arm_smmu_device *smmu)
 {
 	int i = 0;
@@ -961,6 +1092,9 @@ static struct iommu_domain *arm_smmu_domain_alloc(unsigned type)
 	mutex_init(&smmu_domain->init_mutex);
 	spin_lock_init(&smmu_domain->cb_lock);
 
+	spin_lock_init(&smmu_domain->pasid_lock);
+	INIT_LIST_HEAD(&smmu_domain->pasid_list);
+
 	return &smmu_domain->domain;
 }
 
@@ -1588,6 +1722,10 @@ static struct iommu_ops arm_smmu_ops = {
 	.of_xlate		= arm_smmu_of_xlate,
 	.get_resv_regions	= arm_smmu_get_resv_regions,
 	.put_resv_regions	= arm_smmu_put_resv_regions,
+	.pasid_alloc		= arm_smmu_pasid_alloc,
+	.sva_map		= arm_smmu_sva_map,
+	.sva_unmap		= arm_smmu_sva_unmap,
+	.pasid_free		= arm_smmu_pasid_free,
 	.pgsize_bitmap		= -1UL, /* Restricted during device attach */
 };
 
-- 
2.16.1

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 06/14] iommu: arm-smmu: Add side-band function to specify pasid callbacks
  2018-02-21 22:59 ` Jordan Crouse
@ 2018-02-21 22:59     ` Jordan Crouse
  -1 siblings, 0 replies; 46+ messages in thread
From: Jordan Crouse @ 2018-02-21 22:59 UTC (permalink / raw)
  To: freedreno-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: jean-philippe.brucker-5wv7dgnIgG8,
	linux-arm-msm-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	tfiga-F7+t8E8rja9g9hUCZPvPmw,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	vivek.gautam-sgV2jX0FEOL9JmXXK+q4OQ,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

Just allowing a client driver to create and manage a software pasid
isn't useful if the client driver doesn't have enough information
about the pagetable to be able to use it. Add a side-band function
for arm-smmu that lets the client driver register pasid operations
to pass the relevant pagetable information to the client driver
whenever a new pasid is created or destroyed.
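
A hypothetical client registration might look like this (a sketch
only; the my_* names and callback bodies are illustrative, not part
of this patch):

	static int my_install_pasid(int pasid, u64 ttbr, u32 asid,
			void *data)
	{
		/* Stash ttbr/asid so the GPU can switch to this table */
		return 0;
	}

	static void my_remove_pasid(int pasid, void *data)
	{
		/* Drop any state stored for this pasid */
	}

	static const struct arm_smmu_pasid_ops my_pasid_ops = {
		.install_pasid = my_install_pasid,
		.remove_pasid = my_remove_pasid,
	};

	arm_smmu_add_pasid_ops(domain, &my_pasid_ops, drvdata);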

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
---
 drivers/iommu/arm-smmu.c | 40 ++++++++++++++++++++++++++++++++++++++++
 include/linux/arm-smmu.h | 18 ++++++++++++++++++
 2 files changed, 58 insertions(+)
 create mode 100644 include/linux/arm-smmu.h

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 42f5bfa3e26e..81c781705e22 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -50,6 +50,7 @@
 #include <linux/platform_device.h>
 #include <linux/slab.h>
 #include <linux/spinlock.h>
+#include <linux/arm-smmu.h>
 
 #include <linux/amba/bus.h>
 
@@ -253,6 +254,8 @@ struct arm_smmu_domain {
 
 	spinlock_t			pasid_lock;
 	struct list_head		pasid_list;
+	const struct arm_smmu_pasid_ops	*pasid_ops;
+	void				*pasid_data;
 };
 
 struct arm_smmu_option_prop {
@@ -294,6 +297,10 @@ static void arm_smmu_pasid_free(struct iommu_domain *domain, int pasid)
 	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
 	struct arm_smmu_pasid *node, *obj = NULL;
 
+	if (smmu_domain->pasid_ops && smmu_domain->pasid_ops->remove_pasid)
+		smmu_domain->pasid_ops->remove_pasid(pasid,
+			smmu_domain->pasid_data);
+
 	spin_lock(&smmu_domain->pasid_lock);
 	list_for_each_entry(node, &smmu_domain->pasid_list, node) {
 		if (node->pasid == pasid) {
@@ -386,6 +393,26 @@ static int arm_smmu_pasid_alloc(struct iommu_domain *domain, struct device *dev,
 	obj->domain = domain;
 	obj->pasid = pasid;
 
+	if (smmu_domain->pasid_ops && smmu_domain->pasid_ops->install_pasid) {
+		int ret;
+		u64 ttbr;
+
+		if (smmu_domain->cfg.fmt == ARM_SMMU_CTX_FMT_AARCH32_S)
+			ttbr = pgtbl_cfg.arm_v7s_cfg.ttbr[0];
+		else
+			ttbr = pgtbl_cfg.arm_lpae_s1_cfg.ttbr[0];
+
+		ret = smmu_domain->pasid_ops->install_pasid(pasid, ttbr,
+			smmu_domain->cfg.asid, smmu_domain->pasid_data);
+
+		if (ret) {
+			free_io_pgtable_ops(obj->pgtbl_ops);
+			kfree(obj);
+
+			return ret;
+		}
+	}
+
 	spin_lock(&smmu_domain->pasid_lock);
 	list_add_tail(&obj->node, &smmu_domain->pasid_list);
 	spin_unlock(&smmu_domain->pasid_lock);
@@ -2046,6 +2073,19 @@ static int arm_smmu_device_cfg_probe(struct arm_smmu_device *smmu)
 	return 0;
 }
 
+void arm_smmu_add_pasid_ops(struct iommu_domain *domain,
+	const struct arm_smmu_pasid_ops *ops, void *data)
+{
+	struct arm_smmu_domain *smmu_domain;
+
+	if (domain) {
+		smmu_domain = to_smmu_domain(domain);
+		smmu_domain->pasid_ops = ops;
+		smmu_domain->pasid_data = data;
+	}
+}
+EXPORT_SYMBOL_GPL(arm_smmu_add_pasid_ops);
+
 struct arm_smmu_match_data {
 	enum arm_smmu_arch_version version;
 	enum arm_smmu_implementation model;
diff --git a/include/linux/arm-smmu.h b/include/linux/arm-smmu.h
new file mode 100644
index 000000000000..c14ca52231bf
--- /dev/null
+++ b/include/linux/arm-smmu.h
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (c) 2018, The Linux Foundation. All rights reserved. */
+
+#ifndef ARM_SMMU_H_
+#define ARM_SMMU_H_
+
+struct iommu_domain;
+
+struct arm_smmu_pasid_ops {
+	int (*install_pasid)(int pasid, u64 ttbr, u32 asid, void *data);
+	void (*remove_pasid)(int pasid, void *data);
+};
+
+
+void arm_smmu_add_pasid_ops(struct iommu_domain *domain,
+	const struct arm_smmu_pasid_ops *ops, void *data);
+
+#endif
-- 
2.16.1

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 07/14] drm/msm: Enable 64 bit mode by default
  2018-02-21 22:59 ` Jordan Crouse
@ 2018-02-21 22:59     ` Jordan Crouse
  -1 siblings, 0 replies; 46+ messages in thread
From: Jordan Crouse @ 2018-02-21 22:59 UTC (permalink / raw)
  To: freedreno-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: jean-philippe.brucker-5wv7dgnIgG8,
	linux-arm-msm-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	tfiga-F7+t8E8rja9g9hUCZPvPmw,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	vivek.gautam-sgV2jX0FEOL9JmXXK+q4OQ,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

A5XX GPUs can be run in either 32 or 64 bit mode. The GPU registers
and the microcode use 64 bit virtual addressing in either case but the
upper 32 bits are ignored if the GPU is in 32 bit mode. There is no
performance disadvantage to remaining in 64 bit mode even if we are
only generating 32 bit addresses, so switch over now to prepare for
using addresses above 4G for targets that support them.
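
The hunk below writes each block's ADDR_MODE register individually; an
equivalent table-driven form would be (sketch only, same behavior):

	static const u32 a5xx_addr_mode_regs[] = {
		REG_A5XX_CP_ADDR_MODE_CNTL,
		REG_A5XX_VSC_ADDR_MODE_CNTL,
		/* ... the remaining ADDR_MODE registers from the hunk ... */
	};
	int i;

	for (i = 0; i < ARRAY_SIZE(a5xx_addr_mode_regs); i++)
		gpu_write(gpu, a5xx_addr_mode_regs[i], 0x1);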

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
---
 drivers/gpu/drm/msm/adreno/a5xx_gpu.c | 14 ++++++++++++++
 drivers/gpu/drm/msm/msm_iommu.c       |  2 +-
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
index 7e09d44e4a15..c106606887e2 100644
--- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
@@ -695,6 +695,20 @@ static int a5xx_hw_init(struct msm_gpu *gpu)
 		REG_A5XX_RBBM_SECVID_TSB_TRUSTED_BASE_HI, 0x00000000);
 	gpu_write(gpu, REG_A5XX_RBBM_SECVID_TSB_TRUSTED_SIZE, 0x00000000);
 
+	/* Put the GPU into 64 bit by default */
+	gpu_write(gpu, REG_A5XX_CP_ADDR_MODE_CNTL, 0x1);
+	gpu_write(gpu, REG_A5XX_VSC_ADDR_MODE_CNTL, 0x1);
+	gpu_write(gpu, REG_A5XX_GRAS_ADDR_MODE_CNTL, 0x1);
+	gpu_write(gpu, REG_A5XX_RB_ADDR_MODE_CNTL, 0x1);
+	gpu_write(gpu, REG_A5XX_PC_ADDR_MODE_CNTL, 0x1);
+	gpu_write(gpu, REG_A5XX_HLSQ_ADDR_MODE_CNTL, 0x1);
+	gpu_write(gpu, REG_A5XX_VFD_ADDR_MODE_CNTL, 0x1);
+	gpu_write(gpu, REG_A5XX_VPC_ADDR_MODE_CNTL, 0x1);
+	gpu_write(gpu, REG_A5XX_UCHE_ADDR_MODE_CNTL, 0x1);
+	gpu_write(gpu, REG_A5XX_SP_ADDR_MODE_CNTL, 0x1);
+	gpu_write(gpu, REG_A5XX_TPL1_ADDR_MODE_CNTL, 0x1);
+	gpu_write(gpu, REG_A5XX_RBBM_SECVID_TSB_ADDR_MODE_CNTL, 0x1);
+
 	ret = adreno_hw_init(gpu);
 	if (ret)
 		return ret;
diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c
index b23d33622f37..fdbe1a8372f0 100644
--- a/drivers/gpu/drm/msm/msm_iommu.c
+++ b/drivers/gpu/drm/msm/msm_iommu.c
@@ -30,7 +30,7 @@ static int msm_fault_handler(struct iommu_domain *domain, struct device *dev,
 	struct msm_iommu *iommu = arg;
 	if (iommu->base.handler)
 		return iommu->base.handler(iommu->base.arg, iova, flags);
-	pr_warn_ratelimited("*** fault: iova=%08lx, flags=%d\n", iova, flags);
+	pr_warn_ratelimited("*** fault: iova=%016lx, flags=%d\n", iova, flags);
 	return 0;
 }
 
-- 
2.16.1

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 08/14] drm/msm: Pass the MMU domain index in struct msm_file_private
  2018-02-21 22:59 ` Jordan Crouse
@ 2018-02-21 22:59     ` Jordan Crouse
  -1 siblings, 0 replies; 46+ messages in thread
From: Jordan Crouse @ 2018-02-21 22:59 UTC (permalink / raw)
  To: freedreno-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: jean-philippe.brucker-5wv7dgnIgG8,
	linux-arm-msm-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	tfiga-F7+t8E8rja9g9hUCZPvPmw,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	vivek.gautam-sgV2jX0FEOL9JmXXK+q4OQ,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

Pass the GPU address space in struct msm_file_private instead of
assuming gpu->aspace throughout the submit path. This clears the way
to change ctx->aspace to a per-instance pagetable.
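
Summarizing the plumbing from the hunks below:

	context_init():       ctx->aspace = priv->gpu->aspace;
	submit_create():      submit->aspace = aspace;	/* from ctx */
	submit_pin_objects(): msm_gem_get_iova(&msm_obj->base,
					       submit->aspace, &iova);

Once ctx->aspace stops aliasing gpu->aspace, everything downstream of
the file context follows automatically.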

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
---
 drivers/gpu/drm/msm/msm_drv.c        | 16 ++++------------
 drivers/gpu/drm/msm/msm_drv.h        |  1 +
 drivers/gpu/drm/msm/msm_gem.h        |  1 +
 drivers/gpu/drm/msm/msm_gem_submit.c | 13 ++++++++-----
 drivers/gpu/drm/msm/msm_gpu.c        |  5 ++---
 5 files changed, 16 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
index d90ef1d78a1b..74dd09db93d7 100644
--- a/drivers/gpu/drm/msm/msm_drv.c
+++ b/drivers/gpu/drm/msm/msm_drv.c
@@ -502,6 +502,7 @@ static void load_gpu(struct drm_device *dev)
 
 static int context_init(struct drm_device *dev, struct drm_file *file)
 {
+	struct msm_drm_private *priv = dev->dev_private;
 	struct msm_file_private *ctx;
 
 	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
@@ -510,6 +511,7 @@ static int context_init(struct drm_device *dev, struct drm_file *file)
 
 	msm_submitqueue_init(dev, ctx);
 
+	ctx->aspace = priv->gpu->aspace;
 	file->driver_priv = ctx;
 
 	return 0;
@@ -683,17 +685,6 @@ static int msm_ioctl_gem_cpu_fini(struct drm_device *dev, void *data,
 	return ret;
 }
 
-static int msm_ioctl_gem_info_iova(struct drm_device *dev,
-		struct drm_gem_object *obj, uint64_t *iova)
-{
-	struct msm_drm_private *priv = dev->dev_private;
-
-	if (!priv->gpu)
-		return -EINVAL;
-
-	return msm_gem_get_iova(obj, priv->gpu->aspace, iova);
-}
-
 static int msm_ioctl_gem_info(struct drm_device *dev, void *data,
 		struct drm_file *file)
 {
@@ -709,9 +700,10 @@ static int msm_ioctl_gem_info(struct drm_device *dev, void *data,
 		return -ENOENT;
 
 	if (args->flags & MSM_INFO_IOVA) {
+		struct msm_file_private *ctx = file->driver_priv;
 		uint64_t iova;
 
-		ret = msm_ioctl_gem_info_iova(dev, obj, &iova);
+		ret = msm_gem_get_iova(obj, ctx->aspace, &iova);
 		if (!ret)
 			args->offset = iova;
 	} else {
diff --git a/drivers/gpu/drm/msm/msm_drv.h b/drivers/gpu/drm/msm/msm_drv.h
index 0a653dd2e618..0499b1708f52 100644
--- a/drivers/gpu/drm/msm/msm_drv.h
+++ b/drivers/gpu/drm/msm/msm_drv.h
@@ -59,6 +59,7 @@ struct msm_file_private {
 	rwlock_t queuelock;
 	struct list_head submitqueues;
 	int queueid;
+	struct msm_gem_address_space *aspace;
 };
 
 enum msm_mdp_plane_property {
diff --git a/drivers/gpu/drm/msm/msm_gem.h b/drivers/gpu/drm/msm/msm_gem.h
index 9320e184b48d..a5e61259f40b 100644
--- a/drivers/gpu/drm/msm/msm_gem.h
+++ b/drivers/gpu/drm/msm/msm_gem.h
@@ -138,6 +138,7 @@ void msm_gem_vunmap(struct drm_gem_object *obj, enum msm_gem_lock subclass);
 struct msm_gem_submit {
 	struct drm_device *dev;
 	struct msm_gpu *gpu;
+	struct msm_gem_address_space *aspace;
 	struct list_head node;   /* node in ring submit list */
 	struct list_head bo_list;
 	struct ww_acquire_ctx ticket;
diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c b/drivers/gpu/drm/msm/msm_gem_submit.c
index b8dc8f96caf2..d14399c0dfb8 100644
--- a/drivers/gpu/drm/msm/msm_gem_submit.c
+++ b/drivers/gpu/drm/msm/msm_gem_submit.c
@@ -31,8 +31,9 @@
 #define BO_PINNED   0x2000
 
 static struct msm_gem_submit *submit_create(struct drm_device *dev,
-		struct msm_gpu *gpu, struct msm_gpu_submitqueue *queue,
-		uint32_t nr_bos, uint32_t nr_cmds)
+		struct msm_gpu *gpu, struct msm_gem_address_space *aspace,
+		struct msm_gpu_submitqueue *queue, uint32_t nr_bos,
+		uint32_t nr_cmds)
 {
 	struct msm_gem_submit *submit;
 	uint64_t sz = sizeof(*submit) + ((u64)nr_bos * sizeof(submit->bos[0])) +
@@ -46,6 +47,7 @@ static struct msm_gem_submit *submit_create(struct drm_device *dev,
 		return NULL;
 
 	submit->dev = dev;
+	submit->aspace = aspace;
 	submit->gpu = gpu;
 	submit->fence = NULL;
 	submit->pid = get_pid(task_pid(current));
@@ -167,7 +169,7 @@ static void submit_unlock_unpin_bo(struct msm_gem_submit *submit,
 	struct msm_gem_object *msm_obj = submit->bos[i].obj;
 
 	if (submit->bos[i].flags & BO_PINNED)
-		msm_gem_put_iova(&msm_obj->base, submit->gpu->aspace);
+		msm_gem_put_iova(&msm_obj->base, submit->aspace);
 
 	if (submit->bos[i].flags & BO_LOCKED)
 		ww_mutex_unlock(&msm_obj->resv->lock);
@@ -270,7 +272,7 @@ static int submit_pin_objects(struct msm_gem_submit *submit)
 
 		/* if locking succeeded, pin bo: */
 		ret = msm_gem_get_iova(&msm_obj->base,
-				submit->gpu->aspace, &iova);
+				submit->aspace, &iova);
 
 		if (ret)
 			break;
@@ -465,7 +467,8 @@ int msm_ioctl_gem_submit(struct drm_device *dev, void *data,
 		}
 	}
 
-	submit = submit_create(dev, gpu, queue, args->nr_bos, args->nr_cmds);
+	submit = submit_create(dev, gpu, ctx->aspace, queue, args->nr_bos,
+		args->nr_cmds);
 	if (!submit) {
 		ret = -ENOMEM;
 		goto out_unlock;
diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
index bd376f9e18a7..086fb347b554 100644
--- a/drivers/gpu/drm/msm/msm_gpu.c
+++ b/drivers/gpu/drm/msm/msm_gpu.c
@@ -551,7 +551,7 @@ static void retire_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
 		struct msm_gem_object *msm_obj = submit->bos[i].obj;
 		/* move to inactive: */
 		msm_gem_move_to_inactive(&msm_obj->base);
-		msm_gem_put_iova(&msm_obj->base, gpu->aspace);
+		msm_gem_put_iova(&msm_obj->base, submit->aspace);
 		drm_gem_object_unreference(&msm_obj->base);
 	}
 
@@ -635,8 +635,7 @@ void msm_gpu_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit,
 
 		/* submit takes a reference to the bo and iova until retired: */
 		drm_gem_object_reference(&msm_obj->base);
-		msm_gem_get_iova(&msm_obj->base,
-				submit->gpu->aspace, &iova);
+		msm_gem_get_iova(&msm_obj->base, submit->aspace, &iova);
 
 		if (submit->bos[i].flags & MSM_SUBMIT_BO_WRITE)
 			msm_gem_move_to_active(&msm_obj->base, gpu, true, submit->fence);
-- 
2.16.1
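
For context, the user-visible effect of routing MSM_INFO_IOVA through
ctx->aspace is that the returned address is now specific to the DRM
file that asks.  A minimal userspace sketch using the existing
drm_msm_gem_info layout (bo_handle is a placeholder for a GEM handle
obtained earlier):

	#include <err.h>
	#include <stdint.h>
	#include <xf86drm.h>
	#include <drm/msm_drm.h>	/* kernel uapi header */

	static uint64_t query_iova(int fd, uint32_t bo_handle)
	{
		struct drm_msm_gem_info req = {
			.handle = bo_handle,
			.flags = MSM_INFO_IOVA,
		};

		if (drmIoctl(fd, DRM_IOCTL_MSM_GEM_INFO, &req))
			err(1, "GEM_INFO");

		/* Once ctx->aspace becomes per-instance later in the
		 * series, two DRM file descriptors may legitimately see
		 * different iovas for the same underlying buffer. */
		return req.offset;
	}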


* [PATCH 09/14] drm/msm/gpu: Support using TTBR1 for kernel buffer objects
  2018-02-21 22:59 ` Jordan Crouse
@ 2018-02-21 22:59     ` Jordan Crouse
  -1 siblings, 0 replies; 46+ messages in thread
From: Jordan Crouse @ 2018-02-21 22:59 UTC (permalink / raw)
  To: freedreno-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: linux-arm-msm-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

arm-smmu based targets can support split pagetables (TTBR0/TTBR1).
This is most useful for implementing per-instance pagetables so that
the "user" pagetable can be swapped out while the "kernel" or
"global" pagetable remains intact.

If the target specifies a global virtual memory range then try to
enable TTBR1 (the "global" pagetable) on the domain and, if
successful, use the global virtual memory range for allocations on
the default GPU address space - this ensures that the global
allocations make it into the right space. Per-instance pagetables
still need additional support to be enabled, but even if they
aren't set up it isn't harmful to just use TTBR1 for all virtual
memory regions and leave the other pagetable unused.

If TTBR1 support isn't enabled then fall back to the "legacy"
virtual address space for both kernel and user.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
---
 drivers/gpu/drm/msm/msm_gpu.c | 20 ++++++++++++++++++--
 drivers/gpu/drm/msm/msm_gpu.h |  4 ++--
 2 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
index 086fb347b554..94332faa316f 100644
--- a/drivers/gpu/drm/msm/msm_gpu.c
+++ b/drivers/gpu/drm/msm/msm_gpu.c
@@ -701,7 +701,8 @@ static int get_clocks(struct platform_device *pdev, struct msm_gpu *gpu)
 
 static struct msm_gem_address_space *
 msm_gpu_create_address_space(struct msm_gpu *gpu, struct platform_device *pdev,
-		uint64_t va_start, uint64_t va_end)
+		u64 va_start, u64 va_end,
+		u64 va_global_start, u64 va_global_end)
 {
 	struct iommu_domain *iommu;
 	struct msm_gem_address_space *aspace;
@@ -719,6 +720,20 @@ msm_gpu_create_address_space(struct msm_gpu *gpu, struct platform_device *pdev,
 	iommu->geometry.aperture_start = va_start;
 	iommu->geometry.aperture_end = va_end;
 
+	/* If a va_global range was specified then try to set up TTBR1 */
+	if (va_global_start && va_global_end) {
+		int val = 1;
+
+		/* Try to enable TTBR1 on the domain */
+		ret = iommu_domain_set_attr(iommu, DOMAIN_ATTR_ENABLE_TTBR1,
+			&val);
+
+		if (!WARN(ret, "Unable to enable TTBR1 for the IOMMU\n")) {
+			iommu->geometry.aperture_start = va_global_start;
+			iommu->geometry.aperture_end = va_global_end;
+		}
+	}
+
 	dev_info(gpu->dev->dev, "%s: using IOMMU\n", gpu->name);
 
 	aspace = msm_gem_address_space_create(&pdev->dev, iommu, "gpu");
@@ -811,7 +826,8 @@ int msm_gpu_init(struct drm_device *drm, struct platform_device *pdev,
 	msm_devfreq_init(gpu);
 
 	gpu->aspace = msm_gpu_create_address_space(gpu, pdev,
-		config->va_start, config->va_end);
+		config->va_start, config->va_end, config->va_start_global,
+		config->va_end_global);
 
 	if (gpu->aspace == NULL)
 		dev_info(drm->dev, "%s: no IOMMU, fallback to VRAM carveout!\n", name);
diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
index fccfccd303af..698eca2c1431 100644
--- a/drivers/gpu/drm/msm/msm_gpu.h
+++ b/drivers/gpu/drm/msm/msm_gpu.h
@@ -31,8 +31,8 @@ struct msm_gpu_perfcntr;
 struct msm_gpu_config {
 	const char *ioname;
 	const char *irqname;
-	uint64_t va_start;
-	uint64_t va_end;
+	uint64_t va_start, va_end;
+	uint64_t va_start_global, va_end_global;
 	unsigned int nr_rings;
 };
 
-- 
2.16.1
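
To make the split concrete: with TTBR1 enabled the hardware selects
the pagetable from the upper bits of the incoming address, so which
table an iova hits can be predicted from the iova alone.  A rough
sketch, assuming the common case where bit 47 acts as the sign/select
bit (the real position follows the UBS/IAS of the SMMU instance):

	/* Illustrative only - assumes bit 47 selects TTBR1 */
	static bool iova_selects_ttbr1(u64 iova)
	{
		return !!(iova & (1ULL << 47));
	}

The global range chosen later in the series (0xfffffff800000000 and
up) has all of those upper bits set, so "global" buffers land in
TTBR1 while user allocations in the low region keep hitting TTBR0.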


* [PATCH 10/14] drm/msm: Add msm_mmu features
  2018-02-21 22:59 ` Jordan Crouse
@ 2018-02-21 22:59     ` Jordan Crouse
  -1 siblings, 0 replies; 46+ messages in thread
From: Jordan Crouse @ 2018-02-21 22:59 UTC (permalink / raw)
  To: freedreno-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: jean-philippe.brucker-5wv7dgnIgG8,
	linux-arm-msm-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	tfiga-F7+t8E8rja9g9hUCZPvPmw,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	vivek.gautam-sgV2jX0FEOL9JmXXK+q4OQ,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

Add a few simple helper functions to manage a bitmask of
features that a specific MMU implementation supports.  The
first feature will be per-instance pagetables, coming in a
following patch.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
---
 drivers/gpu/drm/msm/msm_mmu.h | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/drivers/gpu/drm/msm/msm_mmu.h b/drivers/gpu/drm/msm/msm_mmu.h
index aa2c5d4580c8..85df78d71398 100644
--- a/drivers/gpu/drm/msm/msm_mmu.h
+++ b/drivers/gpu/drm/msm/msm_mmu.h
@@ -35,6 +35,7 @@ struct msm_mmu {
 	struct device *dev;
 	int (*handler)(void *arg, unsigned long iova, int flags);
 	void *arg;
+	unsigned long features;
 };
 
 static inline void msm_mmu_init(struct msm_mmu *mmu, struct device *dev,
@@ -54,4 +55,16 @@ static inline void msm_mmu_set_fault_handler(struct msm_mmu *mmu, void *arg,
 	mmu->handler = handler;
 }
 
+static inline void msm_mmu_set_feature(struct msm_mmu *mmu,
+		unsigned long feature)
+{
+	mmu->features |= feature;
+}
+
+static inline bool msm_mmu_has_feature(struct msm_mmu *mmu,
+		unsigned long feature)
+{
+	return (mmu->features & feature) ? true : false;
+}
+
 #endif /* __MSM_MMU_H__ */
-- 
2.16.1
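
Usage is intentionally trivial - the implementation advertises a
capability and consumers test for it before depending on it.  A
sketch of both sides, using the per-instance feature bit that the
next patch introduces:

	/* In the MMU implementation, e.g. at attach time: */
	msm_mmu_set_feature(mmu, MMU_FEATURE_PER_INSTANCE_TABLES);

	/* In a consumer, before relying on the capability: */
	if (!msm_mmu_has_feature(mmu, MMU_FEATURE_PER_INSTANCE_TABLES))
		return ERR_PTR(-EOPNOTSUPP);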


* [PATCH 11/14] drm/msm: Add support for iommu-sva PASIDs
  2018-02-21 22:59 ` Jordan Crouse
@ 2018-02-21 22:59     ` Jordan Crouse
  -1 siblings, 0 replies; 46+ messages in thread
From: Jordan Crouse @ 2018-02-21 22:59 UTC (permalink / raw)
  To: freedreno-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: jean-philippe.brucker-5wv7dgnIgG8,
	linux-arm-msm-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	tfiga-F7+t8E8rja9g9hUCZPvPmw,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	vivek.gautam-sgV2jX0FEOL9JmXXK+q4OQ,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

The IOMMU core can support creating multiple pagetables
for a specific domain and making them available to a client
driver that has the means to manage the pagetable itself.

PASIDs are unique indexes to a software-created pagetable with
the same format and characteristics as the parent IOMMU device.
The IOMMU driver allocates the pagetable and tracks it with a
unique token (PASID) - it does not touch the actual hardware.
The client driver is expected to be able to manage the pagetables
and do something interesting with them.

Some flavors of the MSM GPU are able to allow each DRM instance
to have its own pagetable (and virtual memory space) and switch them
asynchronously at the beginning of a command.  This protects against
accidental or malicious corruption or copying of buffers from other
instances.

The first step is to add an MMU implementation that can allocate a
PASID and set up a msm_mmu struct to abstract (most of) the details
from the rest of the system.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
---
 drivers/gpu/drm/msm/msm_iommu.c | 184 ++++++++++++++++++++++++++++++++++++++++
 drivers/gpu/drm/msm/msm_mmu.h   |   6 ++
 2 files changed, 190 insertions(+)

diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c
index fdbe1a8372f0..de8669c9a5a1 100644
--- a/drivers/gpu/drm/msm/msm_iommu.c
+++ b/drivers/gpu/drm/msm/msm_iommu.c
@@ -15,6 +15,9 @@
  * this program.  If not, see <http://www.gnu.org/licenses/>.
  */
 
+#include <linux/hashtable.h>
+#include <linux/arm-smmu.h>
+
 #include "msm_drv.h"
 #include "msm_mmu.h"
 
@@ -34,12 +37,29 @@ static int msm_fault_handler(struct iommu_domain *domain, struct device *dev,
 	return 0;
 }
 
+static bool msm_iommu_check_per_instance(struct msm_iommu *iommu)
+{
+	int val;
+
+	if (!IS_ENABLED(CONFIG_IOMMU_SVA))
+		return false;
+
+	if (iommu_domain_get_attr(iommu->domain, DOMAIN_ATTR_ENABLE_TTBR1,
+		&val))
+		return false;
+
+	return val ? true : false;
+}
+
 static int msm_iommu_attach(struct msm_mmu *mmu, const char * const *names,
 			    int cnt)
 {
 	struct msm_iommu *iommu = to_msm_iommu(mmu);
 	int ret;
 
+	if (msm_iommu_check_per_instance(iommu))
+		msm_mmu_set_feature(mmu, MMU_FEATURE_PER_INSTANCE_TABLES);
+
 	pm_runtime_get_sync(mmu->dev);
 	ret = iommu_attach_device(iommu->domain, mmu->dev);
 	pm_runtime_put_sync(mmu->dev);
@@ -112,3 +132,167 @@ struct msm_mmu *msm_iommu_new(struct device *dev, struct iommu_domain *domain)
 
 	return &iommu->base;
 }
+
+struct pasid_entry {
+	int pasid;
+	u64 ttbr;
+	u32 asid;
+	struct hlist_node node;
+};
+
+static DEFINE_HASHTABLE(pasid_table, 4);
+
+static int install_pasid_cb(int pasid, u64 ttbr, u32 asid, void *data)
+{
+	struct pasid_entry *entry = kzalloc(sizeof(*entry), GFP_KERNEL);
+
+	if (!entry)
+		return -ENOMEM;
+
+	entry->pasid = pasid;
+	entry->ttbr = ttbr;
+	entry->asid = asid;
+
+	/* FIXME: Assume that we'll never have a pasid conflict? */
+	/* FIXME: locks? RCU? */
+	hash_add(pasid_table, &entry->node, pasid);
+	return 0;
+}
+
+static void remove_pasid_cb(int pasid, void *data)
+{
+	struct pasid_entry *entry;
+
+	hash_for_each_possible(pasid_table, entry, node, pasid) {
+		if (pasid == entry->pasid) {
+			hash_del(&entry->node);
+			kfree(entry);
+			return;
+		}
+	}
+}
+
+struct msm_iommu_pasid {
+	struct msm_mmu base;
+	int pasid;
+	u64 ttbr;
+	u32 asid;
+};
+#define to_msm_iommu_pasid(x) container_of(x, struct msm_iommu_pasid, base)
+
+static int msm_iommu_pasid_attach(struct msm_mmu *mmu,
+		const char * const *names, int cnt)
+{
+	return 0;
+}
+
+static int msm_iommu_pasid_map(struct msm_mmu *mmu, uint64_t iova,
+		struct sg_table *sgt, unsigned len, int prot)
+{
+	struct msm_iommu_pasid *pasid = to_msm_iommu_pasid(mmu);
+	int ret;
+
+	ret = iommu_sva_map_sg(pasid->pasid, iova, sgt->sgl, sgt->nents, prot);
+	WARN_ON(ret < 0);
+
+	return (ret == len) ? 0 : -EINVAL;
+}
+
+static int msm_iommu_pasid_unmap(struct msm_mmu *mmu, uint64_t iova,
+		struct sg_table *sgt, unsigned len)
+{
+	struct msm_iommu_pasid *pasid = to_msm_iommu_pasid(mmu);
+
+	iommu_sva_unmap(pasid->pasid, iova, len);
+
+	return 0;
+}
+
+static void msm_iommu_pasid_detach(struct msm_mmu *mmu,
+		const char * const *names, int cnt)
+{
+}
+
+static void msm_iommu_pasid_destroy(struct msm_mmu *mmu)
+{
+	struct msm_iommu_pasid *pasid = to_msm_iommu_pasid(mmu);
+
+	iommu_sva_free_pasid(pasid->pasid);
+	kfree(pasid);
+}
+
+static const struct msm_mmu_funcs pasid_funcs = {
+		.attach = msm_iommu_pasid_attach,
+		.detach = msm_iommu_pasid_detach,
+		.map = msm_iommu_pasid_map,
+		.unmap = msm_iommu_pasid_unmap,
+		.destroy = msm_iommu_pasid_destroy,
+};
+
+static const struct arm_smmu_pasid_ops msm_iommu_pasid_ops = {
+	.install_pasid = install_pasid_cb,
+	.remove_pasid = remove_pasid_cb,
+};
+
+struct msm_mmu *msm_iommu_pasid_new(struct msm_mmu *parent)
+{
+	struct msm_iommu *parent_iommu = to_msm_iommu(parent);
+	struct msm_iommu_pasid *pasid;
+	int id;
+
+	if (!msm_mmu_has_feature(parent, MMU_FEATURE_PER_INSTANCE_TABLES))
+		return ERR_PTR(-EOPNOTSUPP);
+
+	pasid = kzalloc(sizeof(*pasid), GFP_KERNEL);
+	if (!pasid)
+		return ERR_PTR(-ENOMEM);
+
+	arm_smmu_add_pasid_ops(parent_iommu->domain, &msm_iommu_pasid_ops,
+		NULL);
+
+	id = iommu_sva_alloc_pasid(parent_iommu->domain, parent->dev);
+	if (id < 0) {
+		kfree(pasid);
+		return ERR_PTR(id);
+	}
+
+	pasid->pasid = id;
+	msm_mmu_init(&pasid->base, parent->dev, &pasid_funcs);
+
+	return &pasid->base;
+}
+
+/* Given a pasid return the TTBR and ASID associated with it */
+int msm_iommu_pasid_info(struct msm_mmu *mmu, u64 *ttbr, u32 *asid)
+{
+	struct msm_iommu_pasid *pasid;
+	struct pasid_entry *entry;
+
+	if (mmu->funcs->map != msm_iommu_pasid_map)
+		return -ENODEV;
+
+	pasid = to_msm_iommu_pasid(mmu);
+
+	if (!pasid->ttbr) {
+		/* Find the pasid entry in the hash */
+		hash_for_each_possible(pasid_table, entry, node, pasid->pasid) {
+			if (pasid->pasid == entry->pasid) {
+				pasid->ttbr = entry->ttbr;
+				pasid->asid = entry->asid;
+				goto out;
+			}
+		}
+
+		WARN(1, "Couldn't find the entry for pasid %d\n", pasid->pasid);
+		return -EINVAL;
+	}
+
+out:
+	if (ttbr)
+		*ttbr = pasid->ttbr;
+
+	if (asid)
+		*asid = pasid->asid;
+
+	return 0;
+}
diff --git a/drivers/gpu/drm/msm/msm_mmu.h b/drivers/gpu/drm/msm/msm_mmu.h
index 85df78d71398..29436b9daa73 100644
--- a/drivers/gpu/drm/msm/msm_mmu.h
+++ b/drivers/gpu/drm/msm/msm_mmu.h
@@ -30,6 +30,9 @@ struct msm_mmu_funcs {
 	void (*destroy)(struct msm_mmu *mmu);
 };
 
+/* MMU features */
+#define MMU_FEATURE_PER_INSTANCE_TABLES (1 << 0)
+
 struct msm_mmu {
 	const struct msm_mmu_funcs *funcs;
 	struct device *dev;
@@ -48,6 +51,9 @@ static inline void msm_mmu_init(struct msm_mmu *mmu, struct device *dev,
 struct msm_mmu *msm_iommu_new(struct device *dev, struct iommu_domain *domain);
 struct msm_mmu *msm_gpummu_new(struct device *dev, struct msm_gpu *gpu);
 
+struct msm_mmu *msm_iommu_pasid_new(struct msm_mmu *parent);
+int msm_iommu_pasid_info(struct msm_mmu *mmu, u64 *ttbr, u32 *asid);
+
 static inline void msm_mmu_set_fault_handler(struct msm_mmu *mmu, void *arg,
 		int (*handler)(void *arg, unsigned long iova, int flags))
 {
-- 
2.16.1
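
Putting the pieces together, the expected client flow looks roughly
like the sketch below.  program_pagetable_switch() is a hypothetical
stand-in for the GPU-specific part that arrives in the a5xx patch:

	struct msm_mmu *pt = msm_iommu_pasid_new(gpu->aspace->mmu);
	u64 ttbr;
	u32 asid;

	if (IS_ERR(pt))
		return PTR_ERR(pt); /* -EOPNOTSUPP without TTBR1 + SVA */

	/* map/unmap through pt now target the pasid pagetable, and
	 * the hardware handles come back via the side-band callbacks: */
	if (!msm_iommu_pasid_info(pt, &ttbr, &asid))
		program_pagetable_switch(ttbr, asid); /* hypothetical */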


* [PATCH 12/14] drm/msm: Add support for per-instance address spaces
  2018-02-21 22:59 ` Jordan Crouse
@ 2018-02-21 22:59     ` Jordan Crouse
  -1 siblings, 0 replies; 46+ messages in thread
From: Jordan Crouse @ 2018-02-21 22:59 UTC (permalink / raw)
  To: freedreno-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: jean-philippe.brucker-5wv7dgnIgG8,
	linux-arm-msm-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	tfiga-F7+t8E8rja9g9hUCZPvPmw,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	vivek.gautam-sgV2jX0FEOL9JmXXK+q4OQ,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

Add a function to allocate a new pasid from an existing
MMU domain and create a per-instance address space.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
---
 drivers/gpu/drm/msm/msm_drv.h     |  3 +++
 drivers/gpu/drm/msm/msm_gem_vma.c | 36 +++++++++++++++++++++++++++++++-----
 2 files changed, 34 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_drv.h b/drivers/gpu/drm/msm/msm_drv.h
index 0499b1708f52..4fbc3188d776 100644
--- a/drivers/gpu/drm/msm/msm_drv.h
+++ b/drivers/gpu/drm/msm/msm_drv.h
@@ -178,6 +178,9 @@ void msm_gem_address_space_put(struct msm_gem_address_space *aspace);
 struct msm_gem_address_space *
 msm_gem_address_space_create(struct device *dev, struct iommu_domain *domain,
 		const char *name);
+struct msm_gem_address_space *
+msm_gem_address_space_create_instance(struct msm_mmu *parent, const char *name,
+		u64 start, u64 end);
 
 void msm_gem_submit_free(struct msm_gem_submit *submit);
 int msm_ioctl_gem_submit(struct drm_device *dev, void *data,
diff --git a/drivers/gpu/drm/msm/msm_gem_vma.c b/drivers/gpu/drm/msm/msm_gem_vma.c
index d34e331554f3..ef33026f593f 100644
--- a/drivers/gpu/drm/msm/msm_gem_vma.c
+++ b/drivers/gpu/drm/msm/msm_gem_vma.c
@@ -92,10 +92,11 @@ msm_gem_map_vma(struct msm_gem_address_space *aspace,
 }
 
 struct msm_gem_address_space *
-msm_gem_address_space_create(struct device *dev, struct iommu_domain *domain,
-		const char *name)
+msm_gem_address_space_new(struct msm_mmu *mmu, const char *name,
+		u64 start, u64 end)
 {
 	struct msm_gem_address_space *aspace;
+	u64 size = end - start;
 
 	aspace = kzalloc(sizeof(*aspace), GFP_KERNEL);
 	if (!aspace)
@@ -103,12 +104,37 @@ msm_gem_address_space_create(struct device *dev, struct iommu_domain *domain,
 
 	spin_lock_init(&aspace->lock);
 	aspace->name = name;
-	aspace->mmu = msm_iommu_new(dev, domain);
+	aspace->mmu = mmu;
 
-	drm_mm_init(&aspace->mm, (domain->geometry.aperture_start >> PAGE_SHIFT),
-			(domain->geometry.aperture_end >> PAGE_SHIFT) - 1);
+	drm_mm_init(&aspace->mm, (start >> PAGE_SHIFT), size >> PAGE_SHIFT);
 
 	kref_init(&aspace->kref);
 
 	return aspace;
 }
+
+struct msm_gem_address_space *
+msm_gem_address_space_create(struct device *dev, struct iommu_domain *domain,
+		const char *name)
+{
+	struct msm_mmu *mmu = msm_iommu_new(dev, domain);
+
+	if (IS_ERR(mmu))
+		return ERR_CAST(mmu);
+
+	return msm_gem_address_space_new(mmu, name,
+		domain->geometry.aperture_start,
+		domain->geometry.aperture_end);
+}
+
+struct msm_gem_address_space *
+msm_gem_address_space_create_instance(struct msm_mmu *parent, const char *name,
+		u64 start, u64 end)
+{
+	struct msm_mmu *instance = msm_iommu_pasid_new(parent);
+
+	if (IS_ERR(instance))
+		return ERR_CAST(instance);
+
+	return msm_gem_address_space_new(instance, name, start, end);
+}
-- 
2.16.1
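
For reference, the intended call pattern - the range shown here is
the one the next patch settles on for the a5xx user region:

	struct msm_gem_address_space *aspace =
		msm_gem_address_space_create_instance(
			priv->gpu->aspace->mmu, "gpu",
			0x100000000ULL, 0x1ffffffffULL);

	/* -EOPNOTSUPP when per-instance tables are unavailable */
	if (IS_ERR(aspace))
		return PTR_ERR(aspace);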


* [PATCH 13/14] drm/msm: Support per-instance address spaces
  2018-02-21 22:59 ` Jordan Crouse
@ 2018-02-21 22:59     ` Jordan Crouse
  -1 siblings, 0 replies; 46+ messages in thread
From: Jordan Crouse @ 2018-02-21 22:59 UTC (permalink / raw)
  To: freedreno-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: jean-philippe.brucker-5wv7dgnIgG8,
	linux-arm-msm-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	tfiga-F7+t8E8rja9g9hUCZPvPmw,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	vivek.gautam-sgV2jX0FEOL9JmXXK+q4OQ,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

Create a per-instance address space when a new DRM file instance is
opened, assuming the target supports it and the underlying
infrastructure exists.  If the operation is unsupported, fall back
quietly to the global pagetable.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
---
 drivers/gpu/drm/msm/msm_drv.c | 31 ++++++++++++++++++++++++++++---
 1 file changed, 28 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
index 74dd09db93d7..24d23293b090 100644
--- a/drivers/gpu/drm/msm/msm_drv.c
+++ b/drivers/gpu/drm/msm/msm_drv.c
@@ -22,6 +22,7 @@
 #include "msm_fence.h"
 #include "msm_gpu.h"
 #include "msm_kms.h"
+#include "msm_gem.h"
 
 
 /*
@@ -511,7 +512,27 @@ static int context_init(struct drm_device *dev, struct drm_file *file)
 
 	msm_submitqueue_init(dev, ctx);
 
-	ctx->aspace = priv->gpu->aspace;
+	/* FIXME: Do we want a dynamic name of some sort? */
+	/* FIXME: We need a smarter way to set the range based on target */
+
+	ctx->aspace = msm_gem_address_space_create_instance(
+		priv->gpu->aspace->mmu, "gpu", 0x100000000, 0x1ffffffff);
+
+	if (IS_ERR(ctx->aspace)) {
+		int ret = PTR_ERR(ctx->aspace);
+
+		/*
+		 * if per-instance pagetables are not supported, fall back to
+		 * using the generic address space
+		 */
+		if (ret == -EOPNOTSUPP) {
+			ctx->aspace = priv->gpu->aspace;
+		} else {
+			kfree(ctx);
+			return ret;
+		}
+	}
+
 	file->driver_priv = ctx;
 
 	return 0;
@@ -527,8 +548,12 @@ static int msm_open(struct drm_device *dev, struct drm_file *file)
 	return context_init(dev, file);
 }
 
-static void context_close(struct msm_file_private *ctx)
+static void context_close(struct msm_drm_private *priv,
+		struct msm_file_private *ctx)
 {
+	if (ctx && ctx->aspace != priv->gpu->aspace)
+		msm_gem_address_space_put(ctx->aspace);
+
 	msm_submitqueue_close(ctx);
 	kfree(ctx);
 }
@@ -543,7 +568,7 @@ static void msm_postclose(struct drm_device *dev, struct drm_file *file)
 		priv->lastctx = NULL;
 	mutex_unlock(&dev->struct_mutex);
 
-	context_close(ctx);
+	context_close(priv, ctx);
 }
 
 static irqreturn_t msm_irq(int irq, void *arg)
-- 
2.16.1


* [PATCH 14/14] drm/msm/a5xx: Support per-instance pagetables
  2018-02-21 22:59 ` Jordan Crouse
@ 2018-02-21 22:59     ` Jordan Crouse
  -1 siblings, 0 replies; 46+ messages in thread
From: Jordan Crouse @ 2018-02-21 22:59 UTC (permalink / raw)
  To: freedreno-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: jean-philippe.brucker-5wv7dgnIgG8,
	linux-arm-msm-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	tfiga-F7+t8E8rja9g9hUCZPvPmw,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	vivek.gautam-sgV2jX0FEOL9JmXXK+q4OQ,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

Add support for per-instance pagetables for 5XX targets. Create a support
buffer for preemption to hold the SMMU pagetable information for a preempted
ring, enable TTBR1 to support split pagetables and add the necessary PM4
commands to trigger a pagetable switch at the beginning of a user command.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
---
 drivers/gpu/drm/msm/adreno/a5xx_gpu.c     | 55 ++++++++++++++++++++++
 drivers/gpu/drm/msm/adreno/a5xx_gpu.h     | 17 +++++++
 drivers/gpu/drm/msm/adreno/a5xx_preempt.c | 76 +++++++++++++++++++++++++------
 drivers/gpu/drm/msm/adreno/adreno_gpu.c   | 11 +++++
 drivers/gpu/drm/msm/adreno/adreno_gpu.h   |  5 ++
 drivers/gpu/drm/msm/msm_ringbuffer.h      |  1 +
 6 files changed, 152 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
index c106606887e2..e5d8df8e8e12 100644
--- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
@@ -140,6 +140,59 @@ static void a5xx_flush(struct msm_gpu *gpu, struct msm_ringbuffer *ring)
 		gpu_write(gpu, REG_A5XX_CP_RB_WPTR, wptr);
 }
 
+static void a5xx_set_pagetable(struct msm_gpu *gpu, struct msm_ringbuffer *ring,
+	struct msm_file_private *ctx)
+{
+	u64 ttbr;
+	u32 asid;
+
+	if (msm_iommu_pasid_info(ctx->aspace->mmu, &ttbr, &asid))
+		return;
+
+	ttbr = ttbr | ((u64) asid) << 48;
+
+	/* Turn off protected mode */
+	OUT_PKT7(ring, CP_SET_PROTECTED_MODE, 1);
+	OUT_RING(ring, 0);
+
+	/* Turn on APRIV mode to access critical regions */
+	OUT_PKT4(ring, REG_A5XX_CP_CNTL, 1);
+	OUT_RING(ring, 1);
+
+	/* Make sure the ME is synchronized before starting the update */
+	OUT_PKT7(ring, CP_WAIT_FOR_ME, 0);
+
+	/* Execute the table update */
+	OUT_PKT7(ring, CP_SMMU_TABLE_UPDATE, 3);
+	OUT_RING(ring, lower_32_bits(ttbr));
+	OUT_RING(ring, upper_32_bits(ttbr));
+	OUT_RING(ring, 0);
+
+	/*
+	 * Write the new TTBR0 to the preemption records - this will be used to
+	 * reload the pagetable if the current ring gets preempted out.
+	 */
+	OUT_PKT7(ring, CP_MEM_WRITE, 4);
+	OUT_RING(ring, lower_32_bits(rbmemptr(ring, ttbr0)));
+	OUT_RING(ring, upper_32_bits(rbmemptr(ring, ttbr0)));
+	OUT_RING(ring, lower_32_bits(ttbr));
+	OUT_RING(ring, upper_32_bits(ttbr));
+
+	/* Invalidate the draw state so we start off fresh */
+	OUT_PKT7(ring, CP_SET_DRAW_STATE, 3);
+	OUT_RING(ring, 0x40000);
+	OUT_RING(ring, 1);
+	OUT_RING(ring, 0);
+
+	/* Turn off APRIV */
+	OUT_PKT4(ring, REG_A5XX_CP_CNTL, 1);
+	OUT_RING(ring, 0);
+
+	/* Turn protected mode back on */
+	OUT_PKT7(ring, CP_SET_PROTECTED_MODE, 1);
+	OUT_RING(ring, 1);
+}
+
 static void a5xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit,
 	struct msm_file_private *ctx)
 {
@@ -149,6 +202,8 @@ static void a5xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit,
 	struct msm_ringbuffer *ring = submit->ring;
 	unsigned int i, ibs = 0;
 
+	a5xx_set_pagetable(gpu, ring, ctx);
+
 	OUT_PKT7(ring, CP_PREEMPT_ENABLE_GLOBAL, 1);
 	OUT_RING(ring, 0x02);
 
diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.h b/drivers/gpu/drm/msm/adreno/a5xx_gpu.h
index 6fb8c2f9b9e4..5070cb17d66c 100644
--- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.h
+++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.h
@@ -45,6 +45,9 @@ struct a5xx_gpu {
 
 	atomic_t preempt_state;
 	struct timer_list preempt_timer;
+	struct a5xx_smmu_info *smmu_info;
+	struct drm_gem_object *smmu_info_bo;
+	uint64_t smmu_info_iova;
 };
 
 #define to_a5xx_gpu(x) container_of(x, struct a5xx_gpu, base)
@@ -128,6 +131,20 @@ struct a5xx_preempt_record {
  */
 #define A5XX_PREEMPT_COUNTER_SIZE (16 * 4)
 
+/*
+ * This is a global structure that the preemption code uses to switch in the
+ * pagetable for the preempted process - the CP switches in whatever
+ * pagetable was last active on the ring that is being preempted to.
+ */
+struct a5xx_smmu_info {
+	uint32_t  magic;
+	uint32_t  _pad4;
+	uint64_t  ttbr0;
+	uint32_t  asid;
+	uint32_t  contextidr;
+};
+
+#define A5XX_SMMU_INFO_MAGIC 0x3618CDA3UL
 
 int a5xx_power_init(struct msm_gpu *gpu);
 void a5xx_gpmu_ucode_init(struct msm_gpu *gpu);
diff --git a/drivers/gpu/drm/msm/adreno/a5xx_preempt.c b/drivers/gpu/drm/msm/adreno/a5xx_preempt.c
index 970c7963ae29..8a6618f51eb8 100644
--- a/drivers/gpu/drm/msm/adreno/a5xx_preempt.c
+++ b/drivers/gpu/drm/msm/adreno/a5xx_preempt.c
@@ -12,6 +12,7 @@
  */
 
 #include "msm_gem.h"
+#include "msm_mmu.h"
 #include "a5xx_gpu.h"
 
 /*
@@ -145,6 +146,15 @@ void a5xx_preempt_trigger(struct msm_gpu *gpu)
 	a5xx_gpu->preempt[ring->id]->wptr = get_wptr(ring);
 	spin_unlock_irqrestore(&ring->lock, flags);
 
+	/* Do a read barrier to make sure we have the updated pagetable info */
+	rmb();
+
+	/* Set the SMMU info for the preemption */
+	if (a5xx_gpu->smmu_info) {
+		a5xx_gpu->smmu_info->ttbr0 = ring->memptrs->ttbr0;
+		a5xx_gpu->smmu_info->contextidr = 0;
+	}
+
 	/* Set the address of the incoming preemption record */
 	gpu_write64(gpu, REG_A5XX_CP_CONTEXT_SWITCH_RESTORE_ADDR_LO,
 		REG_A5XX_CP_CONTEXT_SWITCH_RESTORE_ADDR_HI,
@@ -214,9 +224,10 @@ void a5xx_preempt_hw_init(struct msm_gpu *gpu)
 		a5xx_gpu->preempt[i]->rbase = gpu->rb[i]->iova;
 	}
 
-	/* Write a 0 to signal that we aren't switching pagetables */
+	/* Tell the CP where to find the smmu_info buffer */
 	gpu_write64(gpu, REG_A5XX_CP_CONTEXT_SWITCH_SMMU_INFO_LO,
-		REG_A5XX_CP_CONTEXT_SWITCH_SMMU_INFO_HI, 0);
+		REG_A5XX_CP_CONTEXT_SWITCH_SMMU_INFO_HI,
+		a5xx_gpu->smmu_info_iova);
 
 	/* Reset the preemption state */
 	set_preempt_state(a5xx_gpu, PREEMPT_NONE);
@@ -275,8 +286,43 @@ void a5xx_preempt_fini(struct msm_gpu *gpu)
 		drm_gem_object_unreference(a5xx_gpu->preempt_bo[i]);
 		a5xx_gpu->preempt_bo[i] = NULL;
 	}
+
+	if (a5xx_gpu->smmu_info_bo) {
+		if (a5xx_gpu->smmu_info_iova)
+			msm_gem_put_iova(a5xx_gpu->smmu_info_bo, gpu->aspace);
+		drm_gem_object_unreference_unlocked(a5xx_gpu->smmu_info_bo);
+		a5xx_gpu->smmu_info_bo = NULL;
+	}
 }
 
+static int a5xx_smmu_info_init(struct msm_gpu *gpu)
+{
+	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
+	struct a5xx_gpu *a5xx_gpu = to_a5xx_gpu(adreno_gpu);
+	struct a5xx_smmu_info *ptr;
+	struct drm_gem_object *bo;
+	u64 iova;
+
+	if (!msm_mmu_has_feature(gpu->aspace->mmu,
+			MMU_FEATURE_PER_INSTANCE_TABLES))
+		return 0;
+
+	ptr = msm_gem_kernel_new(gpu->dev, sizeof(struct a5xx_smmu_info),
+		MSM_BO_UNCACHED, gpu->aspace, &bo, &iova);
+
+	if (IS_ERR(ptr))
+		return PTR_ERR(ptr);
+
+	ptr->magic = A5XX_SMMU_INFO_MAGIC;
+
+	a5xx_gpu->smmu_info_bo = bo;
+	a5xx_gpu->smmu_info_iova = iova;
+	a5xx_gpu->smmu_info = ptr;
+
+	return 0;
+}
+
 void a5xx_preempt_init(struct msm_gpu *gpu)
 {
 	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
@@ -288,17 +334,21 @@ void a5xx_preempt_init(struct msm_gpu *gpu)
 		return;
 
 	for (i = 0; i < gpu->nr_rings; i++) {
-		if (preempt_init_ring(a5xx_gpu, gpu->rb[i])) {
-			/*
-			 * On any failure our adventure is over. Clean up and
-			 * set nr_rings to 1 to force preemption off
-			 */
-			a5xx_preempt_fini(gpu);
-			gpu->nr_rings = 1;
-
-			return;
-		}
+		if (preempt_init_ring(a5xx_gpu, gpu->rb[i]))
+			goto fail;
 	}
 
-	timer_setup(&a5xx_gpu->preempt_timer, a5xx_preempt_timer, 0);
+	if (a5xx_smmu_info_init(gpu))
+		goto fail;
+
+	timer_setup(&a5xx_gpu->preempt_timer, a5xx_preempt_timer, 0);
+
+	return;
+fail:
+	/*
+	 * On any failure our adventure is over. Clean up and
+	 * set nr_rings to 1 to force preemption off
+	 */
+	a5xx_preempt_fini(gpu);
+	gpu->nr_rings = 1;
 }
diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
index de63ff26a062..cccd437e6ea3 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
@@ -552,6 +552,17 @@ int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev,
 	adreno_gpu_config.ioname = "kgsl_3d0_reg_memory";
 	adreno_gpu_config.irqname = "kgsl_3d0_irq";
 
+	if (adreno_is_a5xx(adreno_gpu)) {
+		/*
+		 * If possible use the TTBR1 virtual address space for all the
+		 * "global" buffer objects which are shared between processes.
+		 * This leaves the lower virtual address space open for
+		 * per-instance pagetables if they are available
+		 */
+		adreno_gpu_config.va_start_global = 0xfffffff800000000ULL;
+		adreno_gpu_config.va_end_global = 0xfffffff8ffffffffULL;
+	}
+
 	adreno_gpu_config.va_start = SZ_16M;
 	adreno_gpu_config.va_end = 0xffffffff;
 
diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.h b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
index 8d3d0a924908..1b2021c884f7 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.h
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
@@ -197,6 +197,11 @@ static inline int adreno_is_a530(struct adreno_gpu *gpu)
 	return gpu->revn == 530;
 }
 
+static inline bool adreno_is_a5xx(struct adreno_gpu *gpu)
+{
+	return ((gpu->revn >= 500) && (gpu->revn < 600));
+}
+
 int adreno_get_param(struct msm_gpu *gpu, uint32_t param, uint64_t *value);
 const struct firmware *adreno_request_fw(struct adreno_gpu *adreno_gpu,
 		const char *fwname);
diff --git a/drivers/gpu/drm/msm/msm_ringbuffer.h b/drivers/gpu/drm/msm/msm_ringbuffer.h
index cffce094aecb..fd71484d5894 100644
--- a/drivers/gpu/drm/msm/msm_ringbuffer.h
+++ b/drivers/gpu/drm/msm/msm_ringbuffer.h
@@ -26,6 +26,7 @@
 struct msm_rbmemptrs {
 	volatile uint32_t rptr;
 	volatile uint32_t fence;
+	volatile uint64_t ttbr0;
 };
 
 struct msm_ringbuffer {
-- 
2.16.1

* Re: [PATCH 03/14] iommu: Create a base struct for io_mm
  2018-02-21 22:59     ` Jordan Crouse
@ 2018-03-02 12:25         ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 46+ messages in thread
From: Jean-Philippe Brucker @ 2018-03-02 12:25 UTC (permalink / raw)
  To: Jordan Crouse, freedreno-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: Yisheng Xie, linux-arm-msm-u79uwXL29TY76Z2rM5mHXA, Bob Liu,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	tfiga-F7+t8E8rja9g9hUCZPvPmw,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	vivek.gautam-sgV2jX0FEOL9JmXXK+q4OQ,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

Hi Jordan,

Thank you for this, SMMUv3 and virtio-iommu need these SVA patches as well.

On 21/02/18 22:59, Jordan Crouse wrote:
[...]> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index e2c49e583d8d..e998389cf195 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -110,8 +110,17 @@ struct iommu_domain {
>  	struct list_head mm_list;
>  };
>  
> +enum iommu_io_type {
> +	IO_TYPE_MM,
> +};
> +
> +struct io_base {
> +	int type;
> +	int pasid;
> +};

"io_base" is a bit vague. I'm bad at naming so my opinion doesn't hold
much water, but I'd rather this be something like "io_mm_base". When I
initially toyed with the idea I intended to keep io_mm as parent structure
and have "private" and "shared" sub-structures. Even if private PASIDs
don't rely on the kernel mm subsystem, this structure would still
represent an I/O mm of sorts, with a pgd and pgtable info.
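
Something like this, purely as a sketch of the layout I have in mind (the
exact fields are illustrative):

	struct io_mm_base {
		int	type;	/* shared (mm-backed) or private */
		int	pasid;
	};

	struct io_mm {
		struct io_mm_base	base;
		/* pgd, pgtable info, devices list, kref, ... */
	};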

Thanks,
Jean

> +
>  struct io_mm {
> -	int			pasid;
> +	struct io_base		base;
>  	struct list_head	devices;
>  	struct kref		kref;
>  #if defined(CONFIG_MMU_NOTIFIER)
> 

* Re: [PATCH 04/14] iommu: sva: Add support for pasid allocation
  2018-02-21 22:59     ` Jordan Crouse
@ 2018-03-02 12:27         ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 46+ messages in thread
From: Jean-Philippe Brucker @ 2018-03-02 12:27 UTC (permalink / raw)
  To: Jordan Crouse, freedreno-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: Yisheng Xie, linux-arm-msm-u79uwXL29TY76Z2rM5mHXA, Bob Liu,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	tfiga-F7+t8E8rja9g9hUCZPvPmw,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	vivek.gautam-sgV2jX0FEOL9JmXXK+q4OQ,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On 21/02/18 22:59, Jordan Crouse wrote:
[...]
> +int iommu_sva_alloc_pasid(struct iommu_domain *domain, struct device *dev)
> +{
> +	int ret, pasid;
> +	struct io_pasid *io_pasid;
> +
> +	if (!domain->ops->pasid_alloc || !domain->ops->pasid_free)
> +		return -ENODEV;
> +
> +	io_pasid = kzalloc(sizeof(*io_pasid), GFP_KERNEL);
> +	if (!io_pasid)
> +		return -ENOMEM;
> +
> +	io_pasid->domain = domain;
> +	io_pasid->base.type = IO_TYPE_PASID;
> +
> +	idr_preload(GFP_KERNEL);
> +	spin_lock(&iommu_sva_lock);
> +	pasid = idr_alloc_cyclic(&iommu_pasid_idr, &io_pasid->base,
> +		1, (1 << 31), GFP_ATOMIC);

To be usable by other IOMMUs, this should restrict the PASID range to what
the IOMMU and the device support like io_mm_alloc(). In your case 31 bits,
but with PCI PASID it depends on the PASID capability and the SMMU
SubstreamID range.

For this reason I think device drivers should call iommu_sva_device_init()
once, even for the alloc_pasid() API. For SMMUv2 I guess it will be a NOP,
but other IOMMUs will allocate PASID tables and enable features in the
device. In addition, knowing that all users of the API call
iommu_sva_device_init()/shutdown() could allow us to allocate and enable
stuff lazily in the future.
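
A minimal sketch of the clamping I have in mind, where both limits are
hypothetical fields that iommu_sva_device_init() would populate from the
IOMMU and device capabilities:

	int max = min_t(int, domain->max_pasid, dev->iommu_param->max_pasid);

	/* Same allocation as in the patch, but bounded by what HW supports */
	pasid = idr_alloc_cyclic(&iommu_pasid_idr, &io_pasid->base,
				 1, max, GFP_ATOMIC);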

It would also allow a given device driver to use both
iommu_sva_pasid_alloc() and iommu_sva_bind() at the same time, so that the
driver can assign contexts to userspace and still use some of them for
management.

[...]
> +int iommu_sva_map(int pasid, unsigned long iova,
> +	      phys_addr_t paddr, size_t size, int prot)

It would be nice to factor iommu_map(), since this logic for map, map_sg
and unmap should be the same regardless of the PASID argument.

For example
- iommu_sva_map(domain, pasid, ...)
- iommu_map(domain, ...)

both call
- __iommu_map(domain, pasid, ...)

which calls either
- ops->map(domain, ...)
- ops->sva_map(domain, pasid, ...)
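
As a rough sketch, ignoring the page-size splitting that the real
iommu_map() does, and using a negative pasid as a purely illustrative
"no PASID" convention:

	static int __iommu_map(struct iommu_domain *domain, int pasid,
			       unsigned long iova, phys_addr_t paddr,
			       size_t size, int prot)
	{
		if (pasid < 0)
			return domain->ops->map(domain, iova, paddr,
						size, prot);

		return domain->ops->sva_map(domain, pasid, iova, paddr,
					    size, prot);
	}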

[...]
> @@ -347,6 +353,15 @@ struct iommu_ops {
>  	int (*page_response)(struct iommu_domain *domain, struct device *dev,
>  			     struct page_response_msg *msg);
>  
> +	int (*pasid_alloc)(struct iommu_domain *domain, struct device *dev,
> +		int pasid);
> +	int (*sva_map)(struct iommu_domain *domain, int pasid,
> +		       unsigned long iova, phys_addr_t paddr, size_t size,
> +		       int prot);
> +	size_t (*sva_unmap)(struct iommu_domain *domain, int pasid,
> +			    unsigned long iova, size_t size);
> +	void (*pasid_free)(struct iommu_domain *domain, int pasid);
> +

Hmm, now IOMMU has the following ops:

* mm_alloc(): allocates a shared mm structure
* mm_attach(): writes the entry in the PASID table
* mm_detach(): removes the entry from the PASID table and invalidates it
* mm_free(): free shared mm
* pasid_alloc(): allocates a pasid structure (which I usually call
"private mm") and write the entry in the PASID table (or call
install_pasid() for SMMUv2)
* pasid_free(): remove from the PASID table (or call remove_pasid()) and
free the pasid structure.

Splitting mm_alloc and mm_attach is necessary because the io_mm in my case
can be shared between devices (allocated once, attached multiple times).
In your case a PASID is private to one device so only one callback is
needed. However mm_alloc+mm_attach will do roughly the same as
pasid_alloc, so to reduce redundancy in iommu_ops, maybe we could reuse
mm_alloc and mm_attach for the private PASID case?
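
i.e. something along these lines in the core, with the exact signatures
elided and a NULL mm standing in for "private" (all of this is a sketch,
not settled API):

	io_mm = domain->ops->mm_alloc(domain, dev, NULL);
	if (IS_ERR(io_mm))
		return PTR_ERR(io_mm);

	/* Writes the PASID table entry, just as the shared case does */
	ret = domain->ops->mm_attach(domain, dev, io_mm);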

Thanks,
Jean

* Re: [PATCH 11/14] drm/msm: Add support for iommu-sva PASIDs
  2018-02-21 22:59     ` Jordan Crouse
@ 2018-03-02 12:29         ` Jean-Philippe Brucker
  -1 siblings, 0 replies; 46+ messages in thread
From: Jean-Philippe Brucker @ 2018-03-02 12:29 UTC (permalink / raw)
  To: Jordan Crouse, freedreno-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: linux-arm-msm-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	tfiga-F7+t8E8rja9g9hUCZPvPmw,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	vivek.gautam-sgV2jX0FEOL9JmXXK+q4OQ,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On 21/02/18 22:59, Jordan Crouse wrote:
[...]> +static int install_pasid_cb(int pasid, u64 ttbr, u32 asid, void *data)
> +{
> +	struct pasid_entry *entry = kzalloc(sizeof(*entry), GFP_KERNEL);
> +
> +	if (!entry)
> +		return -ENOMEM;
> +
> +	entry->pasid = pasid;
> +	entry->ttbr = ttbr;
> +	entry->asid = asid;
> +
> +	/* FIXME: Assume that we'll never have a pasid conflict? */

I think a conflict would be a bug on the IOMMU side. Users should not have
to check this. Then again, I have a few WARNs on the SMMUv3 context table
code that uncovered nasty bugs during development.
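
In other words, the callback could just make the conflict loud rather than
handle it, something like (the lookup helper is hypothetical):

	/* A duplicate PASID here would be an IOMMU core/driver bug */
	WARN_ON(find_pasid_entry(priv, pasid));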

Thanks,
Jean

* Re: [PATCH 01/14] iommu: Add DOMAIN_ATTR_ENABLE_TTBR1
  2018-02-21 22:59     ` Jordan Crouse
@ 2018-03-02 14:56         ` Robin Murphy
  -1 siblings, 0 replies; 46+ messages in thread
From: Robin Murphy @ 2018-03-02 14:56 UTC (permalink / raw)
  To: Jordan Crouse, freedreno-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: linux-arm-msm-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On 21/02/18 22:59, Jordan Crouse wrote:
> Add a new domain attribute to enable the TTBR1 pagetable for drivers
> and devices that support it.  This enables using a TTBR1 (otherwise
> known as a "global" or "system" pagetable) for devices that support a split
> pagetable scheme for switching pagetables quickly and safely.

TTBR1 is very much an Arm VMSA-specific term; if the concept of a split 
address space is useful in general, is it worth trying to frame it in 
general terms? AFAICS other IOMMU drivers could achieve the same effect 
fairly straightforwardly by simply copying the top-level "global" 
entries across whenever they switch "private" tables.
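
Conceptually something like this on every table switch (index bounds
entirely made up):

	/* Mirror the global top-level entries into the incoming table */
	for (i = GLOBAL_PGD_START; i < PTRS_PER_PGD; i++)
		private_pgd[i] = global_pgd[i];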

FWIW even for SMMU there could potentially be cases with Arm Ltd. IP 
where the SoC vendor implements a stage-2-only configuration in their 
media subsystem, because they care most about minimising area and 
stage-1-only isn't an option.

Robin.

> Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
> ---
>   include/linux/iommu.h | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 641aaf0f1b81..e2c49e583d8d 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -153,6 +153,7 @@ enum iommu_attr {
>   	DOMAIN_ATTR_FSL_PAMU_ENABLE,
>   	DOMAIN_ATTR_FSL_PAMUV1,
>   	DOMAIN_ATTR_NESTING,	/* two stages of translation */
> +	DOMAIN_ATTR_ENABLE_TTBR1,
>   	DOMAIN_ATTR_MAX,
>   };
>   
> 

* Re: [PATCH 03/14] iommu: Create a base struct for io_mm
  2018-03-02 12:25         ` Jean-Philippe Brucker
@ 2018-03-02 16:14             ` Jordan Crouse
  -1 siblings, 0 replies; 46+ messages in thread
From: Jordan Crouse @ 2018-03-02 16:14 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: Yisheng Xie, linux-arm-msm-u79uwXL29TY76Z2rM5mHXA, Bob Liu,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	tfiga-F7+t8E8rja9g9hUCZPvPmw,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	vivek.gautam-sgV2jX0FEOL9JmXXK+q4OQ,
	freedreno-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Fri, Mar 02, 2018 at 12:25:48PM +0000, Jean-Philippe Brucker wrote:
> Hi Jordan,
> 
> Thank you for this, SMMUv3 and virtio-iommu need these SVA patches as well.
> 
> On 21/02/18 22:59, Jordan Crouse wrote:
> [...]> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> > index e2c49e583d8d..e998389cf195 100644
> > --- a/include/linux/iommu.h
> > +++ b/include/linux/iommu.h
> > @@ -110,8 +110,17 @@ struct iommu_domain {
> >  	struct list_head mm_list;
> >  };
> >  
> > +enum iommu_io_type {
> > +	IO_TYPE_MM,
> > +};
> > +
> > +struct io_base {
> > +	int type;
> > +	int pasid;
> > +};
> 
> "io_base" is a bit vague. I'm bad at naming so my opinion doesn't hold
> much water, but I'd rather this be something like "io_mm_base". When I
> initially toyed with the idea I intended to keep io_mm as parent structure
> and have "private" and "shared" sub-structures. Even if private PASIDs
> don't rely on the kernel mm subsystem, this structure would still
> represent an I/O mm of sorts, with a pgd and pgtable info.

I'm also bad at naming but I don't mind changing it. io_mm_base seems okay to me
unless somebody has a better idea. I also like the terms "private" and
"shared". I'm going to start adopting those where it makes sense.

Jordan
> 
> > +
> >  struct io_mm {
> > -	int			pasid;
> > +	struct io_base		base;
> >  	struct list_head	devices;
> >  	struct kref		kref;
> >  #if defined(CONFIG_MMU_NOTIFIER)
> > 
> 

-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

* Re: [PATCH 04/14] iommu: sva: Add support for pasid allocation
  2018-03-02 12:27         ` Jean-Philippe Brucker
@ 2018-03-02 16:23             ` Jordan Crouse
  -1 siblings, 0 replies; 46+ messages in thread
From: Jordan Crouse @ 2018-03-02 16:23 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: linux-arm-msm-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	freedreno-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Fri, Mar 02, 2018 at 12:27:58PM +0000, Jean-Philippe Brucker wrote:
> On 21/02/18 22:59, Jordan Crouse wrote:
> [...]
> > +int iommu_sva_alloc_pasid(struct iommu_domain *domain, struct device *dev)
> > +{
> > +	int ret, pasid;
> > +	struct io_pasid *io_pasid;
> > +
> > +	if (!domain->ops->pasid_alloc || !domain->ops->pasid_free)
> > +		return -ENODEV;
> > +
> > +	io_pasid = kzalloc(sizeof(*io_pasid), GFP_KERNEL);
> > +	if (!io_pasid)
> > +		return -ENOMEM;
> > +
> > +	io_pasid->domain = domain;
> > +	io_pasid->base.type = IO_TYPE_PASID;
> > +
> > +	idr_preload(GFP_KERNEL);
> > +	spin_lock(&iommu_sva_lock);
> > +	pasid = idr_alloc_cyclic(&iommu_pasid_idr, &io_pasid->base,
> > +		1, (1 << 31), GFP_ATOMIC);
> 
> To be usable by other IOMMUs, this should restrict the PASID range to what
> the IOMMU and the device support like io_mm_alloc(). In your case 31 bits,
> but with PCI PASID it depends on the PASID capability and the SMMU
> SubstreamID range.
> 
> For this reason I think device drivers should call iommu_sva_device_init()
> once, even for the alloc_pasid() API. For SMMUv2 I guess it will be a NOP,
> but other IOMMUs will allocate PASID tables and enable features in the
> device. In addition, knowing that all users of the API call
> iommu_sva_device_init()/shutdown() could allow us to allocate and enable
> stuff lazily in the future.
> 
> It would also allow a given device driver to use both
> iommu_sva_pasid_alloc() and iommu_sva_bind() at the same time, so that the
> driver can assign contexts to userspace and still use some of them for
> management.

No problem.

> [...]
> > +int iommu_sva_map(int pasid, unsigned long iova,
> > +	      phys_addr_t paddr, size_t size, int prot)
> 
> It would be nice to factor iommu_map(), since this logic for map, map_sg
> and unmap should be the same regardless of the PASID argument.
> 
> For example
> - iommu_sva_map(domain, pasid, ...)
> - iommu_map(domain, ...)
> 
> both call
> - __iommu_map(domain, pasid, ...)
> 
> which calls either
> - ops->map(domain, ...)
> - ops->sva_map(domain, pasid, ...)

Agree.  I was kind of annoyed at the code duplication - this would be a good way
to handle it.

> [...]
> > @@ -347,6 +353,15 @@ struct iommu_ops {
> >  	int (*page_response)(struct iommu_domain *domain, struct device *dev,
> >  			     struct page_response_msg *msg);
> >  
> > +	int (*pasid_alloc)(struct iommu_domain *domain, struct device *dev,
> > +		int pasid);
> > +	int (*sva_map)(struct iommu_domain *domain, int pasid,
> > +		       unsigned long iova, phys_addr_t paddr, size_t size,
> > +		       int prot);
> > +	size_t (*sva_unmap)(struct iommu_domain *domain, int pasid,
> > +			    unsigned long iova, size_t size);
> > +	void (*pasid_free)(struct iommu_domain *domain, int pasid);
> > +
> 
> Hmm, now IOMMU has the following ops:
> 
> * mm_alloc(): allocates a shared mm structure
> * mm_attach(): writes the entry in the PASID table
> * mm_detach(): removes the entry from the PASID table and invalidates it
> * mm_free(): free shared mm
> * pasid_alloc(): allocates a pasid structure (which I usually call
> "private mm") and write the entry in the PASID table (or call
> install_pasid() for SMMUv2)
> * pasid_free(): remove from the PASID table (or call remove_pasid()) and
> free the pasid structure.
> 
> Splitting mm_alloc and mm_attach is necessary because the io_mm in my case
> can be shared between devices (allocated once, attached multiple times).
> In your case a PASID is private to one device so only one callback is
> needed. However mm_alloc+mm_attach will do roughly the same as
> pasid_alloc, so to reduce redundancy in iommu_ops, maybe we could reuse
> mm_alloc and mm_attach for the private PASID case?

Okay - let me bang on it and see what we can clean up.  Thanks for the review.

Jordan

-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

* Re: [PATCH 02/14] iommu/arm-smmu: Add support for TTBR1
  2018-02-21 22:59     ` Jordan Crouse
@ 2018-03-02 17:57         ` Robin Murphy
  -1 siblings, 0 replies; 46+ messages in thread
From: Robin Murphy @ 2018-03-02 17:57 UTC (permalink / raw)
  To: Jordan Crouse, freedreno-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: linux-arm-msm-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On 21/02/18 22:59, Jordan Crouse wrote:
> Allow a SMMU device to opt into allocating a TTBR1 pagetable.
> 
> The size of the TTBR1 region will be the same as
> the TTBR0 size with the sign extension bit set on the highest
> bit in the region unless the upstream size is 49 bits and then
> the sign-extension bit will be set on the 49th bit.

Um, isn't the 49th bit still "the highest bit" if the address size is 49 
bits? ;)

> The map/unmap operations will automatically use the appropriate
> pagetable based on the specified iova and the existing mask.
> 
> Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
> ---
>   drivers/iommu/arm-smmu-regs.h  |   2 -
>   drivers/iommu/arm-smmu.c       |  22 ++++--
>   drivers/iommu/io-pgtable-arm.c | 160 ++++++++++++++++++++++++++++++++++++-----
>   drivers/iommu/io-pgtable-arm.h |  20 ++++++
>   drivers/iommu/io-pgtable.h     |  16 ++++-
>   5 files changed, 192 insertions(+), 28 deletions(-)
> 
> diff --git a/drivers/iommu/arm-smmu-regs.h b/drivers/iommu/arm-smmu-regs.h
> index a1226e4ab5f8..0ce85d5b22e9 100644
> --- a/drivers/iommu/arm-smmu-regs.h
> +++ b/drivers/iommu/arm-smmu-regs.h
> @@ -193,8 +193,6 @@ enum arm_smmu_s2cr_privcfg {
>   #define RESUME_RETRY			(0 << 0)
>   #define RESUME_TERMINATE		(1 << 0)
>   
> -#define TTBCR2_SEP_SHIFT		15
> -#define TTBCR2_SEP_UPSTREAM		(0x7 << TTBCR2_SEP_SHIFT)
>   #define TTBCR2_AS			(1 << 4)
>   
>   #define TTBRn_ASID_SHIFT		48
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index 69e7c60792a8..ebfa59b59622 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -248,6 +248,7 @@ struct arm_smmu_domain {
>   	enum arm_smmu_domain_stage	stage;
>   	struct mutex			init_mutex; /* Protects smmu pointer */
>   	spinlock_t			cb_lock; /* Serialises ATS1* ops and TLB syncs */
> +	u32 attributes;
>   	struct iommu_domain		domain;
>   };
>   
> @@ -598,7 +599,6 @@ static void arm_smmu_init_context_bank(struct arm_smmu_domain *smmu_domain,
>   		} else {
>   			cb->tcr[0] = pgtbl_cfg->arm_lpae_s1_cfg.tcr;
>   			cb->tcr[1] = pgtbl_cfg->arm_lpae_s1_cfg.tcr >> 32;
> -			cb->tcr[1] |= TTBCR2_SEP_UPSTREAM;
>   			if (cfg->fmt == ARM_SMMU_CTX_FMT_AARCH64)
>   				cb->tcr[1] |= TTBCR2_AS;
>   		}
> @@ -729,6 +729,9 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
>   	enum io_pgtable_fmt fmt;
>   	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
>   	struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
> +	unsigned int quirks =
> +		smmu_domain->attributes & (1 << DOMAIN_ATTR_ENABLE_TTBR1) ?
> +			IO_PGTABLE_QUIRK_ARM_TTBR1 : 0;
>   
>   	mutex_lock(&smmu_domain->init_mutex);
>   	if (smmu_domain->smmu)
> @@ -852,7 +855,11 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
>   	else
>   		cfg->asid = cfg->cbndx + smmu->cavium_id_base;
>   
> +	if (smmu->features & ARM_SMMU_FEAT_COHERENT_WALK)
> +		quirks |= IO_PGTABLE_QUIRK_NO_DMA;
> +
>   	pgtbl_cfg = (struct io_pgtable_cfg) {
> +		.quirks		= quirks,
>   		.pgsize_bitmap	= smmu->pgsize_bitmap,
>   		.ias		= ias,
>   		.oas		= oas,
> @@ -860,9 +867,6 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
>   		.iommu_dev	= smmu->dev,
>   	};
>   
> -	if (smmu->features & ARM_SMMU_FEAT_COHERENT_WALK)
> -		pgtbl_cfg.quirks = IO_PGTABLE_QUIRK_NO_DMA;
> -
>   	smmu_domain->smmu = smmu;
>   	pgtbl_ops = alloc_io_pgtable_ops(fmt, &pgtbl_cfg, smmu_domain);
>   	if (!pgtbl_ops) {
> @@ -1477,6 +1481,10 @@ static int arm_smmu_domain_get_attr(struct iommu_domain *domain,
>   	case DOMAIN_ATTR_NESTING:
>   		*(int *)data = (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED);
>   		return 0;
> +	case DOMAIN_ATTR_ENABLE_TTBR1:
> +		*((int *)data) = !!(smmu_domain->attributes
> +					& (1 << DOMAIN_ATTR_ENABLE_TTBR1));
> +		return 0;
>   	default:
>   		return -ENODEV;
>   	}
> @@ -1505,6 +1513,12 @@ static int arm_smmu_domain_set_attr(struct iommu_domain *domain,
>   		else
>   			smmu_domain->stage = ARM_SMMU_DOMAIN_S1;
>   
> +		break;
> +	case DOMAIN_ATTR_ENABLE_TTBR1:
> +		if (*((int *)data))
> +			smmu_domain->attributes |=
> +				1 << DOMAIN_ATTR_ENABLE_TTBR1;
> +		ret = 0;
>   		break;
>   	default:
>   		ret = -ENODEV;
> diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
> index fff0b6ba0a69..1bd0045f2cb7 100644
> --- a/drivers/iommu/io-pgtable-arm.c
> +++ b/drivers/iommu/io-pgtable-arm.c
> @@ -152,7 +152,7 @@ struct arm_lpae_io_pgtable {
>   	unsigned long		pg_shift;
>   	unsigned long		bits_per_level;
>   
> -	void			*pgd;
> +	void			*pgd[2];

This might be reasonable for short-descriptor, but I really don't like 
it for LPAE. The two tables are more or less independent in terms of 
size, granule, etc., so this brings in a lot of artificial coupling.

I think it would be a lot cleaner for io-pgtable to have little or no 
knowledge of this, and it be down to the caller to allocate two tables 
and merge the TCRs, then dispatch maps/unmaps to the appropriate table 
by itself.
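
i.e. keep two independent io_pgtable instances per domain and dispatch in
arm-smmu itself, along these lines (the array and mask fields are invented
for the sketch):

	struct io_pgtable_ops *ops;

	/* High addresses go to the TTBR1 table, everything else to TTBR0 */
	ops = (iova & smmu_domain->split_mask) ? smmu_domain->pgtbl_ops[1]
					       : smmu_domain->pgtbl_ops[0];
	return ops->map(ops, iova, paddr, size, prot);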

>   };
>   
>   typedef u64 arm_lpae_iopte;
> @@ -394,20 +394,48 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,
>   	return pte;
>   }
>   
> +static inline arm_lpae_iopte *
> +arm_lpae_get_table(struct arm_lpae_io_pgtable *data, unsigned long iova)
> +{
> +	struct io_pgtable_cfg *cfg = &data->iop.cfg;
> +
> +	if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1)  {
> +		unsigned long mask;
> +
> +		/*
> +		 * if ias is 48 it really means that bit 48 is the sign
> +		 * extension bit, otherwise the sign extension bit is ias - 1
> +		 * (for example, bit 31 for ias 32)
> +		 */
> +		mask = (cfg->ias == 48) ? (1UL << 48) :
> +			(1UL << (cfg->ias - 1));

This would look less silly if it was done in the SMMU driver where the 
original UBS information is directly to hand, instead of having to 
reverse-engineer it from the pagetable config (on every operation, no less).

That said, it's still going to be pretty fragile in general, since on 
SMMUv1 the UBS is merely guessed at from the IPA size, and either way 
all it tells you is what the SMMU knows about its own interfaces; it 
doesn't have a clue how many bits the masters connected to said 
interface(s) are actually capable of driving.
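
If it stays, it could at least be computed once at init time in arm-smmu,
something like (still assuming ias == 48 only ever comes from the special
49-bit UBS case):

	/* Sign-extension bit: bit 48 for a 49-bit UBS, else the top IA bit */
	smmu_domain->split_mask = (ias == 48) ? (1UL << 48)
					      : (1UL << (ias - 1));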

> +
> +		if (iova & mask)
> +			return data->pgd[1];
> +	}
> +
> +	return data->pgd[0];
> +}
> +
>   static int arm_lpae_map(struct io_pgtable_ops *ops, unsigned long iova,
>   			phys_addr_t paddr, size_t size, int iommu_prot)
>   {
>   	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
> -	arm_lpae_iopte *ptep = data->pgd;
> +	arm_lpae_iopte *ptep;
>   	int ret, lvl = ARM_LPAE_START_LVL(data);
>   	arm_lpae_iopte prot;
>   
> +	ptep = arm_lpae_get_table(data, iova);
> +
>   	/* If no access, then nothing to do */
>   	if (!(iommu_prot & (IOMMU_READ | IOMMU_WRITE)))
>   		return 0;
>   
> -	if (WARN_ON(iova >= (1ULL << data->iop.cfg.ias) ||
> -		    paddr >= (1ULL << data->iop.cfg.oas)))
> +	if (WARN_ON(paddr >= (1ULL << data->iop.cfg.oas)))
> +		return -ERANGE;
> +
> +	if (WARN_ON(!(data->iop.cfg.quirks & IO_PGTABLE_QUIRK_ARM_TTBR1) &&
> +		    iova >= (1ULL << data->iop.cfg.ias)))
>   		return -ERANGE;
>   
>   	prot = arm_lpae_prot_to_pte(data, iommu_prot);
> @@ -456,7 +484,10 @@ static void arm_lpae_free_pgtable(struct io_pgtable *iop)
>   {
>   	struct arm_lpae_io_pgtable *data = io_pgtable_to_data(iop);
>   
> -	__arm_lpae_free_pgtable(data, ARM_LPAE_START_LVL(data), data->pgd);
> +	__arm_lpae_free_pgtable(data, ARM_LPAE_START_LVL(data), data->pgd[0]);
> +	if (data->pgd[1])
> +		__arm_lpae_free_pgtable(data, ARM_LPAE_START_LVL(data),
> +			data->pgd[1]);
>   	kfree(data);
>   }
>   
> @@ -564,10 +595,13 @@ static int arm_lpae_unmap(struct io_pgtable_ops *ops, unsigned long iova,
>   			  size_t size)
>   {
>   	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
> -	arm_lpae_iopte *ptep = data->pgd;
> +	arm_lpae_iopte *ptep;
>   	int lvl = ARM_LPAE_START_LVL(data);
>   
> -	if (WARN_ON(iova >= (1ULL << data->iop.cfg.ias)))
> +	ptep = arm_lpae_get_table(data, iova);
> +
> +	if (WARN_ON(!(data->iop.cfg.quirks & IO_PGTABLE_QUIRK_ARM_TTBR1) &&
> +		    iova >= (1ULL << data->iop.cfg.ias)))
>   		return 0;
>   
>   	return __arm_lpae_unmap(data, iova, size, lvl, ptep);
> @@ -577,9 +611,11 @@ static phys_addr_t arm_lpae_iova_to_phys(struct io_pgtable_ops *ops,
>   					 unsigned long iova)
>   {
>   	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
> -	arm_lpae_iopte pte, *ptep = data->pgd;
> +	arm_lpae_iopte pte, *ptep;
>   	int lvl = ARM_LPAE_START_LVL(data);
>   
> +	ptep = arm_lpae_get_table(data, iova);
> +
>   	do {
>   		/* Valid IOPTE pointer? */
>   		if (!ptep)
> @@ -689,13 +725,82 @@ arm_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg)
>   	return data;
>   }
>   
> +static u64 arm_64_lpae_setup_ttbr1(struct io_pgtable_cfg *cfg,
> +		struct arm_lpae_io_pgtable *data)
> +
> +{
> +	u64 reg;
> +
> +	/* If TTBR1 is disabled, disable speculative walks through the TTBR1 */
> +	if (!(cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1)) {
> +		reg = ARM_LPAE_TCR_EPD1;
> +		reg |= (ARM_LPAE_TCR_SEP_UPSTREAM << ARM_LPAE_TCR_SEP_SHIFT);
> +		return reg;
> +	}
> +
> +	reg = (ARM_LPAE_TCR_SH_IS << ARM_LPAE_TCR_SH1_SHIFT) |
> +	      (ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_IRGN1_SHIFT) |
> +	      (ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_ORGN1_SHIFT);
> +
> +	switch (1 << data->pg_shift) {
> +	case SZ_4K:
> +		reg |= ARM_LPAE_TCR_TG1_4K;
> +		break;
> +	case SZ_16K:
> +		reg |= ARM_LPAE_TCR_TG1_16K;
> +		break;
> +	case SZ_64K:
> +		reg |= ARM_LPAE_TCR_TG1_64K;
> +		break;
> +	}
> +
> +	/* Set T1SZ */
> +	reg |= (64ULL - cfg->ias) << ARM_LPAE_TCR_T1SZ_SHIFT;
> +
> +	/* Set the SEP bit based on the size */
> +	switch (cfg->ias) {
> +	case 32:
> +		reg |= (ARM_LPAE_TCR_SEP_31 << ARM_LPAE_TCR_SEP_SHIFT);
> +		break;
> +	case 36:
> +		reg |= (ARM_LPAE_TCR_SEP_35 << ARM_LPAE_TCR_SEP_SHIFT);
> +		break;
> +	case 40:
> +		reg |= (ARM_LPAE_TCR_SEP_39 << ARM_LPAE_TCR_SEP_SHIFT);
> +		break;
> +	case 42:
> +		reg |= (ARM_LPAE_TCR_SEP_41 << ARM_LPAE_TCR_SEP_SHIFT);
> +		break;
> +	case 44:
> +		reg |= (ARM_LPAE_TCR_SEP_43 << ARM_LPAE_TCR_SEP_SHIFT);
> +		break;
> +	case 48:
> +		/*
> +		 * If ias is 48 then that probably means that the UBS on the
> +		 * device was 0101b (49) which is a special case that assumes
> +		 * bit 48 is the sign extension bit. In this case we are
> +		 * expected to use ARM_LPAE_TCR_SEP_UPSTREAM to use bit 48 as
> +		 * the extension bit. One might be confused because there is
> +		 * also an option to set the SEP to bit 47 but this is probably
> +		 * not what the arm-smmu driver intended.
> +		 */

Again, a clear sign that this probably isn't the most appropriate place 
to be trying to handle this.

> +	default:
> +		reg |= (ARM_LPAE_TCR_SEP_UPSTREAM << ARM_LPAE_TCR_SEP_SHIFT);
> +		break;
> +	}
> +
> +	return reg;
> +}
> +
>   static struct io_pgtable *
>   arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
>   {
>   	u64 reg;
>   	struct arm_lpae_io_pgtable *data;
>   
> -	if (cfg->quirks & ~(IO_PGTABLE_QUIRK_ARM_NS | IO_PGTABLE_QUIRK_NO_DMA))
> +	if (cfg->quirks & ~(IO_PGTABLE_QUIRK_ARM_NS |
> +			IO_PGTABLE_QUIRK_NO_DMA |
> +			IO_PGTABLE_QUIRK_ARM_TTBR1))
>   		return NULL;
>   
>   	data = arm_lpae_alloc_pgtable(cfg);
> @@ -744,8 +849,9 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
>   
>   	reg |= (64ULL - cfg->ias) << ARM_LPAE_TCR_T0SZ_SHIFT;
>   
> -	/* Disable speculative walks through TTBR1 */
> -	reg |= ARM_LPAE_TCR_EPD1;
> +	/* Bring in the TTBR1 configuration */
> +	reg |= arm_64_lpae_setup_ttbr1(cfg, data);
> +
>   	cfg->arm_lpae_s1_cfg.tcr = reg;
>   
>   	/* MAIRs */
> @@ -760,16 +866,32 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
>   	cfg->arm_lpae_s1_cfg.mair[1] = 0;
>   
>   	/* Looking good; allocate a pgd */
> -	data->pgd = __arm_lpae_alloc_pages(data->pgd_size, GFP_KERNEL, cfg);
> -	if (!data->pgd)
> +	data->pgd[0] = __arm_lpae_alloc_pages(data->pgd_size, GFP_KERNEL, cfg);
> +	if (!data->pgd[0])
>   		goto out_free_data;
>   
> +
> +	if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1) {
> +		data->pgd[1] = __arm_lpae_alloc_pages(data->pgd_size,
> +			GFP_KERNEL, cfg);
> +		if (!data->pgd[1]) {
> +			__arm_lpae_free_pages(data->pgd[0], data->pgd_size,
> +				cfg);
> +			goto out_free_data;
> +		}
> +	} else {
> +		data->pgd[1] = NULL;
> +	}
> +
>   	/* Ensure the empty pgd is visible before any actual TTBR write */
>   	wmb();
>   
>   	/* TTBRs */
> -	cfg->arm_lpae_s1_cfg.ttbr[0] = virt_to_phys(data->pgd);
> -	cfg->arm_lpae_s1_cfg.ttbr[1] = 0;
> +	cfg->arm_lpae_s1_cfg.ttbr[0] = virt_to_phys(data->pgd[0]);
> +
> +	if (data->pgd[1])
> +		cfg->arm_lpae_s1_cfg.ttbr[1] = virt_to_phys(data->pgd[1]);
> +
>   	return &data->iop;
>   
>   out_free_data:
> @@ -854,15 +976,15 @@ arm_64_lpae_alloc_pgtable_s2(struct io_pgtable_cfg *cfg, void *cookie)
>   	cfg->arm_lpae_s2_cfg.vtcr = reg;
>   
>   	/* Allocate pgd pages */
> -	data->pgd = __arm_lpae_alloc_pages(data->pgd_size, GFP_KERNEL, cfg);
> -	if (!data->pgd)
> +	data->pgd[0] = __arm_lpae_alloc_pages(data->pgd_size, GFP_KERNEL, cfg);
> +	if (!data->pgd[0])
>   		goto out_free_data;
>   
>   	/* Ensure the empty pgd is visible before any actual TTBR write */
>   	wmb();
>   
>   	/* VTTBR */
> -	cfg->arm_lpae_s2_cfg.vttbr = virt_to_phys(data->pgd);
> +	cfg->arm_lpae_s2_cfg.vttbr = virt_to_phys(data->pgd[0]);
>   	return &data->iop;
>   
>   out_free_data:
> @@ -960,7 +1082,7 @@ static void __init arm_lpae_dump_ops(struct io_pgtable_ops *ops)
>   		cfg->pgsize_bitmap, cfg->ias);
>   	pr_err("data: %d levels, 0x%zx pgd_size, %lu pg_shift, %lu bits_per_level, pgd @ %p\n",
>   		data->levels, data->pgd_size, data->pg_shift,
> -		data->bits_per_level, data->pgd);
> +		data->bits_per_level, data->pgd[0]);
>   }
>   
>   #define __FAIL(ops, i)	({						\
> diff --git a/drivers/iommu/io-pgtable-arm.h b/drivers/iommu/io-pgtable-arm.h
> index cb31314971ac..6344b1d359a5 100644
> --- a/drivers/iommu/io-pgtable-arm.h
> +++ b/drivers/iommu/io-pgtable-arm.h
> @@ -25,14 +25,21 @@
>   #define ARM_LPAE_TCR_TG0_64K		(1 << 14)
>   #define ARM_LPAE_TCR_TG0_16K		(2 << 14)
>   
> +#define ARM_LPAE_TCR_TG1_16K            (1 << 30)
> +#define ARM_LPAE_TCR_TG1_4K             (2 << 30)
> +#define ARM_LPAE_TCR_TG1_64K            (3 << 30)
> +
>   #define ARM_LPAE_TCR_SH0_SHIFT		12
> +#define ARM_LPAE_TCR_SH1_SHIFT		28
>   #define ARM_LPAE_TCR_SH0_MASK		0x3
>   #define ARM_LPAE_TCR_SH_NS		0
>   #define ARM_LPAE_TCR_SH_OS		2
>   #define ARM_LPAE_TCR_SH_IS		3
>   
>   #define ARM_LPAE_TCR_ORGN0_SHIFT	10
> +#define ARM_LPAE_TCR_ORGN1_SHIFT	26
>   #define ARM_LPAE_TCR_IRGN0_SHIFT	8
> +#define ARM_LPAE_TCR_IRGN1_SHIFT	24
>   #define ARM_LPAE_TCR_RGN_MASK		0x3
>   #define ARM_LPAE_TCR_RGN_NC		0
>   #define ARM_LPAE_TCR_RGN_WBWA		1
> @@ -45,6 +52,9 @@
>   #define ARM_LPAE_TCR_T0SZ_SHIFT		0
>   #define ARM_LPAE_TCR_SZ_MASK		0x3f
>   
> +#define ARM_LPAE_TCR_T1SZ_SHIFT         16
> +#define ARM_LPAE_TCR_T1SZ_MASK          0x3f
> +
>   #define ARM_LPAE_TCR_PS_SHIFT		16
>   #define ARM_LPAE_TCR_PS_MASK		0x7
>   
> @@ -58,6 +68,16 @@
>   #define ARM_LPAE_TCR_PS_44_BIT		0x4ULL
>   #define ARM_LPAE_TCR_PS_48_BIT		0x5ULL
>   
> +#define ARM_LPAE_TCR_SEP_SHIFT		(15 + 32)
> +
> +#define ARM_LPAE_TCR_SEP_31		0x0ULL
> +#define ARM_LPAE_TCR_SEP_35		0x1ULL
> +#define ARM_LPAE_TCR_SEP_39		0x2ULL
> +#define ARM_LPAE_TCR_SEP_41		0x3ULL
> +#define ARM_LPAE_TCR_SEP_43		0x4ULL
> +#define ARM_LPAE_TCR_SEP_47		0x5ULL
> +#define ARM_LPAE_TCR_SEP_UPSTREAM	0x7ULL
> +
>   #define ARM_LPAE_MAIR_ATTR_SHIFT(n)	((n) << 3)
>   #define ARM_LPAE_MAIR_ATTR_MASK		0xff
>   #define ARM_LPAE_MAIR_ATTR_DEVICE	0x04
> diff --git a/drivers/iommu/io-pgtable.h b/drivers/iommu/io-pgtable.h
> index cd2e1eafffe6..55f7b60cc44d 100644
> --- a/drivers/iommu/io-pgtable.h
> +++ b/drivers/iommu/io-pgtable.h
> @@ -71,12 +71,18 @@ struct io_pgtable_cfg {
>   	 *	be accessed by a fully cache-coherent IOMMU or CPU (e.g. for a
>   	 *	software-emulated IOMMU), such that pagetable updates need not
>   	 *	be treated as explicit DMA data.
> +	 *
> +	 * IO_PGTABLE_QUIRK_ARM_TTBR1: Specifies that TTBR1 has been enabled on
> +	 *	this domain. Set up the configuration registers and dynamically
> +	 *	choose which pagetable (TTBR0 or TTBR1) a mapping should go into
> +	 *	based on the address.
>   	 */
>   	#define IO_PGTABLE_QUIRK_ARM_NS		BIT(0)
>   	#define IO_PGTABLE_QUIRK_NO_PERMS	BIT(1)
>   	#define IO_PGTABLE_QUIRK_TLBI_ON_MAP	BIT(2)
>   	#define IO_PGTABLE_QUIRK_ARM_MTK_4GB	BIT(3)
>   	#define IO_PGTABLE_QUIRK_NO_DMA		BIT(4)
> +	#define IO_PGTABLE_QUIRK_ARM_TTBR1      BIT(5)
>   	unsigned long			quirks;
>   	unsigned long			pgsize_bitmap;
>   	unsigned int			ias;
> @@ -173,18 +179,22 @@ struct io_pgtable {
>   
>   static inline void io_pgtable_tlb_flush_all(struct io_pgtable *iop)
>   {
> -	iop->cfg.tlb->tlb_flush_all(iop->cookie);
> +	if (iop->cfg.tlb)
> +		iop->cfg.tlb->tlb_flush_all(iop->cookie);

What's going on here?

It's not obvious how this change is relevant to TTBR1 support, and 
either way I can't see how an io_pgtable with no TLB ops wouldn't just 
be fundamentally broken.

Robin.

>   }
>   
>   static inline void io_pgtable_tlb_add_flush(struct io_pgtable *iop,
>   		unsigned long iova, size_t size, size_t granule, bool leaf)
>   {
> -	iop->cfg.tlb->tlb_add_flush(iova, size, granule, leaf, iop->cookie);
> +	if (iop->cfg.tlb)
> +		iop->cfg.tlb->tlb_add_flush(iova, size, granule, leaf,
> +			iop->cookie);
>   }
>   
>   static inline void io_pgtable_tlb_sync(struct io_pgtable *iop)
>   {
> -	iop->cfg.tlb->tlb_sync(iop->cookie);
> +	if (iop->cfg.tlb)
> +		iop->cfg.tlb->tlb_sync(iop->cookie);
>   }
>   
>   /**
> 
_______________________________________________
Freedreno mailing list
Freedreno@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/freedreno

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 02/14] iommu/arm-smmu: Add support for TTBR1
  2018-03-02 17:57         ` Robin Murphy
@ 2018-03-02 18:28             ` Jordan Crouse
  -1 siblings, 0 replies; 46+ messages in thread
From: Jordan Crouse @ 2018-03-02 18:28 UTC (permalink / raw)
  To: Robin Murphy
  Cc: linux-arm-msm-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	freedreno-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On Fri, Mar 02, 2018 at 05:57:21PM +0000, Robin Murphy wrote:
> On 21/02/18 22:59, Jordan Crouse wrote:
> >Allow a SMMU device to opt into allocating a TTBR1 pagetable.
> >
> >The size of the TTBR1 region will be the same as
> >the TTBR0 size with the sign extension bit set on the highest
> >bit in the region unless the upstream size is 49 bits and then
> >the sign-extension bit will be set on the 49th bit.
> 
> Um, isn't the 49th bit still "the highest bit" if the address size
> is 49 bits? ;)

Indeed. :)

> >The map/unmap operations will automatically use the appropriate
> >pagetable based on the specified iova and the existing mask.
> >
> >Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
> >---
> >  drivers/iommu/arm-smmu-regs.h  |   2 -
> >  drivers/iommu/arm-smmu.c       |  22 ++++--
> >  drivers/iommu/io-pgtable-arm.c | 160 ++++++++++++++++++++++++++++++++++++-----
> >  drivers/iommu/io-pgtable-arm.h |  20 ++++++
> >  drivers/iommu/io-pgtable.h     |  16 ++++-
> >  5 files changed, 192 insertions(+), 28 deletions(-)
> >
> >diff --git a/drivers/iommu/arm-smmu-regs.h b/drivers/iommu/arm-smmu-regs.h
> >index a1226e4ab5f8..0ce85d5b22e9 100644
> >--- a/drivers/iommu/arm-smmu-regs.h
> >+++ b/drivers/iommu/arm-smmu-regs.h
> >@@ -193,8 +193,6 @@ enum arm_smmu_s2cr_privcfg {
> >  #define RESUME_RETRY			(0 << 0)
> >  #define RESUME_TERMINATE		(1 << 0)
> >-#define TTBCR2_SEP_SHIFT		15
> >-#define TTBCR2_SEP_UPSTREAM		(0x7 << TTBCR2_SEP_SHIFT)
> >  #define TTBCR2_AS			(1 << 4)
> >  #define TTBRn_ASID_SHIFT		48
> >diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> >index 69e7c60792a8..ebfa59b59622 100644
> >--- a/drivers/iommu/arm-smmu.c
> >+++ b/drivers/iommu/arm-smmu.c
> >@@ -248,6 +248,7 @@ struct arm_smmu_domain {
> >  	enum arm_smmu_domain_stage	stage;
> >  	struct mutex			init_mutex; /* Protects smmu pointer */
> >  	spinlock_t			cb_lock; /* Serialises ATS1* ops and TLB syncs */
> >+	u32 attributes;
> >  	struct iommu_domain		domain;
> >  };
> >@@ -598,7 +599,6 @@ static void arm_smmu_init_context_bank(struct arm_smmu_domain *smmu_domain,
> >  		} else {
> >  			cb->tcr[0] = pgtbl_cfg->arm_lpae_s1_cfg.tcr;
> >  			cb->tcr[1] = pgtbl_cfg->arm_lpae_s1_cfg.tcr >> 32;
> >-			cb->tcr[1] |= TTBCR2_SEP_UPSTREAM;
> >  			if (cfg->fmt == ARM_SMMU_CTX_FMT_AARCH64)
> >  				cb->tcr[1] |= TTBCR2_AS;
> >  		}
> >@@ -729,6 +729,9 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
> >  	enum io_pgtable_fmt fmt;
> >  	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> >  	struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
> >+	unsigned int quirks =
> >+		smmu_domain->attributes & (1 << DOMAIN_ATTR_ENABLE_TTBR1) ?
> >+			IO_PGTABLE_QUIRK_ARM_TTBR1 : 0;
> >  	mutex_lock(&smmu_domain->init_mutex);
> >  	if (smmu_domain->smmu)
> >@@ -852,7 +855,11 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
> >  	else
> >  		cfg->asid = cfg->cbndx + smmu->cavium_id_base;
> >+	if (smmu->features & ARM_SMMU_FEAT_COHERENT_WALK)
> >+		quirks |= IO_PGTABLE_QUIRK_NO_DMA;
> >+
> >  	pgtbl_cfg = (struct io_pgtable_cfg) {
> >+		.quirks		= quirks,
> >  		.pgsize_bitmap	= smmu->pgsize_bitmap,
> >  		.ias		= ias,
> >  		.oas		= oas,
> >@@ -860,9 +867,6 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
> >  		.iommu_dev	= smmu->dev,
> >  	};
> >-	if (smmu->features & ARM_SMMU_FEAT_COHERENT_WALK)
> >-		pgtbl_cfg.quirks = IO_PGTABLE_QUIRK_NO_DMA;
> >-
> >  	smmu_domain->smmu = smmu;
> >  	pgtbl_ops = alloc_io_pgtable_ops(fmt, &pgtbl_cfg, smmu_domain);
> >  	if (!pgtbl_ops) {
> >@@ -1477,6 +1481,10 @@ static int arm_smmu_domain_get_attr(struct iommu_domain *domain,
> >  	case DOMAIN_ATTR_NESTING:
> >  		*(int *)data = (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED);
> >  		return 0;
> >+	case DOMAIN_ATTR_ENABLE_TTBR1:
> >+		*((int *)data) = !!(smmu_domain->attributes
> >+					& (1 << DOMAIN_ATTR_ENABLE_TTBR1));
> >+		return 0;
> >  	default:
> >  		return -ENODEV;
> >  	}
> >@@ -1505,6 +1513,12 @@ static int arm_smmu_domain_set_attr(struct iommu_domain *domain,
> >  		else
> >  			smmu_domain->stage = ARM_SMMU_DOMAIN_S1;
> >+		break;
> >+	case DOMAIN_ATTR_ENABLE_TTBR1:
> >+		if (*((int *)data))
> >+			smmu_domain->attributes |=
> >+				1 << DOMAIN_ATTR_ENABLE_TTBR1;
> >+		ret = 0;
> >  		break;
> >  	default:
> >  		ret = -ENODEV;
> >diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
> >index fff0b6ba0a69..1bd0045f2cb7 100644
> >--- a/drivers/iommu/io-pgtable-arm.c
> >+++ b/drivers/iommu/io-pgtable-arm.c
> >@@ -152,7 +152,7 @@ struct arm_lpae_io_pgtable {
> >  	unsigned long		pg_shift;
> >  	unsigned long		bits_per_level;
> >-	void			*pgd;
> >+	void			*pgd[2];
> 
> This might be reasonable for short-descriptor, but I really don't
> like it for LPAE. The two tables are more or less independent in
> terms of size, granule, etc., so this brings in a lot of artificial
> coupling.
> 
> I think it would be a lot cleaner for io-pgtable to have little or
> no knowledge of this, and it be down to the caller to allocate two
> tables and merge the TCRs, then dispatch maps/unmaps to the
> appropriate table by itself.

Okay, that makes sense. I'll try to move as much of this into the arm-smmu driver
as I can.
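
Just to check my understanding, the dispatch side would look roughly
like the below (a rough, untested sketch -- split_mask and the
pgtbl_ops[] pair are made-up names for illustration):

static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
			phys_addr_t paddr, size_t size, int prot)
{
	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
	/* Pick the TTBR1 table when the iova has the split bit set */
	int ttbr = (iova & smmu_domain->split_mask) ? 1 : 0;
	struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops[ttbr];

	if (!ops)
		return -ENODEV;

	return ops->map(ops, iova, paddr, size, prot);
}

with alloc_io_pgtable_ops() called twice at init time and the two TCR
halves merged before programming the context bank.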

> >  };
> >  typedef u64 arm_lpae_iopte;
> >@@ -394,20 +394,48 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,
> >  	return pte;
> >  }
> >+static inline arm_lpae_iopte *
> >+arm_lpae_get_table(struct arm_lpae_io_pgtable *data, unsigned long iova)
> >+{
> >+	struct io_pgtable_cfg *cfg = &data->iop.cfg;
> >+
> >+	if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1)  {
> >+		unsigned long mask;
> >+
> >+		/*
> >+		 * if ias is 48 it really means that bit 48 is the sign
> >+		 * extension bit, otherwise the sign extension bit is ias - 1
> >+		 * (for example, bit 31 for ias 32)
> >+		 */
> >+		mask = (cfg->ias == 48) ? (1UL << 48) :
> >+			(1UL << (cfg->ias - 1));
> 
> This would look less silly if it was done in the SMMU driver where
> the original UBS information is directly to hand, instead of having
> to reverse-engineer it from the pagetable config (on every
> operation, no less).
> 
> That said, it's still going to be pretty fragile in general, since
> on SMMUv1 the UBS is merely guessed at from the IPA size, and either
> way all it tells you is what the SMMU knows about its own
> interfaces; it doesn't have a clue how many bits the masters
> connected to said interface(s) are actually capable of driving.

Hopefully I can make it a little bit more robust if we move it into arm-smmu
with access to more target-specific information.
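
For example (sketch only -- this assumes we trust smmu->va_size as the
upstream size, and reuses the made-up split_mask from above), the
sign-extension bit could be computed once at domain init rather than
re-derived from the ias on every map/unmap:

	if (smmu_domain->attributes & (1 << DOMAIN_ATTR_ENABLE_TTBR1))
		smmu_domain->split_mask = (smmu->va_size == 48) ?
			(1UL << 48) : (1UL << (smmu->va_size - 1));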

> >+
> >+		if (iova & mask)
> >+			return data->pgd[1];
> >+	}
> >+
> >+	return data->pgd[0];
> >+}
> >+
> >  static int arm_lpae_map(struct io_pgtable_ops *ops, unsigned long iova,
> >  			phys_addr_t paddr, size_t size, int iommu_prot)
> >  {
> >  	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
> >-	arm_lpae_iopte *ptep = data->pgd;
> >+	arm_lpae_iopte *ptep;
> >  	int ret, lvl = ARM_LPAE_START_LVL(data);
> >  	arm_lpae_iopte prot;
> >+	ptep = arm_lpae_get_table(data, iova);
> >+
> >  	/* If no access, then nothing to do */
> >  	if (!(iommu_prot & (IOMMU_READ | IOMMU_WRITE)))
> >  		return 0;
> >-	if (WARN_ON(iova >= (1ULL << data->iop.cfg.ias) ||
> >-		    paddr >= (1ULL << data->iop.cfg.oas)))
> >+	if (WARN_ON(paddr >= (1ULL << data->iop.cfg.oas)))
> >+		return -ERANGE;
> >+
> >+	if (WARN_ON(!(data->iop.cfg.quirks & IO_PGTABLE_QUIRK_ARM_TTBR1) &&
> >+		    iova >= (1ULL << data->iop.cfg.ias)))
> >  		return -ERANGE;
> >  	prot = arm_lpae_prot_to_pte(data, iommu_prot);
> >@@ -456,7 +484,10 @@ static void arm_lpae_free_pgtable(struct io_pgtable *iop)
> >  {
> >  	struct arm_lpae_io_pgtable *data = io_pgtable_to_data(iop);
> >-	__arm_lpae_free_pgtable(data, ARM_LPAE_START_LVL(data), data->pgd);
> >+	__arm_lpae_free_pgtable(data, ARM_LPAE_START_LVL(data), data->pgd[0]);
> >+	if (data->pgd[1])
> >+		__arm_lpae_free_pgtable(data, ARM_LPAE_START_LVL(data),
> >+			data->pgd[1]);
> >  	kfree(data);
> >  }
> >@@ -564,10 +595,13 @@ static int arm_lpae_unmap(struct io_pgtable_ops *ops, unsigned long iova,
> >  			  size_t size)
> >  {
> >  	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
> >-	arm_lpae_iopte *ptep = data->pgd;
> >+	arm_lpae_iopte *ptep;
> >  	int lvl = ARM_LPAE_START_LVL(data);
> >-	if (WARN_ON(iova >= (1ULL << data->iop.cfg.ias)))
> >+	ptep = arm_lpae_get_table(data, iova);
> >+
> >+	if (WARN_ON(!(data->iop.cfg.quirks & IO_PGTABLE_QUIRK_ARM_TTBR1) &&
> >+		    iova >= (1ULL << data->iop.cfg.ias)))
> >  		return 0;
> >  	return __arm_lpae_unmap(data, iova, size, lvl, ptep);
> >@@ -577,9 +611,11 @@ static phys_addr_t arm_lpae_iova_to_phys(struct io_pgtable_ops *ops,
> >  					 unsigned long iova)
> >  {
> >  	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
> >-	arm_lpae_iopte pte, *ptep = data->pgd;
> >+	arm_lpae_iopte pte, *ptep;
> >  	int lvl = ARM_LPAE_START_LVL(data);
> >+	ptep = arm_lpae_get_table(data, iova);
> >+
> >  	do {
> >  		/* Valid IOPTE pointer? */
> >  		if (!ptep)
> >@@ -689,13 +725,82 @@ arm_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg)
> >  	return data;
> >  }
> >+static u64 arm_64_lpae_setup_ttbr1(struct io_pgtable_cfg *cfg,
> >+		struct arm_lpae_io_pgtable *data)
> >+
> >+{
> >+	u64 reg;
> >+
> >+	/* If TTBR1 is disabled, disable speculative walks through the TTBR1 */
> >+	if (!(cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1)) {
> >+		reg = ARM_LPAE_TCR_EPD1;
> >+		reg |= (ARM_LPAE_TCR_SEP_UPSTREAM << ARM_LPAE_TCR_SEP_SHIFT);
> >+		return reg;
> >+	}
> >+
> >+	reg = (ARM_LPAE_TCR_SH_IS << ARM_LPAE_TCR_SH1_SHIFT) |
> >+	      (ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_IRGN1_SHIFT) |
> >+	      (ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_ORGN1_SHIFT);
> >+
> >+	switch (1 << data->pg_shift) {
> >+	case SZ_4K:
> >+		reg |= ARM_LPAE_TCR_TG1_4K;
> >+		break;
> >+	case SZ_16K:
> >+		reg |= ARM_LPAE_TCR_TG1_16K;
> >+		break;
> >+	case SZ_64K:
> >+		reg |= ARM_LPAE_TCR_TG1_64K;
> >+		break;
> >+	}
> >+
> >+	/* Set T1SZ */
> >+	reg |= (64ULL - cfg->ias) << ARM_LPAE_TCR_T1SZ_SHIFT;
> >+
> >+	/* Set the SEP bit based on the size */
> >+	switch (cfg->ias) {
> >+	case 32:
> >+		reg |= (ARM_LPAE_TCR_SEP_31 << ARM_LPAE_TCR_SEP_SHIFT);
> >+		break;
> >+	case 36:
> >+		reg |= (ARM_LPAE_TCR_SEP_35 << ARM_LPAE_TCR_SEP_SHIFT);
> >+		break;
> >+	case 40:
> >+		reg |= (ARM_LPAE_TCR_SEP_39 << ARM_LPAE_TCR_SEP_SHIFT);
> >+		break;
> >+	case 42:
> >+		reg |= (ARM_LPAE_TCR_SEP_41 << ARM_LPAE_TCR_SEP_SHIFT);
> >+		break;
> >+	case 44:
> >+		reg |= (ARM_LPAE_TCR_SEP_43 << ARM_LPAE_TCR_SEP_SHIFT);
> >+		break;
> >+	case 48:
> >+		/*
> >+		 * If ias is 48 then that probably means that the UBS on the
> >+		 * device was 0101b (49) which is a special case that assumes
> >+		 * bit 48 is the sign extension bit. In this case we are
> >+		 * expected to use ARM_LPAE_TCR_SEP_UPSTREAM to use bit 48 as
> >+		 * the extension bit. One might be confused because there is
> >+		 * also an option to set the SEP to bit 47 but this is probably
> >+		 * not what the arm-smmu driver intended.
> >+		 */
> 
> Again, a clear sign that this probably isn't the most appropriate
> place to be trying to handle this.

Agreed - this is entirely a bunch of hand-waving. I documented it so
explicitly because the MSM GPU hits exactly this case and it drove me crazy
for a while until I figured it out.

> >+	default:
> >+		reg |= (ARM_LPAE_TCR_SEP_UPSTREAM << ARM_LPAE_TCR_SEP_SHIFT);
> >+		break;
> >+	}
> >+
> >+	return reg;
> >+}
> >+
> >  static struct io_pgtable *
> >  arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
> >  {
> >  	u64 reg;
> >  	struct arm_lpae_io_pgtable *data;
> >-	if (cfg->quirks & ~(IO_PGTABLE_QUIRK_ARM_NS | IO_PGTABLE_QUIRK_NO_DMA))
> >+	if (cfg->quirks & ~(IO_PGTABLE_QUIRK_ARM_NS |
> >+			IO_PGTABLE_QUIRK_NO_DMA |
> >+			IO_PGTABLE_QUIRK_ARM_TTBR1))
> >  		return NULL;
> >  	data = arm_lpae_alloc_pgtable(cfg);
> >@@ -744,8 +849,9 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
> >  	reg |= (64ULL - cfg->ias) << ARM_LPAE_TCR_T0SZ_SHIFT;
> >-	/* Disable speculative walks through TTBR1 */
> >-	reg |= ARM_LPAE_TCR_EPD1;
> >+	/* Bring in the TTBR1 configuration */
> >+	reg |= arm_64_lpae_setup_ttbr1(cfg, data);
> >+
> >  	cfg->arm_lpae_s1_cfg.tcr = reg;
> >  	/* MAIRs */
> >@@ -760,16 +866,32 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
> >  	cfg->arm_lpae_s1_cfg.mair[1] = 0;
> >  	/* Looking good; allocate a pgd */
> >-	data->pgd = __arm_lpae_alloc_pages(data->pgd_size, GFP_KERNEL, cfg);
> >-	if (!data->pgd)
> >+	data->pgd[0] = __arm_lpae_alloc_pages(data->pgd_size, GFP_KERNEL, cfg);
> >+	if (!data->pgd[0])
> >  		goto out_free_data;
> >+
> >+	if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1) {
> >+		data->pgd[1] = __arm_lpae_alloc_pages(data->pgd_size,
> >+			GFP_KERNEL, cfg);
> >+		if (!data->pgd[1]) {
> >+			__arm_lpae_free_pages(data->pgd[0], data->pgd_size,
> >+				cfg);
> >+			goto out_free_data;
> >+		}
> >+	} else {
> >+		data->pgd[1] = NULL;
> >+	}
> >+
> >  	/* Ensure the empty pgd is visible before any actual TTBR write */
> >  	wmb();
> >  	/* TTBRs */
> >-	cfg->arm_lpae_s1_cfg.ttbr[0] = virt_to_phys(data->pgd);
> >-	cfg->arm_lpae_s1_cfg.ttbr[1] = 0;
> >+	cfg->arm_lpae_s1_cfg.ttbr[0] = virt_to_phys(data->pgd[0]);
> >+
> >+	if (data->pgd[1])
> >+		cfg->arm_lpae_s1_cfg.ttbr[1] = virt_to_phys(data->pgd[1]);
> >+
> >  	return &data->iop;
> >  out_free_data:
> >@@ -854,15 +976,15 @@ arm_64_lpae_alloc_pgtable_s2(struct io_pgtable_cfg *cfg, void *cookie)
> >  	cfg->arm_lpae_s2_cfg.vtcr = reg;
> >  	/* Allocate pgd pages */
> >-	data->pgd = __arm_lpae_alloc_pages(data->pgd_size, GFP_KERNEL, cfg);
> >-	if (!data->pgd)
> >+	data->pgd[0] = __arm_lpae_alloc_pages(data->pgd_size, GFP_KERNEL, cfg);
> >+	if (!data->pgd[0])
> >  		goto out_free_data;
> >  	/* Ensure the empty pgd is visible before any actual TTBR write */
> >  	wmb();
> >  	/* VTTBR */
> >-	cfg->arm_lpae_s2_cfg.vttbr = virt_to_phys(data->pgd);
> >+	cfg->arm_lpae_s2_cfg.vttbr = virt_to_phys(data->pgd[0]);
> >  	return &data->iop;
> >  out_free_data:
> >@@ -960,7 +1082,7 @@ static void __init arm_lpae_dump_ops(struct io_pgtable_ops *ops)
> >  		cfg->pgsize_bitmap, cfg->ias);
> >  	pr_err("data: %d levels, 0x%zx pgd_size, %lu pg_shift, %lu bits_per_level, pgd @ %p\n",
> >  		data->levels, data->pgd_size, data->pg_shift,
> >-		data->bits_per_level, data->pgd);
> >+		data->bits_per_level, data->pgd[0]);
> >  }
> >  #define __FAIL(ops, i)	({						\
> >diff --git a/drivers/iommu/io-pgtable-arm.h b/drivers/iommu/io-pgtable-arm.h
> >index cb31314971ac..6344b1d359a5 100644
> >--- a/drivers/iommu/io-pgtable-arm.h
> >+++ b/drivers/iommu/io-pgtable-arm.h
> >@@ -25,14 +25,21 @@
> >  #define ARM_LPAE_TCR_TG0_64K		(1 << 14)
> >  #define ARM_LPAE_TCR_TG0_16K		(2 << 14)
> >+#define ARM_LPAE_TCR_TG1_16K            (1 << 30)
> >+#define ARM_LPAE_TCR_TG1_4K             (2 << 30)
> >+#define ARM_LPAE_TCR_TG1_64K            (3 << 30)
> >+
> >  #define ARM_LPAE_TCR_SH0_SHIFT		12
> >+#define ARM_LPAE_TCR_SH1_SHIFT		28
> >  #define ARM_LPAE_TCR_SH0_MASK		0x3
> >  #define ARM_LPAE_TCR_SH_NS		0
> >  #define ARM_LPAE_TCR_SH_OS		2
> >  #define ARM_LPAE_TCR_SH_IS		3
> >  #define ARM_LPAE_TCR_ORGN0_SHIFT	10
> >+#define ARM_LPAE_TCR_ORGN1_SHIFT	26
> >  #define ARM_LPAE_TCR_IRGN0_SHIFT	8
> >+#define ARM_LPAE_TCR_IRGN1_SHIFT	24
> >  #define ARM_LPAE_TCR_RGN_MASK		0x3
> >  #define ARM_LPAE_TCR_RGN_NC		0
> >  #define ARM_LPAE_TCR_RGN_WBWA		1
> >@@ -45,6 +52,9 @@
> >  #define ARM_LPAE_TCR_T0SZ_SHIFT		0
> >  #define ARM_LPAE_TCR_SZ_MASK		0x3f
> >+#define ARM_LPAE_TCR_T1SZ_SHIFT         16
> >+#define ARM_LPAE_TCR_T1SZ_MASK          0x3f
> >+
> >  #define ARM_LPAE_TCR_PS_SHIFT		16
> >  #define ARM_LPAE_TCR_PS_MASK		0x7
> >@@ -58,6 +68,16 @@
> >  #define ARM_LPAE_TCR_PS_44_BIT		0x4ULL
> >  #define ARM_LPAE_TCR_PS_48_BIT		0x5ULL
> >+#define ARM_LPAE_TCR_SEP_SHIFT		(15 + 32)
> >+
> >+#define ARM_LPAE_TCR_SEP_31		0x0ULL
> >+#define ARM_LPAE_TCR_SEP_35		0x1ULL
> >+#define ARM_LPAE_TCR_SEP_39		0x2ULL
> >+#define ARM_LPAE_TCR_SEP_41		0x3ULL
> >+#define ARM_LPAE_TCR_SEP_43		0x4ULL
> >+#define ARM_LPAE_TCR_SEP_47		0x5ULL
> >+#define ARM_LPAE_TCR_SEP_UPSTREAM	0x7ULL
> >+
> >  #define ARM_LPAE_MAIR_ATTR_SHIFT(n)	((n) << 3)
> >  #define ARM_LPAE_MAIR_ATTR_MASK		0xff
> >  #define ARM_LPAE_MAIR_ATTR_DEVICE	0x04
> >diff --git a/drivers/iommu/io-pgtable.h b/drivers/iommu/io-pgtable.h
> >index cd2e1eafffe6..55f7b60cc44d 100644
> >--- a/drivers/iommu/io-pgtable.h
> >+++ b/drivers/iommu/io-pgtable.h
> >@@ -71,12 +71,18 @@ struct io_pgtable_cfg {
> >  	 *	be accessed by a fully cache-coherent IOMMU or CPU (e.g. for a
> >  	 *	software-emulated IOMMU), such that pagetable updates need not
> >  	 *	be treated as explicit DMA data.
> >+	 *
> >+	 * IO_PGTABLE_QUIRK_ARM_TTBR1: Specifies that TTBR1 has been enabled on
> >+	 *	this domain. Set up the configuration registers and dynamically
> >+	 *	choose which pagetable (TTBR0 or TTBR1) a mapping should go into
> >+	 *	based on the address.
> >  	 */
> >  	#define IO_PGTABLE_QUIRK_ARM_NS		BIT(0)
> >  	#define IO_PGTABLE_QUIRK_NO_PERMS	BIT(1)
> >  	#define IO_PGTABLE_QUIRK_TLBI_ON_MAP	BIT(2)
> >  	#define IO_PGTABLE_QUIRK_ARM_MTK_4GB	BIT(3)
> >  	#define IO_PGTABLE_QUIRK_NO_DMA		BIT(4)
> >+	#define IO_PGTABLE_QUIRK_ARM_TTBR1      BIT(5)
> >  	unsigned long			quirks;
> >  	unsigned long			pgsize_bitmap;
> >  	unsigned int			ias;
> >@@ -173,18 +179,22 @@ struct io_pgtable {
> >  static inline void io_pgtable_tlb_flush_all(struct io_pgtable *iop)
> >  {
> >-	iop->cfg.tlb->tlb_flush_all(iop->cookie);
> >+	if (iop->cfg.tlb)
> >+		iop->cfg.tlb->tlb_flush_all(iop->cookie);
> 
> What's going on here?
> 
> It's not obvious how this change is relevant to TTBR1 support, and
> either way I can't see how an io_pgtable with no TLB ops wouldn't
> just be fundamentally broken.

Oops, this is in the wrong patch. This should go along with the private pasid
patches, because client devices that use private pasid pagetables need to
handle TLB maintenance on those tables on their own (and that too might be up
for debate, but not here).
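
For reference, the client side would look roughly like this (sketch
only; msm_priv_pgtable and msm_gpu_tlbinv() are made-up stand-ins for
the driver-side structure and the hardware-specific flush):

static void msm_priv_unmap(struct msm_priv_pgtable *priv,
		unsigned long iova, size_t size)
{
	priv->ops->unmap(priv->ops, iova, size);

	/* The GPU driver invalidates its own private pagetable */
	msm_gpu_tlbinv(priv->gpu);
}

i.e. the pagetable is allocated with cfg.tlb == NULL and the flush
happens in the driver once the unmap is committed.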

> Robin.

Thanks so much for reviewing.

Jordan

-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
_______________________________________________
Freedreno mailing list
Freedreno@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/freedreno

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH 02/14] iommu/arm-smmu: Add support for TTBR1
@ 2018-03-02 18:28             ` Jordan Crouse
  0 siblings, 0 replies; 46+ messages in thread
From: Jordan Crouse @ 2018-03-02 18:28 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Mar 02, 2018 at 05:57:21PM +0000, Robin Murphy wrote:
> On 21/02/18 22:59, Jordan Crouse wrote:
> >Allow a SMMU device to opt into allocating a TTBR1 pagetable.
> >
> >The size of the TTBR1 region will be the same as
> >the TTBR0 size with the sign extension bit set on the highest
> >bit in the region unless the upstream size is 49 bits and then
> >the sign-extension bit will be set on the 49th bit.
> 
> Um, isn't the 49th bit still "the highest bit" if the address size
> is 49 bits? ;)

Indeed. :)

> >The map/unmap operations will automatically use the appropriate
> >pagetable based on the specified iova and the existing mask.
> >
> >Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
> >---
> >  drivers/iommu/arm-smmu-regs.h  |   2 -
> >  drivers/iommu/arm-smmu.c       |  22 ++++--
> >  drivers/iommu/io-pgtable-arm.c | 160 ++++++++++++++++++++++++++++++++++++-----
> >  drivers/iommu/io-pgtable-arm.h |  20 ++++++
> >  drivers/iommu/io-pgtable.h     |  16 ++++-
> >  5 files changed, 192 insertions(+), 28 deletions(-)
> >
> >diff --git a/drivers/iommu/arm-smmu-regs.h b/drivers/iommu/arm-smmu-regs.h
> >index a1226e4ab5f8..0ce85d5b22e9 100644
> >--- a/drivers/iommu/arm-smmu-regs.h
> >+++ b/drivers/iommu/arm-smmu-regs.h
> >@@ -193,8 +193,6 @@ enum arm_smmu_s2cr_privcfg {
> >  #define RESUME_RETRY			(0 << 0)
> >  #define RESUME_TERMINATE		(1 << 0)
> >-#define TTBCR2_SEP_SHIFT		15
> >-#define TTBCR2_SEP_UPSTREAM		(0x7 << TTBCR2_SEP_SHIFT)
> >  #define TTBCR2_AS			(1 << 4)
> >  #define TTBRn_ASID_SHIFT		48
> >diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> >index 69e7c60792a8..ebfa59b59622 100644
> >--- a/drivers/iommu/arm-smmu.c
> >+++ b/drivers/iommu/arm-smmu.c
> >@@ -248,6 +248,7 @@ struct arm_smmu_domain {
> >  	enum arm_smmu_domain_stage	stage;
> >  	struct mutex			init_mutex; /* Protects smmu pointer */
> >  	spinlock_t			cb_lock; /* Serialises ATS1* ops and TLB syncs */
> >+	u32 attributes;
> >  	struct iommu_domain		domain;
> >  };
> >@@ -598,7 +599,6 @@ static void arm_smmu_init_context_bank(struct arm_smmu_domain *smmu_domain,
> >  		} else {
> >  			cb->tcr[0] = pgtbl_cfg->arm_lpae_s1_cfg.tcr;
> >  			cb->tcr[1] = pgtbl_cfg->arm_lpae_s1_cfg.tcr >> 32;
> >-			cb->tcr[1] |= TTBCR2_SEP_UPSTREAM;
> >  			if (cfg->fmt == ARM_SMMU_CTX_FMT_AARCH64)
> >  				cb->tcr[1] |= TTBCR2_AS;
> >  		}
> >@@ -729,6 +729,9 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
> >  	enum io_pgtable_fmt fmt;
> >  	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> >  	struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
> >+	unsigned int quirks =
> >+		smmu_domain->attributes & (1 << DOMAIN_ATTR_ENABLE_TTBR1) ?
> >+			IO_PGTABLE_QUIRK_ARM_TTBR1 : 0;
> >  	mutex_lock(&smmu_domain->init_mutex);
> >  	if (smmu_domain->smmu)
> >@@ -852,7 +855,11 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
> >  	else
> >  		cfg->asid = cfg->cbndx + smmu->cavium_id_base;
> >+	if (smmu->features & ARM_SMMU_FEAT_COHERENT_WALK)
> >+		quirks |= IO_PGTABLE_QUIRK_NO_DMA;
> >+
> >  	pgtbl_cfg = (struct io_pgtable_cfg) {
> >+		.quirks		= quirks,
> >  		.pgsize_bitmap	= smmu->pgsize_bitmap,
> >  		.ias		= ias,
> >  		.oas		= oas,
> >@@ -860,9 +867,6 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
> >  		.iommu_dev	= smmu->dev,
> >  	};
> >-	if (smmu->features & ARM_SMMU_FEAT_COHERENT_WALK)
> >-		pgtbl_cfg.quirks = IO_PGTABLE_QUIRK_NO_DMA;
> >-
> >  	smmu_domain->smmu = smmu;
> >  	pgtbl_ops = alloc_io_pgtable_ops(fmt, &pgtbl_cfg, smmu_domain);
> >  	if (!pgtbl_ops) {
> >@@ -1477,6 +1481,10 @@ static int arm_smmu_domain_get_attr(struct iommu_domain *domain,
> >  	case DOMAIN_ATTR_NESTING:
> >  		*(int *)data = (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED);
> >  		return 0;
> >+	case DOMAIN_ATTR_ENABLE_TTBR1:
> >+		*((int *)data) = !!(smmu_domain->attributes
> >+					& (1 << DOMAIN_ATTR_ENABLE_TTBR1));
> >+		return 0;
> >  	default:
> >  		return -ENODEV;
> >  	}
> >@@ -1505,6 +1513,12 @@ static int arm_smmu_domain_set_attr(struct iommu_domain *domain,
> >  		else
> >  			smmu_domain->stage = ARM_SMMU_DOMAIN_S1;
> >+		break;
> >+	case DOMAIN_ATTR_ENABLE_TTBR1:
> >+		if (*((int *)data))
> >+			smmu_domain->attributes |=
> >+				1 << DOMAIN_ATTR_ENABLE_TTBR1;
> >+		ret = 0;
> >  		break;
> >  	default:
> >  		ret = -ENODEV;
> >diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
> >index fff0b6ba0a69..1bd0045f2cb7 100644
> >--- a/drivers/iommu/io-pgtable-arm.c
> >+++ b/drivers/iommu/io-pgtable-arm.c
> >@@ -152,7 +152,7 @@ struct arm_lpae_io_pgtable {
> >  	unsigned long		pg_shift;
> >  	unsigned long		bits_per_level;
> >-	void			*pgd;
> >+	void			*pgd[2];
> 
> This might be reasonable for short-descriptor, but I really don't
> like it for LPAE. The two tables are more or less independent in
> terms of size, granule, etc., so this brings in a lot of artificial
> coupling.
> 
> I think it would be a lot cleaner for io-pgtable to have little or
> no knowledge of this, and it be down to the caller to allocate two
> tables and merge the TCRs, then dispatch maps/unmaps to the
> appropriate table by itself.

Okay, that make sense. I'll try to move as much of this into the arm-smmu driver
as I can.

> >  };
> >  typedef u64 arm_lpae_iopte;
> >@@ -394,20 +394,48 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,
> >  	return pte;
> >  }
> >+static inline arm_lpae_iopte *
> >+arm_lpae_get_table(struct arm_lpae_io_pgtable *data, unsigned long iova)
> >+{
> >+	struct io_pgtable_cfg *cfg = &data->iop.cfg;
> >+
> >+	if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1)  {
> >+		unsigned long mask;
> >+
> >+		/*
> >+		 * if ias is 48 it really means that bit 48 is the sign
> >+		 * extension bit, otherwise the sign extension bit is ias - 1
> >+		 * (for example, bit 31 for ias 32)
> >+		 */
> >+		mask = (cfg->ias == 48) ? (1UL << 48) :
> >+			(1UL << (cfg->ias - 1));
> 
> This would look less silly if it was done in the SMMU driver where
> the original UBS information is directly to hand, instead of having
> to reverse-engineer it from the pagetable config (on every
> operation, no less).
> 
> That said, it's still going to be pretty fragile in general, since
> on SMMUv1 the UBS is merely guessed at from the IPA size, and either
> way all it tells you is what the SMMU knows about its own
> interfaces; it doesn't have a clue how many bits the masters
> connected to said interface(s) are actually capable of driving.

Hopefully I can make it a little bit more robust if we move it into arm-smmu
with access to more target specific information.

> >+
> >+		if (iova & mask)
> >+			return data->pgd[1];
> >+	}
> >+
> >+	return data->pgd[0];
> >+}
> >+
> >  static int arm_lpae_map(struct io_pgtable_ops *ops, unsigned long iova,
> >  			phys_addr_t paddr, size_t size, int iommu_prot)
> >  {
> >  	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
> >-	arm_lpae_iopte *ptep = data->pgd;
> >+	arm_lpae_iopte *ptep;
> >  	int ret, lvl = ARM_LPAE_START_LVL(data);
> >  	arm_lpae_iopte prot;
> >+	ptep = arm_lpae_get_table(data, iova);
> >+
> >  	/* If no access, then nothing to do */
> >  	if (!(iommu_prot & (IOMMU_READ | IOMMU_WRITE)))
> >  		return 0;
> >-	if (WARN_ON(iova >= (1ULL << data->iop.cfg.ias) ||
> >-		    paddr >= (1ULL << data->iop.cfg.oas)))
> >+	if (WARN_ON(paddr >= (1ULL << data->iop.cfg.oas)))
> >+		return -ERANGE;
> >+
> >+	if (WARN_ON(!(data->iop.cfg.quirks & IO_PGTABLE_QUIRK_ARM_TTBR1) &&
> >+		    iova >= (1ULL << data->iop.cfg.ias)))
> >  		return -ERANGE;
> >  	prot = arm_lpae_prot_to_pte(data, iommu_prot);
> >@@ -456,7 +484,10 @@ static void arm_lpae_free_pgtable(struct io_pgtable *iop)
> >  {
> >  	struct arm_lpae_io_pgtable *data = io_pgtable_to_data(iop);
> >-	__arm_lpae_free_pgtable(data, ARM_LPAE_START_LVL(data), data->pgd);
> >+	__arm_lpae_free_pgtable(data, ARM_LPAE_START_LVL(data), data->pgd[0]);
> >+	if (data->pgd[1])
> >+		__arm_lpae_free_pgtable(data, ARM_LPAE_START_LVL(data),
> >+			data->pgd[1]);
> >  	kfree(data);
> >  }
> >@@ -564,10 +595,13 @@ static int arm_lpae_unmap(struct io_pgtable_ops *ops, unsigned long iova,
> >  			  size_t size)
> >  {
> >  	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
> >-	arm_lpae_iopte *ptep = data->pgd;
> >+	arm_lpae_iopte *ptep;
> >  	int lvl = ARM_LPAE_START_LVL(data);
> >-	if (WARN_ON(iova >= (1ULL << data->iop.cfg.ias)))
> >+	ptep = arm_lpae_get_table(data, iova);
> >+
> >+	if (WARN_ON(!(data->iop.cfg.quirks & IO_PGTABLE_QUIRK_ARM_TTBR1) &&
> >+		    iova >= (1ULL << data->iop.cfg.ias)))
> >  		return 0;
> >  	return __arm_lpae_unmap(data, iova, size, lvl, ptep);
> >@@ -577,9 +611,11 @@ static phys_addr_t arm_lpae_iova_to_phys(struct io_pgtable_ops *ops,
> >  					 unsigned long iova)
> >  {
> >  	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
> >-	arm_lpae_iopte pte, *ptep = data->pgd;
> >+	arm_lpae_iopte pte, *ptep;
> >  	int lvl = ARM_LPAE_START_LVL(data);
> >+	ptep = arm_lpae_get_table(data, iova);
> >+
> >  	do {
> >  		/* Valid IOPTE pointer? */
> >  		if (!ptep)
> >@@ -689,13 +725,82 @@ arm_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg)
> >  	return data;
> >  }
> >+static u64 arm_64_lpae_setup_ttbr1(struct io_pgtable_cfg *cfg,
> >+		struct arm_lpae_io_pgtable *data)
> >+
> >+{
> >+	u64 reg;
> >+
> >+	/* If TTBR1 is disabled, disable speculative walks through the TTBR1 */
> >+	if (!(cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1)) {
> >+		reg = ARM_LPAE_TCR_EPD1;
> >+		reg |= (ARM_LPAE_TCR_SEP_UPSTREAM << ARM_LPAE_TCR_SEP_SHIFT);
> >+		return reg;
> >+	}
> >+
> >+	reg = (ARM_LPAE_TCR_SH_IS << ARM_LPAE_TCR_SH1_SHIFT) |
> >+	      (ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_IRGN1_SHIFT) |
> >+	      (ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_ORGN1_SHIFT);
> >+
> >+	switch (1 << data->pg_shift) {
> >+	case SZ_4K:
> >+		reg |= ARM_LPAE_TCR_TG1_4K;
> >+		break;
> >+	case SZ_16K:
> >+		reg |= ARM_LPAE_TCR_TG1_16K;
> >+		break;
> >+	case SZ_64K:
> >+		reg |= ARM_LPAE_TCR_TG1_64K;
> >+		break;
> >+	}
> >+
> >+	/* Set T1SZ */
> >+	reg |= (64ULL - cfg->ias) << ARM_LPAE_TCR_T1SZ_SHIFT;
> >+
> >+	/* Set the SEP bit based on the size */
> >+	switch (cfg->ias) {
> >+	case 32:
> >+		reg |= (ARM_LPAE_TCR_SEP_31 << ARM_LPAE_TCR_SEP_SHIFT);
> >+		break;
> >+	case 36:
> >+		reg |= (ARM_LPAE_TCR_SEP_35 << ARM_LPAE_TCR_SEP_SHIFT);
> >+		break;
> >+	case 40:
> >+		reg |= (ARM_LPAE_TCR_SEP_39 << ARM_LPAE_TCR_SEP_SHIFT);
> >+		break;
> >+	case 42:
> >+		reg |= (ARM_LPAE_TCR_SEP_41 << ARM_LPAE_TCR_SEP_SHIFT);
> >+		break;
> >+	case 44:
> >+		reg |= (ARM_LPAE_TCR_SEP_43 << ARM_LPAE_TCR_SEP_SHIFT);
> >+		break;
> >+	case 48:
> >+		/*
> >+		 * If ias is 48 then that probably means that the UBS on the
> >+		 * device was 0101b (49) which is a special case that assumes
> >+		 * bit 48 is the sign extension bit. In this case we are
> >+		 * expected to use ARM_LPAE_TCR_SEP_UPSTREAM to use bit 48 as
> >+		 * the extension bit. One might be confused because there is
> >+		 * also an option to set the SEP to bit 47 but this is probably
> >+		 * not what the arm-smmu driver intended.
> >+		 */
> 
> Again, a clear sign that this probably isn't the most appropriate
> place to be trying to handle this.

Agreed - this is entirely a bunch of hand waving. I documented it so clearly
because the msm GPU hits exactly this case and it drove me crazy for a while
until I figured it out.

> >+	default:
> >+		reg |= (ARM_LPAE_TCR_SEP_UPSTREAM << ARM_LPAE_TCR_SEP_SHIFT);
> >+		break;
> >+	}
> >+
> >+	return reg;
> >+}
> >+
> >  static struct io_pgtable *
> >  arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
> >  {
> >  	u64 reg;
> >  	struct arm_lpae_io_pgtable *data;
> >-	if (cfg->quirks & ~(IO_PGTABLE_QUIRK_ARM_NS | IO_PGTABLE_QUIRK_NO_DMA))
> >+	if (cfg->quirks & ~(IO_PGTABLE_QUIRK_ARM_NS |
> >+			IO_PGTABLE_QUIRK_NO_DMA |
> >+			IO_PGTABLE_QUIRK_ARM_TTBR1))
> >  		return NULL;
> >  	data = arm_lpae_alloc_pgtable(cfg);
> >@@ -744,8 +849,9 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
> >  	reg |= (64ULL - cfg->ias) << ARM_LPAE_TCR_T0SZ_SHIFT;
> >-	/* Disable speculative walks through TTBR1 */
> >-	reg |= ARM_LPAE_TCR_EPD1;
> >+	/* Bring in the TTBR1 configuration */
> >+	reg |= arm_64_lpae_setup_ttbr1(cfg, data);
> >+
> >  	cfg->arm_lpae_s1_cfg.tcr = reg;
> >  	/* MAIRs */
> >@@ -760,16 +866,32 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
> >  	cfg->arm_lpae_s1_cfg.mair[1] = 0;
> >  	/* Looking good; allocate a pgd */
> >-	data->pgd = __arm_lpae_alloc_pages(data->pgd_size, GFP_KERNEL, cfg);
> >-	if (!data->pgd)
> >+	data->pgd[0] = __arm_lpae_alloc_pages(data->pgd_size, GFP_KERNEL, cfg);
> >+	if (!data->pgd[0])
> >  		goto out_free_data;
> >+
> >+	if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1) {
> >+		data->pgd[1] = __arm_lpae_alloc_pages(data->pgd_size,
> >+			GFP_KERNEL, cfg);
> >+		if (!data->pgd[1]) {
> >+			__arm_lpae_free_pages(data->pgd[0], data->pgd_size,
> >+				cfg);
> >+			goto out_free_data;
> >+		}
> >+	} else {
> >+		data->pgd[1] = NULL;
> >+	}
> >+
> >  	/* Ensure the empty pgd is visible before any actual TTBR write */
> >  	wmb();
> >  	/* TTBRs */
> >-	cfg->arm_lpae_s1_cfg.ttbr[0] = virt_to_phys(data->pgd);
> >-	cfg->arm_lpae_s1_cfg.ttbr[1] = 0;
> >+	cfg->arm_lpae_s1_cfg.ttbr[0] = virt_to_phys(data->pgd[0]);
> >+
> >+	if (data->pgd[1])
> >+		cfg->arm_lpae_s1_cfg.ttbr[1] = virt_to_phys(data->pgd[1]);
> >+
> >  	return &data->iop;
> >  out_free_data:
> >@@ -854,15 +976,15 @@ arm_64_lpae_alloc_pgtable_s2(struct io_pgtable_cfg *cfg, void *cookie)
> >  	cfg->arm_lpae_s2_cfg.vtcr = reg;
> >  	/* Allocate pgd pages */
> >-	data->pgd = __arm_lpae_alloc_pages(data->pgd_size, GFP_KERNEL, cfg);
> >-	if (!data->pgd)
> >+	data->pgd[0] = __arm_lpae_alloc_pages(data->pgd_size, GFP_KERNEL, cfg);
> >+	if (!data->pgd[0])
> >  		goto out_free_data;
> >  	/* Ensure the empty pgd is visible before any actual TTBR write */
> >  	wmb();
> >  	/* VTTBR */
> >-	cfg->arm_lpae_s2_cfg.vttbr = virt_to_phys(data->pgd);
> >+	cfg->arm_lpae_s2_cfg.vttbr = virt_to_phys(data->pgd[0]);
> >  	return &data->iop;
> >  out_free_data:
> >@@ -960,7 +1082,7 @@ static void __init arm_lpae_dump_ops(struct io_pgtable_ops *ops)
> >  		cfg->pgsize_bitmap, cfg->ias);
> >  	pr_err("data: %d levels, 0x%zx pgd_size, %lu pg_shift, %lu bits_per_level, pgd @ %p\n",
> >  		data->levels, data->pgd_size, data->pg_shift,
> >-		data->bits_per_level, data->pgd);
> >+		data->bits_per_level, data->pgd[0]);
> >  }
> >  #define __FAIL(ops, i)	({						\
> >diff --git a/drivers/iommu/io-pgtable-arm.h b/drivers/iommu/io-pgtable-arm.h
> >index cb31314971ac..6344b1d359a5 100644
> >--- a/drivers/iommu/io-pgtable-arm.h
> >+++ b/drivers/iommu/io-pgtable-arm.h
> >@@ -25,14 +25,21 @@
> >  #define ARM_LPAE_TCR_TG0_64K		(1 << 14)
> >  #define ARM_LPAE_TCR_TG0_16K		(2 << 14)
> >+#define ARM_LPAE_TCR_TG1_16K            (1 << 30)
> >+#define ARM_LPAE_TCR_TG1_4K             (2 << 30)
> >+#define ARM_LPAE_TCR_TG1_64K            (3 << 30)
> >+
> >  #define ARM_LPAE_TCR_SH0_SHIFT		12
> >+#define ARM_LPAE_TCR_SH1_SHIFT		28
> >  #define ARM_LPAE_TCR_SH0_MASK		0x3
> >  #define ARM_LPAE_TCR_SH_NS		0
> >  #define ARM_LPAE_TCR_SH_OS		2
> >  #define ARM_LPAE_TCR_SH_IS		3
> >  #define ARM_LPAE_TCR_ORGN0_SHIFT	10
> >+#define ARM_LPAE_TCR_ORGN1_SHIFT	26
> >  #define ARM_LPAE_TCR_IRGN0_SHIFT	8
> >+#define ARM_LPAE_TCR_IRGN1_SHIFT	24
> >  #define ARM_LPAE_TCR_RGN_MASK		0x3
> >  #define ARM_LPAE_TCR_RGN_NC		0
> >  #define ARM_LPAE_TCR_RGN_WBWA		1
> >@@ -45,6 +52,9 @@
> >  #define ARM_LPAE_TCR_T0SZ_SHIFT		0
> >  #define ARM_LPAE_TCR_SZ_MASK		0x3f
> >+#define ARM_LPAE_TCR_T1SZ_SHIFT         16
> >+#define ARM_LPAE_TCR_T1SZ_MASK          0x3f
> >+
> >  #define ARM_LPAE_TCR_PS_SHIFT		16
> >  #define ARM_LPAE_TCR_PS_MASK		0x7
> >@@ -58,6 +68,16 @@
> >  #define ARM_LPAE_TCR_PS_44_BIT		0x4ULL
> >  #define ARM_LPAE_TCR_PS_48_BIT		0x5ULL
> >+#define ARM_LPAE_TCR_SEP_SHIFT		(15 + 32)
> >+
> >+#define ARM_LPAE_TCR_SEP_31		0x0ULL
> >+#define ARM_LPAE_TCR_SEP_35		0x1ULL
> >+#define ARM_LPAE_TCR_SEP_39		0x2ULL
> >+#define ARM_LPAE_TCR_SEP_41		0x3ULL
> >+#define ARM_LPAE_TCR_SEP_43		0x4ULL
> >+#define ARM_LPAE_TCR_SEP_47		0x5ULL
> >+#define ARM_LPAE_TCR_SEP_UPSTREAM	0x7ULL
> >+
> >  #define ARM_LPAE_MAIR_ATTR_SHIFT(n)	((n) << 3)
> >  #define ARM_LPAE_MAIR_ATTR_MASK		0xff
> >  #define ARM_LPAE_MAIR_ATTR_DEVICE	0x04
> >diff --git a/drivers/iommu/io-pgtable.h b/drivers/iommu/io-pgtable.h
> >index cd2e1eafffe6..55f7b60cc44d 100644
> >--- a/drivers/iommu/io-pgtable.h
> >+++ b/drivers/iommu/io-pgtable.h
> >@@ -71,12 +71,18 @@ struct io_pgtable_cfg {
> >  	 *	be accessed by a fully cache-coherent IOMMU or CPU (e.g. for a
> >  	 *	software-emulated IOMMU), such that pagetable updates need not
> >  	 *	be treated as explicit DMA data.
> >+	 *
> >+	 * IO_PGTABLE_QUIRK_ARM_TTBR1: Specifies that TTBR1 has been enabled on
> >+	 *	this domain. Set up the configuration registers and dynamically
> >+	 *	choose which pagetable (TTBR0 or TTBR1) a mapping should go into
> >+	 *	based on the address.
> >  	 */
> >  	#define IO_PGTABLE_QUIRK_ARM_NS		BIT(0)
> >  	#define IO_PGTABLE_QUIRK_NO_PERMS	BIT(1)
> >  	#define IO_PGTABLE_QUIRK_TLBI_ON_MAP	BIT(2)
> >  	#define IO_PGTABLE_QUIRK_ARM_MTK_4GB	BIT(3)
> >  	#define IO_PGTABLE_QUIRK_NO_DMA		BIT(4)
> >+	#define IO_PGTABLE_QUIRK_ARM_TTBR1      BIT(5)
> >  	unsigned long			quirks;
> >  	unsigned long			pgsize_bitmap;
> >  	unsigned int			ias;
> >@@ -173,18 +179,22 @@ struct io_pgtable {
> >  static inline void io_pgtable_tlb_flush_all(struct io_pgtable *iop)
> >  {
> >-	iop->cfg.tlb->tlb_flush_all(iop->cookie);
> >+	if (iop->cfg.tlb)
> >+		iop->cfg.tlb->tlb_flush_all(iop->cookie);
> 
> What's going on here?
> 
> It's not obvious how this change is relevant to TTBR1 support, and
> either way I can't see how an io_pgtable with no TLB ops wouldn't
> just be fundamentally broken.

Oops, this is in the wrong patch. It should go along with the private pasid
patches: client devices that use private pasid pagetables need to handle TLB
maintenance on those tables themselves (and even that might be up for debate,
but not here).
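
To make that concrete, here is a minimal sketch (not code from this
series) of what a client-managed table looks like with the io-pgtable
API in this tree; dev, cookie, iova and paddr are illustrative
stand-ins:

  #include <linux/sizes.h>
  #include <linux/iommu.h>
  #include "io-pgtable.h"	/* in-tree header, drivers/iommu/ */

  static int private_table_example(struct device *dev, void *cookie,
				   unsigned long iova, phys_addr_t paddr)
  {
	struct io_pgtable_cfg cfg = {
		.pgsize_bitmap	= SZ_4K | SZ_2M | SZ_1G,
		.ias		= 48,
		.oas		= 48,
		.tlb		= NULL,	/* no gather ops: client flushes */
		.iommu_dev	= dev,
	};
	struct io_pgtable_ops *ops;
	int ret;

	ops = alloc_io_pgtable_ops(ARM_64_LPAE_S1, &cfg, cookie);
	if (!ops)
		return -ENOMEM;

	ret = ops->map(ops, iova, paddr, SZ_4K, IOMMU_READ | IOMMU_WRITE);

	/*
	 * No TLB callback fires on unmap with cfg.tlb == NULL; the
	 * client (here, the GPU) must invalidate its own TLB, which
	 * is why the io_pgtable_tlb_* helpers need to tolerate NULL.
	 */
	ops->unmap(ops, iova, SZ_4K);

	free_io_pgtable_ops(ops);
	return ret;
  }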

> Robin.

Thanks so much for reviewing.
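
As an aside on the io-pgtable.h hunk above: once the sign extension
position is programmed, map/unmap can choose between the two pgds with
a single bit test. A rough sketch, where arm_lpae_pick_pgd() and
sign_bit are illustrative names rather than code from the series:

  #include <linux/bitops.h>

  /* iovas with the sign bit set translate through TTBR1 (pgd[1]) */
  static void *arm_lpae_pick_pgd(struct arm_lpae_io_pgtable *data,
				 unsigned long iova, unsigned int sign_bit)
  {
	return (iova & BIT_ULL(sign_bit)) ? data->pgd[1] : data->pgd[0];
  }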

Jordan

-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

Thread overview: 46+ messages
2018-02-21 22:59 [RFC 00/14] Per-instance pagetables for MSM GPUs Jordan Crouse
2018-02-21 22:59 ` [PATCH 01/14] iommu: Add DOMAIN_ATTR_ENABLE_TTBR1 Jordan Crouse
2018-03-02 14:56   ` Robin Murphy
2018-02-21 22:59 ` [PATCH 02/14] iommu/arm-smmu: Add support for TTBR1 Jordan Crouse
2018-03-02 17:57   ` Robin Murphy
2018-03-02 18:28     ` Jordan Crouse
2018-02-21 22:59 ` [PATCH 03/14] iommu: Create a base struct for io_mm Jordan Crouse
2018-03-02 12:25   ` Jean-Philippe Brucker
2018-03-02 16:14     ` Jordan Crouse
2018-02-21 22:59 ` [PATCH 04/14] iommu: sva: Add support for pasid allocation Jordan Crouse
2018-03-02 12:27   ` Jean-Philippe Brucker
2018-03-02 16:23     ` Jordan Crouse
2018-02-21 22:59 ` [PATCH 05/14] iommu: arm-smmu: Add pasid implementation Jordan Crouse
2018-02-21 22:59 ` [PATCH 06/14] iommu: arm-smmu: Add side-band function to specific pasid callbacks Jordan Crouse
2018-02-21 22:59 ` [PATCH 07/14] drm/msm: Enable 64 bit mode by default Jordan Crouse
2018-02-21 22:59 ` [PATCH 08/14] drm/msm: Pass the MMU domain index in struct msm_file_private Jordan Crouse
2018-02-21 22:59 ` [PATCH 09/14] drm/msm/gpu: Support using TTBR1 for kernel buffer objects Jordan Crouse
2018-02-21 22:59 ` [PATCH 10/14] drm/msm: Add msm_mmu features Jordan Crouse
2018-02-21 22:59 ` [PATCH 11/14] drm/msm: Add support for iommu-sva PASIDs Jordan Crouse
2018-03-02 12:29   ` Jean-Philippe Brucker
2018-02-21 22:59 ` [PATCH 12/14] drm/msm: Add support for per-instance address spaces Jordan Crouse
2018-02-21 22:59 ` [PATCH 13/14] drm/msm: Support " Jordan Crouse
2018-02-21 22:59 ` [PATCH 14/14] drm/msm/a5xx: Support per-instance pagetables Jordan Crouse
