* [PATCH 0/6] drm/msm: Add per-instance pagetables
@ 2017-03-07 17:14 Jordan Crouse
  2017-03-07 17:14 ` [PATCH 1/6] drm/msm: Enable 64 bit mode by default Jordan Crouse
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Jordan Crouse @ 2017-03-07 17:14 UTC (permalink / raw)
  To: freedreno-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: linux-arm-msm-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Using the framework described here

https://lists.linuxfoundation.org/pipermail/iommu/2017-March/020716.html

This implements per-instance pagetables for the GPU driver by creating an
individual pagetable for each file descriptor (so not strictly per-process,
but in practice we can't share buffers between file descriptors without
importing them anyway).

This is the brief workflow for the process:

 - At init, the driver attaches an UNMANAGED domain to the IOMMU (context bank 0)

 - All "global" buffers (kernel side GPU buffers such as ringbuffers, etc) are
   mapped into the TTBR1 space which is defined as any address with bit 48 set.
   In pratice we have discovered that for reasons yet uknown, bit 47 also has
   to be set for the GPU to sign extend correctly, so the TTBR1 region is
   defined as starting at 0xffff8_0000_0000_0000.

 - When a new file descriptor is opened, a dynamic domain is cloned from the
   real domain - this does not program the hardware, but it creates a pagetable
   and returns a pointer that we can use to map memory. This is wrapped in a
   new address space and used for all allocations created with the file
   descriptor.

 - At command submission time, an SMMU_TABLE_UPDATE packet is sent before every
   command; it contains the physical address to load into the TTBR0 register
   for the pagetable associated with the process - the GPU will automatically
   switch the pagetable for the process.

Because no kernel-side allocations are in the TTBR0 space, there is no setup
required to switch the TTBR0 pagetable, and we do not need to reprogram it
after the command is over since the next command will rewrite the register.
This makes the code significantly simpler than it could be (*cough*
downstream *cough*).

I'm sure there will be questions, and I'm sure that what we have won't be what
is finally decided upon in the arm-smmu driver (in particular there are some
nice parts of the SMMUv3 SVM solution that we can borrow), but I think it is
important to get eyeballs on this for posterity.

Thanks!
Jordan

Jordan Crouse (6):
  drm/msm: Enable 64 bit mode by default
  drm/msm: Pass the MMU domain index in struct msm_file_private
  drm/msm: Make separate iommu function tables for v1 and v2 MMUs
  drm/msm: Use TTBR1 for kernel side GPU buffer objects
  drm/msm: Support dynamic IOMMU domains
  drm/msm: a5xx: Support per-instance pagetables

 arch/arm64/boot/dts/qcom/msm8996.dtsi     |   2 +
 drivers/gpu/drm/msm/adreno/a5xx_gpu.c     |  78 ++++++++++++++-
 drivers/gpu/drm/msm/adreno/a5xx_gpu.h     |  17 ++++
 drivers/gpu/drm/msm/adreno/a5xx_preempt.c |  61 +++++++++---
 drivers/gpu/drm/msm/adreno/adreno_gpu.c   |  18 +++-
 drivers/gpu/drm/msm/adreno/adreno_gpu.h   |   2 +
 drivers/gpu/drm/msm/msm_drv.c             |  60 +++++++++---
 drivers/gpu/drm/msm/msm_drv.h             |   9 +-
 drivers/gpu/drm/msm/msm_gem.h             |   1 +
 drivers/gpu/drm/msm/msm_gem_submit.c      |  12 ++-
 drivers/gpu/drm/msm/msm_gem_vma.c         |  38 ++++++--
 drivers/gpu/drm/msm/msm_gpu.c             |   3 +-
 drivers/gpu/drm/msm/msm_iommu.c           | 151 ++++++++++++++++++++++++------
 drivers/gpu/drm/msm/msm_iommu.h           |  34 +++++++
 drivers/gpu/drm/msm/msm_mmu.h             |   2 +-
 15 files changed, 415 insertions(+), 73 deletions(-)
 create mode 100644 drivers/gpu/drm/msm/msm_iommu.h

-- 
1.9.1

_______________________________________________
Freedreno mailing list
Freedreno@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/freedreno


* [PATCH 1/6] drm/msm: Enable 64 bit mode by default
  2017-03-07 17:14 [PATCH 0/6] drm/msm: Add per-instance pagetables Jordan Crouse
@ 2017-03-07 17:14 ` Jordan Crouse
       [not found] ` <1488906860-11073-1-git-send-email-jcrouse-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
  2017-03-07 21:53 ` [PATCH 0/6] drm/msm: Add " Daniel Vetter
  2 siblings, 0 replies; 9+ messages in thread
From: Jordan Crouse @ 2017-03-07 17:14 UTC (permalink / raw)
  To: freedreno; +Cc: linux-arm-msm, dri-devel

A5XX GPUs can run in either 32 or 64 bit mode. The GPU registers
and the microcode use 64 bit virtual addressing in either case, but the
upper 32 bits are ignored if the GPU is in 32 bit mode. There is no
performance disadvantage to remaining in 64 bit mode even if we are
only generating 32 bit addresses, so switch over now to prepare for
using addresses above 4G on targets that support them.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
---
 drivers/gpu/drm/msm/adreno/a5xx_gpu.c | 14 ++++++++++++++
 drivers/gpu/drm/msm/msm_iommu.c       |  7 +++----
 2 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
index fef1541..06238b7 100644
--- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
@@ -800,6 +800,20 @@ static int a5xx_hw_init(struct msm_gpu *gpu)
 		REG_A5XX_RBBM_SECVID_TSB_TRUSTED_BASE_HI, 0x00000000);
 	gpu_write(gpu, REG_A5XX_RBBM_SECVID_TSB_TRUSTED_SIZE, 0x00000000);
 
+	/* Put the GPU into 64 bit by default */
+	gpu_write(gpu, REG_A5XX_CP_ADDR_MODE_CNTL, 0x1);
+	gpu_write(gpu, REG_A5XX_VSC_ADDR_MODE_CNTL, 0x1);
+	gpu_write(gpu, REG_A5XX_GRAS_ADDR_MODE_CNTL, 0x1);
+	gpu_write(gpu, REG_A5XX_RB_ADDR_MODE_CNTL, 0x1);
+	gpu_write(gpu, REG_A5XX_PC_ADDR_MODE_CNTL, 0x1);
+	gpu_write(gpu, REG_A5XX_HLSQ_ADDR_MODE_CNTL, 0x1);
+	gpu_write(gpu, REG_A5XX_VFD_ADDR_MODE_CNTL, 0x1);
+	gpu_write(gpu, REG_A5XX_VPC_ADDR_MODE_CNTL, 0x1);
+	gpu_write(gpu, REG_A5XX_UCHE_ADDR_MODE_CNTL, 0x1);
+	gpu_write(gpu, REG_A5XX_SP_ADDR_MODE_CNTL, 0x1);
+	gpu_write(gpu, REG_A5XX_TPL1_ADDR_MODE_CNTL, 0x1);
+	gpu_write(gpu, REG_A5XX_RBBM_SECVID_TSB_ADDR_MODE_CNTL, 0x1);
+
 	/* Load the GPMU firmware before starting the HW init */
 	a5xx_gpmu_ucode_init(gpu);
 
diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c
index 7521582..d520db2 100644
--- a/drivers/gpu/drm/msm/msm_iommu.c
+++ b/drivers/gpu/drm/msm/msm_iommu.c
@@ -34,10 +34,9 @@ static int msm_fault_handler(struct iommu_domain *domain, struct device *dev,
 	if (iommu->base.handler)
 		ret = iommu->base.handler(iommu->base.arg, iova, flags);
 	else
-		pr_warn_ratelimited("*** fault: iova=%08lx, flags=%d\n", iova, flags);
+		pr_warn_ratelimited("*** fault: iova=%16lx, flags=%d\n", iova, flags);
 
 	iommu_domain_resume(domain, false);
-
 	return 0;
 }
 
@@ -104,7 +103,7 @@ static int msm_iommu_map(struct msm_mmu *mmu, uint64_t iova,
 		dma_addr_t pa = sg_phys(sg) - sg->offset;
 		size_t bytes = sg->length + sg->offset;
 
-		VERB("map[%d]: %08lx %08lx(%zx)", i, da, (unsigned long)pa, bytes);
+		VERB("map[%d]: %16lx %16lx(%zx)", i, da, (unsigned long)pa, bytes);
 
 		ret = iommu_map(domain, da, pa, bytes, prot);
 		if (ret)
@@ -143,7 +142,7 @@ static int msm_iommu_unmap(struct msm_mmu *mmu, uint64_t iova,
 		if (unmapped < bytes)
 			return unmapped;
 
-		VERB("unmap[%d]: %08lx(%zx)", i, da, bytes);
+		VERB("unmap[%d]: %16lx(%zx)", i, da, bytes);
 
 		BUG_ON(!PAGE_ALIGNED(bytes));
 
-- 
1.9.1

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


* [PATCH 2/6] drm/msm: Pass the MMU domain index in struct msm_file_private
       [not found] ` <1488906860-11073-1-git-send-email-jcrouse-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
@ 2017-03-07 17:14   ` Jordan Crouse
  2017-03-07 17:14   ` [PATCH 3/6] drm/msm: Make separate iommu function tables for v1 and v2 MMUs Jordan Crouse
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Jordan Crouse @ 2017-03-07 17:14 UTC (permalink / raw)
  To: freedreno-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: linux-arm-msm-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Pass the index of the MMU domain in struct msm_file_private instead
of assuming gpu->id throughout the submit path.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
---
 drivers/gpu/drm/msm/msm_drv.c        |  2 ++
 drivers/gpu/drm/msm/msm_drv.h        |  6 +-----
 drivers/gpu/drm/msm/msm_gem.h        |  1 +
 drivers/gpu/drm/msm/msm_gem_submit.c | 12 ++++++++----
 drivers/gpu/drm/msm/msm_gpu.c        |  3 +--
 5 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
index b03e785..7b7a2e7 100644
--- a/drivers/gpu/drm/msm/msm_drv.c
+++ b/drivers/gpu/drm/msm/msm_drv.c
@@ -507,6 +507,7 @@ static void load_gpu(struct drm_device *dev)
 
 static int msm_open(struct drm_device *dev, struct drm_file *file)
 {
+	struct msm_drm_private *priv = dev->dev_private;
 	struct msm_file_private *ctx;
 
 	/* For now, load gpu on open.. to avoid the requirement of having
@@ -518,6 +519,7 @@ static int msm_open(struct drm_device *dev, struct drm_file *file)
 	if (!ctx)
 		return -ENOMEM;
 
+	ctx->aspace = priv->gpu->aspace;
 	file->driver_priv = ctx;
 
 	return 0;
diff --git a/drivers/gpu/drm/msm/msm_drv.h b/drivers/gpu/drm/msm/msm_drv.h
index e740fa5..bbad6c7 100644
--- a/drivers/gpu/drm/msm/msm_drv.h
+++ b/drivers/gpu/drm/msm/msm_drv.h
@@ -58,11 +58,7 @@
 #define NUM_DOMAINS 2    /* one for KMS, then one per gpu core (?) */
 
 struct msm_file_private {
-	/* currently we don't do anything useful with this.. but when
-	 * per-context address spaces are supported we'd keep track of
-	 * the context's page-tables here.
-	 */
-	int dummy;
+	struct msm_gem_address_space *aspace;
 };
 
 enum msm_mdp_plane_property {
diff --git a/drivers/gpu/drm/msm/msm_gem.h b/drivers/gpu/drm/msm/msm_gem.h
index c57b54e..74a2e67 100644
--- a/drivers/gpu/drm/msm/msm_gem.h
+++ b/drivers/gpu/drm/msm/msm_gem.h
@@ -116,6 +116,7 @@ static inline bool is_vunmapable(struct msm_gem_object *msm_obj)
 struct msm_gem_submit {
 	struct drm_device *dev;
 	struct msm_gpu *gpu;
+	struct msm_gem_address_space *aspace;
 	struct list_head node;   /* node in gpu submit_list */
 	struct list_head bo_list;
 	struct ww_acquire_ctx ticket;
diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c b/drivers/gpu/drm/msm/msm_gem_submit.c
index d8021a0..214e6d9 100644
--- a/drivers/gpu/drm/msm/msm_gem_submit.c
+++ b/drivers/gpu/drm/msm/msm_gem_submit.c
@@ -31,17 +31,20 @@
 #define BO_PINNED   0x2000
 
 static struct msm_gem_submit *submit_create(struct drm_device *dev,
-		struct msm_gpu *gpu, int nr_bos, int nr_cmds)
+		struct msm_gpu *gpu, struct msm_gem_address_space *aspace,
+		int nr_bos, int nr_cmds)
 {
 	struct msm_gem_submit *submit;
 	int sz = sizeof(*submit) + (nr_bos * sizeof(submit->bos[0])) +
 			(nr_cmds * sizeof(*submit->cmd));
 
 	submit = kmalloc(sz, GFP_TEMPORARY | __GFP_NOWARN | __GFP_NORETRY);
+
 	if (!submit)
 		return NULL;
 
 	submit->dev = dev;
+	submit->aspace = aspace;
 	submit->gpu = gpu;
 	submit->fence = NULL;
 	submit->pid = get_pid(task_pid(current));
@@ -158,7 +161,7 @@ static void submit_unlock_unpin_bo(struct msm_gem_submit *submit, int i)
 	struct msm_gem_object *msm_obj = submit->bos[i].obj;
 
 	if (submit->bos[i].flags & BO_PINNED)
-		msm_gem_put_iova(&msm_obj->base, submit->gpu->aspace);
+		msm_gem_put_iova(&msm_obj->base, submit->aspace);
 
 	if (submit->bos[i].flags & BO_LOCKED)
 		ww_mutex_unlock(&msm_obj->resv->lock);
@@ -246,7 +249,7 @@ static int submit_pin_objects(struct msm_gem_submit *submit)
 
 		/* if locking succeeded, pin bo: */
 		ret = msm_gem_get_iova_locked(&msm_obj->base,
-				submit->gpu->aspace, &iova);
+				submit->aspace, &iova);
 
 		if (ret)
 			break;
@@ -417,7 +420,8 @@ int msm_ioctl_gem_submit(struct drm_device *dev, void *data,
 	}
 	priv->struct_mutex_task = current;
 
-	submit = submit_create(dev, gpu, args->nr_bos, args->nr_cmds);
+	submit = submit_create(dev, gpu, ctx->aspace, args->nr_bos,
+		args->nr_cmds);
 	if (!submit) {
 		ret = -ENOMEM;
 		goto out_unlock;
diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
index 25de46f..f7f85bd 100644
--- a/drivers/gpu/drm/msm/msm_gpu.c
+++ b/drivers/gpu/drm/msm/msm_gpu.c
@@ -562,8 +562,7 @@ void msm_gpu_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit,
 
 		/* submit takes a reference to the bo and iova until retired: */
 		drm_gem_object_reference(&msm_obj->base);
-		msm_gem_get_iova_locked(&msm_obj->base,
-				submit->gpu->aspace, &iova);
+		msm_gem_get_iova_locked(&msm_obj->base, submit->aspace, &iova);
 
 		if (submit->bos[i].flags & MSM_SUBMIT_BO_WRITE)
 			msm_gem_move_to_active(&msm_obj->base, gpu, true, submit->fence);
-- 
1.9.1



* [PATCH 3/6] drm/msm: Make separate iommu function tables for v1 and v2 MMUs
       [not found] ` <1488906860-11073-1-git-send-email-jcrouse-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
  2017-03-07 17:14   ` [PATCH 2/6] drm/msm: Pass the MMU domain index in struct msm_file_private Jordan Crouse
@ 2017-03-07 17:14   ` Jordan Crouse
  2017-03-07 17:14   ` [PATCH 4/6] drm/msm: Use TTBR1 for kernel side GPU buffer objects Jordan Crouse
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Jordan Crouse @ 2017-03-07 17:14 UTC (permalink / raw)
  To: freedreno-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: linux-arm-msm-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Since we have the infrastructure for IOMMU function tables it makes
sense to use it to differentiate between v1 and v2 targets. It adds
a bit more infrastructure but it also gives us the freedom to expand
on each flavor (especially v2) for things like dynamic domains.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
---
 drivers/gpu/drm/msm/msm_iommu.c | 60 ++++++++++++++++++++++++++++-------------
 1 file changed, 41 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c
index d520db2..c1bfc92 100644
--- a/drivers/gpu/drm/msm/msm_iommu.c
+++ b/drivers/gpu/drm/msm/msm_iommu.c
@@ -21,7 +21,6 @@
 struct msm_iommu {
 	struct msm_mmu base;
 	struct iommu_domain *domain;
-	bool has_ctx;
 };
 #define to_msm_iommu(x) container_of(x, struct msm_iommu, base)
 
@@ -40,15 +39,12 @@ static int msm_fault_handler(struct iommu_domain *domain, struct device *dev,
 	return 0;
 }
 
-static int msm_iommu_attach(struct msm_mmu *mmu, const char * const *names,
-			    int cnt)
+static int msm_iommu_v1_attach(struct msm_mmu *mmu, const char *const *names,
+			       int cnt)
 {
 	struct msm_iommu *iommu = to_msm_iommu(mmu);
 	int i, ret;
 
-	if (!iommu->has_ctx)
-		return iommu_attach_device(iommu->domain, mmu->dev);
-
 	for (i = 0; i < cnt; i++) {
 		struct device *ctx = msm_iommu_get_ctx(names[i]);
 
@@ -67,15 +63,12 @@ static int msm_iommu_attach(struct msm_mmu *mmu, const char * const *names,
 	return 0;
 }
 
-static void msm_iommu_detach(struct msm_mmu *mmu, const char * const *names,
-			     int cnt)
+static void msm_iommu_v1_detach(struct msm_mmu *mmu, const char * const *names,
+				int cnt)
 {
 	struct msm_iommu *iommu = to_msm_iommu(mmu);
 	int i;
 
-	if (!iommu->has_ctx)
-		iommu_detach_device(iommu->domain, mmu->dev);
-
 	for (i = 0; i < cnt; i++) {
 		struct device *ctx = msm_iommu_get_ctx(names[i]);
 
@@ -86,6 +79,22 @@ static void msm_iommu_detach(struct msm_mmu *mmu, const char * const *names,
 	}
 }
 
+static int msm_iommu_v2_attach(struct msm_mmu *mmu, const char * const *names,
+			       int cnt)
+{
+	struct msm_iommu *iommu = to_msm_iommu(mmu);
+
+	return iommu_attach_device(iommu->domain, mmu->dev);
+}
+
+static void msm_iommu_v2_detach(struct msm_mmu *mmu, const char * const *names,
+				int cnt)
+{
+	struct msm_iommu *iommu = to_msm_iommu(mmu);
+
+	iommu_detach_device(iommu->domain, mmu->dev);
+}
+
 static int msm_iommu_map(struct msm_mmu *mmu, uint64_t iova,
 		struct sg_table *sgt, unsigned len, int prot)
 {
@@ -159,9 +168,19 @@ static void msm_iommu_destroy(struct msm_mmu *mmu)
 	kfree(iommu);
 }
 
-static const struct msm_mmu_funcs funcs = {
-		.attach = msm_iommu_attach,
-		.detach = msm_iommu_detach,
+/* These are for qcom,msm-smmu-v2 and qcom,msm-mmu-500 based targets */
+static const struct msm_mmu_funcs funcs_v1 = {
+		.attach = msm_iommu_v1_attach,
+		.detach = msm_iommu_v1_detach,
+		.map = msm_iommu_map,
+		.unmap = msm_iommu_unmap,
+		.destroy = msm_iommu_destroy,
+};
+
+/* These are for the arm-smmu based targets */
+static const struct msm_mmu_funcs funcs_v2 = {
+		.attach = msm_iommu_v2_attach,
+		.detach = msm_iommu_v2_detach,
 		.map = msm_iommu_map,
 		.unmap = msm_iommu_unmap,
 		.destroy = msm_iommu_destroy,
@@ -170,18 +189,21 @@ static void msm_iommu_destroy(struct msm_mmu *mmu)
 struct msm_mmu *msm_iommu_new(struct device *dev, struct iommu_domain *domain)
 {
 	struct msm_iommu *iommu;
+	const struct msm_mmu_funcs *funcs;
 
 	iommu = kzalloc(sizeof(*iommu), GFP_KERNEL);
 	if (!iommu)
 		return ERR_PTR(-ENOMEM);
 
-	iommu->domain = domain;
-	msm_mmu_init(&iommu->base, dev, &funcs);
-	iommu_set_fault_handler(domain, msm_fault_handler, iommu, true);
-
 	if (of_find_compatible_node(NULL, NULL, "qcom,msm-smmu-v2") ||
 			of_find_compatible_node(NULL, NULL, "qcom,msm-mmu-500"))
-		iommu->has_ctx = true;
+		funcs = &funcs_v1;
+	else
+		funcs = &funcs_v2;
+
+	iommu->domain = domain;
+	msm_mmu_init(&iommu->base, dev, funcs);
+	iommu_set_fault_handler(domain, msm_fault_handler, iommu, true);
 
 	return &iommu->base;
 }
-- 
1.9.1



* [PATCH 4/6] drm/msm: Use TTBR1 for kernel side GPU buffer objects
       [not found] ` <1488906860-11073-1-git-send-email-jcrouse-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
  2017-03-07 17:14   ` [PATCH 2/6] drm/msm: Pass the MMU domain index in struct msm_file_private Jordan Crouse
  2017-03-07 17:14   ` [PATCH 3/6] drm/msm: Make separate iommu function tables for v1 and v2 MMUs Jordan Crouse
@ 2017-03-07 17:14   ` Jordan Crouse
  2017-03-07 17:14   ` [PATCH 5/6] drm/msm: Support dynamic IOMMU domains Jordan Crouse
  2017-03-07 17:14   ` [PATCH 6/6] drm/msm: a5xx: Support per-instance pagetables Jordan Crouse
  4 siblings, 0 replies; 9+ messages in thread
From: Jordan Crouse @ 2017-03-07 17:14 UTC (permalink / raw)
  To: freedreno-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: linux-arm-msm-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Use a TTBR1 pagetable for the GPU IOMMU domain and map all
the GPU kernel side buffer objects into that range.  This
will make it easier to switch out TTBR0 for per-process
pagetables.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
---
 drivers/gpu/drm/msm/adreno/adreno_gpu.c | 18 ++++++++++++++++--
 drivers/gpu/drm/msm/msm_iommu.c         |  7 +++++++
 2 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
index b41bd88..d7864ac 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
@@ -418,8 +418,22 @@ int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev,
 	adreno_gpu_config.ioname = "kgsl_3d0_reg_memory";
 	adreno_gpu_config.irqname = "kgsl_3d0_irq";
 
-	adreno_gpu_config.va_start = SZ_16M;
-	adreno_gpu_config.va_end = 0xffffffff;
+	if (adreno_gpu->revn >= 500) {
+		/*
+		 * By default map all A5XX buffers into the TTBR1 va space.
+		 * If per-instance pagetables are used then they will
+		 * use their own address space and the default domain will only
+		 * be used for kernel buffers. If per-instance pagetables aren't
+		 * enabled then we'll end up using the TTBR1 range as the
+		 * default global pagetable but that's okay because we have
+		 * plenty of room.
+		 */
+		adreno_gpu_config.va_start = 0xfffffff800000000ULL;
+		adreno_gpu_config.va_end =   0xfffffff8ffffffffULL;
+	} else {
+		adreno_gpu_config.va_start = SZ_16M;
+		adreno_gpu_config.va_end = 0xffffffff;
+	}
 
 	adreno_gpu_config.nr_rings = nr_rings;
 
diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c
index c1bfc92..cc82410 100644
--- a/drivers/gpu/drm/msm/msm_iommu.c
+++ b/drivers/gpu/drm/msm/msm_iommu.c
@@ -83,6 +83,13 @@ static int msm_iommu_v2_attach(struct msm_mmu *mmu, const char * const *names,
 			       int cnt)
 {
 	struct msm_iommu *iommu = to_msm_iommu(mmu);
+	int val = 1;
+
+	/* Use TTBR1 if it exists */
+	/* FIXME: This should only be for GPU and in theory only for A5XX */
+
+	iommu_domain_set_attr(iommu->domain, DOMAIN_ATTR_ENABLE_TTBR1,
+		&val);
 
 	return iommu_attach_device(iommu->domain, mmu->dev);
 }
-- 
1.9.1



* [PATCH 5/6] drm/msm: Support dynamic IOMMU domains
       [not found] ` <1488906860-11073-1-git-send-email-jcrouse-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
                     ` (2 preceding siblings ...)
  2017-03-07 17:14   ` [PATCH 4/6] drm/msm: Use TTBR1 for kernel side GPU buffer objects Jordan Crouse
@ 2017-03-07 17:14   ` Jordan Crouse
  2017-03-07 17:14   ` [PATCH 6/6] drm/msm: a5xx: Support per-instance pagetables Jordan Crouse
  4 siblings, 0 replies; 9+ messages in thread
From: Jordan Crouse @ 2017-03-07 17:14 UTC (permalink / raw)
  To: freedreno-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: linux-arm-msm-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Dynamic IOMMU domains allow multiple pagetables to be attached to the
same IOMMU device. These can be used by smart devices like the GPU
that can switch the pagetable dynamically between DRM instances.

Add support for dynamic IOMMU domains if they are enabled and
supported by your friendly neighborhood IOMMU driver.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
---
 drivers/gpu/drm/msm/msm_iommu.c | 99 ++++++++++++++++++++++++++++++++++-------
 drivers/gpu/drm/msm/msm_iommu.h | 34 ++++++++++++++
 drivers/gpu/drm/msm/msm_mmu.h   |  2 +-
 3 files changed, 117 insertions(+), 18 deletions(-)
 create mode 100644 drivers/gpu/drm/msm/msm_iommu.h

diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c
index cc82410..38ec5e8 100644
--- a/drivers/gpu/drm/msm/msm_iommu.c
+++ b/drivers/gpu/drm/msm/msm_iommu.c
@@ -16,13 +16,7 @@
  */
 
 #include "msm_drv.h"
-#include "msm_mmu.h"
-
-struct msm_iommu {
-	struct msm_mmu base;
-	struct iommu_domain *domain;
-};
-#define to_msm_iommu(x) container_of(x, struct msm_iommu, base)
+#include "msm_iommu.h"
 
 static int msm_fault_handler(struct iommu_domain *domain, struct device *dev,
 		unsigned long iova, int flags, void *arg)
@@ -83,14 +77,19 @@ static int msm_iommu_v2_attach(struct msm_mmu *mmu, const char * const *names,
 			       int cnt)
 {
 	struct msm_iommu *iommu = to_msm_iommu(mmu);
-	int val = 1;
+	int val = 1, ret;
 
 	/* Use TTBR1 if it exists */
-	/* FIXME: This should only be for GPU and in theory only for A5XX */
-
-	iommu_domain_set_attr(iommu->domain, DOMAIN_ATTR_ENABLE_TTBR1,
+	ret = iommu_domain_set_attr(iommu->domain, DOMAIN_ATTR_ENABLE_TTBR1,
 		&val);
 
+	/*
+	 * We will only allow per-instance pagetables if TTBR1 is available
+	 * because the alternative is too labor intensive
+	 */
+	iommu->allow_dynamic = !ret ? true : false;
+
+	/* Attach the device to the domain */
 	return iommu_attach_device(iommu->domain, mmu->dev);
 }
 
@@ -102,6 +101,18 @@ static void msm_iommu_v2_detach(struct msm_mmu *mmu, const char * const *names,
 	iommu_detach_device(iommu->domain, mmu->dev);
 }
 
+/* Dynamic domains do not have attach/detach steps */
+static int msm_iommu_attach_dynamic(struct msm_mmu *mmu,
+		const char * const *names, int cnt)
+{
+	return 0;
+}
+
+static void msm_iommu_detach_dynamic(struct msm_mmu *mmu,
+				     const char * const *names, int cnt)
+{
+}
+
 static int msm_iommu_map(struct msm_mmu *mmu, uint64_t iova,
 		struct sg_table *sgt, unsigned len, int prot)
 {
@@ -193,24 +204,78 @@ static void msm_iommu_destroy(struct msm_mmu *mmu)
 		.destroy = msm_iommu_destroy,
 };
 
-struct msm_mmu *msm_iommu_new(struct device *dev, struct iommu_domain *domain)
+/* These are for dynamic per-instance domains */
+static const struct msm_mmu_funcs dynamic_funcs = {
+		.attach = msm_iommu_attach_dynamic,
+		.detach = msm_iommu_detach_dynamic,
+		.map = msm_iommu_map,
+		.unmap = msm_iommu_unmap,
+		.destroy = msm_iommu_destroy,
+};
+
+struct msm_mmu *_msm_iommu_new(struct device *dev, struct iommu_domain *domain,
+		const struct msm_mmu_funcs *funcs)
 {
 	struct msm_iommu *iommu;
-	const struct msm_mmu_funcs *funcs;
 
 	iommu = kzalloc(sizeof(*iommu), GFP_KERNEL);
 	if (!iommu)
 		return ERR_PTR(-ENOMEM);
 
+	iommu->domain = domain;
+	msm_mmu_init(&iommu->base, dev, funcs);
+	iommu_set_fault_handler(domain, msm_fault_handler, iommu, true);
+
+	return &iommu->base;
+}
+
+struct msm_mmu *msm_iommu_new(struct device *dev, struct iommu_domain *domain)
+{
+	const struct msm_mmu_funcs *funcs;
+
 	if (of_find_compatible_node(NULL, NULL, "qcom,msm-smmu-v2") ||
 			of_find_compatible_node(NULL, NULL, "qcom,msm-mmu-500"))
 		funcs = &funcs_v1;
 	else
 		funcs = &funcs_v2;
 
-	iommu->domain = domain;
-	msm_mmu_init(&iommu->base, dev, funcs);
-	iommu_set_fault_handler(domain, msm_fault_handler, iommu, true);
+	return _msm_iommu_new(dev, domain, funcs);
+}
 
-	return &iommu->base;
+/*
+ * Given a base domain that is attached to an IOMMU device, try to create a
+ * dynamic domain that is also attached to the same device but allocates a new
+ * pagetable. This is used to allow multiple pagetables to be attached to the
+ * same device.
+ */
+struct msm_mmu *msm_iommu_new_dynamic(struct msm_mmu *base)
+{
+	static unsigned int procid;
+	struct msm_iommu *base_iommu = to_msm_iommu(base);
+	struct msm_iommu *iommu;
+	struct iommu_domain *domain;
+	struct msm_mmu *mmu;
+
+	/* Don't continue if the base domain didn't have the support we need */
+	if (!base || base_iommu->allow_dynamic == false)
+		return ERR_PTR(-EOPNOTSUPP);
+
+	domain = iommu_domain_create_dynamic(base_iommu->domain);
+	if (!domain)
+		return ERR_PTR(-ENODEV);
+
+	mmu = _msm_iommu_new(base->dev, domain, &dynamic_funcs);
+
+	if (IS_ERR(mmu)) {
+		iommu_domain_free(domain);
+		return mmu;
+	}
+
+	iommu = to_msm_iommu(mmu);
+
+	iommu_domain_get_attr(iommu->domain, DOMAIN_ATTR_TTBR0, &iommu->ttbr0);
+
+	iommu->contextidr = ++procid;
+
+	return mmu;
 }
diff --git a/drivers/gpu/drm/msm/msm_iommu.h b/drivers/gpu/drm/msm/msm_iommu.h
new file mode 100644
index 0000000..bfb7501
--- /dev/null
+++ b/drivers/gpu/drm/msm/msm_iommu.h
@@ -0,0 +1,34 @@
+/* Copyright (c) 2016 The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 and
+ * only version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#ifndef _MSM_IOMMU_H_
+#define _MSM_IOMMU_H_
+
+#include "msm_mmu.h"
+
+struct msm_iommu {
+	struct msm_mmu base;
+	struct iommu_domain *domain;
+	int cb;
+	phys_addr_t ttbr0;
+	uint32_t contextidr;
+	bool allow_dynamic;
+};
+#define to_msm_iommu(x) container_of(x, struct msm_iommu, base)
+
+static inline bool msm_iommu_allow_dynamic(struct msm_mmu *mmu)
+{
+	struct msm_iommu *iommu = to_msm_iommu(mmu);
+
+	return iommu->allow_dynamic;
+}
+#endif
diff --git a/drivers/gpu/drm/msm/msm_mmu.h b/drivers/gpu/drm/msm/msm_mmu.h
index e7d1967..65e5202 100644
--- a/drivers/gpu/drm/msm/msm_mmu.h
+++ b/drivers/gpu/drm/msm/msm_mmu.h
@@ -45,7 +45,7 @@ static inline void msm_mmu_init(struct msm_mmu *mmu, struct device *dev,
 }
 
 struct msm_mmu *msm_iommu_new(struct device *dev, struct iommu_domain *domain);
-struct msm_mmu *msm_gpummu_new(struct device *dev, struct msm_gpu *gpu);
+struct msm_mmu *msm_iommu_new_dynamic(struct msm_mmu *orig);
 
 #ifdef CONFIG_QCOM_IOMMU_V1
 struct device *msm_iommu_get_ctx(const char *ctx_name);
-- 
1.9.1



* [PATCH 6/6] drm/msm: a5xx: Support per-instance pagetables
       [not found] ` <1488906860-11073-1-git-send-email-jcrouse-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
                     ` (3 preceding siblings ...)
  2017-03-07 17:14   ` [PATCH 5/6] drm/msm: Support dynamic IOMMU domains Jordan Crouse
@ 2017-03-07 17:14   ` Jordan Crouse
  2017-03-07 18:14     ` Mark Rutland
  4 siblings, 1 reply; 9+ messages in thread
From: Jordan Crouse @ 2017-03-07 17:14 UTC (permalink / raw)
  To: freedreno-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: linux-arm-msm-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Support per-instance pagetables for 5XX targets. Per-instance
pagetables allow each open DRM instance to have its own VM memory
space, preventing accidental or malicious copying or overwriting of
buffers from other instances. It also opens the door for SVM, since
any given CPU side address can be more reliably mapped into the
instance's GPU VM space without conflict.

To support this create a new dynamic domain (pagetable) for each open
DRM file and map buffer objects for each instance into that pagetable.
Use the GPU to switch to the pagetable for the instance while doing a
submit.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
---
 arch/arm64/boot/dts/qcom/msm8996.dtsi     |  2 +
 drivers/gpu/drm/msm/adreno/a5xx_gpu.c     | 64 ++++++++++++++++++++++++++++++-
 drivers/gpu/drm/msm/adreno/a5xx_gpu.h     | 17 ++++++++
 drivers/gpu/drm/msm/adreno/a5xx_preempt.c | 61 +++++++++++++++++++++++------
 drivers/gpu/drm/msm/adreno/adreno_gpu.h   |  2 +
 drivers/gpu/drm/msm/msm_drv.c             | 60 ++++++++++++++++++++++-------
 drivers/gpu/drm/msm/msm_drv.h             |  3 ++
 drivers/gpu/drm/msm/msm_gem_vma.c         | 38 +++++++++++++++---
 8 files changed, 216 insertions(+), 31 deletions(-)

diff --git a/arch/arm64/boot/dts/qcom/msm8996.dtsi b/arch/arm64/boot/dts/qcom/msm8996.dtsi
index 2903020..6372f3a 100644
--- a/arch/arm64/boot/dts/qcom/msm8996.dtsi
+++ b/arch/arm64/boot/dts/qcom/msm8996.dtsi
@@ -867,7 +867,9 @@
 
 			qcom,skip-init;
 			qcom,register-save;
+
 			arm,smmu-enable-stall;
+			qcom,dynamic;
 
 			status = "okay";
 		};
diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
index 06238b7..65cd3ef 100644
--- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
@@ -18,7 +18,7 @@
 #include <linux/dma-mapping.h>
 #include <linux/of_reserved_mem.h>
 #include "msm_gem.h"
-#include "msm_mmu.h"
+#include "msm_iommu.h"
 #include "a5xx_gpu.h"
 
 extern bool hang_debug;
@@ -209,6 +209,66 @@ static void a5xx_flush(struct msm_gpu *gpu, struct msm_ringbuffer *ring)
 		gpu_write(gpu, REG_A5XX_CP_RB_WPTR, wptr);
 }
 
+static void a5xx_set_pagetable(struct msm_gpu *gpu, struct msm_ringbuffer *ring,
+	struct msm_file_private *ctx)
+{
+	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
+	struct msm_mmu *mmu = ctx->aspace->mmu;
+	struct msm_iommu *iommu = to_msm_iommu(mmu);
+
+	if (!iommu->ttbr0)
+		return;
+
+	/* Turn off protected mode */
+	OUT_PKT7(ring, CP_SET_PROTECTED_MODE, 1);
+	OUT_RING(ring, 0);
+
+	/* Turn on APRIV mode to access critical regions */
+	OUT_PKT4(ring, REG_A5XX_CP_CNTL, 1);
+	OUT_RING(ring, 1);
+
+	/* Make sure the ME is synchronized before starting the update */
+	OUT_PKT7(ring, CP_WAIT_FOR_ME, 0);
+
+	/* Execute the table update */
+	OUT_PKT7(ring, CP_SMMU_TABLE_UPDATE, 3);
+	OUT_RING(ring, lower_32_bits(iommu->ttbr0));
+	OUT_RING(ring, upper_32_bits(iommu->ttbr0));
+	OUT_RING(ring, iommu->contextidr);
+
+	/*
+	 * Write the new TTBR0 to the preemption records - this will be used to
+	 * reload the pagetable if the current ring gets preempted out.
+	 */
+	OUT_PKT7(ring, CP_MEM_WRITE, 4);
+	OUT_RING(ring, lower_32_bits(rbmemptr(adreno_gpu, ring->id, ttbr0)));
+	OUT_RING(ring, upper_32_bits(rbmemptr(adreno_gpu, ring->id, ttbr0)));
+	OUT_RING(ring, lower_32_bits(iommu->ttbr0));
+	OUT_RING(ring, upper_32_bits(iommu->ttbr0));
+
+	/* Also write the current contextidr (ASID) */
+	OUT_PKT7(ring, CP_MEM_WRITE, 3);
+	OUT_RING(ring, lower_32_bits(rbmemptr(adreno_gpu, ring->id,
+		contextidr)));
+	OUT_RING(ring, upper_32_bits(rbmemptr(adreno_gpu, ring->id,
+		contextidr)));
+	OUT_RING(ring, iommu->contextidr);
+
+	/* Invalidate the draw state so we start off fresh */
+	OUT_PKT7(ring, CP_SET_DRAW_STATE, 3);
+	OUT_RING(ring, 0x40000);
+	OUT_RING(ring, 1);
+	OUT_RING(ring, 0);
+
+	/* Turn off APRIV */
+	OUT_PKT4(ring, REG_A5XX_CP_CNTL, 1);
+	OUT_RING(ring, 0);
+
+	/* Turn protected mode back on */
+	OUT_PKT7(ring, CP_SET_PROTECTED_MODE, 1);
+	OUT_RING(ring, 1);
+}
+
 static void a5xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit,
 	struct msm_file_private *ctx)
 {
@@ -219,6 +279,8 @@ static void a5xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit,
 	struct msm_ringbuffer *ring = submit->ring;
 	unsigned int i, ibs = 0;
 
+	a5xx_set_pagetable(gpu, ring, ctx);
+
 	OUT_PKT7(ring, CP_PREEMPT_ENABLE_GLOBAL, 1);
 	OUT_RING(ring, 0x02);
 
diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.h b/drivers/gpu/drm/msm/adreno/a5xx_gpu.h
index f042a78..19deea0 100644
--- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.h
+++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.h
@@ -47,6 +47,9 @@ struct a5xx_gpu {
 	atomic_t preempt_state;
 	struct timer_list preempt_timer;
 
+	struct a5xx_smmu_info *smmu_info;
+	struct drm_gem_object *smmu_info_bo;
+	uint64_t smmu_info_iova;
 };
 
 #define to_a5xx_gpu(x) container_of(x, struct a5xx_gpu, base)
@@ -127,6 +130,20 @@ struct a5xx_preempt_record {
  */
 #define A5XX_PREEMPT_COUNTER_SIZE (16 * 4)
 
+/*
+ * This is a global structure that the preemption code uses to switch in the
+ * pagetable for the preempted process - the CP switches to whatever
+ * pagetable is stored here after preempting to a new ring.
+ */
+struct a5xx_smmu_info {
+	uint32_t  magic;
+	uint32_t  _pad4;
+	uint64_t  ttbr0;
+	uint32_t  asid;
+	uint32_t  contextidr;
+};
+
+#define A5XX_SMMU_INFO_MAGIC 0x3618CDA3UL
 
 int a5xx_power_init(struct msm_gpu *gpu);
 void a5xx_gpmu_ucode_init(struct msm_gpu *gpu);
diff --git a/drivers/gpu/drm/msm/adreno/a5xx_preempt.c b/drivers/gpu/drm/msm/adreno/a5xx_preempt.c
index 582ba9b..1b29049 100644
--- a/drivers/gpu/drm/msm/adreno/a5xx_preempt.c
+++ b/drivers/gpu/drm/msm/adreno/a5xx_preempt.c
@@ -12,6 +12,7 @@
  */
 
 #include "msm_gem.h"
+#include "msm_iommu.h"
 #include "a5xx_gpu.h"
 
 static void *alloc_kernel_bo(struct drm_device *drm, struct msm_gpu *gpu,
@@ -172,6 +173,17 @@ void a5xx_preempt_trigger(struct msm_gpu *gpu)
 	a5xx_gpu->preempt[ring->id]->wptr = get_wptr(ring);
 	spin_unlock_irqrestore(&ring->lock, flags);
 
+	/* Do a read barrier to make sure we have the updated pagetable info */
+	rmb();
+
+	/* Set the SMMU info for the preemption */
+	if (a5xx_gpu->smmu_info) {
+		a5xx_gpu->smmu_info->ttbr0 =
+			adreno_gpu->memptrs->ttbr0[ring->id];
+		a5xx_gpu->smmu_info->contextidr =
+			adreno_gpu->memptrs->contextidr[ring->id];
+	}
+
 	/* Set the address of the incoming preemption record */
 	gpu_write64(gpu, REG_A5XX_CP_CONTEXT_SWITCH_RESTORE_ADDR_LO,
 		REG_A5XX_CP_CONTEXT_SWITCH_RESTORE_ADDR_HI,
@@ -247,9 +259,10 @@ void a5xx_preempt_hw_init(struct msm_gpu *gpu)
 		}
 	}
 
-	/* Write a 0 to signal that we aren't switching pagetables */
+	/* Tell the CP where to find the smmu_info buffer */
 	gpu_write64(gpu, REG_A5XX_CP_CONTEXT_SWITCH_SMMU_INFO_LO,
-		REG_A5XX_CP_CONTEXT_SWITCH_SMMU_INFO_HI, 0);
+		REG_A5XX_CP_CONTEXT_SWITCH_SMMU_INFO_HI,
+		a5xx_gpu->smmu_info_iova);
 
 	/* Reset the preemption state */
 	set_preempt_state(a5xx_gpu, PREEMPT_NONE);
@@ -311,6 +324,13 @@ void a5xx_preempt_fini(struct msm_gpu *gpu)
 
 		a5xx_gpu->preempt_bo[i] = NULL;
 	}
+
+	if (a5xx_gpu->smmu_info_bo) {
+		if (a5xx_gpu->smmu_info_iova)
+			msm_gem_put_iova(a5xx_gpu->smmu_info_bo, gpu->aspace);
+		drm_gem_object_unreference_unlocked(a5xx_gpu->smmu_info_bo);
+		a5xx_gpu->smmu_info_bo = NULL;
+	}
 }
 
 void a5xx_preempt_init(struct msm_gpu *gpu)
@@ -318,6 +338,9 @@ void a5xx_preempt_init(struct msm_gpu *gpu)
 	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
 	struct a5xx_gpu *a5xx_gpu = to_a5xx_gpu(adreno_gpu);
 	struct msm_ringbuffer *ring;
+	struct a5xx_smmu_info *ptr;
+	struct drm_gem_object *bo;
+	uint64_t iova;
 	int i;
 
 	/* No preemption if we only have one ring */
@@ -328,18 +351,34 @@ void a5xx_preempt_init(struct msm_gpu *gpu)
 		if (!ring)
 			continue;
 
-		if (preempt_init_ring(a5xx_gpu, ring)) {
-			/*
-			 * On any failure our adventure is over. Clean up and
-			 * set nr_rings to 1 to force preemption off
-			 */
-			a5xx_preempt_fini(gpu);
-			gpu->nr_rings = 1;
+		if (preempt_init_ring(a5xx_gpu, ring))
+			goto fail;
+	}
+
+	if (msm_iommu_allow_dynamic(gpu->aspace->mmu)) {
+		ptr = alloc_kernel_bo(gpu->dev, gpu,
+			sizeof(struct a5xx_smmu_info),
+			MSM_BO_UNCACHED, &bo, &iova);
 
-			return;
-		}
+		if (IS_ERR(ptr))
+			goto fail;
+
+		ptr->magic = A5XX_SMMU_INFO_MAGIC;
+
+		a5xx_gpu->smmu_info_bo = bo;
+		a5xx_gpu->smmu_info_iova = iova;
+		a5xx_gpu->smmu_info = ptr;
 	}
 
 	setup_timer(&a5xx_gpu->preempt_timer, a5xx_preempt_timer,
 		(unsigned long) a5xx_gpu);
+
+	return;
+fail:
+	/*
+	 * On any failure our adventure is over. Clean up and
+	 * set nr_rings to 1 to force preemption off
+	 */
+	a5xx_preempt_fini(gpu);
+	gpu->nr_rings = 1;
 }
diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.h b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
index 841ec30..b33cbf0 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.h
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
@@ -93,6 +93,8 @@ struct adreno_info {
 struct adreno_rbmemptrs {
 	volatile uint32_t rptr[MSM_GPU_MAX_RINGS];
 	volatile uint32_t fence[MSM_GPU_MAX_RINGS];
+	volatile uint64_t ttbr0[MSM_GPU_MAX_RINGS];
+	volatile unsigned int contextidr[MSM_GPU_MAX_RINGS];
 };
 
 struct adreno_gpu {
diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
index 7b7a2e7..1ae4823 100644
--- a/drivers/gpu/drm/msm/msm_drv.c
+++ b/drivers/gpu/drm/msm/msm_drv.c
@@ -22,6 +22,8 @@
 #include "msm_fence.h"
 #include "msm_gpu.h"
 #include "msm_kms.h"
+#include "msm_gem.h"
+#include "msm_mmu.h"
 
 
 /*
@@ -515,11 +517,37 @@ static int msm_open(struct drm_device *dev, struct drm_file *file)
 	 */
 	load_gpu(dev);
 
+	if (!priv->gpu)
+		return 0;
+
 	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
 	if (!ctx)
 		return -ENOMEM;
 
-	ctx->aspace = priv->gpu->aspace;
+	/*
+	 * FIXME: we will want to use a dynamic name of some sort
+	 * FIXME: We will need a smarter way to set the range based on target
+	 */
+	ctx->aspace = msm_gem_address_space_create_instance(
+		priv->gpu->aspace->mmu, "gpu", 0x100000000, 0x1ffffffff);
+
+	if (IS_ERR(ctx->aspace)) {
+		int ret = PTR_ERR(ctx->aspace);
+
+		/*
+		 * If dynamic domains are not supported, everybody uses the same
+		 * pagetable
+		 */
+		if (ret == -EOPNOTSUPP)
+			ctx->aspace = priv->gpu->aspace;
+		else {
+			kfree(ctx);
+			return ret;
+		}
+	} else {
+		ctx->aspace->mmu->funcs->attach(ctx->aspace->mmu, NULL, 0);
+	}
+
 	file->driver_priv = ctx;
 
 	return 0;
@@ -534,10 +562,25 @@ static void msm_preclose(struct drm_device *dev, struct drm_file *file)
 	if (ctx == priv->lastctx)
 		priv->lastctx = NULL;
 	mutex_unlock(&dev->struct_mutex);
+}
+
+static void msm_postclose(struct drm_device *dev, struct drm_file *file)
+{
+	struct msm_drm_private *priv = dev->dev_private;
+	struct msm_file_private *ctx = file->driver_priv;
+
+
+	mutex_lock(&dev->struct_mutex);
+	if (ctx && ctx->aspace && ctx->aspace != priv->gpu->aspace) {
+		ctx->aspace->mmu->funcs->detach(ctx->aspace->mmu, NULL, 0);
+		msm_gem_address_space_put(ctx->aspace);
+	}
+	mutex_unlock(&dev->struct_mutex);
 
 	kfree(ctx);
 }
 
+
 static void msm_lastclose(struct drm_device *dev)
 {
 	struct msm_drm_private *priv = dev->dev_private;
@@ -684,17 +727,6 @@ static int msm_ioctl_gem_cpu_fini(struct drm_device *dev, void *data,
 	return ret;
 }
 
-static int msm_ioctl_gem_info_iova(struct drm_device *dev,
-		struct drm_gem_object *obj, uint64_t *iova)
-{
-	struct msm_drm_private *priv = dev->dev_private;
-
-	if (!priv->gpu)
-		return -EINVAL;
-
-	return msm_gem_get_iova(obj, priv->gpu->aspace, iova);
-}
-
 static int msm_ioctl_gem_info(struct drm_device *dev, void *data,
 		struct drm_file *file)
 {
@@ -710,9 +742,10 @@ static int msm_ioctl_gem_info(struct drm_device *dev, void *data,
 		return -ENOENT;
 
 	if (args->flags & MSM_INFO_IOVA) {
+		struct msm_file_private *ctx = file->driver_priv;
 		uint64_t iova;
 
-		ret = msm_ioctl_gem_info_iova(dev, obj, &iova);
+		ret = msm_gem_get_iova(obj, ctx->aspace, &iova);
 		if (!ret)
 			args->offset = iova;
 	} else {
@@ -818,6 +851,7 @@ static int msm_ioctl_gem_madvise(struct drm_device *dev, void *data,
 				DRIVER_MODESET,
 	.open               = msm_open,
 	.preclose           = msm_preclose,
+	.postclose	    = msm_postclose,
 	.lastclose          = msm_lastclose,
 	.irq_handler        = msm_irq,
 	.irq_preinstall     = msm_irq_preinstall,
diff --git a/drivers/gpu/drm/msm/msm_drv.h b/drivers/gpu/drm/msm/msm_drv.h
index bbad6c7..3efb3f1 100644
--- a/drivers/gpu/drm/msm/msm_drv.h
+++ b/drivers/gpu/drm/msm/msm_drv.h
@@ -191,6 +191,9 @@ int msm_gem_map_vma(struct msm_gem_address_space *aspace,
 struct msm_gem_address_space *
 msm_gem_address_space_create(struct device *dev, struct iommu_domain *domain,
 		const char *name);
+struct msm_gem_address_space *
+msm_gem_address_space_create_instance(struct msm_mmu *parent, const char *name,
+		uint64_t start, uint64_t end);
 
 void msm_gem_submit_free(struct msm_gem_submit *submit);
 int msm_ioctl_gem_submit(struct drm_device *dev, void *data,
diff --git a/drivers/gpu/drm/msm/msm_gem_vma.c b/drivers/gpu/drm/msm/msm_gem_vma.c
index 2b1d2cb..0f65b19 100644
--- a/drivers/gpu/drm/msm/msm_gem_vma.c
+++ b/drivers/gpu/drm/msm/msm_gem_vma.c
@@ -85,9 +85,9 @@ void msm_gem_address_space_put(struct msm_gem_address_space *aspace)
 	return ret;
 }
 
-struct msm_gem_address_space *
-msm_gem_address_space_create(struct device *dev, struct iommu_domain *domain,
-		const char *name)
+static struct msm_gem_address_space *
+msm_gem_address_space_new(struct msm_mmu *mmu, const char *name,
+		uint64_t start, uint64_t end)
 {
 	struct msm_gem_address_space *aspace;
 
@@ -96,12 +96,38 @@ struct msm_gem_address_space *
 		return ERR_PTR(-ENOMEM);
 
 	aspace->name = name;
-	aspace->mmu = msm_iommu_new(dev, domain);
+	aspace->mmu = mmu;
 
-	drm_mm_init(&aspace->mm, (domain->geometry.aperture_start >> PAGE_SHIFT),
-			(domain->geometry.aperture_end >> PAGE_SHIFT) - 1);
+	drm_mm_init(&aspace->mm, (start >> PAGE_SHIFT),
+		(end >> PAGE_SHIFT) - 1);
 
 	kref_init(&aspace->kref);
 
 	return aspace;
 }
+
+struct msm_gem_address_space *
+msm_gem_address_space_create(struct device *dev, struct iommu_domain *domain,
+		const char *name)
+{
+	struct msm_mmu *mmu = msm_iommu_new(dev, domain);
+
+	if (IS_ERR(mmu))
+		return (struct msm_gem_address_space *) mmu;
+
+	return msm_gem_address_space_new(mmu, name,
+		domain->geometry.aperture_start,
+		domain->geometry.aperture_end);
+}
+
+struct msm_gem_address_space *
+msm_gem_address_space_create_instance(struct msm_mmu *parent, const char *name,
+		uint64_t start, uint64_t end)
+{
+	struct msm_mmu *child = msm_iommu_new_dynamic(parent);
+
+	if (IS_ERR(child))
+		return (struct msm_gem_address_space *) child;
+
+	return msm_gem_address_space_new(child, name, start, end);
+}
-- 
1.9.1

_______________________________________________
Freedreno mailing list
Freedreno@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/freedreno


* Re: [PATCH 6/6] drm/msm: a5xx: Support per-instance pagetables
  2017-03-07 17:14   ` [PATCH 6/6] drm/msm: a5xx: Support per-instance pagetables Jordan Crouse
@ 2017-03-07 18:14     ` Mark Rutland
  0 siblings, 0 replies; 9+ messages in thread
From: Mark Rutland @ 2017-03-07 18:14 UTC (permalink / raw)
  To: Jordan Crouse; +Cc: freedreno, dri-devel, linux-arm-msm

On Tue, Mar 07, 2017 at 10:14:20AM -0700, Jordan Crouse wrote:
> Support per-instance pagetables for 5XX targets. Per-instance
> pagetables allow each open DRM instance to have its own VM memory
> space to prevent accidental or malicious copying or overwriting of
> buffers from other instances. It also opens the door for SVM since
> any given CPU side address can be more reliably mapped into the
> instance's GPU VM space without conflict.
> 
> To support this create a new dynamic domain (pagetable) for each open
> DRM file and map buffer objects for each instance into that pagetable.
> Use the GPU to switch to the pagetable for the instance while doing a
> submit.
> 
> Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
> ---
>  arch/arm64/boot/dts/qcom/msm8996.dtsi     |  2 +
>  drivers/gpu/drm/msm/adreno/a5xx_gpu.c     | 64 ++++++++++++++++++++++++++++++-
>  drivers/gpu/drm/msm/adreno/a5xx_gpu.h     | 17 ++++++++
>  drivers/gpu/drm/msm/adreno/a5xx_preempt.c | 61 +++++++++++++++++++++++------
>  drivers/gpu/drm/msm/adreno/adreno_gpu.h   |  2 +
>  drivers/gpu/drm/msm/msm_drv.c             | 60 ++++++++++++++++++++++-------
>  drivers/gpu/drm/msm/msm_drv.h             |  3 ++
>  drivers/gpu/drm/msm/msm_gem_vma.c         | 38 +++++++++++++++---
>  8 files changed, 216 insertions(+), 31 deletions(-)
> 
> diff --git a/arch/arm64/boot/dts/qcom/msm8996.dtsi b/arch/arm64/boot/dts/qcom/msm8996.dtsi
> index 2903020..6372f3a 100644
> --- a/arch/arm64/boot/dts/qcom/msm8996.dtsi
> +++ b/arch/arm64/boot/dts/qcom/msm8996.dtsi
> @@ -867,7 +867,9 @@
>  
>  			qcom,skip-init;
>  			qcom,register-save;
> +

Pointless whitespace change?

>  			arm,smmu-enable-stall;
> +			qcom,dynamic;

As commented on the RFC series adding code for this property, it is at
best not clear what this means, and at present I do not believe this
should be in the DT at all.

Thanks,
Mark.


* Re: [PATCH 0/6] drm/msm: Add per-instance pagetables
  2017-03-07 17:14 [PATCH 0/6] drm/msm: Add per-instance pagetables Jordan Crouse
  2017-03-07 17:14 ` [PATCH 1/6] drm/msm: Enable 64 bit mode by default Jordan Crouse
       [not found] ` <1488906860-11073-1-git-send-email-jcrouse-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
@ 2017-03-07 21:53 ` Daniel Vetter
  2 siblings, 0 replies; 9+ messages in thread
From: Daniel Vetter @ 2017-03-07 21:53 UTC (permalink / raw)
  To: Jordan Crouse; +Cc: linux-arm-msm, freedreno, dri-devel

On Tue, Mar 07, 2017 at 10:14:14AM -0700, Jordan Crouse wrote:
> Using the framework described here
> 
> https://lists.linuxfoundation.org/pipermail/iommu/2017-March/020716.html
> 
> This implements per-instance pagetables for the GPU driver creating an
> individual pagetable for each file descriptor (so not strictly per-process
> but in practice we can't share buffers between file descriptors anyway without
> importing them).

This is pretty much how it's done everywhere else. Or well, maybe at a
context level, if your driver allows creation of additional contexts.

Might be good to document that somewhere in drm-mm.rst as best practices
for gem drivers ... Volunteered?
-Daniel

> 
> This is the brief workflow for the process:
> 
>  - At init, the driver attaches an UNMANAGED domain to the IOMMU (context bank 0)
> 
>  - All "global" buffers (kernel side GPU buffers such as ringbuffers, etc) are
>    mapped into the TTBR1 space which is defined as any address with bit 48 set.
>    In pratice we have discovered that for reasons yet uknown, bit 47 also has
>    to be set for the GPU to sign extend correctly, so the TTBR1 region is
>    defined as starting at 0xffff8_0000_0000_0000.
> 
>  - When a new file descriptor is opened, a dynamic domain is cloned from the
>    real domain - this does not program the hardware but it creates a pagetable
>    and returns a pointer that we can use to map memory to - this is wrapped in a
>    new address space and used for all allocations created with the file
>    descriptor.
> 
>  - At command submission time, an SMMU_TABLE_UPDATE packet is sent before every
>    command which contains the physical address of the TTBR0 register for the
>    pagetable associated with the process - the GPU will automatically switch
>    the pagetable for the process.
> 
> Because no kernel side allocations are in the TTBR0 space there is no setup
> required to switch the TTBR0 pagetable and we do not need to reprogram it
> after the command is over since the next command will rewrite the register.
> This makes the code significantly simpler than it could be (*cough*
> downstream *cough*).
> 
> I'm sure there will be questions, and I'm sure that what we have won't be what
> is finally decided upon in the arm-smmu driver (in particular there are some
> nice parts of the arm-v3 SVM solution that we can borrow) but I think it is
> important to get eyeballs on this for posterity.
> 
> Thanks!
> Jordan
> 
> Jordan Crouse (6):
>   drm/msm: Enable 64 bit mode by default
>   drm/msm: Pass the MMU domain index in struct msm_file_private
>   drm/msm: Make separate iommu function tables for v1 and v2 MMUs
>   drm/msm: Use TTBR1 for kernel side GPU buffer objects
>   drm/msm: Support dynamic IOMMU domains
>   drm/msm: a5xx: Support per-instance pagetables
> 
>  arch/arm64/boot/dts/qcom/msm8996.dtsi     |   2 +
>  drivers/gpu/drm/msm/adreno/a5xx_gpu.c     |  78 ++++++++++++++-
>  drivers/gpu/drm/msm/adreno/a5xx_gpu.h     |  17 ++++
>  drivers/gpu/drm/msm/adreno/a5xx_preempt.c |  61 +++++++++---
>  drivers/gpu/drm/msm/adreno/adreno_gpu.c   |  18 +++-
>  drivers/gpu/drm/msm/adreno/adreno_gpu.h   |   2 +
>  drivers/gpu/drm/msm/msm_drv.c             |  60 +++++++++---
>  drivers/gpu/drm/msm/msm_drv.h             |   9 +-
>  drivers/gpu/drm/msm/msm_gem.h             |   1 +
>  drivers/gpu/drm/msm/msm_gem_submit.c      |  12 ++-
>  drivers/gpu/drm/msm/msm_gem_vma.c         |  38 ++++++--
>  drivers/gpu/drm/msm/msm_gpu.c             |   3 +-
>  drivers/gpu/drm/msm/msm_iommu.c           | 151 ++++++++++++++++++++++++------
>  drivers/gpu/drm/msm/msm_iommu.h           |  34 +++++++
>  drivers/gpu/drm/msm/msm_mmu.h             |   2 +-
>  15 files changed, 415 insertions(+), 73 deletions(-)
>  create mode 100644 drivers/gpu/drm/msm/msm_iommu.h
> 
> -- 
> 1.9.1
> 
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

