* [PATCH 00/11] drm/msm: A5XX preemption
@ 2017-02-06 17:39 Jordan Crouse
  2017-02-06 17:39 ` [PATCH 03/11] drm/msm: Add hint to DRM_IOCTL_MSM_GEM_INFO to return an object IOVA Jordan Crouse
                   ` (10 more replies)
  0 siblings, 11 replies; 36+ messages in thread
From: Jordan Crouse @ 2017-02-06 17:39 UTC (permalink / raw)
  To: freedreno; +Cc: linux-arm-msm, dri-devel

This series of patches implements multiple ringbuffers and preemption for Adreno
A5XX targets. Preemption allows a command to be interrupted at specific
preemption points and execution switched to a different ringbuffer.

The software algorithm uses preemption to enforce quality of service for
priority levels - commands on a higher-priority ring preempt rings of lower
priority. Note that priority is a software construct; the driver chooses which
ring to switch to and the hardware executes the switch. This is important
because it shows that preemption can be used for things other than priority
(timeslices for quality of service, for example).

This initial series implements 4 ringbuffers to give sufficient coverage for the
range of priority levels requested by the GLES and compute extensions. The
targeted ringbuffer is specified in the command submission flags. The default
ring is 0 (lowest priority).
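
As an illustrative sketch only - the MSM_SUBMIT_RING() macro and its flag
encoding below are placeholders, not the final UAPI - a userspace submission
targeting a specific ring might look like this:

#include <xf86drm.h>
#include <drm/msm_drm.h>

/* Placeholder: however the UAPI ends up encoding the ring in the flags */
#define MSM_SUBMIT_RING(n) ((n) << 28)

static int submit_to_ring(int fd, struct drm_msm_gem_submit *req,
		unsigned int ring)
{
	/* Target the requested ring; ring 0 (the default) is lowest priority */
	req->flags = MSM_PIPE_3D0 | MSM_SUBMIT_RING(ring);

	return drmCommandWriteRead(fd, DRM_MSM_GEM_SUBMIT,
			req, sizeof(*req));
}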

Jordan

Jordan Crouse (11):
  drm/msm: Make sure to detach the MMU during GPU cleanup
  drm/msm: Improve the zap shader
  drm/msm: Add hint to DRM_IOCTL_MSM_GEM_INFO to return an object IOVA
  drm/msm: Remove idle function hook
  drm/msm: get an iova from the address space instead of an id
  drm/msm: Add a struct to pass configuration to msm_gpu_init()
  drm/msm: Remove memptrs->wptr
  drm/msm: Support multiple ringbuffers
  drm/msm: Shadow current pointer in the ring until command is complete
  drm/msm: Make the value of RB_CNTL (almost) generic
  drm/msm: Implement preemption for A5XX targets

 drivers/gpu/drm/msm/Makefile              |   1 +
 drivers/gpu/drm/msm/adreno/a3xx_gpu.c     |  13 +-
 drivers/gpu/drm/msm/adreno/a4xx_gpu.c     |  13 +-
 drivers/gpu/drm/msm/adreno/a5xx_gpu.c     | 278 +++++++++++++++++-----
 drivers/gpu/drm/msm/adreno/a5xx_gpu.h     | 106 +++++++++
 drivers/gpu/drm/msm/adreno/a5xx_power.c   |  11 +-
 drivers/gpu/drm/msm/adreno/a5xx_preempt.c | 367 ++++++++++++++++++++++++++++++
 drivers/gpu/drm/msm/adreno/adreno_gpu.c   | 215 +++++++++++------
 drivers/gpu/drm/msm/adreno/adreno_gpu.h   |  42 ++--
 drivers/gpu/drm/msm/dsi/dsi_host.c        |  15 +-
 drivers/gpu/drm/msm/mdp/mdp4/mdp4_crtc.c  |   8 +-
 drivers/gpu/drm/msm/mdp/mdp4/mdp4_kms.c   |  18 +-
 drivers/gpu/drm/msm/mdp/mdp4/mdp4_kms.h   |   3 -
 drivers/gpu/drm/msm/mdp/mdp4/mdp4_plane.c |  13 +-
 drivers/gpu/drm/msm/mdp/mdp5/mdp5_crtc.c  |   5 +-
 drivers/gpu/drm/msm/mdp/mdp5/mdp5_kms.c   |  11 +-
 drivers/gpu/drm/msm/mdp/mdp5/mdp5_kms.h   |   4 -
 drivers/gpu/drm/msm/mdp/mdp5/mdp5_plane.c |  13 +-
 drivers/gpu/drm/msm/msm_drv.c             |  43 ++--
 drivers/gpu/drm/msm/msm_drv.h             |  27 ++-
 drivers/gpu/drm/msm/msm_fb.c              |  15 +-
 drivers/gpu/drm/msm/msm_fbdev.c           |  10 +-
 drivers/gpu/drm/msm/msm_fence.c           |  85 +++++--
 drivers/gpu/drm/msm/msm_fence.h           |  13 +-
 drivers/gpu/drm/msm/msm_gem.c             | 124 +++++++---
 drivers/gpu/drm/msm/msm_gem.h             |   5 +-
 drivers/gpu/drm/msm/msm_gem_submit.c      |  14 +-
 drivers/gpu/drm/msm/msm_gpu.c             | 140 +++++++-----
 drivers/gpu/drm/msm/msm_gpu.h             |  54 ++++-
 drivers/gpu/drm/msm/msm_kms.h             |   3 +
 drivers/gpu/drm/msm/msm_ringbuffer.c      |  14 +-
 drivers/gpu/drm/msm/msm_ringbuffer.h      |  21 +-
 include/uapi/drm/msm_drm.h                |   9 +-
 33 files changed, 1324 insertions(+), 389 deletions(-)
 create mode 100644 drivers/gpu/drm/msm/adreno/a5xx_preempt.c

-- 
1.9.1


* [PATCH 01/11] drm/msm: Make sure to detach the MMU during GPU cleanup
       [not found] ` <1486402779-9024-1-git-send-email-jcrouse@codeaurora.org>
@ 2017-02-06 17:39   ` Jordan Crouse
  2017-02-06 17:39   ` [PATCH 02/11] drm/msm: Improve the zap shader Jordan Crouse
  2017-03-07 16:58   ` [v2] [PATCH 00/11] drm/msm: A5XX preemption Jordan Crouse
  2 siblings, 0 replies; 36+ messages in thread
From: Jordan Crouse @ 2017-02-06 17:39 UTC (permalink / raw)
  To: freedreno; +Cc: linux-arm-msm, dri-devel

We should detach the MMU before destroying the address space. To do
this cleanly, the detach has to happen in adreno_gpu_cleanup(), because
it needs access to structs in adreno_gpu.c. It also gives better
symmetry to have the attach and detach at the same code level.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
---
 drivers/gpu/drm/msm/adreno/adreno_gpu.c | 29 +++++++++++++++++++----------
 drivers/gpu/drm/msm/msm_gpu.c           |  3 ---
 2 files changed, 19 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
index bc2224b..acb685a 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
@@ -421,18 +421,27 @@ int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev,
 	return 0;
 }
 
-void adreno_gpu_cleanup(struct adreno_gpu *gpu)
+void adreno_gpu_cleanup(struct adreno_gpu *adreno_gpu)
 {
-	if (gpu->memptrs_bo) {
-		if (gpu->memptrs)
-			msm_gem_put_vaddr(gpu->memptrs_bo);
+	struct msm_gpu *gpu = &adreno_gpu->base;
+
+	if (adreno_gpu->memptrs_bo) {
+		if (adreno_gpu->memptrs)
+			msm_gem_put_vaddr(adreno_gpu->memptrs_bo);
+
+		if (adreno_gpu->memptrs_iova)
+			msm_gem_put_iova(adreno_gpu->memptrs_bo, gpu->id);
+
+		drm_gem_object_unreference_unlocked(adreno_gpu->memptrs_bo);
+	}
+	release_firmware(adreno_gpu->pm4);
+	release_firmware(adreno_gpu->pfp);
 
-		if (gpu->memptrs_iova)
-			msm_gem_put_iova(gpu->memptrs_bo, gpu->base.id);
+	msm_gpu_cleanup(gpu);
 
-		drm_gem_object_unreference_unlocked(gpu->memptrs_bo);
+	if (gpu->aspace) {
+		gpu->aspace->mmu->funcs->detach(gpu->aspace->mmu,
+			iommu_ports, ARRAY_SIZE(iommu_ports));
+		msm_gem_address_space_destroy(gpu->aspace);
 	}
-	release_firmware(gpu->pm4);
-	release_firmware(gpu->pfp);
-	msm_gpu_cleanup(&gpu->base);
 }
diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
index d8420be..403cca1 100644
--- a/drivers/gpu/drm/msm/msm_gpu.c
+++ b/drivers/gpu/drm/msm/msm_gpu.c
@@ -711,9 +711,6 @@ void msm_gpu_cleanup(struct msm_gpu *gpu)
 		msm_ringbuffer_destroy(gpu->rb);
 	}
 
-	if (gpu->aspace)
-		msm_gem_address_space_destroy(gpu->aspace);
-
 	if (gpu->fctx)
 		msm_fence_context_free(gpu->fctx);
 }
-- 
1.9.1


* [PATCH 02/11] drm/msm: Improve the zap shader
       [not found] ` <1486402779-9024-1-git-send-email-jcrouse@codeaurora.org>
  2017-02-06 17:39   ` [PATCH 01/11] drm/msm: Make sure to detach the MMU during GPU cleanup Jordan Crouse
@ 2017-02-06 17:39   ` Jordan Crouse
  2017-03-07 16:58   ` [v2] [PATCH 00/11] drm/msm: A5XX preemption Jordan Crouse
  2 siblings, 0 replies; 36+ messages in thread
From: Jordan Crouse @ 2017-02-06 17:39 UTC (permalink / raw)
  To: freedreno; +Cc: linux-arm-msm, dri-devel

Simplify the code, use snprintf() correctly, and make sure to memset
the rest of the segment if the memory size in the ELF file is larger
than the file size.
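
For reference, the memset follows the standard ELF loading rule: bytes between
p_filesz and p_memsz exist only in memory (.bss-style data) and must be zeroed
by the loader. Roughly, with illustrative names:

#include <string.h>

/* dst: segment load address, src/filesz: file-backed bytes,
 * memsz: total in-memory size from the program header
 */
static void load_segment(void *dst, const void *src, size_t filesz,
		size_t memsz)
{
	memcpy(dst, src, filesz);
	if (memsz > filesz)
		memset((char *)dst + filesz, 0, memsz - filesz);
}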

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
---
 drivers/gpu/drm/msm/adreno/a5xx_gpu.c | 60 +++++++++++++++++------------------
 1 file changed, 30 insertions(+), 30 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
index 5f8b368..23eeed2 100644
--- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
@@ -31,11 +31,11 @@ static inline bool _check_segment(const struct elf32_phdr *phdr)
 		phdr->p_memsz);
 }
 
-static int __pil_tz_load_image(struct platform_device *pdev,
+static int zap_load_segments(struct platform_device *pdev,
 		const struct firmware *mdt, const char *fwname,
 		void *fwptr, size_t fw_size, unsigned long fw_min_addr)
 {
-	char str[64] = { 0 };
+	char filename[64];
 	const struct elf32_hdr *ehdr = (struct elf32_hdr *) mdt->data;
 	const struct elf32_phdr *phdrs = (struct elf32_phdr *) (ehdr + 1);
 	const struct firmware *fw;
@@ -53,16 +53,18 @@ static int __pil_tz_load_image(struct platform_device *pdev,
 		offset = (phdr->p_paddr - fw_min_addr);
 
 		/* Request the file containing the segment */
-		snprintf(str, sizeof(str) - 1, "%s.b%02d", fwname, i);
+		snprintf(filename, sizeof(filename), "%s.b%02d", fwname, i);
 
-		ret = request_firmware(&fw, str, &pdev->dev);
+		ret = request_firmware(&fw, filename, &pdev->dev);
 		if (ret) {
-			dev_err(&pdev->dev, "Failed to load segment %s\n", str);
+			dev_err(&pdev->dev, "Failed to load segment %s\n",
+				filename);
 			break;
 		}
 
 		if (offset + fw->size > fw_size) {
-			dev_err(&pdev->dev, "Segment %s is too big\n", str);
+			dev_err(&pdev->dev, "Segment %s is too big\n",
+				filename);
 			ret = -EINVAL;
 			release_firmware(fw);
 			break;
@@ -70,15 +72,19 @@ static int __pil_tz_load_image(struct platform_device *pdev,
 
 		/* Copy the segment into place */
 		memcpy(fwptr + offset, fw->data, fw->size);
+
+		if (phdr->p_memsz > phdr->p_filesz)
+			memset(fwptr + fw->size, 0,
+				phdr->p_memsz - phdr->p_filesz);
 		release_firmware(fw);
 	}
 
 	return ret;
 }
 
-static int _pil_tz_load_image(struct platform_device *pdev)
+static int zap_load_mdt(struct platform_device *pdev)
 {
-	char str[64] = { 0 };
+	char filename[64];
 	const char *fwname;
 	const struct elf32_hdr *ehdr;
 	const struct elf32_phdr *phdrs;
@@ -86,7 +92,6 @@ static int _pil_tz_load_image(struct platform_device *pdev)
 	phys_addr_t fw_min_addr, fw_max_addr;
 	dma_addr_t fw_phys;
 	size_t fw_size;
-	u32 pas_id;
 	void *ptr;
 	int i, ret;
 
@@ -95,11 +100,10 @@ static int _pil_tz_load_image(struct platform_device *pdev)
 
 	if (!qcom_scm_is_available()) {
 		dev_err(&pdev->dev, "SCM is not available\n");
-		return -EINVAL;
+		return -EPROBE_DEFER;
 	}
 
 	ret = of_reserved_mem_device_init(&pdev->dev);
-
 	if (ret) {
 		dev_err(&pdev->dev, "Unable to set up the reserved memory\n");
 		return ret;
@@ -112,17 +116,12 @@ static int _pil_tz_load_image(struct platform_device *pdev)
 		return -EINVAL;
 	}
 
-	if (of_property_read_u32(pdev->dev.of_node, "qcom,pas-id", &pas_id)) {
-		dev_err(&pdev->dev, "Could not read the pas ID\n");
-		return -EINVAL;
-	}
-
-	snprintf(str, sizeof(str) - 1, "%s.mdt", fwname);
+	snprintf(filename, sizeof(filename), "%s.mdt", fwname);
 
 	/* Request the MDT file for the firmware */
-	ret = request_firmware(&mdt, str, &pdev->dev);
+	ret = request_firmware(&mdt, filename, &pdev->dev);
 	if (ret) {
-		dev_err(&pdev->dev, "Unable to load %s\n", str);
+		dev_err(&pdev->dev, "Unable to load %s\n", filename);
 		return ret;
 	}
 
@@ -151,7 +150,7 @@ static int _pil_tz_load_image(struct platform_device *pdev)
 	fw_size = (size_t) (fw_max_addr - fw_min_addr);
 
 	/* Verify the MDT header */
-	ret = qcom_scm_pas_init_image(pas_id, mdt->data, mdt->size);
+	ret = qcom_scm_pas_init_image(13, mdt->data, mdt->size);
 	if (ret) {
 		dev_err(&pdev->dev, "Invalid firmware metadata\n");
 		goto out;
@@ -163,18 +162,19 @@ static int _pil_tz_load_image(struct platform_device *pdev)
 		goto out;
 
 	/* Set up the newly allocated memory region */
-	ret = qcom_scm_pas_mem_setup(pas_id, fw_phys, fw_size);
+	ret = qcom_scm_pas_mem_setup(13, fw_phys, fw_size);
 	if (ret) {
 		dev_err(&pdev->dev, "Unable to set up firmware memory\n");
 		goto out;
 	}
 
-	ret = __pil_tz_load_image(pdev, mdt, fwname, ptr, fw_size, fw_min_addr);
-	if (!ret) {
-		ret = qcom_scm_pas_auth_and_reset(pas_id);
-		if (ret)
-			dev_err(&pdev->dev, "Unable to authorize the image\n");
-	}
+	ret = zap_load_segments(pdev, mdt, fwname, ptr, fw_size, fw_min_addr);
+	if (ret)
+		goto out;
+
+	ret = qcom_scm_pas_auth_and_reset(13);
+	if (ret)
+		dev_err(&pdev->dev, "Unable to authorize the image\n");
 
 out:
 	if (ret && ptr)
@@ -502,14 +502,14 @@ static int a5xx_zap_shader_init(struct msm_gpu *gpu)
 	of_platform_populate(pdev->dev.of_node, NULL, NULL, &pdev->dev);
 
 	/* Find the sub-node for the zap shader */
-	node = of_find_node_by_name(pdev->dev.of_node, "qcom,zap-shader");
+	node = of_get_child_by_name(pdev->dev.of_node, "zap-shader");
 	if (!node) {
-		DRM_ERROR("%s: qcom,zap-shader not found in device tree\n",
+		DRM_ERROR("%s: zap-shader not found in device tree\n",
 			gpu->name);
 		return -ENODEV;
 	}
 
-	ret = _pil_tz_load_image(of_find_device_by_node(node));
+	ret = zap_load_mdt(of_find_device_by_node(node));
 	if (ret)
 		DRM_ERROR("%s: Unable to load the zap shader\n",
 			gpu->name);
-- 
1.9.1


* [PATCH 03/11] drm/msm: Add hint to DRM_IOCTL_MSM_GEM_INFO to return an object IOVA
  2017-02-06 17:39 [PATCH 00/11] drm/msm: A5XX preemption Jordan Crouse
@ 2017-02-06 17:39 ` Jordan Crouse
  2017-02-06 19:20   ` Emil Velikov
  2017-02-06 17:39 ` [PATCH 04/11] drm/msm: Remove idle function hook Jordan Crouse
                   ` (9 subsequent siblings)
  10 siblings, 1 reply; 36+ messages in thread
From: Jordan Crouse @ 2017-02-06 17:39 UTC (permalink / raw)
  To: freedreno; +Cc: linux-arm-msm, dri-devel

Rename the 'pad' member of struct drm_msm_gem_info to 'hint'. If the
user sets 'hint' to non-zero it means they want an IOVA for the GEM
object instead of a mmap() offset. Return the IOVA in the 'offset'
member.
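
Illustrative userspace usage (includes and error handling trimmed):

struct drm_msm_gem_info req = {
	.handle = bo_handle,	/* GEM handle from the allocation */
	.hint = 1,		/* non-zero: ask for the IOVA */
};

if (!drmCommandWriteRead(fd, DRM_MSM_GEM_INFO, &req, sizeof(req)))
	gpu_addr = req.offset;	/* an IOVA, because hint was set */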

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
---
 drivers/gpu/drm/msm/msm_drv.c | 29 +++++++++++++++++++++++++----
 include/uapi/drm/msm_drm.h    |  4 ++--
 2 files changed, 27 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
index e29bb66..1e4e022 100644
--- a/drivers/gpu/drm/msm/msm_drv.c
+++ b/drivers/gpu/drm/msm/msm_drv.c
@@ -677,6 +677,17 @@ static int msm_ioctl_gem_cpu_fini(struct drm_device *dev, void *data,
 	return ret;
 }
 
+static int msm_ioctl_gem_info_iova(struct drm_device *dev,
+		struct drm_gem_object *obj, uint64_t *iova)
+{
+	struct msm_drm_private *priv = dev->dev_private;
+
+	if (!priv->gpu)
+		return -EINVAL;
+
+	return msm_gem_get_iova(obj, priv->gpu->aspace, iova);
+}
+
 static int msm_ioctl_gem_info(struct drm_device *dev, void *data,
 		struct drm_file *file)
 {
@@ -684,14 +695,24 @@ static int msm_ioctl_gem_info(struct drm_device *dev, void *data,
 	struct drm_gem_object *obj;
 	int ret = 0;
 
-	if (args->pad)
-		return -EINVAL;
-
 	obj = drm_gem_object_lookup(file, args->handle);
 	if (!obj)
 		return -ENOENT;
 
-	args->offset = msm_gem_mmap_offset(obj);
+	/*
+	 * If the hint variable is set, it means that the user wants a IOVA for
+	 * this buffer.  Return the address from the GPU because that is
+	 * probably what it is looking for
+	 */
+	if (args->hint) {
+		uint64_t iova;
+
+		ret = msm_ioctl_gem_info_iova(dev, obj, &iova);
+		if (!ret)
+			args->offset = iova;
+	} else {
+		args->offset = msm_gem_mmap_offset(obj);
+	}
 
 	drm_gem_object_unreference_unlocked(obj);
 
diff --git a/include/uapi/drm/msm_drm.h b/include/uapi/drm/msm_drm.h
index 4d5d6a2..045ad20 100644
--- a/include/uapi/drm/msm_drm.h
+++ b/include/uapi/drm/msm_drm.h
@@ -105,8 +105,8 @@ struct drm_msm_gem_new {
 
 struct drm_msm_gem_info {
 	__u32 handle;         /* in */
-	__u32 pad;
-	__u64 offset;         /* out, offset to pass to mmap() */
+	__u32 hint;	      /* in */
+	__u64 offset;         /* out, mmap() offset if hint is 0, iova if 1 */
 };
 
 #define MSM_PREP_READ        0x01
-- 
1.9.1


* [PATCH 04/11] drm/msm: Remove idle function hook
  2017-02-06 17:39 [PATCH 00/11] drm/msm: A5XX preemption Jordan Crouse
  2017-02-06 17:39 ` [PATCH 03/11] drm/msm: Add hint to DRM_IOCTL_MSM_GEM_INFO to return an object IOVA Jordan Crouse
@ 2017-02-06 17:39 ` Jordan Crouse
  2017-02-06 17:39 ` [PATCH 05/11] drm/msm: get an iova from the address space instead of an id Jordan Crouse
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 36+ messages in thread
From: Jordan Crouse @ 2017-02-06 17:39 UTC (permalink / raw)
  To: freedreno; +Cc: linux-arm-msm, dri-devel

There isn't any generic code that uses ->idle, so remove it.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
---
 drivers/gpu/drm/msm/adreno/a3xx_gpu.c   | 4 ++--
 drivers/gpu/drm/msm/adreno/a4xx_gpu.c   | 4 ++--
 drivers/gpu/drm/msm/adreno/a5xx_gpu.c   | 9 ++++-----
 drivers/gpu/drm/msm/adreno/a5xx_gpu.h   | 1 +
 drivers/gpu/drm/msm/adreno/a5xx_power.c | 2 +-
 drivers/gpu/drm/msm/msm_gpu.h           | 1 -
 6 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a3xx_gpu.c b/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
index b999349..fc4fd2d 100644
--- a/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
@@ -40,6 +40,7 @@
 extern bool hang_debug;
 
 static void a3xx_dump(struct msm_gpu *gpu);
+static bool a3xx_idle(struct msm_gpu *gpu);
 
 static bool a3xx_me_init(struct msm_gpu *gpu)
 {
@@ -65,7 +66,7 @@ static bool a3xx_me_init(struct msm_gpu *gpu)
 	OUT_RING(ring, 0x00000000);
 
 	gpu->funcs->flush(gpu);
-	return gpu->funcs->idle(gpu);
+	return a3xx_idle(gpu);
 }
 
 static int a3xx_hw_init(struct msm_gpu *gpu)
@@ -448,7 +449,6 @@ static void a3xx_dump(struct msm_gpu *gpu)
 		.last_fence = adreno_last_fence,
 		.submit = adreno_submit,
 		.flush = adreno_flush,
-		.idle = a3xx_idle,
 		.irq = a3xx_irq,
 		.destroy = a3xx_destroy,
 #ifdef CONFIG_DEBUG_FS
diff --git a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
index 511bc85..6bc948b 100644
--- a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
@@ -31,6 +31,7 @@
 
 extern bool hang_debug;
 static void a4xx_dump(struct msm_gpu *gpu);
+static bool a4xx_idle(struct msm_gpu *gpu);
 
 /*
  * a4xx_enable_hwcg() - Program the clock control registers
@@ -137,7 +138,7 @@ static bool a4xx_me_init(struct msm_gpu *gpu)
 	OUT_RING(ring, 0x00000000);
 
 	gpu->funcs->flush(gpu);
-	return gpu->funcs->idle(gpu);
+	return a4xx_idle(gpu);
 }
 
 static int a4xx_hw_init(struct msm_gpu *gpu)
@@ -538,7 +539,6 @@ static int a4xx_get_timestamp(struct msm_gpu *gpu, uint64_t *value)
 		.last_fence = adreno_last_fence,
 		.submit = adreno_submit,
 		.flush = adreno_flush,
-		.idle = a4xx_idle,
 		.irq = a4xx_irq,
 		.destroy = a4xx_destroy,
 #ifdef CONFIG_DEBUG_FS
diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
index 23eeed2..2074f64 100644
--- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
@@ -391,7 +391,7 @@ static int a5xx_me_init(struct msm_gpu *gpu)
 
 	gpu->funcs->flush(gpu);
 
-	return gpu->funcs->idle(gpu) ? 0 : -EINVAL;
+	return a5xx_idle(gpu) ? 0 : -EINVAL;
 }
 
 static struct drm_gem_object *a5xx_ucode_load_bo(struct msm_gpu *gpu,
@@ -699,7 +699,7 @@ static int a5xx_hw_init(struct msm_gpu *gpu)
 		OUT_RING(gpu->rb, 0x0F);
 
 		gpu->funcs->flush(gpu);
-		if (!gpu->funcs->idle(gpu))
+		if (!a5xx_idle(gpu))
 			return -EINVAL;
 	}
 
@@ -716,7 +716,7 @@ static int a5xx_hw_init(struct msm_gpu *gpu)
 		OUT_RING(gpu->rb, 0x00000000);
 
 		gpu->funcs->flush(gpu);
-		if (!gpu->funcs->idle(gpu))
+		if (!a5xx_idle(gpu))
 			return -EINVAL;
 	} else {
 		/* Print a warning so if we die, we know why */
@@ -790,7 +790,7 @@ static inline bool _a5xx_check_idle(struct msm_gpu *gpu)
 		A5XX_RBBM_INT_0_MASK_MISC_HANG_DETECT);
 }
 
-static bool a5xx_idle(struct msm_gpu *gpu)
+bool a5xx_idle(struct msm_gpu *gpu)
 {
 	/* wait for CP to drain ringbuffer: */
 	if (!adreno_idle(gpu))
@@ -1091,7 +1091,6 @@ static void a5xx_show(struct msm_gpu *gpu, struct seq_file *m)
 		.last_fence = adreno_last_fence,
 		.submit = a5xx_submit,
 		.flush = adreno_flush,
-		.idle = a5xx_idle,
 		.irq = a5xx_irq,
 		.destroy = a5xx_destroy,
 		.show = a5xx_show,
diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.h b/drivers/gpu/drm/msm/adreno/a5xx_gpu.h
index 1590f84..6b20f28 100644
--- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.h
+++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.h
@@ -56,5 +56,6 @@ static inline int spin_usecs(struct msm_gpu *gpu, uint32_t usecs,
 	return -ETIMEDOUT;
 }
 
+bool a5xx_idle(struct msm_gpu *gpu);
 
 #endif /* __A5XX_GPU_H__ */
diff --git a/drivers/gpu/drm/msm/adreno/a5xx_power.c b/drivers/gpu/drm/msm/adreno/a5xx_power.c
index 72d52c7..ed0802e 100644
--- a/drivers/gpu/drm/msm/adreno/a5xx_power.c
+++ b/drivers/gpu/drm/msm/adreno/a5xx_power.c
@@ -194,7 +194,7 @@ static int a5xx_gpmu_init(struct msm_gpu *gpu)
 
 	gpu->funcs->flush(gpu);
 
-	if (!gpu->funcs->idle(gpu)) {
+	if (!a5xx_idle(gpu)) {
 		DRM_ERROR("%s: Unable to load GPMU firmware. GPMU will not be active\n",
 			gpu->name);
 		return -EINVAL;
diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
index c4c39d3..267723f 100644
--- a/drivers/gpu/drm/msm/msm_gpu.h
+++ b/drivers/gpu/drm/msm/msm_gpu.h
@@ -50,7 +50,6 @@ struct msm_gpu_funcs {
 	void (*submit)(struct msm_gpu *gpu, struct msm_gem_submit *submit,
 			struct msm_file_private *ctx);
 	void (*flush)(struct msm_gpu *gpu);
-	bool (*idle)(struct msm_gpu *gpu);
 	irqreturn_t (*irq)(struct msm_gpu *irq);
 	uint32_t (*last_fence)(struct msm_gpu *gpu);
 	void (*recover)(struct msm_gpu *gpu);
-- 
1.9.1


* [PATCH 05/11] drm/msm: get an iova from the address space instead of an id
  2017-02-06 17:39 [PATCH 00/11] drm/msm: A5XX preemption Jordan Crouse
  2017-02-06 17:39 ` [PATCH 03/11] drm/msm: Add hint to DRM_IOCTL_MSM_GEM_INFO to return an object IOVA Jordan Crouse
  2017-02-06 17:39 ` [PATCH 04/11] drm/msm: Remove idle function hook Jordan Crouse
@ 2017-02-06 17:39 ` Jordan Crouse
  2017-02-09  5:01   ` Archit Taneja
  2017-02-06 17:39 ` [PATCH 06/11] drm/msm: Add a struct to pass configuration to msm_gpu_init() Jordan Crouse
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 36+ messages in thread
From: Jordan Crouse @ 2017-02-06 17:39 UTC (permalink / raw)
  To: freedreno; +Cc: linux-arm-msm, dri-devel

In the future we won't have a fixed set of address spaces. Instead of
going through the effort of assigning an ID for each address space,
just use the address space itself as a token for getting / putting an
iova.

This forces a few changes in the GEM object, however: instead of using
a simple index into a list of domains, we need to maintain a list of
them. Luckily the list will be pretty small; even with dynamic address
spaces we wouldn't ever see more than two or three.
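
The core of the new lookup, condensed from the diff below (error paths
trimmed):

/* One msm_gem_vma per (object, address space) pair: find the
 * existing mapping for this aspace or create one on first use.
 */
struct msm_gem_vma *domain = obj_get_domain(obj, aspace);

if (!domain) {
	domain = obj_add_domain(obj, aspace);
	ret = msm_gem_map_vma(aspace, domain, msm_obj->sgt,
		obj->size >> PAGE_SHIFT);
}

*iova = domain->iova;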

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
---
 drivers/gpu/drm/msm/adreno/a5xx_gpu.c     |   8 +-
 drivers/gpu/drm/msm/adreno/a5xx_power.c   |   5 +-
 drivers/gpu/drm/msm/adreno/adreno_gpu.c   |   6 +-
 drivers/gpu/drm/msm/dsi/dsi_host.c        |  15 +++-
 drivers/gpu/drm/msm/mdp/mdp4/mdp4_crtc.c  |   8 +-
 drivers/gpu/drm/msm/mdp/mdp4/mdp4_kms.c   |  18 ++---
 drivers/gpu/drm/msm/mdp/mdp4/mdp4_kms.h   |   3 -
 drivers/gpu/drm/msm/mdp/mdp4/mdp4_plane.c |  13 ++--
 drivers/gpu/drm/msm/mdp/mdp5/mdp5_crtc.c  |   5 +-
 drivers/gpu/drm/msm/mdp/mdp5/mdp5_kms.c   |  11 +--
 drivers/gpu/drm/msm/mdp/mdp5/mdp5_kms.h   |   4 -
 drivers/gpu/drm/msm/mdp/mdp5/mdp5_plane.c |  13 ++--
 drivers/gpu/drm/msm/msm_drv.c             |  14 ----
 drivers/gpu/drm/msm/msm_drv.h             |  25 +++---
 drivers/gpu/drm/msm/msm_fb.c              |  15 ++--
 drivers/gpu/drm/msm/msm_fbdev.c           |  10 ++-
 drivers/gpu/drm/msm/msm_gem.c             | 124 +++++++++++++++++++++---------
 drivers/gpu/drm/msm/msm_gem.h             |   4 +-
 drivers/gpu/drm/msm/msm_gem_submit.c      |   4 +-
 drivers/gpu/drm/msm/msm_gpu.c             |   8 +-
 drivers/gpu/drm/msm/msm_gpu.h             |   1 -
 drivers/gpu/drm/msm/msm_kms.h             |   3 +
 22 files changed, 184 insertions(+), 133 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
index 2074f64..546280c 100644
--- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
@@ -415,7 +415,7 @@ static struct drm_gem_object *a5xx_ucode_load_bo(struct msm_gpu *gpu,
 	}
 
 	if (iova) {
-		int ret = msm_gem_get_iova(bo, gpu->id, iova);
+		int ret = msm_gem_get_iova(bo, gpu->aspace, iova);
 
 		if (ret) {
 			drm_gem_object_unreference_unlocked(bo);
@@ -757,19 +757,19 @@ static void a5xx_destroy(struct msm_gpu *gpu)
 
 	if (a5xx_gpu->pm4_bo) {
 		if (a5xx_gpu->pm4_iova)
-			msm_gem_put_iova(a5xx_gpu->pm4_bo, gpu->id);
+			msm_gem_put_iova(a5xx_gpu->pm4_bo, gpu->aspace);
 		drm_gem_object_unreference_unlocked(a5xx_gpu->pm4_bo);
 	}
 
 	if (a5xx_gpu->pfp_bo) {
 		if (a5xx_gpu->pfp_iova)
-			msm_gem_put_iova(a5xx_gpu->pfp_bo, gpu->id);
+			msm_gem_put_iova(a5xx_gpu->pfp_bo, gpu->aspace);
 		drm_gem_object_unreference_unlocked(a5xx_gpu->pfp_bo);
 	}
 
 	if (a5xx_gpu->gpmu_bo) {
 		if (a5xx_gpu->gpmu_bo)
-			msm_gem_put_iova(a5xx_gpu->gpmu_bo, gpu->id);
+			msm_gem_put_iova(a5xx_gpu->gpmu_bo, gpu->aspace);
 		drm_gem_object_unreference_unlocked(a5xx_gpu->gpmu_bo);
 	}
 
diff --git a/drivers/gpu/drm/msm/adreno/a5xx_power.c b/drivers/gpu/drm/msm/adreno/a5xx_power.c
index ed0802e..2fdee44 100644
--- a/drivers/gpu/drm/msm/adreno/a5xx_power.c
+++ b/drivers/gpu/drm/msm/adreno/a5xx_power.c
@@ -301,7 +301,8 @@ void a5xx_gpmu_ucode_init(struct msm_gpu *gpu)
 	if (IS_ERR(a5xx_gpu->gpmu_bo))
 		goto err;
 
-	if (msm_gem_get_iova(a5xx_gpu->gpmu_bo, gpu->id, &a5xx_gpu->gpmu_iova))
+	if (msm_gem_get_iova(a5xx_gpu->gpmu_bo, gpu->aspace,
+		&a5xx_gpu->gpmu_iova))
 		goto err;
 
 	ptr = msm_gem_get_vaddr(a5xx_gpu->gpmu_bo);
@@ -330,7 +331,7 @@ void a5xx_gpmu_ucode_init(struct msm_gpu *gpu)
 
 err:
 	if (a5xx_gpu->gpmu_iova)
-		msm_gem_put_iova(a5xx_gpu->gpmu_bo, gpu->id);
+		msm_gem_put_iova(a5xx_gpu->gpmu_bo, gpu->aspace);
 	if (a5xx_gpu->gpmu_bo)
 		drm_gem_object_unreference_unlocked(a5xx_gpu->gpmu_bo);
 
diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
index acb685a..247f017 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
@@ -61,7 +61,7 @@ int adreno_hw_init(struct msm_gpu *gpu)
 
 	DBG("%s", gpu->name);
 
-	ret = msm_gem_get_iova(gpu->rb->bo, gpu->id, &gpu->rb_iova);
+	ret = msm_gem_get_iova(gpu->rb->bo, gpu->aspace, &gpu->rb_iova);
 	if (ret) {
 		gpu->rb_iova = 0;
 		dev_err(gpu->dev->dev, "could not map ringbuffer: %d\n", ret);
@@ -411,7 +411,7 @@ int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev,
 		return -ENOMEM;
 	}
 
-	ret = msm_gem_get_iova(adreno_gpu->memptrs_bo, gpu->id,
+	ret = msm_gem_get_iova(adreno_gpu->memptrs_bo, gpu->aspace,
 			&adreno_gpu->memptrs_iova);
 	if (ret) {
 		dev_err(drm->dev, "could not map memptrs: %d\n", ret);
@@ -430,7 +430,7 @@ void adreno_gpu_cleanup(struct adreno_gpu *adreno_gpu)
 			msm_gem_put_vaddr(adreno_gpu->memptrs_bo);
 
 		if (adreno_gpu->memptrs_iova)
-			msm_gem_put_iova(adreno_gpu->memptrs_bo, gpu->id);
+			msm_gem_put_iova(adreno_gpu->memptrs_bo, gpu->aspace);
 
 		drm_gem_object_unreference_unlocked(adreno_gpu->memptrs_bo);
 	}
diff --git a/drivers/gpu/drm/msm/dsi/dsi_host.c b/drivers/gpu/drm/msm/dsi/dsi_host.c
index 3819fde..0b24b37 100644
--- a/drivers/gpu/drm/msm/dsi/dsi_host.c
+++ b/drivers/gpu/drm/msm/dsi/dsi_host.c
@@ -28,6 +28,7 @@
 #include <linux/regmap.h>
 #include <video/mipi_display.h>
 
+#include "msm_kms.h"
 #include "dsi.h"
 #include "dsi.xml.h"
 #include "sfpb.xml.h"
@@ -980,6 +981,7 @@ static void dsi_wait4video_eng_busy(struct msm_dsi_host *msm_host)
 static int dsi_tx_buf_alloc(struct msm_dsi_host *msm_host, int size)
 {
 	struct drm_device *dev = msm_host->dev;
+	struct msm_drm_private *priv = dev->dev_private;
 	const struct msm_dsi_cfg_handler *cfg_hnd = msm_host->cfg_hnd;
 	int ret;
 	uint64_t iova;
@@ -996,7 +998,13 @@ static int dsi_tx_buf_alloc(struct msm_dsi_host *msm_host, int size)
 			return ret;
 		}
 
-		ret = msm_gem_get_iova_locked(msm_host->tx_gem_obj, 0, &iova);
+		if (!priv->kms) {
+			pr_err("%s: No KMS is initalized\n", __func__);
+			return -ENODEV;
+		}
+
+		ret = msm_gem_get_iova_locked(msm_host->tx_gem_obj,
+			priv->kms->aspace, &iova);
 		mutex_unlock(&dev->struct_mutex);
 		if (ret) {
 			pr_err("%s: failed to get iova, %d\n", __func__, ret);
@@ -1028,9 +1036,12 @@ static int dsi_tx_buf_alloc(struct msm_dsi_host *msm_host, int size)
 static void dsi_tx_buf_free(struct msm_dsi_host *msm_host)
 {
 	struct drm_device *dev = msm_host->dev;
+	struct msm_drm_private *priv = dev->dev_private;
 
 	if (msm_host->tx_gem_obj) {
-		msm_gem_put_iova(msm_host->tx_gem_obj, 0);
+		if (priv->kms)
+			msm_gem_put_iova(msm_host->tx_gem_obj,
+				priv->kms->aspace);
 		mutex_lock(&dev->struct_mutex);
 		msm_gem_free_object(msm_host->tx_gem_obj);
 		msm_host->tx_gem_obj = NULL;
diff --git a/drivers/gpu/drm/msm/mdp/mdp4/mdp4_crtc.c b/drivers/gpu/drm/msm/mdp/mdp4/mdp4_crtc.c
index 1c29618..1dfad91 100644
--- a/drivers/gpu/drm/msm/mdp/mdp4/mdp4_crtc.c
+++ b/drivers/gpu/drm/msm/mdp/mdp4/mdp4_crtc.c
@@ -133,7 +133,7 @@ static void unref_cursor_worker(struct drm_flip_work *work, void *val)
 		container_of(work, struct mdp4_crtc, unref_cursor_work);
 	struct mdp4_kms *mdp4_kms = get_kms(&mdp4_crtc->base);
 
-	msm_gem_put_iova(val, mdp4_kms->id);
+	msm_gem_put_iova(val, mdp4_kms->base.base.aspace);
 	drm_gem_object_unreference_unlocked(val);
 }
 
@@ -378,7 +378,8 @@ static void update_cursor(struct drm_crtc *crtc)
 		if (next_bo) {
 			/* take a obj ref + iova ref when we start scanning out: */
 			drm_gem_object_reference(next_bo);
-			msm_gem_get_iova_locked(next_bo, mdp4_kms->id, &iova);
+			msm_gem_get_iova_locked(next_bo,
+				mdp4_kms->base.base.aspace, &iova);
 
 			/* enable cursor: */
 			mdp4_write(mdp4_kms, REG_MDP4_DMA_CURSOR_SIZE(dma),
@@ -435,7 +436,8 @@ static int mdp4_crtc_cursor_set(struct drm_crtc *crtc,
 	}
 
 	if (cursor_bo) {
-		ret = msm_gem_get_iova(cursor_bo, mdp4_kms->id, &iova);
+		ret = msm_gem_get_iova(cursor_bo, mdp4_kms->base.base.aspace,
+			&iova);
 		if (ret)
 			goto fail;
 	} else {
diff --git a/drivers/gpu/drm/msm/mdp/mdp4/mdp4_kms.c b/drivers/gpu/drm/msm/mdp/mdp4/mdp4_kms.c
index dbdbc2a..50144db 100644
--- a/drivers/gpu/drm/msm/mdp/mdp4/mdp4_kms.c
+++ b/drivers/gpu/drm/msm/mdp/mdp4/mdp4_kms.c
@@ -160,7 +160,10 @@ static void mdp4_destroy(struct msm_kms *kms)
 {
 	struct mdp4_kms *mdp4_kms = to_mdp4_kms(to_mdp_kms(kms));
 	struct device *dev = mdp4_kms->dev->dev;
-	struct msm_gem_address_space *aspace = mdp4_kms->aspace;
+	struct msm_gem_address_space *aspace = kms->aspace;
+
+	if (mdp4_kms->blank_cursor_iova)
+		msm_gem_put_iova(mdp4_kms->blank_cursor_bo, aspace);
 
 	if (aspace) {
 		aspace->mmu->funcs->detach(aspace->mmu,
@@ -168,8 +171,6 @@ static void mdp4_destroy(struct msm_kms *kms)
 		msm_gem_address_space_destroy(aspace);
 	}
 
-	if (mdp4_kms->blank_cursor_iova)
-		msm_gem_put_iova(mdp4_kms->blank_cursor_bo, mdp4_kms->id);
 	drm_gem_object_unreference_unlocked(mdp4_kms->blank_cursor_bo);
 
 	if (mdp4_kms->rpm_enabled)
@@ -540,7 +541,7 @@ struct msm_kms *mdp4_kms_init(struct drm_device *dev)
 			goto fail;
 		}
 
-		mdp4_kms->aspace = aspace;
+		kms->aspace = aspace;
 
 		ret = aspace->mmu->funcs->attach(aspace->mmu, iommu_ports,
 				ARRAY_SIZE(iommu_ports));
@@ -552,13 +553,6 @@ struct msm_kms *mdp4_kms_init(struct drm_device *dev)
 		aspace = NULL;
 	}
 
-	mdp4_kms->id = msm_register_address_space(dev, aspace);
-	if (mdp4_kms->id < 0) {
-		ret = mdp4_kms->id;
-		dev_err(dev->dev, "failed to register mdp4 iommu: %d\n", ret);
-		goto fail;
-	}
-
 	ret = modeset_init(mdp4_kms);
 	if (ret) {
 		dev_err(dev->dev, "modeset_init failed: %d\n", ret);
@@ -575,7 +569,7 @@ struct msm_kms *mdp4_kms_init(struct drm_device *dev)
 		goto fail;
 	}
 
-	ret = msm_gem_get_iova(mdp4_kms->blank_cursor_bo, mdp4_kms->id,
+	ret = msm_gem_get_iova(mdp4_kms->blank_cursor_bo, kms->aspace,
 			&mdp4_kms->blank_cursor_iova);
 	if (ret) {
 		dev_err(dev->dev, "could not pin blank-cursor bo: %d\n", ret);
diff --git a/drivers/gpu/drm/msm/mdp/mdp4/mdp4_kms.h b/drivers/gpu/drm/msm/mdp/mdp4/mdp4_kms.h
index 62712ca..e9a129d 100644
--- a/drivers/gpu/drm/msm/mdp/mdp4/mdp4_kms.h
+++ b/drivers/gpu/drm/msm/mdp/mdp4/mdp4_kms.h
@@ -33,8 +33,6 @@ struct mdp4_kms {
 	int rev;
 
 	/* mapper-id used to request GEM buffer mapped for scanout: */
-	int id;
-
 	void __iomem *mmio;
 
 	struct regulator *vdd;
@@ -43,7 +41,6 @@ struct mdp4_kms {
 	struct clk *pclk;
 	struct clk *lut_clk;
 	struct clk *axi_clk;
-	struct msm_gem_address_space *aspace;
 
 	struct mdp_irq error_handler;
 
diff --git a/drivers/gpu/drm/msm/mdp/mdp4/mdp4_plane.c b/drivers/gpu/drm/msm/mdp/mdp4/mdp4_plane.c
index 3903dbc..88f3b86 100644
--- a/drivers/gpu/drm/msm/mdp/mdp4/mdp4_plane.c
+++ b/drivers/gpu/drm/msm/mdp/mdp4/mdp4_plane.c
@@ -109,7 +109,7 @@ static int mdp4_plane_prepare_fb(struct drm_plane *plane,
 		return 0;
 
 	DBG("%s: prepare: FB[%u]", mdp4_plane->name, fb->base.id);
-	return msm_framebuffer_prepare(fb, mdp4_kms->id);
+	return msm_framebuffer_prepare(fb, mdp4_kms->base.base.aspace);
 }
 
 static void mdp4_plane_cleanup_fb(struct drm_plane *plane,
@@ -123,7 +123,7 @@ static void mdp4_plane_cleanup_fb(struct drm_plane *plane,
 		return;
 
 	DBG("%s: cleanup: FB[%u]", mdp4_plane->name, fb->base.id);
-	msm_framebuffer_cleanup(fb, mdp4_kms->id);
+	msm_framebuffer_cleanup(fb, mdp4_kms->base.base.aspace);
 }
 
 
@@ -161,6 +161,7 @@ static void mdp4_plane_set_scanout(struct drm_plane *plane,
 {
 	struct mdp4_plane *mdp4_plane = to_mdp4_plane(plane);
 	struct mdp4_kms *mdp4_kms = get_kms(plane);
+	struct msm_kms *kms = &mdp4_kms->base.base;
 	enum mdp4_pipe pipe = mdp4_plane->pipe;
 
 	mdp4_write(mdp4_kms, REG_MDP4_PIPE_SRC_STRIDE_A(pipe),
@@ -172,13 +173,13 @@ static void mdp4_plane_set_scanout(struct drm_plane *plane,
 			MDP4_PIPE_SRC_STRIDE_B_P3(fb->pitches[3]));
 
 	mdp4_write(mdp4_kms, REG_MDP4_PIPE_SRCP0_BASE(pipe),
-			msm_framebuffer_iova(fb, mdp4_kms->id, 0));
+			msm_framebuffer_iova(fb, kms->aspace, 0));
 	mdp4_write(mdp4_kms, REG_MDP4_PIPE_SRCP1_BASE(pipe),
-			msm_framebuffer_iova(fb, mdp4_kms->id, 1));
+			msm_framebuffer_iova(fb, kms->aspace, 1));
 	mdp4_write(mdp4_kms, REG_MDP4_PIPE_SRCP2_BASE(pipe),
-			msm_framebuffer_iova(fb, mdp4_kms->id, 2));
+			msm_framebuffer_iova(fb, kms->aspace, 2));
 	mdp4_write(mdp4_kms, REG_MDP4_PIPE_SRCP3_BASE(pipe),
-			msm_framebuffer_iova(fb, mdp4_kms->id, 3));
+			msm_framebuffer_iova(fb, kms->aspace, 3));
 
 	plane->fb = fb;
 }
diff --git a/drivers/gpu/drm/msm/mdp/mdp5/mdp5_crtc.c b/drivers/gpu/drm/msm/mdp/mdp5/mdp5_crtc.c
index 8e155a5..17e7c26 100644
--- a/drivers/gpu/drm/msm/mdp/mdp5/mdp5_crtc.c
+++ b/drivers/gpu/drm/msm/mdp/mdp5/mdp5_crtc.c
@@ -165,7 +165,7 @@ static void unref_cursor_worker(struct drm_flip_work *work, void *val)
 		container_of(work, struct mdp5_crtc, unref_cursor_work);
 	struct mdp5_kms *mdp5_kms = get_kms(&mdp5_crtc->base);
 
-	msm_gem_put_iova(val, mdp5_kms->id);
+	msm_gem_put_iova(val, mdp5_kms->base.base.aspace);
 	drm_gem_object_unreference_unlocked(val);
 }
 
@@ -520,7 +520,8 @@ static int mdp5_crtc_cursor_set(struct drm_crtc *crtc,
 	if (!cursor_bo)
 		return -ENOENT;
 
-	ret = msm_gem_get_iova(cursor_bo, mdp5_kms->id, &cursor_addr);
+	ret = msm_gem_get_iova(cursor_bo, mdp5_kms->base.base.aspace,
+		&cursor_addr);
 	if (ret)
 		return -EINVAL;
 
diff --git a/drivers/gpu/drm/msm/mdp/mdp5/mdp5_kms.c b/drivers/gpu/drm/msm/mdp/mdp5/mdp5_kms.c
index 1142a9c..93beb11 100644
--- a/drivers/gpu/drm/msm/mdp/mdp5/mdp5_kms.c
+++ b/drivers/gpu/drm/msm/mdp/mdp5/mdp5_kms.c
@@ -151,7 +151,7 @@ static int mdp5_set_split_display(struct msm_kms *kms,
 static void mdp5_kms_destroy(struct msm_kms *kms)
 {
 	struct mdp5_kms *mdp5_kms = to_mdp5_kms(to_mdp_kms(kms));
-	struct msm_gem_address_space *aspace = mdp5_kms->aspace;
+	struct msm_gem_address_space *aspace = kms->aspace;
 	int i;
 
 	for (i = 0; i < mdp5_kms->num_hwpipes; i++)
@@ -707,7 +707,7 @@ struct msm_kms *mdp5_kms_init(struct drm_device *dev)
 			goto fail;
 		}
 
-		mdp5_kms->aspace = aspace;
+		kms->aspace = aspace;
 
 		ret = aspace->mmu->funcs->attach(aspace->mmu, iommu_ports,
 				ARRAY_SIZE(iommu_ports));
@@ -722,13 +722,6 @@ struct msm_kms *mdp5_kms_init(struct drm_device *dev)
 		aspace = NULL;;
 	}
 
-	mdp5_kms->id = msm_register_address_space(dev, aspace);
-	if (mdp5_kms->id < 0) {
-		ret = mdp5_kms->id;
-		dev_err(&pdev->dev, "failed to register mdp5 iommu: %d\n", ret);
-		goto fail;
-	}
-
 	ret = modeset_init(mdp5_kms);
 	if (ret) {
 		dev_err(&pdev->dev, "modeset_init failed: %d\n", ret);
diff --git a/drivers/gpu/drm/msm/mdp/mdp5/mdp5_kms.h b/drivers/gpu/drm/msm/mdp/mdp5/mdp5_kms.h
index 2e5a768..1020c38 100644
--- a/drivers/gpu/drm/msm/mdp/mdp5/mdp5_kms.h
+++ b/drivers/gpu/drm/msm/mdp/mdp5/mdp5_kms.h
@@ -48,10 +48,6 @@ struct mdp5_kms {
 	struct mdp5_state *state;
 	struct drm_modeset_lock state_lock;
 
-	/* mapper-id used to request GEM buffer mapped for scanout: */
-	int id;
-	struct msm_gem_address_space *aspace;
-
 	struct mdp5_smp *smp;
 	struct mdp5_ctl_manager *ctlm;
 
diff --git a/drivers/gpu/drm/msm/mdp/mdp5/mdp5_plane.c b/drivers/gpu/drm/msm/mdp/mdp5/mdp5_plane.c
index 19fc997..2d52d4f 100644
--- a/drivers/gpu/drm/msm/mdp/mdp5/mdp5_plane.c
+++ b/drivers/gpu/drm/msm/mdp/mdp5/mdp5_plane.c
@@ -259,7 +259,7 @@ static int mdp5_plane_prepare_fb(struct drm_plane *plane,
 		return 0;
 
 	DBG("%s: prepare: FB[%u]", plane->name, fb->base.id);
-	return msm_framebuffer_prepare(fb, mdp5_kms->id);
+	return msm_framebuffer_prepare(fb, mdp5_kms->base.base.aspace);
 }
 
 static void mdp5_plane_cleanup_fb(struct drm_plane *plane,
@@ -272,7 +272,7 @@ static void mdp5_plane_cleanup_fb(struct drm_plane *plane,
 		return;
 
 	DBG("%s: cleanup: FB[%u]", plane->name, fb->base.id);
-	msm_framebuffer_cleanup(fb, mdp5_kms->id);
+	msm_framebuffer_cleanup(fb, mdp5_kms->base.base.aspace);
 }
 
 static int mdp5_plane_atomic_check(struct drm_plane *plane,
@@ -391,6 +391,7 @@ static void set_scanout_locked(struct drm_plane *plane,
 		struct drm_framebuffer *fb)
 {
 	struct mdp5_kms *mdp5_kms = get_kms(plane);
+	struct msm_kms *kms = &mdp5_kms->base.base;
 	struct mdp5_hw_pipe *hwpipe = to_mdp5_plane_state(plane->state)->hwpipe;
 	enum mdp5_pipe pipe = hwpipe->pipe;
 
@@ -403,13 +404,13 @@ static void set_scanout_locked(struct drm_plane *plane,
 			MDP5_PIPE_SRC_STRIDE_B_P3(fb->pitches[3]));
 
 	mdp5_write(mdp5_kms, REG_MDP5_PIPE_SRC0_ADDR(pipe),
-			msm_framebuffer_iova(fb, mdp5_kms->id, 0));
+			msm_framebuffer_iova(fb, kms->aspace, 0));
 	mdp5_write(mdp5_kms, REG_MDP5_PIPE_SRC1_ADDR(pipe),
-			msm_framebuffer_iova(fb, mdp5_kms->id, 1));
+			msm_framebuffer_iova(fb, kms->aspace, 1));
 	mdp5_write(mdp5_kms, REG_MDP5_PIPE_SRC2_ADDR(pipe),
-			msm_framebuffer_iova(fb, mdp5_kms->id, 2));
+			msm_framebuffer_iova(fb, kms->aspace, 2));
 	mdp5_write(mdp5_kms, REG_MDP5_PIPE_SRC3_ADDR(pipe),
-			msm_framebuffer_iova(fb, mdp5_kms->id, 3));
+			msm_framebuffer_iova(fb, kms->aspace, 3));
 
 	plane->fb = fb;
 }
diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
index 1e4e022..1dbcd61 100644
--- a/drivers/gpu/drm/msm/msm_drv.c
+++ b/drivers/gpu/drm/msm/msm_drv.c
@@ -51,20 +51,6 @@ static void msm_fb_output_poll_changed(struct drm_device *dev)
 	.atomic_state_free = msm_atomic_state_free,
 };
 
-int msm_register_address_space(struct drm_device *dev,
-		struct msm_gem_address_space *aspace)
-{
-	struct msm_drm_private *priv = dev->dev_private;
-	int idx = priv->num_aspaces++;
-
-	if (WARN_ON(idx >= ARRAY_SIZE(priv->aspace)))
-		return -EINVAL;
-
-	priv->aspace[idx] = aspace;
-
-	return idx;
-}
-
 #ifdef CONFIG_DRM_MSM_REGISTER_LOGGING
 static bool reglog = false;
 MODULE_PARM_DESC(reglog, "Enable register read/write logging");
diff --git a/drivers/gpu/drm/msm/msm_drv.h b/drivers/gpu/drm/msm/msm_drv.h
index ed4dad3..0c1a49e 100644
--- a/drivers/gpu/drm/msm/msm_drv.h
+++ b/drivers/gpu/drm/msm/msm_drv.h
@@ -183,9 +183,6 @@ int msm_atomic_commit(struct drm_device *dev,
 void msm_atomic_state_clear(struct drm_atomic_state *state);
 void msm_atomic_state_free(struct drm_atomic_state *state);
 
-int msm_register_address_space(struct drm_device *dev,
-		struct msm_gem_address_space *aspace);
-
 void msm_gem_unmap_vma(struct msm_gem_address_space *aspace,
 		struct msm_gem_vma *vma, struct sg_table *sgt);
 int msm_gem_map_vma(struct msm_gem_address_space *aspace,
@@ -208,13 +205,16 @@ int msm_gem_mmap_obj(struct drm_gem_object *obj,
 int msm_gem_mmap(struct file *filp, struct vm_area_struct *vma);
 int msm_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf);
 uint64_t msm_gem_mmap_offset(struct drm_gem_object *obj);
-int msm_gem_get_iova_locked(struct drm_gem_object *obj, int id,
-		uint64_t *iova);
-int msm_gem_get_iova(struct drm_gem_object *obj, int id, uint64_t *iova);
-uint64_t msm_gem_iova(struct drm_gem_object *obj, int id);
+int msm_gem_get_iova_locked(struct drm_gem_object *obj,
+		struct msm_gem_address_space *aspace, uint64_t *iova);
+int msm_gem_get_iova(struct drm_gem_object *obj,
+		struct msm_gem_address_space *aspace, uint64_t *iova);
+uint64_t msm_gem_iova(struct drm_gem_object *obj,
+		struct msm_gem_address_space *aspace);
 struct page **msm_gem_get_pages(struct drm_gem_object *obj);
 void msm_gem_put_pages(struct drm_gem_object *obj);
-void msm_gem_put_iova(struct drm_gem_object *obj, int id);
+void msm_gem_put_iova(struct drm_gem_object *obj,
+		struct msm_gem_address_space *aspace);
 int msm_gem_dumb_create(struct drm_file *file, struct drm_device *dev,
 		struct drm_mode_create_dumb *args);
 int msm_gem_dumb_map_offset(struct drm_file *file, struct drm_device *dev,
@@ -249,9 +249,12 @@ struct drm_gem_object *msm_gem_new(struct drm_device *dev,
 struct drm_gem_object *msm_gem_import(struct drm_device *dev,
 		struct dma_buf *dmabuf, struct sg_table *sgt);
 
-int msm_framebuffer_prepare(struct drm_framebuffer *fb, int id);
-void msm_framebuffer_cleanup(struct drm_framebuffer *fb, int id);
-uint32_t msm_framebuffer_iova(struct drm_framebuffer *fb, int id, int plane);
+int msm_framebuffer_prepare(struct drm_framebuffer *fb,
+		struct msm_gem_address_space *aspace);
+void msm_framebuffer_cleanup(struct drm_framebuffer *fb,
+		struct msm_gem_address_space *aspace);
+uint32_t msm_framebuffer_iova(struct drm_framebuffer *fb,
+		struct msm_gem_address_space *aspace, int plane);
 struct drm_gem_object *msm_framebuffer_bo(struct drm_framebuffer *fb, int plane);
 const struct msm_format *msm_framebuffer_format(struct drm_framebuffer *fb);
 struct drm_framebuffer *msm_framebuffer_init(struct drm_device *dev,
diff --git a/drivers/gpu/drm/msm/msm_fb.c b/drivers/gpu/drm/msm/msm_fb.c
index 9acf544..cedadbf 100644
--- a/drivers/gpu/drm/msm/msm_fb.c
+++ b/drivers/gpu/drm/msm/msm_fb.c
@@ -84,14 +84,15 @@ void msm_framebuffer_describe(struct drm_framebuffer *fb, struct seq_file *m)
  * should be fine, since only the scanout (mdpN) side of things needs
  * this, the gpu doesn't care about fb's.
  */
-int msm_framebuffer_prepare(struct drm_framebuffer *fb, int id)
+int msm_framebuffer_prepare(struct drm_framebuffer *fb,
+		struct msm_gem_address_space *aspace)
 {
 	struct msm_framebuffer *msm_fb = to_msm_framebuffer(fb);
 	int ret, i, n = drm_format_num_planes(fb->pixel_format);
 	uint64_t iova;
 
 	for (i = 0; i < n; i++) {
-		ret = msm_gem_get_iova(msm_fb->planes[i], id, &iova);
+		ret = msm_gem_get_iova(msm_fb->planes[i], aspace, &iova);
 		DBG("FB[%u]: iova[%d]: %08llx (%d)", fb->base.id, i, iova, ret);
 		if (ret)
 			return ret;
@@ -100,21 +101,23 @@ int msm_framebuffer_prepare(struct drm_framebuffer *fb, int id)
 	return 0;
 }
 
-void msm_framebuffer_cleanup(struct drm_framebuffer *fb, int id)
+void msm_framebuffer_cleanup(struct drm_framebuffer *fb,
+		struct msm_gem_address_space *aspace)
 {
 	struct msm_framebuffer *msm_fb = to_msm_framebuffer(fb);
 	int i, n = drm_format_num_planes(fb->pixel_format);
 
 	for (i = 0; i < n; i++)
-		msm_gem_put_iova(msm_fb->planes[i], id);
+		msm_gem_put_iova(msm_fb->planes[i], aspace);
 }
 
-uint32_t msm_framebuffer_iova(struct drm_framebuffer *fb, int id, int plane)
+uint32_t msm_framebuffer_iova(struct drm_framebuffer *fb,
+		struct msm_gem_address_space *aspace, int plane)
 {
 	struct msm_framebuffer *msm_fb = to_msm_framebuffer(fb);
 	if (!msm_fb->planes[plane])
 		return 0;
-	return msm_gem_iova(msm_fb->planes[plane], id) + fb->offsets[plane];
+	return msm_gem_iova(msm_fb->planes[plane], aspace) + fb->offsets[plane];
 }
 
 struct drm_gem_object *msm_framebuffer_bo(struct drm_framebuffer *fb, int plane)
diff --git a/drivers/gpu/drm/msm/msm_fbdev.c b/drivers/gpu/drm/msm/msm_fbdev.c
index e8f41eb..0b5b839 100644
--- a/drivers/gpu/drm/msm/msm_fbdev.c
+++ b/drivers/gpu/drm/msm/msm_fbdev.c
@@ -20,6 +20,7 @@
 #include "drm_crtc.h"
 #include "drm_fb_helper.h"
 #include "msm_gem.h"
+#include "msm_kms.h"
 
 extern int msm_gem_mmap_obj(struct drm_gem_object *obj,
 					struct vm_area_struct *vma);
@@ -78,6 +79,7 @@ static int msm_fbdev_create(struct drm_fb_helper *helper,
 {
 	struct msm_fbdev *fbdev = to_msm_fbdev(helper);
 	struct drm_device *dev = helper->dev;
+	struct msm_drm_private *priv = dev->dev_private;
 	struct drm_framebuffer *fb = NULL;
 	struct fb_info *fbi = NULL;
 	struct drm_mode_fb_cmd2 mode_cmd = {0};
@@ -129,7 +131,13 @@ static int msm_fbdev_create(struct drm_fb_helper *helper,
 	 * in panic (ie. lock-safe, etc) we could avoid pinning the
 	 * buffer now:
 	 */
-	ret = msm_gem_get_iova_locked(fbdev->bo, 0, &paddr);
+
+	if (!priv->kms) {
+		ret = -ENODEV;
+		goto fail_unlock;
+	}
+
+	ret = msm_gem_get_iova_locked(fbdev->bo, priv->kms->aspace, &paddr);
 	if (ret) {
 		dev_err(dev->dev, "failed to get buffer obj iova: %d\n", ret);
 		goto fail_unlock;
diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
index cd06cfd..a428377 100644
--- a/drivers/gpu/drm/msm/msm_gem.c
+++ b/drivers/gpu/drm/msm/msm_gem.c
@@ -289,18 +289,49 @@ uint64_t msm_gem_mmap_offset(struct drm_gem_object *obj)
 put_iova(struct drm_gem_object *obj)
 {
 	struct drm_device *dev = obj->dev;
-	struct msm_drm_private *priv = obj->dev->dev_private;
 	struct msm_gem_object *msm_obj = to_msm_bo(obj);
-	int id;
+	struct msm_gem_vma *domain, *tmp;
 
 	WARN_ON(!mutex_is_locked(&dev->struct_mutex));
 
-	for (id = 0; id < ARRAY_SIZE(msm_obj->domain); id++) {
-		msm_gem_unmap_vma(priv->aspace[id],
-				&msm_obj->domain[id], msm_obj->sgt);
+	list_for_each_entry_safe(domain, tmp, &msm_obj->domains, list) {
+		if (iommu_present(&platform_bus_type))
+			msm_gem_unmap_vma(domain->aspace, domain, msm_obj->sgt);
+		list_del(&domain->list);
+		kfree(domain);
 	}
 }
 
+static struct msm_gem_vma *obj_add_domain(struct drm_gem_object *obj,
+		struct msm_gem_address_space *aspace)
+{
+	struct msm_gem_object *msm_obj = to_msm_bo(obj);
+	struct msm_gem_vma *domain = kzalloc(sizeof(*domain), GFP_KERNEL);
+
+	if (!domain)
+		return ERR_PTR(-ENOMEM);
+
+	domain->aspace = aspace;
+
+	list_add_tail(&domain->list, &msm_obj->domains);
+
+	return domain;
+}
+
+static struct msm_gem_vma *obj_get_domain(struct drm_gem_object *obj,
+		struct msm_gem_address_space *aspace)
+{
+	struct msm_gem_object *msm_obj = to_msm_bo(obj);
+	struct msm_gem_vma *domain;
+
+	list_for_each_entry(domain, &msm_obj->domains, list) {
+		if (domain->aspace == aspace)
+			return domain;
+	}
+
+	return NULL;
+}
+
 /* should be called under struct_mutex.. although it can be called
  * from atomic context without struct_mutex to acquire an extra
  * iova ref if you know one is already held.
@@ -308,49 +339,60 @@ uint64_t msm_gem_mmap_offset(struct drm_gem_object *obj)
  * That means when I do eventually need to add support for unpinning
  * the refcnt counter needs to be atomic_t.
  */
-int msm_gem_get_iova_locked(struct drm_gem_object *obj, int id,
-		uint64_t *iova)
+int msm_gem_get_iova_locked(struct drm_gem_object *obj,
+		struct msm_gem_address_space *aspace, uint64_t *iova)
 {
 	struct msm_gem_object *msm_obj = to_msm_bo(obj);
+	struct page **pages;
+	struct msm_gem_vma *domain;
 	int ret = 0;
 
-	if (!msm_obj->domain[id].iova) {
-		struct msm_drm_private *priv = obj->dev->dev_private;
-		struct page **pages = get_pages(obj);
+	if (!iommu_present(&platform_bus_type)) {
+		pages = get_pages(obj);
 
 		if (IS_ERR(pages))
 			return PTR_ERR(pages);
 
-		if (iommu_present(&platform_bus_type)) {
-			ret = msm_gem_map_vma(priv->aspace[id], &msm_obj->domain[id],
-					msm_obj->sgt, obj->size >> PAGE_SHIFT);
-		} else {
-			msm_obj->domain[id].iova = physaddr(obj);
-		}
+		*iova = physaddr(obj);
+		return 0;
+	}
+
+	domain = obj_get_domain(obj, aspace);
+
+	if (!domain) {
+		domain = obj_add_domain(obj, aspace);
+		if (IS_ERR(domain))
+			return  PTR_ERR(domain);
+
+		pages = get_pages(obj);
+		if (IS_ERR(pages))
+			return PTR_ERR(pages);
+
+		ret = msm_gem_map_vma(aspace, domain, msm_obj->sgt,
+			obj->size >> PAGE_SHIFT);
 	}
 
 	if (!ret)
-		*iova = msm_obj->domain[id].iova;
+		*iova = domain->iova;
 
 	return ret;
 }
 
 /* get iova, taking a reference.  Should have a matching put */
-int msm_gem_get_iova(struct drm_gem_object *obj, int id, uint64_t *iova)
+int msm_gem_get_iova(struct drm_gem_object *obj,
+		struct msm_gem_address_space *aspace, uint64_t *iova)
 {
-	struct msm_gem_object *msm_obj = to_msm_bo(obj);
+	struct msm_gem_vma *domain;
 	int ret;
 
-	/* this is safe right now because we don't unmap until the
-	 * bo is deleted:
-	 */
-	if (msm_obj->domain[id].iova) {
-		*iova = msm_obj->domain[id].iova;
+	domain = obj_get_domain(obj, aspace);
+	if (domain) {
+		*iova = domain->iova;
 		return 0;
 	}
 
 	mutex_lock(&obj->dev->struct_mutex);
-	ret = msm_gem_get_iova_locked(obj, id, iova);
+	ret = msm_gem_get_iova_locked(obj, aspace, iova);
 	mutex_unlock(&obj->dev->struct_mutex);
 	return ret;
 }
@@ -358,14 +400,18 @@ int msm_gem_get_iova(struct drm_gem_object *obj, int id, uint64_t *iova)
 /* get iova without taking a reference, used in places where you have
  * already done a 'msm_gem_get_iova()'.
  */
-uint64_t msm_gem_iova(struct drm_gem_object *obj, int id)
+uint64_t msm_gem_iova(struct drm_gem_object *obj,
+		struct msm_gem_address_space *aspace)
 {
-	struct msm_gem_object *msm_obj = to_msm_bo(obj);
-	WARN_ON(!msm_obj->domain[id].iova);
-	return msm_obj->domain[id].iova;
+	struct msm_gem_vma *domain = obj_get_domain(obj, aspace);
+
+	WARN_ON(!domain);
+
+	return domain ? domain->iova : 0;
 }
 
-void msm_gem_put_iova(struct drm_gem_object *obj, int id)
+void msm_gem_put_iova(struct drm_gem_object *obj,
+		struct msm_gem_address_space *aspace)
 {
 	// XXX TODO ..
 	// NOTE: probably don't need a _locked() version.. we wouldn't
@@ -619,11 +665,10 @@ void msm_gem_describe(struct drm_gem_object *obj, struct seq_file *m)
 	struct msm_gem_object *msm_obj = to_msm_bo(obj);
 	struct reservation_object *robj = msm_obj->resv;
 	struct reservation_object_list *fobj;
-	struct msm_drm_private *priv = obj->dev->dev_private;
 	struct dma_fence *fence;
 	uint64_t off = drm_vma_node_start(&obj->vma_node);
 	const char *madv;
-	unsigned id;
+	struct msm_gem_vma *domain;
 
 	WARN_ON(!mutex_is_locked(&obj->dev->struct_mutex));
 
@@ -645,8 +690,9 @@ void msm_gem_describe(struct drm_gem_object *obj, struct seq_file *m)
 			obj->name, obj->refcount.refcount.counter,
 			off, msm_obj->vaddr);
 
-	for (id = 0; id < priv->num_aspaces; id++)
-		seq_printf(m, " %08llx", msm_obj->domain[id].iova);
+	/* FIXME: we need to print the address space here too */
+	list_for_each_entry(domain, &msm_obj->domains, list)
+		seq_printf(m, " %08llx", domain->iova);
 
 	seq_printf(m, " %zu%s\n", obj->size, madv);
 
@@ -781,8 +827,12 @@ static int msm_gem_new_impl(struct drm_device *dev,
 	if (!msm_obj)
 		return -ENOMEM;
 
-	if (use_vram)
-		msm_obj->vram_node = &msm_obj->domain[0].node;
+	if (use_vram) {
+		struct msm_gem_vma *domain = obj_add_domain(&msm_obj->base, 0);
+		/* FIXME: Error here? */
+		if (domain)
+			msm_obj->vram_node = &domain->node;
+	}
 
 	msm_obj->flags = flags;
 	msm_obj->madv = MSM_MADV_WILLNEED;
@@ -795,6 +845,8 @@ static int msm_gem_new_impl(struct drm_device *dev,
 	}
 
 	INIT_LIST_HEAD(&msm_obj->submit_entry);
+	INIT_LIST_HEAD(&msm_obj->domains);
+
 	list_add_tail(&msm_obj->mm_list, &priv->inactive_list);
 
 	*obj = &msm_obj->base;
diff --git a/drivers/gpu/drm/msm/msm_gem.h b/drivers/gpu/drm/msm/msm_gem.h
index 7d52951..40cd0b6 100644
--- a/drivers/gpu/drm/msm/msm_gem.h
+++ b/drivers/gpu/drm/msm/msm_gem.h
@@ -35,7 +35,9 @@ struct msm_gem_address_space {
 
 struct msm_gem_vma {
 	struct drm_mm_node node;
+	struct msm_gem_address_space *aspace;
 	uint64_t iova;
+	struct list_head list;
 };
 
 struct msm_gem_object {
@@ -75,7 +77,7 @@ struct msm_gem_object {
 	struct sg_table *sgt;
 	void *vaddr;
 
-	struct msm_gem_vma domain[NUM_DOMAINS];
+	struct list_head domains;
 
 	/* normally (resv == &_resv) except for imported bo's */
 	struct reservation_object *resv;
diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c b/drivers/gpu/drm/msm/msm_gem_submit.c
index 4896765..6fd7446 100644
--- a/drivers/gpu/drm/msm/msm_gem_submit.c
+++ b/drivers/gpu/drm/msm/msm_gem_submit.c
@@ -158,7 +158,7 @@ static void submit_unlock_unpin_bo(struct msm_gem_submit *submit, int i)
 	struct msm_gem_object *msm_obj = submit->bos[i].obj;
 
 	if (submit->bos[i].flags & BO_PINNED)
-		msm_gem_put_iova(&msm_obj->base, submit->gpu->id);
+		msm_gem_put_iova(&msm_obj->base, submit->gpu->aspace);
 
 	if (submit->bos[i].flags & BO_LOCKED)
 		ww_mutex_unlock(&msm_obj->resv->lock);
@@ -246,7 +246,7 @@ static int submit_pin_objects(struct msm_gem_submit *submit)
 
 		/* if locking succeeded, pin bo: */
 		ret = msm_gem_get_iova_locked(&msm_obj->base,
-				submit->gpu->id, &iova);
+				submit->gpu->aspace, &iova);
 
 		if (ret)
 			break;
diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
index 403cca1..d336c24 100644
--- a/drivers/gpu/drm/msm/msm_gpu.c
+++ b/drivers/gpu/drm/msm/msm_gpu.c
@@ -458,7 +458,7 @@ static void retire_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
 		struct msm_gem_object *msm_obj = submit->bos[i].obj;
 		/* move to inactive: */
 		msm_gem_move_to_inactive(&msm_obj->base);
-		msm_gem_put_iova(&msm_obj->base, gpu->id);
+		msm_gem_put_iova(&msm_obj->base, gpu->aspace);
 		drm_gem_object_unreference(&msm_obj->base);
 	}
 
@@ -539,7 +539,7 @@ void msm_gpu_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit,
 		/* submit takes a reference to the bo and iova until retired: */
 		drm_gem_object_reference(&msm_obj->base);
 		msm_gem_get_iova_locked(&msm_obj->base,
-				submit->gpu->id, &iova);
+				submit->gpu->aspace, &iova);
 
 		if (submit->bos[i].flags & MSM_SUBMIT_BO_WRITE)
 			msm_gem_move_to_active(&msm_obj->base, gpu, true, submit->fence);
@@ -675,8 +675,6 @@ int msm_gpu_init(struct drm_device *drm, struct platform_device *pdev,
 	} else {
 		dev_info(drm->dev, "%s: no IOMMU, fallback to VRAM carveout!\n", name);
 	}
-	gpu->id = msm_register_address_space(drm, gpu->aspace);
-
 
 	/* Create ringbuffer: */
 	mutex_lock(&drm->struct_mutex);
@@ -707,7 +705,7 @@ void msm_gpu_cleanup(struct msm_gpu *gpu)
 
 	if (gpu->rb) {
 		if (gpu->rb_iova)
-			msm_gem_put_iova(gpu->rb->bo, gpu->id);
+			msm_gem_put_iova(gpu->rb->bo, gpu->aspace);
 		msm_ringbuffer_destroy(gpu->rb);
 	}
 
diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
index 267723f..ad6d13a 100644
--- a/drivers/gpu/drm/msm/msm_gpu.h
+++ b/drivers/gpu/drm/msm/msm_gpu.h
@@ -98,7 +98,6 @@ struct msm_gpu {
 	int irq;
 
 	struct msm_gem_address_space *aspace;
-	int id;
 
 	/* Power Control: */
 	struct regulator *gpu_reg, *gpu_cx;
diff --git a/drivers/gpu/drm/msm/msm_kms.h b/drivers/gpu/drm/msm/msm_kms.h
index e470f4c..722a250 100644
--- a/drivers/gpu/drm/msm/msm_kms.h
+++ b/drivers/gpu/drm/msm/msm_kms.h
@@ -70,6 +70,9 @@ struct msm_kms {
 
 	/* irq number to be passed on to drm_irq_install */
 	int irq;
+
+	/* mapper-id used to request GEM buffer mapped for scanout: */
+	struct msm_gem_address_space *aspace;
 };
 
 /**
-- 
1.9.1

* [PATCH 06/11] drm/msm: Add a struct to pass configuration to msm_gpu_init()
  2017-02-06 17:39 [PATCH 00/11] drm/msm: A5XX preemption Jordan Crouse
                   ` (2 preceding siblings ...)
  2017-02-06 17:39 ` [PATCH 05/11] drm/msm: get an iova from the address space instead of an id Jordan Crouse
@ 2017-02-06 17:39 ` Jordan Crouse
  2017-02-06 17:39 ` [PATCH 07/11] drm/msm: Remove memptrs->wptr Jordan Crouse
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 36+ messages in thread
From: Jordan Crouse @ 2017-02-06 17:39 UTC (permalink / raw)
  To: freedreno; +Cc: linux-arm-msm, dri-devel

The amount of information that we need to pass into msm_gpu_init()
is steadily increasing, so add a new struct to stabilize the function
call and make it easier to add new configuration down the line.
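
As a rough illustration of where this is headed (a minimal sketch, not
part of the patch; example_probe() and the "example" name are
hypothetical, while the field names match the struct added below):

    static int example_probe(struct drm_device *drm,
    		struct platform_device *pdev, struct msm_gpu *gpu,
    		const struct msm_gpu_funcs *funcs)
    {
    	struct msm_gpu_config config = { 0 };

    	config.ioname = "kgsl_3d0_reg_memory";
    	config.irqname = "kgsl_3d0_irq";
    	config.va_start = SZ_16M;
    	config.va_end = 0xffffffff;
    	config.ringsz = SZ_32K;

    	/* New knobs get added here without touching every
    	 * msm_gpu_init() call site
    	 */
    	return msm_gpu_init(drm, pdev, gpu, funcs, "example", &config);
    }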

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
---
 drivers/gpu/drm/msm/adreno/adreno_gpu.c | 12 ++++++++++--
 drivers/gpu/drm/msm/msm_gpu.c           | 13 ++++++-------
 drivers/gpu/drm/msm/msm_gpu.h           | 11 ++++++++++-
 3 files changed, 26 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
index 247f017..53f9dea 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
@@ -344,6 +344,7 @@ int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev,
 		struct adreno_gpu *adreno_gpu, const struct adreno_gpu_funcs *funcs)
 {
 	struct adreno_platform_config *config = pdev->dev.platform_data;
+	struct msm_gpu_config adreno_gpu_config  = { 0 };
 	struct msm_gpu *gpu = &adreno_gpu->base;
 	int ret;
 
@@ -364,9 +365,16 @@ int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev,
 	DBG("fast_rate=%u, slow_rate=%u, bus_freq=%u",
 			gpu->fast_rate, gpu->slow_rate, gpu->bus_freq);
 
+	adreno_gpu_config.ioname = "kgsl_3d0_reg_memory";
+	adreno_gpu_config.irqname = "kgsl_3d0_irq";
+
+	adreno_gpu_config.va_start = SZ_16M;
+	adreno_gpu_config.va_end = 0xffffffff;
+
+	adreno_gpu_config.ringsz = RB_SIZE;
+
 	ret = msm_gpu_init(drm, pdev, &adreno_gpu->base, &funcs->base,
-			adreno_gpu->info->name, "kgsl_3d0_reg_memory", "kgsl_3d0_irq",
-			RB_SIZE);
+			adreno_gpu->info->name, &adreno_gpu_config);
 	if (ret)
 		return ret;
 
diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
index d336c24..bc75425 100644
--- a/drivers/gpu/drm/msm/msm_gpu.c
+++ b/drivers/gpu/drm/msm/msm_gpu.c
@@ -570,7 +570,7 @@ static irqreturn_t irq_handler(int irq, void *data)
 
 int msm_gpu_init(struct drm_device *drm, struct platform_device *pdev,
 		struct msm_gpu *gpu, const struct msm_gpu_funcs *funcs,
-		const char *name, const char *ioname, const char *irqname, int ringsz)
+		const char *name, struct msm_gpu_config *config)
 {
 	struct iommu_domain *iommu;
 	int i, ret;
@@ -606,14 +606,14 @@ int msm_gpu_init(struct drm_device *drm, struct platform_device *pdev,
 	BUG_ON(ARRAY_SIZE(clk_names) != ARRAY_SIZE(gpu->grp_clks));
 
 	/* Map registers: */
-	gpu->mmio = msm_ioremap(pdev, ioname, name);
+	gpu->mmio = msm_ioremap(pdev, config->ioname, name);
 	if (IS_ERR(gpu->mmio)) {
 		ret = PTR_ERR(gpu->mmio);
 		goto fail;
 	}
 
 	/* Get Interrupt: */
-	gpu->irq = platform_get_irq_byname(pdev, irqname);
+	gpu->irq = platform_get_irq_byname(pdev, config->irqname);
 	if (gpu->irq < 0) {
 		ret = gpu->irq;
 		dev_err(drm->dev, "failed to get irq: %d\n", ret);
@@ -657,9 +657,8 @@ int msm_gpu_init(struct drm_device *drm, struct platform_device *pdev,
 	 */
 	iommu = iommu_domain_alloc(&platform_bus_type);
 	if (iommu) {
-		/* TODO 32b vs 64b address space.. */
-		iommu->geometry.aperture_start = SZ_16M;
-		iommu->geometry.aperture_end = 0xffffffff;
+		iommu->geometry.aperture_start = config->va_start;
+		iommu->geometry.aperture_end = config->va_end;
 
 		dev_info(drm->dev, "%s: using IOMMU\n", name);
 		gpu->aspace = msm_gem_address_space_create(&pdev->dev,
@@ -678,7 +677,7 @@ int msm_gpu_init(struct drm_device *drm, struct platform_device *pdev,
 
 	/* Create ringbuffer: */
 	mutex_lock(&drm->struct_mutex);
-	gpu->rb = msm_ringbuffer_new(gpu, ringsz);
+	gpu->rb = msm_ringbuffer_new(gpu, config->ringsz);
 	mutex_unlock(&drm->struct_mutex);
 	if (IS_ERR(gpu->rb)) {
 		ret = PTR_ERR(gpu->rb);
diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
index ad6d13a..cc6530f 100644
--- a/drivers/gpu/drm/msm/msm_gpu.h
+++ b/drivers/gpu/drm/msm/msm_gpu.h
@@ -28,6 +28,14 @@
 struct msm_gem_submit;
 struct msm_gpu_perfcntr;
 
+struct msm_gpu_config {
+	const char *ioname;
+	const char *irqname;
+	uint64_t va_start;
+	uint64_t va_end;
+	unsigned int ringsz;
+};
+
 /* So far, with hardware that I've seen to date, we can have:
  *  + zero, one, or two z180 2d cores
  *  + a3xx or a2xx 3d core, which share a common CP (the firmware
@@ -205,7 +213,8 @@ void msm_gpu_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit,
 
 int msm_gpu_init(struct drm_device *drm, struct platform_device *pdev,
 		struct msm_gpu *gpu, const struct msm_gpu_funcs *funcs,
-		const char *name, const char *ioname, const char *irqname, int ringsz);
+		const char *name, struct msm_gpu_config *config);
+
 void msm_gpu_cleanup(struct msm_gpu *gpu);
 
 struct msm_gpu *adreno_load_gpu(struct drm_device *dev);
-- 
1.9.1

* [PATCH 07/11] drm/msm: Remove memptrs->wptr
  2017-02-06 17:39 [PATCH 00/11] drm/msm: A5XX preemption Jordan Crouse
                   ` (3 preceding siblings ...)
  2017-02-06 17:39 ` [PATCH 06/11] drm/msm: Add a struct to pass configuration to msm_gpu_init() Jordan Crouse
@ 2017-02-06 17:39 ` Jordan Crouse
  2017-02-06 17:39 ` [PATCH 08/11] drm/msm: Support multiple ringbuffers Jordan Crouse
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 36+ messages in thread
From: Jordan Crouse @ 2017-02-06 17:39 UTC (permalink / raw)
  To: freedreno; +Cc: linux-arm-msm, dri-devel

memptrs->wptr seems to be unused. Remove it to avoid
confusing the upcoming preemption code.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
---
 drivers/gpu/drm/msm/adreno/adreno_gpu.c | 3 ---
 drivers/gpu/drm/msm/adreno/adreno_gpu.h | 1 -
 2 files changed, 4 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
index 53f9dea..4c3e9b3 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
@@ -123,7 +123,6 @@ void adreno_recover(struct msm_gpu *gpu)
 	/* reset completed fence seqno: */
 	adreno_gpu->memptrs->fence = gpu->fctx->completed_fence;
 	adreno_gpu->memptrs->rptr  = 0;
-	adreno_gpu->memptrs->wptr  = 0;
 
 	gpu->funcs->pm_resume(gpu);
 
@@ -256,7 +255,6 @@ void adreno_show(struct msm_gpu *gpu, struct seq_file *m)
 	seq_printf(m, "fence:    %d/%d\n", adreno_gpu->memptrs->fence,
 			gpu->fctx->last_fence);
 	seq_printf(m, "rptr:     %d\n", get_rptr(adreno_gpu));
-	seq_printf(m, "wptr:     %d\n", adreno_gpu->memptrs->wptr);
 	seq_printf(m, "rb wptr:  %d\n", get_wptr(gpu->rb));
 
 	gpu->funcs->pm_resume(gpu);
@@ -296,7 +294,6 @@ void adreno_dump_info(struct msm_gpu *gpu)
 	printk("fence:    %d/%d\n", adreno_gpu->memptrs->fence,
 			gpu->fctx->last_fence);
 	printk("rptr:     %d\n", get_rptr(adreno_gpu));
-	printk("wptr:     %d\n", adreno_gpu->memptrs->wptr);
 	printk("rb wptr:  %d\n", get_wptr(gpu->rb));
 }
 
diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.h b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
index e8d55b0..fdf4ef3 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.h
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
@@ -85,7 +85,6 @@ struct adreno_info {
 
 struct adreno_rbmemptrs {
 	volatile uint32_t rptr;
-	volatile uint32_t wptr;
 	volatile uint32_t fence;
 };
 
-- 
1.9.1

* [PATCH 08/11] drm/msm: Support multiple ringbuffers
  2017-02-06 17:39 [PATCH 00/11] drm/msm: A5XX preemption Jordan Crouse
                   ` (4 preceding siblings ...)
  2017-02-06 17:39 ` [PATCH 07/11] drm/msm: Remove memptrs->wptr Jordan Crouse
@ 2017-02-06 17:39 ` Jordan Crouse
  2017-02-06 17:39 ` [PATCH 09/11] drm/msm: Shadow current pointer in the ring until command is complete Jordan Crouse
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 36+ messages in thread
From: Jordan Crouse @ 2017-02-06 17:39 UTC (permalink / raw)
  To: freedreno; +Cc: linux-arm-msm, dri-devel

Add the infrastructure to support the idea of multiple ringbuffers.
Assign each ringbuffer an id and use that as an index for the various
ring specific operations.

The biggest delta is to support legacy fences. Each fence gets its own
sequence number but the legacy functions expect to use a unique integer.
To handle this we return a unique identifier for each submission but
map it to a specific ring/sequence under the covers. Newer users use
a dma_fence pointer anyway so they don't care about the actual sequence
ID or ring.

The actual mechanics for multiple ringbuffers are very target specific,
so this code just allows for the possibility but still only defines
one ringbuffer for each target family.
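
For reference, the two moving parts boil down to this (a condensed
sketch of the diff below, with locking and error handling omitted):

    /* the submit path picks the target ring from the new flag bits: */
    uint32_t n = (args->flags & MSM_SUBMIT_RING_MASK) >>
    		MSM_SUBMIT_RING_SHIFT;
    submit->ring = gpu->rb[clamp_t(uint32_t, n, 0, gpu->nr_rings - 1)];

    /* each fence carries a global fence_id for the legacy ioctls,
     * hashed back to the (ring, seqno) pair that actually signals:
     */
    f->fence_id = ++fctx->fence_id;
    hash_add(fctx->hash, &f->node, f->fence_id);
    dma_fence_init(&f->base, &msm_fence_ops, &fctx->spinlock,
    		fctx->context + ring->id, ++ring->last_fence);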

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
---
 drivers/gpu/drm/msm/adreno/a3xx_gpu.c   |   9 +-
 drivers/gpu/drm/msm/adreno/a4xx_gpu.c   |   9 +-
 drivers/gpu/drm/msm/adreno/a5xx_gpu.c   |  45 +++++-----
 drivers/gpu/drm/msm/adreno/a5xx_gpu.h   |   2 +-
 drivers/gpu/drm/msm/adreno/a5xx_power.c |   6 +-
 drivers/gpu/drm/msm/adreno/adreno_gpu.c | 155 +++++++++++++++++++++-----------
 drivers/gpu/drm/msm/adreno/adreno_gpu.h |  36 +++++---
 drivers/gpu/drm/msm/msm_drv.h           |   2 +
 drivers/gpu/drm/msm/msm_fence.c         |  85 +++++++++++++-----
 drivers/gpu/drm/msm/msm_fence.h         |  13 +--
 drivers/gpu/drm/msm/msm_gem.h           |   1 +
 drivers/gpu/drm/msm/msm_gem_submit.c    |  10 ++-
 drivers/gpu/drm/msm/msm_gpu.c           | 122 ++++++++++++++++---------
 drivers/gpu/drm/msm/msm_gpu.h           |  38 ++++++--
 drivers/gpu/drm/msm/msm_ringbuffer.c    |  13 ++-
 drivers/gpu/drm/msm/msm_ringbuffer.h    |   8 +-
 include/uapi/drm/msm_drm.h              |   5 ++
 17 files changed, 373 insertions(+), 186 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a3xx_gpu.c b/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
index fc4fd2d..2f72848 100644
--- a/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
@@ -44,7 +44,7 @@
 
 static bool a3xx_me_init(struct msm_gpu *gpu)
 {
-	struct msm_ringbuffer *ring = gpu->rb;
+	struct msm_ringbuffer *ring = gpu->rb[0];
 
 	OUT_PKT3(ring, CP_ME_INIT, 17);
 	OUT_RING(ring, 0x000003f7);
@@ -65,7 +65,7 @@ static bool a3xx_me_init(struct msm_gpu *gpu)
 	OUT_RING(ring, 0x00000000);
 	OUT_RING(ring, 0x00000000);
 
-	gpu->funcs->flush(gpu);
+	gpu->funcs->flush(gpu, ring);
 	return a3xx_idle(gpu);
 }
 
@@ -339,7 +339,7 @@ static void a3xx_destroy(struct msm_gpu *gpu)
 static bool a3xx_idle(struct msm_gpu *gpu)
 {
 	/* wait for ringbuffer to drain: */
-	if (!adreno_idle(gpu))
+	if (!adreno_idle(gpu, gpu->rb[0]))
 		return false;
 
 	/* then wait for GPU to finish: */
@@ -449,6 +449,7 @@ static void a3xx_dump(struct msm_gpu *gpu)
 		.last_fence = adreno_last_fence,
 		.submit = adreno_submit,
 		.flush = adreno_flush,
+		.active_ring = adreno_active_ring,
 		.irq = a3xx_irq,
 		.destroy = a3xx_destroy,
 #ifdef CONFIG_DEBUG_FS
@@ -496,7 +497,7 @@ struct msm_gpu *a3xx_gpu_init(struct drm_device *dev)
 	adreno_gpu->registers = a3xx_registers;
 	adreno_gpu->reg_offsets = a3xx_register_offsets;
 
-	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs);
+	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1);
 	if (ret)
 		goto fail;
 
diff --git a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
index 6bc948b..bdd2a24 100644
--- a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
@@ -116,7 +116,7 @@ static void a4xx_enable_hwcg(struct msm_gpu *gpu)
 
 static bool a4xx_me_init(struct msm_gpu *gpu)
 {
-	struct msm_ringbuffer *ring = gpu->rb;
+	struct msm_ringbuffer *ring = gpu->rb[0];
 
 	OUT_PKT3(ring, CP_ME_INIT, 17);
 	OUT_RING(ring, 0x000003f7);
@@ -137,7 +137,7 @@ static bool a4xx_me_init(struct msm_gpu *gpu)
 	OUT_RING(ring, 0x00000000);
 	OUT_RING(ring, 0x00000000);
 
-	gpu->funcs->flush(gpu);
+	gpu->funcs->flush(gpu, ring);
 	return a4xx_idle(gpu);
 }
 
@@ -337,7 +337,7 @@ static void a4xx_destroy(struct msm_gpu *gpu)
 static bool a4xx_idle(struct msm_gpu *gpu)
 {
 	/* wait for ringbuffer to drain: */
-	if (!adreno_idle(gpu))
+	if (!adreno_idle(gpu, gpu->rb[0]))
 		return false;
 
 	/* then wait for GPU to finish: */
@@ -539,6 +539,7 @@ static int a4xx_get_timestamp(struct msm_gpu *gpu, uint64_t *value)
 		.last_fence = adreno_last_fence,
 		.submit = adreno_submit,
 		.flush = adreno_flush,
+		.active_ring = adreno_active_ring,
 		.irq = a4xx_irq,
 		.destroy = a4xx_destroy,
 #ifdef CONFIG_DEBUG_FS
@@ -580,7 +581,7 @@ struct msm_gpu *a4xx_gpu_init(struct drm_device *dev)
 	adreno_gpu->registers = a4xx_registers;
 	adreno_gpu->reg_offsets = a4xx_register_offsets;
 
-	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs);
+	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1);
 	if (ret)
 		goto fail;
 
diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
index 546280c..5f02ff3 100644
--- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
@@ -189,7 +189,7 @@ static void a5xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit,
 {
 	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
 	struct msm_drm_private *priv = gpu->dev->dev_private;
-	struct msm_ringbuffer *ring = gpu->rb;
+	struct msm_ringbuffer *ring = submit->ring;
 	unsigned int i, ibs = 0;
 
 	for (i = 0; i < submit->nr_cmds; i++) {
@@ -214,11 +214,11 @@ static void a5xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit,
 
 	OUT_PKT7(ring, CP_EVENT_WRITE, 4);
 	OUT_RING(ring, CACHE_FLUSH_TS | (1 << 31));
-	OUT_RING(ring, lower_32_bits(rbmemptr(adreno_gpu, fence)));
-	OUT_RING(ring, upper_32_bits(rbmemptr(adreno_gpu, fence)));
+	OUT_RING(ring, lower_32_bits(rbmemptr(adreno_gpu, ring->id, fence)));
+	OUT_RING(ring, upper_32_bits(rbmemptr(adreno_gpu, ring->id, fence)));
 	OUT_RING(ring, submit->fence->seqno);
 
-	gpu->funcs->flush(gpu);
+	gpu->funcs->flush(gpu, ring);
 }
 
 struct a5xx_hwcg {
@@ -358,7 +358,7 @@ static void a5xx_enable_hwcg(struct msm_gpu *gpu)
 static int a5xx_me_init(struct msm_gpu *gpu)
 {
 	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
-	struct msm_ringbuffer *ring = gpu->rb;
+	struct msm_ringbuffer *ring = gpu->rb[0];
 
 	OUT_PKT7(ring, CP_ME_INIT, 8);
 
@@ -389,9 +389,8 @@ static int a5xx_me_init(struct msm_gpu *gpu)
 	OUT_RING(ring, 0x00000000);
 	OUT_RING(ring, 0x00000000);
 
-	gpu->funcs->flush(gpu);
-
-	return a5xx_idle(gpu) ? 0 : -EINVAL;
+	gpu->funcs->flush(gpu, ring);
+	return a5xx_idle(gpu, ring) ? 0 : -EINVAL;
 }
 
 static struct drm_gem_object *a5xx_ucode_load_bo(struct msm_gpu *gpu,
@@ -695,11 +694,11 @@ static int a5xx_hw_init(struct msm_gpu *gpu)
 	 * ticking correctly
 	 */
 	if (adreno_is_a530(adreno_gpu)) {
-		OUT_PKT7(gpu->rb, CP_EVENT_WRITE, 1);
-		OUT_RING(gpu->rb, 0x0F);
+		OUT_PKT7(gpu->rb[0], CP_EVENT_WRITE, 1);
+		OUT_RING(gpu->rb[0], 0x0F);
 
-		gpu->funcs->flush(gpu);
-		if (!a5xx_idle(gpu))
+		gpu->funcs->flush(gpu, gpu->rb[0]);
+		if (!a5xx_idle(gpu, gpu->rb[0]))
 			return -EINVAL;
 	}
 
@@ -712,11 +711,11 @@ static int a5xx_hw_init(struct msm_gpu *gpu)
 	 */
 	ret = a5xx_zap_shader_init(gpu);
 	if (!ret) {
-		OUT_PKT7(gpu->rb, CP_SET_SECURE_MODE, 1);
-		OUT_RING(gpu->rb, 0x00000000);
+		OUT_PKT7(gpu->rb[0], CP_SET_SECURE_MODE, 1);
+		OUT_RING(gpu->rb[0], 0x00000000);
 
-		gpu->funcs->flush(gpu);
-		if (!a5xx_idle(gpu))
+		gpu->funcs->flush(gpu, gpu->rb[0]);
+		if (!a5xx_idle(gpu, gpu->rb[0]))
 			return -EINVAL;
 	} else {
 		/* Print a warning so if we die, we know why */
@@ -790,15 +789,18 @@ static inline bool _a5xx_check_idle(struct msm_gpu *gpu)
 		A5XX_RBBM_INT_0_MASK_MISC_HANG_DETECT);
 }
 
-bool a5xx_idle(struct msm_gpu *gpu)
+bool a5xx_idle(struct msm_gpu *gpu, struct msm_ringbuffer *ring)
 {
 	/* wait for CP to drain ringbuffer: */
-	if (!adreno_idle(gpu))
+	if (!adreno_idle(gpu, ring))
 		return false;
 
 	if (spin_until(_a5xx_check_idle(gpu))) {
-		DRM_ERROR("%s: %ps: timeout waiting for GPU to idle: status %8.8X irq %8.8X\n",
-			gpu->name, __builtin_return_address(0),
+		DRM_ERROR(
+			"%s: timeout waiting for GPU RB %d to idle: rptr/wptr: %4.4X/%4.4X status %8.8X irq %8.8X\n",
+			gpu->name, ring->id,
+			gpu_read(gpu, REG_A5XX_CP_RB_RPTR),
+			gpu_read(gpu, REG_A5XX_CP_RB_WPTR),
 			gpu_read(gpu, REG_A5XX_RBBM_STATUS),
 			gpu_read(gpu, REG_A5XX_RBBM_INT_0_STATUS));
 
@@ -1091,6 +1093,7 @@ static void a5xx_show(struct msm_gpu *gpu, struct seq_file *m)
 		.last_fence = adreno_last_fence,
 		.submit = a5xx_submit,
 		.flush = adreno_flush,
+		.active_ring = adreno_active_ring,
 		.irq = a5xx_irq,
 		.destroy = a5xx_destroy,
 		.show = a5xx_show,
@@ -1125,7 +1128,7 @@ struct msm_gpu *a5xx_gpu_init(struct drm_device *dev)
 
 	a5xx_gpu->lm_leakage = 0x4E001A;
 
-	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs);
+	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1);
 	if (ret) {
 		a5xx_destroy(&(a5xx_gpu->base.base));
 		return ERR_PTR(ret);
diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.h b/drivers/gpu/drm/msm/adreno/a5xx_gpu.h
index 6b20f28..405b563 100644
--- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.h
+++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.h
@@ -56,6 +56,6 @@ static inline int spin_usecs(struct msm_gpu *gpu, uint32_t usecs,
 	return -ETIMEDOUT;
 }
 
-bool a5xx_idle(struct msm_gpu *gpu);
+bool a5xx_idle(struct msm_gpu *gpu, struct msm_ringbuffer *ring);
 
 #endif /* __A5XX_GPU_H__ */
diff --git a/drivers/gpu/drm/msm/adreno/a5xx_power.c b/drivers/gpu/drm/msm/adreno/a5xx_power.c
index 2fdee44..a7d91ac 100644
--- a/drivers/gpu/drm/msm/adreno/a5xx_power.c
+++ b/drivers/gpu/drm/msm/adreno/a5xx_power.c
@@ -173,7 +173,7 @@ static int a5xx_gpmu_init(struct msm_gpu *gpu)
 {
 	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
 	struct a5xx_gpu *a5xx_gpu = to_a5xx_gpu(adreno_gpu);
-	struct msm_ringbuffer *ring = gpu->rb;
+	struct msm_ringbuffer *ring = gpu->rb[0];
 
 	if (!a5xx_gpu->gpmu_dwords)
 		return 0;
@@ -192,9 +192,9 @@ static int a5xx_gpmu_init(struct msm_gpu *gpu)
 	OUT_PKT7(ring, CP_SET_PROTECTED_MODE, 1);
 	OUT_RING(ring, 1);
 
-	gpu->funcs->flush(gpu);
+	gpu->funcs->flush(gpu, ring);
 
-	if (!a5xx_idle(gpu)) {
+	if (!a5xx_idle(gpu, ring)) {
 		DRM_ERROR("%s: Unable to load GPMU firmware. GPMU will not be active\n",
 			gpu->name);
 		return -EINVAL;
diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
index 4c3e9b3..f5c2bad 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
@@ -21,7 +21,6 @@
 #include "msm_gem.h"
 #include "msm_mmu.h"
 
-#define RB_SIZE    SZ_32K
 #define RB_BLKSIZE 32
 
 int adreno_get_param(struct msm_gpu *gpu, uint32_t param, uint64_t *value)
@@ -57,32 +56,36 @@ int adreno_get_param(struct msm_gpu *gpu, uint32_t param, uint64_t *value)
 int adreno_hw_init(struct msm_gpu *gpu)
 {
 	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
-	int ret;
+	int i;
 
 	DBG("%s", gpu->name);
 
-	ret = msm_gem_get_iova(gpu->rb->bo, gpu->aspace, &gpu->rb_iova);
-	if (ret) {
-		gpu->rb_iova = 0;
-		dev_err(gpu->dev->dev, "could not map ringbuffer: %d\n", ret);
-		return ret;
+	for (i = 0; i < gpu->nr_rings; i++) {
+		int ret = msm_gem_get_iova(gpu->rb[i]->bo, gpu->aspace,
+			&gpu->rb[i]->iova);
+		if (ret) {
+			gpu->rb[i]->iova = 0;
+			dev_err(gpu->dev->dev,
+				"could not map ringbuffer %d: %d\n", i, ret);
+			return ret;
+		}
 	}
 
 	/* Setup REG_CP_RB_CNTL: */
 	adreno_gpu_write(adreno_gpu, REG_ADRENO_CP_RB_CNTL,
-			/* size is log2(quad-words): */
-			AXXX_CP_RB_CNTL_BUFSZ(ilog2(gpu->rb->size / 8)) |
-			AXXX_CP_RB_CNTL_BLKSZ(ilog2(RB_BLKSIZE / 8)) |
-			(adreno_is_a430(adreno_gpu) ? AXXX_CP_RB_CNTL_NO_UPDATE : 0));
+		/* size is log2(quad-words): */
+		AXXX_CP_RB_CNTL_BUFSZ(ilog2(MSM_GPU_RINGBUFFER_SZ / 8)) |
+		AXXX_CP_RB_CNTL_BLKSZ(ilog2(RB_BLKSIZE / 8)) |
+		(adreno_is_a430(adreno_gpu) ? AXXX_CP_RB_CNTL_NO_UPDATE : 0));
 
-	/* Setup ringbuffer address: */
+	/* Setup ringbuffer address - use ringbuffer[0] for GPU init */
 	adreno_gpu_write64(adreno_gpu, REG_ADRENO_CP_RB_BASE,
-		REG_ADRENO_CP_RB_BASE_HI, gpu->rb_iova);
+		REG_ADRENO_CP_RB_BASE_HI, gpu->rb[0]->iova);
 
 	if (!adreno_is_a430(adreno_gpu)) {
 		adreno_gpu_write64(adreno_gpu, REG_ADRENO_CP_RB_RPTR_ADDR,
 			REG_ADRENO_CP_RB_RPTR_ADDR_HI,
-			rbmemptr(adreno_gpu, rptr));
+			rbmemptr(adreno_gpu, 0, rptr));
 	}
 
 	return 0;
@@ -94,35 +97,58 @@ static uint32_t get_wptr(struct msm_ringbuffer *ring)
 }
 
 /* Use this helper to read rptr, since a430 doesn't update rptr in memory */
-static uint32_t get_rptr(struct adreno_gpu *adreno_gpu)
+static uint32_t get_rptr(struct adreno_gpu *adreno_gpu,
+		struct msm_ringbuffer *ring)
 {
-	if (adreno_is_a430(adreno_gpu))
-		return adreno_gpu->memptrs->rptr = adreno_gpu_read(
+	if (adreno_is_a430(adreno_gpu)) {
+		/*
+		 * If index is anything but 0 this will probably break horribly,
+		 * but I think that we have enough infrastructure in place to
+		 * ensure that it won't be. If not then this is why your
+		 * a430 stopped working.
+		 */
+		return adreno_gpu->memptrs->rptr[ring->id] = adreno_gpu_read(
 			adreno_gpu, REG_ADRENO_CP_RB_RPTR);
-	else
-		return adreno_gpu->memptrs->rptr;
+	} else
+		return adreno_gpu->memptrs->rptr[ring->id];
 }
 
-uint32_t adreno_last_fence(struct msm_gpu *gpu)
+struct msm_ringbuffer *adreno_active_ring(struct msm_gpu *gpu)
+{
+	return gpu->rb[0];
+}
+
+uint32_t adreno_last_fence(struct msm_gpu *gpu, struct msm_ringbuffer *ring)
 {
 	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
-	return adreno_gpu->memptrs->fence;
+
+	if (!ring)
+		return 0;
+
+	return adreno_gpu->memptrs->fence[ring->id];
 }
 
 void adreno_recover(struct msm_gpu *gpu)
 {
 	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
 	struct drm_device *dev = gpu->dev;
-	int ret;
+	struct msm_ringbuffer *ring;
+	int ret, i;
 
 	gpu->funcs->pm_suspend(gpu);
 
-	/* reset ringbuffer: */
-	gpu->rb->cur = gpu->rb->start;
+	/* reset ringbuffer(s): */
+
+	FOR_EACH_RING(gpu, ring, i) {
+		if (!ring)
+			continue;
+
+		ring->cur = ring->start;
 
-	/* reset completed fence seqno: */
-	adreno_gpu->memptrs->fence = gpu->fctx->completed_fence;
-	adreno_gpu->memptrs->rptr  = 0;
+		/* reset completed fence seqno, discard anything pending: */
+		adreno_gpu->memptrs->fence[ring->id] = ring->completed_fence;
+		adreno_gpu->memptrs->rptr[ring->id]  = 0;
+	}
 
 	gpu->funcs->pm_resume(gpu);
 
@@ -140,7 +166,7 @@ void adreno_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit,
 {
 	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
 	struct msm_drm_private *priv = gpu->dev->dev_private;
-	struct msm_ringbuffer *ring = gpu->rb;
+	struct msm_ringbuffer *ring = submit->ring;
 	unsigned i;
 
 	for (i = 0; i < submit->nr_cmds; i++) {
@@ -179,7 +205,7 @@ void adreno_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit,
 
 	OUT_PKT3(ring, CP_EVENT_WRITE, 3);
 	OUT_RING(ring, CACHE_FLUSH_TS);
-	OUT_RING(ring, rbmemptr(adreno_gpu, fence));
+	OUT_RING(ring, rbmemptr(adreno_gpu, ring->id, fence));
 	OUT_RING(ring, submit->fence->seqno);
 
 	/* we could maybe be clever and only CP_COND_EXEC the interrupt: */
@@ -206,10 +232,10 @@ void adreno_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit,
 	}
 #endif
 
-	gpu->funcs->flush(gpu);
+	gpu->funcs->flush(gpu, ring);
 }
 
-void adreno_flush(struct msm_gpu *gpu)
+void adreno_flush(struct msm_gpu *gpu, struct msm_ringbuffer *ring)
 {
 	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
 	uint32_t wptr;
@@ -219,7 +245,7 @@ void adreno_flush(struct msm_gpu *gpu)
 	 * to account for the possibility that the last command fit exactly into
 	 * the ringbuffer and rb->next hasn't wrapped to zero yet
 	 */
-	wptr = get_wptr(gpu->rb) & ((gpu->rb->size / 4) - 1);
+	wptr = get_wptr(ring) % (MSM_GPU_RINGBUFFER_SZ >> 2);
 
 	/* ensure writes to ringbuffer have hit system memory: */
 	mb();
@@ -227,17 +253,18 @@ void adreno_flush(struct msm_gpu *gpu)
 	adreno_gpu_write(adreno_gpu, REG_ADRENO_CP_RB_WPTR, wptr);
 }
 
-bool adreno_idle(struct msm_gpu *gpu)
+bool adreno_idle(struct msm_gpu *gpu, struct msm_ringbuffer *ring)
 {
 	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
-	uint32_t wptr = get_wptr(gpu->rb);
+	uint32_t wptr = get_wptr(ring);
 
 	/* wait for CP to drain ringbuffer: */
-	if (!spin_until(get_rptr(adreno_gpu) == wptr))
+	if (!spin_until(get_rptr(adreno_gpu, ring) == wptr))
 		return true;
 
 	/* TODO maybe we need to reset GPU here to recover from hang? */
-	DRM_ERROR("%s: timeout waiting to drain ringbuffer!\n", gpu->name);
+	DRM_ERROR("%s: timeout waiting to drain ringbuffer %d!\n", gpu->name,
+		ring->id);
 	return false;
 }
 
@@ -245,6 +272,7 @@ bool adreno_idle(struct msm_gpu *gpu)
 void adreno_show(struct msm_gpu *gpu, struct seq_file *m)
 {
 	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
+	struct msm_ringbuffer *ring;
 	int i;
 
 	seq_printf(m, "revision: %d (%d.%d.%d.%d)\n",
@@ -252,10 +280,18 @@ void adreno_show(struct msm_gpu *gpu, struct seq_file *m)
 			adreno_gpu->rev.major, adreno_gpu->rev.minor,
 			adreno_gpu->rev.patchid);
 
-	seq_printf(m, "fence:    %d/%d\n", adreno_gpu->memptrs->fence,
-			gpu->fctx->last_fence);
-	seq_printf(m, "rptr:     %d\n", get_rptr(adreno_gpu));
-	seq_printf(m, "rb wptr:  %d\n", get_wptr(gpu->rb));
+	FOR_EACH_RING(gpu, ring, i) {
+		if (!ring)
+			continue;
+
+		seq_printf(m, "rb %d: fence:    %d/%d\n", i,
+			adreno_last_fence(gpu, ring),
+			ring->completed_fence);
+
+		seq_printf(m, "      rptr:     %d\n",
+			get_rptr(adreno_gpu, ring));
+		seq_printf(m, "rb wptr:  %d\n", get_wptr(ring));
+	}
 
 	gpu->funcs->pm_resume(gpu);
 
@@ -285,16 +321,25 @@ void adreno_show(struct msm_gpu *gpu, struct seq_file *m)
 void adreno_dump_info(struct msm_gpu *gpu)
 {
 	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
+	struct msm_ringbuffer *ring;
+	int i;
 
 	printk("revision: %d (%d.%d.%d.%d)\n",
 			adreno_gpu->info->revn, adreno_gpu->rev.core,
 			adreno_gpu->rev.major, adreno_gpu->rev.minor,
 			adreno_gpu->rev.patchid);
 
-	printk("fence:    %d/%d\n", adreno_gpu->memptrs->fence,
-			gpu->fctx->last_fence);
-	printk("rptr:     %d\n", get_rptr(adreno_gpu));
-	printk("rb wptr:  %d\n", get_wptr(gpu->rb));
+	FOR_EACH_RING(gpu, ring, i) {
+		if (!ring)
+			continue;
+
+		printk("rb %d: fence:    %d/%d\n", i,
+			adreno_last_fence(gpu, ring),
+			ring->completed_fence);
+
+		printk("rptr:     %d\n", get_rptr(adreno_gpu, ring));
+		printk("rb wptr:  %d\n", get_wptr(ring));
+	}
 }
 
 /* would be nice to not have to duplicate the _show() stuff with printk(): */
@@ -317,19 +362,20 @@ void adreno_dump(struct msm_gpu *gpu)
 	}
 }
 
-static uint32_t ring_freewords(struct msm_gpu *gpu)
+static uint32_t ring_freewords(struct msm_ringbuffer *ring)
 {
-	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
-	uint32_t size = gpu->rb->size / 4;
-	uint32_t wptr = get_wptr(gpu->rb);
-	uint32_t rptr = get_rptr(adreno_gpu);
+	struct adreno_gpu *adreno_gpu = to_adreno_gpu(ring->gpu);
+	uint32_t size = MSM_GPU_RINGBUFFER_SZ >> 2;
+	uint32_t wptr = get_wptr(ring);
+	uint32_t rptr = get_rptr(adreno_gpu, ring);
 	return (rptr + (size - 1) - wptr) % size;
 }
 
-void adreno_wait_ring(struct msm_gpu *gpu, uint32_t ndwords)
+void adreno_wait_ring(struct msm_ringbuffer *ring, uint32_t ndwords)
 {
-	if (spin_until(ring_freewords(gpu) >= ndwords))
-		DRM_ERROR("%s: timeout waiting for ringbuffer space\n", gpu->name);
+	if (spin_until(ring_freewords(ring) >= ndwords))
+		DRM_ERROR("%s: timeout waiting for space in ringbuffer %d\n",
+			ring->gpu->name, ring->id);
 }
 
 static const char *iommu_ports[] = {
@@ -338,7 +384,8 @@ void adreno_wait_ring(struct msm_gpu *gpu, uint32_t ndwords)
 };
 
 int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev,
-		struct adreno_gpu *adreno_gpu, const struct adreno_gpu_funcs *funcs)
+		struct adreno_gpu *adreno_gpu,
+		const struct adreno_gpu_funcs *funcs, int nr_rings)
 {
 	struct adreno_platform_config *config = pdev->dev.platform_data;
 	struct msm_gpu_config adreno_gpu_config  = { 0 };
@@ -368,7 +415,7 @@ int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev,
 	adreno_gpu_config.va_start = SZ_16M;
 	adreno_gpu_config.va_end = 0xffffffff;
 
-	adreno_gpu_config.ringsz = RB_SIZE;
+	adreno_gpu_config.nr_rings = nr_rings;
 
 	ret = msm_gpu_init(drm, pdev, &adreno_gpu->base, &funcs->base,
 			adreno_gpu->info->name, &adreno_gpu_config);
diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.h b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
index fdf4ef3..f5118ad 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.h
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
@@ -80,12 +80,18 @@ struct adreno_info {
 
 const struct adreno_info *adreno_info(struct adreno_rev rev);
 
-#define rbmemptr(adreno_gpu, member)  \
+#define _sizeof(member) \
+	sizeof(((struct adreno_rbmemptrs *) 0)->member[0])
+
+#define _base(adreno_gpu, member)  \
 	((adreno_gpu)->memptrs_iova + offsetof(struct adreno_rbmemptrs, member))
 
+#define rbmemptr(adreno_gpu, index, member) \
+	(_base((adreno_gpu), member) + ((index) * _sizeof(member)))
+
 struct adreno_rbmemptrs {
-	volatile uint32_t rptr;
-	volatile uint32_t fence;
+	volatile uint32_t rptr[MSM_GPU_MAX_RINGS];
+	volatile uint32_t fence[MSM_GPU_MAX_RINGS];
 };
 
 struct adreno_gpu {
@@ -198,21 +204,25 @@ static inline int adreno_is_a530(struct adreno_gpu *gpu)
 
 int adreno_get_param(struct msm_gpu *gpu, uint32_t param, uint64_t *value);
 int adreno_hw_init(struct msm_gpu *gpu);
-uint32_t adreno_last_fence(struct msm_gpu *gpu);
+uint32_t adreno_last_fence(struct msm_gpu *gpu, struct msm_ringbuffer *ring);
+uint32_t adreno_submitted_fence(struct msm_gpu *gpu,
+		struct msm_ringbuffer *ring);
 void adreno_recover(struct msm_gpu *gpu);
 void adreno_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit,
 		struct msm_file_private *ctx);
-void adreno_flush(struct msm_gpu *gpu);
-bool adreno_idle(struct msm_gpu *gpu);
+void adreno_flush(struct msm_gpu *gpu, struct msm_ringbuffer *ring);
+bool adreno_idle(struct msm_gpu *gpu, struct msm_ringbuffer *ring);
 #ifdef CONFIG_DEBUG_FS
 void adreno_show(struct msm_gpu *gpu, struct seq_file *m);
 #endif
 void adreno_dump_info(struct msm_gpu *gpu);
 void adreno_dump(struct msm_gpu *gpu);
-void adreno_wait_ring(struct msm_gpu *gpu, uint32_t ndwords);
+void adreno_wait_ring(struct msm_ringbuffer *ring, uint32_t ndwords);
+struct msm_ringbuffer *adreno_active_ring(struct msm_gpu *gpu);
 
 int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev,
-		struct adreno_gpu *gpu, const struct adreno_gpu_funcs *funcs);
+		struct adreno_gpu *gpu, const struct adreno_gpu_funcs *funcs,
+		int nr_rings);
 void adreno_gpu_cleanup(struct adreno_gpu *gpu);
 
 
@@ -221,7 +231,7 @@ int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev,
 static inline void
 OUT_PKT0(struct msm_ringbuffer *ring, uint16_t regindx, uint16_t cnt)
 {
-	adreno_wait_ring(ring->gpu, cnt+1);
+	adreno_wait_ring(ring, cnt+1);
 	OUT_RING(ring, CP_TYPE0_PKT | ((cnt-1) << 16) | (regindx & 0x7FFF));
 }
 
@@ -229,14 +239,14 @@ int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev,
 static inline void
 OUT_PKT2(struct msm_ringbuffer *ring)
 {
-	adreno_wait_ring(ring->gpu, 1);
+	adreno_wait_ring(ring, 1);
 	OUT_RING(ring, CP_TYPE2_PKT);
 }
 
 static inline void
 OUT_PKT3(struct msm_ringbuffer *ring, uint8_t opcode, uint16_t cnt)
 {
-	adreno_wait_ring(ring->gpu, cnt+1);
+	adreno_wait_ring(ring, cnt+1);
 	OUT_RING(ring, CP_TYPE3_PKT | ((cnt-1) << 16) | ((opcode & 0xFF) << 8));
 }
 
@@ -258,14 +268,14 @@ static inline u32 PM4_PARITY(u32 val)
 static inline void
 OUT_PKT4(struct msm_ringbuffer *ring, uint16_t regindx, uint16_t cnt)
 {
-	adreno_wait_ring(ring->gpu, cnt + 1);
+	adreno_wait_ring(ring, cnt + 1);
 	OUT_RING(ring, PKT4(regindx, cnt));
 }
 
 static inline void
 OUT_PKT7(struct msm_ringbuffer *ring, uint8_t opcode, uint16_t cnt)
 {
-	adreno_wait_ring(ring->gpu, cnt + 1);
+	adreno_wait_ring(ring, cnt + 1);
 	OUT_RING(ring, CP_TYPE7_PKT | (cnt << 0) | (PM4_PARITY(cnt) << 15) |
 		((opcode & 0x7F) << 16) | (PM4_PARITY(opcode) << 23));
 }
diff --git a/drivers/gpu/drm/msm/msm_drv.h b/drivers/gpu/drm/msm/msm_drv.h
index 0c1a49e..7ff7a83 100644
--- a/drivers/gpu/drm/msm/msm_drv.h
+++ b/drivers/gpu/drm/msm/msm_drv.h
@@ -78,6 +78,8 @@ struct msm_vblank_ctrl {
 	spinlock_t lock;
 };
 
+#define MSM_GPU_MAX_RINGS 1
+
 struct msm_drm_private {
 
 	struct drm_device *dev;
diff --git a/drivers/gpu/drm/msm/msm_fence.c b/drivers/gpu/drm/msm/msm_fence.c
index 3f299c5..c1e7614 100644
--- a/drivers/gpu/drm/msm/msm_fence.c
+++ b/drivers/gpu/drm/msm/msm_fence.c
@@ -20,7 +20,6 @@
 #include "msm_drv.h"
 #include "msm_fence.h"
 
-
 struct msm_fence_context *
 msm_fence_context_alloc(struct drm_device *dev, const char *name)
 {
@@ -32,9 +31,10 @@ struct msm_fence_context *
 
 	fctx->dev = dev;
 	fctx->name = name;
-	fctx->context = dma_fence_context_alloc(1);
+	fctx->context = dma_fence_context_alloc(MSM_GPU_MAX_RINGS);
 	init_waitqueue_head(&fctx->event);
 	spin_lock_init(&fctx->spinlock);
+	hash_init(fctx->hash);
 
 	return fctx;
 }
@@ -44,64 +44,94 @@ void msm_fence_context_free(struct msm_fence_context *fctx)
 	kfree(fctx);
 }
 
-static inline bool fence_completed(struct msm_fence_context *fctx, uint32_t fence)
+static inline bool fence_completed(struct msm_ringbuffer *ring, uint32_t fence)
+{
+	return (int32_t)(ring->completed_fence - fence) >= 0;
+}
+
+struct msm_fence {
+	struct msm_fence_context *fctx;
+	struct msm_ringbuffer *ring;
+	struct dma_fence base;
+	struct hlist_node node;
+	u32 fence_id;
+};
+
+static struct msm_fence *fence_from_id(struct msm_fence_context *fctx,
+		uint32_t id)
 {
-	return (int32_t)(fctx->completed_fence - fence) >= 0;
+	struct msm_fence *f;
+
+	hash_for_each_possible_rcu(fctx->hash, f, node, id) {
+		if (f->fence_id == id) {
+			if (dma_fence_get_rcu(&f->base))
+				return f;
+		}
+	}
+
+	return NULL;
 }
 
 /* legacy path for WAIT_FENCE ioctl: */
 int msm_wait_fence(struct msm_fence_context *fctx, uint32_t fence,
 		ktime_t *timeout, bool interruptible)
 {
+	struct msm_fence *f = fence_from_id(fctx, fence);
 	int ret;
 
-	if (fence > fctx->last_fence) {
-		DRM_ERROR("%s: waiting on invalid fence: %u (of %u)\n",
-				fctx->name, fence, fctx->last_fence);
-		return -EINVAL;
+	/* If no active fence was found, there are two possibilities */
+	if (!f) {
+		/* The requested ID is newer than last issued - return error */
+		if (fence > fctx->fence_id) {
+			DRM_ERROR("%s: waiting on invalid fence: %u (of %u)\n",
+				fctx->name, fence, fctx->fence_id);
+			return -EINVAL;
+		}
+
+		/* If the id has been issued, assume the fence has been retired */
+		return 0;
 	}
 
 	if (!timeout) {
 		/* no-wait: */
-		ret = fence_completed(fctx, fence) ? 0 : -EBUSY;
+		ret = fence_completed(f->ring, f->base.seqno) ? 0 : -EBUSY;
 	} else {
 		unsigned long remaining_jiffies = timeout_to_jiffies(timeout);
 
 		if (interruptible)
 			ret = wait_event_interruptible_timeout(fctx->event,
-				fence_completed(fctx, fence),
+				fence_completed(f->ring, f->base.seqno),
 				remaining_jiffies);
 		else
 			ret = wait_event_timeout(fctx->event,
-				fence_completed(fctx, fence),
+				fence_completed(f->ring, f->base.seqno),
 				remaining_jiffies);
 
 		if (ret == 0) {
 			DBG("timeout waiting for fence: %u (completed: %u)",
-					fence, fctx->completed_fence);
+				f->base.seqno, f->ring->completed_fence);
 			ret = -ETIMEDOUT;
 		} else if (ret != -ERESTARTSYS) {
 			ret = 0;
 		}
 	}
 
+	dma_fence_put(&f->base);
+
 	return ret;
 }
 
 /* called from workqueue */
-void msm_update_fence(struct msm_fence_context *fctx, uint32_t fence)
+void msm_update_fence(struct msm_fence_context *fctx,
+		struct msm_ringbuffer *ring, uint32_t fence)
 {
 	spin_lock(&fctx->spinlock);
-	fctx->completed_fence = max(fence, fctx->completed_fence);
+	ring->completed_fence = max(fence, ring->completed_fence);
 	spin_unlock(&fctx->spinlock);
 
 	wake_up_all(&fctx->event);
 }
 
-struct msm_fence {
-	struct msm_fence_context *fctx;
-	struct dma_fence base;
-};
 
 static inline struct msm_fence *to_msm_fence(struct dma_fence *fence)
 {
@@ -127,12 +157,17 @@ static bool msm_fence_enable_signaling(struct dma_fence *fence)
 static bool msm_fence_signaled(struct dma_fence *fence)
 {
 	struct msm_fence *f = to_msm_fence(fence);
-	return fence_completed(f->fctx, f->base.seqno);
+	return fence_completed(f->ring, f->base.seqno);
 }
 
 static void msm_fence_release(struct dma_fence *fence)
 {
 	struct msm_fence *f = to_msm_fence(fence);
+
+	spin_lock(&f->fctx->spinlock);
+	hash_del_rcu(&f->node);
+	spin_unlock(&f->fctx->spinlock);
+
 	kfree_rcu(f, base.rcu);
 }
 
@@ -146,7 +181,7 @@ static void msm_fence_release(struct dma_fence *fence)
 };
 
 struct dma_fence *
-msm_fence_alloc(struct msm_fence_context *fctx)
+msm_fence_alloc(struct msm_fence_context *fctx, struct msm_ringbuffer *ring)
 {
 	struct msm_fence *f;
 
@@ -155,9 +190,17 @@ struct dma_fence *
 		return ERR_PTR(-ENOMEM);
 
 	f->fctx = fctx;
+	f->ring = ring;
+
+	/* Make a user fence ID to pass back for the legacy functions */
+	f->fence_id = ++fctx->fence_id;
+
+	spin_lock(&fctx->spinlock);
+	hash_add(fctx->hash, &f->node, f->fence_id);
+	spin_unlock(&fctx->spinlock);
 
 	dma_fence_init(&f->base, &msm_fence_ops, &fctx->spinlock,
-		       fctx->context, ++fctx->last_fence);
+			fctx->context + ring->id, ++ring->last_fence);
 
 	return &f->base;
 }
diff --git a/drivers/gpu/drm/msm/msm_fence.h b/drivers/gpu/drm/msm/msm_fence.h
index 56061aa..540dc61 100644
--- a/drivers/gpu/drm/msm/msm_fence.h
+++ b/drivers/gpu/drm/msm/msm_fence.h
@@ -18,17 +18,18 @@
 #ifndef __MSM_FENCE_H__
 #define __MSM_FENCE_H__
 
+#include <linux/hashtable.h>
 #include "msm_drv.h"
+#include "msm_ringbuffer.h"
 
 struct msm_fence_context {
 	struct drm_device *dev;
 	const char *name;
 	unsigned context;
-	/* last_fence == completed_fence --> no pending work */
-	uint32_t last_fence;          /* last assigned fence */
-	uint32_t completed_fence;     /* last completed fence */
+	u32 fence_id;
 	wait_queue_head_t event;
 	spinlock_t spinlock;
+	DECLARE_HASHTABLE(hash, 4);
 };
 
 struct msm_fence_context * msm_fence_context_alloc(struct drm_device *dev,
@@ -39,8 +40,10 @@ int msm_wait_fence(struct msm_fence_context *fctx, uint32_t fence,
 		ktime_t *timeout, bool interruptible);
 int msm_queue_fence_cb(struct msm_fence_context *fctx,
 		struct msm_fence_cb *cb, uint32_t fence);
-void msm_update_fence(struct msm_fence_context *fctx, uint32_t fence);
+void msm_update_fence(struct msm_fence_context *fctx,
+		struct msm_ringbuffer *ring, uint32_t fence);
 
-struct dma_fence * msm_fence_alloc(struct msm_fence_context *fctx);
+struct dma_fence *msm_fence_alloc(struct msm_fence_context *fctx,
+		struct msm_ringbuffer *ring);
 
 #endif
diff --git a/drivers/gpu/drm/msm/msm_gem.h b/drivers/gpu/drm/msm/msm_gem.h
index 40cd0b6..633b5c1 100644
--- a/drivers/gpu/drm/msm/msm_gem.h
+++ b/drivers/gpu/drm/msm/msm_gem.h
@@ -120,6 +120,7 @@ struct msm_gem_submit {
 	struct dma_fence *fence;
 	struct pid *pid;    /* submitting process */
 	bool valid;         /* true if no cmdstream patching needed */
+	struct msm_ringbuffer *ring;
 	unsigned int nr_cmds;
 	unsigned int nr_bos;
 	struct {
diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c b/drivers/gpu/drm/msm/msm_gem_submit.c
index 6fd7446..0907c60 100644
--- a/drivers/gpu/drm/msm/msm_gem_submit.c
+++ b/drivers/gpu/drm/msm/msm_gem_submit.c
@@ -389,7 +389,7 @@ int msm_ioctl_gem_submit(struct drm_device *dev, void *data,
 	struct sync_file *sync_file = NULL;
 	int out_fence_fd = -1;
 	unsigned i;
-	int ret;
+	int ret, ring;
 
 	if (!gpu)
 		return -ENXIO;
@@ -521,7 +521,13 @@ int msm_ioctl_gem_submit(struct drm_device *dev, void *data,
 
 	submit->nr_cmds = i;
 
-	submit->fence = msm_fence_alloc(gpu->fctx);
+	ring = clamp_t(uint32_t,
+		(args->flags & MSM_SUBMIT_RING_MASK) >> MSM_SUBMIT_RING_SHIFT,
+		0, gpu->nr_rings - 1);
+
+	submit->ring = gpu->rb[ring];
+
+	submit->fence = msm_fence_alloc(gpu->fctx, submit->ring);
 	if (IS_ERR(submit->fence)) {
 		ret = PTR_ERR(submit->fence);
 		submit->fence = NULL;
diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
index bc75425..c7f0ea9 100644
--- a/drivers/gpu/drm/msm/msm_gpu.c
+++ b/drivers/gpu/drm/msm/msm_gpu.c
@@ -274,15 +274,37 @@ static void recover_worker(struct work_struct *work)
 	struct msm_gpu *gpu = container_of(work, struct msm_gpu, recover_work);
 	struct drm_device *dev = gpu->dev;
 	struct msm_gem_submit *submit;
-	uint32_t fence = gpu->funcs->last_fence(gpu);
+	struct msm_ringbuffer *ring;
+	struct msm_ringbuffer *cur_ring = gpu->funcs->active_ring(gpu);
+	int i;
+
+	/* Update all the rings with the latest and greatest fence */
+	FOR_EACH_RING(gpu, ring, i) {
+		uint32_t fence = gpu->funcs->last_fence(gpu, ring);
 
-	msm_update_fence(gpu->fctx, fence + 1);
+		/*
+		 * For the current (faulting?) ring/submit advance the fence by
+		 * one more to clear the faulting submit
+		 */
+		if (ring == cur_ring)
+			fence = fence + 1;
+
+		msm_update_fence(gpu->fctx, ring, fence);
+	}
 
 	mutex_lock(&dev->struct_mutex);
 
 	dev_err(dev->dev, "%s: hangcheck recover!\n", gpu->name);
 	list_for_each_entry(submit, &gpu->submit_list, node) {
-		if (submit->fence->seqno == (fence + 1)) {
+		uint32_t fence;
+
+		/* Only consider submits for the current ring */
+		if (submit->ring != cur_ring)
+			continue;
+
+		fence = gpu->funcs->last_fence(gpu, cur_ring) + 1;
+
+		if (submit->fence->seqno == fence) {
 			struct task_struct *task;
 
 			rcu_read_lock();
@@ -303,10 +325,9 @@ static void recover_worker(struct work_struct *work)
 		inactive_cancel(gpu);
 		gpu->funcs->recover(gpu);
 
-		/* replay the remaining submits after the one that hung: */
-		list_for_each_entry(submit, &gpu->submit_list, node) {
+		/* replay the remaining submits for all rings: */
+		list_for_each_entry(submit, &gpu->submit_list, node)
 			gpu->funcs->submit(gpu, submit, NULL);
-		}
 	}
 
 	mutex_unlock(&dev->struct_mutex);
@@ -326,25 +347,27 @@ static void hangcheck_handler(unsigned long data)
 	struct msm_gpu *gpu = (struct msm_gpu *)data;
 	struct drm_device *dev = gpu->dev;
 	struct msm_drm_private *priv = dev->dev_private;
-	uint32_t fence = gpu->funcs->last_fence(gpu);
+	struct msm_ringbuffer *ring = gpu->funcs->active_ring(gpu);
+	uint32_t fence = gpu->funcs->last_fence(gpu, ring);
 
-	if (fence != gpu->hangcheck_fence) {
+	if (fence != gpu->hangcheck_fence[ring->id]) {
 		/* some progress has been made.. ya! */
-		gpu->hangcheck_fence = fence;
-	} else if (fence < gpu->fctx->last_fence) {
+		gpu->hangcheck_fence[ring->id] = fence;
+	} else if (fence < ring->last_fence) {
 		/* no progress and not done.. hung! */
-		gpu->hangcheck_fence = fence;
-		dev_err(dev->dev, "%s: hangcheck detected gpu lockup!\n",
-				gpu->name);
+		gpu->hangcheck_fence[ring->id] = fence;
+		dev_err(dev->dev, "%s: hangcheck detected gpu lockup rb %d!\n",
+				gpu->name, ring->id);
 		dev_err(dev->dev, "%s:     completed fence: %u\n",
 				gpu->name, fence);
 		dev_err(dev->dev, "%s:     submitted fence: %u\n",
-				gpu->name, gpu->fctx->last_fence);
+				gpu->name, ring->last_fence);
+
 		queue_work(priv->wq, &gpu->recover_work);
 	}
 
 	/* if still more pending work, reset the hangcheck timer: */
-	if (gpu->fctx->last_fence > gpu->hangcheck_fence)
+	if (ring->last_fence > gpu->hangcheck_fence[ring->id])
 		hangcheck_timer_reset(gpu);
 
 	/* workaround for missing irq: */
@@ -468,20 +491,13 @@ static void retire_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
 static void retire_submits(struct msm_gpu *gpu)
 {
 	struct drm_device *dev = gpu->dev;
+	struct msm_gem_submit *submit, *tmp;
 
 	WARN_ON(!mutex_is_locked(&dev->struct_mutex));
 
-	while (!list_empty(&gpu->submit_list)) {
-		struct msm_gem_submit *submit;
-
-		submit = list_first_entry(&gpu->submit_list,
-				struct msm_gem_submit, node);
-
-		if (dma_fence_is_signaled(submit->fence)) {
+	list_for_each_entry_safe(submit, tmp, &gpu->submit_list, node) {
+		if (dma_fence_is_signaled(submit->fence))
 			retire_submit(gpu, submit);
-		} else {
-			break;
-		}
 	}
 }
 
@@ -489,9 +505,12 @@ static void retire_worker(struct work_struct *work)
 {
 	struct msm_gpu *gpu = container_of(work, struct msm_gpu, retire_work);
 	struct drm_device *dev = gpu->dev;
-	uint32_t fence = gpu->funcs->last_fence(gpu);
+	struct msm_ringbuffer *ring;
+	int i;
 
-	msm_update_fence(gpu->fctx, fence);
+	FOR_EACH_RING(gpu, ring, i)
+		msm_update_fence(gpu->fctx, ring,
+			gpu->funcs->last_fence(gpu, ring));
 
 	mutex_lock(&dev->struct_mutex);
 	retire_submits(gpu);
@@ -573,7 +592,7 @@ int msm_gpu_init(struct drm_device *drm, struct platform_device *pdev,
 		const char *name, struct msm_gpu_config *config)
 {
 	struct iommu_domain *iommu;
-	int i, ret;
+	int i, ret, nr_rings = config->nr_rings;
 
 	if (WARN_ON(gpu->num_perfcntrs > ARRAY_SIZE(gpu->last_cntrs)))
 		gpu->num_perfcntrs = ARRAY_SIZE(gpu->last_cntrs);
@@ -675,37 +694,58 @@ int msm_gpu_init(struct drm_device *drm, struct platform_device *pdev,
 		dev_info(drm->dev, "%s: no IOMMU, fallback to VRAM carveout!\n", name);
 	}
 
-	/* Create ringbuffer: */
-	mutex_lock(&drm->struct_mutex);
-	gpu->rb = msm_ringbuffer_new(gpu, config->ringsz);
-	mutex_unlock(&drm->struct_mutex);
-	if (IS_ERR(gpu->rb)) {
-		ret = PTR_ERR(gpu->rb);
-		gpu->rb = NULL;
-		dev_err(drm->dev, "could not create ringbuffer: %d\n", ret);
-		goto fail;
+	if (nr_rings > ARRAY_SIZE(gpu->rb)) {
+		WARN(1, "Only creating %zu ringbuffers\n", ARRAY_SIZE(gpu->rb));
+		nr_rings = ARRAY_SIZE(gpu->rb);
+	}
+
+	/* Create ringbuffer(s): */
+	for (i = 0; i < nr_rings; i++) {
+		mutex_lock(&drm->struct_mutex);
+		gpu->rb[i] = msm_ringbuffer_new(gpu, i);
+		mutex_unlock(&drm->struct_mutex);
+
+		if (IS_ERR(gpu->rb[i])) {
+			ret = PTR_ERR(gpu->rb[i]);
+			gpu->rb[i] = NULL;
+			dev_err(drm->dev,
+				"could not create ringbuffer %d: %d\n", i, ret);
+			goto fail;
+		}
 	}
 
+	gpu->nr_rings = nr_rings;
 	bs_init(gpu);
 
 	return 0;
 
 fail:
+	for (i = 0; i < ARRAY_SIZE(gpu->rb); i++) {
+		if (gpu->rb[i])
+			msm_ringbuffer_destroy(gpu->rb[i]);
+	}
+
 	return ret;
 }
 
 void msm_gpu_cleanup(struct msm_gpu *gpu)
 {
+	int i;
+
 	DBG("%s", gpu->name);
 
 	WARN_ON(!list_empty(&gpu->active_list));
 
 	bs_fini(gpu);
 
-	if (gpu->rb) {
-		if (gpu->rb_iova)
-			msm_gem_put_iova(gpu->rb->bo, gpu->aspace);
-		msm_ringbuffer_destroy(gpu->rb);
+	for (i = 0; i < ARRAY_SIZE(gpu->rb); i++) {
+		if (!gpu->rb[i])
+			continue;
+
+		if (gpu->rb[i]->iova)
+			msm_gem_put_iova(gpu->rb[i]->bo, gpu->aspace);
+
+		msm_ringbuffer_destroy(gpu->rb[i]);
 	}
 
 	if (gpu->fctx)
diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
index cc6530f..38d826a 100644
--- a/drivers/gpu/drm/msm/msm_gpu.h
+++ b/drivers/gpu/drm/msm/msm_gpu.h
@@ -33,7 +33,7 @@ struct msm_gpu_config {
 	const char *irqname;
 	uint64_t va_start;
 	uint64_t va_end;
-	unsigned int ringsz;
+	unsigned int nr_rings;
 };
 
 /* So far, with hardware that I've seen to date, we can have:
@@ -57,9 +57,11 @@ struct msm_gpu_funcs {
 	int (*pm_resume)(struct msm_gpu *gpu);
 	void (*submit)(struct msm_gpu *gpu, struct msm_gem_submit *submit,
 			struct msm_file_private *ctx);
-	void (*flush)(struct msm_gpu *gpu);
+	void (*flush)(struct msm_gpu *gpu, struct msm_ringbuffer *ring);
 	irqreturn_t (*irq)(struct msm_gpu *irq);
-	uint32_t (*last_fence)(struct msm_gpu *gpu);
+	uint32_t (*last_fence)(struct msm_gpu *gpu,
+			struct msm_ringbuffer *ring);
+	struct msm_ringbuffer *(*active_ring)(struct msm_gpu *gpu);
 	void (*recover)(struct msm_gpu *gpu);
 	void (*destroy)(struct msm_gpu *gpu);
 #ifdef CONFIG_DEBUG_FS
@@ -85,9 +87,8 @@ struct msm_gpu {
 	const struct msm_gpu_perfcntr *perfcntrs;
 	uint32_t num_perfcntrs;
 
-	/* ringbuffer: */
-	struct msm_ringbuffer *rb;
-	uint64_t rb_iova;
+	struct msm_ringbuffer *rb[MSM_GPU_MAX_RINGS];
+	int nr_rings;
 
 	/* list of GEM active objects: */
 	struct list_head active_list;
@@ -126,15 +127,36 @@ struct msm_gpu {
 #define DRM_MSM_HANGCHECK_PERIOD 500 /* in ms */
 #define DRM_MSM_HANGCHECK_JIFFIES msecs_to_jiffies(DRM_MSM_HANGCHECK_PERIOD)
 	struct timer_list hangcheck_timer;
-	uint32_t hangcheck_fence;
+	uint32_t hangcheck_fence[MSM_GPU_MAX_RINGS];
 	struct work_struct recover_work;
 
 	struct list_head submit_list;
 };
 
+/* It turns out that all targets use the same ringbuffer size */
+#define MSM_GPU_RINGBUFFER_SZ SZ_32K
+
+static inline struct msm_ringbuffer *__get_ring(struct msm_gpu *gpu, int index)
+{
+	return (index < ARRAY_SIZE(gpu->rb) ? gpu->rb[index] : NULL);
+}
+
+#define FOR_EACH_RING(gpu, ring, index) \
+	for (index = 0, ring = (gpu)->rb[0]; \
+		index < (gpu)->nr_rings && index < ARRAY_SIZE((gpu)->rb); \
+		index++, ring = __get_ring(gpu, index))
+
 static inline bool msm_gpu_active(struct msm_gpu *gpu)
 {
-	return gpu->fctx->last_fence > gpu->funcs->last_fence(gpu);
+	struct msm_ringbuffer *ring;
+	int i;
+
+	FOR_EACH_RING(gpu, ring, i) {
+		if (ring->last_fence > gpu->funcs->last_fence(gpu, ring))
+			return true;
+	}
+
+	return false;
 }
 
 /* Perf-Counters:
diff --git a/drivers/gpu/drm/msm/msm_ringbuffer.c b/drivers/gpu/drm/msm/msm_ringbuffer.c
index 67b34e0..2ab31c7 100644
--- a/drivers/gpu/drm/msm/msm_ringbuffer.c
+++ b/drivers/gpu/drm/msm/msm_ringbuffer.c
@@ -18,13 +18,13 @@
 #include "msm_ringbuffer.h"
 #include "msm_gpu.h"
 
-struct msm_ringbuffer *msm_ringbuffer_new(struct msm_gpu *gpu, int size)
+struct msm_ringbuffer *msm_ringbuffer_new(struct msm_gpu *gpu, int id)
 {
 	struct msm_ringbuffer *ring;
 	int ret;
 
-	if (WARN_ON(!is_power_of_2(size)))
-		return ERR_PTR(-EINVAL);
+	/* We assume everywhere that MSM_GPU_RINGBUFFER_SZ is a power of 2 */
+	BUILD_BUG_ON(!is_power_of_2(MSM_GPU_RINGBUFFER_SZ));
 
 	ring = kzalloc(sizeof(*ring), GFP_KERNEL);
 	if (!ring) {
@@ -33,7 +33,8 @@ struct msm_ringbuffer *msm_ringbuffer_new(struct msm_gpu *gpu, int size)
 	}
 
 	ring->gpu = gpu;
-	ring->bo = msm_gem_new(gpu->dev, size, MSM_BO_WC);
+	ring->id = id;
+	ring->bo = msm_gem_new(gpu->dev, MSM_GPU_RINGBUFFER_SZ, MSM_BO_WC);
 	if (IS_ERR(ring->bo)) {
 		ret = PTR_ERR(ring->bo);
 		ring->bo = NULL;
@@ -45,11 +46,9 @@ struct msm_ringbuffer *msm_ringbuffer_new(struct msm_gpu *gpu, int size)
 		ret = PTR_ERR(ring->start);
 		goto fail;
 	}
-	ring->end   = ring->start + (size / 4);
+	ring->end   = ring->start + (MSM_GPU_RINGBUFFER_SZ >> 2);
 	ring->cur   = ring->start;
 
-	ring->size = size;
-
 	return ring;
 
 fail:
diff --git a/drivers/gpu/drm/msm/msm_ringbuffer.h b/drivers/gpu/drm/msm/msm_ringbuffer.h
index 6e0e104..4eb05fe 100644
--- a/drivers/gpu/drm/msm/msm_ringbuffer.h
+++ b/drivers/gpu/drm/msm/msm_ringbuffer.h
@@ -22,12 +22,16 @@
 
 struct msm_ringbuffer {
 	struct msm_gpu *gpu;
-	int size;
+	int id;
 	struct drm_gem_object *bo;
 	uint32_t *start, *end, *cur;
+	uint64_t iova;
+	/* last_fence == completed_fence --> no pending work */
+	uint32_t last_fence;
+	uint32_t completed_fence;
 };
 
-struct msm_ringbuffer *msm_ringbuffer_new(struct msm_gpu *gpu, int size);
+struct msm_ringbuffer *msm_ringbuffer_new(struct msm_gpu *gpu, int id);
 void msm_ringbuffer_destroy(struct msm_ringbuffer *ring);
 
 /* ringbuffer helpers (the parts that are same for a3xx/a2xx/z180..) */
diff --git a/include/uapi/drm/msm_drm.h b/include/uapi/drm/msm_drm.h
index 045ad20..5134f4a 100644
--- a/include/uapi/drm/msm_drm.h
+++ b/include/uapi/drm/msm_drm.h
@@ -195,10 +195,15 @@ struct drm_msm_gem_submit_bo {
 #define MSM_SUBMIT_NO_IMPLICIT   0x80000000 /* disable implicit sync */
 #define MSM_SUBMIT_FENCE_FD_IN   0x40000000 /* enable input fence_fd */
 #define MSM_SUBMIT_FENCE_FD_OUT  0x20000000 /* enable output fence_fd */
+
+#define MSM_SUBMIT_RING_MASK 0x000F0000
+#define MSM_SUBMIT_RING_SHIFT 16
+
 #define MSM_SUBMIT_FLAGS                ( \
 		MSM_SUBMIT_NO_IMPLICIT   | \
 		MSM_SUBMIT_FENCE_FD_IN   | \
 		MSM_SUBMIT_FENCE_FD_OUT  | \
+		MSM_SUBMIT_RING_MASK	 | \
 		0)
 
 /* Each cmdstream submit consists of a table of buffers involved, and
-- 
1.9.1

* [PATCH 09/11] drm/msm: Shadow current pointer in the ring until command is complete
  2017-02-06 17:39 [PATCH 00/11] drm/msm: A5XX preemption Jordan Crouse
                   ` (5 preceding siblings ...)
  2017-02-06 17:39 ` [PATCH 08/11] drm/msm: Support multiple ringbuffers Jordan Crouse
@ 2017-02-06 17:39 ` Jordan Crouse
  2017-02-06 17:39 ` [PATCH 10/11] drm/msm: Make the value of RB_CNTL (almost) generic Jordan Crouse
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 36+ messages in thread
From: Jordan Crouse @ 2017-02-06 17:39 UTC (permalink / raw)
  To: freedreno; +Cc: linux-arm-msm, dri-devel

Add a shadow pointer to track the current command being written into
the ring. Don't commit it as 'cur' until the command is submitted.
Because 'cur' is used to construct the software copy of the wptr, this
ensures that somebody peeking in on the ring doesn't assume that a
command is in flight while it is being written. This isn't a huge deal
with a single ring (though technically the hangcheck could prematurely
assume the system is busy when it isn't) but it will be rather
important for preemption, where the decision to preempt is based
on a non-empty ringbuffer. Without a shadow, an aggressive preemption
scheme could assume that the ringbuffer is non-empty and switch to it
before the CPU is done writing the command, and boom.

Even though preemption won't be supported on all targets, the way the
code is organized makes it simpler to keep this generic for all
targets. The extra load for non-preemption targets should be
minimal.
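
The mechanics boil down to the following (a condensed sketch of the
diff below):

    /* commands are staged through ring->next ... */
    static inline void OUT_RING(struct msm_ringbuffer *ring, uint32_t data)
    {
    	if (ring->next == ring->end)
    		ring->next = ring->start;
    	*(ring->next++) = data;	/* staged, not yet observable */
    }

    /* ... and only become observable once the flush commits the shadow: */
    ring->cur = ring->next;
    wptr = (ring->cur - ring->start) % (MSM_GPU_RINGBUFFER_SZ >> 2);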

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
---
 drivers/gpu/drm/msm/adreno/adreno_gpu.c |  9 +++++++--
 drivers/gpu/drm/msm/msm_ringbuffer.c    |  1 +
 drivers/gpu/drm/msm/msm_ringbuffer.h    | 12 ++++++++----
 3 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
index f5c2bad..44a95ea 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
@@ -144,6 +144,7 @@ void adreno_recover(struct msm_gpu *gpu)
 			continue;
 
 		ring->cur = ring->start;
+		ring->next = ring->start;
 
 		/* reset completed fence seqno, discard anything pending: */
 		adreno_gpu->memptrs->fence[ring->id] = ring->completed_fence;
@@ -240,12 +241,15 @@ void adreno_flush(struct msm_gpu *gpu, struct msm_ringbuffer *ring)
 	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
 	uint32_t wptr;
 
+	/* Copy the shadow to the actual register */
+	ring->cur = ring->next;
+
 	/*
 	 * Mask wptr value that we calculate to fit in the HW range. This is
 	 * to account for the possibility that the last command fit exactly into
 	 * the ringbuffer and rb->next hasn't wrapped to zero yet
 	 */
-	wptr = get_wptr(ring) % (MSM_GPU_RINGBUFFER_SZ >> 2);
+	wptr = (ring->cur - ring->start) % (MSM_GPU_RINGBUFFER_SZ >> 2);
 
 	/* ensure writes to ringbuffer have hit system memory: */
 	mb();
@@ -366,7 +370,8 @@ static uint32_t ring_freewords(struct msm_ringbuffer *ring)
 {
 	struct adreno_gpu *adreno_gpu = to_adreno_gpu(ring->gpu);
 	uint32_t size = MSM_GPU_RINGBUFFER_SZ >> 2;
-	uint32_t wptr = get_wptr(ring);
+	/* Use ring->next to calculate free size */
+	uint32_t wptr = ring->next - ring->start;
 	uint32_t rptr = get_rptr(adreno_gpu, ring);
 	return (rptr + (size - 1) - wptr) % size;
 }
diff --git a/drivers/gpu/drm/msm/msm_ringbuffer.c b/drivers/gpu/drm/msm/msm_ringbuffer.c
index 2ab31c7..b885979 100644
--- a/drivers/gpu/drm/msm/msm_ringbuffer.c
+++ b/drivers/gpu/drm/msm/msm_ringbuffer.c
@@ -47,6 +47,7 @@ struct msm_ringbuffer *msm_ringbuffer_new(struct msm_gpu *gpu, int id)
 		goto fail;
 	}
 	ring->end   = ring->start + (MSM_GPU_RINGBUFFER_SZ >> 2);
+	ring->next  = ring->start;
 	ring->cur   = ring->start;
 
 	return ring;
diff --git a/drivers/gpu/drm/msm/msm_ringbuffer.h b/drivers/gpu/drm/msm/msm_ringbuffer.h
index 4eb05fe..865b21a 100644
--- a/drivers/gpu/drm/msm/msm_ringbuffer.h
+++ b/drivers/gpu/drm/msm/msm_ringbuffer.h
@@ -24,7 +24,7 @@ struct msm_ringbuffer {
 	struct msm_gpu *gpu;
 	int id;
 	struct drm_gem_object *bo;
-	uint32_t *start, *end, *cur;
+	uint32_t *start, *end, *cur, *next;
 	uint64_t iova;
 	/* last_fence == completed_fence --> no pending work */
 	uint32_t last_fence;
@@ -39,9 +39,13 @@ struct msm_ringbuffer {
 static inline void
 OUT_RING(struct msm_ringbuffer *ring, uint32_t data)
 {
-	if (ring->cur == ring->end)
-		ring->cur = ring->start;
-	*(ring->cur++) = data;
+	/*
+	 * ring->next points to the current command being written - it won't be
+	 * committed as ring->cur until the flush
+	 */
+	if (ring->next == ring->end)
+		ring->next = ring->start;
+	*(ring->next++) = data;
 }
 
 #endif /* __MSM_RINGBUFFER_H__ */
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 10/11] drm/msm: Make the value of RB_CNTL (almost) generic
  2017-02-06 17:39 [PATCH 00/11] drm/msm: A5XX preemption Jordan Crouse
                   ` (6 preceding siblings ...)
  2017-02-06 17:39 ` [PATCH 09/11] drm/msm: Shadow current pointer in the ring until command is complete Jordan Crouse
@ 2017-02-06 17:39 ` Jordan Crouse
  2017-02-06 17:39 ` [PATCH 11/11] drm/msm: Implement preemption for A5XX targets Jordan Crouse
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 36+ messages in thread
From: Jordan Crouse @ 2017-02-06 17:39 UTC (permalink / raw)
  To: freedreno; +Cc: linux-arm-msm, dri-devel

We use a global ringbuffer size and block size for all targets, and
at least for 5XX preemption we need to know the value of RB_CNTL in
several locations, so it makes sense to calculate it once and use it
everywhere.

The only monkey wrench is that we need to disable the RPTR shadow
for A430 targets, but that only needs to be done once and doesn't
affect A5XX, so we can OR in the value at init time.
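
For reference, a quick walk-through of the math (not from the patch;
it follows from the defines below, with SZ_32K == 32768):

	ilog2(32768 / 8) = ilog2(4096) = 12	/* BUFSZ field */
	ilog2(32 / 8)    = ilog2(4)    = 2	/* BLKSZ field */

so MSM_GPU_RB_CNTL_DEFAULT packs a BUFSZ of 12 and a BLKSZ of 2 for
every target, and A430 additionally ORs in AXXX_CP_RB_CNTL_NO_UPDATE.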

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
---
 drivers/gpu/drm/msm/adreno/adreno_gpu.c | 12 +++++++-----
 drivers/gpu/drm/msm/msm_gpu.h           |  5 +++++
 2 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
index 44a95ea..aca1fc3 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
@@ -21,7 +21,6 @@
 #include "msm_gem.h"
 #include "msm_mmu.h"
 
-#define RB_BLKSIZE 32
 
 int adreno_get_param(struct msm_gpu *gpu, uint32_t param, uint64_t *value)
 {
@@ -71,11 +70,14 @@ int adreno_hw_init(struct msm_gpu *gpu)
 		}
 	}
 
-	/* Setup REG_CP_RB_CNTL: */
+	/*
+	 * Setup REG_CP_RB_CNTL.  The same value is used across targets (with
+	 * the exception of A430, which disables the RPTR shadow) - the
+	 * calculation of ringbuffer size and block size is moved to msm_gpu.h
+	 * for the pre-processor to deal with, and the A430 variant is ORed in here
+	 */
 	adreno_gpu_write(adreno_gpu, REG_ADRENO_CP_RB_CNTL,
-		/* size is log2(quad-words): */
-		AXXX_CP_RB_CNTL_BUFSZ(ilog2(MSM_GPU_RINGBUFFER_SZ / 8)) |
-		AXXX_CP_RB_CNTL_BLKSZ(ilog2(RB_BLKSIZE / 8)) |
+		MSM_GPU_RB_CNTL_DEFAULT |
 		(adreno_is_a430(adreno_gpu) ? AXXX_CP_RB_CNTL_NO_UPDATE : 0));
 
 	/* Setup ringbuffer address - use ringbuffer[0] for GPU init */
diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
index 38d826a..50fef27 100644
--- a/drivers/gpu/drm/msm/msm_gpu.h
+++ b/drivers/gpu/drm/msm/msm_gpu.h
@@ -135,6 +135,11 @@ struct msm_gpu {
 
 /* It turns out that all targets use the same ringbuffer size */
 #define MSM_GPU_RINGBUFFER_SZ SZ_32K
+#define MSM_GPU_RINGBUFFER_BLKSIZE 32
+
+#define MSM_GPU_RB_CNTL_DEFAULT \
+		(AXXX_CP_RB_CNTL_BUFSZ(ilog2(MSM_GPU_RINGBUFFER_SZ / 8)) | \
+		AXXX_CP_RB_CNTL_BLKSZ(ilog2(MSM_GPU_RINGBUFFER_BLKSIZE / 8)))
 
 static inline struct msm_ringbuffer *__get_ring(struct msm_gpu *gpu, int index)
 {
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 11/11] drm/msm: Implement preemption for A5XX targets
  2017-02-06 17:39 [PATCH 00/11] drm/msm: A5XX preemption Jordan Crouse
                   ` (7 preceding siblings ...)
  2017-02-06 17:39 ` [PATCH 10/11] drm/msm: Make the value of RB_CNTL (almost) generic Jordan Crouse
@ 2017-02-06 17:39 ` Jordan Crouse
       [not found]   ` <1486402779-9024-12-git-send-email-jcrouse-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
  2017-02-06 17:59 ` [PATCH 00/11] drm/msm: A5XX preemption Daniel Vetter
       [not found] ` <1486402779-9024-1-git-send-email-jcrouse-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
  10 siblings, 1 reply; 36+ messages in thread
From: Jordan Crouse @ 2017-02-06 17:39 UTC (permalink / raw)
  To: freedreno; +Cc: linux-arm-msm, dri-devel

Implement preemption for A5XX targets - this allows multiple
ringbuffers for different priorities, with automatic preemption
of a lower-priority ringbuffer when a higher-priority one is ready.
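
The core of the decision, in rough pseudo-C (a sketch of the idea only;
ring_empty() and trigger_preemption() are made-up placeholders - see
a5xx_preempt.c below for the real implementation):

	/* Scan from the highest priority ring down and preempt to the
	 * first non-empty ring if it isn't the one already executing */
	for (i = gpu->nr_rings - 1; i >= 0; i--) {
		if (!ring_empty(gpu->rb[i]))
			break;
	}

	if (i >= 0 && gpu->rb[i] != cur_ring)
		trigger_preemption(gpu->rb[i]);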

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
---
 drivers/gpu/drm/msm/Makefile              |   1 +
 drivers/gpu/drm/msm/adreno/a5xx_gpu.c     | 172 +++++++++++++-
 drivers/gpu/drm/msm/adreno/a5xx_gpu.h     | 105 +++++++++
 drivers/gpu/drm/msm/adreno/a5xx_preempt.c | 367 ++++++++++++++++++++++++++++++
 drivers/gpu/drm/msm/adreno/adreno_gpu.c   |  11 +-
 drivers/gpu/drm/msm/adreno/adreno_gpu.h   |   5 +
 drivers/gpu/drm/msm/msm_drv.h             |   2 +-
 drivers/gpu/drm/msm/msm_ringbuffer.c      |   2 +
 drivers/gpu/drm/msm/msm_ringbuffer.h      |   1 +
 9 files changed, 654 insertions(+), 12 deletions(-)
 create mode 100644 drivers/gpu/drm/msm/adreno/a5xx_preempt.c

diff --git a/drivers/gpu/drm/msm/Makefile b/drivers/gpu/drm/msm/Makefile
index 028c24d..8a3d74e 100644
--- a/drivers/gpu/drm/msm/Makefile
+++ b/drivers/gpu/drm/msm/Makefile
@@ -8,6 +8,7 @@ msm-y := \
 	adreno/a4xx_gpu.o \
 	adreno/a5xx_gpu.o \
 	adreno/a5xx_power.o \
+	adreno/a5xx_preempt.o \
 	hdmi/hdmi.o \
 	hdmi/hdmi_audio.o \
 	hdmi/hdmi_bridge.o \
diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
index 5f02ff3..b7c6158 100644
--- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
@@ -184,14 +184,66 @@ static int zap_load_mdt(struct platform_device *pdev)
 	return ret;
 }
 
+static void a5xx_flush(struct msm_gpu *gpu, struct msm_ringbuffer *ring)
+{
+	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
+	struct a5xx_gpu *a5xx_gpu = to_a5xx_gpu(adreno_gpu);
+	uint32_t wptr;
+	unsigned long flags;
+
+	spin_lock_irqsave(&ring->lock, flags);
+
+	/* Copy the shadow to the actual register */
+	ring->cur = ring->next;
+
+	/* Make sure to wrap wptr if we need to */
+	wptr = (ring->cur - ring->start) % (MSM_GPU_RINGBUFFER_SZ >> 2);
+
+	spin_unlock_irqrestore(&ring->lock, flags);
+
+	/* Make sure everything is posted before making a decision */
+	mb();
+
+	/* Update HW if this is the current ring and we are not in preempt */
+	if (a5xx_gpu->cur_ring == ring && !a5xx_in_preempt(a5xx_gpu))
+		gpu_write(gpu, REG_A5XX_CP_RB_WPTR, wptr);
+}
+
 static void a5xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit,
 	struct msm_file_private *ctx)
 {
 	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
+	struct a5xx_gpu *a5xx_gpu = to_a5xx_gpu(adreno_gpu);
+
 	struct msm_drm_private *priv = gpu->dev->dev_private;
 	struct msm_ringbuffer *ring = submit->ring;
 	unsigned int i, ibs = 0;
 
+	OUT_PKT7(ring, CP_PREEMPT_ENABLE_GLOBAL, 1);
+	OUT_RING(ring, 0x02);
+
+	/* Turn off protected mode to write to special registers */
+	OUT_PKT7(ring, CP_SET_PROTECTED_MODE, 1);
+	OUT_RING(ring, 0);
+
+	/* Set the save preemption record for the ring/command */
+	OUT_PKT4(ring, REG_A5XX_CP_CONTEXT_SWITCH_SAVE_ADDR_LO, 2);
+	OUT_RING(ring, lower_32_bits(a5xx_gpu->preempt_iova[submit->ring->id]));
+	OUT_RING(ring, upper_32_bits(a5xx_gpu->preempt_iova[submit->ring->id]));
+
+	/* Turn back on protected mode */
+	OUT_PKT7(ring, CP_SET_PROTECTED_MODE, 1);
+	OUT_RING(ring, 1);
+
+	/* Enable local preemption for fine-grained preemption */
+	OUT_PKT7(ring, CP_PREEMPT_ENABLE_LOCAL, 1);
+	OUT_RING(ring, 0x02);
+
+	/* Allow CP_CONTEXT_SWITCH_YIELD packets in the IB2 */
+	OUT_PKT7(ring, CP_YIELD_ENABLE, 1);
+	OUT_RING(ring, 0x02);
+
+	/* Submit the commands */
 	for (i = 0; i < submit->nr_cmds; i++) {
 		switch (submit->cmd[i].type) {
 		case MSM_SUBMIT_CMD_IB_TARGET_BUF:
@@ -209,16 +261,54 @@ static void a5xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit,
 		}
 	}
 
+	/*
+	 * Write the render mode to NULL (0) to indicate to the CP that the IBs
+	 * are done rendering - otherwise a lucky preemption would start
+	 * replaying from the last checkpoint
+	 */
+	OUT_PKT7(ring, CP_SET_RENDER_MODE, 5);
+	OUT_RING(ring, 0);
+	OUT_RING(ring, 0);
+	OUT_RING(ring, 0);
+	OUT_RING(ring, 0);
+	OUT_RING(ring, 0);
+
+	/* Turn off IB level preemptions */
+	OUT_PKT7(ring, CP_YIELD_ENABLE, 1);
+	OUT_RING(ring, 0x01);
+
+	/* Write the fence to the scratch register */
 	OUT_PKT4(ring, REG_A5XX_CP_SCRATCH_REG(2), 1);
 	OUT_RING(ring, submit->fence->seqno);
 
+	/*
+	 * Execute a CACHE_FLUSH_TS event. This will ensure that the
+	 * timestamp is written to the memory and then triggers the interrupt
+	 */
 	OUT_PKT7(ring, CP_EVENT_WRITE, 4);
 	OUT_RING(ring, CACHE_FLUSH_TS | (1 << 31));
 	OUT_RING(ring, lower_32_bits(rbmemptr(adreno_gpu, ring->id, fence)));
 	OUT_RING(ring, upper_32_bits(rbmemptr(adreno_gpu, ring->id, fence)));
 	OUT_RING(ring, submit->fence->seqno);
 
-	gpu->funcs->flush(gpu, ring);
+	/* Yield the floor on command completion */
+	OUT_PKT7(ring, CP_CONTEXT_SWITCH_YIELD, 4);
+	/*
+	 * If dword[2:1] are non-zero, they specify an address for the CP to
+	 * write the value of dword[3] to on preemption complete. Write 0 to
+	 * skip the write
+	 */
+	OUT_RING(ring, 0x00);
+	OUT_RING(ring, 0x00);
+	/* Data value - not used if the address above is 0 */
+	OUT_RING(ring, 0x01);
+	/* Set bit 0 to trigger an interrupt on preempt complete */
+	OUT_RING(ring, 0x01);
+
+	a5xx_flush(gpu, ring);
+
+	/* Check to see if we need to start preemption */
+	a5xx_preempt_trigger(gpu);
 }
 
 struct a5xx_hwcg {
@@ -393,6 +483,50 @@ static int a5xx_me_init(struct msm_gpu *gpu)
 	return a5xx_idle(gpu, ring) ? 0 : -EINVAL;
 }
 
+static int a5xx_preempt_start(struct msm_gpu *gpu)
+{
+	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
+	struct a5xx_gpu *a5xx_gpu = to_a5xx_gpu(adreno_gpu);
+	struct msm_ringbuffer *ring = gpu->rb[0];
+
+	if (gpu->nr_rings == 1)
+		return 0;
+
+	/* Turn off protected mode to write to special registers */
+	OUT_PKT7(ring, CP_SET_PROTECTED_MODE, 1);
+	OUT_RING(ring, 0);
+
+	/* Set the save preemption record for the ring/command */
+	OUT_PKT4(ring, REG_A5XX_CP_CONTEXT_SWITCH_SAVE_ADDR_LO, 2);
+	OUT_RING(ring, lower_32_bits(a5xx_gpu->preempt_iova[ring->id]));
+	OUT_RING(ring, upper_32_bits(a5xx_gpu->preempt_iova[ring->id]));
+
+	/* Turn back on protected mode */
+	OUT_PKT7(ring, CP_SET_PROTECTED_MODE, 1);
+	OUT_RING(ring, 1);
+
+	OUT_PKT7(ring, CP_PREEMPT_ENABLE_GLOBAL, 1);
+	OUT_RING(ring, 0x00);
+
+	OUT_PKT7(ring, CP_PREEMPT_ENABLE_LOCAL, 1);
+	OUT_RING(ring, 0x01);
+
+	OUT_PKT7(ring, CP_YIELD_ENABLE, 1);
+	OUT_RING(ring, 0x01);
+
+	/* Yield the floor on command completion */
+	OUT_PKT7(ring, CP_CONTEXT_SWITCH_YIELD, 4);
+	OUT_RING(ring, 0x00);
+	OUT_RING(ring, 0x00);
+	OUT_RING(ring, 0x01);
+	OUT_RING(ring, 0x01);
+
+	gpu->funcs->flush(gpu, ring);
+
+	return a5xx_idle(gpu, ring) ? 0 : -EINVAL;
+}
+
+
 static struct drm_gem_object *a5xx_ucode_load_bo(struct msm_gpu *gpu,
 		const struct firmware *fw, u64 *iova)
 {
@@ -525,6 +659,7 @@ static int a5xx_zap_shader_init(struct msm_gpu *gpu)
 	  A5XX_RBBM_INT_0_MASK_RBBM_ETS_MS_TIMEOUT | \
 	  A5XX_RBBM_INT_0_MASK_RBBM_ATB_ASYNC_OVERFLOW | \
 	  A5XX_RBBM_INT_0_MASK_CP_HW_ERROR | \
+	  A5XX_RBBM_INT_0_MASK_CP_SW | \
 	  A5XX_RBBM_INT_0_MASK_CP_CACHE_FLUSH_TS | \
 	  A5XX_RBBM_INT_0_MASK_UCHE_OOB_ACCESS | \
 	  A5XX_RBBM_INT_0_MASK_GPMU_VOLTAGE_DROOP)
@@ -672,6 +807,8 @@ static int a5xx_hw_init(struct msm_gpu *gpu)
 	if (ret)
 		return ret;
 
+	a5xx_preempt_hw_init(gpu);
+
 	ret = a5xx_ucode_init(gpu);
 	if (ret)
 		return ret;
@@ -724,6 +861,9 @@ static int a5xx_hw_init(struct msm_gpu *gpu)
 		gpu_write(gpu, REG_A5XX_RBBM_SECVID_TRUST_CNTL, 0x0);
 	}
 
+	/* Last step - yield the ringbuffer */
+	a5xx_preempt_start(gpu);
+
 	return 0;
 }
 
@@ -754,6 +894,8 @@ static void a5xx_destroy(struct msm_gpu *gpu)
 
 	DBG("%s", gpu->name);
 
+	a5xx_preempt_fini(gpu);
+
 	if (a5xx_gpu->pm4_bo) {
 		if (a5xx_gpu->pm4_iova)
 			msm_gem_put_iova(a5xx_gpu->pm4_bo, gpu->aspace);
@@ -791,6 +933,14 @@ static inline bool _a5xx_check_idle(struct msm_gpu *gpu)
 
 bool a5xx_idle(struct msm_gpu *gpu, struct msm_ringbuffer *ring)
 {
+	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
+	struct a5xx_gpu *a5xx_gpu = to_a5xx_gpu(adreno_gpu);
+
+	if (ring != a5xx_gpu->cur_ring) {
+		WARN(1, "Tried to idle a non-current ringbuffer\n");
+		return false;
+	}
+
 	/* wait for CP to drain ringbuffer: */
 	if (!adreno_idle(gpu, ring))
 		return false;
@@ -957,6 +1107,9 @@ static irqreturn_t a5xx_irq(struct msm_gpu *gpu)
 	if (status & A5XX_RBBM_INT_0_MASK_CP_CACHE_FLUSH_TS)
 		msm_gpu_retire(gpu);
 
+	if (status & A5XX_RBBM_INT_0_MASK_CP_SW)
+		a5xx_preempt_irq(gpu);
+
 	return IRQ_HANDLED;
 }
 
@@ -1083,6 +1236,14 @@ static void a5xx_show(struct msm_gpu *gpu, struct seq_file *m)
 }
 #endif
 
+static struct msm_ringbuffer *a5xx_active_ring(struct msm_gpu *gpu)
+{
+	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
+	struct a5xx_gpu *a5xx_gpu = to_a5xx_gpu(adreno_gpu);
+
+	return a5xx_gpu->cur_ring;
+}
+
 static const struct adreno_gpu_funcs funcs = {
 	.base = {
 		.get_param = adreno_get_param,
@@ -1092,8 +1253,8 @@ static void a5xx_show(struct msm_gpu *gpu, struct seq_file *m)
 		.recover = a5xx_recover,
 		.last_fence = adreno_last_fence,
 		.submit = a5xx_submit,
-		.flush = adreno_flush,
-		.active_ring = adreno_active_ring,
+		.flush = a5xx_flush,
+		.active_ring = a5xx_active_ring,
 		.irq = a5xx_irq,
 		.destroy = a5xx_destroy,
 		.show = a5xx_show,
@@ -1128,7 +1289,7 @@ struct msm_gpu *a5xx_gpu_init(struct drm_device *dev)
 
 	a5xx_gpu->lm_leakage = 0x4E001A;
 
-	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1);
+	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 4);
 	if (ret) {
 		a5xx_destroy(&(a5xx_gpu->base.base));
 		return ERR_PTR(ret);
@@ -1137,5 +1298,8 @@ struct msm_gpu *a5xx_gpu_init(struct drm_device *dev)
 	if (gpu->aspace)
 		msm_mmu_set_fault_handler(gpu->aspace->mmu, gpu, a5xx_fault_handler);
 
+	/* Set up the preemption specific bits and pieces for each ringbuffer */
+	a5xx_preempt_init(gpu);
+
 	return gpu;
 }
diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.h b/drivers/gpu/drm/msm/adreno/a5xx_gpu.h
index 405b563..5993d3a 100644
--- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.h
+++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.h
@@ -36,10 +36,103 @@ struct a5xx_gpu {
 	uint32_t gpmu_dwords;
 
 	uint32_t lm_leakage;
+
+	struct msm_ringbuffer *cur_ring;
+	struct msm_ringbuffer *next_ring;
+
+	struct drm_gem_object *preempt_bo[MSM_GPU_MAX_RINGS];
+	struct a5xx_preempt_record *preempt[MSM_GPU_MAX_RINGS];
+	uint64_t preempt_iova[MSM_GPU_MAX_RINGS];
+
+	atomic_t preempt_state;
+	struct work_struct preempt_work;
+	struct timer_list preempt_timer;
+
 };
 
 #define to_a5xx_gpu(x) container_of(x, struct a5xx_gpu, base)
 
+/*
+ * In order to do lockless preemption we use a simple state machine to progress
+ * through the process.
+ *
+ * PREEMPT_NONE - No preemption in progress. Next state: START.
+ * PREEMPT_START - The trigger is evaluating if preemption is possible. Next
+ * states: TRIGGERED, NONE.
+ * PREEMPT_TRIGGERED - A preemption has been executed on the hardware. Next
+ * states: FAULTED, PENDING.
+ * PREEMPT_FAULTED - A preemption timed out (never completed). This will
+ * trigger recovery. Next state: N/A.
+ * PREEMPT_PENDING - The preemption complete interrupt fired - the callback is
+ * checking the success of the operation. Next states: COMPLETE, NONE.
+ * PREEMPT_COMPLETE - The complete interrupt fired but the status has not yet
+ * indicated that the preemption was done. This is likely a temporary
+ * condition and a worker has been scheduled to clean up.
+ */
+
+enum preempt_state {
+	PREEMPT_NONE = 0,
+	PREEMPT_START,
+	PREEMPT_TRIGGERED,
+	PREEMPT_FAULTED,
+	PREEMPT_PENDING,
+	PREEMPT_COMPLETE
+};
+
+/*
+ * struct a5xx_preempt_record is a shared buffer between the microcode and the
+ * CPU to store the state for preemption. The record itself is much larger
+ * (64k) but most of that is used by the CP for storage.
+ *
+ * There is a preemption record assigned per ringbuffer. When the CPU triggers a
+ * preemption, it fills out the record with the useful information (wptr, ring
+ * base, etc) and the microcode uses that information to set up the CP following
+ * the preemption.  When a ring is switched out, the CP will save the ringbuffer
+ * state back to the record. In this way, once the records are properly set up
+ * the CPU can quickly switch back and forth between ringbuffers by only
+ * updating a few registers (often only the wptr).
+ *
+ * These are the CPU aware registers in the record:
+ * @magic: Must always be 0x27C4BAFC
+ * @info: Type of the record - written 0 by the CPU, updated by the CP
+ * @data: Data field from SET_RENDER_MODE or a checkpoint. Written and used by
+ * the CP
+ * @cntl: Value of RB_CNTL written by CPU, save/restored by CP
+ * @rptr: Value of RB_RPTR written by CPU, save/restored by CP
+ * @wptr: Value of RB_WPTR written by CPU, save/restored by CP
+ * @rptr_addr: Value of RB_RPTR_ADDR written by CPU, save/restored by CP
+ * @rbase: Value of RB_BASE written by CPU, save/restored by CP
+ * @counter: GPU address of the storage area for the performance counters
+ */
+struct a5xx_preempt_record {
+	uint32_t magic;
+	uint32_t info;
+	uint32_t data;
+	uint32_t cntl;
+	uint32_t rptr;
+	uint32_t wptr;
+	uint64_t rptr_addr;
+	uint64_t rbase;
+	uint64_t counter;
+};
+
+/* Magic identifier for the preemption record */
+#define A5XX_PREEMPT_RECORD_MAGIC 0x27C4BAFCUL
+
+/*
+ * Even though the structure above is only a few bytes, we need a full 64k to
+ * store the entire preemption record from the CP
+ */
+#define A5XX_PREEMPT_RECORD_SIZE (64 * 1024)
+
+/*
+ * The preemption counter block is a storage area for the value of the
+ * preemption counters that are saved immediately before context switch. We
+ * append it to the end of the allocation for the preemption record.
+ */
+#define A5XX_PREEMPT_COUNTER_SIZE (16 * 4)
+
+
 int a5xx_power_init(struct msm_gpu *gpu);
 void a5xx_gpmu_ucode_init(struct msm_gpu *gpu);
 
@@ -58,4 +151,16 @@ static inline int spin_usecs(struct msm_gpu *gpu, uint32_t usecs,
 
 bool a5xx_idle(struct msm_gpu *gpu, struct msm_ringbuffer *ring);
 
+void a5xx_preempt_init(struct msm_gpu *gpu);
+void a5xx_preempt_hw_init(struct msm_gpu *gpu);
+void a5xx_preempt_trigger(struct msm_gpu *gpu);
+void a5xx_preempt_irq(struct msm_gpu *gpu);
+void a5xx_preempt_fini(struct msm_gpu *gpu);
+
+/* Return true if we are in a preempt state */
+static inline bool a5xx_in_preempt(struct a5xx_gpu *a5xx_gpu)
+{
+	return atomic_read(&a5xx_gpu->preempt_state) != PREEMPT_NONE;
+}
+
 #endif /* __A5XX_GPU_H__ */
diff --git a/drivers/gpu/drm/msm/adreno/a5xx_preempt.c b/drivers/gpu/drm/msm/adreno/a5xx_preempt.c
new file mode 100644
index 0000000..348ead7
--- /dev/null
+++ b/drivers/gpu/drm/msm/adreno/a5xx_preempt.c
@@ -0,0 +1,367 @@
+/* Copyright (c) 2016 The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 and
+ * only version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#include "msm_gem.h"
+#include "a5xx_gpu.h"
+
+static void *alloc_kernel_bo(struct drm_device *drm, struct msm_gpu *gpu,
+		size_t size, uint32_t flags, struct drm_gem_object **bo,
+		u64 *iova)
+{
+	struct drm_gem_object *_bo;
+	u64 _iova;
+	void *ptr;
+	int ret;
+
+	mutex_lock(&drm->struct_mutex);
+	_bo = msm_gem_new(drm, size, flags);
+	mutex_unlock(&drm->struct_mutex);
+
+	if (IS_ERR(_bo))
+		return _bo;
+
+	ret = msm_gem_get_iova(_bo, gpu->aspace, &_iova);
+	if (ret)
+		goto out;
+
+	ptr = msm_gem_get_vaddr(_bo);
+	if (!ptr) {
+		ret = -ENOMEM;
+		goto out;
+	}
+
+	if (bo)
+		*bo = _bo;
+	if (iova)
+		*iova = _iova;
+
+	return ptr;
+out:
+	drm_gem_object_unreference_unlocked(_bo);
+	return ERR_PTR(ret);
+}
+
+/*
+ * Try to transition the preemption state from old to new. Return
+ * true on success or false if the original state wasn't 'old'
+ */
+static inline bool try_preempt_state(struct a5xx_gpu *a5xx_gpu,
+		enum preempt_state old, enum preempt_state new)
+{
+	enum preempt_state cur = atomic_cmpxchg(&a5xx_gpu->preempt_state,
+		old, new);
+
+	return (cur == old);
+}
+
+/*
+ * Force the preemption state to the specified state.  This is used in cases
+ * where the current state is known and won't change
+ */
+static inline void set_preempt_state(struct a5xx_gpu *gpu,
+		enum preempt_state new)
+{
+	/* atomic_set() doesn't automatically do barriers, so one before.. */
+	smp_wmb();
+	atomic_set(&gpu->preempt_state, new);
+	/* ... and one after */
+	smp_wmb();
+}
+
+/* Write the most recent wptr for the given ring into the hardware */
+static inline void update_wptr(struct msm_gpu *gpu, struct msm_ringbuffer *ring)
+{
+	unsigned long flags;
+	uint32_t wptr;
+
+	if (!ring)
+		return;
+
+	spin_lock_irqsave(&ring->lock, flags);
+	wptr = (ring->cur - ring->start) % (MSM_GPU_RINGBUFFER_SZ >> 2);
+	spin_unlock_irqrestore(&ring->lock, flags);
+
+	gpu_write(gpu, REG_A5XX_CP_RB_WPTR, wptr);
+}
+
+/* Return the highest priority ringbuffer with something in it */
+static struct msm_ringbuffer *get_next_ring(struct msm_gpu *gpu)
+{
+	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
+	unsigned long flags;
+	int i;
+
+	for (i = gpu->nr_rings - 1; i >= 0; i--) {
+		bool empty;
+		struct msm_ringbuffer *ring = gpu->rb[i];
+
+		spin_lock_irqsave(&ring->lock, flags);
+		empty = (get_wptr(ring) == adreno_gpu->memptrs->rptr[ring->id]);
+		spin_unlock_irqrestore(&ring->lock, flags);
+
+		if (!empty)
+			return ring;
+	}
+
+	return NULL;
+}
+
+static void a5xx_preempt_worker(struct work_struct *work)
+{
+	struct a5xx_gpu *a5xx_gpu =
+		container_of(work, struct a5xx_gpu, preempt_work);
+	struct msm_gpu *gpu = &a5xx_gpu->base.base;
+	struct drm_device *dev = gpu->dev;
+	struct msm_drm_private *priv = dev->dev_private;
+
+	if (atomic_read(&a5xx_gpu->preempt_state) == PREEMPT_COMPLETE) {
+		uint32_t status = gpu_read(gpu,
+			REG_A5XX_CP_CONTEXT_SWITCH_CNTL);
+
+		if (status == 0) {
+			del_timer(&a5xx_gpu->preempt_timer);
+			a5xx_gpu->cur_ring = a5xx_gpu->next_ring;
+			a5xx_gpu->next_ring = NULL;
+
+			update_wptr(gpu, a5xx_gpu->cur_ring);
+
+			set_preempt_state(a5xx_gpu, PREEMPT_NONE);
+			return;
+		}
+
+		dev_err(dev->dev, "%s: Preemption failed to complete\n",
+			gpu->name);
+	} else if (atomic_read(&a5xx_gpu->preempt_state) == PREEMPT_FAULTED)
+		dev_err(dev->dev, "%s: preemption timed out\n", gpu->name);
+	else
+		return;
+
+	/* Trigger recovery */
+	queue_work(priv->wq, &gpu->recover_work);
+}
+
+static void a5xx_preempt_timer(unsigned long data)
+{
+	struct a5xx_gpu *a5xx_gpu = (struct a5xx_gpu *) data;
+	struct msm_gpu *gpu = &a5xx_gpu->base.base;
+	struct drm_device *dev = gpu->dev;
+	struct msm_drm_private *priv = dev->dev_private;
+
+	if (!try_preempt_state(a5xx_gpu, PREEMPT_TRIGGERED, PREEMPT_FAULTED))
+		return;
+
+	queue_work(priv->wq, &a5xx_gpu->preempt_work);
+}
+
+/* Try to trigger a preemption switch */
+void a5xx_preempt_trigger(struct msm_gpu *gpu)
+{
+	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
+	struct a5xx_gpu *a5xx_gpu = to_a5xx_gpu(adreno_gpu);
+	unsigned long flags;
+	struct msm_ringbuffer *ring;
+
+	if (gpu->nr_rings == 1)
+		return;
+
+	/*
+	 * Try to start preemption by moving from NONE to START. If
+	 * unsuccessful, a preemption is already in flight
+	 */
+	if (!try_preempt_state(a5xx_gpu, PREEMPT_NONE, PREEMPT_START))
+		return;
+
+	/* Get the next ring to preempt to */
+	ring = get_next_ring(gpu);
+
+	/*
+	 * If no ring is populated or the highest priority ring is the current
+	 * one, do nothing except update the wptr to the latest and greatest
+	 */
+	if (!ring || (a5xx_gpu->cur_ring == ring)) {
+		update_wptr(gpu, ring);
+
+		/* Set the state back to NONE */
+		set_preempt_state(a5xx_gpu, PREEMPT_NONE);
+		return;
+	}
+
+	/* Make sure the wptr doesn't update while we're in motion */
+	spin_lock_irqsave(&ring->lock, flags);
+	a5xx_gpu->preempt[ring->id]->wptr = get_wptr(ring);
+	spin_unlock_irqrestore(&ring->lock, flags);
+
+	/* Set the address of the incoming preemption record */
+	gpu_write64(gpu, REG_A5XX_CP_CONTEXT_SWITCH_RESTORE_ADDR_LO,
+		REG_A5XX_CP_CONTEXT_SWITCH_RESTORE_ADDR_HI,
+		a5xx_gpu->preempt_iova[ring->id]);
+
+	a5xx_gpu->next_ring = ring;
+
+	/* Start a timer to catch a stuck preemption */
+	mod_timer(&a5xx_gpu->preempt_timer, jiffies + msecs_to_jiffies(10000));
+
+	/* Set the preemption state to triggered */
+	set_preempt_state(a5xx_gpu, PREEMPT_TRIGGERED);
+
+	/* Make sure everything is written before hitting the button */
+	wmb();
+
+	/* And actually start the preemption */
+	gpu_write(gpu, REG_A5XX_CP_CONTEXT_SWITCH_CNTL, 1);
+}
+
+void a5xx_preempt_irq(struct msm_gpu *gpu)
+{
+	uint32_t status;
+	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
+	struct a5xx_gpu *a5xx_gpu = to_a5xx_gpu(adreno_gpu);
+	struct drm_device *dev = gpu->dev;
+	struct msm_drm_private *priv = dev->dev_private;
+
+	if (!try_preempt_state(a5xx_gpu, PREEMPT_TRIGGERED, PREEMPT_PENDING))
+		return;
+
+	status = gpu_read(gpu, REG_A5XX_CP_CONTEXT_SWITCH_CNTL);
+	if (status) {
+		set_preempt_state(a5xx_gpu, PREEMPT_COMPLETE);
+		queue_work(priv->wq, &a5xx_gpu->preempt_work);
+		return;
+	}
+
+	del_timer(&a5xx_gpu->preempt_timer);
+
+	a5xx_gpu->cur_ring = a5xx_gpu->next_ring;
+	a5xx_gpu->next_ring = NULL;
+
+	update_wptr(gpu, a5xx_gpu->cur_ring);
+
+	set_preempt_state(a5xx_gpu, PREEMPT_NONE);
+}
+
+void a5xx_preempt_hw_init(struct msm_gpu *gpu)
+{
+	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
+	struct a5xx_gpu *a5xx_gpu = to_a5xx_gpu(adreno_gpu);
+	struct msm_ringbuffer *ring;
+	int i;
+
+	if (gpu->nr_rings > 1) {
+		/* Clear the preemption records */
+		FOR_EACH_RING(gpu, ring, i) {
+			if (ring) {
+				a5xx_gpu->preempt[ring->id]->wptr = 0;
+				a5xx_gpu->preempt[ring->id]->rptr = 0;
+				a5xx_gpu->preempt[ring->id]->rbase = ring->iova;
+			}
+		}
+	}
+
+	/* Write a 0 to signal that we aren't switching pagetables */
+	gpu_write64(gpu, REG_A5XX_CP_CONTEXT_SWITCH_SMMU_INFO_LO,
+		REG_A5XX_CP_CONTEXT_SWITCH_SMMU_INFO_HI, 0);
+
+	/* Reset the preemption state */
+	set_preempt_state(a5xx_gpu, PREEMPT_NONE);
+
+	/* Always come up on rb 0 */
+	a5xx_gpu->cur_ring = gpu->rb[0];
+}
+
+static int preempt_init_ring(struct a5xx_gpu *a5xx_gpu,
+		struct msm_ringbuffer *ring)
+{
+	struct adreno_gpu *adreno_gpu = &a5xx_gpu->base;
+	struct msm_gpu *gpu = &adreno_gpu->base;
+	struct a5xx_preempt_record *ptr;
+	struct drm_gem_object *bo;
+	u64 iova;
+
+	ptr = alloc_kernel_bo(gpu->dev, gpu,
+		A5XX_PREEMPT_RECORD_SIZE + A5XX_PREEMPT_COUNTER_SIZE,
+		MSM_BO_UNCACHED, &bo, &iova);
+
+	if (IS_ERR(ptr))
+		return PTR_ERR(ptr);
+
+	a5xx_gpu->preempt_bo[ring->id] = bo;
+	a5xx_gpu->preempt_iova[ring->id] = iova;
+	a5xx_gpu->preempt[ring->id] = ptr;
+
+	/* Set up the defaults on the preemption record */
+
+	ptr->magic = A5XX_PREEMPT_RECORD_MAGIC;
+	ptr->info = 0;
+	ptr->data = 0;
+	ptr->cntl = MSM_GPU_RB_CNTL_DEFAULT;
+	ptr->rptr_addr = rbmemptr(adreno_gpu, ring->id, rptr);
+	ptr->counter = iova + A5XX_PREEMPT_RECORD_SIZE;
+
+	return 0;
+}
+
+void a5xx_preempt_fini(struct msm_gpu *gpu)
+{
+	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
+	struct a5xx_gpu *a5xx_gpu = to_a5xx_gpu(adreno_gpu);
+	struct msm_ringbuffer *ring;
+	int i;
+
+	FOR_EACH_RING(gpu, ring, i) {
+		if (!ring || !a5xx_gpu->preempt_bo[i])
+			continue;
+
+		if (a5xx_gpu->preempt[i])
+			msm_gem_put_vaddr(a5xx_gpu->preempt_bo[i]);
+
+		if (a5xx_gpu->preempt_iova[i])
+			msm_gem_put_iova(a5xx_gpu->preempt_bo[i], gpu->aspace);
+
+		drm_gem_object_unreference_unlocked(a5xx_gpu->preempt_bo[i]);
+
+		a5xx_gpu->preempt_bo[i] = NULL;
+	}
+}
+
+void a5xx_preempt_init(struct msm_gpu *gpu)
+{
+	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
+	struct a5xx_gpu *a5xx_gpu = to_a5xx_gpu(adreno_gpu);
+	struct msm_ringbuffer *ring;
+	int i;
+
+	/* No preemption if we only have one ring */
+	if (gpu->nr_rings <= 1)
+		return;
+
+	FOR_EACH_RING(gpu, ring, i) {
+		if (!ring)
+			continue;
+
+		if (preempt_init_ring(a5xx_gpu, ring)) {
+			/*
+			 * On any failure our adventure is over. Clean up and
+			 * set nr_rings to 1 to force preemption off
+			 */
+			a5xx_preempt_fini(gpu);
+			gpu->nr_rings = 1;
+
+			return;
+		}
+	}
+
+	INIT_WORK(&a5xx_gpu->preempt_work, a5xx_preempt_worker);
+
+	setup_timer(&a5xx_gpu->preempt_timer, a5xx_preempt_timer,
+		(unsigned long) a5xx_gpu);
+}
diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
index aca1fc3..5149188 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
@@ -93,11 +93,6 @@ int adreno_hw_init(struct msm_gpu *gpu)
 	return 0;
 }
 
-static uint32_t get_wptr(struct msm_ringbuffer *ring)
-{
-	return ring->cur - ring->start;
-}
-
 /* Use this helper to read rptr, since a430 doesn't update rptr in memory */
 static uint32_t get_rptr(struct adreno_gpu *adreno_gpu,
 		struct msm_ringbuffer *ring)
@@ -145,6 +140,7 @@ void adreno_recover(struct msm_gpu *gpu)
 		if (!ring)
 			continue;
 
+		/* No need for a lock here, nobody else is peeking in */
 		ring->cur = ring->start;
 		ring->next = ring->start;
 
@@ -269,8 +265,9 @@ bool adreno_idle(struct msm_gpu *gpu, struct msm_ringbuffer *ring)
 		return true;
 
 	/* TODO maybe we need to reset GPU here to recover from hang? */
-	DRM_ERROR("%s: timeout waiting to drain ringbuffer %d!\n", gpu->name,
-		ring->id);
+	DRM_ERROR("%s: timeout waiting to drain ringbuffer %d rptr/wptr = %X/%X\n",
+		gpu->name, ring->id, get_rptr(adreno_gpu, ring), wptr);
+
 	return false;
 }
 
diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.h b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
index f5118ad..4fcccd4 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.h
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
@@ -334,6 +334,11 @@ static inline void adreno_gpu_write64(struct adreno_gpu *gpu,
 	adreno_gpu_write(gpu, hi, upper_32_bits(data));
 }
 
+static inline uint32_t get_wptr(struct msm_ringbuffer *ring)
+{
+	return ring->cur - ring->start;
+}
+
 /*
  * Given a register and a count, return a value to program into
  * REG_CP_PROTECT_REG(n) - this will block both reads and writes for _len
diff --git a/drivers/gpu/drm/msm/msm_drv.h b/drivers/gpu/drm/msm/msm_drv.h
index 7ff7a83..e54baba 100644
--- a/drivers/gpu/drm/msm/msm_drv.h
+++ b/drivers/gpu/drm/msm/msm_drv.h
@@ -78,7 +78,7 @@ struct msm_vblank_ctrl {
 	spinlock_t lock;
 };
 
-#define MSM_GPU_MAX_RINGS 1
+#define MSM_GPU_MAX_RINGS 4
 
 struct msm_drm_private {
 
diff --git a/drivers/gpu/drm/msm/msm_ringbuffer.c b/drivers/gpu/drm/msm/msm_ringbuffer.c
index b885979..f42ce09 100644
--- a/drivers/gpu/drm/msm/msm_ringbuffer.c
+++ b/drivers/gpu/drm/msm/msm_ringbuffer.c
@@ -50,6 +50,8 @@ struct msm_ringbuffer *msm_ringbuffer_new(struct msm_gpu *gpu, int id)
 	ring->next  = ring->start;
 	ring->cur   = ring->start;
 
+	spin_lock_init(&ring->lock);
+
 	return ring;
 
 fail:
diff --git a/drivers/gpu/drm/msm/msm_ringbuffer.h b/drivers/gpu/drm/msm/msm_ringbuffer.h
index 865b21a..0f91db0 100644
--- a/drivers/gpu/drm/msm/msm_ringbuffer.h
+++ b/drivers/gpu/drm/msm/msm_ringbuffer.h
@@ -29,6 +29,7 @@ struct msm_ringbuffer {
 	/* last_fence == completed_fence --> no pending work */
 	uint32_t last_fence;
 	uint32_t completed_fence;
+	spinlock_t lock;
 };
 
 struct msm_ringbuffer *msm_ringbuffer_new(struct msm_gpu *gpu, int id);
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH 00/11] drm/msm: A5XX preemption
  2017-02-06 17:39 [PATCH 00/11] drm/msm: A5XX preemption Jordan Crouse
                   ` (8 preceding siblings ...)
  2017-02-06 17:39 ` [PATCH 11/11] drm/msm: Implement preemption for A5XX targets Jordan Crouse
@ 2017-02-06 17:59 ` Daniel Vetter
  2017-02-06 18:23   ` Daniel Stone
  2017-02-06 18:29   ` Alex Deucher
       [not found] ` <1486402779-9024-1-git-send-email-jcrouse-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
  10 siblings, 2 replies; 36+ messages in thread
From: Daniel Vetter @ 2017-02-06 17:59 UTC (permalink / raw)
  To: Jordan Crouse
  Cc: linux-arm-msm, Intel Graphics Development, freedreno, dri-devel

On Mon, Feb 06, 2017 at 10:39:28AM -0700, Jordan Crouse wrote:
> This series of patches implements multiple ringbuffers and preemption for Adreno
> A5XX targets. Preemption allows a command to be interrupted at specific
> preemption points and execution switched to a different ringbuffer.
> 
> The software alogrithm uses preemption to enforce quality of service for
> priority levels - commands to a certain ring preempt the rings of lower
> priority. Note that priority is a software construct; the driver chooses a ring
> to switch to and the hardware executes. This is important because it shows that
> preemption can be used for things other than priority (timeslices for quality of
> service for example).
> 
> This initial series implements 4 ringbuffers to give sufficient coverage for the
> range of priority levels requested by the GLES and compute extensions. The
> targeted ringbuffer is specified in the command submission flags. The default
> ring is 0 (lowest priority).

Link to userspace part that implements these extensions? Also which
gles/compute extensions are you talking about? Asking not just because of
the open source userspace requirement, but also because we want to
upstream a scheduler on the i915 side. Getting alignment on that across
drm drivers would be sweet.

Adding intel-gfx, I'll poke the folks working on this too.
-Daniel

> 
> Jordan
> 
> Jordan Crouse (11):
>   drm/msm: Make sure to detach the MMU during GPU cleanup
>   drm/msm: Improve the zap shader
>   drm/msm: Add hint to DRM_IOCTL_MSM_GEM_INFO to return an object IOVA
>   drm/msm: Remove idle function hook
>   drm/msm: get an iova from the address space instead of an id
>   drm/msm: Add a struct to pass configuration to msm_gpu_init()
>   drm/msm: Remove memptrs->wptr
>   drm/msm: Support multiple ringbuffers
>   drm/msm: Shadow current pointer in the ring until command is complete
>   drm/msm: Make the value of RB_CNTL (almost) generic
>   drm/msm: Implement preemption for A5XX targets
> 
>  drivers/gpu/drm/msm/Makefile              |   1 +
>  drivers/gpu/drm/msm/adreno/a3xx_gpu.c     |  13 +-
>  drivers/gpu/drm/msm/adreno/a4xx_gpu.c     |  13 +-
>  drivers/gpu/drm/msm/adreno/a5xx_gpu.c     | 278 +++++++++++++++++-----
>  drivers/gpu/drm/msm/adreno/a5xx_gpu.h     | 106 +++++++++
>  drivers/gpu/drm/msm/adreno/a5xx_power.c   |  11 +-
>  drivers/gpu/drm/msm/adreno/a5xx_preempt.c | 367 ++++++++++++++++++++++++++++++
>  drivers/gpu/drm/msm/adreno/adreno_gpu.c   | 215 +++++++++++------
>  drivers/gpu/drm/msm/adreno/adreno_gpu.h   |  42 ++--
>  drivers/gpu/drm/msm/dsi/dsi_host.c        |  15 +-
>  drivers/gpu/drm/msm/mdp/mdp4/mdp4_crtc.c  |   8 +-
>  drivers/gpu/drm/msm/mdp/mdp4/mdp4_kms.c   |  18 +-
>  drivers/gpu/drm/msm/mdp/mdp4/mdp4_kms.h   |   3 -
>  drivers/gpu/drm/msm/mdp/mdp4/mdp4_plane.c |  13 +-
>  drivers/gpu/drm/msm/mdp/mdp5/mdp5_crtc.c  |   5 +-
>  drivers/gpu/drm/msm/mdp/mdp5/mdp5_kms.c   |  11 +-
>  drivers/gpu/drm/msm/mdp/mdp5/mdp5_kms.h   |   4 -
>  drivers/gpu/drm/msm/mdp/mdp5/mdp5_plane.c |  13 +-
>  drivers/gpu/drm/msm/msm_drv.c             |  43 ++--
>  drivers/gpu/drm/msm/msm_drv.h             |  27 ++-
>  drivers/gpu/drm/msm/msm_fb.c              |  15 +-
>  drivers/gpu/drm/msm/msm_fbdev.c           |  10 +-
>  drivers/gpu/drm/msm/msm_fence.c           |  85 +++++--
>  drivers/gpu/drm/msm/msm_fence.h           |  13 +-
>  drivers/gpu/drm/msm/msm_gem.c             | 124 +++++++---
>  drivers/gpu/drm/msm/msm_gem.h             |   5 +-
>  drivers/gpu/drm/msm/msm_gem_submit.c      |  14 +-
>  drivers/gpu/drm/msm/msm_gpu.c             | 140 +++++++-----
>  drivers/gpu/drm/msm/msm_gpu.h             |  54 ++++-
>  drivers/gpu/drm/msm/msm_kms.h             |   3 +
>  drivers/gpu/drm/msm/msm_ringbuffer.c      |  14 +-
>  drivers/gpu/drm/msm/msm_ringbuffer.h      |  21 +-
>  include/uapi/drm/msm_drm.h                |   9 +-
>  33 files changed, 1324 insertions(+), 389 deletions(-)
>  create mode 100644 drivers/gpu/drm/msm/adreno/a5xx_preempt.c
> 
> -- 
> 1.9.1
> 
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 00/11] drm/msm: A5XX preemption
  2017-02-06 17:59 ` [PATCH 00/11] drm/msm: A5XX preemption Daniel Vetter
@ 2017-02-06 18:23   ` Daniel Stone
  2017-02-06 18:29     ` [Intel-gfx] " Rob Clark
  2017-02-06 18:29   ` Alex Deucher
  1 sibling, 1 reply; 36+ messages in thread
From: Daniel Stone @ 2017-02-06 18:23 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Jordan Crouse, linux-arm-msm, Intel Graphics Development,
	freedreno, dri-devel

Hi,

On 6 February 2017 at 17:59, Daniel Vetter <daniel@ffwll.ch> wrote:
> On Mon, Feb 06, 2017 at 10:39:28AM -0700, Jordan Crouse wrote:
>> This initial series implements 4 ringbuffers to give sufficient coverage for the
>> range of priority levels requested by the GLES and compute extensions. The
>> targeted ringbuffer is specified in the command submission flags. The default
>> ring is 0 (lowest priority).
>
> Link to userspace part that implements these extensions? Also which
> gles/compute extensions are you talking about? Asking not just because of
> the open source userspace requirement, but also because we want to
> upstream a scheduler on the i915 side. Getting alignment on that across
> drm drivers would be sweet.

Assuming he meant EGL rather than GLES, this is the usual one:
https://www.khronos.org/registry/EGL/extensions/IMG/EGL_IMG_context_priority.txt
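
Usage is roughly like this (illustrative; the tokens come from that
spec):

    EGLint attribs[] = {
        EGL_CONTEXT_PRIORITY_LEVEL_IMG, EGL_CONTEXT_PRIORITY_HIGH_IMG,
        EGL_NONE
    };
    EGLContext ctx = eglCreateContext(dpy, config, EGL_NO_CONTEXT, attribs);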

Cheers,
Daniel

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Intel-gfx] [PATCH 00/11] drm/msm: A5XX preemption
  2017-02-06 18:23   ` Daniel Stone
@ 2017-02-06 18:29     ` Rob Clark
  0 siblings, 0 replies; 36+ messages in thread
From: Rob Clark @ 2017-02-06 18:29 UTC (permalink / raw)
  To: Daniel Stone
  Cc: Daniel Vetter, linux-arm-msm, Jordan Crouse, freedreno,
	Intel Graphics Development, dri-devel, Chris Wilson

On Mon, Feb 6, 2017 at 1:23 PM, Daniel Stone <daniel@fooishbar.org> wrote:
> Hi,
>
> On 6 February 2017 at 17:59, Daniel Vetter <daniel@ffwll.ch> wrote:
>> On Mon, Feb 06, 2017 at 10:39:28AM -0700, Jordan Crouse wrote:
>>> This initial series implements 4 ringbuffers to give sufficient coverage for the
>>> range of priority levels requested by the GLES and compute extensions. The
>>> targeted ringbuffer is specified in the command submission flags. The default
>>> ring is 0 (lowest priority).
>>
>> Link to userspace part that implements these extensions? Also which
>> gles/compute extensions are you talking about? Asking not just because of
>> the open source userspace requirement, but also because we want to
>> upstream a scheduler on the i915 side. Getting alignment on that across
>> drm drivers would be sweet.
>
> Assuming he meant EGL rather than GLES, this is the usual one:
> https://www.khronos.org/registry/EGL/extensions/IMG/EGL_IMG_context_priority.txt
>

There was an RFC for this from Chris.. not sure if it landed yet, but
that is what I was planning to use from the userspace side.

Probably the first step would be to enable this without preemption
points, although I think I have a reasonable idea of how the preemption
points work..

BR,
-R

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 00/11] drm/msm: A5XX preemption
  2017-02-06 17:59 ` [PATCH 00/11] drm/msm: A5XX preemption Daniel Vetter
  2017-02-06 18:23   ` Daniel Stone
@ 2017-02-06 18:29   ` Alex Deucher
  1 sibling, 0 replies; 36+ messages in thread
From: Alex Deucher @ 2017-02-06 18:29 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: linux-arm-msm, freedreno, Intel Graphics Development,
	Maling list - DRI developers

On Mon, Feb 6, 2017 at 12:59 PM, Daniel Vetter <daniel@ffwll.ch> wrote:
> On Mon, Feb 06, 2017 at 10:39:28AM -0700, Jordan Crouse wrote:
>> This series of patches implements multiple ringbuffers and preemption for Adreno
>> A5XX targets. Preemption allows a command to be interrupted at specific
>> preemption points and execution switched to a different ringbuffer.
>>
>> The software alogrithm uses preemption to enforce quality of service for
>> priority levels - commands to a certain ring preempt the rings of lower
>> priority. Note that priority is a software construct; the driver chooses a ring
>> to switch to and the hardware executes. This is important because it shows that
>> preemption can be used for things other than priority (timeslices for quality of
>> service for example).
>>
>> This initial series implements 4 ringbuffers to give sufficient coverage for the
>> range of priority levels requested by the GLES and compute extensions. The
>> targeted ringbuffer is specified in the command submission flags. The default
>> ring is 0 (lowest priority).
>
> Link to userspace part that implements these extensions? Also which
> gles/compute extensions are you talking about? Asking not just because of
> the open source userspace requirement, but also because we want to
> upstream a scheduler on the i915 side. Getting alignment on that across
> drm drivers would be sweet.

FWIW, we have had a GPU scheduler in amdgpu for several years now.  We
purposely tried to keep it largely separate from our driver so others
could leverage it if they wanted to.  See
drivers/gpu/drm/amd/scheduler in the kernel.

Alex

>
> Adding intel-gfx, I'll poke the folks working on this too.
> -Daniel
>
>>
>> Jordan
>>
>> Jordan Crouse (11):
>>   drm/msm: Make sure to detach the MMU during GPU cleanup
>>   drm/msm: Improve the zap shader
>>   drm/msm: Add hint to DRM_IOCTL_MSM_GEM_INFO to return an object IOVA
>>   drm/msm: Remove idle function hook
>>   drm/msm: get an iova from the address space instead of an id
>>   drm/msm: Add a struct to pass configuration to msm_gpu_init()
>>   drm/msm: Remove memptrs->wptr
>>   drm/msm: Support multiple ringbuffers
>>   drm/msm: Shadow current pointer in the ring until command is complete
>>   drm/msm: Make the value of RB_CNTL (almost) generic
>>   drm/msm: Implement preemption for A5XX targets
>>
>>  drivers/gpu/drm/msm/Makefile              |   1 +
>>  drivers/gpu/drm/msm/adreno/a3xx_gpu.c     |  13 +-
>>  drivers/gpu/drm/msm/adreno/a4xx_gpu.c     |  13 +-
>>  drivers/gpu/drm/msm/adreno/a5xx_gpu.c     | 278 +++++++++++++++++-----
>>  drivers/gpu/drm/msm/adreno/a5xx_gpu.h     | 106 +++++++++
>>  drivers/gpu/drm/msm/adreno/a5xx_power.c   |  11 +-
>>  drivers/gpu/drm/msm/adreno/a5xx_preempt.c | 367 ++++++++++++++++++++++++++++++
>>  drivers/gpu/drm/msm/adreno/adreno_gpu.c   | 215 +++++++++++------
>>  drivers/gpu/drm/msm/adreno/adreno_gpu.h   |  42 ++--
>>  drivers/gpu/drm/msm/dsi/dsi_host.c        |  15 +-
>>  drivers/gpu/drm/msm/mdp/mdp4/mdp4_crtc.c  |   8 +-
>>  drivers/gpu/drm/msm/mdp/mdp4/mdp4_kms.c   |  18 +-
>>  drivers/gpu/drm/msm/mdp/mdp4/mdp4_kms.h   |   3 -
>>  drivers/gpu/drm/msm/mdp/mdp4/mdp4_plane.c |  13 +-
>>  drivers/gpu/drm/msm/mdp/mdp5/mdp5_crtc.c  |   5 +-
>>  drivers/gpu/drm/msm/mdp/mdp5/mdp5_kms.c   |  11 +-
>>  drivers/gpu/drm/msm/mdp/mdp5/mdp5_kms.h   |   4 -
>>  drivers/gpu/drm/msm/mdp/mdp5/mdp5_plane.c |  13 +-
>>  drivers/gpu/drm/msm/msm_drv.c             |  43 ++--
>>  drivers/gpu/drm/msm/msm_drv.h             |  27 ++-
>>  drivers/gpu/drm/msm/msm_fb.c              |  15 +-
>>  drivers/gpu/drm/msm/msm_fbdev.c           |  10 +-
>>  drivers/gpu/drm/msm/msm_fence.c           |  85 +++++--
>>  drivers/gpu/drm/msm/msm_fence.h           |  13 +-
>>  drivers/gpu/drm/msm/msm_gem.c             | 124 +++++++---
>>  drivers/gpu/drm/msm/msm_gem.h             |   5 +-
>>  drivers/gpu/drm/msm/msm_gem_submit.c      |  14 +-
>>  drivers/gpu/drm/msm/msm_gpu.c             | 140 +++++++-----
>>  drivers/gpu/drm/msm/msm_gpu.h             |  54 ++++-
>>  drivers/gpu/drm/msm/msm_kms.h             |   3 +
>>  drivers/gpu/drm/msm/msm_ringbuffer.c      |  14 +-
>>  drivers/gpu/drm/msm/msm_ringbuffer.h      |  21 +-
>>  include/uapi/drm/msm_drm.h                |   9 +-
>>  33 files changed, 1324 insertions(+), 389 deletions(-)
>>  create mode 100644 drivers/gpu/drm/msm/adreno/a5xx_preempt.c
>>
>> --
>> 1.9.1
>>
>> _______________________________________________
>> dri-devel mailing list
>> dri-devel@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 03/11] drm/msm: Add hint to DRM_IOCTL_MSM_GEM_INFO to return an object IOVA
  2017-02-06 17:39 ` [PATCH 03/11] drm/msm: Add hint to DRM_IOCTL_MSM_GEM_INFO to return an object IOVA Jordan Crouse
@ 2017-02-06 19:20   ` Emil Velikov
       [not found]     ` <CACvgo513+d19O2rzZ8NXEFgojUQkm2XPae-AdOXXReLM_a1euw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 36+ messages in thread
From: Emil Velikov @ 2017-02-06 19:20 UTC (permalink / raw)
  To: Jordan Crouse; +Cc: freedreno, linux-arm-msm, ML dri-devel

Hi Jordan,

On 6 February 2017 at 17:39, Jordan Crouse <jcrouse@codeaurora.org> wrote:
> Modify the 'pad' member of struct drm_msm_gem_info to 'hint'. If the
> user sets 'hint' to non-zero it means that they want a IOVA for the
> GEM object instead of a mmap() offset. Return the iova in the 'offset'
> member.
>
> Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
> ---
>  drivers/gpu/drm/msm/msm_drv.c | 29 +++++++++++++++++++++++++----
>  include/uapi/drm/msm_drm.h    |  4 ++--
>  2 files changed, 27 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
> index e29bb66..1e4e022 100644
> --- a/drivers/gpu/drm/msm/msm_drv.c
> +++ b/drivers/gpu/drm/msm/msm_drv.c
> @@ -677,6 +677,17 @@ static int msm_ioctl_gem_cpu_fini(struct drm_device *dev, void *data,
>         return ret;
>  }
>
> +static int msm_ioctl_gem_info_iova(struct drm_device *dev,
> +               struct drm_gem_object *obj, uint64_t *iova)
> +{
> +       struct msm_drm_private *priv = dev->dev_private;
> +
> +       if (!priv->gpu)
> +               return -EINVAL;
> +
Not too familiar with msm so perhaps a silly question: how can we trigger this ?

> +       return msm_gem_get_iova(obj, priv->gpu->aspace, iova);
> +}
> +
>  static int msm_ioctl_gem_info(struct drm_device *dev, void *data,
>                 struct drm_file *file)
>  {
> @@ -684,14 +695,24 @@ static int msm_ioctl_gem_info(struct drm_device *dev, void *data,
>         struct drm_gem_object *obj;
>         int ret = 0;
>
> -       if (args->pad)
> -               return -EINVAL;
> -
Please keep the input validation before doing any work (the lookup below).

>         obj = drm_gem_object_lookup(file, args->handle);
>         if (!obj)
>                 return -ENOENT;
>
> -       args->offset = msm_gem_mmap_offset(obj);
> +       /*
> +        * If the hint variable is set, it means that the user wants a IOVA for
> +        * this buffer.  Return the address from the GPU because that is
> +        * probably what it is looking for
> +        */
> +       if (args->hint) {
One could also rename hint to flags. Regardless of the name you can
use hint/flags as a bitmask.

> +               uint64_t iova;
> +
> +               ret = msm_ioctl_gem_info_iova(dev, obj, &iova);
> +               if (!ret)
> +                       args->offset = iova;
> +       } else {
> +               args->offset = msm_gem_mmap_offset(obj);
> +       }
>
>         drm_gem_object_unreference_unlocked(obj);
>
> diff --git a/include/uapi/drm/msm_drm.h b/include/uapi/drm/msm_drm.h
> index 4d5d6a2..045ad20 100644
> --- a/include/uapi/drm/msm_drm.h
> +++ b/include/uapi/drm/msm_drm.h
> @@ -105,8 +105,8 @@ struct drm_msm_gem_new {
>
>  struct drm_msm_gem_info {
>         __u32 handle;         /* in */
> -       __u32 pad;
> -       __u64 offset;         /* out, offset to pass to mmap() */
> +       __u32 hint;           /* in */
Please add explicit #define for the currently valid hints/flags.
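
Something like this, say (names purely illustrative):

#define MSM_INFO_IOVA   0x01
#define MSM_INFO_FLAGS  (MSM_INFO_IOVA)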

> +       __u64 offset;         /* out, mmap() offset if hint is 0, iova if 1 */
Other drivers have used anonymous unions to improve the naming, in
such situations.

struct drm_msm_gem_info {
        __u32 handle;           /* in */
        __u32 hint;             /* in */
        union {                 /* out */
              __u64 offset;         /* offset if hint is FOO */
              __u64 iova;           /* iova if hint is BAR */
        };
};


Thanks
Emil

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 03/11] drm/msm: Add hint to DRM_IOCTL_MSM_GEM_INFO to return an object IOVA
       [not found]     ` <CACvgo513+d19O2rzZ8NXEFgojUQkm2XPae-AdOXXReLM_a1euw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-02-06 19:57       ` Rob Clark
       [not found]         ` <CAF6AEGvUoW2695_HjgfGbpbPaSnOB2gfPa=3UMTDGvom+DxcwA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 36+ messages in thread
From: Rob Clark @ 2017-02-06 19:57 UTC (permalink / raw)
  To: Emil Velikov
  Cc: linux-arm-msm, Jordan Crouse,
	freedreno-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, ML dri-devel

On Mon, Feb 6, 2017 at 2:20 PM, Emil Velikov <emil.l.velikov@gmail.com> wrote:
> Hi Jordan,
>
> On 6 February 2017 at 17:39, Jordan Crouse <jcrouse@codeaurora.org> wrote:
>> Modify the 'pad' member of struct drm_msm_gem_info to 'hint'. If the
>> user sets 'hint' to non-zero it means that they want a IOVA for the
>> GEM object instead of a mmap() offset. Return the iova in the 'offset'
>> member.
>>
>> Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
>> ---
>>  drivers/gpu/drm/msm/msm_drv.c | 29 +++++++++++++++++++++++++----
>>  include/uapi/drm/msm_drm.h    |  4 ++--
>>  2 files changed, 27 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
>> index e29bb66..1e4e022 100644
>> --- a/drivers/gpu/drm/msm/msm_drv.c
>> +++ b/drivers/gpu/drm/msm/msm_drv.c
>> @@ -677,6 +677,17 @@ static int msm_ioctl_gem_cpu_fini(struct drm_device *dev, void *data,
>>         return ret;
>>  }
>>
>> +static int msm_ioctl_gem_info_iova(struct drm_device *dev,
>> +               struct drm_gem_object *obj, uint64_t *iova)
>> +{
>> +       struct msm_drm_private *priv = dev->dev_private;
>> +
>> +       if (!priv->gpu)
>> +               return -EINVAL;
>> +
> Not too familiar with msm so perhaps a silly question: how can we trigger this ?

if gpu has not loaded (for example, missing firmware, or kernel does
not have iommu, etc)

>> +       return msm_gem_get_iova(obj, priv->gpu->aspace, iova);
>> +}
>> +
>>  static int msm_ioctl_gem_info(struct drm_device *dev, void *data,
>>                 struct drm_file *file)
>>  {
>> @@ -684,14 +695,24 @@ static int msm_ioctl_gem_info(struct drm_device *dev, void *data,
>>         struct drm_gem_object *obj;
>>         int ret = 0;
>>
>> -       if (args->pad)
>> -               return -EINVAL;
>> -
> Please keep the input validation before doing any work (the lookup below).

+1 for making this args->flags and checking against
GEM_INFO_VALID_FLAGS up front.  We may want to use some of those other
bits some day

>>         obj = drm_gem_object_lookup(file, args->handle);
>>         if (!obj)
>>                 return -ENOENT;
>>
>> -       args->offset = msm_gem_mmap_offset(obj);
>> +       /*
>> +        * If the hint variable is set, it means that the user wants a IOVA for
>> +        * this buffer.  Return the address from the GPU because that is
>> +        * probably what it is looking for
>> +        */
>> +       if (args->hint) {
> One could also rename hint to flags. Regardless of the name you can
> use hint/flags as a bitmask.
>
>> +               uint64_t iova;
>> +
>> +               ret = msm_ioctl_gem_info_iova(dev, obj, &iova);
>> +               if (!ret)
>> +                       args->offset = iova;
>> +       } else {
>> +               args->offset = msm_gem_mmap_offset(obj);
>> +       }
>>
>>         drm_gem_object_unreference_unlocked(obj);
>>
>> diff --git a/include/uapi/drm/msm_drm.h b/include/uapi/drm/msm_drm.h
>> index 4d5d6a2..045ad20 100644
>> --- a/include/uapi/drm/msm_drm.h
>> +++ b/include/uapi/drm/msm_drm.h
>> @@ -105,8 +105,8 @@ struct drm_msm_gem_new {
>>
>>  struct drm_msm_gem_info {
>>         __u32 handle;         /* in */
>> -       __u32 pad;
>> -       __u64 offset;         /* out, offset to pass to mmap() */
>> +       __u32 hint;           /* in */
> Please add explicit #define for the currently valid hints/flags.
>
>> +       __u64 offset;         /* out, mmap() offset if hint is 0, iova if 1 */
> Other drivers have used anonymous unions to improve the naming, in
> such situations.
>
> struct drm_msm_gem_info {
>         __u32 handle;           /* in */
>         __u32 hint;             /* in */
>         union {                 /* out */
>               __u64 offset;         /* offset if hint is FOO */
>               __u64 iova;           /* iova if hint is BAR */
>         };
> };

is anon union legit for uabi?  I was under the impression that for
some reason it was not.  But I could be wrong.

BR,
-R

>
> Thanks
> Emil
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 03/11] drm/msm: Add hint to DRM_IOCTL_MSM_GEM_INFO to return an object IOVA
       [not found]         ` <CAF6AEGvUoW2695_HjgfGbpbPaSnOB2gfPa=3UMTDGvom+DxcwA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-02-06 20:24           ` Emil Velikov
  2017-02-06 21:01             ` Rob Clark
  0 siblings, 1 reply; 36+ messages in thread
From: Emil Velikov @ 2017-02-06 20:24 UTC (permalink / raw)
  To: Rob Clark
  Cc: linux-arm-msm, Jordan Crouse,
	freedreno-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, ML dri-devel

On 6 February 2017 at 19:57, Rob Clark <robdclark@gmail.com> wrote:
> On Mon, Feb 6, 2017 at 2:20 PM, Emil Velikov <emil.l.velikov@gmail.com> wrote:
>> Hi Jordan,
>>
>> On 6 February 2017 at 17:39, Jordan Crouse <jcrouse@codeaurora.org> wrote:
>>> Modify the 'pad' member of struct drm_msm_gem_info to 'hint'. If the
>>> user sets 'hint' to non-zero it means that they want a IOVA for the
>>> GEM object instead of a mmap() offset. Return the iova in the 'offset'
>>> member.
>>>
>>> Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
>>> ---
>>>  drivers/gpu/drm/msm/msm_drv.c | 29 +++++++++++++++++++++++++----
>>>  include/uapi/drm/msm_drm.h    |  4 ++--
>>>  2 files changed, 27 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
>>> index e29bb66..1e4e022 100644
>>> --- a/drivers/gpu/drm/msm/msm_drv.c
>>> +++ b/drivers/gpu/drm/msm/msm_drv.c
>>> @@ -677,6 +677,17 @@ static int msm_ioctl_gem_cpu_fini(struct drm_device *dev, void *data,
>>>         return ret;
>>>  }
>>>
>>> +static int msm_ioctl_gem_info_iova(struct drm_device *dev,
>>> +               struct drm_gem_object *obj, uint64_t *iova)
>>> +{
>>> +       struct msm_drm_private *priv = dev->dev_private;
>>> +
>>> +       if (!priv->gpu)
>>> +               return -EINVAL;
>>> +
>> Not too familiar with msm so perhaps a silly question: how can we trigger this ?
>
> if gpu has not loaded (for example, missing firmware, or kernel does
> not have iommu, etc)
>
Thanks Rob. I was under the impression that in such cases the driver
will/should fail to load.

>>> +       __u64 offset;         /* out, mmap() offset if hint is 0, iova if 1 */
>> Other drivers have used anonymous unions to improve the naming, in
>> such situations.
>>
>> struct drm_msm_gem_info {
>>         __u32 handle;           /* in */
>>         __u32 hint;             /* in */
>>         union {                 /* out */
>>               __u64 offset;         /* offset if hint is FOO */
>>               __u64 iova;           /* iova if hint is BAR */
>>         };
>> };
>
> is anon union legit for uabi?  I was under the impression that for
> some reason it was not.  But I could be wrong.
>
Haven't seen any wording against it and we do have a few instances in
DRM UABI land.
Either way it was just an idea.

Regards,
Emil

* Re: [PATCH 03/11] drm/msm: Add hint to DRM_IOCTL_MSM_GEM_INFO to return an object IOVA
  2017-02-06 20:24           ` Emil Velikov
@ 2017-02-06 21:01             ` Rob Clark
  0 siblings, 0 replies; 36+ messages in thread
From: Rob Clark @ 2017-02-06 21:01 UTC (permalink / raw)
  To: Emil Velikov; +Cc: Jordan Crouse, linux-arm-msm, freedreno, ML dri-devel

On Mon, Feb 6, 2017 at 3:24 PM, Emil Velikov <emil.l.velikov@gmail.com> wrote:
> On 6 February 2017 at 19:57, Rob Clark <robdclark@gmail.com> wrote:
>> On Mon, Feb 6, 2017 at 2:20 PM, Emil Velikov <emil.l.velikov@gmail.com> wrote:
>>> Hi Jordan,
>>>
>>> On 6 February 2017 at 17:39, Jordan Crouse <jcrouse@codeaurora.org> wrote:
>>>> Modify the 'pad' member of struct drm_msm_gem_info to 'hint'. If the
>>>> user sets 'hint' to non-zero it means that they want an IOVA for the
>>>> GEM object instead of a mmap() offset. Return the iova in the 'offset'
>>>> member.
>>>>
>>>> Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
>>>> ---
>>>>  drivers/gpu/drm/msm/msm_drv.c | 29 +++++++++++++++++++++++++----
>>>>  include/uapi/drm/msm_drm.h    |  4 ++--
>>>>  2 files changed, 27 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
>>>> index e29bb66..1e4e022 100644
>>>> --- a/drivers/gpu/drm/msm/msm_drv.c
>>>> +++ b/drivers/gpu/drm/msm/msm_drv.c
>>>> @@ -677,6 +677,17 @@ static int msm_ioctl_gem_cpu_fini(struct drm_device *dev, void *data,
>>>>         return ret;
>>>>  }
>>>>
>>>> +static int msm_ioctl_gem_info_iova(struct drm_device *dev,
>>>> +               struct drm_gem_object *obj, uint64_t *iova)
>>>> +{
>>>> +       struct msm_drm_private *priv = dev->dev_private;
>>>> +
>>>> +       if (!priv->gpu)
>>>> +               return -EINVAL;
>>>> +
>>> Not too familiar with msm so perhaps a silly question: how can we trigger this?
>>
>> if gpu has not loaded (for example, missing firmware, or kernel does
>> not have iommu, etc)
>>
> Thanks Rob. I was under the impression that in such cases the driver
> will/should fail to load.

radeon/nouveau/i915 will all, iirc, fail to load.  I made drm/msm
defer loading the gpu until first open() since having to constantly
rebuild initrd seemed annoying ;-)

>>>> +       __u64 offset;         /* out, mmap() offset if hint is 0, iova if 1 */
>>> Other drivers have used anonymous unions to improve the naming, in
>>> such situations.
>>>
>>> struct drm_msm_gem_info {
>>>         __u32 handle;           /* in */
>>>         __u32 hint;             /* in */
>>>         union {                 /* out */
>>>               __u64 offset;         /* offset if hint is FOO */
>>>               __u64 iova;           /* iova if hint is BAR */
>>>         };
>>> };
>>
>> is anon union legit for uabi?  I was under the impression that for
>> some reason it was not.  But I could be wrong.
>>
> Haven't seen any wording against it and we do have a few instances in
> DRM UABI land.
> Either way it was just an idea.

hmm, ok, if we are already using it in uabi (and not just ancient
ioctls) then maybe I am wrong..
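
For reference, Emil's suggestion spelled out as a complete uapi snippet
would look like this (the MSM_INFO_IOVA name is borrowed from the v2
respin later in this thread; anonymous unions of this shape do already
appear in uapi headers, struct perf_event_attr for one):

#include <linux/types.h>

#define MSM_INFO_IOVA	0x01

struct drm_msm_gem_info {
	__u32 handle;		/* in */
	__u32 flags;		/* in, MSM_INFO_* */
	union {			/* out */
		__u64 offset;	/* mmap() offset when flags == 0 */
		__u64 iova;	/* GPU address when MSM_INFO_IOVA is set */
	};
};

Both union members alias the same eight bytes, so userspace that only
ever reads 'offset' keeps working unchanged.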

BR,
-R


> Regards,
> Emil


* Re: [PATCH 11/11] drm/msm: Implement preemption for A5XX targets
       [not found]   ` <1486402779-9024-12-git-send-email-jcrouse-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
@ 2017-02-08 20:30     ` Stephen Boyd
       [not found]       ` <8696f3b7-1fbd-309a-1d68-b2f8ad89a30c-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
  0 siblings, 1 reply; 36+ messages in thread
From: Stephen Boyd @ 2017-02-08 20:30 UTC (permalink / raw)
  To: Jordan Crouse, freedreno-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: linux-arm-msm-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On 02/06/2017 09:39 AM, Jordan Crouse wrote:
> diff --git a/drivers/gpu/drm/msm/adreno/a5xx_preempt.c b/drivers/gpu/drm/msm/adreno/a5xx_preempt.c
> new file mode 100644
> index 0000000..348ead7
> --- /dev/null
> +++ b/drivers/gpu/drm/msm/adreno/a5xx_preempt.c
> @@ -0,0 +1,367 @@
>
> +/*
> + * Try to transition the preemption state from old to new. Return
> + * true on success or false if the original state wasn't 'old'
> + */
> +static inline bool try_preempt_state(struct a5xx_gpu *a5xx_gpu,
> +		enum preempt_state old, enum preempt_state new)
> +{
> +	enum preempt_state cur = atomic_cmpxchg(&a5xx_gpu->preempt_state,
> +		old, new);
> +
> +	return (cur == old);
> +}
> +
> +/*
> + * Force the preemption state to the specified state.  This is used in cases
> + * where the current state is known and won't change
> + */
> +static inline void set_preempt_state(struct a5xx_gpu *gpu,
> +		enum preempt_state new)
> +{
> +	/* atomic_set() doesn't automatically do barriers, so one before.. */
> +	smp_wmb();
> +	atomic_set(&gpu->preempt_state, new);
> +	/* ... and one after*/
> +	smp_wmb();

Should these smp_wmb()s be replaced with smp_mb__before_atomic() and
smp_mb__after_atomic()? The latter is stronger (mb() instead of wmb())
but otherwise that's typically what is used with atomics. Also, it would
be nice if the comments described what we're synchronizing with.


> +
> +static void a5xx_preempt_worker(struct work_struct *work)
> +{
> +	struct a5xx_gpu *a5xx_gpu =
> +		container_of(work, struct a5xx_gpu, preempt_work);
> +	struct msm_gpu *gpu = &a5xx_gpu->base.base;
> +	struct drm_device *dev = gpu->dev;
> +	struct msm_drm_private *priv = dev->dev_private;
> +
> +	if (atomic_read(&a5xx_gpu->preempt_state) == PREEMPT_COMPLETE) {
> +		uint32_t status = gpu_read(gpu,
> +			REG_A5XX_CP_CONTEXT_SWITCH_CNTL);

Why does this worker check the status again? The irq may fire but the
hardware is still indicating "pending"? And why do we fork off this code
to a workqueue? Can't we do this stuff in the irq handler or timer
handler and drop this worker?

> +
> +		if (status == 0) {
> +			del_timer(&a5xx_gpu->preempt_timer);
> +			a5xx_gpu->cur_ring = a5xx_gpu->next_ring;
> +			a5xx_gpu->next_ring = NULL;
> +
> +			update_wptr(gpu, a5xx_gpu->cur_ring);
> +
> +			set_preempt_state(a5xx_gpu, PREEMPT_NONE);
> +			return;
> +		}
> +
> +		dev_err(dev->dev, "%s: Preemption failed to complete\n",
> +			gpu->name);
> +	} else if (atomic_read(&a5xx_gpu->preempt_state) == PREEMPT_FAULTED)
> +		dev_err(dev->dev, "%s: preemption timed out\n", gpu->name);
> +	else
> +		return;
> +
> +	/* Trigger recovery */
> +	queue_work(priv->wq, &gpu->recover_work);
> +}
> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.h b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> index f5118ad..4fcccd4 100644
> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> @@ -334,6 +334,11 @@ static inline void adreno_gpu_write64(struct adreno_gpu *gpu,
>  	adreno_gpu_write(gpu, hi, upper_32_bits(data));
>  }
>  
> +static inline uint32_t get_wptr(struct msm_ringbuffer *ring)

const struct msm_ringbuffer?

> +{
> +	return ring->cur - ring->start;
> +}
> +
>  /*
>   * Given a register and a count, return a value to program into
>   * REG_CP_PROTECT_REG(n) - this will block both reads and writes for _len
>
-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


* Re: [PATCH 11/11] drm/msm: Implement preemption for A5XX targets
       [not found]       ` <8696f3b7-1fbd-309a-1d68-b2f8ad89a30c-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
@ 2017-02-08 23:00         ` Jordan Crouse
  2017-02-09  0:03           ` Stephen Boyd
  0 siblings, 1 reply; 36+ messages in thread
From: Jordan Crouse @ 2017-02-08 23:00 UTC (permalink / raw)
  To: Stephen Boyd
  Cc: linux-arm-msm-u79uwXL29TY76Z2rM5mHXA,
	freedreno-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On Wed, Feb 08, 2017 at 12:30:08PM -0800, Stephen Boyd wrote:
> On 02/06/2017 09:39 AM, Jordan Crouse wrote:
> > diff --git a/drivers/gpu/drm/msm/adreno/a5xx_preempt.c b/drivers/gpu/drm/msm/adreno/a5xx_preempt.c
> > new file mode 100644
> > index 0000000..348ead7
> > --- /dev/null
> > +++ b/drivers/gpu/drm/msm/adreno/a5xx_preempt.c
> > @@ -0,0 +1,367 @@
> >
> > +/*
> > + * Try to transition the preemption state from old to new. Return
> > + * true on success or false if the original state wasn't 'old'
> > + */
> > +static inline bool try_preempt_state(struct a5xx_gpu *a5xx_gpu,
> > +		enum preempt_state old, enum preempt_state new)
> > +{
> > +	enum preempt_state cur = atomic_cmpxchg(&a5xx_gpu->preempt_state,
> > +		old, new);
> > +
> > +	return (cur == old);
> > +}
> > +
> > +/*
> > + * Force the preemption state to the specified state.  This is used in cases
> > + * where the current state is known and won't change
> > + */
> > +static inline void set_preempt_state(struct a5xx_gpu *gpu,
> > +		enum preempt_state new)
> > +{
> > +	/* atomic_set() doesn't automatically do barriers, so one before.. */
> > +	smp_wmb();
> > +	atomic_set(&gpu->preempt_state, new);
> > +	/* ... and one after*/
> > +	smp_wmb();
> 
> Should these smp_wmb()s be replaced with smp_mb__before_atomic() and
> smp_mb__after_atomic()? The latter is stronger (mb() instead of wmb())
> but otherwise that's typically what is used with atomics. Also, it would
> be nice if the comments described what we're synchronizing with.

Yep - _before/_after atomic are more correct to use here. We are synchronizing
with other cores that are trying to check the state machine in try_preempt_state
and in the interrupt handler.  I'll clarify that.
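
For the record, a minimal sketch of the fixed-up helper, matching what
the discussion above settles on (the shape of the v2 change, not a
quote of it):

static inline void set_preempt_state(struct a5xx_gpu *gpu,
		enum preempt_state new)
{
	/*
	 * preempt_state may be read asynchronously by the interrupt
	 * handler and by try_preempt_state() on other cores, so order
	 * the write against surrounding accesses with full barriers.
	 */
	smp_mb__before_atomic();
	atomic_set(&gpu->preempt_state, new);
	smp_mb__after_atomic();
}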

> 
> > +
> > +static void a5xx_preempt_worker(struct work_struct *work)
> > +{
> > +	struct a5xx_gpu *a5xx_gpu =
> > +		container_of(work, struct a5xx_gpu, preempt_work);
> > +	struct msm_gpu *gpu = &a5xx_gpu->base.base;
> > +	struct drm_device *dev = gpu->dev;
> > +	struct msm_drm_private *priv = dev->dev_private;
> > +
> > +	if (atomic_read(&a5xx_gpu->preempt_state) == PREEMPT_COMPLETE) {
> > +		uint32_t status = gpu_read(gpu,
> > +			REG_A5XX_CP_CONTEXT_SWITCH_CNTL);
> 
> Why does this worker check the status again? The irq may fire but the
> hardware is still indicating "pending"? And why do we fork off this code
> to a workqueue? Can't we do this stuff in the irq handler or timer
> handler and drop this worker?

That is a great point - I don't actually know if we need the worker here. I
don't believe we need a mutex for anything at this point - the hang recovery
will need one, but that gets scheduled anyway, so the worker is unneeded.
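
Concretely, dropping the worker means doing the PREEMPT_COMPLETE
handling straight from the CP interrupt, along the lines of this sketch
(the function name and the exact call site are illustrative, not lifted
from v2):

/* Called from the a5xx irq handler once the context switch interrupt
 * fires; mirrors the worker body quoted above. */
static void a5xx_preempt_done(struct msm_gpu *gpu)
{
	struct a5xx_gpu *a5xx_gpu = to_a5xx_gpu(to_adreno_gpu(gpu));
	struct drm_device *dev = gpu->dev;
	struct msm_drm_private *priv = dev->dev_private;

	if (gpu_read(gpu, REG_A5XX_CP_CONTEXT_SWITCH_CNTL)) {
		/* Hardware still busy: flag the fault and kick recovery */
		set_preempt_state(a5xx_gpu, PREEMPT_FAULTED);
		dev_err(dev->dev, "%s: Preemption failed to complete\n",
			gpu->name);
		queue_work(priv->wq, &gpu->recover_work);
		return;
	}

	del_timer(&a5xx_gpu->preempt_timer);
	a5xx_gpu->cur_ring = a5xx_gpu->next_ring;
	a5xx_gpu->next_ring = NULL;

	update_wptr(gpu, a5xx_gpu->cur_ring);

	set_preempt_state(a5xx_gpu, PREEMPT_NONE);
}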

> > +
> > +		if (status == 0) {
> > +			del_timer(&a5xx_gpu->preempt_timer);
> > +			a5xx_gpu->cur_ring = a5xx_gpu->next_ring;
> > +			a5xx_gpu->next_ring = NULL;
> > +
> > +			update_wptr(gpu, a5xx_gpu->cur_ring);
> > +
> > +			set_preempt_state(a5xx_gpu, PREEMPT_NONE);
> > +			return;
> > +		}
> > +
> > +		dev_err(dev->dev, "%s: Preemption failed to complete\n",
> > +			gpu->name);
> > +	} else if (atomic_read(&a5xx_gpu->preempt_state) == PREEMPT_FAULTED)
> > +		dev_err(dev->dev, "%s: preemption timed out\n", gpu->name);
> > +	else
> > +		return;
> > +
> > +	/* Trigger recovery */
> > +	queue_work(priv->wq, &gpu->recover_work);
> > +}
> > diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.h b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> > index f5118ad..4fcccd4 100644
> > --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> > +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> > @@ -334,6 +334,11 @@ static inline void adreno_gpu_write64(struct adreno_gpu *gpu,
> >  	adreno_gpu_write(gpu, hi, upper_32_bits(data));
> >  }
> >  
> > +static inline uint32_t get_wptr(struct msm_ringbuffer *ring)
> 
> const struct msm_ringbuffer?

msm_ringbuffer is actively being modified, so we would have to cast it - can't
tell if there would be a compiler advantage or not.

Jordan
-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

* Re: [PATCH 11/11] drm/msm: Implement preemption for A5XX targets
  2017-02-08 23:00         ` Jordan Crouse
@ 2017-02-09  0:03           ` Stephen Boyd
  0 siblings, 0 replies; 36+ messages in thread
From: Stephen Boyd @ 2017-02-09  0:03 UTC (permalink / raw)
  To: freedreno, linux-arm-msm, dri-devel

On 02/08/2017 03:00 PM, Jordan Crouse wrote:
> On Wed, Feb 08, 2017 at 12:30:08PM -0800, Stephen Boyd wrote:
>
>> const struct msm_ringbuffer?
> msm_ringbuffer is actively being modified, so we would have to cast it - can't
> tell if there would be a compiler advantage or not.
>

I just meant for this function. It keeps us from modifying ring in the
function.
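
In other words, just this (the body unchanged from the patch above;
callers still pass their ordinary mutable ring pointer, no cast
required):

static inline uint32_t get_wptr(const struct msm_ringbuffer *ring)
{
	return ring->cur - ring->start;
}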

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


* Re: [PATCH 05/11] drm/msm: get an iova from the address space instead of an id
  2017-02-06 17:39 ` [PATCH 05/11] drm/msm: get an iova from the address space instead of an id Jordan Crouse
@ 2017-02-09  5:01   ` Archit Taneja
  0 siblings, 0 replies; 36+ messages in thread
From: Archit Taneja @ 2017-02-09  5:01 UTC (permalink / raw)
  To: Jordan Crouse, freedreno; +Cc: linux-arm-msm, dri-devel



On 02/06/2017 11:09 PM, Jordan Crouse wrote:
> In the future we won't have a fixed set of address spaces.
> Instead of going through the effort of assigning an ID for each
> address space just use the address space itself as a token for
> getting / putting an iova.
>
> This forces a few changes in the gem object however: instead
> of using a simple index into a list of domains, we need to
> maintain a list of them. Luckily the list will be pretty small;
> even with dynamic address spaces we wouldn't ever see more than
> two or three.
>
> Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
> ---
>  drivers/gpu/drm/msm/adreno/a5xx_gpu.c     |   8 +-
>  drivers/gpu/drm/msm/adreno/a5xx_power.c   |   5 +-
>  drivers/gpu/drm/msm/adreno/adreno_gpu.c   |   6 +-
>  drivers/gpu/drm/msm/dsi/dsi_host.c        |  15 +++-
>  drivers/gpu/drm/msm/mdp/mdp4/mdp4_crtc.c  |   8 +-
>  drivers/gpu/drm/msm/mdp/mdp4/mdp4_kms.c   |  18 ++---
>  drivers/gpu/drm/msm/mdp/mdp4/mdp4_kms.h   |   3 -
>  drivers/gpu/drm/msm/mdp/mdp4/mdp4_plane.c |  13 ++--
>  drivers/gpu/drm/msm/mdp/mdp5/mdp5_crtc.c  |   5 +-
>  drivers/gpu/drm/msm/mdp/mdp5/mdp5_kms.c   |  11 +--
>  drivers/gpu/drm/msm/mdp/mdp5/mdp5_kms.h   |   4 -
>  drivers/gpu/drm/msm/mdp/mdp5/mdp5_plane.c |  13 ++--
>  drivers/gpu/drm/msm/msm_drv.c             |  14 ----
>  drivers/gpu/drm/msm/msm_drv.h             |  25 +++---
>  drivers/gpu/drm/msm/msm_fb.c              |  15 ++--
>  drivers/gpu/drm/msm/msm_fbdev.c           |  10 ++-
>  drivers/gpu/drm/msm/msm_gem.c             | 124 +++++++++++++++++++++---------
>  drivers/gpu/drm/msm/msm_gem.h             |   4 +-
>  drivers/gpu/drm/msm/msm_gem_submit.c      |   4 +-
>  drivers/gpu/drm/msm/msm_gpu.c             |   8 +-
>  drivers/gpu/drm/msm/msm_gpu.h             |   1 -
>  drivers/gpu/drm/msm/msm_kms.h             |   3 +
>  22 files changed, 184 insertions(+), 133 deletions(-)
>

<snip>

> diff --git a/drivers/gpu/drm/msm/msm_fbdev.c b/drivers/gpu/drm/msm/msm_fbdev.c
> index e8f41eb..0b5b839 100644
> --- a/drivers/gpu/drm/msm/msm_fbdev.c
> +++ b/drivers/gpu/drm/msm/msm_fbdev.c
> @@ -20,6 +20,7 @@
>  #include "drm_crtc.h"
>  #include "drm_fb_helper.h"
>  #include "msm_gem.h"
> +#include "msm_kms.h"
>
>  extern int msm_gem_mmap_obj(struct drm_gem_object *obj,
>  					struct vm_area_struct *vma);
> @@ -78,6 +79,7 @@ static int msm_fbdev_create(struct drm_fb_helper *helper,
>  {
>  	struct msm_fbdev *fbdev = to_msm_fbdev(helper);
>  	struct drm_device *dev = helper->dev;
> +	struct msm_drm_private *priv = dev->dev_private;
>  	struct drm_framebuffer *fb = NULL;
>  	struct fb_info *fbi = NULL;
>  	struct drm_mode_fb_cmd2 mode_cmd = {0};
> @@ -129,7 +131,13 @@ static int msm_fbdev_create(struct drm_fb_helper *helper,
>  	 * in panic (ie. lock-safe, etc) we could avoid pinning the
>  	 * buffer now:
>  	 */
> -	ret = msm_gem_get_iova_locked(fbdev->bo, 0, &paddr);
> +
> +	if (!priv->kms) {
> +		ret = -ENODEV;
> +		goto fail_unlock;
> +	}

This check isn't needed. As of now, we don't create a fbdev device if we don't
have kms initialized.

Otherwise,

Reviewed-by: Archit Taneja <architt@codeaurora.org>

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

* [v2] [PATCH 00/11] drm/msm: A5XX preemption
       [not found] ` <1486402779-9024-1-git-send-email-jcrouse-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
  2017-02-06 17:39   ` [PATCH 01/11] drm/msm: Make sure to detach the MMU during GPU cleanup Jordan Crouse
  2017-02-06 17:39   ` [PATCH 02/11] drm/msm: Improve the zap shader Jordan Crouse
@ 2017-03-07 16:58   ` Jordan Crouse
  2017-03-07 16:58     ` [PATCH 01/11] drm/msm: Make sure to detach the MMU during GPU cleanup Jordan Crouse
       [not found]     ` <1488905900-6603-1-git-send-email-jcrouse-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
  2 siblings, 2 replies; 36+ messages in thread
From: Jordan Crouse @ 2017-03-07 16:58 UTC (permalink / raw)
  To: freedreno-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: linux-arm-msm-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Here is v2 of the preemption series - Changes:

 * Refactored API in DRM_IOCTL_MSM_GEM_INFO (Thanks Emil Velikov)
 * Removed preemption worker and fixed atomics (Thanks Stephen Boyd)
 * Various fixes and improvements based on testing

Thanks!
Jordan

Jordan Crouse (11):
  drm/msm: Make sure to detach the MMU during GPU cleanup
  drm/msm: Improve the zap shader
  drm/msm: Remove idle function hook
  drm/msm: Add hint to DRM_IOCTL_MSM_GEM_INFO to return an object IOVA
  drm/msm: get an iova from the address space instead of an id
  drm/msm: Add a struct to pass configuration to msm_gpu_init()
  drm/msm: Remove memptrs->wptr
  drm/msm: Support multiple ringbuffers
  drm/msm: Shadow current pointer in the ring until command is complete
  drm/msm: Make the value of RB_CNTL (almost) generic
  drm/msm: Implement preemption for A5XX targets

 arch/arm64/boot/dts/qcom/msm8996.dtsi     |   4 +-
 drivers/gpu/drm/msm/Makefile              |   1 +
 drivers/gpu/drm/msm/adreno/a3xx_gpu.c     |  13 +-
 drivers/gpu/drm/msm/adreno/a4xx_gpu.c     |  13 +-
 drivers/gpu/drm/msm/adreno/a5xx_gpu.c     | 292 +++++++++++++++++++------
 drivers/gpu/drm/msm/adreno/a5xx_gpu.h     | 103 ++++++++-
 drivers/gpu/drm/msm/adreno/a5xx_power.c   |  11 +-
 drivers/gpu/drm/msm/adreno/a5xx_preempt.c | 345 ++++++++++++++++++++++++++++++
 drivers/gpu/drm/msm/adreno/adreno_gpu.c   | 220 ++++++++++++-------
 drivers/gpu/drm/msm/adreno/adreno_gpu.h   |  44 ++--
 drivers/gpu/drm/msm/dsi/dsi_host.c        |  15 +-
 drivers/gpu/drm/msm/mdp/mdp4/mdp4_crtc.c  |   8 +-
 drivers/gpu/drm/msm/mdp/mdp4/mdp4_kms.c   |  18 +-
 drivers/gpu/drm/msm/mdp/mdp4/mdp4_kms.h   |   4 -
 drivers/gpu/drm/msm/mdp/mdp4/mdp4_plane.c |  13 +-
 drivers/gpu/drm/msm/mdp/mdp5/mdp5_crtc.c  |   5 +-
 drivers/gpu/drm/msm/mdp/mdp5/mdp5_kms.c   |  11 +-
 drivers/gpu/drm/msm/mdp/mdp5/mdp5_kms.h   |   4 -
 drivers/gpu/drm/msm/mdp/mdp5/mdp5_plane.c |  13 +-
 drivers/gpu/drm/msm/msm_drv.c             |  36 ++--
 drivers/gpu/drm/msm/msm_drv.h             |  27 ++-
 drivers/gpu/drm/msm/msm_fb.c              |  15 +-
 drivers/gpu/drm/msm/msm_fbdev.c           |  10 +-
 drivers/gpu/drm/msm/msm_fence.c           |  85 ++++++--
 drivers/gpu/drm/msm/msm_fence.h           |  13 +-
 drivers/gpu/drm/msm/msm_gem.c             | 134 ++++++++----
 drivers/gpu/drm/msm/msm_gem.h             |   5 +-
 drivers/gpu/drm/msm/msm_gem_submit.c      |  14 +-
 drivers/gpu/drm/msm/msm_gpu.c             | 141 +++++++-----
 drivers/gpu/drm/msm/msm_gpu.h             |  54 ++++-
 drivers/gpu/drm/msm/msm_kms.h             |   3 +
 drivers/gpu/drm/msm/msm_ringbuffer.c      |  14 +-
 drivers/gpu/drm/msm/msm_ringbuffer.h      |  21 +-
 include/uapi/drm/msm_drm.h                |  13 +-
 34 files changed, 1322 insertions(+), 400 deletions(-)
 create mode 100644 drivers/gpu/drm/msm/adreno/a5xx_preempt.c

-- 
1.9.1


* [PATCH 01/11] drm/msm: Make sure to detach the MMU during GPU cleanup
  2017-03-07 16:58   ` [v2] [PATCH 00/11] drm/msm: A5XX preemption Jordan Crouse
@ 2017-03-07 16:58     ` Jordan Crouse
       [not found]     ` <1488905900-6603-1-git-send-email-jcrouse-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
  1 sibling, 0 replies; 36+ messages in thread
From: Jordan Crouse @ 2017-03-07 16:58 UTC (permalink / raw)
  To: freedreno; +Cc: linux-arm-msm, dri-devel

We should be detaching the MMU before destroying the address
space. To do this cleanly, the detach has to happen in
adreno_gpu_cleanup() because it needs access to structs
in adreno_gpu.c.  Plus it is better symmetry to have
the attach and detach at the same code level.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
---
 drivers/gpu/drm/msm/adreno/adreno_gpu.c | 31 ++++++++++++++++++++-----------
 drivers/gpu/drm/msm/msm_gpu.c           |  3 ---
 2 files changed, 20 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
index f67e6f8..35a6849 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
@@ -2,7 +2,7 @@
  * Copyright (C) 2013 Red Hat
  * Author: Rob Clark <robdclark@gmail.com>
  *
- * Copyright (c) 2014 The Linux Foundation. All rights reserved.
+ * Copyright (c) 2014,2017 The Linux Foundation. All rights reserved.
  *
  * This program is free software; you can redistribute it and/or modify it
  * under the terms of the GNU General Public License version 2 as published by
@@ -420,18 +420,27 @@ int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev,
 	return 0;
 }
 
-void adreno_gpu_cleanup(struct adreno_gpu *gpu)
+void adreno_gpu_cleanup(struct adreno_gpu *adreno_gpu)
 {
-	if (gpu->memptrs_bo) {
-		if (gpu->memptrs)
-			msm_gem_put_vaddr(gpu->memptrs_bo);
+	struct msm_gpu *gpu = &adreno_gpu->base;
+
+	if (adreno_gpu->memptrs_bo) {
+		if (adreno_gpu->memptrs)
+			msm_gem_put_vaddr(adreno_gpu->memptrs_bo);
+
+		if (adreno_gpu->memptrs_iova)
+			msm_gem_put_iova(adreno_gpu->memptrs_bo, gpu->id);
+
+		drm_gem_object_unreference_unlocked(adreno_gpu->memptrs_bo);
+	}
+	release_firmware(adreno_gpu->pm4);
+	release_firmware(adreno_gpu->pfp);
 
-		if (gpu->memptrs_iova)
-			msm_gem_put_iova(gpu->memptrs_bo, gpu->base.id);
+	msm_gpu_cleanup(gpu);
 
-		drm_gem_object_unreference_unlocked(gpu->memptrs_bo);
+	if (gpu->aspace) {
+		gpu->aspace->mmu->funcs->detach(gpu->aspace->mmu,
+			iommu_ports, ARRAY_SIZE(iommu_ports));
+		msm_gem_address_space_destroy(gpu->aspace);
 	}
-	release_firmware(gpu->pm4);
-	release_firmware(gpu->pfp);
-	msm_gpu_cleanup(&gpu->base);
 }
diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
index 7b29843..e89093c 100644
--- a/drivers/gpu/drm/msm/msm_gpu.c
+++ b/drivers/gpu/drm/msm/msm_gpu.c
@@ -710,9 +710,6 @@ void msm_gpu_cleanup(struct msm_gpu *gpu)
 		msm_ringbuffer_destroy(gpu->rb);
 	}
 
-	if (gpu->aspace)
-		msm_gem_address_space_destroy(gpu->aspace);
-
 	if (gpu->fctx)
 		msm_fence_context_free(gpu->fctx);
 }
-- 
1.9.1


* [PATCH 02/11] drm/msm: Improve the zap shader
       [not found]     ` <1488905900-6603-1-git-send-email-jcrouse-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
@ 2017-03-07 16:58       ` Jordan Crouse
  2017-03-07 16:58       ` [PATCH 03/11] drm/msm: Remove idle function hook Jordan Crouse
                         ` (8 subsequent siblings)
  9 siblings, 0 replies; 36+ messages in thread
From: Jordan Crouse @ 2017-03-07 16:58 UTC (permalink / raw)
  To: freedreno-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: linux-arm-msm-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Simplify the code, use snprintf correctly, and make sure that we memset
the rest of the segment if the memory size in the ELF file is larger
than the file size.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
---
 arch/arm64/boot/dts/qcom/msm8996.dtsi |  4 +-
 drivers/gpu/drm/msm/adreno/a5xx_gpu.c | 70 +++++++++++++++++------------------
 2 files changed, 37 insertions(+), 37 deletions(-)
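
The snprintf() part is a one-byte subtlety worth spelling out:
snprintf() always NUL-terminates within the size it is given, so
passing the full buffer size is correct and the old sizeof(str) - 1
simply wasted a byte (likewise the = { 0 } initializer was redundant).
In other words:

	char filename[64];

	/* snprintf() writes at most sizeof(filename) - 1 characters and
	 * always appends a terminating NUL, so no manual -1 is needed. */
	snprintf(filename, sizeof(filename), "%s.b%02d", fwname, i);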

diff --git a/arch/arm64/boot/dts/qcom/msm8996.dtsi b/arch/arm64/boot/dts/qcom/msm8996.dtsi
index b004275..2903020 100644
--- a/arch/arm64/boot/dts/qcom/msm8996.dtsi
+++ b/arch/arm64/boot/dts/qcom/msm8996.dtsi
@@ -915,8 +915,8 @@
 				};
 			};
 
-			qcom,zap-shader {
-				compatible = "qcom,zap-shader";
+			zap-shader {
+				compatible = "zap-shader";
 				memory-region = <&peripheral_reserved>;
 
 				qcom,firmware = "a530_zap";
diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
index dfc9734..9d754a7 100644
--- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
@@ -31,11 +31,11 @@ static inline bool _check_segment(const struct elf32_phdr *phdr)
 		phdr->p_memsz);
 }
 
-static int __pil_tz_load_image(struct platform_device *pdev,
+static int zap_load_segments(struct platform_device *pdev,
 		const struct firmware *mdt, const char *fwname,
 		void *fwptr, size_t fw_size, unsigned long fw_min_addr)
 {
-	char str[64] = { 0 };
+	char filename[64];
 	const struct elf32_hdr *ehdr = (struct elf32_hdr *) mdt->data;
 	const struct elf32_phdr *phdrs = (struct elf32_phdr *) (ehdr + 1);
 	const struct firmware *fw;
@@ -53,16 +53,18 @@ static int __pil_tz_load_image(struct platform_device *pdev,
 		offset = (phdr->p_paddr - fw_min_addr);
 
 		/* Request the file containing the segment */
-		snprintf(str, sizeof(str) - 1, "%s.b%02d", fwname, i);
+		snprintf(filename, sizeof(filename), "%s.b%02d", fwname, i);
 
-		ret = request_firmware(&fw, str, &pdev->dev);
+		ret = request_firmware(&fw, filename, &pdev->dev);
 		if (ret) {
-			dev_err(&pdev->dev, "Failed to load segment %s\n", str);
+			DRM_DEV_ERROR(&pdev->dev, "Failed to load segment %s\n",
+				filename);
 			break;
 		}
 
 		if (offset + fw->size > fw_size) {
-			dev_err(&pdev->dev, "Segment %s is too big\n", str);
+			DRM_DEV_ERROR(&pdev->dev, "Segment %s is too big\n",
+				filename);
 			ret = -EINVAL;
 			release_firmware(fw);
 			break;
@@ -70,15 +72,19 @@ static int __pil_tz_load_image(struct platform_device *pdev,
 
 		/* Copy the segment into place */
 		memcpy(fwptr + offset, fw->data, fw->size);
+
+		if (phdr->p_memsz > phdr->p_filesz)
+			memset(fwptr + fw->size, 0,
+				phdr->p_memsz - phdr->p_filesz);
 		release_firmware(fw);
 	}
 
 	return ret;
 }
 
-static int _pil_tz_load_image(struct platform_device *pdev)
+static int zap_load_mdt(struct platform_device *pdev)
 {
-	char str[64] = { 0 };
+	char filename[64];
 	const char *fwname;
 	const struct elf32_hdr *ehdr;
 	const struct elf32_phdr *phdrs;
@@ -86,7 +92,6 @@ static int _pil_tz_load_image(struct platform_device *pdev)
 	phys_addr_t fw_min_addr, fw_max_addr;
 	dma_addr_t fw_phys;
 	size_t fw_size;
-	u32 pas_id;
 	void *ptr;
 	int i, ret;
 
@@ -94,35 +99,29 @@ static int _pil_tz_load_image(struct platform_device *pdev)
 		return -ENODEV;
 
 	if (!qcom_scm_is_available()) {
-		dev_err(&pdev->dev, "SCM is not available\n");
-		return -EINVAL;
+		DRM_DEV_ERROR(&pdev->dev, "SCM is not available\n");
+		return -EPROBE_DEFER;
 	}
 
 	ret = of_reserved_mem_device_init(&pdev->dev);
-
 	if (ret) {
-		dev_err(&pdev->dev, "Unable to set up the reserved memory\n");
+		DRM_DEV_ERROR(&pdev->dev, "Unable to set up the reserved memory\n");
 		return ret;
 	}
 
 	/* Get the firmware and PAS id from the device node */
 	if (of_property_read_string(pdev->dev.of_node, "qcom,firmware",
 		&fwname)) {
-		dev_err(&pdev->dev, "Could not read a firmware name\n");
-		return -EINVAL;
-	}
-
-	if (of_property_read_u32(pdev->dev.of_node, "qcom,pas-id", &pas_id)) {
-		dev_err(&pdev->dev, "Could not read the pas ID\n");
+		DRM_DEV_ERROR(&pdev->dev, "Could not read a firmware name\n");
 		return -EINVAL;
 	}
 
-	snprintf(str, sizeof(str) - 1, "%s.mdt", fwname);
+	snprintf(filename, sizeof(filename), "%s.mdt", fwname);
 
 	/* Request the MDT file for the firmware */
-	ret = request_firmware(&mdt, str, &pdev->dev);
+	ret = request_firmware(&mdt, filename, &pdev->dev);
 	if (ret) {
-		dev_err(&pdev->dev, "Unable to load %s\n", str);
+		DRM_DEV_ERROR(&pdev->dev, "Unable to load %s\n", filename);
 		return ret;
 	}
 
@@ -151,9 +150,9 @@ static int _pil_tz_load_image(struct platform_device *pdev)
 	fw_size = (size_t) (fw_max_addr - fw_min_addr);
 
 	/* Verify the MDT header */
-	ret = qcom_scm_pas_init_image(pas_id, mdt->data, mdt->size);
+	ret = qcom_scm_pas_init_image(13, mdt->data, mdt->size);
 	if (ret) {
-		dev_err(&pdev->dev, "Invalid firmware metadata\n");
+		DRM_DEV_ERROR(&pdev->dev, "Invalid firmware metadata\n");
 		goto out;
 	}
 
@@ -163,18 +162,19 @@ static int _pil_tz_load_image(struct platform_device *pdev)
 		goto out;
 
 	/* Set up the newly allocated memory region */
-	ret = qcom_scm_pas_mem_setup(pas_id, fw_phys, fw_size);
+	ret = qcom_scm_pas_mem_setup(13, fw_phys, fw_size);
 	if (ret) {
-		dev_err(&pdev->dev, "Unable to set up firmware memory\n");
+		DRM_DEV_ERROR(&pdev->dev, "Unable to set up firmware memory\n");
 		goto out;
 	}
 
-	ret = __pil_tz_load_image(pdev, mdt, fwname, ptr, fw_size, fw_min_addr);
-	if (!ret) {
-		ret = qcom_scm_pas_auth_and_reset(pas_id);
-		if (ret)
-			dev_err(&pdev->dev, "Unable to authorize the image\n");
-	}
+	ret = zap_load_segments(pdev, mdt, fwname, ptr, fw_size, fw_min_addr);
+	if (ret)
+		goto out;
+
+	ret = qcom_scm_pas_auth_and_reset(13);
+	if (ret)
+		DRM_DEV_ERROR(&pdev->dev, "Unable to authorize the image\n");
 
 out:
 	if (ret && ptr)
@@ -502,14 +502,14 @@ static int a5xx_zap_shader_init(struct msm_gpu *gpu)
 	of_platform_populate(pdev->dev.of_node, NULL, NULL, &pdev->dev);
 
 	/* Find the sub-node for the zap shader */
-	node = of_find_node_by_name(pdev->dev.of_node, "qcom,zap-shader");
+	node = of_get_child_by_name(pdev->dev.of_node, "zap-shader");
 	if (!node) {
-		DRM_ERROR("%s: qcom,zap-shader not found in device tree\n",
+		DRM_ERROR("%s: zap-shader not found in device tree\n",
 			gpu->name);
 		return -ENODEV;
 	}
 
-	ret = _pil_tz_load_image(of_find_device_by_node(node));
+	ret = zap_load_mdt(of_find_device_by_node(node));
 	if (ret)
 		DRM_ERROR("%s: Unable to load the zap shader\n",
 			gpu->name);
-- 
1.9.1


* [PATCH 03/11] drm/msm: Remove idle function hook
       [not found]     ` <1488905900-6603-1-git-send-email-jcrouse-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
  2017-03-07 16:58       ` [PATCH 02/11] drm/msm: Improve the zap shader Jordan Crouse
@ 2017-03-07 16:58       ` Jordan Crouse
  2017-03-07 16:58       ` [PATCH 04/11] drm/msm: Add hint to DRM_IOCTL_MSM_GEM_INFO to return an object IOVA Jordan Crouse
                         ` (7 subsequent siblings)
  9 siblings, 0 replies; 36+ messages in thread
From: Jordan Crouse @ 2017-03-07 16:58 UTC (permalink / raw)
  To: freedreno-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: linux-arm-msm-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

There isn't any generic code that uses ->idle so remove it.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
---
 drivers/gpu/drm/msm/adreno/a3xx_gpu.c   | 4 ++--
 drivers/gpu/drm/msm/adreno/a4xx_gpu.c   | 4 ++--
 drivers/gpu/drm/msm/adreno/a5xx_gpu.c   | 9 ++++-----
 drivers/gpu/drm/msm/adreno/a5xx_gpu.h   | 1 +
 drivers/gpu/drm/msm/adreno/a5xx_power.c | 2 +-
 drivers/gpu/drm/msm/msm_gpu.h           | 1 -
 6 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a3xx_gpu.c b/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
index b999349..fc4fd2d 100644
--- a/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
@@ -40,6 +40,7 @@
 extern bool hang_debug;
 
 static void a3xx_dump(struct msm_gpu *gpu);
+static bool a3xx_idle(struct msm_gpu *gpu);
 
 static bool a3xx_me_init(struct msm_gpu *gpu)
 {
@@ -65,7 +66,7 @@ static bool a3xx_me_init(struct msm_gpu *gpu)
 	OUT_RING(ring, 0x00000000);
 
 	gpu->funcs->flush(gpu);
-	return gpu->funcs->idle(gpu);
+	return a3xx_idle(gpu);
 }
 
 static int a3xx_hw_init(struct msm_gpu *gpu)
@@ -448,7 +449,6 @@ static void a3xx_dump(struct msm_gpu *gpu)
 		.last_fence = adreno_last_fence,
 		.submit = adreno_submit,
 		.flush = adreno_flush,
-		.idle = a3xx_idle,
 		.irq = a3xx_irq,
 		.destroy = a3xx_destroy,
 #ifdef CONFIG_DEBUG_FS
diff --git a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
index 511bc85..6bc948b 100644
--- a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
@@ -31,6 +31,7 @@
 
 extern bool hang_debug;
 static void a4xx_dump(struct msm_gpu *gpu);
+static bool a4xx_idle(struct msm_gpu *gpu);
 
 /*
  * a4xx_enable_hwcg() - Program the clock control registers
@@ -137,7 +138,7 @@ static bool a4xx_me_init(struct msm_gpu *gpu)
 	OUT_RING(ring, 0x00000000);
 
 	gpu->funcs->flush(gpu);
-	return gpu->funcs->idle(gpu);
+	return a4xx_idle(gpu);
 }
 
 static int a4xx_hw_init(struct msm_gpu *gpu)
@@ -538,7 +539,6 @@ static int a4xx_get_timestamp(struct msm_gpu *gpu, uint64_t *value)
 		.last_fence = adreno_last_fence,
 		.submit = adreno_submit,
 		.flush = adreno_flush,
-		.idle = a4xx_idle,
 		.irq = a4xx_irq,
 		.destroy = a4xx_destroy,
 #ifdef CONFIG_DEBUG_FS
diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
index 9d754a7..5d3c4ff 100644
--- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
@@ -391,7 +391,7 @@ static int a5xx_me_init(struct msm_gpu *gpu)
 
 	gpu->funcs->flush(gpu);
 
-	return gpu->funcs->idle(gpu) ? 0 : -EINVAL;
+	return a5xx_idle(gpu) ? 0 : -EINVAL;
 }
 
 static struct drm_gem_object *a5xx_ucode_load_bo(struct msm_gpu *gpu,
@@ -699,7 +699,7 @@ static int a5xx_hw_init(struct msm_gpu *gpu)
 		OUT_RING(gpu->rb, 0x0F);
 
 		gpu->funcs->flush(gpu);
-		if (!gpu->funcs->idle(gpu))
+		if (!a5xx_idle(gpu))
 			return -EINVAL;
 	}
 
@@ -716,7 +716,7 @@ static int a5xx_hw_init(struct msm_gpu *gpu)
 		OUT_RING(gpu->rb, 0x00000000);
 
 		gpu->funcs->flush(gpu);
-		if (!gpu->funcs->idle(gpu))
+		if (!a5xx_idle(gpu))
 			return -EINVAL;
 	} else {
 		/* Print a warning so if we die, we know why */
@@ -790,7 +790,7 @@ static inline bool _a5xx_check_idle(struct msm_gpu *gpu)
 		A5XX_RBBM_INT_0_MASK_MISC_HANG_DETECT);
 }
 
-static bool a5xx_idle(struct msm_gpu *gpu)
+bool a5xx_idle(struct msm_gpu *gpu)
 {
 	/* wait for CP to drain ringbuffer: */
 	if (!adreno_idle(gpu))
@@ -1099,7 +1099,6 @@ static void a5xx_show(struct msm_gpu *gpu, struct seq_file *m)
 		.last_fence = adreno_last_fence,
 		.submit = a5xx_submit,
 		.flush = adreno_flush,
-		.idle = a5xx_idle,
 		.irq = a5xx_irq,
 		.destroy = a5xx_destroy,
 		.show = a5xx_show,
diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.h b/drivers/gpu/drm/msm/adreno/a5xx_gpu.h
index 1590f84..6b20f28 100644
--- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.h
+++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.h
@@ -56,5 +56,6 @@ static inline int spin_usecs(struct msm_gpu *gpu, uint32_t usecs,
 	return -ETIMEDOUT;
 }
 
+bool a5xx_idle(struct msm_gpu *gpu);
 
 #endif /* __A5XX_GPU_H__ */
diff --git a/drivers/gpu/drm/msm/adreno/a5xx_power.c b/drivers/gpu/drm/msm/adreno/a5xx_power.c
index 72d52c7..ed0802e 100644
--- a/drivers/gpu/drm/msm/adreno/a5xx_power.c
+++ b/drivers/gpu/drm/msm/adreno/a5xx_power.c
@@ -194,7 +194,7 @@ static int a5xx_gpmu_init(struct msm_gpu *gpu)
 
 	gpu->funcs->flush(gpu);
 
-	if (!gpu->funcs->idle(gpu)) {
+	if (!a5xx_idle(gpu)) {
 		DRM_ERROR("%s: Unable to load GPMU firmware. GPMU will not be active\n",
 			gpu->name);
 		return -EINVAL;
diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
index c4c39d3..267723f 100644
--- a/drivers/gpu/drm/msm/msm_gpu.h
+++ b/drivers/gpu/drm/msm/msm_gpu.h
@@ -50,7 +50,6 @@ struct msm_gpu_funcs {
 	void (*submit)(struct msm_gpu *gpu, struct msm_gem_submit *submit,
 			struct msm_file_private *ctx);
 	void (*flush)(struct msm_gpu *gpu);
-	bool (*idle)(struct msm_gpu *gpu);
 	irqreturn_t (*irq)(struct msm_gpu *irq);
 	uint32_t (*last_fence)(struct msm_gpu *gpu);
 	void (*recover)(struct msm_gpu *gpu);
-- 
1.9.1


* [PATCH 04/11] drm/msm: Add hint to DRM_IOCTL_MSM_GEM_INFO to return an object IOVA
       [not found]     ` <1488905900-6603-1-git-send-email-jcrouse-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
  2017-03-07 16:58       ` [PATCH 02/11] drm/msm: Improve the zap shader Jordan Crouse
  2017-03-07 16:58       ` [PATCH 03/11] drm/msm: Remove idle function hook Jordan Crouse
@ 2017-03-07 16:58       ` Jordan Crouse
  2017-03-07 16:58       ` [PATCH 05/11] drm/msm: get an iova from the address space instead of an id Jordan Crouse
                         ` (6 subsequent siblings)
  9 siblings, 0 replies; 36+ messages in thread
From: Jordan Crouse @ 2017-03-07 16:58 UTC (permalink / raw)
  To: freedreno-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: linux-arm-msm-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Modify the 'pad' member of struct drm_msm_gem_info to 'flags'. If the
user sets MSM_INFO_IOVA in 'flags' it means that they want an IOVA for
the GEM object instead of a mmap() offset. Return the iova in the
'offset' member.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
---
 drivers/gpu/drm/msm/msm_drv.c | 23 +++++++++++++++++++++--
 include/uapi/drm/msm_drm.h    |  8 ++++++--
 2 files changed, 27 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
index a9a520f..92375ac 100644
--- a/drivers/gpu/drm/msm/msm_drv.c
+++ b/drivers/gpu/drm/msm/msm_drv.c
@@ -695,6 +695,17 @@ static int msm_ioctl_gem_cpu_fini(struct drm_device *dev, void *data,
 	return ret;
 }
 
+static int msm_ioctl_gem_info_iova(struct drm_device *dev,
+		struct drm_gem_object *obj, uint64_t *iova)
+{
+	struct msm_drm_private *priv = dev->dev_private;
+
+	if (!priv->gpu)
+		return -EINVAL;
+
+	return msm_gem_get_iova(obj, priv->gpu->id, iova);
+}
+
 static int msm_ioctl_gem_info(struct drm_device *dev, void *data,
 		struct drm_file *file)
 {
@@ -702,14 +713,22 @@ static int msm_ioctl_gem_info(struct drm_device *dev, void *data,
 	struct drm_gem_object *obj;
 	int ret = 0;
 
-	if (args->pad)
+	if (args->flags & ~MSM_INFO_FLAGS)
 		return -EINVAL;
 
 	obj = drm_gem_object_lookup(file, args->handle);
 	if (!obj)
 		return -ENOENT;
 
-	args->offset = msm_gem_mmap_offset(obj);
+	if (args->flags & MSM_INFO_IOVA) {
+		uint64_t iova;
+
+		ret = msm_ioctl_gem_info_iova(dev, obj, &iova);
+		if (!ret)
+			args->offset = iova;
+	} else {
+		args->offset = msm_gem_mmap_offset(obj);
+	}
 
 	drm_gem_object_unreference_unlocked(obj);
 
diff --git a/include/uapi/drm/msm_drm.h b/include/uapi/drm/msm_drm.h
index 4d5d6a2..05dc5b3 100644
--- a/include/uapi/drm/msm_drm.h
+++ b/include/uapi/drm/msm_drm.h
@@ -103,10 +103,14 @@ struct drm_msm_gem_new {
 	__u32 handle;         /* out */
 };
 
+#define MSM_INFO_IOVA	0x01
+
+#define MSM_INFO_FLAGS (MSM_INFO_IOVA)
+
 struct drm_msm_gem_info {
 	__u32 handle;         /* in */
-	__u32 pad;
-	__u64 offset;         /* out, offset to pass to mmap() */
+	__u32 flags;	      /* in - combination of MSM_INFO_* flags */
+	__u64 offset;         /* out, mmap() offset or iova */
 };
 
 #define MSM_PREP_READ        0x01
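
To show the resulting uapi in use, here is a minimal userspace sketch
(not part of the patch; the helper name is made up and error handling
is kept to the bare minimum):

#include <stdint.h>
#include <string.h>
#include <xf86drm.h>
#include "msm_drm.h"

/* Ask the kernel for the GPU iova of a GEM handle. drmIoctl() returns
 * -1 with errno set on failure (EINVAL if the GPU never probed, since
 * then no iova exists). */
static int msm_gem_iova(int fd, uint32_t handle, uint64_t *iova)
{
	struct drm_msm_gem_info req;

	memset(&req, 0, sizeof(req));
	req.handle = handle;
	req.flags = MSM_INFO_IOVA;

	if (drmIoctl(fd, DRM_IOCTL_MSM_GEM_INFO, &req))
		return -1;

	*iova = req.offset;	/* 'offset' carries the iova in this mode */
	return 0;
}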
-- 
1.9.1


* [PATCH 05/11] drm/msm: get an iova from the address space instead of an id
       [not found]     ` <1488905900-6603-1-git-send-email-jcrouse-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
                         ` (2 preceding siblings ...)
  2017-03-07 16:58       ` [PATCH 04/11] drm/msm: Add hint to DRM_IOCTL_MSM_GEM_INFO to return an object IOVA Jordan Crouse
@ 2017-03-07 16:58       ` Jordan Crouse
  2017-03-07 16:58       ` [PATCH 06/11] drm/msm: Add a struct to pass configuration to msm_gpu_init() Jordan Crouse
                         ` (5 subsequent siblings)
  9 siblings, 0 replies; 36+ messages in thread
From: Jordan Crouse @ 2017-03-07 16:58 UTC (permalink / raw)
  To: freedreno-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: linux-arm-msm-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

In the future we won't have a fixed set of address spaces.
Instead of going through the effort of assigning an ID for each
address space just use the address space itself as a token for
getting / putting an iova.

This forces a few changes in the gem object however: instead
of using a simple index into a list of domains, we need to
maintain a list of them. Luckily the list will be pretty small;
even with dynamic address spaces we wouldn't ever see more than
two or three.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
---
 drivers/gpu/drm/msm/adreno/a5xx_gpu.c     |   8 +-
 drivers/gpu/drm/msm/adreno/a5xx_power.c   |   5 +-
 drivers/gpu/drm/msm/adreno/adreno_gpu.c   |   6 +-
 drivers/gpu/drm/msm/dsi/dsi_host.c        |  15 +++-
 drivers/gpu/drm/msm/mdp/mdp4/mdp4_crtc.c  |   8 +-
 drivers/gpu/drm/msm/mdp/mdp4/mdp4_kms.c   |  18 ++--
 drivers/gpu/drm/msm/mdp/mdp4/mdp4_kms.h   |   4 -
 drivers/gpu/drm/msm/mdp/mdp4/mdp4_plane.c |  13 +--
 drivers/gpu/drm/msm/mdp/mdp5/mdp5_crtc.c  |   5 +-
 drivers/gpu/drm/msm/mdp/mdp5/mdp5_kms.c   |  11 +--
 drivers/gpu/drm/msm/mdp/mdp5/mdp5_kms.h   |   4 -
 drivers/gpu/drm/msm/mdp/mdp5/mdp5_plane.c |  13 +--
 drivers/gpu/drm/msm/msm_drv.c             |  15 +---
 drivers/gpu/drm/msm/msm_drv.h             |  25 +++---
 drivers/gpu/drm/msm/msm_fb.c              |  15 ++--
 drivers/gpu/drm/msm/msm_fbdev.c           |  10 ++-
 drivers/gpu/drm/msm/msm_gem.c             | 134 +++++++++++++++++++++---------
 drivers/gpu/drm/msm/msm_gem.h             |   4 +-
 drivers/gpu/drm/msm/msm_gem_submit.c      |   4 +-
 drivers/gpu/drm/msm/msm_gpu.c             |   8 +-
 drivers/gpu/drm/msm/msm_gpu.h             |   1 -
 drivers/gpu/drm/msm/msm_kms.h             |   3 +
 22 files changed, 194 insertions(+), 135 deletions(-)
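
The "list of them" in the commit message amounts to per-object
bookkeeping along these lines (a hypothetical sketch; the struct and
field names are illustrative, not lifted from the patch):

/* One entry per address space the object is currently mapped into;
 * the commit message expects two or three at most. */
struct msm_gem_vma {
	struct msm_gem_address_space *aspace;	/* the get/put token */
	uint64_t iova;				/* mapping in that space */
	struct list_head list;			/* linked off the object */
};

/* Lookup walks the short list keyed by the aspace pointer instead of
 * indexing a fixed array by id ('vmas' is an assumed list head on
 * struct msm_gem_object). */
static struct msm_gem_vma *lookup_vma(struct msm_gem_object *msm_obj,
		struct msm_gem_address_space *aspace)
{
	struct msm_gem_vma *vma;

	list_for_each_entry(vma, &msm_obj->vmas, list)
		if (vma->aspace == aspace)
			return vma;

	return NULL;
}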

diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
index 5d3c4ff..25ab1f4 100644
--- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
@@ -415,7 +415,7 @@ static struct drm_gem_object *a5xx_ucode_load_bo(struct msm_gpu *gpu,
 	}
 
 	if (iova) {
-		int ret = msm_gem_get_iova(bo, gpu->id, iova);
+		int ret = msm_gem_get_iova(bo, gpu->aspace, iova);
 
 		if (ret) {
 			drm_gem_object_unreference_unlocked(bo);
@@ -757,19 +757,19 @@ static void a5xx_destroy(struct msm_gpu *gpu)
 
 	if (a5xx_gpu->pm4_bo) {
 		if (a5xx_gpu->pm4_iova)
-			msm_gem_put_iova(a5xx_gpu->pm4_bo, gpu->id);
+			msm_gem_put_iova(a5xx_gpu->pm4_bo, gpu->aspace);
 		drm_gem_object_unreference_unlocked(a5xx_gpu->pm4_bo);
 	}
 
 	if (a5xx_gpu->pfp_bo) {
 		if (a5xx_gpu->pfp_iova)
-			msm_gem_put_iova(a5xx_gpu->pfp_bo, gpu->id);
+			msm_gem_put_iova(a5xx_gpu->pfp_bo, gpu->aspace);
 		drm_gem_object_unreference_unlocked(a5xx_gpu->pfp_bo);
 	}
 
 	if (a5xx_gpu->gpmu_bo) {
 		if (a5xx_gpu->gpmu_iova)
-			msm_gem_put_iova(a5xx_gpu->gpmu_bo, gpu->id);
+			msm_gem_put_iova(a5xx_gpu->gpmu_bo, gpu->aspace);
 		drm_gem_object_unreference_unlocked(a5xx_gpu->gpmu_bo);
 	}
 
diff --git a/drivers/gpu/drm/msm/adreno/a5xx_power.c b/drivers/gpu/drm/msm/adreno/a5xx_power.c
index ed0802e..2fdee44 100644
--- a/drivers/gpu/drm/msm/adreno/a5xx_power.c
+++ b/drivers/gpu/drm/msm/adreno/a5xx_power.c
@@ -301,7 +301,8 @@ void a5xx_gpmu_ucode_init(struct msm_gpu *gpu)
 	if (IS_ERR(a5xx_gpu->gpmu_bo))
 		goto err;
 
-	if (msm_gem_get_iova(a5xx_gpu->gpmu_bo, gpu->id, &a5xx_gpu->gpmu_iova))
+	if (msm_gem_get_iova(a5xx_gpu->gpmu_bo, gpu->aspace,
+		&a5xx_gpu->gpmu_iova))
 		goto err;
 
 	ptr = msm_gem_get_vaddr(a5xx_gpu->gpmu_bo);
@@ -330,7 +331,7 @@ void a5xx_gpmu_ucode_init(struct msm_gpu *gpu)
 
 err:
 	if (a5xx_gpu->gpmu_iova)
-		msm_gem_put_iova(a5xx_gpu->gpmu_bo, gpu->id);
+		msm_gem_put_iova(a5xx_gpu->gpmu_bo, gpu->aspace);
 	if (a5xx_gpu->gpmu_bo)
 		drm_gem_object_unreference_unlocked(a5xx_gpu->gpmu_bo);
 
diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
index 35a6849..959876d 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
@@ -61,7 +61,7 @@ int adreno_hw_init(struct msm_gpu *gpu)
 
 	DBG("%s", gpu->name);
 
-	ret = msm_gem_get_iova(gpu->rb->bo, gpu->id, &gpu->rb_iova);
+	ret = msm_gem_get_iova(gpu->rb->bo, gpu->aspace, &gpu->rb_iova);
 	if (ret) {
 		gpu->rb_iova = 0;
 		dev_err(gpu->dev->dev, "could not map ringbuffer: %d\n", ret);
@@ -410,7 +410,7 @@ int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev,
 		return -ENOMEM;
 	}
 
-	ret = msm_gem_get_iova(adreno_gpu->memptrs_bo, gpu->id,
+	ret = msm_gem_get_iova(adreno_gpu->memptrs_bo, gpu->aspace,
 			&adreno_gpu->memptrs_iova);
 	if (ret) {
 		dev_err(drm->dev, "could not map memptrs: %d\n", ret);
@@ -429,7 +429,7 @@ void adreno_gpu_cleanup(struct adreno_gpu *adreno_gpu)
 			msm_gem_put_vaddr(adreno_gpu->memptrs_bo);
 
 		if (adreno_gpu->memptrs_iova)
-			msm_gem_put_iova(adreno_gpu->memptrs_bo, gpu->id);
+			msm_gem_put_iova(adreno_gpu->memptrs_bo, gpu->aspace);
 
 		drm_gem_object_unreference_unlocked(adreno_gpu->memptrs_bo);
 	}
diff --git a/drivers/gpu/drm/msm/dsi/dsi_host.c b/drivers/gpu/drm/msm/dsi/dsi_host.c
index 1fc07ce..e7472dd 100644
--- a/drivers/gpu/drm/msm/dsi/dsi_host.c
+++ b/drivers/gpu/drm/msm/dsi/dsi_host.c
@@ -28,6 +28,7 @@
 #include <linux/regmap.h>
 #include <video/mipi_display.h>
 
+#include "msm_kms.h"
 #include "dsi.h"
 #include "dsi.xml.h"
 #include "sfpb.xml.h"
@@ -975,6 +976,7 @@ static void dsi_wait4video_eng_busy(struct msm_dsi_host *msm_host)
 static int dsi_tx_buf_alloc(struct msm_dsi_host *msm_host, int size)
 {
 	struct drm_device *dev = msm_host->dev;
+	struct msm_drm_private *priv = dev->dev_private;
 	const struct msm_dsi_cfg_handler *cfg_hnd = msm_host->cfg_hnd;
 	int ret;
 	uint64_t iova;
@@ -991,7 +993,13 @@ static int dsi_tx_buf_alloc(struct msm_dsi_host *msm_host, int size)
 			return ret;
 		}
 
-		ret = msm_gem_get_iova_locked(msm_host->tx_gem_obj, 0, &iova);
+		if (!priv->kms) {
+			pr_err("%s: No KMS is initialized\n", __func__);
+			return -ENODEV;
+		}
+
+		ret = msm_gem_get_iova_locked(msm_host->tx_gem_obj,
+			priv->kms->aspace, &iova);
 		mutex_unlock(&dev->struct_mutex);
 		if (ret) {
 			pr_err("%s: failed to get iova, %d\n", __func__, ret);
@@ -1023,9 +1031,12 @@ static int dsi_tx_buf_alloc(struct msm_dsi_host *msm_host, int size)
 static void dsi_tx_buf_free(struct msm_dsi_host *msm_host)
 {
 	struct drm_device *dev = msm_host->dev;
+	struct msm_drm_private *priv = dev->dev_private;
 
 	if (msm_host->tx_gem_obj) {
-		msm_gem_put_iova(msm_host->tx_gem_obj, 0);
+		if (priv->kms)
+			msm_gem_put_iova(msm_host->tx_gem_obj,
+				priv->kms->aspace);
 		mutex_lock(&dev->struct_mutex);
 		msm_gem_free_object(msm_host->tx_gem_obj);
 		msm_host->tx_gem_obj = NULL;
diff --git a/drivers/gpu/drm/msm/mdp/mdp4/mdp4_crtc.c b/drivers/gpu/drm/msm/mdp/mdp4/mdp4_crtc.c
index 1c29618..1dfad91 100644
--- a/drivers/gpu/drm/msm/mdp/mdp4/mdp4_crtc.c
+++ b/drivers/gpu/drm/msm/mdp/mdp4/mdp4_crtc.c
@@ -133,7 +133,7 @@ static void unref_cursor_worker(struct drm_flip_work *work, void *val)
 		container_of(work, struct mdp4_crtc, unref_cursor_work);
 	struct mdp4_kms *mdp4_kms = get_kms(&mdp4_crtc->base);
 
-	msm_gem_put_iova(val, mdp4_kms->id);
+	msm_gem_put_iova(val, mdp4_kms->base.base.aspace);
 	drm_gem_object_unreference_unlocked(val);
 }
 
@@ -378,7 +378,8 @@ static void update_cursor(struct drm_crtc *crtc)
 		if (next_bo) {
 			/* take a obj ref + iova ref when we start scanning out: */
 			drm_gem_object_reference(next_bo);
-			msm_gem_get_iova_locked(next_bo, mdp4_kms->id, &iova);
+			msm_gem_get_iova_locked(next_bo,
+				mdp4_kms->base.base.aspace, &iova);
 
 			/* enable cursor: */
 			mdp4_write(mdp4_kms, REG_MDP4_DMA_CURSOR_SIZE(dma),
@@ -435,7 +436,8 @@ static int mdp4_crtc_cursor_set(struct drm_crtc *crtc,
 	}
 
 	if (cursor_bo) {
-		ret = msm_gem_get_iova(cursor_bo, mdp4_kms->id, &iova);
+		ret = msm_gem_get_iova(cursor_bo, mdp4_kms->base.base.aspace,
+			&iova);
 		if (ret)
 			goto fail;
 	} else {
diff --git a/drivers/gpu/drm/msm/mdp/mdp4/mdp4_kms.c b/drivers/gpu/drm/msm/mdp/mdp4/mdp4_kms.c
index 44e0942..239a202 100644
--- a/drivers/gpu/drm/msm/mdp/mdp4/mdp4_kms.c
+++ b/drivers/gpu/drm/msm/mdp/mdp4/mdp4_kms.c
@@ -160,7 +160,10 @@ static void mdp4_destroy(struct msm_kms *kms)
 {
 	struct mdp4_kms *mdp4_kms = to_mdp4_kms(to_mdp_kms(kms));
 	struct device *dev = mdp4_kms->dev->dev;
-	struct msm_gem_address_space *aspace = mdp4_kms->aspace;
+	struct msm_gem_address_space *aspace = kms->aspace;
+
+	if (mdp4_kms->blank_cursor_iova)
+		msm_gem_put_iova(mdp4_kms->blank_cursor_bo, aspace);
 
 	if (aspace) {
 		aspace->mmu->funcs->detach(aspace->mmu,
@@ -168,8 +171,6 @@ static void mdp4_destroy(struct msm_kms *kms)
 		msm_gem_address_space_destroy(aspace);
 	}
 
-	if (mdp4_kms->blank_cursor_iova)
-		msm_gem_put_iova(mdp4_kms->blank_cursor_bo, mdp4_kms->id);
 	drm_gem_object_unreference_unlocked(mdp4_kms->blank_cursor_bo);
 
 	if (mdp4_kms->rpm_enabled)
@@ -536,7 +537,7 @@ struct msm_kms *mdp4_kms_init(struct drm_device *dev)
 			goto fail;
 		}
 
-		mdp4_kms->aspace = aspace;
+		kms->aspace = aspace;
 
 		ret = aspace->mmu->funcs->attach(aspace->mmu, iommu_ports,
 				ARRAY_SIZE(iommu_ports));
@@ -548,13 +549,6 @@ struct msm_kms *mdp4_kms_init(struct drm_device *dev)
 		aspace = NULL;
 	}
 
-	mdp4_kms->id = msm_register_address_space(dev, aspace);
-	if (mdp4_kms->id < 0) {
-		ret = mdp4_kms->id;
-		dev_err(dev->dev, "failed to register mdp4 iommu: %d\n", ret);
-		goto fail;
-	}
-
 	ret = modeset_init(mdp4_kms);
 	if (ret) {
 		dev_err(dev->dev, "modeset_init failed: %d\n", ret);
@@ -571,7 +565,7 @@ struct msm_kms *mdp4_kms_init(struct drm_device *dev)
 		goto fail;
 	}
 
-	ret = msm_gem_get_iova(mdp4_kms->blank_cursor_bo, mdp4_kms->id,
+	ret = msm_gem_get_iova(mdp4_kms->blank_cursor_bo, kms->aspace,
 			&mdp4_kms->blank_cursor_iova);
 	if (ret) {
 		dev_err(dev->dev, "could not pin blank-cursor bo: %d\n", ret);
diff --git a/drivers/gpu/drm/msm/mdp/mdp4/mdp4_kms.h b/drivers/gpu/drm/msm/mdp/mdp4/mdp4_kms.h
index 62712ca..0eacaf0 100644
--- a/drivers/gpu/drm/msm/mdp/mdp4/mdp4_kms.h
+++ b/drivers/gpu/drm/msm/mdp/mdp4/mdp4_kms.h
@@ -32,9 +32,6 @@ struct mdp4_kms {
 
 	int rev;
 
-	/* mapper-id used to request GEM buffer mapped for scanout: */
-	int id;
-
 	void __iomem *mmio;
 
 	struct regulator *vdd;
@@ -43,7 +40,6 @@ struct mdp4_kms {
 	struct clk *pclk;
 	struct clk *lut_clk;
 	struct clk *axi_clk;
-	struct msm_gem_address_space *aspace;
 
 	struct mdp_irq error_handler;
 
diff --git a/drivers/gpu/drm/msm/mdp/mdp4/mdp4_plane.c b/drivers/gpu/drm/msm/mdp/mdp4/mdp4_plane.c
index 3903dbc..88f3b86 100644
--- a/drivers/gpu/drm/msm/mdp/mdp4/mdp4_plane.c
+++ b/drivers/gpu/drm/msm/mdp/mdp4/mdp4_plane.c
@@ -109,7 +109,7 @@ static int mdp4_plane_prepare_fb(struct drm_plane *plane,
 		return 0;
 
 	DBG("%s: prepare: FB[%u]", mdp4_plane->name, fb->base.id);
-	return msm_framebuffer_prepare(fb, mdp4_kms->id);
+	return msm_framebuffer_prepare(fb, mdp4_kms->base.base.aspace);
 }
 
 static void mdp4_plane_cleanup_fb(struct drm_plane *plane,
@@ -123,7 +123,7 @@ static void mdp4_plane_cleanup_fb(struct drm_plane *plane,
 		return;
 
 	DBG("%s: cleanup: FB[%u]", mdp4_plane->name, fb->base.id);
-	msm_framebuffer_cleanup(fb, mdp4_kms->id);
+	msm_framebuffer_cleanup(fb, mdp4_kms->base.base.aspace);
 }
 
 
@@ -161,6 +161,7 @@ static void mdp4_plane_set_scanout(struct drm_plane *plane,
 {
 	struct mdp4_plane *mdp4_plane = to_mdp4_plane(plane);
 	struct mdp4_kms *mdp4_kms = get_kms(plane);
+	struct msm_kms *kms = &mdp4_kms->base.base;
 	enum mdp4_pipe pipe = mdp4_plane->pipe;
 
 	mdp4_write(mdp4_kms, REG_MDP4_PIPE_SRC_STRIDE_A(pipe),
@@ -172,13 +173,13 @@ static void mdp4_plane_set_scanout(struct drm_plane *plane,
 			MDP4_PIPE_SRC_STRIDE_B_P3(fb->pitches[3]));
 
 	mdp4_write(mdp4_kms, REG_MDP4_PIPE_SRCP0_BASE(pipe),
-			msm_framebuffer_iova(fb, mdp4_kms->id, 0));
+			msm_framebuffer_iova(fb, kms->aspace, 0));
 	mdp4_write(mdp4_kms, REG_MDP4_PIPE_SRCP1_BASE(pipe),
-			msm_framebuffer_iova(fb, mdp4_kms->id, 1));
+			msm_framebuffer_iova(fb, kms->aspace, 1));
 	mdp4_write(mdp4_kms, REG_MDP4_PIPE_SRCP2_BASE(pipe),
-			msm_framebuffer_iova(fb, mdp4_kms->id, 2));
+			msm_framebuffer_iova(fb, kms->aspace, 2));
 	mdp4_write(mdp4_kms, REG_MDP4_PIPE_SRCP3_BASE(pipe),
-			msm_framebuffer_iova(fb, mdp4_kms->id, 3));
+			msm_framebuffer_iova(fb, kms->aspace, 3));
 
 	plane->fb = fb;
 }
diff --git a/drivers/gpu/drm/msm/mdp/mdp5/mdp5_crtc.c b/drivers/gpu/drm/msm/mdp/mdp5/mdp5_crtc.c
index ceb4673..f5c4065 100644
--- a/drivers/gpu/drm/msm/mdp/mdp5/mdp5_crtc.c
+++ b/drivers/gpu/drm/msm/mdp/mdp5/mdp5_crtc.c
@@ -165,7 +165,7 @@ static void unref_cursor_worker(struct drm_flip_work *work, void *val)
 		container_of(work, struct mdp5_crtc, unref_cursor_work);
 	struct mdp5_kms *mdp5_kms = get_kms(&mdp5_crtc->base);
 
-	msm_gem_put_iova(val, mdp5_kms->id);
+	msm_gem_put_iova(val, mdp5_kms->base.base.aspace);
 	drm_gem_object_unreference_unlocked(val);
 }
 
@@ -561,7 +561,8 @@ static int mdp5_crtc_cursor_set(struct drm_crtc *crtc,
 	if (!cursor_bo)
 		return -ENOENT;
 
-	ret = msm_gem_get_iova(cursor_bo, mdp5_kms->id, &cursor_addr);
+	ret = msm_gem_get_iova(cursor_bo, mdp5_kms->base.base.aspace,
+		&cursor_addr);
 	if (ret)
 		return -EINVAL;
 
diff --git a/drivers/gpu/drm/msm/mdp/mdp5/mdp5_kms.c b/drivers/gpu/drm/msm/mdp/mdp5/mdp5_kms.c
index 9d21317..889dd5d 100644
--- a/drivers/gpu/drm/msm/mdp/mdp5/mdp5_kms.c
+++ b/drivers/gpu/drm/msm/mdp/mdp5/mdp5_kms.c
@@ -159,7 +159,7 @@ static void mdp5_set_encoder_mode(struct msm_kms *kms,
 static void mdp5_kms_destroy(struct msm_kms *kms)
 {
 	struct mdp5_kms *mdp5_kms = to_mdp5_kms(to_mdp_kms(kms));
-	struct msm_gem_address_space *aspace = mdp5_kms->aspace;
+	struct msm_gem_address_space *aspace = kms->aspace;
 	int i;
 
 	for (i = 0; i < mdp5_kms->num_hwpipes; i++)
@@ -734,7 +734,7 @@ struct msm_kms *mdp5_kms_init(struct drm_device *dev)
 			goto fail;
 		}
 
-		mdp5_kms->aspace = aspace;
+		kms->aspace = aspace;
 
 		ret = aspace->mmu->funcs->attach(aspace->mmu, iommu_ports,
 				ARRAY_SIZE(iommu_ports));
@@ -749,13 +749,6 @@ struct msm_kms *mdp5_kms_init(struct drm_device *dev)
 		aspace = NULL;
 	}
 
-	mdp5_kms->id = msm_register_address_space(dev, aspace);
-	if (mdp5_kms->id < 0) {
-		ret = mdp5_kms->id;
-		dev_err(&pdev->dev, "failed to register mdp5 iommu: %d\n", ret);
-		goto fail;
-	}
-
 	ret = modeset_init(mdp5_kms);
 	if (ret) {
 		dev_err(&pdev->dev, "modeset_init failed: %d\n", ret);
diff --git a/drivers/gpu/drm/msm/mdp/mdp5/mdp5_kms.h b/drivers/gpu/drm/msm/mdp/mdp5/mdp5_kms.h
index 8ba6dd8..c0a0304 100644
--- a/drivers/gpu/drm/msm/mdp/mdp5/mdp5_kms.h
+++ b/drivers/gpu/drm/msm/mdp/mdp5/mdp5_kms.h
@@ -48,10 +48,6 @@ struct mdp5_kms {
 	struct mdp5_state *state;
 	struct drm_modeset_lock state_lock;
 
-	/* mapper-id used to request GEM buffer mapped for scanout: */
-	int id;
-	struct msm_gem_address_space *aspace;
-
 	struct mdp5_smp *smp;
 	struct mdp5_ctl_manager *ctlm;
 
diff --git a/drivers/gpu/drm/msm/mdp/mdp5/mdp5_plane.c b/drivers/gpu/drm/msm/mdp/mdp5/mdp5_plane.c
index 32071e2..8a37421 100644
--- a/drivers/gpu/drm/msm/mdp/mdp5/mdp5_plane.c
+++ b/drivers/gpu/drm/msm/mdp/mdp5/mdp5_plane.c
@@ -277,7 +277,7 @@ static int mdp5_plane_prepare_fb(struct drm_plane *plane,
 		return 0;
 
 	DBG("%s: prepare: FB[%u]", plane->name, fb->base.id);
-	return msm_framebuffer_prepare(fb, mdp5_kms->id);
+	return msm_framebuffer_prepare(fb, mdp5_kms->base.base.aspace);
 }
 
 static void mdp5_plane_cleanup_fb(struct drm_plane *plane,
@@ -290,7 +290,7 @@ static void mdp5_plane_cleanup_fb(struct drm_plane *plane,
 		return;
 
 	DBG("%s: cleanup: FB[%u]", plane->name, fb->base.id);
-	msm_framebuffer_cleanup(fb, mdp5_kms->id);
+	msm_framebuffer_cleanup(fb, mdp5_kms->base.base.aspace);
 }
 
 #define FRAC_16_16(mult, div)    (((mult) << 16) / (div))
@@ -443,6 +443,7 @@ static void set_scanout_locked(struct drm_plane *plane,
 		struct drm_framebuffer *fb)
 {
 	struct mdp5_kms *mdp5_kms = get_kms(plane);
+	struct msm_kms *kms = &mdp5_kms->base.base;
 	struct mdp5_hw_pipe *hwpipe = to_mdp5_plane_state(plane->state)->hwpipe;
 	enum mdp5_pipe pipe = hwpipe->pipe;
 
@@ -455,13 +456,13 @@ static void set_scanout_locked(struct drm_plane *plane,
 			MDP5_PIPE_SRC_STRIDE_B_P3(fb->pitches[3]));
 
 	mdp5_write(mdp5_kms, REG_MDP5_PIPE_SRC0_ADDR(pipe),
-			msm_framebuffer_iova(fb, mdp5_kms->id, 0));
+			msm_framebuffer_iova(fb, kms->aspace, 0));
 	mdp5_write(mdp5_kms, REG_MDP5_PIPE_SRC1_ADDR(pipe),
-			msm_framebuffer_iova(fb, mdp5_kms->id, 1));
+			msm_framebuffer_iova(fb, kms->aspace, 1));
 	mdp5_write(mdp5_kms, REG_MDP5_PIPE_SRC2_ADDR(pipe),
-			msm_framebuffer_iova(fb, mdp5_kms->id, 2));
+			msm_framebuffer_iova(fb, kms->aspace, 2));
 	mdp5_write(mdp5_kms, REG_MDP5_PIPE_SRC3_ADDR(pipe),
-			msm_framebuffer_iova(fb, mdp5_kms->id, 3));
+			msm_framebuffer_iova(fb, kms->aspace, 3));
 
 	plane->fb = fb;
 }
diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
index 92375ac..b03e785 100644
--- a/drivers/gpu/drm/msm/msm_drv.c
+++ b/drivers/gpu/drm/msm/msm_drv.c
@@ -51,19 +51,6 @@ static void msm_fb_output_poll_changed(struct drm_device *dev)
 	.atomic_state_free = msm_atomic_state_free,
 };
 
-int msm_register_address_space(struct drm_device *dev,
-		struct msm_gem_address_space *aspace)
-{
-	struct msm_drm_private *priv = dev->dev_private;
-
-	if (WARN_ON(priv->num_aspaces >= ARRAY_SIZE(priv->aspace)))
-		return -EINVAL;
-
-	priv->aspace[priv->num_aspaces] = aspace;
-
-	return priv->num_aspaces++;
-}
-
 #ifdef CONFIG_DRM_MSM_REGISTER_LOGGING
 static bool reglog = false;
 MODULE_PARM_DESC(reglog, "Enable register read/write logging");
@@ -703,7 +690,7 @@ static int msm_ioctl_gem_info_iova(struct drm_device *dev,
 	if (!priv->gpu)
 		return -EINVAL;
 
-	return msm_gem_get_iova(obj, priv->gpu->id, iova);
+	return msm_gem_get_iova(obj, priv->gpu->aspace, iova);
 }
 
 static int msm_ioctl_gem_info(struct drm_device *dev, void *data,
diff --git a/drivers/gpu/drm/msm/msm_drv.h b/drivers/gpu/drm/msm/msm_drv.h
index cdd7b2f..996227c 100644
--- a/drivers/gpu/drm/msm/msm_drv.h
+++ b/drivers/gpu/drm/msm/msm_drv.h
@@ -183,9 +183,6 @@ int msm_atomic_commit(struct drm_device *dev,
 void msm_atomic_state_clear(struct drm_atomic_state *state);
 void msm_atomic_state_free(struct drm_atomic_state *state);
 
-int msm_register_address_space(struct drm_device *dev,
-		struct msm_gem_address_space *aspace);
-
 void msm_gem_unmap_vma(struct msm_gem_address_space *aspace,
 		struct msm_gem_vma *vma, struct sg_table *sgt);
 int msm_gem_map_vma(struct msm_gem_address_space *aspace,
@@ -208,13 +205,16 @@ int msm_gem_mmap_obj(struct drm_gem_object *obj,
 int msm_gem_mmap(struct file *filp, struct vm_area_struct *vma);
 int msm_gem_fault(struct vm_area_struct *vma, struct vm_fault *vmf);
 uint64_t msm_gem_mmap_offset(struct drm_gem_object *obj);
-int msm_gem_get_iova_locked(struct drm_gem_object *obj, int id,
-		uint64_t *iova);
-int msm_gem_get_iova(struct drm_gem_object *obj, int id, uint64_t *iova);
-uint64_t msm_gem_iova(struct drm_gem_object *obj, int id);
+int msm_gem_get_iova_locked(struct drm_gem_object *obj,
+		struct msm_gem_address_space *aspace, uint64_t *iova);
+int msm_gem_get_iova(struct drm_gem_object *obj,
+		struct msm_gem_address_space *aspace, uint64_t *iova);
+uint64_t msm_gem_iova(struct drm_gem_object *obj,
+		struct msm_gem_address_space *aspace);
 struct page **msm_gem_get_pages(struct drm_gem_object *obj);
 void msm_gem_put_pages(struct drm_gem_object *obj);
-void msm_gem_put_iova(struct drm_gem_object *obj, int id);
+void msm_gem_put_iova(struct drm_gem_object *obj,
+		struct msm_gem_address_space *aspace);
 int msm_gem_dumb_create(struct drm_file *file, struct drm_device *dev,
 		struct drm_mode_create_dumb *args);
 int msm_gem_dumb_map_offset(struct drm_file *file, struct drm_device *dev,
@@ -249,9 +249,12 @@ struct drm_gem_object *msm_gem_new(struct drm_device *dev,
 struct drm_gem_object *msm_gem_import(struct drm_device *dev,
 		struct dma_buf *dmabuf, struct sg_table *sgt);
 
-int msm_framebuffer_prepare(struct drm_framebuffer *fb, int id);
-void msm_framebuffer_cleanup(struct drm_framebuffer *fb, int id);
-uint32_t msm_framebuffer_iova(struct drm_framebuffer *fb, int id, int plane);
+int msm_framebuffer_prepare(struct drm_framebuffer *fb,
+		struct msm_gem_address_space *aspace);
+void msm_framebuffer_cleanup(struct drm_framebuffer *fb,
+		struct msm_gem_address_space *aspace);
+uint32_t msm_framebuffer_iova(struct drm_framebuffer *fb,
+		struct msm_gem_address_space *aspace, int plane);
 struct drm_gem_object *msm_framebuffer_bo(struct drm_framebuffer *fb, int plane);
 const struct msm_format *msm_framebuffer_format(struct drm_framebuffer *fb);
 struct drm_framebuffer *msm_framebuffer_init(struct drm_device *dev,
diff --git a/drivers/gpu/drm/msm/msm_fb.c b/drivers/gpu/drm/msm/msm_fb.c
index 9acf544..cedadbf 100644
--- a/drivers/gpu/drm/msm/msm_fb.c
+++ b/drivers/gpu/drm/msm/msm_fb.c
@@ -84,14 +84,15 @@ void msm_framebuffer_describe(struct drm_framebuffer *fb, struct seq_file *m)
  * should be fine, since only the scanout (mdpN) side of things needs
  * this, the gpu doesn't care about fb's.
  */
-int msm_framebuffer_prepare(struct drm_framebuffer *fb, int id)
+int msm_framebuffer_prepare(struct drm_framebuffer *fb,
+		struct msm_gem_address_space *aspace)
 {
 	struct msm_framebuffer *msm_fb = to_msm_framebuffer(fb);
 	int ret, i, n = drm_format_num_planes(fb->pixel_format);
 	uint64_t iova;
 
 	for (i = 0; i < n; i++) {
-		ret = msm_gem_get_iova(msm_fb->planes[i], id, &iova);
+		ret = msm_gem_get_iova(msm_fb->planes[i], aspace, &iova);
 		DBG("FB[%u]: iova[%d]: %08llx (%d)", fb->base.id, i, iova, ret);
 		if (ret)
 			return ret;
@@ -100,21 +101,23 @@ int msm_framebuffer_prepare(struct drm_framebuffer *fb, int id)
 	return 0;
 }
 
-void msm_framebuffer_cleanup(struct drm_framebuffer *fb, int id)
+void msm_framebuffer_cleanup(struct drm_framebuffer *fb,
+		struct msm_gem_address_space *aspace)
 {
 	struct msm_framebuffer *msm_fb = to_msm_framebuffer(fb);
 	int i, n = drm_format_num_planes(fb->pixel_format);
 
 	for (i = 0; i < n; i++)
-		msm_gem_put_iova(msm_fb->planes[i], id);
+		msm_gem_put_iova(msm_fb->planes[i], aspace);
 }
 
-uint32_t msm_framebuffer_iova(struct drm_framebuffer *fb, int id, int plane)
+uint32_t msm_framebuffer_iova(struct drm_framebuffer *fb,
+		struct msm_gem_address_space *aspace, int plane)
 {
 	struct msm_framebuffer *msm_fb = to_msm_framebuffer(fb);
 	if (!msm_fb->planes[plane])
 		return 0;
-	return msm_gem_iova(msm_fb->planes[plane], id) + fb->offsets[plane];
+	return msm_gem_iova(msm_fb->planes[plane], aspace) + fb->offsets[plane];
 }
 
 struct drm_gem_object *msm_framebuffer_bo(struct drm_framebuffer *fb, int plane)
diff --git a/drivers/gpu/drm/msm/msm_fbdev.c b/drivers/gpu/drm/msm/msm_fbdev.c
index e8f41eb..0b5b839 100644
--- a/drivers/gpu/drm/msm/msm_fbdev.c
+++ b/drivers/gpu/drm/msm/msm_fbdev.c
@@ -20,6 +20,7 @@
 #include "drm_crtc.h"
 #include "drm_fb_helper.h"
 #include "msm_gem.h"
+#include "msm_kms.h"
 
 extern int msm_gem_mmap_obj(struct drm_gem_object *obj,
 					struct vm_area_struct *vma);
@@ -78,6 +79,7 @@ static int msm_fbdev_create(struct drm_fb_helper *helper,
 {
 	struct msm_fbdev *fbdev = to_msm_fbdev(helper);
 	struct drm_device *dev = helper->dev;
+	struct msm_drm_private *priv = dev->dev_private;
 	struct drm_framebuffer *fb = NULL;
 	struct fb_info *fbi = NULL;
 	struct drm_mode_fb_cmd2 mode_cmd = {0};
@@ -129,7 +131,13 @@ static int msm_fbdev_create(struct drm_fb_helper *helper,
 	 * in panic (ie. lock-safe, etc) we could avoid pinning the
 	 * buffer now:
 	 */
-	ret = msm_gem_get_iova_locked(fbdev->bo, 0, &paddr);
+
+	if (!priv->kms) {
+		ret = -ENODEV;
+		goto fail_unlock;
+	}
+
+	ret = msm_gem_get_iova_locked(fbdev->bo, priv->kms->aspace, &paddr);
 	if (ret) {
 		dev_err(dev->dev, "failed to get buffer obj iova: %d\n", ret);
 		goto fail_unlock;
diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c
index 8d6f2e2..4c3e6ef 100644
--- a/drivers/gpu/drm/msm/msm_gem.c
+++ b/drivers/gpu/drm/msm/msm_gem.c
@@ -285,22 +285,57 @@ uint64_t msm_gem_mmap_offset(struct drm_gem_object *obj)
 	return offset;
 }
 
+static void obj_remove_domain(struct msm_gem_vma *domain)
+{
+	if (domain) {
+		list_del(&domain->list);
+		kfree(domain);
+	}
+}
+
 static void
 put_iova(struct drm_gem_object *obj)
 {
 	struct drm_device *dev = obj->dev;
-	struct msm_drm_private *priv = obj->dev->dev_private;
 	struct msm_gem_object *msm_obj = to_msm_bo(obj);
-	int id;
+	struct msm_gem_vma *domain, *tmp;
 
 	WARN_ON(!mutex_is_locked(&dev->struct_mutex));
 
-	for (id = 0; id < ARRAY_SIZE(msm_obj->domain); id++) {
-		if (!priv->aspace[id])
-			continue;
-		msm_gem_unmap_vma(priv->aspace[id],
-				&msm_obj->domain[id], msm_obj->sgt);
+	list_for_each_entry_safe(domain, tmp, &msm_obj->domains, list) {
+		msm_gem_unmap_vma(domain->aspace, domain, msm_obj->sgt);
+		obj_remove_domain(domain);
+	}
+}
+
+static struct msm_gem_vma *obj_add_domain(struct drm_gem_object *obj,
+		struct msm_gem_address_space *aspace)
+{
+	struct msm_gem_object *msm_obj = to_msm_bo(obj);
+	struct msm_gem_vma *domain = kzalloc(sizeof(*domain), GFP_KERNEL);
+
+	if (!domain)
+		return ERR_PTR(-ENOMEM);
+
+	domain->aspace = aspace;
+
+	list_add_tail(&domain->list, &msm_obj->domains);
+
+	return domain;
+}
+
+static struct msm_gem_vma *obj_get_domain(struct drm_gem_object *obj,
+		struct msm_gem_address_space *aspace)
+{
+	struct msm_gem_object *msm_obj = to_msm_bo(obj);
+	struct msm_gem_vma *domain;
+
+	list_for_each_entry(domain, &msm_obj->domains, list) {
+		if (domain->aspace == aspace)
+			return domain;
 	}
+
+	return NULL;
 }
 
 /* should be called under struct_mutex.. although it can be called
@@ -310,49 +345,64 @@ uint64_t msm_gem_mmap_offset(struct drm_gem_object *obj)
  * That means when I do eventually need to add support for unpinning
  * the refcnt counter needs to be atomic_t.
  */
-int msm_gem_get_iova_locked(struct drm_gem_object *obj, int id,
-		uint64_t *iova)
+int msm_gem_get_iova_locked(struct drm_gem_object *obj,
+		struct msm_gem_address_space *aspace, uint64_t *iova)
 {
 	struct msm_gem_object *msm_obj = to_msm_bo(obj);
+	struct page **pages;
+	struct msm_gem_vma *domain;
 	int ret = 0;
 
-	if (!msm_obj->domain[id].iova) {
-		struct msm_drm_private *priv = obj->dev->dev_private;
-		struct page **pages = get_pages(obj);
+	if (!iommu_present(&platform_bus_type)) {
+		pages = get_pages(obj);
 
 		if (IS_ERR(pages))
 			return PTR_ERR(pages);
 
-		if (iommu_present(&platform_bus_type)) {
-			ret = msm_gem_map_vma(priv->aspace[id], &msm_obj->domain[id],
-					msm_obj->sgt, obj->size >> PAGE_SHIFT);
-		} else {
-			msm_obj->domain[id].iova = physaddr(obj);
+		*iova = physaddr(obj);
+		return 0;
+	}
+
+	domain = obj_get_domain(obj, aspace);
+
+	if (!domain) {
+		domain = obj_add_domain(obj, aspace);
+		if (IS_ERR(domain))
+			return  PTR_ERR(domain);
+
+		pages = get_pages(obj);
+		if (IS_ERR(pages)) {
+			obj_remove_domain(domain);
+			return PTR_ERR(pages);
 		}
+
+		ret = msm_gem_map_vma(aspace, domain, msm_obj->sgt,
+			obj->size >> PAGE_SHIFT);
 	}
 
 	if (!ret)
-		*iova = msm_obj->domain[id].iova;
+		*iova = domain->iova;
+	else
+		obj_remove_domain(domain);
 
 	return ret;
 }
 
 /* get iova, taking a reference.  Should have a matching put */
-int msm_gem_get_iova(struct drm_gem_object *obj, int id, uint64_t *iova)
+int msm_gem_get_iova(struct drm_gem_object *obj,
+		struct msm_gem_address_space *aspace, uint64_t *iova)
 {
-	struct msm_gem_object *msm_obj = to_msm_bo(obj);
+	struct msm_gem_vma *domain;
 	int ret;
 
-	/* this is safe right now because we don't unmap until the
-	 * bo is deleted:
-	 */
-	if (msm_obj->domain[id].iova) {
-		*iova = msm_obj->domain[id].iova;
+	domain = obj_get_domain(obj, aspace);
+	if (domain) {
+		*iova = domain->iova;
 		return 0;
 	}
 
 	mutex_lock(&obj->dev->struct_mutex);
-	ret = msm_gem_get_iova_locked(obj, id, iova);
+	ret = msm_gem_get_iova_locked(obj, aspace, iova);
 	mutex_unlock(&obj->dev->struct_mutex);
 	return ret;
 }
@@ -360,14 +410,18 @@ int msm_gem_get_iova(struct drm_gem_object *obj, int id, uint64_t *iova)
 /* get iova without taking a reference, used in places where you have
  * already done a 'msm_gem_get_iova()'.
  */
-uint64_t msm_gem_iova(struct drm_gem_object *obj, int id)
+uint64_t msm_gem_iova(struct drm_gem_object *obj,
+		struct msm_gem_address_space *aspace)
 {
-	struct msm_gem_object *msm_obj = to_msm_bo(obj);
-	WARN_ON(!msm_obj->domain[id].iova);
-	return msm_obj->domain[id].iova;
+	struct msm_gem_vma *domain = obj_get_domain(obj, aspace);
+
+	WARN_ON(!domain);
+
+	return domain ? domain->iova : 0;
 }
 
-void msm_gem_put_iova(struct drm_gem_object *obj, int id)
+void msm_gem_put_iova(struct drm_gem_object *obj,
+		struct msm_gem_address_space *aspace)
 {
 	// XXX TODO ..
 	// NOTE: probably don't need a _locked() version.. we wouldn't
@@ -621,11 +675,10 @@ void msm_gem_describe(struct drm_gem_object *obj, struct seq_file *m)
 	struct msm_gem_object *msm_obj = to_msm_bo(obj);
 	struct reservation_object *robj = msm_obj->resv;
 	struct reservation_object_list *fobj;
-	struct msm_drm_private *priv = obj->dev->dev_private;
 	struct dma_fence *fence;
 	uint64_t off = drm_vma_node_start(&obj->vma_node);
 	const char *madv;
-	unsigned id;
+	struct msm_gem_vma *domain;
 
 	WARN_ON(!mutex_is_locked(&obj->dev->struct_mutex));
 
@@ -647,8 +700,9 @@ void msm_gem_describe(struct drm_gem_object *obj, struct seq_file *m)
 			obj->name, obj->refcount.refcount.counter,
 			off, msm_obj->vaddr);
 
-	for (id = 0; id < priv->num_aspaces; id++)
-		seq_printf(m, " %08llx", msm_obj->domain[id].iova);
+	/* FIXME: we need to print the address space here too */
+	list_for_each_entry(domain, &msm_obj->domains, list)
+		seq_printf(m, " %08llx", domain->iova);
 
 	seq_printf(m, " %zu%s\n", obj->size, madv);
 
@@ -783,8 +837,12 @@ static int msm_gem_new_impl(struct drm_device *dev,
 	if (!msm_obj)
 		return -ENOMEM;
 
-	if (use_vram)
-		msm_obj->vram_node = &msm_obj->domain[0].node;
+	if (use_vram) {
+		struct msm_gem_vma *domain = obj_add_domain(&msm_obj->base, 0);
+		/* FIXME: Error here? */
+		if (domain)
+			msm_obj->vram_node = &domain->node;
+	}
 
 	msm_obj->flags = flags;
 	msm_obj->madv = MSM_MADV_WILLNEED;
@@ -797,6 +855,8 @@ static int msm_gem_new_impl(struct drm_device *dev,
 	}
 
 	INIT_LIST_HEAD(&msm_obj->submit_entry);
+	INIT_LIST_HEAD(&msm_obj->domains);
+
 	list_add_tail(&msm_obj->mm_list, &priv->inactive_list);
 
 	*obj = &msm_obj->base;
diff --git a/drivers/gpu/drm/msm/msm_gem.h b/drivers/gpu/drm/msm/msm_gem.h
index 7d52951..40cd0b6 100644
--- a/drivers/gpu/drm/msm/msm_gem.h
+++ b/drivers/gpu/drm/msm/msm_gem.h
@@ -35,7 +35,9 @@ struct msm_gem_address_space {
 
 struct msm_gem_vma {
 	struct drm_mm_node node;
+	struct msm_gem_address_space *aspace;
 	uint64_t iova;
+	struct list_head list;
 };
 
 struct msm_gem_object {
@@ -75,7 +77,7 @@ struct msm_gem_object {
 	struct sg_table *sgt;
 	void *vaddr;
 
-	struct msm_gem_vma domain[NUM_DOMAINS];
+	struct list_head domains;
 
 	/* normally (resv == &_resv) except for imported bo's */
 	struct reservation_object *resv;
diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c b/drivers/gpu/drm/msm/msm_gem_submit.c
index 1172fe7..8419680 100644
--- a/drivers/gpu/drm/msm/msm_gem_submit.c
+++ b/drivers/gpu/drm/msm/msm_gem_submit.c
@@ -158,7 +158,7 @@ static void submit_unlock_unpin_bo(struct msm_gem_submit *submit, int i)
 	struct msm_gem_object *msm_obj = submit->bos[i].obj;
 
 	if (submit->bos[i].flags & BO_PINNED)
-		msm_gem_put_iova(&msm_obj->base, submit->gpu->id);
+		msm_gem_put_iova(&msm_obj->base, submit->gpu->aspace);
 
 	if (submit->bos[i].flags & BO_LOCKED)
 		ww_mutex_unlock(&msm_obj->resv->lock);
@@ -246,7 +246,7 @@ static int submit_pin_objects(struct msm_gem_submit *submit)
 
 		/* if locking succeeded, pin bo: */
 		ret = msm_gem_get_iova_locked(&msm_obj->base,
-				submit->gpu->id, &iova);
+				submit->gpu->aspace, &iova);
 
 		if (ret)
 			break;
diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
index e89093c..16610ef 100644
--- a/drivers/gpu/drm/msm/msm_gpu.c
+++ b/drivers/gpu/drm/msm/msm_gpu.c
@@ -458,7 +458,7 @@ static void retire_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
 		struct msm_gem_object *msm_obj = submit->bos[i].obj;
 		/* move to inactive: */
 		msm_gem_move_to_inactive(&msm_obj->base);
-		msm_gem_put_iova(&msm_obj->base, gpu->id);
+		msm_gem_put_iova(&msm_obj->base, gpu->aspace);
 		drm_gem_object_unreference(&msm_obj->base);
 	}
 
@@ -539,7 +539,7 @@ void msm_gpu_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit,
 		/* submit takes a reference to the bo and iova until retired: */
 		drm_gem_object_reference(&msm_obj->base);
 		msm_gem_get_iova_locked(&msm_obj->base,
-				submit->gpu->id, &iova);
+				submit->gpu->aspace, &iova);
 
 		if (submit->bos[i].flags & MSM_SUBMIT_BO_WRITE)
 			msm_gem_move_to_active(&msm_obj->base, gpu, true, submit->fence);
@@ -674,8 +674,6 @@ int msm_gpu_init(struct drm_device *drm, struct platform_device *pdev,
 	} else {
 		dev_info(drm->dev, "%s: no IOMMU, fallback to VRAM carveout!\n", name);
 	}
-	gpu->id = msm_register_address_space(drm, gpu->aspace);
-
 
 	/* Create ringbuffer: */
 	mutex_lock(&drm->struct_mutex);
@@ -706,7 +704,7 @@ void msm_gpu_cleanup(struct msm_gpu *gpu)
 
 	if (gpu->rb) {
 		if (gpu->rb_iova)
-			msm_gem_put_iova(gpu->rb->bo, gpu->id);
+			msm_gem_put_iova(gpu->rb->bo, gpu->aspace);
 		msm_ringbuffer_destroy(gpu->rb);
 	}
 
diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
index 267723f..ad6d13a 100644
--- a/drivers/gpu/drm/msm/msm_gpu.h
+++ b/drivers/gpu/drm/msm/msm_gpu.h
@@ -98,7 +98,6 @@ struct msm_gpu {
 	int irq;
 
 	struct msm_gem_address_space *aspace;
-	int id;
 
 	/* Power Control: */
 	struct regulator *gpu_reg, *gpu_cx;
diff --git a/drivers/gpu/drm/msm/msm_kms.h b/drivers/gpu/drm/msm/msm_kms.h
index 117635d2..08760f2 100644
--- a/drivers/gpu/drm/msm/msm_kms.h
+++ b/drivers/gpu/drm/msm/msm_kms.h
@@ -73,6 +73,9 @@ struct msm_kms {
 
 	/* irq number to be passed on to drm_irq_install */
 	int irq;
+
+	/* mapper-id used to request GEM buffer mapped for scanout: */
+	struct msm_gem_address_space *aspace;
 };
 
 /**
-- 
1.9.1
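
The core of the GEM change above: instead of indexing a fixed per-object
array by mapper id, each buffer object now carries a list of VMAs keyed
by address-space pointer, so a lookup is a pointer comparison and an
object can be mapped into several address spaces at once. A minimal
userspace model of that pattern (simplified stand-in types and a plain
singly-linked list, not the kernel code itself):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

struct aspace { const char *name; };      /* identity is the pointer */

struct vma {
	struct aspace *aspace;            /* which address space */
	uint64_t iova;
	struct vma *next;                 /* the kernel uses a list_head */
};

struct object {
	struct vma *vmas;                 /* one mapping per aspace */
};

/* like obj_get_domain(): find an existing mapping for this aspace */
static struct vma *obj_get_vma(struct object *obj, struct aspace *as)
{
	for (struct vma *v = obj->vmas; v; v = v->next)
		if (v->aspace == as)
			return v;
	return NULL;
}

/* like obj_add_domain(): allocate and link a new, unmapped VMA */
static struct vma *obj_add_vma(struct object *obj, struct aspace *as)
{
	struct vma *v = calloc(1, sizeof(*v));

	if (v) {
		v->aspace = as;
		v->next = obj->vmas;
		obj->vmas = v;
	}
	return v;
}

int main(void)
{
	struct aspace gpu = { "gpu" }, kms = { "kms" };
	struct object obj = { 0 };
	struct vma *v;

	v = obj_add_vma(&obj, &gpu);
	if (v)
		v->iova = 0x100000;

	v = obj_get_vma(&obj, &kms);
	printf("kms mapping: %s\n", v ? "found" : "none yet");
	return 0;
}

msm_gem_get_iova() then becomes "look up, else add and map", and
put_iova() walks the same list to tear down every mapping, which is
what removes the need for the global msm_register_address_space()
registry.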


* [PATCH 06/11] drm/msm: Add a struct to pass configuration to msm_gpu_init()
       [not found]     ` <1488905900-6603-1-git-send-email-jcrouse-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
                         ` (3 preceding siblings ...)
  2017-03-07 16:58       ` [PATCH 05/11] drm/msm: get an iova from the address space instead of an id Jordan Crouse
@ 2017-03-07 16:58       ` Jordan Crouse
  2017-03-07 16:58       ` [PATCH 07/11] drm/msm: Remove memptrs->wptr Jordan Crouse
                         ` (4 subsequent siblings)
  9 siblings, 0 replies; 36+ messages in thread
From: Jordan Crouse @ 2017-03-07 16:58 UTC (permalink / raw)
  To: freedreno-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: linux-arm-msm-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

The amount of information that we need to pass into msm_gpu_init()
is steadily increasing, so add a new struct to stabilize the function
call and make it easier to add new configuration down the line.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
---
 drivers/gpu/drm/msm/adreno/adreno_gpu.c | 12 ++++++++++--
 drivers/gpu/drm/msm/msm_gpu.c           | 13 ++++++-------
 drivers/gpu/drm/msm/msm_gpu.h           | 11 ++++++++++-
 3 files changed, 26 insertions(+), 10 deletions(-)
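
The net effect on callers: rather than growing the msm_gpu_init()
parameter list again, a target fills in a config struct, and new fields
can be appended later without touching every call site. A compilable
sketch of the convention (the struct fields mirror the diff below;
gpu_init() is a hypothetical stand-in for msm_gpu_init()):

#include <stdint.h>
#include <stdio.h>

struct msm_gpu_config {
	const char *ioname;
	const char *irqname;
	uint64_t va_start;
	uint64_t va_end;
	unsigned int ringsz;
};

/* stand-in: consumes only the fields it needs */
static int gpu_init(const char *name, const struct msm_gpu_config *cfg)
{
	printf("%s: io=%s irq=%s va=[%#llx..%#llx] rb=%u\n", name,
	       cfg->ioname, cfg->irqname,
	       (unsigned long long)cfg->va_start,
	       (unsigned long long)cfg->va_end, cfg->ringsz);
	return 0;
}

int main(void)
{
	struct msm_gpu_config cfg = {
		.ioname   = "kgsl_3d0_reg_memory",
		.irqname  = "kgsl_3d0_irq",
		.va_start = 16 << 20,        /* SZ_16M */
		.va_end   = 0xffffffff,      /* 32-bit aperture end */
		.ringsz   = 32 << 10,        /* RB_SIZE (SZ_32K) */
	};

	return gpu_init("adreno", &cfg);
}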

diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
index 959876d..cda4156 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
@@ -344,6 +344,7 @@ int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev,
 		struct adreno_gpu *adreno_gpu, const struct adreno_gpu_funcs *funcs)
 {
 	struct adreno_platform_config *config = pdev->dev.platform_data;
+	struct msm_gpu_config adreno_gpu_config  = { 0 };
 	struct msm_gpu *gpu = &adreno_gpu->base;
 	int ret;
 
@@ -363,9 +364,16 @@ int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev,
 	DBG("fast_rate=%u, slow_rate=%u, bus_freq=%u",
 			gpu->fast_rate, gpu->slow_rate, gpu->bus_freq);
 
+	adreno_gpu_config.ioname = "kgsl_3d0_reg_memory";
+	adreno_gpu_config.irqname = "kgsl_3d0_irq";
+
+	adreno_gpu_config.va_start = SZ_16M;
+	adreno_gpu_config.va_end = 0xffffffff;
+
+	adreno_gpu_config.ringsz = RB_SIZE;
+
 	ret = msm_gpu_init(drm, pdev, &adreno_gpu->base, &funcs->base,
-			adreno_gpu->info->name, "kgsl_3d0_reg_memory", "kgsl_3d0_irq",
-			RB_SIZE);
+			adreno_gpu->info->name, &adreno_gpu_config);
 	if (ret)
 		return ret;
 
diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
index 16610ef..050d994 100644
--- a/drivers/gpu/drm/msm/msm_gpu.c
+++ b/drivers/gpu/drm/msm/msm_gpu.c
@@ -569,7 +569,7 @@ static irqreturn_t irq_handler(int irq, void *data)
 
 int msm_gpu_init(struct drm_device *drm, struct platform_device *pdev,
 		struct msm_gpu *gpu, const struct msm_gpu_funcs *funcs,
-		const char *name, const char *ioname, const char *irqname, int ringsz)
+		const char *name, struct msm_gpu_config *config)
 {
 	struct iommu_domain *iommu;
 	int i, ret;
@@ -605,14 +605,14 @@ int msm_gpu_init(struct drm_device *drm, struct platform_device *pdev,
 	BUG_ON(ARRAY_SIZE(clk_names) != ARRAY_SIZE(gpu->grp_clks));
 
 	/* Map registers: */
-	gpu->mmio = msm_ioremap(pdev, ioname, name);
+	gpu->mmio = msm_ioremap(pdev, config->ioname, name);
 	if (IS_ERR(gpu->mmio)) {
 		ret = PTR_ERR(gpu->mmio);
 		goto fail;
 	}
 
 	/* Get Interrupt: */
-	gpu->irq = platform_get_irq_byname(pdev, irqname);
+	gpu->irq = platform_get_irq_byname(pdev, config->irqname);
 	if (gpu->irq < 0) {
 		ret = gpu->irq;
 		dev_err(drm->dev, "failed to get irq: %d\n", ret);
@@ -656,9 +656,8 @@ int msm_gpu_init(struct drm_device *drm, struct platform_device *pdev,
 	 */
 	iommu = iommu_domain_alloc(&platform_bus_type);
 	if (iommu) {
-		/* TODO 32b vs 64b address space.. */
-		iommu->geometry.aperture_start = SZ_16M;
-		iommu->geometry.aperture_end = 0xffffffff;
+		iommu->geometry.aperture_start = config->va_start;
+		iommu->geometry.aperture_end = config->va_end;
 
 		dev_info(drm->dev, "%s: using IOMMU\n", name);
 		gpu->aspace = msm_gem_address_space_create(&pdev->dev,
@@ -677,7 +676,7 @@ int msm_gpu_init(struct drm_device *drm, struct platform_device *pdev,
 
 	/* Create ringbuffer: */
 	mutex_lock(&drm->struct_mutex);
-	gpu->rb = msm_ringbuffer_new(gpu, ringsz);
+	gpu->rb = msm_ringbuffer_new(gpu, config->ringsz);
 	mutex_unlock(&drm->struct_mutex);
 	if (IS_ERR(gpu->rb)) {
 		ret = PTR_ERR(gpu->rb);
diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
index ad6d13a..cc6530f 100644
--- a/drivers/gpu/drm/msm/msm_gpu.h
+++ b/drivers/gpu/drm/msm/msm_gpu.h
@@ -28,6 +28,14 @@
 struct msm_gem_submit;
 struct msm_gpu_perfcntr;
 
+struct msm_gpu_config {
+	const char *ioname;
+	const char *irqname;
+	uint64_t va_start;
+	uint64_t va_end;
+	unsigned int ringsz;
+};
+
 /* So far, with hardware that I've seen to date, we can have:
  *  + zero, one, or two z180 2d cores
  *  + a3xx or a2xx 3d core, which share a common CP (the firmware
@@ -205,7 +213,8 @@ void msm_gpu_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit,
 
 int msm_gpu_init(struct drm_device *drm, struct platform_device *pdev,
 		struct msm_gpu *gpu, const struct msm_gpu_funcs *funcs,
-		const char *name, const char *ioname, const char *irqname, int ringsz);
+		const char *name, struct msm_gpu_config *config);
+
 void msm_gpu_cleanup(struct msm_gpu *gpu);
 
 struct msm_gpu *adreno_load_gpu(struct drm_device *dev);
-- 
1.9.1


* [PATCH 07/11] drm/msm: Remove memptrs->wptr
       [not found]     ` <1488905900-6603-1-git-send-email-jcrouse-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
                         ` (4 preceding siblings ...)
  2017-03-07 16:58       ` [PATCH 06/11] drm/msm: Add a struct to pass configuration to msm_gpu_init() Jordan Crouse
@ 2017-03-07 16:58       ` Jordan Crouse
  2017-03-07 16:58       ` [PATCH 08/11] drm/msm: Support multiple ringbuffers Jordan Crouse
                         ` (3 subsequent siblings)
  9 siblings, 0 replies; 36+ messages in thread
From: Jordan Crouse @ 2017-03-07 16:58 UTC (permalink / raw)
  To: freedreno-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: linux-arm-msm-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

memptrs->wptr seems to be unused. Remove it to avoid
confusing the upcoming preemption code.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
---
 drivers/gpu/drm/msm/adreno/adreno_gpu.c | 3 ---
 drivers/gpu/drm/msm/adreno/adreno_gpu.h | 1 -
 2 files changed, 4 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
index cda4156..59b8930 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
@@ -123,7 +123,6 @@ void adreno_recover(struct msm_gpu *gpu)
 	/* reset completed fence seqno: */
 	adreno_gpu->memptrs->fence = gpu->fctx->completed_fence;
 	adreno_gpu->memptrs->rptr  = 0;
-	adreno_gpu->memptrs->wptr  = 0;
 
 	gpu->funcs->pm_resume(gpu);
 
@@ -256,7 +255,6 @@ void adreno_show(struct msm_gpu *gpu, struct seq_file *m)
 	seq_printf(m, "fence:    %d/%d\n", adreno_gpu->memptrs->fence,
 			gpu->fctx->last_fence);
 	seq_printf(m, "rptr:     %d\n", get_rptr(adreno_gpu));
-	seq_printf(m, "wptr:     %d\n", adreno_gpu->memptrs->wptr);
 	seq_printf(m, "rb wptr:  %d\n", get_wptr(gpu->rb));
 
 	gpu->funcs->pm_resume(gpu);
@@ -296,7 +294,6 @@ void adreno_dump_info(struct msm_gpu *gpu)
 	printk("fence:    %d/%d\n", adreno_gpu->memptrs->fence,
 			gpu->fctx->last_fence);
 	printk("rptr:     %d\n", get_rptr(adreno_gpu));
-	printk("wptr:     %d\n", adreno_gpu->memptrs->wptr);
 	printk("rb wptr:  %d\n", get_wptr(gpu->rb));
 }
 
diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.h b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
index 42e444a..da47468 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.h
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
@@ -86,7 +86,6 @@ struct adreno_info {
 
 struct adreno_rbmemptrs {
 	volatile uint32_t rptr;
-	volatile uint32_t wptr;
 	volatile uint32_t fence;
 };
 
-- 
1.9.1


* [PATCH 08/11] drm/msm: Support multiple ringbuffers
       [not found]     ` <1488905900-6603-1-git-send-email-jcrouse-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
                         ` (5 preceding siblings ...)
  2017-03-07 16:58       ` [PATCH 07/11] drm/msm: Remove memptrs->wptr Jordan Crouse
@ 2017-03-07 16:58       ` Jordan Crouse
  2017-03-07 16:58       ` [PATCH 09/11] drm/msm: Shadow current pointer in the ring until command is complete Jordan Crouse
                         ` (2 subsequent siblings)
  9 siblings, 0 replies; 36+ messages in thread
From: Jordan Crouse @ 2017-03-07 16:58 UTC (permalink / raw)
  To: freedreno-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: linux-arm-msm-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Add the infrastructure to support multiple ringbuffers. Assign each
ringbuffer an id and use that as an index for the various
ring-specific operations.

The biggest delta is to support legacy fences. Each fence gets its own
sequence number, but the legacy functions expect to use a unique integer.
To handle this we return a unique identifier for each submission but
map it to a specific ring/sequence under the covers. Newer users use
a dma_fence pointer anyway, so they don't care about the actual sequence
ID or ring.

The actual mechanics for multiple ringbuffers are very target-specific,
so this code just allows for the possibility but still only defines
one ringbuffer for each target family.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
---
 drivers/gpu/drm/msm/adreno/a3xx_gpu.c   |   9 +-
 drivers/gpu/drm/msm/adreno/a4xx_gpu.c   |   9 +-
 drivers/gpu/drm/msm/adreno/a5xx_gpu.c   |  45 ++++-----
 drivers/gpu/drm/msm/adreno/a5xx_gpu.h   |   2 +-
 drivers/gpu/drm/msm/adreno/a5xx_power.c |   6 +-
 drivers/gpu/drm/msm/adreno/adreno_gpu.c | 156 +++++++++++++++++++++-----------
 drivers/gpu/drm/msm/adreno/adreno_gpu.h |  36 +++++---
 drivers/gpu/drm/msm/msm_drv.h           |   2 +
 drivers/gpu/drm/msm/msm_fence.c         |  85 ++++++++++++-----
 drivers/gpu/drm/msm/msm_fence.h         |  13 ++-
 drivers/gpu/drm/msm/msm_gem.h           |   1 +
 drivers/gpu/drm/msm/msm_gem_submit.c    |  10 +-
 drivers/gpu/drm/msm/msm_gpu.c           | 123 ++++++++++++++++---------
 drivers/gpu/drm/msm/msm_gpu.h           |  38 ++++++--
 drivers/gpu/drm/msm/msm_ringbuffer.c    |  13 ++-
 drivers/gpu/drm/msm/msm_ringbuffer.h    |   8 +-
 include/uapi/drm/msm_drm.h              |   5 +
 17 files changed, 375 insertions(+), 186 deletions(-)
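
Two details of the fence rework deserve a closer look before the diff:
the legacy WAIT_FENCE path maps a driver-global fence ID back to a
(ring, seqno) pair through a small hashtable, and completion is tested
with a signed subtraction so per-ring seqnos survive 32-bit wraparound.
A compilable model of both (a linear table stands in for the kernel
hashtable; the types are simplified):

#include <stdint.h>
#include <stdio.h>

struct ring {
	uint32_t completed_fence;    /* last seqno retired on this ring */
};

struct fence {
	uint32_t id;                 /* global ID handed to userspace */
	uint32_t seqno;              /* per-ring sequence number */
	struct ring *ring;
};

/* same trick as fence_completed() in the patch: the signed difference
 * stays correct across wraparound while the two values are within
 * 2^31 of each other */
static int fence_completed(const struct ring *ring, uint32_t seqno)
{
	return (int32_t)(ring->completed_fence - seqno) >= 0;
}

/* like fence_from_id(): NULL means the ID was either never issued or
 * the fence has already been retired and freed */
static struct fence *fence_from_id(struct fence *t, int n, uint32_t id)
{
	for (int i = 0; i < n; i++)
		if (t[i].id == id)
			return &t[i];
	return NULL;
}

int main(void)
{
	struct ring rb0 = { .completed_fence = 7 };
	struct ring rb1 = { .completed_fence = 0xfffffffe };
	struct fence active[] = {
		{ .id = 10, .seqno = 8, .ring = &rb0 },
		{ .id = 11, .seqno = 2, .ring = &rb1 }, /* post-wrap seqno */
	};
	struct fence *f = fence_from_id(active, 2, 11);

	/* -4 as int32_t: not completed, despite 2 < 0xfffffffe */
	printf("fence 11 done: %d\n",
	       f ? fence_completed(f->ring, f->seqno) : 1);
	return 0;
}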

diff --git a/drivers/gpu/drm/msm/adreno/a3xx_gpu.c b/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
index fc4fd2d..2f72848 100644
--- a/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a3xx_gpu.c
@@ -44,7 +44,7 @@
 
 static bool a3xx_me_init(struct msm_gpu *gpu)
 {
-	struct msm_ringbuffer *ring = gpu->rb;
+	struct msm_ringbuffer *ring = gpu->rb[0];
 
 	OUT_PKT3(ring, CP_ME_INIT, 17);
 	OUT_RING(ring, 0x000003f7);
@@ -65,7 +65,7 @@ static bool a3xx_me_init(struct msm_gpu *gpu)
 	OUT_RING(ring, 0x00000000);
 	OUT_RING(ring, 0x00000000);
 
-	gpu->funcs->flush(gpu);
+	gpu->funcs->flush(gpu, ring);
 	return a3xx_idle(gpu);
 }
 
@@ -339,7 +339,7 @@ static void a3xx_destroy(struct msm_gpu *gpu)
 static bool a3xx_idle(struct msm_gpu *gpu)
 {
 	/* wait for ringbuffer to drain: */
-	if (!adreno_idle(gpu))
+	if (!adreno_idle(gpu, gpu->rb[0]))
 		return false;
 
 	/* then wait for GPU to finish: */
@@ -449,6 +449,7 @@ static void a3xx_dump(struct msm_gpu *gpu)
 		.last_fence = adreno_last_fence,
 		.submit = adreno_submit,
 		.flush = adreno_flush,
+		.active_ring = adreno_active_ring,
 		.irq = a3xx_irq,
 		.destroy = a3xx_destroy,
 #ifdef CONFIG_DEBUG_FS
@@ -496,7 +497,7 @@ struct msm_gpu *a3xx_gpu_init(struct drm_device *dev)
 	adreno_gpu->registers = a3xx_registers;
 	adreno_gpu->reg_offsets = a3xx_register_offsets;
 
-	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs);
+	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1);
 	if (ret)
 		goto fail;
 
diff --git a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
index 6bc948b..bdd2a24 100644
--- a/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a4xx_gpu.c
@@ -116,7 +116,7 @@ static void a4xx_enable_hwcg(struct msm_gpu *gpu)
 
 static bool a4xx_me_init(struct msm_gpu *gpu)
 {
-	struct msm_ringbuffer *ring = gpu->rb;
+	struct msm_ringbuffer *ring = gpu->rb[0];
 
 	OUT_PKT3(ring, CP_ME_INIT, 17);
 	OUT_RING(ring, 0x000003f7);
@@ -137,7 +137,7 @@ static bool a4xx_me_init(struct msm_gpu *gpu)
 	OUT_RING(ring, 0x00000000);
 	OUT_RING(ring, 0x00000000);
 
-	gpu->funcs->flush(gpu);
+	gpu->funcs->flush(gpu, ring);
 	return a4xx_idle(gpu);
 }
 
@@ -337,7 +337,7 @@ static void a4xx_destroy(struct msm_gpu *gpu)
 static bool a4xx_idle(struct msm_gpu *gpu)
 {
 	/* wait for ringbuffer to drain: */
-	if (!adreno_idle(gpu))
+	if (!adreno_idle(gpu, gpu->rb[0]))
 		return false;
 
 	/* then wait for GPU to finish: */
@@ -539,6 +539,7 @@ static int a4xx_get_timestamp(struct msm_gpu *gpu, uint64_t *value)
 		.last_fence = adreno_last_fence,
 		.submit = adreno_submit,
 		.flush = adreno_flush,
+		.active_ring = adreno_active_ring,
 		.irq = a4xx_irq,
 		.destroy = a4xx_destroy,
 #ifdef CONFIG_DEBUG_FS
@@ -580,7 +581,7 @@ struct msm_gpu *a4xx_gpu_init(struct drm_device *dev)
 	adreno_gpu->registers = a4xx_registers;
 	adreno_gpu->reg_offsets = a4xx_register_offsets;
 
-	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs);
+	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1);
 	if (ret)
 		goto fail;
 
diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
index 25ab1f4..4ad98b9 100644
--- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
@@ -189,7 +189,7 @@ static void a5xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit,
 {
 	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
 	struct msm_drm_private *priv = gpu->dev->dev_private;
-	struct msm_ringbuffer *ring = gpu->rb;
+	struct msm_ringbuffer *ring = submit->ring;
 	unsigned int i, ibs = 0;
 
 	for (i = 0; i < submit->nr_cmds; i++) {
@@ -214,11 +214,11 @@ static void a5xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit,
 
 	OUT_PKT7(ring, CP_EVENT_WRITE, 4);
 	OUT_RING(ring, CACHE_FLUSH_TS | (1 << 31));
-	OUT_RING(ring, lower_32_bits(rbmemptr(adreno_gpu, fence)));
-	OUT_RING(ring, upper_32_bits(rbmemptr(adreno_gpu, fence)));
+	OUT_RING(ring, lower_32_bits(rbmemptr(adreno_gpu, ring->id, fence)));
+	OUT_RING(ring, upper_32_bits(rbmemptr(adreno_gpu, ring->id, fence)));
 	OUT_RING(ring, submit->fence->seqno);
 
-	gpu->funcs->flush(gpu);
+	gpu->funcs->flush(gpu, ring);
 }
 
 struct a5xx_hwcg {
@@ -358,7 +358,7 @@ static void a5xx_enable_hwcg(struct msm_gpu *gpu)
 static int a5xx_me_init(struct msm_gpu *gpu)
 {
 	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
-	struct msm_ringbuffer *ring = gpu->rb;
+	struct msm_ringbuffer *ring = gpu->rb[0];
 
 	OUT_PKT7(ring, CP_ME_INIT, 8);
 
@@ -389,9 +389,8 @@ static int a5xx_me_init(struct msm_gpu *gpu)
 	OUT_RING(ring, 0x00000000);
 	OUT_RING(ring, 0x00000000);
 
-	gpu->funcs->flush(gpu);
-
-	return a5xx_idle(gpu) ? 0 : -EINVAL;
+	gpu->funcs->flush(gpu, ring);
+	return a5xx_idle(gpu, ring) ? 0 : -EINVAL;
 }
 
 static struct drm_gem_object *a5xx_ucode_load_bo(struct msm_gpu *gpu,
@@ -695,11 +694,11 @@ static int a5xx_hw_init(struct msm_gpu *gpu)
 	 * ticking correctly
 	 */
 	if (adreno_is_a530(adreno_gpu)) {
-		OUT_PKT7(gpu->rb, CP_EVENT_WRITE, 1);
-		OUT_RING(gpu->rb, 0x0F);
+		OUT_PKT7(gpu->rb[0], CP_EVENT_WRITE, 1);
+		OUT_RING(gpu->rb[0], 0x0F);
 
-		gpu->funcs->flush(gpu);
-		if (!a5xx_idle(gpu))
+		gpu->funcs->flush(gpu, gpu->rb[0]);
+		if (!a5xx_idle(gpu, gpu->rb[0]))
 			return -EINVAL;
 	}
 
@@ -712,11 +711,11 @@ static int a5xx_hw_init(struct msm_gpu *gpu)
 	 */
 	ret = a5xx_zap_shader_init(gpu);
 	if (!ret) {
-		OUT_PKT7(gpu->rb, CP_SET_SECURE_MODE, 1);
-		OUT_RING(gpu->rb, 0x00000000);
+		OUT_PKT7(gpu->rb[0], CP_SET_SECURE_MODE, 1);
+		OUT_RING(gpu->rb[0], 0x00000000);
 
-		gpu->funcs->flush(gpu);
-		if (!a5xx_idle(gpu))
+		gpu->funcs->flush(gpu, gpu->rb[0]);
+		if (!a5xx_idle(gpu, gpu->rb[0]))
 			return -EINVAL;
 	} else {
 		/* Print a warning so if we die, we know why */
@@ -790,16 +789,19 @@ static inline bool _a5xx_check_idle(struct msm_gpu *gpu)
 		A5XX_RBBM_INT_0_MASK_MISC_HANG_DETECT);
 }
 
-bool a5xx_idle(struct msm_gpu *gpu)
+bool a5xx_idle(struct msm_gpu *gpu, struct msm_ringbuffer *ring)
 {
 	/* wait for CP to drain ringbuffer: */
-	if (!adreno_idle(gpu))
+	if (!adreno_idle(gpu, ring))
 		return false;
 
 	if (spin_until(_a5xx_check_idle(gpu))) {
-		DRM_ERROR("%s: %ps: timeout waiting for GPU to idle: status %8.8X irq %8.8X\n",
-			gpu->name, __builtin_return_address(0),
+		DRM_DEV_ERROR(gpu->dev->dev,
+			"timeout waiting for GPU RB %d to idle: status %8.8X rptr/wptr: %4.4X/%4.4X irq %8.8X\n",
+			ring->id,
 			gpu_read(gpu, REG_A5XX_RBBM_STATUS),
+			gpu_read(gpu, REG_A5XX_CP_RB_RPTR),
+			gpu_read(gpu, REG_A5XX_CP_RB_WPTR),
 			gpu_read(gpu, REG_A5XX_RBBM_INT_0_STATUS));
 
 		return false;
@@ -1099,6 +1101,7 @@ static void a5xx_show(struct msm_gpu *gpu, struct seq_file *m)
 		.last_fence = adreno_last_fence,
 		.submit = a5xx_submit,
 		.flush = adreno_flush,
+		.active_ring = adreno_active_ring,
 		.irq = a5xx_irq,
 		.destroy = a5xx_destroy,
 		.show = a5xx_show,
@@ -1133,7 +1136,7 @@ struct msm_gpu *a5xx_gpu_init(struct drm_device *dev)
 
 	a5xx_gpu->lm_leakage = 0x4E001A;
 
-	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs);
+	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1);
 	if (ret) {
 		a5xx_destroy(&(a5xx_gpu->base.base));
 		return ERR_PTR(ret);
diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.h b/drivers/gpu/drm/msm/adreno/a5xx_gpu.h
index 6b20f28..405b563 100644
--- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.h
+++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.h
@@ -56,6 +56,6 @@ static inline int spin_usecs(struct msm_gpu *gpu, uint32_t usecs,
 	return -ETIMEDOUT;
 }
 
-bool a5xx_idle(struct msm_gpu *gpu);
+bool a5xx_idle(struct msm_gpu *gpu, struct msm_ringbuffer *ring);
 
 #endif /* __A5XX_GPU_H__ */
diff --git a/drivers/gpu/drm/msm/adreno/a5xx_power.c b/drivers/gpu/drm/msm/adreno/a5xx_power.c
index 2fdee44..a7d91ac 100644
--- a/drivers/gpu/drm/msm/adreno/a5xx_power.c
+++ b/drivers/gpu/drm/msm/adreno/a5xx_power.c
@@ -173,7 +173,7 @@ static int a5xx_gpmu_init(struct msm_gpu *gpu)
 {
 	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
 	struct a5xx_gpu *a5xx_gpu = to_a5xx_gpu(adreno_gpu);
-	struct msm_ringbuffer *ring = gpu->rb;
+	struct msm_ringbuffer *ring = gpu->rb[0];
 
 	if (!a5xx_gpu->gpmu_dwords)
 		return 0;
@@ -192,9 +192,9 @@ static int a5xx_gpmu_init(struct msm_gpu *gpu)
 	OUT_PKT7(ring, CP_SET_PROTECTED_MODE, 1);
 	OUT_RING(ring, 1);
 
-	gpu->funcs->flush(gpu);
+	gpu->funcs->flush(gpu, ring);
 
-	if (!a5xx_idle(gpu)) {
+	if (!a5xx_idle(gpu, ring)) {
 		DRM_ERROR("%s: Unable to load GPMU firmware. GPMU will not be active\n",
 			gpu->name);
 		return -EINVAL;
diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
index 59b8930..21c839f 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
@@ -21,7 +21,6 @@
 #include "msm_gem.h"
 #include "msm_mmu.h"
 
-#define RB_SIZE    SZ_32K
 #define RB_BLKSIZE 32
 
 int adreno_get_param(struct msm_gpu *gpu, uint32_t param, uint64_t *value)
@@ -57,32 +56,36 @@ int adreno_get_param(struct msm_gpu *gpu, uint32_t param, uint64_t *value)
 int adreno_hw_init(struct msm_gpu *gpu)
 {
 	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
-	int ret;
+	int i;
 
 	DBG("%s", gpu->name);
 
-	ret = msm_gem_get_iova(gpu->rb->bo, gpu->aspace, &gpu->rb_iova);
-	if (ret) {
-		gpu->rb_iova = 0;
-		dev_err(gpu->dev->dev, "could not map ringbuffer: %d\n", ret);
-		return ret;
+	for (i = 0; i < gpu->nr_rings; i++) {
+		int ret = msm_gem_get_iova(gpu->rb[i]->bo, gpu->aspace,
+			&gpu->rb[i]->iova);
+		if (ret) {
+			gpu->rb[i]->iova = 0;
+			dev_err(gpu->dev->dev,
+				"could not map ringbuffer %d: %d\n", i, ret);
+			return ret;
+		}
 	}
 
 	/* Setup REG_CP_RB_CNTL: */
 	adreno_gpu_write(adreno_gpu, REG_ADRENO_CP_RB_CNTL,
-			/* size is log2(quad-words): */
-			AXXX_CP_RB_CNTL_BUFSZ(ilog2(gpu->rb->size / 8)) |
-			AXXX_CP_RB_CNTL_BLKSZ(ilog2(RB_BLKSIZE / 8)) |
-			(adreno_is_a430(adreno_gpu) ? AXXX_CP_RB_CNTL_NO_UPDATE : 0));
+		/* size is log2(quad-words): */
+		AXXX_CP_RB_CNTL_BUFSZ(ilog2(MSM_GPU_RINGBUFFER_SZ / 8)) |
+		AXXX_CP_RB_CNTL_BLKSZ(ilog2(RB_BLKSIZE / 8)) |
+		(adreno_is_a430(adreno_gpu) ? AXXX_CP_RB_CNTL_NO_UPDATE : 0));
 
-	/* Setup ringbuffer address: */
+	/* Setup ringbuffer address - use ringbuffer[0] for GPU init */
 	adreno_gpu_write64(adreno_gpu, REG_ADRENO_CP_RB_BASE,
-		REG_ADRENO_CP_RB_BASE_HI, gpu->rb_iova);
+		REG_ADRENO_CP_RB_BASE_HI, gpu->rb[0]->iova);
 
 	if (!adreno_is_a430(adreno_gpu)) {
 		adreno_gpu_write64(adreno_gpu, REG_ADRENO_CP_RB_RPTR_ADDR,
 			REG_ADRENO_CP_RB_RPTR_ADDR_HI,
-			rbmemptr(adreno_gpu, rptr));
+			rbmemptr(adreno_gpu, 0, rptr));
 	}
 
 	return 0;
@@ -94,35 +97,58 @@ static uint32_t get_wptr(struct msm_ringbuffer *ring)
 }
 
 /* Use this helper to read rptr, since a430 doesn't update rptr in memory */
-static uint32_t get_rptr(struct adreno_gpu *adreno_gpu)
+static uint32_t get_rptr(struct adreno_gpu *adreno_gpu,
+		struct msm_ringbuffer *ring)
 {
-	if (adreno_is_a430(adreno_gpu))
-		return adreno_gpu->memptrs->rptr = adreno_gpu_read(
+	if (adreno_is_a430(adreno_gpu)) {
+		/*
+		 * If index is anything but 0 this will probably break horribly,
+		 * but I think that we have enough infrastructure in place to
+		 * ensure that it won't be. If not then this is why your
+		 * a430 stopped working.
+		 */
+		return adreno_gpu->memptrs->rptr[ring->id] = adreno_gpu_read(
 			adreno_gpu, REG_ADRENO_CP_RB_RPTR);
-	else
-		return adreno_gpu->memptrs->rptr;
+	} else
+		return adreno_gpu->memptrs->rptr[ring->id];
 }
 
-uint32_t adreno_last_fence(struct msm_gpu *gpu)
+struct msm_ringbuffer *adreno_active_ring(struct msm_gpu *gpu)
+{
+	return gpu->rb[0];
+}
+
+uint32_t adreno_last_fence(struct msm_gpu *gpu, struct msm_ringbuffer *ring)
 {
 	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
-	return adreno_gpu->memptrs->fence;
+
+	if (!ring)
+		return 0;
+
+	return adreno_gpu->memptrs->fence[ring->id];
 }
 
 void adreno_recover(struct msm_gpu *gpu)
 {
 	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
 	struct drm_device *dev = gpu->dev;
-	int ret;
+	struct msm_ringbuffer *ring;
+	int ret, i;
 
 	gpu->funcs->pm_suspend(gpu);
 
-	/* reset ringbuffer: */
-	gpu->rb->cur = gpu->rb->start;
+	/* reset ringbuffer(s): */
+
+	FOR_EACH_RING(gpu, ring, i) {
+		if (!ring)
+			continue;
+
+		ring->cur = ring->start;
 
-	/* reset completed fence seqno: */
-	adreno_gpu->memptrs->fence = gpu->fctx->completed_fence;
-	adreno_gpu->memptrs->rptr  = 0;
+		/* reset completed fence seqno, discard anything pending: */
+		adreno_gpu->memptrs->fence[ring->id] = ring->completed_fence;
+		adreno_gpu->memptrs->rptr[ring->id]  = 0;
+	}
 
 	gpu->funcs->pm_resume(gpu);
 
@@ -140,7 +166,7 @@ void adreno_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit,
 {
 	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
 	struct msm_drm_private *priv = gpu->dev->dev_private;
-	struct msm_ringbuffer *ring = gpu->rb;
+	struct msm_ringbuffer *ring = submit->ring;
 	unsigned i;
 
 	for (i = 0; i < submit->nr_cmds; i++) {
@@ -179,7 +205,7 @@ void adreno_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit,
 
 	OUT_PKT3(ring, CP_EVENT_WRITE, 3);
 	OUT_RING(ring, CACHE_FLUSH_TS);
-	OUT_RING(ring, rbmemptr(adreno_gpu, fence));
+	OUT_RING(ring, rbmemptr(adreno_gpu, ring->id, fence));
 	OUT_RING(ring, submit->fence->seqno);
 
 	/* we could maybe be clever and only CP_COND_EXEC the interrupt: */
@@ -206,10 +232,10 @@ void adreno_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit,
 	}
 #endif
 
-	gpu->funcs->flush(gpu);
+	gpu->funcs->flush(gpu, ring);
 }
 
-void adreno_flush(struct msm_gpu *gpu)
+void adreno_flush(struct msm_gpu *gpu, struct msm_ringbuffer *ring)
 {
 	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
 	uint32_t wptr;
@@ -219,7 +245,7 @@ void adreno_flush(struct msm_gpu *gpu)
 	 * to account for the possibility that the last command fit exactly into
 	 * the ringbuffer and rb->next hasn't wrapped to zero yet
 	 */
-	wptr = get_wptr(gpu->rb) & ((gpu->rb->size / 4) - 1);
+	wptr = get_wptr(ring) % (MSM_GPU_RINGBUFFER_SZ >> 2);
 
 	/* ensure writes to ringbuffer have hit system memory: */
 	mb();
@@ -227,17 +253,18 @@ void adreno_flush(struct msm_gpu *gpu)
 	adreno_gpu_write(adreno_gpu, REG_ADRENO_CP_RB_WPTR, wptr);
 }
 
-bool adreno_idle(struct msm_gpu *gpu)
+bool adreno_idle(struct msm_gpu *gpu, struct msm_ringbuffer *ring)
 {
 	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
-	uint32_t wptr = get_wptr(gpu->rb);
+	uint32_t wptr = get_wptr(ring);
 
 	/* wait for CP to drain ringbuffer: */
-	if (!spin_until(get_rptr(adreno_gpu) == wptr))
+	if (!spin_until(get_rptr(adreno_gpu, ring) == wptr))
 		return true;
 
 	/* TODO maybe we need to reset GPU here to recover from hang? */
-	DRM_ERROR("%s: timeout waiting to drain ringbuffer!\n", gpu->name);
+	DRM_ERROR("%s: timeout waiting to drain ringbuffer %d!\n", gpu->name,
+		ring->id);
 	return false;
 }
 
@@ -245,6 +272,7 @@ bool adreno_idle(struct msm_gpu *gpu)
 void adreno_show(struct msm_gpu *gpu, struct seq_file *m)
 {
 	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
+	struct msm_ringbuffer *ring;
 	int i;
 
 	seq_printf(m, "revision: %d (%d.%d.%d.%d)\n",
@@ -252,10 +280,18 @@ void adreno_show(struct msm_gpu *gpu, struct seq_file *m)
 			adreno_gpu->rev.major, adreno_gpu->rev.minor,
 			adreno_gpu->rev.patchid);
 
-	seq_printf(m, "fence:    %d/%d\n", adreno_gpu->memptrs->fence,
-			gpu->fctx->last_fence);
-	seq_printf(m, "rptr:     %d\n", get_rptr(adreno_gpu));
-	seq_printf(m, "rb wptr:  %d\n", get_wptr(gpu->rb));
+	FOR_EACH_RING(gpu, ring, i) {
+		if (!ring)
+			continue;
+
+		seq_printf(m, "rb %d: fence:    %d/%d\n", i,
+			adreno_last_fence(gpu, ring),
+			ring->completed_fence);
+
+		seq_printf(m, "      rptr:     %d\n",
+			get_rptr(adreno_gpu, ring));
+		seq_printf(m, "rb wptr:  %d\n", get_wptr(ring));
+	}
 
 	gpu->funcs->pm_resume(gpu);
 
@@ -285,16 +321,25 @@ void adreno_show(struct msm_gpu *gpu, struct seq_file *m)
 void adreno_dump_info(struct msm_gpu *gpu)
 {
 	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
+	struct msm_ringbuffer *ring;
+	int i;
 
 	printk("revision: %d (%d.%d.%d.%d)\n",
 			adreno_gpu->info->revn, adreno_gpu->rev.core,
 			adreno_gpu->rev.major, adreno_gpu->rev.minor,
 			adreno_gpu->rev.patchid);
 
-	printk("fence:    %d/%d\n", adreno_gpu->memptrs->fence,
-			gpu->fctx->last_fence);
-	printk("rptr:     %d\n", get_rptr(adreno_gpu));
-	printk("rb wptr:  %d\n", get_wptr(gpu->rb));
+	FOR_EACH_RING(gpu, ring, i) {
+		if (!ring)
+			continue;
+
+		printk("rb %d: fence:    %d/%d\n", i,
+			adreno_last_fence(gpu, ring),
+			ring->completed_fence);
+
+		printk("rptr:     %d\n", get_rptr(adreno_gpu, ring));
+		printk("rb wptr:  %d\n", get_wptr(ring));
+	}
 }
 
 /* would be nice to not have to duplicate the _show() stuff with printk(): */
@@ -317,19 +362,21 @@ void adreno_dump(struct msm_gpu *gpu)
 	}
 }
 
-static uint32_t ring_freewords(struct msm_gpu *gpu)
+static uint32_t ring_freewords(struct msm_ringbuffer *ring)
 {
-	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
-	uint32_t size = gpu->rb->size / 4;
-	uint32_t wptr = get_wptr(gpu->rb);
-	uint32_t rptr = get_rptr(adreno_gpu);
+	struct adreno_gpu *adreno_gpu = to_adreno_gpu(ring->gpu);
+	uint32_t size = MSM_GPU_RINGBUFFER_SZ >> 2;
+	uint32_t wptr = get_wptr(ring);
+	uint32_t rptr = get_rptr(adreno_gpu, ring);
 	return (rptr + (size - 1) - wptr) % size;
 }
 
-void adreno_wait_ring(struct msm_gpu *gpu, uint32_t ndwords)
+void adreno_wait_ring(struct msm_ringbuffer *ring, uint32_t ndwords)
 {
-	if (spin_until(ring_freewords(gpu) >= ndwords))
-		DRM_ERROR("%s: timeout waiting for ringbuffer space\n", gpu->name);
+	if (spin_until(ring_freewords(ring) >= ndwords))
+		DRM_DEV_ERROR(ring->gpu->dev->dev,
+			"timeout waiting for space in ringubffer %d\n",
+			ring->id);
 }
 
 static const char *iommu_ports[] = {
@@ -338,7 +385,8 @@ void adreno_wait_ring(struct msm_gpu *gpu, uint32_t ndwords)
 };
 
 int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev,
-		struct adreno_gpu *adreno_gpu, const struct adreno_gpu_funcs *funcs)
+		struct adreno_gpu *adreno_gpu,
+		const struct adreno_gpu_funcs *funcs, int nr_rings)
 {
 	struct adreno_platform_config *config = pdev->dev.platform_data;
 	struct msm_gpu_config adreno_gpu_config  = { 0 };
@@ -367,7 +415,7 @@ int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev,
 	adreno_gpu_config.va_start = SZ_16M;
 	adreno_gpu_config.va_end = 0xffffffff;
 
-	adreno_gpu_config.ringsz = RB_SIZE;
+	adreno_gpu_config.nr_rings = nr_rings;
 
 	ret = msm_gpu_init(drm, pdev, &adreno_gpu->base, &funcs->base,
 			adreno_gpu->info->name, &adreno_gpu_config);
diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.h b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
index da47468..e05fa8e 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.h
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
@@ -81,12 +81,18 @@ struct adreno_info {
 
 const struct adreno_info *adreno_info(struct adreno_rev rev);
 
-#define rbmemptr(adreno_gpu, member)  \
+#define _sizeof(member) \
+	sizeof(((struct adreno_rbmemptrs *) 0)->member[0])
+
+#define _base(adreno_gpu, member)  \
 	((adreno_gpu)->memptrs_iova + offsetof(struct adreno_rbmemptrs, member))
 
+#define rbmemptr(adreno_gpu, index, member) \
+	(_base((adreno_gpu), member) + ((index) * _sizeof(member)))
+
 struct adreno_rbmemptrs {
-	volatile uint32_t rptr;
-	volatile uint32_t fence;
+	volatile uint32_t rptr[MSM_GPU_MAX_RINGS];
+	volatile uint32_t fence[MSM_GPU_MAX_RINGS];
 };
 
 struct adreno_gpu {
@@ -196,21 +202,25 @@ static inline int adreno_is_a530(struct adreno_gpu *gpu)
 
 int adreno_get_param(struct msm_gpu *gpu, uint32_t param, uint64_t *value);
 int adreno_hw_init(struct msm_gpu *gpu);
-uint32_t adreno_last_fence(struct msm_gpu *gpu);
+uint32_t adreno_last_fence(struct msm_gpu *gpu, struct msm_ringbuffer *ring);
+uint32_t adreno_submitted_fence(struct msm_gpu *gpu,
+		struct msm_ringbuffer *ring);
 void adreno_recover(struct msm_gpu *gpu);
 void adreno_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit,
 		struct msm_file_private *ctx);
-void adreno_flush(struct msm_gpu *gpu);
-bool adreno_idle(struct msm_gpu *gpu);
+void adreno_flush(struct msm_gpu *gpu, struct msm_ringbuffer *ring);
+bool adreno_idle(struct msm_gpu *gpu, struct msm_ringbuffer *ring);
 #ifdef CONFIG_DEBUG_FS
 void adreno_show(struct msm_gpu *gpu, struct seq_file *m);
 #endif
 void adreno_dump_info(struct msm_gpu *gpu);
 void adreno_dump(struct msm_gpu *gpu);
-void adreno_wait_ring(struct msm_gpu *gpu, uint32_t ndwords);
+void adreno_wait_ring(struct msm_ringbuffer *ring, uint32_t ndwords);
+struct msm_ringbuffer *adreno_active_ring(struct msm_gpu *gpu);
 
 int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev,
-		struct adreno_gpu *gpu, const struct adreno_gpu_funcs *funcs);
+		struct adreno_gpu *gpu, const struct adreno_gpu_funcs *funcs,
+		int nr_rings);
 void adreno_gpu_cleanup(struct adreno_gpu *gpu);
 
 
@@ -219,7 +229,7 @@ int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev,
 static inline void
 OUT_PKT0(struct msm_ringbuffer *ring, uint16_t regindx, uint16_t cnt)
 {
-	adreno_wait_ring(ring->gpu, cnt+1);
+	adreno_wait_ring(ring, cnt+1);
 	OUT_RING(ring, CP_TYPE0_PKT | ((cnt-1) << 16) | (regindx & 0x7FFF));
 }
 
@@ -227,14 +237,14 @@ int adreno_gpu_init(struct drm_device *drm, struct platform_device *pdev,
 static inline void
 OUT_PKT2(struct msm_ringbuffer *ring)
 {
-	adreno_wait_ring(ring->gpu, 1);
+	adreno_wait_ring(ring, 1);
 	OUT_RING(ring, CP_TYPE2_PKT);
 }
 
 static inline void
 OUT_PKT3(struct msm_ringbuffer *ring, uint8_t opcode, uint16_t cnt)
 {
-	adreno_wait_ring(ring->gpu, cnt+1);
+	adreno_wait_ring(ring, cnt+1);
 	OUT_RING(ring, CP_TYPE3_PKT | ((cnt-1) << 16) | ((opcode & 0xFF) << 8));
 }
 
@@ -256,14 +266,14 @@ static inline u32 PM4_PARITY(u32 val)
 static inline void
 OUT_PKT4(struct msm_ringbuffer *ring, uint16_t regindx, uint16_t cnt)
 {
-	adreno_wait_ring(ring->gpu, cnt + 1);
+	adreno_wait_ring(ring, cnt + 1);
 	OUT_RING(ring, PKT4(regindx, cnt));
 }
 
 static inline void
 OUT_PKT7(struct msm_ringbuffer *ring, uint8_t opcode, uint16_t cnt)
 {
-	adreno_wait_ring(ring->gpu, cnt + 1);
+	adreno_wait_ring(ring, cnt + 1);
 	OUT_RING(ring, CP_TYPE7_PKT | (cnt << 0) | (PM4_PARITY(cnt) << 15) |
 		((opcode & 0x7F) << 16) | (PM4_PARITY(opcode) << 23));
 }
diff --git a/drivers/gpu/drm/msm/msm_drv.h b/drivers/gpu/drm/msm/msm_drv.h
index 996227c..600c39c 100644
--- a/drivers/gpu/drm/msm/msm_drv.h
+++ b/drivers/gpu/drm/msm/msm_drv.h
@@ -78,6 +78,8 @@ struct msm_vblank_ctrl {
 	spinlock_t lock;
 };
 
+#define MSM_GPU_MAX_RINGS 1
+
 struct msm_drm_private {
 
 	struct drm_device *dev;
diff --git a/drivers/gpu/drm/msm/msm_fence.c b/drivers/gpu/drm/msm/msm_fence.c
index 3f299c5..c1e7614 100644
--- a/drivers/gpu/drm/msm/msm_fence.c
+++ b/drivers/gpu/drm/msm/msm_fence.c
@@ -20,7 +20,6 @@
 #include "msm_drv.h"
 #include "msm_fence.h"
 
-
 struct msm_fence_context *
 msm_fence_context_alloc(struct drm_device *dev, const char *name)
 {
@@ -32,9 +31,10 @@ struct msm_fence_context *
 
 	fctx->dev = dev;
 	fctx->name = name;
-	fctx->context = dma_fence_context_alloc(1);
+	fctx->context = dma_fence_context_alloc(MSM_GPU_MAX_RINGS);
 	init_waitqueue_head(&fctx->event);
 	spin_lock_init(&fctx->spinlock);
+	hash_init(fctx->hash);
 
 	return fctx;
 }
@@ -44,64 +44,94 @@ void msm_fence_context_free(struct msm_fence_context *fctx)
 	kfree(fctx);
 }
 
-static inline bool fence_completed(struct msm_fence_context *fctx, uint32_t fence)
+static inline bool fence_completed(struct msm_ringbuffer *ring, uint32_t fence)
+{
+	return (int32_t)(ring->completed_fence - fence) >= 0;
+}
+
+struct msm_fence {
+	struct msm_fence_context *fctx;
+	struct msm_ringbuffer *ring;
+	struct dma_fence base;
+	struct hlist_node node;
+	u32 fence_id;
+};
+
+static struct msm_fence *fence_from_id(struct msm_fence_context *fctx,
+		uint32_t id)
 {
-	return (int32_t)(fctx->completed_fence - fence) >= 0;
+	struct msm_fence *f;
+
+	hash_for_each_possible_rcu(fctx->hash, f, node, id) {
+		if (f->fence_id == id) {
+			if (dma_fence_get_rcu(&f->base))
+				return f;
+		}
+	}
+
+	return NULL;
 }
 
 /* legacy path for WAIT_FENCE ioctl: */
 int msm_wait_fence(struct msm_fence_context *fctx, uint32_t fence,
 		ktime_t *timeout, bool interruptible)
 {
+	struct msm_fence *f = fence_from_id(fctx, fence);
 	int ret;
 
-	if (fence > fctx->last_fence) {
-		DRM_ERROR("%s: waiting on invalid fence: %u (of %u)\n",
-				fctx->name, fence, fctx->last_fence);
-		return -EINVAL;
+	/* If no active fence was found, there are two possibilities */
+	if (!f) {
+		/* The requested ID is newer than last issued - return error */
+		if (fence > fctx->fence_id) {
+			DRM_ERROR("%s: waiting on invalid fence: %u (of %u)\n",
+				fctx->name, fence, fctx->fence_id);
+			return -EINVAL;
+		}
+
+		/* If the id has been issued assume fence has been retired */
+		return 0;
 	}
 
 	if (!timeout) {
 		/* no-wait: */
-		ret = fence_completed(fctx, fence) ? 0 : -EBUSY;
+		ret = fence_completed(f->ring, f->base.seqno) ? 0 : -EBUSY;
 	} else {
 		unsigned long remaining_jiffies = timeout_to_jiffies(timeout);
 
 		if (interruptible)
 			ret = wait_event_interruptible_timeout(fctx->event,
-				fence_completed(fctx, fence),
+				fence_completed(f->ring, f->base.seqno),
 				remaining_jiffies);
 		else
 			ret = wait_event_timeout(fctx->event,
-				fence_completed(fctx, fence),
+				fence_completed(f->ring, f->base.seqno),
 				remaining_jiffies);
 
 		if (ret == 0) {
 			DBG("timeout waiting for fence: %u (completed: %u)",
-					fence, fctx->completed_fence);
+				f->base.seqno, f->ring->completed_fence);
 			ret = -ETIMEDOUT;
 		} else if (ret != -ERESTARTSYS) {
 			ret = 0;
 		}
 	}
 
+	dma_fence_put(&f->base);
+
 	return ret;
 }
 
 /* called from workqueue */
-void msm_update_fence(struct msm_fence_context *fctx, uint32_t fence)
+void msm_update_fence(struct msm_fence_context *fctx,
+		struct msm_ringbuffer *ring, uint32_t fence)
 {
 	spin_lock(&fctx->spinlock);
-	fctx->completed_fence = max(fence, fctx->completed_fence);
+	ring->completed_fence = max(fence, ring->completed_fence);
 	spin_unlock(&fctx->spinlock);
 
 	wake_up_all(&fctx->event);
 }
 
-struct msm_fence {
-	struct msm_fence_context *fctx;
-	struct dma_fence base;
-};
 
 static inline struct msm_fence *to_msm_fence(struct dma_fence *fence)
 {
@@ -127,12 +157,17 @@ static bool msm_fence_enable_signaling(struct dma_fence *fence)
 static bool msm_fence_signaled(struct dma_fence *fence)
 {
 	struct msm_fence *f = to_msm_fence(fence);
-	return fence_completed(f->fctx, f->base.seqno);
+	return fence_completed(f->ring, f->base.seqno);
 }
 
 static void msm_fence_release(struct dma_fence *fence)
 {
 	struct msm_fence *f = to_msm_fence(fence);
+
+	spin_lock(&f->fctx->spinlock);
+	hash_del_rcu(&f->node);
+	spin_unlock(&f->fctx->spinlock);
+
 	kfree_rcu(f, base.rcu);
 }
 
@@ -146,7 +181,7 @@ static void msm_fence_release(struct dma_fence *fence)
 };
 
 struct dma_fence *
-msm_fence_alloc(struct msm_fence_context *fctx)
+msm_fence_alloc(struct msm_fence_context *fctx, struct msm_ringbuffer *ring)
 {
 	struct msm_fence *f;
 
@@ -155,9 +190,17 @@ struct dma_fence *
 		return ERR_PTR(-ENOMEM);
 
 	f->fctx = fctx;
+	f->ring = ring;
+
+	/* Make a user fence ID to pass back for the legacy functions */
+	f->fence_id = ++fctx->fence_id;
+
+	spin_lock(&fctx->spinlock);
+	hash_add(fctx->hash, &f->node, f->fence_id);
+	spin_unlock(&fctx->spinlock);
 
 	dma_fence_init(&f->base, &msm_fence_ops, &fctx->spinlock,
-		       fctx->context, ++fctx->last_fence);
+			fctx->context + ring->id, ++ring->last_fence);
 
 	return &f->base;
 }
diff --git a/drivers/gpu/drm/msm/msm_fence.h b/drivers/gpu/drm/msm/msm_fence.h
index 56061aa..540dc61 100644
--- a/drivers/gpu/drm/msm/msm_fence.h
+++ b/drivers/gpu/drm/msm/msm_fence.h
@@ -18,17 +18,18 @@
 #ifndef __MSM_FENCE_H__
 #define __MSM_FENCE_H__
 
+#include <linux/hashtable.h>
 #include "msm_drv.h"
+#include "msm_ringbuffer.h"
 
 struct msm_fence_context {
 	struct drm_device *dev;
 	const char *name;
 	unsigned context;
-	/* last_fence == completed_fence --> no pending work */
-	uint32_t last_fence;          /* last assigned fence */
-	uint32_t completed_fence;     /* last completed fence */
+	u32 fence_id;
 	wait_queue_head_t event;
 	spinlock_t spinlock;
+	DECLARE_HASHTABLE(hash, 4);
 };
 
 struct msm_fence_context * msm_fence_context_alloc(struct drm_device *dev,
@@ -39,8 +40,10 @@ int msm_wait_fence(struct msm_fence_context *fctx, uint32_t fence,
 		ktime_t *timeout, bool interruptible);
 int msm_queue_fence_cb(struct msm_fence_context *fctx,
 		struct msm_fence_cb *cb, uint32_t fence);
-void msm_update_fence(struct msm_fence_context *fctx, uint32_t fence);
+void msm_update_fence(struct msm_fence_context *fctx,
+		struct msm_ringbuffer *ring, uint32_t fence);
 
-struct dma_fence * msm_fence_alloc(struct msm_fence_context *fctx);
+struct dma_fence *msm_fence_alloc(struct msm_fence_context *fctx,
+		struct msm_ringbuffer *ring);
 
 #endif
diff --git a/drivers/gpu/drm/msm/msm_gem.h b/drivers/gpu/drm/msm/msm_gem.h
index 40cd0b6..633b5c1 100644
--- a/drivers/gpu/drm/msm/msm_gem.h
+++ b/drivers/gpu/drm/msm/msm_gem.h
@@ -120,6 +120,7 @@ struct msm_gem_submit {
 	struct dma_fence *fence;
 	struct pid *pid;    /* submitting process */
 	bool valid;         /* true if no cmdstream patching needed */
+	struct msm_ringbuffer *ring;
 	unsigned int nr_cmds;
 	unsigned int nr_bos;
 	struct {
diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c b/drivers/gpu/drm/msm/msm_gem_submit.c
index 8419680..d8021a0 100644
--- a/drivers/gpu/drm/msm/msm_gem_submit.c
+++ b/drivers/gpu/drm/msm/msm_gem_submit.c
@@ -390,7 +390,7 @@ int msm_ioctl_gem_submit(struct drm_device *dev, void *data,
 	struct sync_file *sync_file = NULL;
 	int out_fence_fd = -1;
 	unsigned i;
-	int ret;
+	int ret, ring;
 
 	if (!gpu)
 		return -ENXIO;
@@ -522,7 +522,13 @@ int msm_ioctl_gem_submit(struct drm_device *dev, void *data,
 
 	submit->nr_cmds = i;
 
-	submit->fence = msm_fence_alloc(gpu->fctx);
+	ring = clamp_t(uint32_t,
+		(args->flags & MSM_SUBMIT_RING_MASK) >> MSM_SUBMIT_RING_SHIFT,
+		0, gpu->nr_rings - 1);
+
+	submit->ring = gpu->rb[ring];
+
+	submit->fence = msm_fence_alloc(gpu->fctx, submit->ring);
 	if (IS_ERR(submit->fence)) {
 		ret = PTR_ERR(submit->fence);
 		submit->fence = NULL;
diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
index 050d994..c7969f5 100644
--- a/drivers/gpu/drm/msm/msm_gpu.c
+++ b/drivers/gpu/drm/msm/msm_gpu.c
@@ -274,15 +274,37 @@ static void recover_worker(struct work_struct *work)
 	struct msm_gpu *gpu = container_of(work, struct msm_gpu, recover_work);
 	struct drm_device *dev = gpu->dev;
 	struct msm_gem_submit *submit;
-	uint32_t fence = gpu->funcs->last_fence(gpu);
+	struct msm_ringbuffer *ring;
+	struct msm_ringbuffer *cur_ring = gpu->funcs->active_ring(gpu);
+	int i;
+
+	/* Update all the rings with the latest and greatest fence */
+	FOR_EACH_RING(gpu, ring, i) {
+		uint32_t fence = gpu->funcs->last_fence(gpu, ring);
 
-	msm_update_fence(gpu->fctx, fence + 1);
+		/*
+		 * For the current (faulting?) ring/submit advance the fence by
+		 * one more to clear the faulting submit
+		 */
+		if (ring == cur_ring)
+			fence = fence + 1;
+
+		msm_update_fence(gpu->fctx, ring, fence);
+	}
 
 	mutex_lock(&dev->struct_mutex);
 
 	dev_err(dev->dev, "%s: hangcheck recover!\n", gpu->name);
 	list_for_each_entry(submit, &gpu->submit_list, node) {
-		if (submit->fence->seqno == (fence + 1)) {
+		uint32_t fence;
+
+		/* Only consider submits for the current ring */
+		if (submit->ring != cur_ring)
+			continue;
+
+		fence = gpu->funcs->last_fence(gpu, cur_ring) + 1;
+
+		if (submit->fence->seqno == fence) {
 			struct task_struct *task;
 
 			rcu_read_lock();
@@ -303,10 +325,9 @@ static void recover_worker(struct work_struct *work)
 		inactive_cancel(gpu);
 		gpu->funcs->recover(gpu);
 
-		/* replay the remaining submits after the one that hung: */
-		list_for_each_entry(submit, &gpu->submit_list, node) {
+		/* replay the remaining submits for all rings: */
+		list_for_each_entry(submit, &gpu->submit_list, node)
 			gpu->funcs->submit(gpu, submit, NULL);
-		}
 	}
 
 	mutex_unlock(&dev->struct_mutex);
@@ -326,25 +347,27 @@ static void hangcheck_handler(unsigned long data)
 	struct msm_gpu *gpu = (struct msm_gpu *)data;
 	struct drm_device *dev = gpu->dev;
 	struct msm_drm_private *priv = dev->dev_private;
-	uint32_t fence = gpu->funcs->last_fence(gpu);
+	struct msm_ringbuffer *ring = gpu->funcs->active_ring(gpu);
+	uint32_t fence = gpu->funcs->last_fence(gpu, ring);
 
-	if (fence != gpu->hangcheck_fence) {
+	if (fence != gpu->hangcheck_fence[ring->id]) {
 		/* some progress has been made.. ya! */
-		gpu->hangcheck_fence = fence;
-	} else if (fence < gpu->fctx->last_fence) {
+		gpu->hangcheck_fence[ring->id] = fence;
+	} else if (fence < ring->last_fence) {
 		/* no progress and not done.. hung! */
-		gpu->hangcheck_fence = fence;
-		dev_err(dev->dev, "%s: hangcheck detected gpu lockup!\n",
-				gpu->name);
+		gpu->hangcheck_fence[ring->id] = fence;
+		dev_err(dev->dev, "%s: hangcheck detected gpu lockup rb %d!\n",
+				gpu->name, ring->id);
 		dev_err(dev->dev, "%s:     completed fence: %u\n",
 				gpu->name, fence);
 		dev_err(dev->dev, "%s:     submitted fence: %u\n",
-				gpu->name, gpu->fctx->last_fence);
+				gpu->name, ring->last_fence);
+
 		queue_work(priv->wq, &gpu->recover_work);
 	}
 
 	/* if still more pending work, reset the hangcheck timer: */
-	if (gpu->fctx->last_fence > gpu->hangcheck_fence)
+	if (ring->last_fence > gpu->hangcheck_fence[ring->id])
 		hangcheck_timer_reset(gpu);
 
 	/* workaround for missing irq: */
@@ -468,20 +491,13 @@ static void retire_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
 static void retire_submits(struct msm_gpu *gpu)
 {
 	struct drm_device *dev = gpu->dev;
+	struct msm_gem_submit *submit, *tmp;
 
 	WARN_ON(!mutex_is_locked(&dev->struct_mutex));
 
-	while (!list_empty(&gpu->submit_list)) {
-		struct msm_gem_submit *submit;
-
-		submit = list_first_entry(&gpu->submit_list,
-				struct msm_gem_submit, node);
-
-		if (dma_fence_is_signaled(submit->fence)) {
+	list_for_each_entry_safe(submit, tmp, &gpu->submit_list, node) {
+		if (dma_fence_is_signaled(submit->fence))
 			retire_submit(gpu, submit);
-		} else {
-			break;
-		}
 	}
 }
 
@@ -489,9 +505,12 @@ static void retire_worker(struct work_struct *work)
 {
 	struct msm_gpu *gpu = container_of(work, struct msm_gpu, retire_work);
 	struct drm_device *dev = gpu->dev;
-	uint32_t fence = gpu->funcs->last_fence(gpu);
+	struct msm_ringbuffer *ring;
+	int i;
 
-	msm_update_fence(gpu->fctx, fence);
+	FOR_EACH_RING(gpu, ring, i)
+		msm_update_fence(gpu->fctx, ring,
+			gpu->funcs->last_fence(gpu, ring));
 
 	mutex_lock(&dev->struct_mutex);
 	retire_submits(gpu);
@@ -572,7 +591,7 @@ int msm_gpu_init(struct drm_device *drm, struct platform_device *pdev,
 		const char *name, struct msm_gpu_config *config)
 {
 	struct iommu_domain *iommu;
-	int i, ret;
+	int i, ret, nr_rings = config->nr_rings;
 
 	if (WARN_ON(gpu->num_perfcntrs > ARRAY_SIZE(gpu->last_cntrs)))
 		gpu->num_perfcntrs = ARRAY_SIZE(gpu->last_cntrs);
@@ -674,37 +693,59 @@ int msm_gpu_init(struct drm_device *drm, struct platform_device *pdev,
 		dev_info(drm->dev, "%s: no IOMMU, fallback to VRAM carveout!\n", name);
 	}
 
-	/* Create ringbuffer: */
-	mutex_lock(&drm->struct_mutex);
-	gpu->rb = msm_ringbuffer_new(gpu, config->ringsz);
-	mutex_unlock(&drm->struct_mutex);
-	if (IS_ERR(gpu->rb)) {
-		ret = PTR_ERR(gpu->rb);
-		gpu->rb = NULL;
-		dev_err(drm->dev, "could not create ringbuffer: %d\n", ret);
-		goto fail;
+	if (nr_rings > ARRAY_SIZE(gpu->rb)) {
+		DRM_DEV_INFO_ONCE(drm->dev, "Only creating %zu ringbuffers\n",
+			ARRAY_SIZE(gpu->rb));
+		nr_rings = ARRAY_SIZE(gpu->rb);
+	}
+
+	/* Create ringbuffer(s): */
+	for (i = 0; i < nr_rings; i++) {
+		mutex_lock(&drm->struct_mutex);
+		gpu->rb[i] = msm_ringbuffer_new(gpu, i);
+		mutex_unlock(&drm->struct_mutex);
+
+		if (IS_ERR(gpu->rb[i])) {
+			ret = PTR_ERR(gpu->rb[i]);
+			gpu->rb[i] = NULL;
+			dev_err(drm->dev,
+				"could not create ringbuffer %d: %d\n", i, ret);
+			goto fail;
+		}
 	}
 
+	gpu->nr_rings = nr_rings;
 	bs_init(gpu);
 
 	return 0;
 
 fail:
+	for (i = 0; i < ARRAY_SIZE(gpu->rb); i++) {
+		if (gpu->rb[i])
+			msm_ringbuffer_destroy(gpu->rb[i]);
+	}
+
 	return ret;
 }
 
 void msm_gpu_cleanup(struct msm_gpu *gpu)
 {
+	int i;
+
 	DBG("%s", gpu->name);
 
 	WARN_ON(!list_empty(&gpu->active_list));
 
 	bs_fini(gpu);
 
-	if (gpu->rb) {
-		if (gpu->rb_iova)
-			msm_gem_put_iova(gpu->rb->bo, gpu->aspace);
-		msm_ringbuffer_destroy(gpu->rb);
+	for (i = 0; i < ARRAY_SIZE(gpu->rb); i++) {
+		if (!gpu->rb[i])
+			continue;
+
+		if (gpu->rb[i]->iova)
+			msm_gem_put_iova(gpu->rb[i]->bo, gpu->aspace);
+
+		msm_ringbuffer_destroy(gpu->rb[i]);
 	}
 
 	if (gpu->fctx)
diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
index cc6530f..38d826a 100644
--- a/drivers/gpu/drm/msm/msm_gpu.h
+++ b/drivers/gpu/drm/msm/msm_gpu.h
@@ -33,7 +33,7 @@ struct msm_gpu_config {
 	const char *irqname;
 	uint64_t va_start;
 	uint64_t va_end;
-	unsigned int ringsz;
+	unsigned int nr_rings;
 };
 
 /* So far, with hardware that I've seen to date, we can have:
@@ -57,9 +57,11 @@ struct msm_gpu_funcs {
 	int (*pm_resume)(struct msm_gpu *gpu);
 	void (*submit)(struct msm_gpu *gpu, struct msm_gem_submit *submit,
 			struct msm_file_private *ctx);
-	void (*flush)(struct msm_gpu *gpu);
+	void (*flush)(struct msm_gpu *gpu, struct msm_ringbuffer *ring);
 	irqreturn_t (*irq)(struct msm_gpu *irq);
-	uint32_t (*last_fence)(struct msm_gpu *gpu);
+	uint32_t (*last_fence)(struct msm_gpu *gpu,
+			struct msm_ringbuffer *ring);
+	struct msm_ringbuffer *(*active_ring)(struct msm_gpu *gpu);
 	void (*recover)(struct msm_gpu *gpu);
 	void (*destroy)(struct msm_gpu *gpu);
 #ifdef CONFIG_DEBUG_FS
@@ -85,9 +87,8 @@ struct msm_gpu {
 	const struct msm_gpu_perfcntr *perfcntrs;
 	uint32_t num_perfcntrs;
 
-	/* ringbuffer: */
-	struct msm_ringbuffer *rb;
-	uint64_t rb_iova;
+	struct msm_ringbuffer *rb[MSM_GPU_MAX_RINGS];
+	int nr_rings;
 
 	/* list of GEM active objects: */
 	struct list_head active_list;
@@ -126,15 +127,36 @@ struct msm_gpu {
 #define DRM_MSM_HANGCHECK_PERIOD 500 /* in ms */
 #define DRM_MSM_HANGCHECK_JIFFIES msecs_to_jiffies(DRM_MSM_HANGCHECK_PERIOD)
 	struct timer_list hangcheck_timer;
-	uint32_t hangcheck_fence;
+	uint32_t hangcheck_fence[MSM_GPU_MAX_RINGS];
 	struct work_struct recover_work;
 
 	struct list_head submit_list;
 };
 
+/* It turns out that all targets use the same ringbuffer size */
+#define MSM_GPU_RINGBUFFER_SZ SZ_32K
+
+static inline struct msm_ringbuffer *__get_ring(struct msm_gpu *gpu, int index)
+{
+	return (index < ARRAY_SIZE(gpu->rb) ? gpu->rb[index] : NULL);
+}
+
+#define FOR_EACH_RING(gpu, ring, index) \
+	for (index = 0, ring = (gpu)->rb[0]; \
+		index < (gpu)->nr_rings && index < ARRAY_SIZE((gpu)->rb); \
+		index++, ring = __get_ring(gpu, index))
+
 static inline bool msm_gpu_active(struct msm_gpu *gpu)
 {
-	return gpu->fctx->last_fence > gpu->funcs->last_fence(gpu);
+	struct msm_ringbuffer *ring;
+	int i;
+
+	FOR_EACH_RING(gpu, ring, i) {
+		if (ring->last_fence > gpu->funcs->last_fence(gpu, ring))
+			return true;
+	}
+
+	return false;
 }
 
 /* Perf-Counters:
diff --git a/drivers/gpu/drm/msm/msm_ringbuffer.c b/drivers/gpu/drm/msm/msm_ringbuffer.c
index 67b34e0..2ab31c7 100644
--- a/drivers/gpu/drm/msm/msm_ringbuffer.c
+++ b/drivers/gpu/drm/msm/msm_ringbuffer.c
@@ -18,13 +18,13 @@
 #include "msm_ringbuffer.h"
 #include "msm_gpu.h"
 
-struct msm_ringbuffer *msm_ringbuffer_new(struct msm_gpu *gpu, int size)
+struct msm_ringbuffer *msm_ringbuffer_new(struct msm_gpu *gpu, int id)
 {
 	struct msm_ringbuffer *ring;
 	int ret;
 
-	if (WARN_ON(!is_power_of_2(size)))
-		return ERR_PTR(-EINVAL);
+	/* We assume everywhere that MSM_GPU_RINGBUFFER_SZ is a power of 2 */
+	BUILD_BUG_ON(!is_power_of_2(MSM_GPU_RINGBUFFER_SZ));
 
 	ring = kzalloc(sizeof(*ring), GFP_KERNEL);
 	if (!ring) {
@@ -33,7 +33,8 @@ struct msm_ringbuffer *msm_ringbuffer_new(struct msm_gpu *gpu, int size)
 	}
 
 	ring->gpu = gpu;
-	ring->bo = msm_gem_new(gpu->dev, size, MSM_BO_WC);
+	ring->id = id;
+	ring->bo = msm_gem_new(gpu->dev, MSM_GPU_RINGBUFFER_SZ, MSM_BO_WC);
 	if (IS_ERR(ring->bo)) {
 		ret = PTR_ERR(ring->bo);
 		ring->bo = NULL;
@@ -45,11 +46,9 @@ struct msm_ringbuffer *msm_ringbuffer_new(struct msm_gpu *gpu, int size)
 		ret = PTR_ERR(ring->start);
 		goto fail;
 	}
-	ring->end   = ring->start + (size / 4);
+	ring->end   = ring->start + (MSM_GPU_RINGBUFFER_SZ >> 2);
 	ring->cur   = ring->start;
 
-	ring->size = size;
-
 	return ring;
 
 fail:
diff --git a/drivers/gpu/drm/msm/msm_ringbuffer.h b/drivers/gpu/drm/msm/msm_ringbuffer.h
index 6e0e104..4eb05fe 100644
--- a/drivers/gpu/drm/msm/msm_ringbuffer.h
+++ b/drivers/gpu/drm/msm/msm_ringbuffer.h
@@ -22,12 +22,16 @@
 
 struct msm_ringbuffer {
 	struct msm_gpu *gpu;
-	int size;
+	int id;
 	struct drm_gem_object *bo;
 	uint32_t *start, *end, *cur;
+	uint64_t iova;
+	/* last_fence == completed_fence --> no pending work */
+	uint32_t last_fence;
+	uint32_t completed_fence;
 };
 
-struct msm_ringbuffer *msm_ringbuffer_new(struct msm_gpu *gpu, int size);
+struct msm_ringbuffer *msm_ringbuffer_new(struct msm_gpu *gpu, int id);
 void msm_ringbuffer_destroy(struct msm_ringbuffer *ring);
 
 /* ringbuffer helpers (the parts that are same for a3xx/a2xx/z180..) */
diff --git a/include/uapi/drm/msm_drm.h b/include/uapi/drm/msm_drm.h
index 05dc5b3..915634b 100644
--- a/include/uapi/drm/msm_drm.h
+++ b/include/uapi/drm/msm_drm.h
@@ -199,10 +199,15 @@ struct drm_msm_gem_submit_bo {
 #define MSM_SUBMIT_NO_IMPLICIT   0x80000000 /* disable implicit sync */
 #define MSM_SUBMIT_FENCE_FD_IN   0x40000000 /* enable input fence_fd */
 #define MSM_SUBMIT_FENCE_FD_OUT  0x20000000 /* enable output fence_fd */
+
+#define MSM_SUBMIT_RING_MASK 0x000F0000
+#define MSM_SUBMIT_RING_SHIFT 16
+
 #define MSM_SUBMIT_FLAGS                ( \
 		MSM_SUBMIT_NO_IMPLICIT   | \
 		MSM_SUBMIT_FENCE_FD_IN   | \
 		MSM_SUBMIT_FENCE_FD_OUT  | \
+		MSM_SUBMIT_RING_MASK	 | \
 		0)
 
 /* Each cmdstream submit consists of a table of buffers involved, and
-- 
1.9.1

_______________________________________________
Freedreno mailing list
Freedreno@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/freedreno


* [PATCH 09/11] drm/msm: Shadow current pointer in the ring until command is complete
       [not found]     ` <1488905900-6603-1-git-send-email-jcrouse-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
                         ` (6 preceding siblings ...)
  2017-03-07 16:58       ` [PATCH 08/11] drm/msm: Support multiple ringbuffers Jordan Crouse
@ 2017-03-07 16:58       ` Jordan Crouse
  2017-03-07 16:58       ` [PATCH 10/11] drm/msm: Make the value of RB_CNTL (almost) generic Jordan Crouse
  2017-03-07 16:58       ` [PATCH 11/11] drm/msm: Implement preemption for A5XX targets Jordan Crouse
  9 siblings, 0 replies; 36+ messages in thread
From: Jordan Crouse @ 2017-03-07 16:58 UTC (permalink / raw)
  To: freedreno-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: linux-arm-msm-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Add a shadow pointer to track the current command being written into
the ring, and don't commit it as 'cur' until the command is submitted.
Because 'cur' is used to construct the software copy of the wptr, this
ensures that somebody peeking in on the ring doesn't assume that a
command is in flight while it is still being written. This isn't a huge
deal with a single ring (though technically the hangcheck could assume
the system is prematurely busy when it isn't), but it will be rather
important for preemption, where the decision to preempt is based on a
non-empty ringbuffer. Without a shadow, an aggressive preemption scheme
could assume that the ringbuffer is non-empty and switch to it before
the CPU is done writing the command.

Even though preemption won't be supported for all targets, the way the
code is organized makes it simpler to keep this generic for all of
them. The extra load for non-preemption targets should be minimal.
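
A minimal sketch of the idea, using the field names from
msm_ringbuffer.h (the helper names here are hypothetical - the real
work happens in OUT_RING() and the flush functions below): writers
stage words through 'next', and observers only ever read 'cur', which
is published at flush time.

	/* Hypothetical helper: stage a word - not yet visible via 'cur' */
	static inline void ring_emit(struct msm_ringbuffer *ring, uint32_t data)
	{
		if (ring->next == ring->end)
			ring->next = ring->start;
		*(ring->next++) = data;
	}

	/*
	 * Hypothetical helper: publish the staged words at flush time so
	 * hangcheck and preemption can see them
	 */
	static inline void ring_publish(struct msm_ringbuffer *ring)
	{
		ring->cur = ring->next;
	}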

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
---
 drivers/gpu/drm/msm/adreno/adreno_gpu.c |  9 +++++++--
 drivers/gpu/drm/msm/msm_ringbuffer.c    |  1 +
 drivers/gpu/drm/msm/msm_ringbuffer.h    | 12 ++++++++----
 3 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
index 21c839f..b8c11a0 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
@@ -144,6 +144,7 @@ void adreno_recover(struct msm_gpu *gpu)
 			continue;
 
 		ring->cur = ring->start;
+		ring->next = ring->start;
 
 		/* reset completed fence seqno, discard anything pending: */
 		adreno_gpu->memptrs->fence[ring->id] = ring->completed_fence;
@@ -240,12 +241,15 @@ void adreno_flush(struct msm_gpu *gpu, struct msm_ringbuffer *ring)
 	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
 	uint32_t wptr;
 
+	/* Copy the shadow to the actual register */
+	ring->cur = ring->next;
+
 	/*
 	 * Mask wptr value that we calculate to fit in the HW range. This is
 	 * to account for the possibility that the last command fit exactly into
 	 * the ringbuffer and rb->next hasn't wrapped to zero yet
 	 */
-	wptr = get_wptr(ring) % (MSM_GPU_RINGBUFFER_SZ >> 2);
+	wptr = (ring->cur - ring->start) % (MSM_GPU_RINGBUFFER_SZ >> 2);
 
 	/* ensure writes to ringbuffer have hit system memory: */
 	mb();
@@ -366,7 +370,8 @@ static uint32_t ring_freewords(struct msm_ringbuffer *ring)
 {
 	struct adreno_gpu *adreno_gpu = to_adreno_gpu(ring->gpu);
 	uint32_t size = MSM_GPU_RINGBUFFER_SZ >> 2;
-	uint32_t wptr = get_wptr(ring);
+	/* Use ring->next to calculate free size */
+	uint32_t wptr = ring->next - ring->start;
 	uint32_t rptr = get_rptr(adreno_gpu, ring);
 	return (rptr + (size - 1) - wptr) % size;
 }
diff --git a/drivers/gpu/drm/msm/msm_ringbuffer.c b/drivers/gpu/drm/msm/msm_ringbuffer.c
index 2ab31c7..b885979 100644
--- a/drivers/gpu/drm/msm/msm_ringbuffer.c
+++ b/drivers/gpu/drm/msm/msm_ringbuffer.c
@@ -47,6 +47,7 @@ struct msm_ringbuffer *msm_ringbuffer_new(struct msm_gpu *gpu, int id)
 		goto fail;
 	}
 	ring->end   = ring->start + (MSM_GPU_RINGBUFFER_SZ >> 2);
+	ring->next  = ring->start;
 	ring->cur   = ring->start;
 
 	return ring;
diff --git a/drivers/gpu/drm/msm/msm_ringbuffer.h b/drivers/gpu/drm/msm/msm_ringbuffer.h
index 4eb05fe..865b21a 100644
--- a/drivers/gpu/drm/msm/msm_ringbuffer.h
+++ b/drivers/gpu/drm/msm/msm_ringbuffer.h
@@ -24,7 +24,7 @@ struct msm_ringbuffer {
 	struct msm_gpu *gpu;
 	int id;
 	struct drm_gem_object *bo;
-	uint32_t *start, *end, *cur;
+	uint32_t *start, *end, *cur, *next;
 	uint64_t iova;
 	/* last_fence == completed_fence --> no pending work */
 	uint32_t last_fence;
@@ -39,9 +39,13 @@ struct msm_ringbuffer {
 static inline void
 OUT_RING(struct msm_ringbuffer *ring, uint32_t data)
 {
-	if (ring->cur == ring->end)
-		ring->cur = ring->start;
-	*(ring->cur++) = data;
+	/*
+	 * ring->next points to the current command being written - it won't be
+	 * committed as ring->cur until the flush
+	 */
+	if (ring->next == ring->end)
+		ring->next = ring->start;
+	*(ring->next++) = data;
 }
 
 #endif /* __MSM_RINGBUFFER_H__ */
-- 
1.9.1

_______________________________________________
Freedreno mailing list
Freedreno@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/freedreno


* [PATCH 10/11] drm/msm: Make the value of RB_CNTL (almost) generic
       [not found]     ` <1488905900-6603-1-git-send-email-jcrouse-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
                         ` (7 preceding siblings ...)
  2017-03-07 16:58       ` [PATCH 09/11] drm/msm: Shadow current pointer in the ring until command is complete Jordan Crouse
@ 2017-03-07 16:58       ` Jordan Crouse
  2017-03-07 16:58       ` [PATCH 11/11] drm/msm: Implement preemption for A5XX targets Jordan Crouse
  9 siblings, 0 replies; 36+ messages in thread
From: Jordan Crouse @ 2017-03-07 16:58 UTC (permalink / raw)
  To: freedreno-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: linux-arm-msm-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

We use a global ringbuffer size and block size for all targets, and at
least for 5XX preemption we need to know the value of RB_CNTL in
several locations, so it makes sense to calculate it once and use it
everywhere.

The only monkey wrench is that we need to disable the RPTR shadow for
A430 targets, but that only needs to be done once and doesn't affect
A5XX, so we can OR in the value at init time.
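
For reference, with the sizes used in this series (a 32K ringbuffer and
32-byte blocks) the shared value reduces to fixed constants - a sketch
of the arithmetic, not code from the patch:

	/* BUFSZ is log2(quad-words):       ilog2(SZ_32K / 8) = ilog2(4096) = 12 */
	/* BLKSZ is log2(quad-word blocks): ilog2(32 / 8)     = ilog2(4)    = 2  */
	#define MSM_GPU_RB_CNTL_DEFAULT \
		(AXXX_CP_RB_CNTL_BUFSZ(12) | AXXX_CP_RB_CNTL_BLKSZ(2))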

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
---
 drivers/gpu/drm/msm/adreno/adreno_gpu.c | 12 +++++++-----
 drivers/gpu/drm/msm/msm_gpu.h           |  5 +++++
 2 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
index b8c11a0..8c23d92 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
@@ -21,7 +21,6 @@
 #include "msm_gem.h"
 #include "msm_mmu.h"
 
-#define RB_BLKSIZE 32
 
 int adreno_get_param(struct msm_gpu *gpu, uint32_t param, uint64_t *value)
 {
@@ -71,11 +70,14 @@ int adreno_hw_init(struct msm_gpu *gpu)
 		}
 	}
 
-	/* Setup REG_CP_RB_CNTL: */
+	/*
+	 * Setup REG_CP_RB_CNTL.  The same value is used across targets (with
+	 * the exception of A430 that disables the RPTR shadow) - the calculation
+	 * for the ringbuffer size and block size is moved to msm_gpu.h for the
+	 * pre-processor to deal with and the A430 variant is ORed in here
+	 */
 	adreno_gpu_write(adreno_gpu, REG_ADRENO_CP_RB_CNTL,
-		/* size is log2(quad-words): */
-		AXXX_CP_RB_CNTL_BUFSZ(ilog2(MSM_GPU_RINGBUFFER_SZ / 8)) |
-		AXXX_CP_RB_CNTL_BLKSZ(ilog2(RB_BLKSIZE / 8)) |
+		MSM_GPU_RB_CNTL_DEFAULT |
 		(adreno_is_a430(adreno_gpu) ? AXXX_CP_RB_CNTL_NO_UPDATE : 0));
 
 	/* Setup ringbuffer address - use ringbuffer[0] for GPU init */
diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
index 38d826a..50fef27 100644
--- a/drivers/gpu/drm/msm/msm_gpu.h
+++ b/drivers/gpu/drm/msm/msm_gpu.h
@@ -135,6 +135,11 @@ struct msm_gpu {
 
 /* It turns out that all targets use the same ringbuffer size */
 #define MSM_GPU_RINGBUFFER_SZ SZ_32K
+#define MSM_GPU_RINGBUFFER_BLKSIZE 32
+
+#define MSM_GPU_RB_CNTL_DEFAULT \
+		(AXXX_CP_RB_CNTL_BUFSZ(ilog2(MSM_GPU_RINGBUFFER_SZ / 8)) | \
+		AXXX_CP_RB_CNTL_BLKSZ(ilog2(MSM_GPU_RINGBUFFER_BLKSIZE / 8)))
 
 static inline struct msm_ringbuffer *__get_ring(struct msm_gpu *gpu, int index)
 {
-- 
1.9.1

_______________________________________________
Freedreno mailing list
Freedreno@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/freedreno


* [PATCH 11/11] drm/msm: Implement preemption for A5XX targets
       [not found]     ` <1488905900-6603-1-git-send-email-jcrouse-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
                         ` (8 preceding siblings ...)
  2017-03-07 16:58       ` [PATCH 10/11] drm/msm: Make the value of RB_CNTL (almost) generic Jordan Crouse
@ 2017-03-07 16:58       ` Jordan Crouse
  9 siblings, 0 replies; 36+ messages in thread
From: Jordan Crouse @ 2017-03-07 16:58 UTC (permalink / raw)
  To: freedreno-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: linux-arm-msm-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Implement preemption for A5XX targets - this allows multiple
ringbuffers for different priorities, with automatic preemption
of a lower-priority ringbuffer when a higher-priority one is ready.
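
For context, userspace picks the target ringbuffer (and therefore the
priority) with the MSM_SUBMIT_RING_* flag bits added earlier in the
series. A hypothetical snippet, assuming a populated struct
drm_msm_gem_submit 'req' and an open drm fd:

	/* Hypothetical: request ring 2; the kernel clamps to nr_rings - 1 */
	req.flags |= (2 << MSM_SUBMIT_RING_SHIFT) & MSM_SUBMIT_RING_MASK;
	ret = drmCommandWriteRead(fd, DRM_MSM_GEM_SUBMIT, &req, sizeof(req));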

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
---
 drivers/gpu/drm/msm/Makefile              |   1 +
 drivers/gpu/drm/msm/adreno/a5xx_gpu.c     | 176 ++++++++++++++-
 drivers/gpu/drm/msm/adreno/a5xx_gpu.h     | 102 ++++++++-
 drivers/gpu/drm/msm/adreno/a5xx_preempt.c | 345 ++++++++++++++++++++++++++++++
 drivers/gpu/drm/msm/adreno/adreno_gpu.c   |  15 +-
 drivers/gpu/drm/msm/adreno/adreno_gpu.h   |   7 +-
 drivers/gpu/drm/msm/msm_drv.h             |   2 +-
 drivers/gpu/drm/msm/msm_ringbuffer.c      |   2 +
 drivers/gpu/drm/msm/msm_ringbuffer.h      |   1 +
 9 files changed, 634 insertions(+), 17 deletions(-)
 create mode 100644 drivers/gpu/drm/msm/adreno/a5xx_preempt.c

diff --git a/drivers/gpu/drm/msm/Makefile b/drivers/gpu/drm/msm/Makefile
index 3905536..a1d808b 100644
--- a/drivers/gpu/drm/msm/Makefile
+++ b/drivers/gpu/drm/msm/Makefile
@@ -8,6 +8,7 @@ msm-y := \
 	adreno/a4xx_gpu.o \
 	adreno/a5xx_gpu.o \
 	adreno/a5xx_power.o \
+	adreno/a5xx_preempt.o \
 	hdmi/hdmi.o \
 	hdmi/hdmi_audio.o \
 	hdmi/hdmi_bridge.o \
diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
index 4ad98b9..fef1541 100644
--- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
@@ -184,14 +184,66 @@ static int zap_load_mdt(struct platform_device *pdev)
 	return ret;
 }
 
+static void a5xx_flush(struct msm_gpu *gpu, struct msm_ringbuffer *ring)
+{
+	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
+	struct a5xx_gpu *a5xx_gpu = to_a5xx_gpu(adreno_gpu);
+	uint32_t wptr;
+	unsigned long flags;
+
+	spin_lock_irqsave(&ring->lock, flags);
+
+	/* Copy the shadow to the actual register */
+	ring->cur = ring->next;
+
+	/* Make sure to wrap wptr if we need to */
+	wptr = get_wptr(ring);
+
+	spin_unlock_irqrestore(&ring->lock, flags);
+
+	/* Make sure everything is posted before making a decision */
+	mb();
+
+	/* Update HW if this is the current ring and we are not in preempt */
+	if (a5xx_gpu->cur_ring == ring && !a5xx_in_preempt(a5xx_gpu))
+		gpu_write(gpu, REG_A5XX_CP_RB_WPTR, wptr);
+}
+
 static void a5xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit,
 	struct msm_file_private *ctx)
 {
 	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
+	struct a5xx_gpu *a5xx_gpu = to_a5xx_gpu(adreno_gpu);
+
 	struct msm_drm_private *priv = gpu->dev->dev_private;
 	struct msm_ringbuffer *ring = submit->ring;
 	unsigned int i, ibs = 0;
 
+	OUT_PKT7(ring, CP_PREEMPT_ENABLE_GLOBAL, 1);
+	OUT_RING(ring, 0x02);
+
+	/* Turn off protected mode to write to special registers */
+	OUT_PKT7(ring, CP_SET_PROTECTED_MODE, 1);
+	OUT_RING(ring, 0);
+
+	/* Set the save preemption record for the ring/command */
+	OUT_PKT4(ring, REG_A5XX_CP_CONTEXT_SWITCH_SAVE_ADDR_LO, 2);
+	OUT_RING(ring, lower_32_bits(a5xx_gpu->preempt_iova[submit->ring->id]));
+	OUT_RING(ring, upper_32_bits(a5xx_gpu->preempt_iova[submit->ring->id]));
+
+	/* Turn back on protected mode */
+	OUT_PKT7(ring, CP_SET_PROTECTED_MODE, 1);
+	OUT_RING(ring, 1);
+
+	/* Enable local preemption for finegrain preemption */
+	OUT_PKT7(ring, CP_PREEMPT_ENABLE_LOCAL, 1);
+	OUT_RING(ring, 0x1);
+
+	/* Allow CP_CONTEXT_SWITCH_YIELD packets in the IB2 */
+	OUT_PKT7(ring, CP_YIELD_ENABLE, 1);
+	OUT_RING(ring, 0x02);
+
+	/* Submit the commands */
 	for (i = 0; i < submit->nr_cmds; i++) {
 		switch (submit->cmd[i].type) {
 		case MSM_SUBMIT_CMD_IB_TARGET_BUF:
@@ -209,16 +261,54 @@ static void a5xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit,
 		}
 	}
 
+	/*
+	 * Write the render mode to NULL (0) to indicate to the CP that the IBs
+	 * are done rendering - otherwise a lucky preemption would start
+	 * replaying from the last checkpoint
+	 */
+	OUT_PKT7(ring, CP_SET_RENDER_MODE, 5);
+	OUT_RING(ring, 0);
+	OUT_RING(ring, 0);
+	OUT_RING(ring, 0);
+	OUT_RING(ring, 0);
+	OUT_RING(ring, 0);
+
+	/* Turn off IB level preemptions */
+	OUT_PKT7(ring, CP_YIELD_ENABLE, 1);
+	OUT_RING(ring, 0x01);
+
+	/* Write the fence to the scratch register */
 	OUT_PKT4(ring, REG_A5XX_CP_SCRATCH_REG(2), 1);
 	OUT_RING(ring, submit->fence->seqno);
 
+	/*
+	 * Execute a CACHE_FLUSH_TS event. This will ensure that the
+	 * timestamp is written to memory and then trigger the interrupt
+	 */
 	OUT_PKT7(ring, CP_EVENT_WRITE, 4);
 	OUT_RING(ring, CACHE_FLUSH_TS | (1 << 31));
 	OUT_RING(ring, lower_32_bits(rbmemptr(adreno_gpu, ring->id, fence)));
 	OUT_RING(ring, upper_32_bits(rbmemptr(adreno_gpu, ring->id, fence)));
 	OUT_RING(ring, submit->fence->seqno);
 
-	gpu->funcs->flush(gpu, ring);
+	/* Yield the floor on command completion */
+	OUT_PKT7(ring, CP_CONTEXT_SWITCH_YIELD, 4);
+	/*
+	 * If dword[2:1] are non zero, they specify an address for the CP to
+	 * write the value of dword[3] to on preemption complete. Write 0 to
+	 * skip the write
+	 */
+	OUT_RING(ring, 0x00);
+	OUT_RING(ring, 0x00);
+	/* Data value - not used if the address above is 0 */
+	OUT_RING(ring, 0x01);
+	/* Set bit 0 to trigger an interrupt on preempt complete */
+	OUT_RING(ring, 0x01);
+
+	a5xx_flush(gpu, ring);
+
+	/* Check to see if we need to start preemption */
+	a5xx_preempt_trigger(gpu);
 }
 
 struct a5xx_hwcg {
@@ -393,6 +483,50 @@ static int a5xx_me_init(struct msm_gpu *gpu)
 	return a5xx_idle(gpu, ring) ? 0 : -EINVAL;
 }
 
+static int a5xx_preempt_start(struct msm_gpu *gpu)
+{
+	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
+	struct a5xx_gpu *a5xx_gpu = to_a5xx_gpu(adreno_gpu);
+	struct msm_ringbuffer *ring = gpu->rb[0];
+
+	if (gpu->nr_rings == 1)
+		return 0;
+
+	/* Turn off protected mode to write to special registers */
+	OUT_PKT7(ring, CP_SET_PROTECTED_MODE, 1);
+	OUT_RING(ring, 0);
+
+	/* Set the save preemption record for the ring/command */
+	OUT_PKT4(ring, REG_A5XX_CP_CONTEXT_SWITCH_SAVE_ADDR_LO, 2);
+	OUT_RING(ring, lower_32_bits(a5xx_gpu->preempt_iova[ring->id]));
+	OUT_RING(ring, upper_32_bits(a5xx_gpu->preempt_iova[ring->id]));
+
+	/* Turn back on protected mode */
+	OUT_PKT7(ring, CP_SET_PROTECTED_MODE, 1);
+	OUT_RING(ring, 1);
+
+	OUT_PKT7(ring, CP_PREEMPT_ENABLE_GLOBAL, 1);
+	OUT_RING(ring, 0x00);
+
+	OUT_PKT7(ring, CP_PREEMPT_ENABLE_LOCAL, 1);
+	OUT_RING(ring, 0x01);
+
+	OUT_PKT7(ring, CP_YIELD_ENABLE, 1);
+	OUT_RING(ring, 0x01);
+
+	/* Yield the floor on command completion */
+	OUT_PKT7(ring, CP_CONTEXT_SWITCH_YIELD, 4);
+	OUT_RING(ring, 0x00);
+	OUT_RING(ring, 0x00);
+	OUT_RING(ring, 0x01);
+	OUT_RING(ring, 0x01);
+
+	gpu->funcs->flush(gpu, ring);
+
+	return a5xx_idle(gpu, ring) ? 0 : -EINVAL;
+}
+
+
 static struct drm_gem_object *a5xx_ucode_load_bo(struct msm_gpu *gpu,
 		const struct firmware *fw, u64 *iova)
 {
@@ -525,6 +659,7 @@ static int a5xx_zap_shader_init(struct msm_gpu *gpu)
 	  A5XX_RBBM_INT_0_MASK_RBBM_ETS_MS_TIMEOUT | \
 	  A5XX_RBBM_INT_0_MASK_RBBM_ATB_ASYNC_OVERFLOW | \
 	  A5XX_RBBM_INT_0_MASK_CP_HW_ERROR | \
+	  A5XX_RBBM_INT_0_MASK_CP_SW | \
 	  A5XX_RBBM_INT_0_MASK_CP_CACHE_FLUSH_TS | \
 	  A5XX_RBBM_INT_0_MASK_UCHE_OOB_ACCESS | \
 	  A5XX_RBBM_INT_0_MASK_GPMU_VOLTAGE_DROOP)
@@ -672,6 +807,8 @@ static int a5xx_hw_init(struct msm_gpu *gpu)
 	if (ret)
 		return ret;
 
+	a5xx_preempt_hw_init(gpu);
+
 	ret = a5xx_ucode_init(gpu);
 	if (ret)
 		return ret;
@@ -724,6 +861,9 @@ static int a5xx_hw_init(struct msm_gpu *gpu)
 		gpu_write(gpu, REG_A5XX_RBBM_SECVID_TRUST_CNTL, 0x0);
 	}
 
+	/* Last step - yield the ringbuffer */
+	a5xx_preempt_start(gpu);
+
 	return 0;
 }
 
@@ -754,6 +894,8 @@ static void a5xx_destroy(struct msm_gpu *gpu)
 
 	DBG("%s", gpu->name);
 
+	a5xx_preempt_fini(gpu);
+
 	if (a5xx_gpu->pm4_bo) {
 		if (a5xx_gpu->pm4_iova)
 			msm_gem_put_iova(a5xx_gpu->pm4_bo, gpu->aspace);
@@ -791,6 +933,14 @@ static inline bool _a5xx_check_idle(struct msm_gpu *gpu)
 
 bool a5xx_idle(struct msm_gpu *gpu, struct msm_ringbuffer *ring)
 {
+	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
+	struct a5xx_gpu *a5xx_gpu = to_a5xx_gpu(adreno_gpu);
+
+	if (ring != a5xx_gpu->cur_ring) {
+		WARN(1, "Tried to idle a non-current ringbuffer\n");
+		return false;
+	}
+
 	/* wait for CP to drain ringbuffer: */
 	if (!adreno_idle(gpu, ring))
 		return false;
@@ -962,8 +1112,13 @@ static irqreturn_t a5xx_irq(struct msm_gpu *gpu)
 	if (status & A5XX_RBBM_INT_0_MASK_GPMU_VOLTAGE_DROOP)
 		a5xx_gpmu_err_irq(gpu);
 
-	if (status & A5XX_RBBM_INT_0_MASK_CP_CACHE_FLUSH_TS)
+	if (status & A5XX_RBBM_INT_0_MASK_CP_CACHE_FLUSH_TS) {
+		a5xx_preempt_trigger(gpu);
 		msm_gpu_retire(gpu);
+	}
+
+	if (status & A5XX_RBBM_INT_0_MASK_CP_SW)
+		a5xx_preempt_irq(gpu);
 
 	return IRQ_HANDLED;
 }
@@ -1091,6 +1246,14 @@ static void a5xx_show(struct msm_gpu *gpu, struct seq_file *m)
 }
 #endif
 
+static struct msm_ringbuffer *a5xx_active_ring(struct msm_gpu *gpu)
+{
+	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
+	struct a5xx_gpu *a5xx_gpu = to_a5xx_gpu(adreno_gpu);
+
+	return a5xx_gpu->cur_ring;
+}
+
 static const struct adreno_gpu_funcs funcs = {
 	.base = {
 		.get_param = adreno_get_param,
@@ -1100,8 +1263,8 @@ static void a5xx_show(struct msm_gpu *gpu, struct seq_file *m)
 		.recover = a5xx_recover,
 		.last_fence = adreno_last_fence,
 		.submit = a5xx_submit,
-		.flush = adreno_flush,
-		.active_ring = adreno_active_ring,
+		.flush = a5xx_flush,
+		.active_ring = a5xx_active_ring,
 		.irq = a5xx_irq,
 		.destroy = a5xx_destroy,
 		.show = a5xx_show,
@@ -1136,7 +1299,7 @@ struct msm_gpu *a5xx_gpu_init(struct drm_device *dev)
 
 	a5xx_gpu->lm_leakage = 0x4E001A;
 
-	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 1);
+	ret = adreno_gpu_init(dev, pdev, adreno_gpu, &funcs, 4);
 	if (ret) {
 		a5xx_destroy(&(a5xx_gpu->base.base));
 		return ERR_PTR(ret);
@@ -1145,5 +1308,8 @@ struct msm_gpu *a5xx_gpu_init(struct drm_device *dev)
 	if (gpu->aspace)
 		msm_mmu_set_fault_handler(gpu->aspace->mmu, gpu, a5xx_fault_handler);
 
+	/* Set up the preemption specific bits and pieces for each ringbuffer */
+	a5xx_preempt_init(gpu);
+
 	return gpu;
 }
diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.h b/drivers/gpu/drm/msm/adreno/a5xx_gpu.h
index 405b563..f042a78 100644
--- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.h
+++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.h
@@ -1,4 +1,4 @@
-/* Copyright (c) 2016 The Linux Foundation. All rights reserved.
+/* Copyright (c) 2016-2017 The Linux Foundation. All rights reserved.
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License version 2 and
@@ -36,10 +36,98 @@ struct a5xx_gpu {
 	uint32_t gpmu_dwords;
 
 	uint32_t lm_leakage;
+
+	struct msm_ringbuffer *cur_ring;
+	struct msm_ringbuffer *next_ring;
+
+	struct drm_gem_object *preempt_bo[MSM_GPU_MAX_RINGS];
+	struct a5xx_preempt_record *preempt[MSM_GPU_MAX_RINGS];
+	uint64_t preempt_iova[MSM_GPU_MAX_RINGS];
+
+	atomic_t preempt_state;
+	struct timer_list preempt_timer;
+
 };
 
 #define to_a5xx_gpu(x) container_of(x, struct a5xx_gpu, base)
 
+/*
+ * In order to do lockless preemption we use a simple state machine to progress
+ * through the process.
+ *
+ * PREEMPT_NONE - no preemption in progress.  Next state START.
+ * PREEMPT_START - The trigger is evaluating if preemption is possible. Next
+ * states: TRIGGERED, NONE
+ * PREEMPT_TRIGGERED: A preemption has been executed on the hardware. Next
+ * states: FAULTED, PENDING
+ * PREEMPT_FAULTED: A preemption timed out (never completed). This will trigger
+ * recovery.  Next state: N/A
+ * PREEMPT_PENDING: Preemption complete interrupt fired - the callback is
+ * checking the success of the operation. Next state: FAULTED, NONE.
+ */
+
+enum preempt_state {
+	PREEMPT_NONE = 0,
+	PREEMPT_START,
+	PREEMPT_TRIGGERED,
+	PREEMPT_FAULTED,
+	PREEMPT_PENDING,
+};
+
+/*
+ * struct a5xx_preempt_record is a shared buffer between the microcode and the
+ * CPU to store the state for preemption. The record itself is much larger
+ * (64k) but most of that is used by the CP for storage.
+ *
+ * There is a preemption record assigned per ringbuffer. When the CPU triggers a
+ * preemption, it fills out the record with the useful information (wptr, ring
+ * base, etc) and the microcode uses that information to set up the CP following
+ * the preemption.  When a ring is switched out, the CP will save the ringbuffer
+ * state back to the record. In this way, once the records are properly set up
+ * the CPU can quickly switch back and forth between ringbuffers by only
+ * updating a few registers (often only the wptr).
+ *
+ * These are the CPU aware registers in the record:
+ * @magic: Must always be 0x27C4BAFC
+ * @info: Type of the record - written 0 by the CPU, updated by the CP
+ * @data: Data field from SET_RENDER_MODE or a checkpoint. Written and used by
+ * the CP
+ * @cntl: Value of RB_CNTL written by CPU, save/restored by CP
+ * @rptr: Value of RB_RPTR written by CPU, save/restored by CP
+ * @wptr: Value of RB_WPTR written by CPU, save/restored by CP
+ * @rptr_addr: Value of RB_RPTR_ADDR written by CPU, save/restored by CP
+ * @rbase: Value of RB_BASE written by CPU, save/restored by CP
+ * @counter: GPU address of the storage area for the performance counters
+ */
+struct a5xx_preempt_record {
+	uint32_t magic;
+	uint32_t info;
+	uint32_t data;
+	uint32_t cntl;
+	uint32_t rptr;
+	uint32_t wptr;
+	uint64_t rptr_addr;
+	uint64_t rbase;
+	uint64_t counter;
+};
+
+/* Magic identifier for the preemption record */
+#define A5XX_PREEMPT_RECORD_MAGIC 0x27C4BAFCUL
+
+/*
+ * Even though the structure above is only a few bytes, we need a full 64k to
+ * store the entire preemption record from the CP
+ */
+#define A5XX_PREEMPT_RECORD_SIZE (64 * 1024)
+
+/*
+ * The preemption counter block is a storage area for the value of the
+ * preemption counters that are saved immediately before context switch. We
+ * append it to the end of the allocation for the preemption record.
+ */
+#define A5XX_PREEMPT_COUNTER_SIZE (16 * 4)
+
+
 int a5xx_power_init(struct msm_gpu *gpu);
 void a5xx_gpmu_ucode_init(struct msm_gpu *gpu);
 
@@ -58,4 +146,16 @@ static inline int spin_usecs(struct msm_gpu *gpu, uint32_t usecs,
 
 bool a5xx_idle(struct msm_gpu *gpu, struct msm_ringbuffer *ring);
 
+void a5xx_preempt_init(struct msm_gpu *gpu);
+void a5xx_preempt_hw_init(struct msm_gpu *gpu);
+void a5xx_preempt_trigger(struct msm_gpu *gpu);
+void a5xx_preempt_irq(struct msm_gpu *gpu);
+void a5xx_preempt_fini(struct msm_gpu *gpu);
+
+/* Return true if we are in a preempt state */
+static inline bool a5xx_in_preempt(struct a5xx_gpu *a5xx_gpu)
+{
+	return !(atomic_read(&a5xx_gpu->preempt_state) == PREEMPT_NONE);
+}
+
 #endif /* __A5XX_GPU_H__ */
diff --git a/drivers/gpu/drm/msm/adreno/a5xx_preempt.c b/drivers/gpu/drm/msm/adreno/a5xx_preempt.c
new file mode 100644
index 0000000..582ba9b
--- /dev/null
+++ b/drivers/gpu/drm/msm/adreno/a5xx_preempt.c
@@ -0,0 +1,345 @@
+/* Copyright (c) 2017 The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 and
+ * only version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#include "msm_gem.h"
+#include "a5xx_gpu.h"
+
+static void *alloc_kernel_bo(struct drm_device *drm, struct msm_gpu *gpu,
+		size_t size, uint32_t flags, struct drm_gem_object **bo,
+		u64 *iova)
+{
+	struct drm_gem_object *_bo;
+	u64 _iova;
+	void *ptr;
+	int ret;
+
+	mutex_lock(&drm->struct_mutex);
+	_bo = msm_gem_new(drm, size, flags);
+	mutex_unlock(&drm->struct_mutex);
+
+	if (IS_ERR(_bo))
+		return _bo;
+
+	ret = msm_gem_get_iova(_bo, gpu->aspace, &_iova);
+	if (ret)
+		goto out;
+
+	ptr = msm_gem_get_vaddr(_bo);
+	if (IS_ERR_OR_NULL(ptr)) {
+		ret = PTR_ERR(ptr) ?: -ENOMEM;
+		goto out;
+	}
+
+	if (bo)
+		*bo = _bo;
+	if (iova)
+		*iova = _iova;
+
+	return ptr;
+out:
+	drm_gem_object_unreference_unlocked(_bo);
+	return ERR_PTR(ret);
+}
+
+/*
+ * Try to transition the preemption state from old to new. Return
+ * true on success or false if the original state wasn't 'old'
+ */
+static inline bool try_preempt_state(struct a5xx_gpu *a5xx_gpu,
+		enum preempt_state old, enum preempt_state new)
+{
+	enum preempt_state cur = atomic_cmpxchg(&a5xx_gpu->preempt_state,
+		old, new);
+
+	return (cur == old);
+}
+
+/*
+ * Force the preemption state to the specified state.  This is used in cases
+ * where the current state is known and won't change
+ */
+static inline void set_preempt_state(struct a5xx_gpu *gpu,
+		enum preempt_state new)
+{
+	/*
+	 * preempt_state may be read by other cores trying to trigger a
+	 * preemption or in the interrupt handler so barriers are needed
+	 * before...
+	 */
+	smp_mb__before_atomic();
+	atomic_set(&gpu->preempt_state, new);
+	/* ... and after */
+	smp_mb__after_atomic();
+}
+
+/* Write the most recent wptr for the given ring into the hardware */
+static inline void update_wptr(struct msm_gpu *gpu, struct msm_ringbuffer *ring)
+{
+	unsigned long flags;
+	uint32_t wptr;
+
+	if (!ring)
+		return;
+
+	spin_lock_irqsave(&ring->lock, flags);
+	wptr = get_wptr(ring);
+	spin_unlock_irqrestore(&ring->lock, flags);
+
+	gpu_write(gpu, REG_A5XX_CP_RB_WPTR, wptr);
+}
+
+/* Return the highest priority ringbuffer with something in it */
+static struct msm_ringbuffer *get_next_ring(struct msm_gpu *gpu)
+{
+	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
+	unsigned long flags;
+	int i;
+
+	for (i = gpu->nr_rings - 1; i >= 0; i--) {
+		bool empty;
+		struct msm_ringbuffer *ring = gpu->rb[i];
+
+		spin_lock_irqsave(&ring->lock, flags);
+		empty = (get_wptr(ring) == adreno_gpu->memptrs->rptr[ring->id]);
+		spin_unlock_irqrestore(&ring->lock, flags);
+
+		if (!empty)
+			return ring;
+	}
+
+	return NULL;
+}
+
+static void a5xx_preempt_timer(unsigned long data)
+{
+	struct a5xx_gpu *a5xx_gpu = (struct a5xx_gpu *) data;
+	struct msm_gpu *gpu = &a5xx_gpu->base.base;
+	struct drm_device *dev = gpu->dev;
+	struct msm_drm_private *priv = dev->dev_private;
+
+	if (!try_preempt_state(a5xx_gpu, PREEMPT_TRIGGERED, PREEMPT_FAULTED))
+		return;
+
+	dev_err(dev->dev, "%s: preemption timed out\n", gpu->name);
+	queue_work(priv->wq, &gpu->recover_work);
+}
+
+/* Try to trigger a preemption switch */
+void a5xx_preempt_trigger(struct msm_gpu *gpu)
+{
+	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
+	struct a5xx_gpu *a5xx_gpu = to_a5xx_gpu(adreno_gpu);
+	unsigned long flags;
+	struct msm_ringbuffer *ring;
+
+	if (gpu->nr_rings == 1)
+		return;
+
+	/*
+	 * Try to start preemption by moving from NONE to START. If
+	 * unsuccessful, a preemption is already in flight
+	 */
+	if (!try_preempt_state(a5xx_gpu, PREEMPT_NONE, PREEMPT_START))
+		return;
+
+	/* Get the next ring to preempt to */
+	ring = get_next_ring(gpu);
+
+	/*
+	 * If no ring is populated or the highest priority ring is the current
+	 * one, do nothing except update the wptr to the latest and greatest
+	 */
+	if (!ring || (a5xx_gpu->cur_ring == ring)) {
+		update_wptr(gpu, ring);
+
+		/* Set the state back to NONE */
+		set_preempt_state(a5xx_gpu, PREEMPT_NONE);
+		return;
+	}
+
+	/* Make sure the wptr doesn't update while we're in motion */
+	spin_lock_irqsave(&ring->lock, flags);
+	a5xx_gpu->preempt[ring->id]->wptr = get_wptr(ring);
+	spin_unlock_irqrestore(&ring->lock, flags);
+
+	/* Set the address of the incoming preemption record */
+	gpu_write64(gpu, REG_A5XX_CP_CONTEXT_SWITCH_RESTORE_ADDR_LO,
+		REG_A5XX_CP_CONTEXT_SWITCH_RESTORE_ADDR_HI,
+		a5xx_gpu->preempt_iova[ring->id]);
+
+	a5xx_gpu->next_ring = ring;
+
+	/* Start a timer to catch a stuck preemption */
+	mod_timer(&a5xx_gpu->preempt_timer, jiffies + msecs_to_jiffies(10000));
+
+	/* Set the preemption state to triggered */
+	set_preempt_state(a5xx_gpu, PREEMPT_TRIGGERED);
+
+	/* Make sure everything is written before hitting the button */
+	wmb();
+
+	/* And actually start the preemption */
+	gpu_write(gpu, REG_A5XX_CP_CONTEXT_SWITCH_CNTL, 1);
+}
+
+void a5xx_preempt_irq(struct msm_gpu *gpu)
+{
+	uint32_t status;
+	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
+	struct a5xx_gpu *a5xx_gpu = to_a5xx_gpu(adreno_gpu);
+	struct drm_device *dev = gpu->dev;
+	struct msm_drm_private *priv = dev->dev_private;
+
+	if (!try_preempt_state(a5xx_gpu, PREEMPT_TRIGGERED, PREEMPT_PENDING))
+		return;
+
+	/* Delete the preemption watchdog timer */
+	del_timer(&a5xx_gpu->preempt_timer);
+
+	/*
+	 * The hardware should be setting CP_CONTEXT_SWITCH_CNTL to zero before
+	 * firing the interrupt, but there is a non-zero chance of a hardware
+	 * condition or a software race that could set it again before we have a
+	 * chance to finish. If that happens, log and go for recovery
+	 */
+	status = gpu_read(gpu, REG_A5XX_CP_CONTEXT_SWITCH_CNTL);
+	if (unlikely(status)) {
+		set_preempt_state(a5xx_gpu, PREEMPT_FAULTED);
+		dev_err(dev->dev, "%s: Preemption failed to complete\n",
+			gpu->name);
+		queue_work(priv->wq, &gpu->recover_work);
+		return;
+	}
+
+	a5xx_gpu->cur_ring = a5xx_gpu->next_ring;
+	a5xx_gpu->next_ring = NULL;
+
+	update_wptr(gpu, a5xx_gpu->cur_ring);
+
+	set_preempt_state(a5xx_gpu, PREEMPT_NONE);
+}
+
+void a5xx_preempt_hw_init(struct msm_gpu *gpu)
+{
+	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
+	struct a5xx_gpu *a5xx_gpu = to_a5xx_gpu(adreno_gpu);
+	struct msm_ringbuffer *ring;
+	int i;
+
+	if (gpu->nr_rings > 1) {
+		/* Clear the preemption records */
+		FOR_EACH_RING(gpu, ring, i) {
+			if (ring) {
+				a5xx_gpu->preempt[ring->id]->wptr = 0;
+				a5xx_gpu->preempt[ring->id]->rptr = 0;
+				a5xx_gpu->preempt[ring->id]->rbase = ring->iova;
+			}
+		}
+	}
+
+	/* Write a 0 to signal that we aren't switching pagetables */
+	gpu_write64(gpu, REG_A5XX_CP_CONTEXT_SWITCH_SMMU_INFO_LO,
+		REG_A5XX_CP_CONTEXT_SWITCH_SMMU_INFO_HI, 0);
+
+	/* Reset the preemption state */
+	set_preempt_state(a5xx_gpu, PREEMPT_NONE);
+
+	/* Always come up on rb 0 */
+	a5xx_gpu->cur_ring = gpu->rb[0];
+}
+
+static int preempt_init_ring(struct a5xx_gpu *a5xx_gpu,
+		struct msm_ringbuffer *ring)
+{
+	struct adreno_gpu *adreno_gpu = &a5xx_gpu->base;
+	struct msm_gpu *gpu = &adreno_gpu->base;
+	struct a5xx_preempt_record *ptr;
+	struct drm_gem_object *bo;
+	u64 iova;
+
+	ptr = alloc_kernel_bo(gpu->dev, gpu,
+		A5XX_PREEMPT_RECORD_SIZE + A5XX_PREEMPT_COUNTER_SIZE,
+		MSM_BO_UNCACHED, &bo, &iova);
+
+	if (IS_ERR(ptr))
+		return PTR_ERR(ptr);
+
+	a5xx_gpu->preempt_bo[ring->id] = bo;
+	a5xx_gpu->preempt_iova[ring->id] = iova;
+	a5xx_gpu->preempt[ring->id] = ptr;
+
+	/* Set up the defaults on the preemption record */
+
+	ptr->magic = A5XX_PREEMPT_RECORD_MAGIC;
+	ptr->info = 0;
+	ptr->data = 0;
+	ptr->cntl = MSM_GPU_RB_CNTL_DEFAULT;
+	ptr->rptr_addr = rbmemptr(adreno_gpu, ring->id, rptr);
+	ptr->counter = iova + A5XX_PREEMPT_RECORD_SIZE;
+
+	return 0;
+}
+
+void a5xx_preempt_fini(struct msm_gpu *gpu)
+{
+	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
+	struct a5xx_gpu *a5xx_gpu = to_a5xx_gpu(adreno_gpu);
+	struct msm_ringbuffer *ring;
+	int i;
+
+	FOR_EACH_RING(gpu, ring, i) {
+		if (!ring || !a5xx_gpu->preempt_bo[i])
+			continue;
+
+		if (a5xx_gpu->preempt[i])
+			msm_gem_put_vaddr(a5xx_gpu->preempt_bo[i]);
+
+		if (a5xx_gpu->preempt_iova[i])
+			msm_gem_put_iova(a5xx_gpu->preempt_bo[i], gpu->aspace);
+
+		drm_gem_object_unreference_unlocked(a5xx_gpu->preempt_bo[i]);
+
+		a5xx_gpu->preempt_bo[i] = NULL;
+	}
+}
+
+void a5xx_preempt_init(struct msm_gpu *gpu)
+{
+	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
+	struct a5xx_gpu *a5xx_gpu = to_a5xx_gpu(adreno_gpu);
+	struct msm_ringbuffer *ring;
+	int i;
+
+	/* No preemption if we only have one ring */
+	if (gpu->nr_rings <= 1)
+		return;
+
+	FOR_EACH_RING(gpu, ring, i) {
+		if (!ring)
+			continue;
+
+		if (preempt_init_ring(a5xx_gpu, ring)) {
+			/*
+			 * On any failure our adventure is over. Clean up and
+			 * set nr_rings to 1 to force preemption off
+			 */
+			a5xx_preempt_fini(gpu);
+			gpu->nr_rings = 1;
+
+			return;
+		}
+	}
+
+	setup_timer(&a5xx_gpu->preempt_timer, a5xx_preempt_timer,
+		(unsigned long) a5xx_gpu);
+}
diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
index 8c23d92..2debc76 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
@@ -93,11 +93,6 @@ int adreno_hw_init(struct msm_gpu *gpu)
 	return 0;
 }
 
-static uint32_t get_wptr(struct msm_ringbuffer *ring)
-{
-	return ring->cur - ring->start;
-}
-
 /* Use this helper to read rptr, since a430 doesn't update rptr in memory */
 static uint32_t get_rptr(struct adreno_gpu *adreno_gpu,
 		struct msm_ringbuffer *ring)
@@ -145,6 +140,7 @@ void adreno_recover(struct msm_gpu *gpu)
 		if (!ring)
 			continue;
 
+		/* No need for a lock here, nobody else is peeking in */
 		ring->cur = ring->start;
 		ring->next = ring->start;
 
@@ -184,7 +180,7 @@ void adreno_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit,
 		case MSM_SUBMIT_CMD_BUF:
 			OUT_PKT3(ring, adreno_is_a430(adreno_gpu) ?
 				CP_INDIRECT_BUFFER_PFE : CP_INDIRECT_BUFFER_PFD, 2);
-			OUT_RING(ring, submit->cmd[i].iova);
+			OUT_RING(ring, lower_32_bits(submit->cmd[i].iova));
 			OUT_RING(ring, submit->cmd[i].size);
 			OUT_PKT2(ring);
 			break;
@@ -251,7 +247,7 @@ void adreno_flush(struct msm_gpu *gpu, struct msm_ringbuffer *ring)
 	 * to account for the possibility that the last command fit exactly into
 	 * the ringbuffer and rb->next hasn't wrapped to zero yet
 	 */
-	wptr = (ring->cur - ring->start) % (MSM_GPU_RINGBUFFER_SZ >> 2);
+	wptr = get_wptr(ring);
 
 	/* ensure writes to ringbuffer have hit system memory: */
 	mb();
@@ -269,8 +265,9 @@ bool adreno_idle(struct msm_gpu *gpu, struct msm_ringbuffer *ring)
 		return true;
 
 	/* TODO maybe we need to reset GPU here to recover from hang? */
-	DRM_ERROR("%s: timeout waiting to drain ringbuffer %d!\n", gpu->name,
-		ring->id);
+	DRM_ERROR("%s: timeout waiting to drain ringbuffer %d rptr/wptr = %X/%X\n",
+		gpu->name, ring->id, get_rptr(adreno_gpu, ring), wptr);
+
 	return false;
 }
 
diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.h b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
index e05fa8e..4a3630f 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.h
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
@@ -2,7 +2,7 @@
  * Copyright (C) 2013 Red Hat
  * Author: Rob Clark <robdclark@gmail.com>
  *
- * Copyright (c) 2014 The Linux Foundation. All rights reserved.
+ * Copyright (c) 2014,2017 The Linux Foundation. All rights reserved.
  *
  * This program is free software; you can redistribute it and/or modify it
  * under the terms of the GNU General Public License version 2 as published by
@@ -332,6 +332,11 @@ static inline void adreno_gpu_write64(struct adreno_gpu *gpu,
 	adreno_gpu_write(gpu, hi, upper_32_bits(data));
 }
 
+static inline uint32_t get_wptr(struct msm_ringbuffer *ring)
+{
+	return (ring->cur - ring->start) % (MSM_GPU_RINGBUFFER_SZ >> 2);
+}
+
 /*
  * Given a register and a count, return a value to program into
  * REG_CP_PROTECT_REG(n) - this will block both reads and writes for _len
diff --git a/drivers/gpu/drm/msm/msm_drv.h b/drivers/gpu/drm/msm/msm_drv.h
index 600c39c..08ad9e4 100644
--- a/drivers/gpu/drm/msm/msm_drv.h
+++ b/drivers/gpu/drm/msm/msm_drv.h
@@ -78,7 +78,7 @@ struct msm_vblank_ctrl {
 	spinlock_t lock;
 };
 
-#define MSM_GPU_MAX_RINGS 1
+#define MSM_GPU_MAX_RINGS 4
 
 struct msm_drm_private {
 
diff --git a/drivers/gpu/drm/msm/msm_ringbuffer.c b/drivers/gpu/drm/msm/msm_ringbuffer.c
index b885979..f42ce09 100644
--- a/drivers/gpu/drm/msm/msm_ringbuffer.c
+++ b/drivers/gpu/drm/msm/msm_ringbuffer.c
@@ -50,6 +50,8 @@ struct msm_ringbuffer *msm_ringbuffer_new(struct msm_gpu *gpu, int id)
 	ring->next  = ring->start;
 	ring->cur   = ring->start;
 
+	spin_lock_init(&ring->lock);
+
 	return ring;
 
 fail:
diff --git a/drivers/gpu/drm/msm/msm_ringbuffer.h b/drivers/gpu/drm/msm/msm_ringbuffer.h
index 865b21a..0f91db0 100644
--- a/drivers/gpu/drm/msm/msm_ringbuffer.h
+++ b/drivers/gpu/drm/msm/msm_ringbuffer.h
@@ -29,6 +29,7 @@ struct msm_ringbuffer {
 	/* last_fence == completed_fence --> no pending work */
 	uint32_t last_fence;
 	uint32_t completed_fence;
+	spinlock_t lock;
 };
 
 struct msm_ringbuffer *msm_ringbuffer_new(struct msm_gpu *gpu, int id);
-- 
1.9.1

_______________________________________________
Freedreno mailing list
Freedreno@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/freedreno


end of thread, other threads:[~2017-03-07 16:58 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-06 17:39 [PATCH 00/11] drm/msm: A5XX preemption Jordan Crouse
2017-02-06 17:39 ` [PATCH 03/11] drm/msm: Add hint to DRM_IOCTL_MSM_GEM_INFO to return an object IOVA Jordan Crouse
2017-02-06 19:20   ` Emil Velikov
     [not found]     ` <CACvgo513+d19O2rzZ8NXEFgojUQkm2XPae-AdOXXReLM_a1euw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-02-06 19:57       ` Rob Clark
     [not found]         ` <CAF6AEGvUoW2695_HjgfGbpbPaSnOB2gfPa=3UMTDGvom+DxcwA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-02-06 20:24           ` Emil Velikov
2017-02-06 21:01             ` Rob Clark
2017-02-06 17:39 ` [PATCH 04/11] drm/msm: Remove idle function hook Jordan Crouse
2017-02-06 17:39 ` [PATCH 05/11] drm/msm: get an iova from the address space instead of an id Jordan Crouse
2017-02-09  5:01   ` Archit Taneja
2017-02-06 17:39 ` [PATCH 06/11] drm/msm: Add a struct to pass configuration to msm_gpu_init() Jordan Crouse
2017-02-06 17:39 ` [PATCH 07/11] drm/msm: Remove memptrs->wptr Jordan Crouse
2017-02-06 17:39 ` [PATCH 08/11] drm/msm: Support multiple ringbuffers Jordan Crouse
2017-02-06 17:39 ` [PATCH 09/11] drm/msm: Shadow current pointer in the ring until command is complete Jordan Crouse
2017-02-06 17:39 ` [PATCH 10/11] drm/msm: Make the value of RB_CNTL (almost) generic Jordan Crouse
2017-02-06 17:39 ` [PATCH 11/11] drm/msm: Implement preemption for A5XX targets Jordan Crouse
     [not found]   ` <1486402779-9024-12-git-send-email-jcrouse-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
2017-02-08 20:30     ` Stephen Boyd
     [not found]       ` <8696f3b7-1fbd-309a-1d68-b2f8ad89a30c-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
2017-02-08 23:00         ` Jordan Crouse
2017-02-09  0:03           ` Stephen Boyd
2017-02-06 17:59 ` [PATCH 00/11] drm/msm: A5XX preemption Daniel Vetter
2017-02-06 18:23   ` Daniel Stone
2017-02-06 18:29     ` [Intel-gfx] " Rob Clark
2017-02-06 18:29   ` Alex Deucher
     [not found] ` <1486402779-9024-1-git-send-email-jcrouse-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
2017-02-06 17:39   ` [PATCH 01/11] drm/msm: Make sure to detach the MMU during GPU cleanup Jordan Crouse
2017-02-06 17:39   ` [PATCH 02/11] drm/msm: Improve the zap shader Jordan Crouse
2017-03-07 16:58   ` [v2] [PATCH 00/11] drm/msm: A5XX preemption Jordan Crouse
2017-03-07 16:58     ` [PATCH 01/11] drm/msm: Make sure to detach the MMU during GPU cleanup Jordan Crouse
     [not found]     ` <1488905900-6603-1-git-send-email-jcrouse-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
2017-03-07 16:58       ` [PATCH 02/11] drm/msm: Improve the zap shader Jordan Crouse
2017-03-07 16:58       ` [PATCH 03/11] drm/msm: Remove idle function hook Jordan Crouse
2017-03-07 16:58       ` [PATCH 04/11] drm/msm: Add hint to DRM_IOCTL_MSM_GEM_INFO to return an object IOVA Jordan Crouse
2017-03-07 16:58       ` [PATCH 05/11] drm/msm: get an iova from the address space instead of an id Jordan Crouse
2017-03-07 16:58       ` [PATCH 06/11] drm/msm: Add a struct to pass configuration to msm_gpu_init() Jordan Crouse
2017-03-07 16:58       ` [PATCH 07/11] drm/msm: Remove memptrs->wptr Jordan Crouse
2017-03-07 16:58       ` [PATCH 08/11] drm/msm: Support multiple ringbuffers Jordan Crouse
2017-03-07 16:58       ` [PATCH 09/11] drm/msm: Shadow current pointer in the ring until command is complete Jordan Crouse
2017-03-07 16:58       ` [PATCH 10/11] drm/msm: Make the value of RB_CNTL (almost) generic Jordan Crouse
2017-03-07 16:58       ` [PATCH 11/11] drm/msm: Implement preemption for A5XX targets Jordan Crouse
