* [PATCH 02/83] drm/radeon: reduce number of free VMIDs and pipes in KV
@ 2014-07-10 21:50 Oded Gabbay
  2014-07-10 21:50 ` [PATCH 03/83] drm/radeon: Report doorbell configuration to kfd Oded Gabbay
                   ` (26 more replies)
  0 siblings, 27 replies; 116+ messages in thread
From: Oded Gabbay @ 2014-07-10 21:50 UTC (permalink / raw)
  To: David Airlie, Alex Deucher, Jerome Glisse
  Cc: linux-kernel, dri-devel, John Bridgman, Andrew Lewycky,
	Joerg Roedel, Oded Gabbay, Christian König

To support HSA on KV, we need to limit the number of vmids and pipes
that are available for radeon's use with KV.

This patch reserves VMIDs 8-15 for KFD (so radeon can only use VMIDs
0-7) and also makes radeon think that KV has only a single MEC with a
single pipe in it.
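
As a rough illustration of the split described above (not part of the
patch), the arithmetic can be sketched in plain userspace C. The helper
names and the DEMO_ constants are assumptions made for this example; only
the numbers (16 VMIDs, 8 queues per pipe, VMIDs 8-15 for KFD) come from
the patch text.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_TOTAL_VMIDS     16u /* hardware VMIDs 0-15 */
#define DEMO_QUEUES_PER_PIPE  8u /* from the comment in cik_mec_init() */

/* Hypothetical helper: bitmap of the VMIDs at or above first_kfd_vmid. */
static uint32_t demo_kfd_vmid_bitmap(unsigned int first_kfd_vmid)
{
	assert(first_kfd_vmid <= DEMO_TOTAL_VMIDS);

	uint32_t all = (1u << DEMO_TOTAL_VMIDS) - 1u;
	uint32_t radeon = (1u << first_kfd_vmid) - 1u;

	return all & ~radeon;
}

/* Same formula as rdev->mec.num_queue in the patch. */
static unsigned int demo_queue_count(unsigned int num_mec,
				     unsigned int num_pipe)
{
	return num_mec * num_pipe * DEMO_QUEUES_PER_PIPE;
}
```

With first_kfd_vmid = 8 the bitmap is 0xFF00, and the queue count radeon
sees on KV drops from 2 * 4 * 8 = 64 to 1 * 1 * 8 = 8.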

Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
---
 drivers/gpu/drm/radeon/cik.c | 48 ++++++++++++++++++++++----------------------
 1 file changed, 24 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/radeon/cik.c b/drivers/gpu/drm/radeon/cik.c
index 4bfc2c0..e0c8052 100644
--- a/drivers/gpu/drm/radeon/cik.c
+++ b/drivers/gpu/drm/radeon/cik.c
@@ -4662,12 +4662,11 @@ static int cik_mec_init(struct radeon_device *rdev)
 	/*
 	 * KV:    2 MEC, 4 Pipes/MEC, 8 Queues/Pipe - 64 Queues total
 	 * CI/KB: 1 MEC, 4 Pipes/MEC, 8 Queues/Pipe - 32 Queues total
+	 * Nonetheless, we assign only 1 pipe because all other pipes will
+	 * be handled by KFD
 	 */
-	if (rdev->family == CHIP_KAVERI)
-		rdev->mec.num_mec = 2;
-	else
-		rdev->mec.num_mec = 1;
-	rdev->mec.num_pipe = 4;
+	rdev->mec.num_mec = 1;
+	rdev->mec.num_pipe = 1;
 	rdev->mec.num_queue = rdev->mec.num_mec * rdev->mec.num_pipe * 8;
 
 	if (rdev->mec.hpd_eop_obj == NULL) {
@@ -4809,28 +4808,24 @@ static int cik_cp_compute_resume(struct radeon_device *rdev)
 
 	/* init the pipes */
 	mutex_lock(&rdev->srbm_mutex);
-	for (i = 0; i < (rdev->mec.num_pipe * rdev->mec.num_mec); i++) {
-		int me = (i < 4) ? 1 : 2;
-		int pipe = (i < 4) ? i : (i - 4);
 
-		eop_gpu_addr = rdev->mec.hpd_eop_gpu_addr + (i * MEC_HPD_SIZE * 2);
+	eop_gpu_addr = rdev->mec.hpd_eop_gpu_addr;
 
-		cik_srbm_select(rdev, me, pipe, 0, 0);
+	cik_srbm_select(rdev, 0, 0, 0, 0);
 
-		/* write the EOP addr */
-		WREG32(CP_HPD_EOP_BASE_ADDR, eop_gpu_addr >> 8);
-		WREG32(CP_HPD_EOP_BASE_ADDR_HI, upper_32_bits(eop_gpu_addr) >> 8);
+	/* write the EOP addr */
+	WREG32(CP_HPD_EOP_BASE_ADDR, eop_gpu_addr >> 8);
+	WREG32(CP_HPD_EOP_BASE_ADDR_HI, upper_32_bits(eop_gpu_addr) >> 8);
 
-		/* set the VMID assigned */
-		WREG32(CP_HPD_EOP_VMID, 0);
+	/* set the VMID assigned */
+	WREG32(CP_HPD_EOP_VMID, 0);
+
+	/* set the EOP size, register value is 2^(EOP_SIZE+1) dwords */
+	tmp = RREG32(CP_HPD_EOP_CONTROL);
+	tmp &= ~EOP_SIZE_MASK;
+	tmp |= order_base_2(MEC_HPD_SIZE / 8);
+	WREG32(CP_HPD_EOP_CONTROL, tmp);
 
-		/* set the EOP size, register value is 2^(EOP_SIZE+1) dwords */
-		tmp = RREG32(CP_HPD_EOP_CONTROL);
-		tmp &= ~EOP_SIZE_MASK;
-		tmp |= order_base_2(MEC_HPD_SIZE / 8);
-		WREG32(CP_HPD_EOP_CONTROL, tmp);
-	}
-	cik_srbm_select(rdev, 0, 0, 0, 0);
 	mutex_unlock(&rdev->srbm_mutex);
 
 	/* init the queues.  Just two for now. */
@@ -5876,8 +5871,13 @@ int cik_ib_parse(struct radeon_device *rdev, struct radeon_ib *ib)
  */
 int cik_vm_init(struct radeon_device *rdev)
 {
-	/* number of VMs */
-	rdev->vm_manager.nvm = 16;
+	/*
+	 * number of VMs
+	 * VMID 0 is reserved for Graphics
+	 * radeon compute will use VMIDs 1-7
+	 * KFD will use VMIDs 8-15
+	 */
+	rdev->vm_manager.nvm = 8;
 	/* base offset of vram pages */
 	if (rdev->flags & RADEON_IS_IGP) {
 		u64 tmp = RREG32(MC_VM_FB_OFFSET);
-- 
1.9.1



* [PATCH 03/83] drm/radeon: Report doorbell configuration to kfd
  2014-07-10 21:50 [PATCH 02/83] drm/radeon: reduce number of free VMIDs and pipes in KV Oded Gabbay
@ 2014-07-10 21:50 ` Oded Gabbay
  2014-07-11 16:16     ` Jerome Glisse
  2014-07-10 21:50 ` [PATCH 04/83] drm/radeon: Add radeon <--> kfd interface Oded Gabbay
                   ` (25 subsequent siblings)
  26 siblings, 1 reply; 116+ messages in thread
From: Oded Gabbay @ 2014-07-10 21:50 UTC (permalink / raw)
  To: David Airlie, Alex Deucher, Jerome Glisse
  Cc: linux-kernel, dri-devel, John Bridgman, Andrew Lewycky,
	Joerg Roedel, Oded Gabbay, Christian König

Radeon and KFD share the doorbell aperture.
Radeon sets it up, takes the doorbells required for its own rings
and reports the setup to KFD.
Radeon reserved doorbells are at the start of the doorbell aperture.
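
The layout rule can be sketched in userspace C as follows. This is only
an illustration under stated assumptions: the struct and function names
are invented for the example, and the aperture sizes used with it are
made up. The logic mirrors the description above: radeon keeps the first
num_doorbells 32-bit doorbells and KFD gets the rest, or nothing if
nothing is left.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the three values reported to KFD. */
struct demo_doorbell_layout {
	uint64_t aperture_base; /* physical base of the doorbell aperture */
	size_t aperture_size;   /* aperture size in bytes */
	size_t start_offset;    /* bytes at the start reserved for radeon */
};

static struct demo_doorbell_layout
demo_doorbell_kfd_info(uint64_t base, size_t size, size_t num_doorbells)
{
	struct demo_doorbell_layout out = { 0, 0, 0 };
	size_t reserved = num_doorbells * sizeof(uint32_t);

	/* Report the aperture only if something remains after radeon's share. */
	if (size > reserved) {
		out.aperture_base = base;
		out.aperture_size = size;
		out.start_offset = reserved;
	}

	assert(out.start_offset <= out.aperture_size);
	return out;
}
```

For a hypothetical 8192-byte aperture with 1024 radeon doorbells, KFD
would start at byte offset 4096; with a 4096-byte aperture all three
outputs are zero.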

Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
---
 drivers/gpu/drm/radeon/radeon.h        |  4 ++++
 drivers/gpu/drm/radeon/radeon_device.c | 31 +++++++++++++++++++++++++++++++
 2 files changed, 35 insertions(+)

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 7cda75d..4e7e41f 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -676,6 +676,10 @@ struct radeon_doorbell {
 
 int radeon_doorbell_get(struct radeon_device *rdev, u32 *page);
 void radeon_doorbell_free(struct radeon_device *rdev, u32 doorbell);
+void radeon_doorbell_get_kfd_info(struct radeon_device *rdev,
+				  phys_addr_t *aperture_base,
+				  size_t *aperture_size,
+				  size_t *start_offset);
 
 /*
  * IRQS.
diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
index 03686fa..98538d2 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -328,6 +328,37 @@ void radeon_doorbell_free(struct radeon_device *rdev, u32 doorbell)
 		__clear_bit(doorbell, rdev->doorbell.used);
 }
 
+/**
+ * radeon_doorbell_get_kfd_info - Report doorbell configuration required to
+ *                                setup KFD
+ *
+ * @rdev: radeon_device pointer
+ * @aperture_base: output returning doorbell aperture base physical address
+ * @aperture_size: output returning doorbell aperture size in bytes
+ * @start_offset: output returning # of doorbell bytes reserved for radeon.
+ *
+ * Radeon and the KFD share the doorbell aperture. Radeon sets it up,
+ * takes doorbells required for its own rings and reports the setup to KFD.
+ * Radeon reserved doorbells are at the start of the doorbell aperture.
+ */
+void radeon_doorbell_get_kfd_info(struct radeon_device *rdev,
+				  phys_addr_t *aperture_base,
+				  size_t *aperture_size,
+				  size_t *start_offset)
+{
+	/* The first num_doorbells are used by radeon.
+	 * KFD takes whatever's left in the aperture. */
+	if (rdev->doorbell.size > rdev->doorbell.num_doorbells * sizeof(u32)) {
+		*aperture_base = rdev->doorbell.base;
+		*aperture_size = rdev->doorbell.size;
+		*start_offset = rdev->doorbell.num_doorbells * sizeof(u32);
+	} else {
+		*aperture_base = 0;
+		*aperture_size = 0;
+		*start_offset = 0;
+	}
+}
+
 /*
  * radeon_wb_*()
  * Writeback is the the method by which the the GPU updates special pages
-- 
1.9.1



* [PATCH 04/83] drm/radeon: Add radeon <--> kfd interface
  2014-07-10 21:50 [PATCH 02/83] drm/radeon: reduce number of free VMIDs and pipes in KV Oded Gabbay
  2014-07-10 21:50 ` [PATCH 03/83] drm/radeon: Report doorbell configuration to kfd Oded Gabbay
@ 2014-07-10 21:50 ` Oded Gabbay
  2014-07-10 22:38     ` Joe Perches
  2014-07-10 21:50 ` [PATCH 05/83] drm/radeon: Add kfd-->kgd interface to get virtual ram size Oded Gabbay
                   ` (24 subsequent siblings)
  26 siblings, 1 reply; 116+ messages in thread
From: Oded Gabbay @ 2014-07-10 21:50 UTC (permalink / raw)
  To: David Airlie, Alex Deucher, Jerome Glisse
  Cc: linux-kernel, dri-devel, John Bridgman, Andrew Lewycky,
	Joerg Roedel, Oded Gabbay, Christian König

This patch adds the interface between the radeon driver and the kfd
driver. The interface implementation is contained in
radeon_kfd.c and radeon_kfd.h.

The interface itself is represented by a pointer to struct
kfd_dev. The pointer is located inside radeon_device structure.
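
A minimal userspace sketch of the handshake, for illustration only. All
demo_ names are hypothetical, and the KFD side shown here (including the
version check) is an assumption, since the real kgd2kfd_init() is not
part of this patch. The radeon side follows the shape of
radeon_kfd_init() and radeon_kfd_fini(): hand over one call table,
receive the other, and drop it again on failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_INTERFACE_VERSION 1

/* Empty in this patch; standard C needs at least one member. */
struct demo_kfd2kgd_calls { int placeholder; };
struct demo_kgd2kfd_calls { void (*exit)(void); };

static int demo_exit_calls;
static void demo_exit(void) { demo_exit_calls++; }

static const struct demo_kgd2kfd_calls demo_kgd2kfd_table = {
	.exit = demo_exit,
};

/* Assumed KFD side: accept the caller only if the versions agree. */
static bool demo_kgd2kfd_init(unsigned int version,
			      const struct demo_kfd2kgd_calls *f2g,
			      const struct demo_kgd2kfd_calls **g2f)
{
	assert(g2f != NULL);
	(void)f2g; /* a real KFD would keep this table for later callbacks */

	if (version != DEMO_INTERFACE_VERSION)
		return false;

	*g2f = &demo_kgd2kfd_table;
	return true;
}

/* Radeon side, shaped like radeon_kfd_init()/radeon_kfd_fini(). */
static const struct demo_kfd2kgd_calls demo_kfd2kgd_table = { 0 };
static const struct demo_kgd2kfd_calls *demo_kgd2kfd;

static bool demo_radeon_kfd_init(unsigned int version)
{
	if (!demo_kgd2kfd_init(version, &demo_kfd2kgd_table, &demo_kgd2kfd)) {
		demo_kgd2kfd = NULL;
		return false;
	}
	return true;
}

static void demo_radeon_kfd_fini(void)
{
	if (demo_kgd2kfd) {
		demo_kgd2kfd->exit();
		demo_kgd2kfd = NULL;
	}
}
```

The sketch leaves out symbol_request()/symbol_put(), which the patch
uses to resolve kgd2kfd_init at run time so that radeon does not depend
on the kfd module being present.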

Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
---
 drivers/gpu/drm/radeon/Makefile     |  1 +
 drivers/gpu/drm/radeon/radeon.h     |  3 ++
 drivers/gpu/drm/radeon/radeon_kfd.c | 94 +++++++++++++++++++++++++++++++++++++
 include/linux/radeon_kfd.h          | 67 ++++++++++++++++++++++++++
 4 files changed, 165 insertions(+)
 create mode 100644 drivers/gpu/drm/radeon/radeon_kfd.c
 create mode 100644 include/linux/radeon_kfd.h

diff --git a/drivers/gpu/drm/radeon/Makefile b/drivers/gpu/drm/radeon/Makefile
index 1b04002..a1c913d 100644
--- a/drivers/gpu/drm/radeon/Makefile
+++ b/drivers/gpu/drm/radeon/Makefile
@@ -104,6 +104,7 @@ radeon-y += \
 	radeon_vce.o \
 	vce_v1_0.o \
 	vce_v2_0.o \
+	radeon_kfd.o
 
 radeon-$(CONFIG_COMPAT) += radeon_ioc32.o
 radeon-$(CONFIG_VGA_SWITCHEROO) += radeon_atpx_handler.o
diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 4e7e41f..90f66bb 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -2340,6 +2340,9 @@ struct radeon_device {
 
 	struct dev_pm_domain vga_pm_domain;
 	bool have_disp_power_ref;
+
+	/* HSA KFD interface */
+	struct kfd_dev		*kfd;
 };
 
 bool radeon_is_px(struct drm_device *dev);
diff --git a/drivers/gpu/drm/radeon/radeon_kfd.c b/drivers/gpu/drm/radeon/radeon_kfd.c
new file mode 100644
index 0000000..7c7f808
--- /dev/null
+++ b/drivers/gpu/drm/radeon/radeon_kfd.c
@@ -0,0 +1,94 @@
+/*
+ * Copyright 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#include <linux/module.h>
+#include <linux/radeon_kfd.h>
+#include <drm/drmP.h>
+#include "radeon.h"
+
+static const struct kfd2kgd_calls kfd2kgd = {
+};
+
+static const struct kgd2kfd_calls *kgd2kfd;
+
+bool radeon_kfd_init(void)
+{
+	bool (*kgd2kfd_init_p)(unsigned, const struct kfd2kgd_calls*,
+				const struct kgd2kfd_calls**);
+
+	kgd2kfd_init_p = symbol_request(kgd2kfd_init);
+
+	if (kgd2kfd_init_p == NULL)
+		return false;
+
+	if (!kgd2kfd_init_p(KFD_INTERFACE_VERSION, &kfd2kgd, &kgd2kfd)) {
+		symbol_put(kgd2kfd_init);
+		kgd2kfd = NULL;
+
+		return false;
+	}
+
+	return true;
+}
+
+void radeon_kfd_fini(void)
+{
+	if (kgd2kfd) {
+		kgd2kfd->exit();
+		symbol_put(kgd2kfd_init);
+	}
+}
+
+void radeon_kfd_device_probe(struct radeon_device *rdev)
+{
+	if (kgd2kfd)
+		rdev->kfd = kgd2kfd->probe((struct kgd_dev *)rdev, rdev->pdev);
+}
+
+void radeon_kfd_device_init(struct radeon_device *rdev)
+{
+	if (rdev->kfd) {
+		struct kgd2kfd_shared_resources gpu_resources = {
+			.mmio_registers = rdev->rmmio,
+
+			.compute_vmid_bitmap = 0xFF00,
+
+			.first_compute_pipe = 1,
+			.compute_pipe_count = 8 - 1,
+		};
+
+		radeon_doorbell_get_kfd_info(rdev,
+				&gpu_resources.doorbell_physical_address,
+				&gpu_resources.doorbell_aperture_size,
+				&gpu_resources.doorbell_start_offset);
+
+		kgd2kfd->device_init(rdev->kfd, &gpu_resources);
+	}
+}
+
+void radeon_kfd_device_fini(struct radeon_device *rdev)
+{
+	if (rdev->kfd) {
+		kgd2kfd->device_exit(rdev->kfd);
+		rdev->kfd = NULL;
+	}
+}
diff --git a/include/linux/radeon_kfd.h b/include/linux/radeon_kfd.h
new file mode 100644
index 0000000..59785e9
--- /dev/null
+++ b/include/linux/radeon_kfd.h
@@ -0,0 +1,67 @@
+/*
+ * Copyright 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+/*
+ * radeon_kfd.h defines the private interface between the
+ * AMD kernel graphics drivers and the AMD radeon KFD.
+ */
+
+#ifndef RADEON_KFD_H_INCLUDED
+#define RADEON_KFD_H_INCLUDED
+
+#include <linux/types.h>
+struct pci_dev;
+
+#define KFD_INTERFACE_VERSION 1
+
+struct kfd_dev;
+struct kgd_dev;
+
+struct kgd2kfd_shared_resources {
+	void __iomem *mmio_registers; /* Mapped pointer to GFX MMIO registers. */
+
+	unsigned int compute_vmid_bitmap; /* Bit n == 1 means VMID n is available for KFD. */
+
+	unsigned int first_compute_pipe; /* Compute pipes are counted starting from MEC0/pipe0 as 0. */
+	unsigned int compute_pipe_count; /* Number of MEC pipes available for KFD. */
+
+	phys_addr_t doorbell_physical_address; /* Base address of doorbell aperture. */
+	size_t doorbell_aperture_size; /* Size in bytes of doorbell aperture. */
+	size_t doorbell_start_offset; /* Number of bytes at start of aperture reserved for KGD. */
+};
+
+struct kgd2kfd_calls {
+	void (*exit)(void);
+	struct kfd_dev* (*probe)(struct kgd_dev *kgd, struct pci_dev *pdev);
+	bool (*device_init)(struct kfd_dev *kfd, const struct kgd2kfd_shared_resources *gpu_resources);
+	void (*device_exit)(struct kfd_dev *kfd);
+};
+
+struct kfd2kgd_calls {
+};
+
+bool kgd2kfd_init(unsigned interface_version,
+		  const struct kfd2kgd_calls *f2g,
+		  const struct kgd2kfd_calls **g2f);
+
+#endif
+
-- 
1.9.1



* [PATCH 05/83] drm/radeon: Add kfd-->kgd interface to get virtual ram size
  2014-07-10 21:50 [PATCH 02/83] drm/radeon: reduce number of free VMIDs and pipes in KV Oded Gabbay
  2014-07-10 21:50 ` [PATCH 03/83] drm/radeon: Report doorbell configuration to kfd Oded Gabbay
  2014-07-10 21:50 ` [PATCH 04/83] drm/radeon: Add radeon <--> kfd interface Oded Gabbay
@ 2014-07-10 21:50 ` Oded Gabbay
  2014-07-11 16:27     ` Jerome Glisse
  2014-07-10 21:50 ` [PATCH 06/83] drm/radeon: Add kfd-->kgd interfaces of memory allocation/mapping Oded Gabbay
                   ` (23 subsequent siblings)
  26 siblings, 1 reply; 116+ messages in thread
From: Oded Gabbay @ 2014-07-10 21:50 UTC (permalink / raw)
  To: David Airlie, Alex Deucher, Jerome Glisse
  Cc: linux-kernel, dri-devel, John Bridgman, Andrew Lewycky,
	Joerg Roedel, Oded Gabbay, Christian König

This patch adds a new interface to kfd2kgd_calls structure so that
the kfd driver could get the virtual ram size of a specific
radeon device.

Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
---
 drivers/gpu/drm/radeon/radeon_kfd.c | 12 ++++++++++++
 include/linux/radeon_kfd.h          |  1 +
 2 files changed, 13 insertions(+)

diff --git a/drivers/gpu/drm/radeon/radeon_kfd.c b/drivers/gpu/drm/radeon/radeon_kfd.c
index 7c7f808..1b859b5 100644
--- a/drivers/gpu/drm/radeon/radeon_kfd.c
+++ b/drivers/gpu/drm/radeon/radeon_kfd.c
@@ -25,7 +25,10 @@
 #include <drm/drmP.h>
 #include "radeon.h"
 
+static uint64_t get_vmem_size(struct kgd_dev *kgd);
+
 static const struct kfd2kgd_calls kfd2kgd = {
+	.get_vmem_size = get_vmem_size,
 };
 
 static const struct kgd2kfd_calls *kgd2kfd;
@@ -92,3 +95,12 @@ void radeon_kfd_device_fini(struct radeon_device *rdev)
 		rdev->kfd = NULL;
 	}
 }
+
+static uint64_t get_vmem_size(struct kgd_dev *kgd)
+{
+	struct radeon_device *rdev = (struct radeon_device *)kgd;
+
+	BUG_ON(kgd == NULL);
+
+	return rdev->mc.real_vram_size;
+}
diff --git a/include/linux/radeon_kfd.h b/include/linux/radeon_kfd.h
index 59785e9..28cddf5 100644
--- a/include/linux/radeon_kfd.h
+++ b/include/linux/radeon_kfd.h
@@ -57,6 +57,7 @@ struct kgd2kfd_calls {
 };
 
 struct kfd2kgd_calls {
+	uint64_t (*get_vmem_size)(struct kgd_dev *kgd);
 };
 
 bool kgd2kfd_init(unsigned interface_version,
-- 
1.9.1



* [PATCH 06/83] drm/radeon: Add kfd-->kgd interfaces of memory allocation/mapping
  2014-07-10 21:50 [PATCH 02/83] drm/radeon: reduce number of free VMIDs and pipes in KV Oded Gabbay
                   ` (2 preceding siblings ...)
  2014-07-10 21:50 ` [PATCH 05/83] drm/radeon: Add kfd-->kgd interface to get virtual ram size Oded Gabbay
@ 2014-07-10 21:50 ` Oded Gabbay
  2014-07-11 16:32     ` Jerome Glisse
  2014-07-10 21:50 ` [PATCH 07/83] drm/radeon: Add kfd-->kgd interface of locking srbm_gfx_cntl register Oded Gabbay
                   ` (22 subsequent siblings)
  26 siblings, 1 reply; 116+ messages in thread
From: Oded Gabbay @ 2014-07-10 21:50 UTC (permalink / raw)
  To: David Airlie, Alex Deucher, Jerome Glisse
  Cc: linux-kernel, dri-devel, John Bridgman, Andrew Lewycky,
	Joerg Roedel, Oded Gabbay, Christian König

This patch adds new interfaces to kfd2kgd_calls structure.

The new interfaces allow the kfd driver to:

1. Allocate video memory through the radeon driver
2. Map and unmap video memory with GPUVM through the radeon driver
3. Map and unmap system memory with GPUVM through the radeon driver
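
The intended call order can be sketched with a small userspace mock.
Everything prefixed demo_ is hypothetical and stands in for the real
radeon_bo calls; the two flags only track state so the ordering is
visible. The assert in demo_free() encodes the assumption noted in the
patch that KFD never frees memory that is still gpumapped or kmapped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

enum demo_pool {
	DEMO_POOL_SYSTEM_CACHEABLE = 1,
	DEMO_POOL_SYSTEM_WRITECOMBINE = 2,
	DEMO_POOL_FRAMEBUFFER = 3,
};

/* Stand-ins for RADEON_GEM_DOMAIN_GTT / RADEON_GEM_DOMAIN_VRAM. */
enum demo_domain { DEMO_DOMAIN_GTT, DEMO_DOMAIN_VRAM };

struct demo_mem {
	enum demo_domain domain;
	bool gpumapped; /* pinned and visible to the GPU */
	bool kmapped;   /* mapped into the kernel address space */
};

/* Same policy as pool_to_domain(): only the framebuffer pool is VRAM. */
static enum demo_domain demo_pool_to_domain(enum demo_pool p)
{
	return p == DEMO_POOL_FRAMEBUFFER ? DEMO_DOMAIN_VRAM : DEMO_DOMAIN_GTT;
}

static struct demo_mem *demo_allocate(enum demo_pool pool)
{
	struct demo_mem *mem = calloc(1, sizeof(*mem));

	if (mem)
		mem->domain = demo_pool_to_domain(pool);
	return mem;
}

static void demo_gpumap(struct demo_mem *mem)   { mem->gpumapped = true; }
static void demo_ungpumap(struct demo_mem *mem) { mem->gpumapped = false; }
static void demo_kmap(struct demo_mem *mem)     { mem->kmapped = true; }
static void demo_unkmap(struct demo_mem *mem)   { mem->kmapped = false; }

static void demo_free(struct demo_mem *mem)
{
	/* Assumption from the patch: never free mapped memory. */
	assert(!mem->gpumapped && !mem->kmapped);
	free(mem);
}
```

A caller would go through allocate, gpumap, kmap, then undo the steps in
reverse order (unkmap, ungpumap, free).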

Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
---
 drivers/gpu/drm/radeon/radeon_kfd.c | 129 ++++++++++++++++++++++++++++++++++++
 include/linux/radeon_kfd.h          |  23 +++++++
 2 files changed, 152 insertions(+)

diff --git a/drivers/gpu/drm/radeon/radeon_kfd.c b/drivers/gpu/drm/radeon/radeon_kfd.c
index 1b859b5..66ee36b 100644
--- a/drivers/gpu/drm/radeon/radeon_kfd.c
+++ b/drivers/gpu/drm/radeon/radeon_kfd.c
@@ -25,9 +25,31 @@
 #include <drm/drmP.h>
 #include "radeon.h"
 
+struct kgd_mem {
+	struct radeon_bo *bo;
+	u32 domain;
+};
+
+static int allocate_mem(struct kgd_dev *kgd, size_t size, size_t alignment,
+		enum kgd_memory_pool pool, struct kgd_mem **memory_handle);
+
+static void free_mem(struct kgd_dev *kgd, struct kgd_mem *memory_handle);
+
+static int gpumap_mem(struct kgd_dev *kgd, struct kgd_mem *mem, uint64_t *vmid0_address);
+static void ungpumap_mem(struct kgd_dev *kgd, struct kgd_mem *mem);
+
+static int kmap_mem(struct kgd_dev *kgd, struct kgd_mem *mem, void **ptr);
+static void unkmap_mem(struct kgd_dev *kgd, struct kgd_mem *mem);
+
 static uint64_t get_vmem_size(struct kgd_dev *kgd);
 
 static const struct kfd2kgd_calls kfd2kgd = {
+	.allocate_mem = allocate_mem,
+	.free_mem = free_mem,
+	.gpumap_mem = gpumap_mem,
+	.ungpumap_mem = ungpumap_mem,
+	.kmap_mem = kmap_mem,
+	.unkmap_mem = unkmap_mem,
 	.get_vmem_size = get_vmem_size,
 };
 
@@ -96,6 +118,113 @@ void radeon_kfd_device_fini(struct radeon_device *rdev)
 	}
 }
 
+static u32 pool_to_domain(enum kgd_memory_pool p)
+{
+	switch (p) {
+	case KGD_POOL_FRAMEBUFFER: return RADEON_GEM_DOMAIN_VRAM;
+	default: return RADEON_GEM_DOMAIN_GTT;
+	}
+}
+
+static int allocate_mem(struct kgd_dev *kgd, size_t size, size_t alignment,
+		enum kgd_memory_pool pool, struct kgd_mem **memory_handle)
+{
+	struct radeon_device *rdev = (struct radeon_device *)kgd;
+	struct kgd_mem *mem;
+	int r;
+
+	mem = kzalloc(sizeof(struct kgd_mem), GFP_KERNEL);
+	if (!mem)
+		return -ENOMEM;
+
+	mem->domain = pool_to_domain(pool);
+
+	r = radeon_bo_create(rdev, size, alignment, true, mem->domain, NULL, &mem->bo);
+	if (r) {
+		kfree(mem);
+		return r;
+	}
+
+	*memory_handle = mem;
+	return 0;
+}
+
+static void free_mem(struct kgd_dev *kgd, struct kgd_mem *mem)
+{
+	/* Assume that KFD will never free gpumapped or kmapped memory. This is not quite settled. */
+	radeon_bo_unref(&mem->bo);
+	kfree(mem);
+}
+
+static int gpumap_mem(struct kgd_dev *kgd, struct kgd_mem *mem, uint64_t *vmid0_address)
+{
+	int r;
+
+	r = radeon_bo_reserve(mem->bo, true);
+
+	/*
+	 * ttm_bo_reserve can only fail if the buffer reservation lock
+	 * is held in circumstances that would deadlock
+	 */
+	BUG_ON(r != 0);
+	r = radeon_bo_pin(mem->bo, mem->domain, vmid0_address);
+	radeon_bo_unreserve(mem->bo);
+
+	return r;
+}
+
+static void ungpumap_mem(struct kgd_dev *kgd, struct kgd_mem *mem)
+{
+	int r;
+
+	r = radeon_bo_reserve(mem->bo, true);
+
+	/*
+	 * ttm_bo_reserve can only fail if the buffer reservation lock
+	 * is held in circumstances that would deadlock
+	 */
+	BUG_ON(r != 0);
+	r = radeon_bo_unpin(mem->bo);
+
+	/*
+	 * This unpin only removed NO_EVICT placement flags
+	 * and should never fail
+	 */
+	BUG_ON(r != 0);
+	radeon_bo_unreserve(mem->bo);
+}
+
+static int kmap_mem(struct kgd_dev *kgd, struct kgd_mem *mem, void **ptr)
+{
+	int r;
+
+	r = radeon_bo_reserve(mem->bo, true);
+
+	/*
+	 * ttm_bo_reserve can only fail if the buffer reservation lock
+	 * is held in circumstances that would deadlock
+	 */
+	BUG_ON(r != 0);
+	r = radeon_bo_kmap(mem->bo, ptr);
+	radeon_bo_unreserve(mem->bo);
+
+	return r;
+}
+
+static void unkmap_mem(struct kgd_dev *kgd, struct kgd_mem *mem)
+{
+	int r;
+
+	r = radeon_bo_reserve(mem->bo, true);
+	/*
+	 * ttm_bo_reserve can only fail if the buffer reservation lock
+	 * is held in circumstances that would deadlock
+	 */
+	BUG_ON(r != 0);
+	radeon_bo_kunmap(mem->bo);
+	radeon_bo_unreserve(mem->bo);
+}
+
 static uint64_t get_vmem_size(struct kgd_dev *kgd)
 {
 	struct radeon_device *rdev = (struct radeon_device *)kgd;
diff --git a/include/linux/radeon_kfd.h b/include/linux/radeon_kfd.h
index 28cddf5..c7997d4 100644
--- a/include/linux/radeon_kfd.h
+++ b/include/linux/radeon_kfd.h
@@ -36,6 +36,14 @@ struct pci_dev;
 struct kfd_dev;
 struct kgd_dev;
 
+struct kgd_mem;
+
+enum kgd_memory_pool {
+	KGD_POOL_SYSTEM_CACHEABLE = 1,
+	KGD_POOL_SYSTEM_WRITECOMBINE = 2,
+	KGD_POOL_FRAMEBUFFER = 3,
+};
+
 struct kgd2kfd_shared_resources {
 	void __iomem *mmio_registers; /* Mapped pointer to GFX MMIO registers. */
 
@@ -57,6 +65,21 @@ struct kgd2kfd_calls {
 };
 
 struct kfd2kgd_calls {
+	/* Memory management. */
+	int (*allocate_mem)(struct kgd_dev *kgd,
+				size_t size,
+				size_t alignment,
+				enum kgd_memory_pool pool,
+				struct kgd_mem **memory_handle);
+
+	void (*free_mem)(struct kgd_dev *kgd, struct kgd_mem *memory_handle);
+
+	int (*gpumap_mem)(struct kgd_dev *kgd, struct kgd_mem *mem, uint64_t *vmid0_address);
+	void (*ungpumap_mem)(struct kgd_dev *kgd, struct kgd_mem *mem);
+
+	int (*kmap_mem)(struct kgd_dev *kgd, struct kgd_mem *mem, void **ptr);
+	void (*unkmap_mem)(struct kgd_dev *kgd, struct kgd_mem *mem);
+
 	uint64_t (*get_vmem_size)(struct kgd_dev *kgd);
 };
 
-- 
1.9.1



* [PATCH 07/83] drm/radeon: Add kfd-->kgd interface of locking srbm_gfx_cntl register
  2014-07-10 21:50 [PATCH 02/83] drm/radeon: reduce number of free VMIDs and pipes in KV Oded Gabbay
                   ` (3 preceding siblings ...)
  2014-07-10 21:50 ` [PATCH 06/83] drm/radeon: Add kfd-->kgd interfaces of memory allocation/mapping Oded Gabbay
@ 2014-07-10 21:50 ` Oded Gabbay
  2014-07-11 16:34     ` Jerome Glisse
  2014-07-10 21:50 ` [PATCH 08/83] drm/radeon: Add calls to initialize and finalize kfd from radeon Oded Gabbay
                   ` (21 subsequent siblings)
  26 siblings, 1 reply; 116+ messages in thread
From: Oded Gabbay @ 2014-07-10 21:50 UTC (permalink / raw)
  To: David Airlie, Alex Deucher, Jerome Glisse
  Cc: linux-kernel, dri-devel, John Bridgman, Andrew Lewycky,
	Joerg Roedel, Oded Gabbay, Christian König

This patch adds a new interface to the kfd2kgd_calls structure, which
allows the kfd to lock and unlock the srbm_gfx_cntl register.
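
One way a caller might bracket banked register access with these two
callbacks is sketched below. This is a single-threaded userspace mock:
a flag stands in for srbm_mutex, and every demo_ name is hypothetical.
It only shows the ordering (lock, select, program, deselect, unlock),
not real locking.

```c
#include <assert.h>
#include <stdbool.h>

struct demo_dev {
	bool srbm_locked;           /* stand-in for rdev->srbm_mutex */
	unsigned int selected_pipe; /* stand-in for the SRBM_GFX_CNTL selection */
	unsigned int writes;        /* counts mock banked register writes */
};

static void demo_lock_srbm(struct demo_dev *dev)
{
	assert(!dev->srbm_locked);
	dev->srbm_locked = true;
}

static void demo_unlock_srbm(struct demo_dev *dev)
{
	assert(dev->srbm_locked);
	dev->srbm_locked = false;
}

/* Changing the selection is only legal while the lock is held. */
static void demo_srbm_select(struct demo_dev *dev, unsigned int pipe)
{
	assert(dev->srbm_locked);
	dev->selected_pipe = pipe;
}

static void demo_program_pipe(struct demo_dev *dev, unsigned int pipe)
{
	demo_lock_srbm(dev);
	demo_srbm_select(dev, pipe);
	dev->writes++;             /* banked register writes would go here */
	demo_srbm_select(dev, 0);  /* restore the default selection */
	demo_unlock_srbm(dev);
}
```

Restoring the selection to 0 before unlocking follows the pattern used
around cik_srbm_select() in cik.c.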

Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
---
 drivers/gpu/drm/radeon/radeon_kfd.c | 20 ++++++++++++++++++++
 include/linux/radeon_kfd.h          |  4 ++++
 2 files changed, 24 insertions(+)

diff --git a/drivers/gpu/drm/radeon/radeon_kfd.c b/drivers/gpu/drm/radeon/radeon_kfd.c
index 66ee36b..594020e 100644
--- a/drivers/gpu/drm/radeon/radeon_kfd.c
+++ b/drivers/gpu/drm/radeon/radeon_kfd.c
@@ -43,6 +43,10 @@ static void unkmap_mem(struct kgd_dev *kgd, struct kgd_mem *mem);
 
 static uint64_t get_vmem_size(struct kgd_dev *kgd);
 
+static void lock_srbm_gfx_cntl(struct kgd_dev *kgd);
+static void unlock_srbm_gfx_cntl(struct kgd_dev *kgd);
+
+
 static const struct kfd2kgd_calls kfd2kgd = {
 	.allocate_mem = allocate_mem,
 	.free_mem = free_mem,
@@ -51,6 +55,8 @@ static const struct kfd2kgd_calls kfd2kgd = {
 	.kmap_mem = kmap_mem,
 	.unkmap_mem = unkmap_mem,
 	.get_vmem_size = get_vmem_size,
+	.lock_srbm_gfx_cntl = lock_srbm_gfx_cntl,
+	.unlock_srbm_gfx_cntl = unlock_srbm_gfx_cntl,
 };
 
 static const struct kgd2kfd_calls *kgd2kfd;
@@ -233,3 +239,17 @@ static uint64_t get_vmem_size(struct kgd_dev *kgd)
 
 	return rdev->mc.real_vram_size;
 }
+
+static void lock_srbm_gfx_cntl(struct kgd_dev *kgd)
+{
+	struct radeon_device *rdev = (struct radeon_device *)kgd;
+
+	mutex_lock(&rdev->srbm_mutex);
+}
+
+static void unlock_srbm_gfx_cntl(struct kgd_dev *kgd)
+{
+	struct radeon_device *rdev = (struct radeon_device *)kgd;
+
+	mutex_unlock(&rdev->srbm_mutex);
+}
diff --git a/include/linux/radeon_kfd.h b/include/linux/radeon_kfd.h
index c7997d4..40b691c 100644
--- a/include/linux/radeon_kfd.h
+++ b/include/linux/radeon_kfd.h
@@ -81,6 +81,10 @@ struct kfd2kgd_calls {
 	void (*unkmap_mem)(struct kgd_dev *kgd, struct kgd_mem *mem);
 
 	uint64_t (*get_vmem_size)(struct kgd_dev *kgd);
+
+	/* SRBM_GFX_CNTL mutex */
+	void (*lock_srbm_gfx_cntl)(struct kgd_dev *kgd);
+	void (*unlock_srbm_gfx_cntl)(struct kgd_dev *kgd);
 };
 
 bool kgd2kfd_init(unsigned interface_version,
-- 
1.9.1



* [PATCH 08/83] drm/radeon: Add calls to initialize and finalize kfd from radeon
  2014-07-10 21:50 [PATCH 02/83] drm/radeon: reduce number of free VMIDs and pipes in KV Oded Gabbay
                   ` (4 preceding siblings ...)
  2014-07-10 21:50 ` [PATCH 07/83] drm/radeon: Add kfd-->kgd interface of locking srbm_gfx_cntl register Oded Gabbay
@ 2014-07-10 21:50 ` Oded Gabbay
  2014-07-11 16:36     ` Jerome Glisse
  2014-07-10 21:50 ` [PATCH 09/83] hsa/radeon: Add code base of hsa driver for AMD's GPUs Oded Gabbay
                   ` (20 subsequent siblings)
  26 siblings, 1 reply; 116+ messages in thread
From: Oded Gabbay @ 2014-07-10 21:50 UTC (permalink / raw)
  To: David Airlie, Alex Deucher, Jerome Glisse
  Cc: linux-kernel, dri-devel, John Bridgman, Andrew Lewycky,
	Joerg Roedel, Oded Gabbay, Christian König

The KFD driver should be loaded when the radeon driver is loaded and
should be finalized when the radeon driver is removed.

This patch adds a function call to initialize kfd from radeon_init
and a function call to finalize kfd from radeon_exit.

If the KFD driver is not present in the system, the initialize call
fails and the radeon driver continues normally.

This patch also adds calls to probe, initialize and finalize a kfd device
per radeon device using the kgd-->kfd interface.
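
The "optional KFD" behaviour at the device level can be sketched as
follows. This is an illustrative userspace mock with hypothetical demo_
names: probe only does something when the module-level interface is
present, and init and fini only act when probe returned a device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_kfd_dev { bool initialized; };
struct demo_rdev { struct demo_kfd_dev *kfd; };

/* Set when the module-level init found the kfd driver. */
static bool demo_kfd_present;
static struct demo_kfd_dev demo_kfd_instance;

static void demo_device_probe(struct demo_rdev *rdev)
{
	assert(rdev != NULL);
	if (demo_kfd_present)
		rdev->kfd = &demo_kfd_instance;
}

static void demo_device_init(struct demo_rdev *rdev)
{
	assert(rdev != NULL);
	if (rdev->kfd)
		rdev->kfd->initialized = true;
}

static void demo_device_fini(struct demo_rdev *rdev)
{
	assert(rdev != NULL);
	if (rdev->kfd) {
		rdev->kfd->initialized = false;
		rdev->kfd = NULL;
	}
}
```

When demo_kfd_present is false, all three calls are no-ops and the
device loads as it did before this series.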

Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
---
 drivers/gpu/drm/radeon/radeon_drv.c | 6 ++++++
 drivers/gpu/drm/radeon/radeon_kms.c | 9 +++++++++
 2 files changed, 15 insertions(+)

diff --git a/drivers/gpu/drm/radeon/radeon_drv.c b/drivers/gpu/drm/radeon/radeon_drv.c
index cb14213..88a45a0 100644
--- a/drivers/gpu/drm/radeon/radeon_drv.c
+++ b/drivers/gpu/drm/radeon/radeon_drv.c
@@ -151,6 +151,9 @@ static inline void radeon_register_atpx_handler(void) {}
 static inline void radeon_unregister_atpx_handler(void) {}
 #endif
 
+extern bool radeon_kfd_init(void);
+extern void radeon_kfd_fini(void);
+
 int radeon_no_wb;
 int radeon_modeset = -1;
 int radeon_dynclks = -1;
@@ -630,12 +633,15 @@ static int __init radeon_init(void)
 #endif
 	}
 
+	radeon_kfd_init();
+
 	/* let modprobe override vga console setting */
 	return drm_pci_init(driver, pdriver);
 }
 
 static void __exit radeon_exit(void)
 {
+	radeon_kfd_fini();
 	drm_pci_exit(driver, pdriver);
 	radeon_unregister_atpx_handler();
 }
diff --git a/drivers/gpu/drm/radeon/radeon_kms.c b/drivers/gpu/drm/radeon/radeon_kms.c
index 35d9318..0748284 100644
--- a/drivers/gpu/drm/radeon/radeon_kms.c
+++ b/drivers/gpu/drm/radeon/radeon_kms.c
@@ -34,6 +34,10 @@
 #include <linux/slab.h>
 #include <linux/pm_runtime.h>
 
+extern void radeon_kfd_device_probe(struct radeon_device *rdev);
+extern void radeon_kfd_device_init(struct radeon_device *rdev);
+extern void radeon_kfd_device_fini(struct radeon_device *rdev);
+
 #if defined(CONFIG_VGA_SWITCHEROO)
 bool radeon_has_atpx(void);
 #else
@@ -63,6 +67,8 @@ int radeon_driver_unload_kms(struct drm_device *dev)
 
 	pm_runtime_get_sync(dev->dev);
 
+	radeon_kfd_device_fini(rdev);
+
 	radeon_acpi_fini(rdev);
 	
 	radeon_modeset_fini(rdev);
@@ -142,6 +148,9 @@ int radeon_driver_load_kms(struct drm_device *dev, unsigned long flags)
 				"Error during ACPI methods call\n");
 	}
 
+	radeon_kfd_device_probe(rdev);
+	radeon_kfd_device_init(rdev);
+
 	if (radeon_is_px(dev)) {
 		pm_runtime_use_autosuspend(dev->dev);
 		pm_runtime_set_autosuspend_delay(dev->dev, 5000);
-- 
1.9.1



* [PATCH 09/83] hsa/radeon: Add code base of hsa driver for AMD's GPUs
  2014-07-10 21:50 [PATCH 02/83] drm/radeon: reduce number of free VMIDs and pipes in KV Oded Gabbay
                   ` (5 preceding siblings ...)
  2014-07-10 21:50 ` [PATCH 08/83] drm/radeon: Add calls to initialize and finalize kfd from radeon Oded Gabbay
@ 2014-07-10 21:50 ` Oded Gabbay
  2014-07-11 17:04     ` Jerome Glisse
  2014-07-10 21:50 ` [PATCH 10/83] hsa/radeon: Add initialization and unmapping of doorbell aperture Oded Gabbay
                   ` (19 subsequent siblings)
  26 siblings, 1 reply; 116+ messages in thread
From: Oded Gabbay @ 2014-07-10 21:50 UTC (permalink / raw)
  To: David Airlie, Alex Deucher, Jerome Glisse
  Cc: linux-kernel, dri-devel, John Bridgman, Andrew Lewycky,
	Joerg Roedel, Oded Gabbay, Greg Kroah-Hartman, Rafael J. Wysocki,
	Kishon Vijay Abraham I, Sandeep Nair, Kenneth Heitke,
	Srinivas Pandruvada, Santosh Shilimkar, Andreas Noever,
	Lucas Stach, Philipp Zabel

This patch adds the code base of the hsa driver for
AMD's GPUs.

This driver is called kfd.

This initial version supports the first HSA chip, Kaveri.

This driver is located in a new directory structure under drivers/gpu.

Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
---
 drivers/Kconfig                        |    2 +
 drivers/gpu/Makefile                   |    1 +
 drivers/gpu/hsa/Kconfig                |   20 +
 drivers/gpu/hsa/Makefile               |    1 +
 drivers/gpu/hsa/radeon/Makefile        |    8 +
 drivers/gpu/hsa/radeon/kfd_chardev.c   |  133 ++++
 drivers/gpu/hsa/radeon/kfd_crat.h      |  292 ++++++++
 drivers/gpu/hsa/radeon/kfd_device.c    |  162 +++++
 drivers/gpu/hsa/radeon/kfd_module.c    |  117 ++++
 drivers/gpu/hsa/radeon/kfd_pasid.c     |   92 +++
 drivers/gpu/hsa/radeon/kfd_priv.h      |  232 ++++++
 drivers/gpu/hsa/radeon/kfd_process.c   |  400 +++++++++++
 drivers/gpu/hsa/radeon/kfd_scheduler.h |   62 ++
 drivers/gpu/hsa/radeon/kfd_topology.c  | 1201 ++++++++++++++++++++++++++++++++
 drivers/gpu/hsa/radeon/kfd_topology.h  |  168 +++++
 15 files changed, 2891 insertions(+)
 create mode 100644 drivers/gpu/hsa/Kconfig
 create mode 100644 drivers/gpu/hsa/Makefile
 create mode 100644 drivers/gpu/hsa/radeon/Makefile
 create mode 100644 drivers/gpu/hsa/radeon/kfd_chardev.c
 create mode 100644 drivers/gpu/hsa/radeon/kfd_crat.h
 create mode 100644 drivers/gpu/hsa/radeon/kfd_device.c
 create mode 100644 drivers/gpu/hsa/radeon/kfd_module.c
 create mode 100644 drivers/gpu/hsa/radeon/kfd_pasid.c
 create mode 100644 drivers/gpu/hsa/radeon/kfd_priv.h
 create mode 100644 drivers/gpu/hsa/radeon/kfd_process.c
 create mode 100644 drivers/gpu/hsa/radeon/kfd_scheduler.h
 create mode 100644 drivers/gpu/hsa/radeon/kfd_topology.c
 create mode 100644 drivers/gpu/hsa/radeon/kfd_topology.h

diff --git a/drivers/Kconfig b/drivers/Kconfig
index 9b2dcc2..c1ac8f8 100644
--- a/drivers/Kconfig
+++ b/drivers/Kconfig
@@ -178,4 +178,6 @@ source "drivers/mcb/Kconfig"
 
 source "drivers/thunderbolt/Kconfig"
 
+source "drivers/gpu/hsa/Kconfig"
+
 endmenu
diff --git a/drivers/gpu/Makefile b/drivers/gpu/Makefile
index 70da9eb..749a7ea 100644
--- a/drivers/gpu/Makefile
+++ b/drivers/gpu/Makefile
@@ -1,3 +1,4 @@
 obj-y			+= drm/ vga/
 obj-$(CONFIG_TEGRA_HOST1X)	+= host1x/
 obj-$(CONFIG_IMX_IPUV3_CORE)	+= ipu-v3/
+obj-$(CONFIG_HSA)	+= hsa/
\ No newline at end of file
diff --git a/drivers/gpu/hsa/Kconfig b/drivers/gpu/hsa/Kconfig
new file mode 100644
index 0000000..ee7bb28
--- /dev/null
+++ b/drivers/gpu/hsa/Kconfig
@@ -0,0 +1,20 @@
+#
+# Heterogeneous system architecture configuration
+#
+
+menuconfig HSA
+	bool "Heterogeneous System Architecture"
+	default y
+	help
+	  Say Y here if you want Heterogeneous System Architecture support.
+
+if HSA
+
+config HSA_RADEON
+	tristate "HSA kernel driver for AMD Radeon devices"
+	depends on HSA && AMD_IOMMU_V2 && X86_64
+	default m
+	help
+	  Enable this if you want to support HSA on AMD Radeon devices.
+
+endif # HSA
diff --git a/drivers/gpu/hsa/Makefile b/drivers/gpu/hsa/Makefile
new file mode 100644
index 0000000..0951584
--- /dev/null
+++ b/drivers/gpu/hsa/Makefile
@@ -0,0 +1 @@
+obj-$(CONFIG_HSA_RADEON)	+= radeon/
diff --git a/drivers/gpu/hsa/radeon/Makefile b/drivers/gpu/hsa/radeon/Makefile
new file mode 100644
index 0000000..ba16a09
--- /dev/null
+++ b/drivers/gpu/hsa/radeon/Makefile
@@ -0,0 +1,8 @@
+#
+# Makefile for Heterogeneous System Architecture support for AMD Radeon devices
+#
+
+radeon_kfd-y	:= kfd_module.o kfd_device.o kfd_chardev.o \
+		kfd_pasid.o kfd_topology.o kfd_process.o
+
+obj-$(CONFIG_HSA_RADEON)	+= radeon_kfd.o
diff --git a/drivers/gpu/hsa/radeon/kfd_chardev.c b/drivers/gpu/hsa/radeon/kfd_chardev.c
new file mode 100644
index 0000000..7a56a8f
--- /dev/null
+++ b/drivers/gpu/hsa/radeon/kfd_chardev.c
@@ -0,0 +1,133 @@
+/*
+ * Copyright 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#include <linux/device.h>
+#include <linux/export.h>
+#include <linux/err.h>
+#include <linux/fs.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+#include "kfd_priv.h"
+#include "kfd_scheduler.h"
+
+static long kfd_ioctl(struct file *, unsigned int, unsigned long);
+static int kfd_open(struct inode *, struct file *);
+
+static const char kfd_dev_name[] = "kfd";
+
+static const struct file_operations kfd_fops = {
+	.owner = THIS_MODULE,
+	.unlocked_ioctl = kfd_ioctl,
+	.open = kfd_open,
+};
+
+static int kfd_char_dev_major = -1;
+static struct class *kfd_class;
+struct device *kfd_device;
+
+int
+radeon_kfd_chardev_init(void)
+{
+	int err = 0;
+
+	kfd_char_dev_major = register_chrdev(0, kfd_dev_name, &kfd_fops);
+	err = kfd_char_dev_major;
+	if (err < 0)
+		goto err_register_chrdev;
+
+	kfd_class = class_create(THIS_MODULE, kfd_dev_name);
+	err = PTR_ERR(kfd_class);
+	if (IS_ERR(kfd_class))
+		goto err_class_create;
+
+	kfd_device = device_create(kfd_class, NULL, MKDEV(kfd_char_dev_major, 0), NULL, kfd_dev_name);
+	err = PTR_ERR(kfd_device);
+	if (IS_ERR(kfd_device))
+		goto err_device_create;
+
+	return 0;
+
+err_device_create:
+	class_destroy(kfd_class);
+err_class_create:
+	unregister_chrdev(kfd_char_dev_major, kfd_dev_name);
+err_register_chrdev:
+	return err;
+}
+
+void
+radeon_kfd_chardev_exit(void)
+{
+	device_destroy(kfd_class, MKDEV(kfd_char_dev_major, 0));
+	class_destroy(kfd_class);
+	unregister_chrdev(kfd_char_dev_major, kfd_dev_name);
+}
+
+struct device*
+radeon_kfd_chardev(void)
+{
+	return kfd_device;
+}
+
+
+static int
+kfd_open(struct inode *inode, struct file *filep)
+{
+	struct kfd_process *process;
+
+	if (iminor(inode) != 0)
+		return -ENODEV;
+
+	process = radeon_kfd_create_process(current);
+	if (IS_ERR(process))
+		return PTR_ERR(process);
+
+	pr_debug("kfd: process (pasid %d) opened /dev/kfd\n", process->pasid);
+
+	return 0;
+}
+
+
+static long
+kfd_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
+{
+	long err = -EINVAL;
+
+	dev_info(kfd_device,
+		 "ioctl cmd 0x%x (#%d), arg 0x%lx\n",
+		 cmd, _IOC_NR(cmd), arg);
+
+	switch (cmd) {
+	default:
+		dev_err(kfd_device,
+			"unknown ioctl cmd 0x%x, arg 0x%lx\n",
+			cmd, arg);
+		err = -EINVAL;
+		break;
+	}
+
+	if (err < 0)
+		dev_err(kfd_device, "ioctl error %ld\n", err);
+
+	return err;
+}
diff --git a/drivers/gpu/hsa/radeon/kfd_crat.h b/drivers/gpu/hsa/radeon/kfd_crat.h
new file mode 100644
index 0000000..587455d
--- /dev/null
+++ b/drivers/gpu/hsa/radeon/kfd_crat.h
@@ -0,0 +1,292 @@
+/*
+ * Copyright 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#ifndef KFD_CRAT_H_INCLUDED
+#define KFD_CRAT_H_INCLUDED
+
+#include <linux/types.h>
+
+#pragma pack(1)
+
+/*
+ * 4CC signature values for the CRAT and CDIT ACPI tables
+ */
+
+#define CRAT_SIGNATURE	"CRAT"
+#define CDIT_SIGNATURE	"CDIT"
+
+/*
+ * Component Resource Association Table (CRAT)
+ */
+
+#define CRAT_OEMID_LENGTH	6
+#define CRAT_OEMTABLEID_LENGTH	8
+#define CRAT_RESERVED_LENGTH	6
+
+struct crat_header {
+	uint32_t	signature;
+	uint32_t	length;
+	uint8_t		revision;
+	uint8_t		checksum;
+	uint8_t		oem_id[CRAT_OEMID_LENGTH];
+	uint8_t		oem_table_id[CRAT_OEMTABLEID_LENGTH];
+	uint32_t	oem_revision;
+	uint32_t	creator_id;
+	uint32_t	creator_revision;
+	uint32_t	total_entries;
+	uint16_t	num_domains;
+	uint8_t		reserved[CRAT_RESERVED_LENGTH];
+};
+
+/*
+ * The header structure is immediately followed by total_entries of the
+ * data definitions
+ */
+
+/*
+ * The currently defined subtype entries in the CRAT
+ */
+#define CRAT_SUBTYPE_COMPUTEUNIT_AFFINITY	0
+#define CRAT_SUBTYPE_MEMORY_AFFINITY		1
+#define CRAT_SUBTYPE_CACHE_AFFINITY		2
+#define CRAT_SUBTYPE_TLB_AFFINITY		3
+#define CRAT_SUBTYPE_CCOMPUTE_AFFINITY		4
+#define CRAT_SUBTYPE_IOLINK_AFFINITY		5
+#define CRAT_SUBTYPE_MAX			6
+
+#define CRAT_SIBLINGMAP_SIZE	32
+
+/*
+ * ComputeUnit Affinity structure and definitions
+ */
+#define CRAT_CU_FLAGS_ENABLED		0x00000001
+#define CRAT_CU_FLAGS_HOT_PLUGGABLE	0x00000002
+#define CRAT_CU_FLAGS_CPU_PRESENT	0x00000004
+#define CRAT_CU_FLAGS_GPU_PRESENT	0x00000008
+#define CRAT_CU_FLAGS_IOMMU_PRESENT	0x00000010
+#define CRAT_CU_FLAGS_RESERVED		0xffffffe0
+
+#define CRAT_COMPUTEUNIT_RESERVED_LENGTH 4
+
+struct crat_subtype_computeunit {
+	uint8_t		type;
+	uint8_t		length;
+	uint16_t	reserved;
+	uint32_t	flags;
+	uint32_t	proximity_domain;
+	uint32_t	processor_id_low;
+	uint16_t	num_cpu_cores;
+	uint16_t	num_simd_cores;
+	uint16_t	max_waves_simd;
+	uint16_t	io_count;
+	uint16_t	hsa_capability;
+	uint16_t	lds_size_in_kb;
+	uint8_t		wave_front_size;
+	uint8_t		num_banks;
+	uint16_t	micro_engine_id;
+	uint8_t		num_arrays;
+	uint8_t		num_cu_per_array;
+	uint8_t		num_simd_per_cu;
+	uint8_t		max_slots_scatch_cu;
+	uint8_t		reserved2[CRAT_COMPUTEUNIT_RESERVED_LENGTH];
+};
+
+/*
+ * HSA Memory Affinity structure and definitions
+ */
+#define CRAT_MEM_FLAGS_ENABLED		0x00000001
+#define CRAT_MEM_FLAGS_HOT_PLUGGABLE	0x00000002
+#define CRAT_MEM_FLAGS_NON_VOLATILE	0x00000004
+#define CRAT_MEM_FLAGS_RESERVED		0xfffffff8
+
+#define CRAT_MEMORY_RESERVED_LENGTH 8
+
+struct crat_subtype_memory {
+	uint8_t		type;
+	uint8_t		length;
+	uint16_t	reserved;
+	uint32_t	flags;
+	uint32_t	promixity_domain;
+	uint32_t	base_addr_low;
+	uint32_t	base_addr_high;
+	uint32_t	length_low;
+	uint32_t	length_high;
+	uint32_t	width;
+	uint8_t		reserved2[CRAT_MEMORY_RESERVED_LENGTH];
+};
+
+/*
+ * HSA Cache Affinity structure and definitions
+ */
+#define CRAT_CACHE_FLAGS_ENABLED	0x00000001
+#define CRAT_CACHE_FLAGS_DATA_CACHE	0x00000002
+#define CRAT_CACHE_FLAGS_INST_CACHE	0x00000004
+#define CRAT_CACHE_FLAGS_CPU_CACHE	0x00000008
+#define CRAT_CACHE_FLAGS_SIMD_CACHE	0x00000010
+#define CRAT_CACHE_FLAGS_RESERVED	0xffffffe0
+
+#define CRAT_CACHE_RESERVED_LENGTH 8
+
+struct crat_subtype_cache {
+	uint8_t		type;
+	uint8_t		length;
+	uint16_t	reserved;
+	uint32_t	flags;
+	uint32_t	processor_id_low;
+	uint8_t		sibling_map[CRAT_SIBLINGMAP_SIZE];
+	uint32_t	cache_size;
+	uint8_t		cache_level;
+	uint8_t		lines_per_tag;
+	uint16_t	cache_line_size;
+	uint8_t		associativity;
+	uint8_t		cache_properties;
+	uint16_t	cache_latency;
+	uint8_t		reserved2[CRAT_CACHE_RESERVED_LENGTH];
+};
+
+/*
+ * HSA TLB Affinity structure and definitions
+ */
+#define CRAT_TLB_FLAGS_ENABLED	0x00000001
+#define CRAT_TLB_FLAGS_DATA_TLB	0x00000002
+#define CRAT_TLB_FLAGS_INST_TLB	0x00000004
+#define CRAT_TLB_FLAGS_CPU_TLB	0x00000008
+#define CRAT_TLB_FLAGS_SIMD_TLB	0x00000010
+#define CRAT_TLB_FLAGS_RESERVED	0xffffffe0
+
+#define CRAT_TLB_RESERVED_LENGTH 4
+
+struct crat_subtype_tlb {
+	uint8_t		type;
+	uint8_t		length;
+	uint16_t	reserved;
+	uint32_t	flags;
+	uint32_t	processor_id_low;
+	uint8_t		sibling_map[CRAT_SIBLINGMAP_SIZE];
+	uint32_t	tlb_level;
+	uint8_t		data_tlb_associativity_2mb;
+	uint8_t		data_tlb_size_2mb;
+	uint8_t		instruction_tlb_associativity_2mb;
+	uint8_t		instruction_tlb_size_2mb;
+	uint8_t		data_tlb_associativity_4k;
+	uint8_t		data_tlb_size_4k;
+	uint8_t		instruction_tlb_associativity_4k;
+	uint8_t		instruction_tlb_size_4k;
+	uint8_t		data_tlb_associativity_1gb;
+	uint8_t		data_tlb_size_1gb;
+	uint8_t		instruction_tlb_associativity_1gb;
+	uint8_t		instruction_tlb_size_1gb;
+	uint8_t		reserved2[CRAT_TLB_RESERVED_LENGTH];
+};
+
+/*
+ * HSA CCompute/APU Affinity structure and definitions
+ */
+#define CRAT_CCOMPUTE_FLAGS_ENABLED	0x00000001
+#define CRAT_CCOMPUTE_FLAGS_RESERVED	0xfffffffe
+
+#define CRAT_CCOMPUTE_RESERVED_LENGTH 16
+
+struct crat_subtype_ccompute {
+	uint8_t		type;
+	uint8_t		length;
+	uint16_t	reserved;
+	uint32_t	flags;
+	uint32_t	processor_id_low;
+	uint8_t		sibling_map[CRAT_SIBLINGMAP_SIZE];
+	uint32_t	apu_size;
+	uint8_t		reserved2[CRAT_CCOMPUTE_RESERVED_LENGTH];
+};
+
+/*
+ * HSA IO Link Affinity structure and definitions
+ */
+#define CRAT_IOLINK_FLAGS_ENABLED	0x00000001
+#define CRAT_IOLINK_FLAGS_COHERENCY	0x00000002
+#define CRAT_IOLINK_FLAGS_RESERVED	0xfffffffc
+
+/*
+ * IO interface types
+ */
+#define CRAT_IOLINK_TYPE_UNDEFINED	0
+#define CRAT_IOLINK_TYPE_HYPERTRANSPORT	1
+#define CRAT_IOLINK_TYPE_PCIEXPRESS	2
+#define CRAT_IOLINK_TYPE_OTHER		3
+#define CRAT_IOLINK_TYPE_MAX		255
+
+#define CRAT_IOLINK_RESERVED_LENGTH 24
+
+struct crat_subtype_iolink {
+	uint8_t		type;
+	uint8_t		length;
+	uint16_t	reserved;
+	uint32_t	flags;
+	uint32_t	proximity_domain_from;
+	uint32_t	proximity_domain_to;
+	uint8_t		io_interface_type;
+	uint8_t		version_major;
+	uint16_t	version_minor;
+	uint32_t	minimum_latency;
+	uint32_t	maximum_latency;
+	uint32_t	minimum_bandwidth_mbs;
+	uint32_t	maximum_bandwidth_mbs;
+	uint32_t	recommended_transfer_size;
+	uint8_t		reserved2[CRAT_IOLINK_RESERVED_LENGTH];
+};
+
+/*
+ * HSA generic sub-type header
+ */
+
+#define CRAT_SUBTYPE_FLAGS_ENABLED 0x00000001
+
+struct crat_subtype_generic {
+	uint8_t		type;
+	uint8_t		length;
+	uint16_t	reserved;
+	uint32_t	flags;
+};
+
+/*
+ * Component Locality Distance Information Table (CDIT)
+ */
+#define CDIT_OEMID_LENGTH	6
+#define CDIT_OEMTABLEID_LENGTH	8
+
+struct cdit_header {
+	uint32_t	signature;
+	uint32_t	length;
+	uint8_t		revision;
+	uint8_t		checksum;
+	uint8_t		oem_id[CDIT_OEMID_LENGTH];
+	uint8_t		oem_table_id[CDIT_OEMTABLEID_LENGTH];
+	uint32_t	oem_revision;
+	uint32_t	creator_id;
+	uint32_t	creator_revision;
+	uint32_t	total_entries;
+	uint16_t	num_domains;
+	uint8_t		entry[1];
+};
+
+#pragma pack()
+
+#endif /* KFD_CRAT_H_INCLUDED */
diff --git a/drivers/gpu/hsa/radeon/kfd_device.c b/drivers/gpu/hsa/radeon/kfd_device.c
new file mode 100644
index 0000000..d122920
--- /dev/null
+++ b/drivers/gpu/hsa/radeon/kfd_device.c
@@ -0,0 +1,162 @@
+/*
+ * Copyright 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#include <linux/amd-iommu.h>
+#include <linux/bsearch.h>
+#include <linux/pci.h>
+#include <linux/slab.h>
+#include "kfd_priv.h"
+#include "kfd_scheduler.h"
+
+static const struct kfd_device_info bonaire_device_info = {
+	.max_pasid_bits = 16,
+};
+
+struct kfd_deviceid {
+	unsigned short did;
+	const struct kfd_device_info *device_info;
+};
+
+/* Please keep this sorted by increasing device id. */
+static const struct kfd_deviceid supported_devices[] = {
+	{ 0x1305, &bonaire_device_info },	/* Kaveri */
+	{ 0x1307, &bonaire_device_info },	/* Kaveri */
+	{ 0x130F, &bonaire_device_info },	/* Kaveri */
+	{ 0x665C, &bonaire_device_info },	/* Bonaire */
+};
+
+static const struct kfd_device_info *
+lookup_device_info(unsigned short did)
+{
+	size_t i;
+
+	for (i = 0; i < ARRAY_SIZE(supported_devices); i++) {
+		if (supported_devices[i].did == did) {
+			BUG_ON(supported_devices[i].device_info == NULL);
+			return supported_devices[i].device_info;
+		}
+	}
+
+	return NULL;
+}
+
+struct kfd_dev *kgd2kfd_probe(struct kgd_dev *kgd, struct pci_dev *pdev)
+{
+	struct kfd_dev *kfd;
+	const struct kfd_device_info *device_info = lookup_device_info(pdev->device);
+
+	if (!device_info)
+		return NULL;
+
+	kfd = kzalloc(sizeof(*kfd), GFP_KERNEL);
+	if (!kfd)
+		return NULL;
+	kfd->kgd = kgd;
+	kfd->device_info = device_info;
+	kfd->pdev = pdev;
+	return kfd;
+}
+
+static bool
+device_iommu_pasid_init(struct kfd_dev *kfd)
+{
+	const u32 required_iommu_flags = AMD_IOMMU_DEVICE_FLAG_ATS_SUP | AMD_IOMMU_DEVICE_FLAG_PRI_SUP
+					| AMD_IOMMU_DEVICE_FLAG_PASID_SUP;
+
+	struct amd_iommu_device_info iommu_info;
+	pasid_t pasid_limit;
+	int err;
+
+	err = amd_iommu_device_info(kfd->pdev, &iommu_info);
+	if (err < 0)
+		return false;
+
+	if ((iommu_info.flags & required_iommu_flags) != required_iommu_flags)
+		return false;
+
+	pasid_limit = min_t(pasid_t, (pasid_t)1 << kfd->device_info->max_pasid_bits, iommu_info.max_pasids);
+	pasid_limit = min_t(pasid_t, pasid_limit, kfd->doorbell_process_limit);
+
+	err = amd_iommu_init_device(kfd->pdev, pasid_limit);
+	if (err < 0)
+		return false;
+
+	if (!radeon_kfd_set_pasid_limit(pasid_limit)) {
+		amd_iommu_free_device(kfd->pdev);
+		return false;
+	}
+
+	return true;
+}
+
+static void iommu_pasid_shutdown_callback(struct pci_dev *pdev, int pasid)
+{
+	struct kfd_dev *dev = radeon_kfd_device_by_pci_dev(pdev);
+
+	if (dev)
+		radeon_kfd_unbind_process_from_device(dev, pasid);
+}
+
+bool kgd2kfd_device_init(struct kfd_dev *kfd,
+			 const struct kgd2kfd_shared_resources *gpu_resources)
+{
+	kfd->shared_resources = *gpu_resources;
+
+	kfd->regs = gpu_resources->mmio_registers;
+
+	if (!device_iommu_pasid_init(kfd))
+		return false;
+
+	if (kfd_topology_add_device(kfd) != 0) {
+		amd_iommu_free_device(kfd->pdev);
+		return false;
+	}
+
+	amd_iommu_set_invalidate_ctx_cb(kfd->pdev, iommu_pasid_shutdown_callback);
+
+	if (kfd->device_info->scheduler_class->create(kfd, &kfd->scheduler)) {
+		amd_iommu_free_device(kfd->pdev);
+		return false;
+	}
+
+	kfd->device_info->scheduler_class->start(kfd->scheduler);
+
+	kfd->init_complete = true;
+
+	return true;
+}
+
+void kgd2kfd_device_exit(struct kfd_dev *kfd)
+{
+	int err = kfd_topology_remove_device(kfd);
+
+	BUG_ON(err != 0);
+
+	if (kfd->init_complete) {
+		kfd->device_info->scheduler_class->stop(kfd->scheduler);
+		kfd->device_info->scheduler_class->destroy(kfd->scheduler);
+
+		amd_iommu_free_device(kfd->pdev);
+	}
+
+	kfree(kfd);
+}
diff --git a/drivers/gpu/hsa/radeon/kfd_module.c b/drivers/gpu/hsa/radeon/kfd_module.c
new file mode 100644
index 0000000..6978bc0
--- /dev/null
+++ b/drivers/gpu/hsa/radeon/kfd_module.c
@@ -0,0 +1,117 @@
+/*
+ * Copyright 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#include <linux/module.h>
+#include <linux/sched.h>
+#include <linux/notifier.h>
+
+#include "kfd_priv.h"
+
+#define DRIVER_AUTHOR		"Andrew Lewycky, Oded Gabbay, Evgeny Pinchuk, others."
+
+#define DRIVER_NAME		"kfd"
+#define DRIVER_DESC		"AMD HSA Kernel Fusion Driver"
+#define DRIVER_DATE		"20140127"
+
+const struct kfd2kgd_calls *kfd2kgd;
+static const struct kgd2kfd_calls kgd2kfd = {
+	.exit		= kgd2kfd_exit,
+	.probe		= kgd2kfd_probe,
+	.device_init	= kgd2kfd_device_init,
+	.device_exit	= kgd2kfd_device_exit,
+};
+
+bool kgd2kfd_init(unsigned interface_version,
+		  const struct kfd2kgd_calls *f2g,
+		  const struct kgd2kfd_calls **g2f)
+{
+	/* Only one interface version is supported, no kfd/kgd version skew allowed. */
+	if (interface_version != KFD_INTERFACE_VERSION)
+		return false;
+
+	kfd2kgd = f2g;
+	*g2f = &kgd2kfd;
+
+	return true;
+}
+EXPORT_SYMBOL(kgd2kfd_init);
+
+void kgd2kfd_exit(void)
+{
+}
+
+extern int kfd_process_exit(struct notifier_block *nb,
+				unsigned long action, void *data);
+
+static struct notifier_block kfd_mmput_nb = {
+	.notifier_call		= kfd_process_exit,
+	.priority		= 3,
+};
+
+static int __init kfd_module_init(void)
+{
+	int err;
+
+	err = radeon_kfd_pasid_init();
+	if (err < 0)
+		goto err_pasid;
+
+	err = radeon_kfd_chardev_init();
+	if (err < 0)
+		goto err_ioctl;
+
+	err = mmput_register_notifier(&kfd_mmput_nb);
+	if (err)
+		goto err_mmu_notifier;
+
+	err = kfd_topology_init();
+	if (err < 0)
+		goto err_topology;
+
+	pr_info("[hsa] Initialized kfd module\n");
+
+	return 0;
+err_topology:
+	mmput_unregister_notifier(&kfd_mmput_nb);
+err_mmu_notifier:
+	radeon_kfd_chardev_exit();
+err_ioctl:
+	radeon_kfd_pasid_exit();
+err_pasid:
+	return err;
+}
+
+static void __exit kfd_module_exit(void)
+{
+	kfd_topology_shutdown();
+	mmput_unregister_notifier(&kfd_mmput_nb);
+	radeon_kfd_chardev_exit();
+	radeon_kfd_pasid_exit();
+	pr_info("[hsa] Removed kfd module\n");
+}
+
+module_init(kfd_module_init);
+module_exit(kfd_module_exit);
+
+MODULE_AUTHOR(DRIVER_AUTHOR);
+MODULE_DESCRIPTION(DRIVER_DESC);
+MODULE_LICENSE("GPL");
diff --git a/drivers/gpu/hsa/radeon/kfd_pasid.c b/drivers/gpu/hsa/radeon/kfd_pasid.c
new file mode 100644
index 0000000..d78bd00
--- /dev/null
+++ b/drivers/gpu/hsa/radeon/kfd_pasid.c
@@ -0,0 +1,92 @@
+/*
+ * Copyright 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#include <linux/slab.h>
+#include <linux/types.h>
+#include "kfd_priv.h"
+
+#define INITIAL_PASID_LIMIT (1<<20)
+
+static unsigned long *pasid_bitmap;
+static pasid_t pasid_limit;
+static DEFINE_MUTEX(pasid_mutex);
+
+int radeon_kfd_pasid_init(void)
+{
+	pasid_limit = INITIAL_PASID_LIMIT;
+
+	pasid_bitmap = kzalloc(DIV_ROUND_UP(INITIAL_PASID_LIMIT, BITS_PER_BYTE), GFP_KERNEL);
+	if (!pasid_bitmap)
+		return -ENOMEM;
+
+	set_bit(0, pasid_bitmap); /* PASID 0 is reserved. */
+
+	return 0;
+}
+
+void radeon_kfd_pasid_exit(void)
+{
+	kfree(pasid_bitmap);
+}
+
+bool radeon_kfd_set_pasid_limit(pasid_t new_limit)
+{
+	if (new_limit < pasid_limit) {
+		bool ok;
+
+		mutex_lock(&pasid_mutex);
+
+		/* ensure that no pasids >= new_limit are in-use */
+		ok = (find_next_bit(pasid_bitmap, pasid_limit, new_limit) == pasid_limit);
+		if (ok)
+			pasid_limit = new_limit;
+
+		mutex_unlock(&pasid_mutex);
+
+		return ok;
+	}
+
+	return true;
+}
+
+pasid_t radeon_kfd_pasid_alloc(void)
+{
+	pasid_t found;
+
+	mutex_lock(&pasid_mutex);
+
+	found = find_first_zero_bit(pasid_bitmap, pasid_limit);
+	if (found == pasid_limit)
+		found = 0;
+	else
+		set_bit(found, pasid_bitmap);
+
+	mutex_unlock(&pasid_mutex);
+
+	return found;
+}
+
+void radeon_kfd_pasid_free(pasid_t pasid)
+{
+	BUG_ON(pasid == 0 || pasid >= pasid_limit);
+	clear_bit(pasid, pasid_bitmap);
+}
diff --git a/drivers/gpu/hsa/radeon/kfd_priv.h b/drivers/gpu/hsa/radeon/kfd_priv.h
new file mode 100644
index 0000000..1d1dbcf
--- /dev/null
+++ b/drivers/gpu/hsa/radeon/kfd_priv.h
@@ -0,0 +1,232 @@
+/*
+ * Copyright 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#ifndef KFD_PRIV_H_INCLUDED
+#define KFD_PRIV_H_INCLUDED
+
+#include <linux/hashtable.h>
+#include <linux/mmu_notifier.h>
+#include <linux/mutex.h>
+#include <linux/radeon_kfd.h>
+#include <linux/types.h>
+
+struct kfd_scheduler_class;
+
+#define MAX_KFD_DEVICES 16	/* Global limit - only MAX_KFD_DEVICES will be supported by KFD. */
+
+/*
+ * Per-process limit. Each process can only
+ * create MAX_PROCESS_QUEUES across all devices
+ */
+#define MAX_PROCESS_QUEUES 1024
+
+#define MAX_DOORBELL_INDEX MAX_PROCESS_QUEUES
+#define KFD_SYSFS_FILE_MODE 0444
+
+/* We multiplex different sorts of mmap-able memory onto /dev/kfd.
+ * We figure out what type of memory the caller wanted by comparing the mmap page offset to known ranges. */
+#define KFD_MMAP_DOORBELL_START	(((1ULL << 32)*1) >> PAGE_SHIFT)
+#define KFD_MMAP_DOORBELL_END	(((1ULL << 32)*2) >> PAGE_SHIFT)
+
+/* GPU ID hash width in bits */
+#define KFD_GPU_ID_HASH_WIDTH 16
+
+/* Macro for allocating structures */
+#define kfd_alloc_struct(ptr_to_struct)	((typeof(ptr_to_struct)) kzalloc(sizeof(*ptr_to_struct), GFP_KERNEL))
+
+/* Large enough to hold the maximum usable pasid + 1.
+ * It must also be able to store the number of doorbells reported by a KFD device. */
+typedef unsigned int pasid_t;
+
+/* Type that represents a HW doorbell slot. */
+typedef u32 doorbell_t;
+
+struct kfd_device_info {
+	const struct kfd_scheduler_class *scheduler_class;
+	unsigned int max_pasid_bits;
+};
+
+struct kfd_dev {
+	struct kgd_dev *kgd;
+
+	const struct kfd_device_info *device_info;
+	struct pci_dev *pdev;
+
+	void __iomem *regs;
+
+	bool init_complete;
+
+	unsigned int id;		/* topology stub index */
+
+	phys_addr_t doorbell_base;	/* Start of actual doorbells used by
+					 * KFD. It is aligned for mapping
+					 * into user mode
+					 */
+	size_t doorbell_id_offset;	/* Doorbell offset (from KFD doorbell
+					 * to HW doorbell, GFX reserved some
+					 * at the start)
+					 */
+	size_t doorbell_process_limit;	/* Number of processes we have doorbell space for. */
+
+	struct kgd2kfd_shared_resources shared_resources;
+
+	struct kfd_scheduler *scheduler;
+};
+
+/* KGD2KFD callbacks */
+void kgd2kfd_exit(void);
+struct kfd_dev *kgd2kfd_probe(struct kgd_dev *kgd, struct pci_dev *pdev);
+bool kgd2kfd_device_init(struct kfd_dev *kfd,
+			 const struct kgd2kfd_shared_resources *gpu_resources);
+void kgd2kfd_device_exit(struct kfd_dev *kfd);
+
+extern const struct kfd2kgd_calls *kfd2kgd;
+
+
+/* KFD2KGD callback wrappers */
+void radeon_kfd_lock_srbm_index(struct kfd_dev *kfd);
+void radeon_kfd_unlock_srbm_index(struct kfd_dev *kfd);
+
+enum kfd_mempool {
+	KFD_MEMPOOL_SYSTEM_CACHEABLE = 1,
+	KFD_MEMPOOL_SYSTEM_WRITECOMBINE = 2,
+	KFD_MEMPOOL_FRAMEBUFFER = 3,
+};
+
+struct kfd_mem_obj_s; /* Dummy struct just to make kfd_mem_obj* a unique pointer type. */
+typedef struct kfd_mem_obj_s *kfd_mem_obj;
+
+int radeon_kfd_vidmem_alloc(struct kfd_dev *kfd, size_t size, size_t alignment,
+				enum kfd_mempool pool, kfd_mem_obj *mem_obj);
+void radeon_kfd_vidmem_free(struct kfd_dev *kfd, kfd_mem_obj mem_obj);
+int radeon_kfd_vidmem_gpumap(struct kfd_dev *kfd, kfd_mem_obj mem_obj, uint64_t *vmid0_address);
+void radeon_kfd_vidmem_ungpumap(struct kfd_dev *kfd, kfd_mem_obj mem_obj);
+int radeon_kfd_vidmem_kmap(struct kfd_dev *kfd, kfd_mem_obj mem_obj, void **ptr);
+void radeon_kfd_vidmem_unkmap(struct kfd_dev *kfd, kfd_mem_obj mem_obj);
+
+/* Character device interface */
+int radeon_kfd_chardev_init(void);
+void radeon_kfd_chardev_exit(void);
+struct device *radeon_kfd_chardev(void);
+
+/* Scheduler */
+struct kfd_scheduler;
+struct kfd_scheduler_process;
+struct kfd_scheduler_queue {
+	uint64_t dummy;
+};
+
+struct kfd_queue {
+	struct kfd_dev *dev;
+
+	/* scheduler_queue must be last. It is variable sized (dev->device_info->scheduler_class->queue_size) */
+	struct kfd_scheduler_queue scheduler_queue;
+};
+
+/* Data that is per-process-per device. */
+struct kfd_process_device {
+	/* List of all per-device data for a process. Starts from kfd_process.per_device_data. */
+	struct list_head per_device_list;
+
+	/* The device that owns this data. */
+	struct kfd_dev *dev;
+
+	/* The user-mode address of the doorbell mapping for this device. */
+	doorbell_t __user *doorbell_mapping;
+
+	/* The number of queues created by this process for this device. */
+	uint32_t queue_count;
+
+	/* Scheduler process data for this device. */
+	struct kfd_scheduler_process *scheduler_process;
+
+	/* Is this process/pasid bound to this device? (amd_iommu_bind_pasid) */
+	bool bound;
+};
+
+/* Process data */
+struct kfd_process {
+	struct list_head processes_list;
+
+	struct mm_struct *mm;
+
+	struct mutex mutex;
+
+	/* In any process, the thread that started main() is the lead thread and outlives the rest.
+	 * It is here because amd_iommu_bind_pasid wants a task_struct. */
+	struct task_struct *lead_thread;
+
+	pasid_t pasid;
+
+	/* List of kfd_process_device structures, one for each device the process is using. */
+	struct list_head per_device_data;
+
+	/* The process's queues. */
+	size_t queue_array_size;
+	struct kfd_queue **queues;	/* Size is queue_array_size, up to MAX_PROCESS_QUEUES. */
+	unsigned long allocated_queue_bitmap[DIV_ROUND_UP(MAX_PROCESS_QUEUES, BITS_PER_LONG)];
+};
+
+struct kfd_process *radeon_kfd_create_process(const struct task_struct *);
+struct kfd_process *radeon_kfd_get_process(const struct task_struct *);
+
+struct kfd_process_device *radeon_kfd_bind_process_to_device(struct kfd_dev *dev, struct kfd_process *p);
+void radeon_kfd_unbind_process_from_device(struct kfd_dev *dev, pasid_t pasid);
+struct kfd_process_device *radeon_kfd_get_process_device_data(struct kfd_dev *dev, struct kfd_process *p);
+
+bool radeon_kfd_allocate_queue_id(struct kfd_process *p, unsigned int *queue_id);
+void radeon_kfd_install_queue(struct kfd_process *p, unsigned int queue_id, struct kfd_queue *queue);
+void radeon_kfd_remove_queue(struct kfd_process *p, unsigned int queue_id);
+struct kfd_queue *radeon_kfd_get_queue(struct kfd_process *p, unsigned int queue_id);
+
+
+/* PASIDs */
+int radeon_kfd_pasid_init(void);
+void radeon_kfd_pasid_exit(void);
+bool radeon_kfd_set_pasid_limit(pasid_t new_limit);
+pasid_t radeon_kfd_pasid_alloc(void);
+void radeon_kfd_pasid_free(pasid_t pasid);
+
+/* Doorbells */
+void radeon_kfd_doorbell_init(struct kfd_dev *kfd);
+int radeon_kfd_doorbell_mmap(struct kfd_process *process, struct vm_area_struct *vma);
+doorbell_t __user *radeon_kfd_get_doorbell(struct file *devkfd, struct kfd_process *process, struct kfd_dev *dev,
+					   unsigned int doorbell_index);
+unsigned int radeon_kfd_queue_id_to_doorbell(struct kfd_dev *kfd, struct kfd_process *process, unsigned int queue_id);
+
+extern struct device *kfd_device;
+
+/* Topology */
+int kfd_topology_init(void);
+void kfd_topology_shutdown(void);
+int kfd_topology_add_device(struct kfd_dev *gpu);
+int kfd_topology_remove_device(struct kfd_dev *gpu);
+struct kfd_dev *radeon_kfd_device_by_id(uint32_t gpu_id);
+struct kfd_dev *radeon_kfd_device_by_pci_dev(const struct pci_dev *pdev);
+
+/* MMIO registers */
+#define WRITE_REG(dev, reg, value) radeon_kfd_write_reg((dev), (reg), (value))
+#define READ_REG(dev, reg) radeon_kfd_read_reg((dev), (reg))
+void radeon_kfd_write_reg(struct kfd_dev *dev, uint32_t reg, uint32_t value);
+uint32_t radeon_kfd_read_reg(struct kfd_dev *dev, uint32_t reg);
+
+#endif
diff --git a/drivers/gpu/hsa/radeon/kfd_process.c b/drivers/gpu/hsa/radeon/kfd_process.c
new file mode 100644
index 0000000..145ee38
--- /dev/null
+++ b/drivers/gpu/hsa/radeon/kfd_process.c
@@ -0,0 +1,400 @@
+/*
+ * Copyright 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#include <linux/mutex.h>
+#include <linux/log2.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/amd-iommu.h>
+#include <linux/notifier.h>
+struct mm_struct;
+
+#include "kfd_priv.h"
+#include "kfd_scheduler.h"
+
+/* Initial size for the array of queues.
+ * The allocated size is doubled each time it is exceeded up to MAX_PROCESS_QUEUES. */
+#define INITIAL_QUEUE_ARRAY_SIZE 16
+
+/* List of struct kfd_process */
+static LIST_HEAD(kfd_processes_list);
+
+static DEFINE_MUTEX(kfd_processes_mutex);
+
+static struct kfd_process *create_process(const struct task_struct *thread);
+
+struct kfd_process*
+radeon_kfd_create_process(const struct task_struct *thread)
+{
+	struct kfd_process *process;
+
+	if (thread->mm == NULL)
+		return ERR_PTR(-EINVAL);
+
+	/* Only the pthreads threading model is supported. */
+	if (thread->group_leader->mm != thread->mm)
+		return ERR_PTR(-EINVAL);
+
+	/*
+	 * Take the kfd processes mutex before creating the process, so that
+	 * two threads of the same process cannot each create a
+	 * kfd_process structure.
+	 */
+	mutex_lock(&kfd_processes_mutex);
+
+	/* A prior open of /dev/kfd could have already created the process. */
+	process = thread->mm->kfd_process;
+	if (process)
+		pr_debug("kfd: process already found\n");
+
+	if (!process)
+		process = create_process(thread);
+
+	mutex_unlock(&kfd_processes_mutex);
+
+	return process;
+}
+
+struct kfd_process*
+radeon_kfd_get_process(const struct task_struct *thread)
+{
+	struct kfd_process *process;
+
+	if (thread->mm == NULL)
+		return ERR_PTR(-EINVAL);
+
+	/* Only the pthreads threading model is supported. */
+	if (thread->group_leader->mm != thread->mm)
+		return ERR_PTR(-EINVAL);
+
+	process = thread->mm->kfd_process;
+
+	return process;
+}
+
+/* Assumes that the kfd_process mutex is held.
+ * (Or that it doesn't need to be held because the process is exiting.)
+ *
+ * dev_filter can be set to only destroy queues for one device.
+ * Otherwise all queues for the process are destroyed.
+ */
+static void
+destroy_queues(struct kfd_process *p, struct kfd_dev *dev_filter)
+{
+	unsigned long queue_id;
+
+	for_each_set_bit(queue_id, p->allocated_queue_bitmap, MAX_PROCESS_QUEUES) {
+
+		struct kfd_queue *queue = radeon_kfd_get_queue(p, queue_id);
+		struct kfd_dev *dev;
+
+		BUG_ON(queue == NULL);
+
+		dev = queue->dev;
+
+		if (!dev_filter || dev == dev_filter) {
+			struct kfd_process_device *pdd = radeon_kfd_get_process_device_data(dev, p);
+
+			BUG_ON(pdd == NULL); /* A queue exists so pdd must. */
+
+			radeon_kfd_remove_queue(p, queue_id);
+			dev->device_info->scheduler_class->destroy_queue(dev->scheduler, &queue->scheduler_queue);
+
+			kfree(queue);
+
+			BUG_ON(pdd->queue_count == 0);
+			BUG_ON(pdd->scheduler_process == NULL);
+
+			if (--pdd->queue_count == 0) {
+				dev->device_info->scheduler_class->deregister_process(dev->scheduler,
+							pdd->scheduler_process);
+				pdd->scheduler_process = NULL;
+			}
+		}
+	}
+}
+
+static void free_process(struct kfd_process *p)
+{
+	struct kfd_process_device *pdd, *temp;
+
+	BUG_ON(p == NULL);
+
+	destroy_queues(p, NULL);
+
+	/* doorbell mappings: automatic */
+
+	list_for_each_entry_safe(pdd, temp, &p->per_device_data, per_device_list) {
+		amd_iommu_unbind_pasid(pdd->dev->pdev, p->pasid);
+		list_del(&pdd->per_device_list);
+		kfree(pdd);
+	}
+
+	radeon_kfd_pasid_free(p->pasid);
+
+	mutex_destroy(&p->mutex);
+
+	kfree(p->queues);
+
+	list_del(&p->processes_list);
+
+	kfree(p);
+}
+
+int kfd_process_exit(struct notifier_block *nb,
+			unsigned long action, void *data)
+{
+	struct mm_struct *mm = data;
+	struct kfd_process *p;
+
+	mutex_lock(&kfd_processes_mutex);
+
+	p = mm->kfd_process;
+	if (p) {
+		free_process(p);
+		mm->kfd_process = NULL;
+	}
+
+	mutex_unlock(&kfd_processes_mutex);
+
+	return 0;
+}
+
+static struct kfd_process *create_process(const struct task_struct *thread)
+{
+	struct kfd_process *process;
+	int err = -ENOMEM;
+
+	process = kzalloc(sizeof(*process), GFP_KERNEL);
+
+	if (!process)
+		return ERR_PTR(err); /* nothing to free; err_alloc would dereference NULL */
+
+	process->queues = kmalloc_array(INITIAL_QUEUE_ARRAY_SIZE, sizeof(process->queues[0]), GFP_KERNEL);
+	if (!process->queues)
+		goto err_alloc;
+
+	process->pasid = radeon_kfd_pasid_alloc();
+	if (process->pasid == 0)
+		goto err_alloc;
+
+	mutex_init(&process->mutex);
+
+	process->mm = thread->mm;
+	thread->mm->kfd_process = process;
+	list_add_tail(&process->processes_list, &kfd_processes_list);
+
+	process->lead_thread = thread->group_leader;
+
+	process->queue_array_size = INITIAL_QUEUE_ARRAY_SIZE;
+
+	INIT_LIST_HEAD(&process->per_device_data);
+
+	return process;
+
+err_alloc:
+	kfree(process->queues);
+	kfree(process);
+	return ERR_PTR(err);
+}
+
+struct kfd_process_device *
+radeon_kfd_get_process_device_data(struct kfd_dev *dev, struct kfd_process *p)
+{
+	struct kfd_process_device *pdd;
+
+	list_for_each_entry(pdd, &p->per_device_data, per_device_list)
+		if (pdd->dev == dev)
+			return pdd;
+
+	pdd = kzalloc(sizeof(*pdd), GFP_KERNEL);
+	if (pdd != NULL) {
+		pdd->dev = dev;
+		list_add(&pdd->per_device_list, &p->per_device_data);
+	}
+
+	return pdd;
+}
+
+/* Direct the IOMMU to bind the process (specifically the pasid->mm) to the device.
+ * Unbinding occurs when the process dies or the device is removed.
+ *
+ * Assumes that the process lock is held.
+ */
+struct kfd_process_device *radeon_kfd_bind_process_to_device(struct kfd_dev *dev, struct kfd_process *p)
+{
+	struct kfd_process_device *pdd = radeon_kfd_get_process_device_data(dev, p);
+	int err;
+
+	if (pdd == NULL)
+		return ERR_PTR(-ENOMEM);
+
+	if (pdd->bound)
+		return pdd;
+
+	err = amd_iommu_bind_pasid(dev->pdev, p->pasid, p->lead_thread);
+	if (err < 0)
+		return ERR_PTR(err);
+
+	pdd->bound = true;
+
+	return pdd;
+}
+
+void radeon_kfd_unbind_process_from_device(struct kfd_dev *dev, pasid_t pasid)
+{
+	struct kfd_process *p;
+	struct kfd_process_device *pdd;
+
+	BUG_ON(dev == NULL);
+
+	mutex_lock(&kfd_processes_mutex);
+
+	list_for_each_entry(p, &kfd_processes_list, processes_list)
+		if (p->pasid == pasid)
+			break;
+
+	mutex_unlock(&kfd_processes_mutex);
+
+	BUG_ON(p->pasid != pasid);
+
+	pdd = radeon_kfd_get_process_device_data(dev, p);
+
+	BUG_ON(pdd == NULL);
+
+	mutex_lock(&p->mutex);
+
+	destroy_queues(p, dev);
+
+	/* All queues just got destroyed so this should be gone. */
+	BUG_ON(pdd->scheduler_process != NULL);
+
+	/*
+	 * Just mark pdd as unbound, because we still need it to call
+	 * amd_iommu_unbind_pasid() when the process exits.
+	 * We don't call amd_iommu_unbind_pasid() here
+	 * because the IOMMU called us.
+	 */
+	pdd->bound = false;
+
+	mutex_unlock(&p->mutex);
+}
+
+/* Ensure that the process's queue array is large enough to hold the queue at queue_id.
+ * Assumes that the process lock is held. */
+static bool ensure_queue_array_size(struct kfd_process *p, unsigned int queue_id)
+{
+	size_t desired_size;
+	struct kfd_queue **new_queues;
+
+	compiletime_assert(INITIAL_QUEUE_ARRAY_SIZE > 0, "INITIAL_QUEUE_ARRAY_SIZE must not be 0");
+	compiletime_assert(INITIAL_QUEUE_ARRAY_SIZE <= MAX_PROCESS_QUEUES,
+			   "INITIAL_QUEUE_ARRAY_SIZE must not exceed MAX_PROCESS_QUEUES");
+	/* Ensure that doubling the current size won't ever overflow. */
+	compiletime_assert(MAX_PROCESS_QUEUES < SIZE_MAX / 2, "MAX_PROCESS_QUEUES must be less than SIZE_MAX/2");
+
+	/*
+	 * These & queue_id < MAX_PROCESS_QUEUES guarantee that
+	 * the desired_size calculation will end up <= MAX_PROCESS_QUEUES
+	 */
+	compiletime_assert(is_power_of_2(INITIAL_QUEUE_ARRAY_SIZE), "INITIAL_QUEUE_ARRAY_SIZE must be power of 2.");
+	compiletime_assert(MAX_PROCESS_QUEUES % INITIAL_QUEUE_ARRAY_SIZE == 0,
+			   "MAX_PROCESS_QUEUES must be multiple of INITIAL_QUEUE_ARRAY_SIZE.");
+	compiletime_assert(is_power_of_2(MAX_PROCESS_QUEUES / INITIAL_QUEUE_ARRAY_SIZE),
+			   "MAX_PROCESS_QUEUES must be a power-of-2 multiple of INITIAL_QUEUE_ARRAY_SIZE.");
+
+	if (queue_id < p->queue_array_size)
+		return true;
+
+	if (queue_id >= MAX_PROCESS_QUEUES)
+		return false;
+
+	desired_size = p->queue_array_size;
+	while (desired_size <= queue_id)
+		desired_size *= 2;
+
+	BUG_ON(desired_size <= queue_id || desired_size > MAX_PROCESS_QUEUES);
+	BUG_ON(desired_size % INITIAL_QUEUE_ARRAY_SIZE != 0 || !is_power_of_2(desired_size / INITIAL_QUEUE_ARRAY_SIZE));
+
+	new_queues = kmalloc_array(desired_size, sizeof(p->queues[0]), GFP_KERNEL);
+	if (!new_queues)
+		return false;
+
+	memcpy(new_queues, p->queues, p->queue_array_size * sizeof(p->queues[0]));
+
+	kfree(p->queues);
+	p->queues = new_queues;
+	p->queue_array_size = desired_size;
+
+	return true;
+}
+
+/* Assumes that the process lock is held. */
+bool radeon_kfd_allocate_queue_id(struct kfd_process *p, unsigned int *queue_id)
+{
+	unsigned int qid = find_first_zero_bit(p->allocated_queue_bitmap, MAX_PROCESS_QUEUES);
+
+	if (qid >= MAX_PROCESS_QUEUES)
+		return false;
+
+	if (!ensure_queue_array_size(p, qid))
+		return false;
+
+	__set_bit(qid, p->allocated_queue_bitmap);
+
+	p->queues[qid] = NULL;
+	*queue_id = qid;
+
+	return true;
+}
+
+/* Install a queue into a previously-allocated queue id.
+ * Assumes that the process lock is held. */
+void radeon_kfd_install_queue(struct kfd_process *p, unsigned int queue_id, struct kfd_queue *queue)
+{
+	BUG_ON(queue_id >= p->queue_array_size); /* Have to call allocate_queue_id before install_queue. */
+	BUG_ON(queue == NULL);
+
+	p->queues[queue_id] = queue;
+}
+
+/* Remove a queue from the open queue list and deallocate the queue id.
+ * This can be called whether or not a queue was installed.
+ * Assumes that the process lock is held. */
+void radeon_kfd_remove_queue(struct kfd_process *p, unsigned int queue_id)
+{
+	BUG_ON(!test_bit(queue_id, p->allocated_queue_bitmap));
+	BUG_ON(queue_id >= p->queue_array_size);
+
+	__clear_bit(queue_id, p->allocated_queue_bitmap);
+}
+
+/* Assumes that the process lock is held. */
+struct kfd_queue *radeon_kfd_get_queue(struct kfd_process *p, unsigned int queue_id)
+{
+	/* test_bit because the contents of unallocated queue slots are undefined.
+	 * Otherwise ensure_queue_array_size would have to clear new entries and
+	 * remove_queue would have to NULL removed queues. */
+	return (queue_id < p->queue_array_size &&
+		test_bit(queue_id, p->allocated_queue_bitmap)) ?
+			p->queues[queue_id] : NULL;
+}
diff --git a/drivers/gpu/hsa/radeon/kfd_scheduler.h b/drivers/gpu/hsa/radeon/kfd_scheduler.h
new file mode 100644
index 0000000..48a032f
--- /dev/null
+++ b/drivers/gpu/hsa/radeon/kfd_scheduler.h
@@ -0,0 +1,62 @@
+/*
+ * Copyright 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#ifndef KFD_SCHEDULER_H_INCLUDED
+#define KFD_SCHEDULER_H_INCLUDED
+
+#include <linux/types.h>
+struct kfd_process;
+
+/* Opaque types for scheduler private data. */
+struct kfd_scheduler;
+struct kfd_scheduler_process;
+struct kfd_scheduler_queue;
+
+struct kfd_scheduler_class {
+	const char *name;
+
+	int (*create)(struct kfd_dev *, struct kfd_scheduler **);
+	void (*destroy)(struct kfd_scheduler *);
+
+	void (*start)(struct kfd_scheduler *);
+	void (*stop)(struct kfd_scheduler *);
+
+	int (*register_process)(struct kfd_scheduler *, struct kfd_process *, struct kfd_scheduler_process **);
+	void (*deregister_process)(struct kfd_scheduler *, struct kfd_scheduler_process *);
+
+	size_t queue_size;
+
+	int (*create_queue)(struct kfd_scheduler *scheduler,
+			    struct kfd_scheduler_process *process,
+			    struct kfd_scheduler_queue *queue,
+			    void __user *ring_address,
+			    uint64_t ring_size,
+			    void __user *rptr_address,
+			    void __user *wptr_address,
+			    unsigned int doorbell);
+
+	void (*destroy_queue)(struct kfd_scheduler *, struct kfd_scheduler_queue *);
+};
+
+extern const struct kfd_scheduler_class radeon_kfd_cik_static_scheduler_class;
+
+#endif
diff --git a/drivers/gpu/hsa/radeon/kfd_topology.c b/drivers/gpu/hsa/radeon/kfd_topology.c
new file mode 100644
index 0000000..6acac25
--- /dev/null
+++ b/drivers/gpu/hsa/radeon/kfd_topology.c
@@ -0,0 +1,1201 @@
+/*
+ * Copyright 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <linux/pci.h>
+#include <linux/errno.h>
+#include <linux/acpi.h>
+#include <linux/hash.h>
+
+#include "kfd_priv.h"
+#include "kfd_crat.h"
+#include "kfd_topology.h"
+
+static struct list_head topology_device_list;
+static int topology_crat_parsed;
+static struct kfd_system_properties sys_props;
+
+static DECLARE_RWSEM(topology_lock);
+
+
+static uint8_t checksum_image(const void *buf, size_t len)
+{
+	const uint8_t *p = buf;
+	uint8_t sum = 0;
+
+	if (!buf)
+		return 0;
+
+	while (len-- > 0)
+		sum += *p++;
+
+	return sum;
+}
+
+struct kfd_dev *radeon_kfd_device_by_id(uint32_t gpu_id)
+{
+	struct kfd_topology_device *top_dev;
+	struct kfd_dev *device = NULL;
+
+	down_read(&topology_lock);
+
+	list_for_each_entry(top_dev, &topology_device_list, list)
+		if (top_dev->gpu_id == gpu_id) {
+			device = top_dev->gpu;
+			break;
+		}
+
+	up_read(&topology_lock);
+
+	return device;
+}
+
+struct kfd_dev *radeon_kfd_device_by_pci_dev(const struct pci_dev *pdev)
+{
+	struct kfd_topology_device *top_dev;
+	struct kfd_dev *device = NULL;
+
+	down_read(&topology_lock);
+
+	list_for_each_entry(top_dev, &topology_device_list, list)
+		if (top_dev->gpu && top_dev->gpu->pdev == pdev) {
+			device = top_dev->gpu;
+			break;
+		}
+
+	up_read(&topology_lock);
+
+	return device;
+}
+
+static int kfd_topology_get_crat_acpi(void *crat_image, size_t *size)
+{
+	struct acpi_table_header *crat_table;
+	acpi_status status;
+
+	if (!size)
+		return -EINVAL;
+
+	/*
+	 * Fetch the CRAT table from ACPI
+	 */
+	status = acpi_get_table(CRAT_SIGNATURE, 0, &crat_table);
+	if (status == AE_NOT_FOUND) {
+		pr_warn("CRAT table not found\n");
+		return -ENODATA;
+	} else if (ACPI_FAILURE(status)) {
+		const char *err = acpi_format_exception(status);
+
+		pr_err("CRAT table error: %s\n", err);
+		return -EINVAL;
+	}
+
+	/*
+	 * Verify the checksum: all bytes of a valid ACPI table,
+	 * including the checksum field itself, sum to zero.
+	 */
+	if (checksum_image(crat_table, crat_table->length) != 0) {
+		pr_err("Bad checksum for the CRAT table\n");
+		return -EINVAL;
+	}
+
+
+	if (*size >= crat_table->length && crat_image != NULL)
+		memcpy(crat_image, crat_table, crat_table->length);
+
+	*size = crat_table->length;
+
+	return 0;
+}
+
+static void kfd_populated_cu_info_cpu(struct kfd_topology_device *dev,
+		struct crat_subtype_computeunit *cu)
+{
+	BUG_ON(!dev);
+	BUG_ON(!cu);
+
+	dev->node_props.cpu_cores_count = cu->num_cpu_cores;
+	dev->node_props.cpu_core_id_base = cu->processor_id_low;
+	if (cu->hsa_capability & CRAT_CU_FLAGS_IOMMU_PRESENT)
+		dev->node_props.capability |= HSA_CAP_ATS_PRESENT;
+
+	pr_info("CU CPU: cores=%d id_base=%d\n", cu->num_cpu_cores,
+			cu->processor_id_low);
+}
+
+static void kfd_populated_cu_info_gpu(struct kfd_topology_device *dev,
+		struct crat_subtype_computeunit *cu)
+{
+	BUG_ON(!dev);
+	BUG_ON(!cu);
+
+	dev->node_props.simd_id_base = cu->processor_id_low;
+	dev->node_props.simd_count = cu->num_simd_cores;
+	dev->node_props.lds_size_in_kb = cu->lds_size_in_kb;
+	dev->node_props.max_waves_per_simd = cu->max_waves_simd;
+	dev->node_props.wave_front_size = cu->wave_front_size;
+	dev->node_props.mem_banks_count = cu->num_banks;
+	dev->node_props.array_count = cu->num_arrays;
+	dev->node_props.cu_per_simd_array = cu->num_cu_per_array;
+	dev->node_props.simd_per_cu = cu->num_simd_per_cu;
+	dev->node_props.max_slots_scratch_cu = cu->max_slots_scatch_cu;
+	if (cu->hsa_capability & CRAT_CU_FLAGS_HOT_PLUGGABLE)
+		dev->node_props.capability |= HSA_CAP_HOT_PLUGGABLE;
+	pr_info("CU GPU: simds=%d id_base=%d\n", cu->num_simd_cores,
+				cu->processor_id_low);
+}
+
+/* kfd_parse_subtype_cu is called with topology_lock already held */
+static int kfd_parse_subtype_cu(struct crat_subtype_computeunit *cu)
+{
+	struct kfd_topology_device *dev;
+	int i = 0;
+
+	BUG_ON(!cu);
+
+	pr_info("Found CU entry in CRAT table with proximity_domain=%d caps=%x\n",
+			cu->proximity_domain, cu->hsa_capability);
+	list_for_each_entry(dev, &topology_device_list, list) {
+		if (cu->proximity_domain == i) {
+			if (cu->flags & CRAT_CU_FLAGS_CPU_PRESENT)
+				kfd_populated_cu_info_cpu(dev, cu);
+
+			if (cu->flags & CRAT_CU_FLAGS_GPU_PRESENT)
+				kfd_populated_cu_info_gpu(dev, cu);
+			break;
+		}
+		i++;
+	}
+
+	return 0;
+}
+
+/* kfd_parse_subtype_mem is called with topology_lock already held */
+static int kfd_parse_subtype_mem(struct crat_subtype_memory *mem)
+{
+	struct kfd_mem_properties *props;
+	struct kfd_topology_device *dev;
+	int i = 0;
+
+	BUG_ON(!mem);
+
+	pr_info("Found memory entry in CRAT table with proximity_domain=%d\n",
+			mem->promixity_domain);
+	list_for_each_entry(dev, &topology_device_list, list) {
+		if (mem->promixity_domain == i) {
+			props = kfd_alloc_struct(props);
+			if (props == 0)
+				return -ENOMEM;
+
+			if (dev->node_props.cpu_cores_count == 0)
+				props->heap_type = HSA_MEM_HEAP_TYPE_FB_PRIVATE;
+			else
+				props->heap_type = HSA_MEM_HEAP_TYPE_SYSTEM;
+
+			if (mem->flags & CRAT_MEM_FLAGS_HOT_PLUGGABLE)
+				props->flags |= HSA_MEM_FLAGS_HOT_PLUGGABLE;
+			if (mem->flags & CRAT_MEM_FLAGS_NON_VOLATILE)
+				props->flags |= HSA_MEM_FLAGS_NON_VOLATILE;
+
+			props->size_in_bytes = ((uint64_t)mem->length_high << 32) +
+						mem->length_low;
+			props->width = mem->width;
+
+			dev->mem_bank_count++;
+			list_add_tail(&props->list, &dev->mem_props);
+
+			break;
+		}
+		i++;
+	}
+
+	return 0;
+}
+
+/* kfd_parse_subtype_cache is called with topology_lock already held */
+static int kfd_parse_subtype_cache(struct crat_subtype_cache *cache)
+{
+	struct kfd_cache_properties *props;
+	struct kfd_topology_device *dev;
+	uint32_t id;
+
+	BUG_ON(!cache);
+
+	id = cache->processor_id_low;
+
+	pr_info("Found cache entry in CRAT table with processor_id=%d\n", id);
+	list_for_each_entry(dev, &topology_device_list, list)
+		if (id == dev->node_props.cpu_core_id_base ||
+		    id == dev->node_props.simd_id_base) {
+			props = kfd_alloc_struct(props);
+			if (props == 0)
+				return -ENOMEM;
+
+			props->processor_id_low = id;
+			props->cache_level = cache->cache_level;
+			props->cache_size = cache->cache_size;
+			props->cacheline_size = cache->cache_line_size;
+			props->cachelines_per_tag = cache->lines_per_tag;
+			props->cache_assoc = cache->associativity;
+			props->cache_latency = cache->cache_latency;
+
+			if (cache->flags & CRAT_CACHE_FLAGS_DATA_CACHE)
+				props->cache_type |= HSA_CACHE_TYPE_DATA;
+			if (cache->flags & CRAT_CACHE_FLAGS_INST_CACHE)
+				props->cache_type |= HSA_CACHE_TYPE_INSTRUCTION;
+			if (cache->flags & CRAT_CACHE_FLAGS_CPU_CACHE)
+				props->cache_type |= HSA_CACHE_TYPE_CPU;
+			if (cache->flags & CRAT_CACHE_FLAGS_SIMD_CACHE)
+				props->cache_type |= HSA_CACHE_TYPE_HSACU;
+
+			dev->cache_count++;
+			dev->node_props.caches_count++;
+			list_add_tail(&props->list, &dev->cache_props);
+
+			break;
+		}
+
+	return 0;
+}
+
+/* kfd_parse_subtype_iolink is called with topology_lock already held */
+static int kfd_parse_subtype_iolink(struct crat_subtype_iolink *iolink)
+{
+	struct kfd_iolink_properties *props;
+	struct kfd_topology_device *dev;
+	uint32_t i = 0;
+	uint32_t id_from;
+	uint32_t id_to;
+
+	BUG_ON(!iolink);
+
+	id_from = iolink->proximity_domain_from;
+	id_to = iolink->proximity_domain_to;
+
+	pr_info("Found IO link entry in CRAT table with id_from=%d\n", id_from);
+	list_for_each_entry(dev, &topology_device_list, list) {
+		if (id_from == i) {
+			props = kfd_alloc_struct(props);
+			if (props == 0)
+				return -ENOMEM;
+
+			props->node_from = id_from;
+			props->node_to = id_to;
+			props->ver_maj = iolink->version_major;
+			props->ver_min = iolink->version_minor;
+
+			/*
+			 * weight factor (derived from CDIR), currently always 1
+			 */
+			props->weight = 1;
+
+			props->min_latency = iolink->minimum_latency;
+			props->max_latency = iolink->maximum_latency;
+			props->min_bandwidth = iolink->minimum_bandwidth_mbs;
+			props->max_bandwidth = iolink->maximum_bandwidth_mbs;
+			props->rec_transfer_size =
+					iolink->recommended_transfer_size;
+
+			dev->io_link_count++;
+			dev->node_props.io_links_count++;
+			list_add_tail(&props->list, &dev->io_link_props);
+
+			break;
+		}
+		i++;
+	}
+
+	return 0;
+}
+
+static int kfd_parse_subtype(struct crat_subtype_generic *sub_type_hdr)
+{
+	struct crat_subtype_computeunit *cu;
+	struct crat_subtype_memory *mem;
+	struct crat_subtype_cache *cache;
+	struct crat_subtype_iolink *iolink;
+	int ret = 0;
+
+	BUG_ON(!sub_type_hdr);
+
+	switch (sub_type_hdr->type) {
+	case CRAT_SUBTYPE_COMPUTEUNIT_AFFINITY:
+		cu = (struct crat_subtype_computeunit *)sub_type_hdr;
+		ret = kfd_parse_subtype_cu(cu);
+		break;
+	case CRAT_SUBTYPE_MEMORY_AFFINITY:
+		mem = (struct crat_subtype_memory *)sub_type_hdr;
+		ret = kfd_parse_subtype_mem(mem);
+		break;
+	case CRAT_SUBTYPE_CACHE_AFFINITY:
+		cache = (struct crat_subtype_cache *)sub_type_hdr;
+		ret = kfd_parse_subtype_cache(cache);
+		break;
+	case CRAT_SUBTYPE_TLB_AFFINITY:
+		/*
+		 * For now, nothing to do here
+		 */
+		pr_info("Found TLB entry in CRAT table (not processing)\n");
+		break;
+	case CRAT_SUBTYPE_CCOMPUTE_AFFINITY:
+		/*
+		 * For now, nothing to do here
+		 */
+		pr_info("Found CCOMPUTE entry in CRAT table (not processing)\n");
+		break;
+	case CRAT_SUBTYPE_IOLINK_AFFINITY:
+		iolink = (struct crat_subtype_iolink *)sub_type_hdr;
+		ret = kfd_parse_subtype_iolink(iolink);
+		break;
+	default:
+		pr_warn("Unknown subtype (%d) in CRAT\n",
+				sub_type_hdr->type);
+	}
+
+	return ret;
+}
+
+static void kfd_release_topology_device(struct kfd_topology_device *dev)
+{
+	struct kfd_mem_properties *mem;
+	struct kfd_cache_properties *cache;
+	struct kfd_iolink_properties *iolink;
+
+	BUG_ON(!dev);
+
+	list_del(&dev->list);
+
+	while (dev->mem_props.next != &dev->mem_props) {
+		mem = container_of(dev->mem_props.next,
+				struct kfd_mem_properties, list);
+		list_del(&mem->list);
+		kfree(mem);
+	}
+
+	while (dev->cache_props.next != &dev->cache_props) {
+		cache = container_of(dev->cache_props.next,
+				struct kfd_cache_properties, list);
+		list_del(&cache->list);
+		kfree(cache);
+	}
+
+	while (dev->io_link_props.next != &dev->io_link_props) {
+		iolink = container_of(dev->io_link_props.next,
+				struct kfd_iolink_properties, list);
+		list_del(&iolink->list);
+		kfree(iolink);
+	}
+
+	kfree(dev);
+
+	sys_props.num_devices--;
+}
+
+static void kfd_release_live_view(void)
+{
+	struct kfd_topology_device *dev;
+
+	while (topology_device_list.next != &topology_device_list) {
+		dev = container_of(topology_device_list.next,
+				 struct kfd_topology_device, list);
+		kfd_release_topology_device(dev);
+	}
+
+	memset(&sys_props, 0, sizeof(sys_props));
+}
+
+static struct kfd_topology_device *kfd_create_topology_device(void)
+{
+	struct kfd_topology_device *dev;
+
+	dev = kfd_alloc_struct(dev);
+	if (dev == NULL) {
+		pr_err("No memory to allocate a topology device\n");
+		return NULL;
+	}
+
+	INIT_LIST_HEAD(&dev->mem_props);
+	INIT_LIST_HEAD(&dev->cache_props);
+	INIT_LIST_HEAD(&dev->io_link_props);
+
+	list_add_tail(&dev->list, &topology_device_list);
+	sys_props.num_devices++;
+
+	return dev;
+}
+
+static int kfd_parse_crat_table(void *crat_image)
+{
+	struct kfd_topology_device *top_dev;
+	struct crat_subtype_generic *sub_type_hdr;
+	uint16_t node_id;
+	int ret;
+	struct crat_header *crat_table = (struct crat_header *)crat_image;
+	uint16_t num_nodes;
+	uint32_t image_len;
+
+	if (!crat_image)
+		return -EINVAL;
+
+	num_nodes = crat_table->num_domains;
+	image_len = crat_table->length;
+
+	pr_info("Parsing CRAT table with %d nodes\n", num_nodes);
+
+	for (node_id = 0; node_id < num_nodes; node_id++) {
+		top_dev = kfd_create_topology_device();
+		if (!top_dev) {
+			kfd_release_live_view();
+			return -ENOMEM;
+		}
+	}
+
+	sys_props.platform_id = *((uint64_t *)crat_table->oem_id);
+	sys_props.platform_oem = *((uint64_t *)crat_table->oem_table_id);
+	sys_props.platform_rev = crat_table->revision;
+
+	sub_type_hdr = (struct crat_subtype_generic *)(crat_table+1);
+	while ((char *)sub_type_hdr + sizeof(struct crat_subtype_generic) <
+			((char *)crat_image) + image_len) {
+		if (sub_type_hdr->flags & CRAT_SUBTYPE_FLAGS_ENABLED) {
+			ret = kfd_parse_subtype(sub_type_hdr);
+			if (ret != 0) {
+				kfd_release_live_view();
+				return ret;
+			}
+		}
+
+		sub_type_hdr = (typeof(sub_type_hdr))((char *)sub_type_hdr +
+				sub_type_hdr->length);
+	}
+
+	sys_props.generation_count++;
+	topology_crat_parsed = 1;
+
+	return 0;
+}
+
+
+#define sysfs_show_gen_prop(buffer, fmt, ...) \
+		({ size_t _off = strlen(buffer); _off + scnprintf((buffer) + _off, PAGE_SIZE - _off, fmt, __VA_ARGS__); })
+#define sysfs_show_32bit_prop(buffer, name, value) \
+		sysfs_show_gen_prop(buffer, "%s %u\n", name, value)
+#define sysfs_show_64bit_prop(buffer, name, value) \
+		sysfs_show_gen_prop(buffer, "%s %llu\n", name, value)
+#define sysfs_show_32bit_val(buffer, value) \
+		sysfs_show_gen_prop(buffer, "%u\n", value)
+#define sysfs_show_str_val(buffer, value) \
+		sysfs_show_gen_prop(buffer, "%s\n", value)
+
+static ssize_t sysprops_show(struct kobject *kobj, struct attribute *attr,
+		char *buffer)
+{
+	ssize_t ret;
+
+	/* Making sure that the buffer is an empty string */
+	buffer[0] = 0;
+
+	if (attr == &sys_props.attr_genid) {
+		ret = sysfs_show_32bit_val(buffer, sys_props.generation_count);
+	} else if (attr == &sys_props.attr_props) {
+		sysfs_show_64bit_prop(buffer, "platform_oem",
+				sys_props.platform_oem);
+		sysfs_show_64bit_prop(buffer, "platform_id",
+				sys_props.platform_id);
+		ret = sysfs_show_64bit_prop(buffer, "platform_rev",
+				sys_props.platform_rev);
+	} else {
+		ret = -EINVAL;
+	}
+
+	return ret;
+}
+
+static const struct sysfs_ops sysprops_ops = {
+	.show = sysprops_show,
+};
+
+static struct kobj_type sysprops_type = {
+	.sysfs_ops = &sysprops_ops,
+};
+
+static ssize_t iolink_show(struct kobject *kobj, struct attribute *attr,
+		char *buffer)
+{
+	ssize_t ret;
+	struct kfd_iolink_properties *iolink;
+
+	/* Making sure that the buffer is an empty string */
+	buffer[0] = 0;
+
+	iolink = container_of(attr, struct kfd_iolink_properties, attr);
+	sysfs_show_32bit_prop(buffer, "type", iolink->iolink_type);
+	sysfs_show_32bit_prop(buffer, "version_major", iolink->ver_maj);
+	sysfs_show_32bit_prop(buffer, "version_minor", iolink->ver_min);
+	sysfs_show_32bit_prop(buffer, "node_from", iolink->node_from);
+	sysfs_show_32bit_prop(buffer, "node_to", iolink->node_to);
+	sysfs_show_32bit_prop(buffer, "weight", iolink->weight);
+	sysfs_show_32bit_prop(buffer, "min_latency", iolink->min_latency);
+	sysfs_show_32bit_prop(buffer, "max_latency", iolink->max_latency);
+	sysfs_show_32bit_prop(buffer, "min_bandwidth", iolink->min_bandwidth);
+	sysfs_show_32bit_prop(buffer, "max_bandwidth", iolink->max_bandwidth);
+	sysfs_show_32bit_prop(buffer, "recommended_transfer_size",
+			iolink->rec_transfer_size);
+	ret = sysfs_show_32bit_prop(buffer, "flags", iolink->flags);
+
+	return ret;
+}
+
+static const struct sysfs_ops iolink_ops = {
+	.show = iolink_show,
+};
+
+static struct kobj_type iolink_type = {
+	.sysfs_ops = &iolink_ops,
+};
+
+static ssize_t mem_show(struct kobject *kobj, struct attribute *attr,
+		char *buffer)
+{
+	ssize_t ret;
+	struct kfd_mem_properties *mem;
+
+	/* Making sure that the buffer is an empty string */
+	buffer[0] = 0;
+
+	mem = container_of(attr, struct kfd_mem_properties, attr);
+	sysfs_show_32bit_prop(buffer, "heap_type", mem->heap_type);
+	sysfs_show_64bit_prop(buffer, "size_in_bytes", mem->size_in_bytes);
+	sysfs_show_32bit_prop(buffer, "flags", mem->flags);
+	sysfs_show_32bit_prop(buffer, "width", mem->width);
+	ret = sysfs_show_32bit_prop(buffer, "mem_clk_max", mem->mem_clk_max);
+
+	return ret;
+}
+
+static const struct sysfs_ops mem_ops = {
+	.show = mem_show,
+};
+
+static struct kobj_type mem_type = {
+	.sysfs_ops = &mem_ops,
+};
+
+static ssize_t kfd_cache_show(struct kobject *kobj, struct attribute *attr,
+		char *buffer)
+{
+	ssize_t ret;
+	uint32_t i;
+	struct kfd_cache_properties *cache;
+
+	/* Making sure that the buffer is an empty string */
+	buffer[0] = 0;
+
+	cache = container_of(attr, struct kfd_cache_properties, attr);
+	sysfs_show_32bit_prop(buffer, "processor_id_low",
+			cache->processor_id_low);
+	sysfs_show_32bit_prop(buffer, "level", cache->cache_level);
+	sysfs_show_32bit_prop(buffer, "size", cache->cache_size);
+	sysfs_show_32bit_prop(buffer, "cache_line_size", cache->cacheline_size);
+	sysfs_show_32bit_prop(buffer, "cache_lines_per_tag",
+			cache->cachelines_per_tag);
+	sysfs_show_32bit_prop(buffer, "association", cache->cache_assoc);
+	sysfs_show_32bit_prop(buffer, "latency", cache->cache_latency);
+	sysfs_show_32bit_prop(buffer, "type", cache->cache_type);
+	snprintf(buffer, PAGE_SIZE, "%ssibling_map ", buffer);
+	for (i = 0; i < KFD_TOPOLOGY_CPU_SIBLINGS; i++)
+		ret = snprintf(buffer, PAGE_SIZE, "%s%d%s",
+				buffer, cache->sibling_map[i],
+				(i == KFD_TOPOLOGY_CPU_SIBLINGS-1) ?
+						"\n" : ",");
+
+	return ret;
+}
+
+static const struct sysfs_ops cache_ops = {
+	.show = kfd_cache_show,
+};
+
+static struct kobj_type cache_type = {
+	.sysfs_ops = &cache_ops,
+};
+
+static ssize_t node_show(struct kobject *kobj, struct attribute *attr,
+		char *buffer)
+{
+	ssize_t ret;
+	struct kfd_topology_device *dev;
+	char public_name[KFD_TOPOLOGY_PUBLIC_NAME_SIZE];
+	uint32_t i;
+
+	/* Making sure that the buffer is an empty string */
+	buffer[0] = 0;
+
+	if (strcmp(attr->name, "gpu_id") == 0) {
+		dev = container_of(attr, struct kfd_topology_device,
+				attr_gpuid);
+		ret = sysfs_show_32bit_val(buffer, dev->gpu_id);
+	} else if (strcmp(attr->name, "name") == 0) {
+		dev = container_of(attr, struct kfd_topology_device,
+				attr_name);
+		for (i = 0; i < KFD_TOPOLOGY_PUBLIC_NAME_SIZE; i++) {
+			public_name[i] =
+					(char)dev->node_props.marketing_name[i];
+			if (dev->node_props.marketing_name[i] == 0)
+				break;
+		}
+		public_name[KFD_TOPOLOGY_PUBLIC_NAME_SIZE-1] = 0x0;
+		ret = sysfs_show_str_val(buffer, public_name);
+	} else {
+		dev = container_of(attr, struct kfd_topology_device,
+				attr_props);
+		sysfs_show_32bit_prop(buffer, "cpu_cores_count",
+				dev->node_props.cpu_cores_count);
+		sysfs_show_32bit_prop(buffer, "simd_count",
+				dev->node_props.simd_count);
+		sysfs_show_32bit_prop(buffer, "mem_banks_count",
+				dev->node_props.mem_banks_count);
+		sysfs_show_32bit_prop(buffer, "caches_count",
+				dev->node_props.caches_count);
+		sysfs_show_32bit_prop(buffer, "io_links_count",
+				dev->node_props.io_links_count);
+		sysfs_show_32bit_prop(buffer, "cpu_core_id_base",
+				dev->node_props.cpu_core_id_base);
+		sysfs_show_32bit_prop(buffer, "simd_id_base",
+				dev->node_props.simd_id_base);
+		sysfs_show_32bit_prop(buffer, "capability",
+				dev->node_props.capability);
+		sysfs_show_32bit_prop(buffer, "max_waves_per_simd",
+				dev->node_props.max_waves_per_simd);
+		sysfs_show_32bit_prop(buffer, "lds_size_in_kb",
+				dev->node_props.lds_size_in_kb);
+		sysfs_show_32bit_prop(buffer, "gds_size_in_kb",
+				dev->node_props.gds_size_in_kb);
+		sysfs_show_32bit_prop(buffer, "wave_front_size",
+				dev->node_props.wave_front_size);
+		sysfs_show_32bit_prop(buffer, "array_count",
+				dev->node_props.array_count);
+		sysfs_show_32bit_prop(buffer, "simd_arrays_per_engine",
+				dev->node_props.simd_arrays_per_engine);
+		sysfs_show_32bit_prop(buffer, "cu_per_simd_array",
+				dev->node_props.cu_per_simd_array);
+		sysfs_show_32bit_prop(buffer, "simd_per_cu",
+				dev->node_props.simd_per_cu);
+		sysfs_show_32bit_prop(buffer, "max_slots_scratch_cu",
+				dev->node_props.max_slots_scratch_cu);
+		sysfs_show_32bit_prop(buffer, "engine_id",
+				dev->node_props.engine_id);
+		sysfs_show_32bit_prop(buffer, "vendor_id",
+				dev->node_props.vendor_id);
+		sysfs_show_32bit_prop(buffer, "device_id",
+				dev->node_props.device_id);
+		sysfs_show_32bit_prop(buffer, "location_id",
+				dev->node_props.location_id);
+		sysfs_show_32bit_prop(buffer, "max_engine_clk_fcompute",
+				dev->node_props.max_engine_clk_fcompute);
+		ret = sysfs_show_32bit_prop(buffer, "max_engine_clk_ccompute",
+				dev->node_props.max_engine_clk_ccompute);
+	}
+
+	return ret;
+}
+
+static const struct sysfs_ops node_ops = {
+	.show = node_show,
+};
+
+static struct kobj_type node_type = {
+	.sysfs_ops = &node_ops,
+};
+
+static void kfd_remove_sysfs_file(struct kobject *kobj, struct attribute *attr)
+{
+	sysfs_remove_file(kobj, attr);
+	kobject_del(kobj);
+	kobject_put(kobj);
+}
+
+static void kfd_remove_sysfs_node_entry(struct kfd_topology_device *dev)
+{
+	struct kfd_iolink_properties *iolink;
+	struct kfd_cache_properties *cache;
+	struct kfd_mem_properties *mem;
+
+	BUG_ON(!dev);
+
+	if (dev->kobj_iolink) {
+		list_for_each_entry(iolink, &dev->io_link_props, list)
+			if (iolink->kobj) {
+				kfd_remove_sysfs_file(iolink->kobj, &iolink->attr);
+				iolink->kobj = 0;
+			}
+		kobject_del(dev->kobj_iolink);
+		kobject_put(dev->kobj_iolink);
+		dev->kobj_iolink = 0;
+	}
+
+	if (dev->kobj_cache) {
+		list_for_each_entry(cache, &dev->cache_props, list)
+			if (cache->kobj) {
+				kfd_remove_sysfs_file(cache->kobj, &cache->attr);
+				cache->kobj = 0;
+			}
+		kobject_del(dev->kobj_cache);
+		kobject_put(dev->kobj_cache);
+		dev->kobj_cache = 0;
+	}
+
+	if (dev->kobj_mem) {
+		list_for_each_entry(mem, &dev->mem_props, list)
+			if (mem->kobj) {
+				kfd_remove_sysfs_file(mem->kobj, &mem->attr);
+				mem->kobj = 0;
+			}
+		kobject_del(dev->kobj_mem);
+		kobject_put(dev->kobj_mem);
+		dev->kobj_mem = 0;
+	}
+
+	if (dev->kobj_node) {
+		sysfs_remove_file(dev->kobj_node, &dev->attr_gpuid);
+		sysfs_remove_file(dev->kobj_node, &dev->attr_name);
+		sysfs_remove_file(dev->kobj_node, &dev->attr_props);
+		kobject_del(dev->kobj_node);
+		kobject_put(dev->kobj_node);
+		dev->kobj_node = 0;
+	}
+}
+
+static int kfd_build_sysfs_node_entry(struct kfd_topology_device *dev,
+		uint32_t id)
+{
+	struct kfd_iolink_properties *iolink;
+	struct kfd_cache_properties *cache;
+	struct kfd_mem_properties *mem;
+	int ret;
+	uint32_t i;
+
+	BUG_ON(!dev);
+
+	/*
+	 * Creating the sysfs folders
+	 */
+	BUG_ON(dev->kobj_node);
+	dev->kobj_node = kfd_alloc_struct(dev->kobj_node);
+	if (!dev->kobj_node)
+		return -ENOMEM;
+
+	ret = kobject_init_and_add(dev->kobj_node, &node_type,
+			sys_props.kobj_nodes, "%d", id);
+	if (ret < 0)
+		return ret;
+
+	dev->kobj_mem = kobject_create_and_add("mem_banks", dev->kobj_node);
+	if (!dev->kobj_mem)
+		return -ENOMEM;
+
+	dev->kobj_cache = kobject_create_and_add("caches", dev->kobj_node);
+	if (!dev->kobj_cache)
+		return -ENOMEM;
+
+	dev->kobj_iolink = kobject_create_and_add("io_links", dev->kobj_node);
+	if (!dev->kobj_iolink)
+		return -ENOMEM;
+
+	/*
+	 * Creating sysfs files for node properties
+	 */
+	dev->attr_gpuid.name = "gpu_id";
+	dev->attr_gpuid.mode = KFD_SYSFS_FILE_MODE;
+	sysfs_attr_init(&dev->attr_gpuid);
+	dev->attr_name.name = "name";
+	dev->attr_name.mode = KFD_SYSFS_FILE_MODE;
+	sysfs_attr_init(&dev->attr_name);
+	dev->attr_props.name = "properties";
+	dev->attr_props.mode = KFD_SYSFS_FILE_MODE;
+	sysfs_attr_init(&dev->attr_props);
+	ret = sysfs_create_file(dev->kobj_node, &dev->attr_gpuid);
+	if (ret < 0)
+		return ret;
+	ret = sysfs_create_file(dev->kobj_node, &dev->attr_name);
+	if (ret < 0)
+		return ret;
+	ret = sysfs_create_file(dev->kobj_node, &dev->attr_props);
+	if (ret < 0)
+		return ret;
+
+	i = 0;
+	list_for_each_entry(mem, &dev->mem_props, list) {
+		mem->kobj = kzalloc(sizeof(struct kobject), GFP_KERNEL);
+		if (!mem->kobj)
+			return -ENOMEM;
+		ret = kobject_init_and_add(mem->kobj, &mem_type,
+				dev->kobj_mem, "%d", i);
+		if (ret < 0)
+			return ret;
+
+		mem->attr.name = "properties";
+		mem->attr.mode = KFD_SYSFS_FILE_MODE;
+		sysfs_attr_init(&mem->attr);
+		ret = sysfs_create_file(mem->kobj, &mem->attr);
+		if (ret < 0)
+			return ret;
+		i++;
+	}
+
+	i = 0;
+	list_for_each_entry(cache, &dev->cache_props, list) {
+		cache->kobj = kzalloc(sizeof(struct kobject), GFP_KERNEL);
+		if (!cache->kobj)
+			return -ENOMEM;
+		ret = kobject_init_and_add(cache->kobj, &cache_type,
+				dev->kobj_cache, "%d", i);
+		if (ret < 0)
+			return ret;
+
+		cache->attr.name = "properties";
+		cache->attr.mode = KFD_SYSFS_FILE_MODE;
+		sysfs_attr_init(&cache->attr);
+		ret = sysfs_create_file(cache->kobj, &cache->attr);
+		if (ret < 0)
+			return ret;
+		i++;
+	}
+
+	i = 0;
+	list_for_each_entry(iolink, &dev->io_link_props, list) {
+		iolink->kobj = kzalloc(sizeof(struct kobject), GFP_KERNEL);
+		if (!iolink->kobj)
+			return -ENOMEM;
+		ret = kobject_init_and_add(iolink->kobj, &iolink_type,
+				dev->kobj_iolink, "%d", i);
+		if (ret < 0)
+			return ret;
+
+		iolink->attr.name = "properties";
+		iolink->attr.mode = KFD_SYSFS_FILE_MODE;
+		sysfs_attr_init(&iolink->attr);
+		ret = sysfs_create_file(iolink->kobj, &iolink->attr);
+		if (ret < 0)
+			return ret;
+		i++;
+	}
+
+	return 0;
+}
+
+static int kfd_build_sysfs_node_tree(void)
+{
+	struct kfd_topology_device *dev;
+	int ret;
+	uint32_t i = 0;
+
+	list_for_each_entry(dev, &topology_device_list, list) {
+		ret = kfd_build_sysfs_node_entry(dev, i);
+		if (ret < 0)
+			return ret;
+		i++;
+	}
+
+	return 0;
+}
+
+static void kfd_remove_sysfs_node_tree(void)
+{
+	struct kfd_topology_device *dev;
+
+	list_for_each_entry(dev, &topology_device_list, list)
+		kfd_remove_sysfs_node_entry(dev);
+}
+
+static int kfd_topology_update_sysfs(void)
+{
+	int ret;
+
+	pr_info("Creating topology SYSFS entries\n");
+	if (sys_props.kobj_topology == 0) {
+		sys_props.kobj_topology = kfd_alloc_struct(sys_props.kobj_topology);
+		if (!sys_props.kobj_topology)
+			return -ENOMEM;
+
+		ret = kobject_init_and_add(sys_props.kobj_topology,
+				&sysprops_type,  &kfd_device->kobj,
+				"topology");
+		if (ret < 0)
+			return ret;
+
+		sys_props.kobj_nodes = kobject_create_and_add("nodes",
+				sys_props.kobj_topology);
+		if (!sys_props.kobj_nodes)
+			return -ENOMEM;
+
+		sys_props.attr_genid.name = "generation_id";
+		sys_props.attr_genid.mode = KFD_SYSFS_FILE_MODE;
+		sysfs_attr_init(&sys_props.attr_genid);
+		ret = sysfs_create_file(sys_props.kobj_topology,
+				&sys_props.attr_genid);
+		if (ret < 0)
+			return ret;
+
+		sys_props.attr_props.name = "system_properties";
+		sys_props.attr_props.mode = KFD_SYSFS_FILE_MODE;
+		sysfs_attr_init(&sys_props.attr_props);
+		ret = sysfs_create_file(sys_props.kobj_topology,
+				&sys_props.attr_props);
+		if (ret < 0)
+			return ret;
+	}
+
+	kfd_remove_sysfs_node_tree();
+
+	return kfd_build_sysfs_node_tree();
+}
+
+static void kfd_topology_release_sysfs(void)
+{
+	kfd_remove_sysfs_node_tree();
+	if (sys_props.kobj_topology) {
+		sysfs_remove_file(sys_props.kobj_topology,
+				&sys_props.attr_genid);
+		sysfs_remove_file(sys_props.kobj_topology,
+				&sys_props.attr_props);
+		if (sys_props.kobj_nodes) {
+			kobject_del(sys_props.kobj_nodes);
+			kobject_put(sys_props.kobj_nodes);
+			sys_props.kobj_nodes = 0;
+		}
+		kobject_del(sys_props.kobj_topology);
+		kobject_put(sys_props.kobj_topology);
+		sys_props.kobj_topology = 0;
+	}
+}
+
+int kfd_topology_init(void)
+{
+	void *crat_image = 0;
+	size_t image_size = 0;
+	int ret;
+
+	/*
+	 * Initialize the head for the topology device list
+	 */
+	INIT_LIST_HEAD(&topology_device_list);
+	init_rwsem(&topology_lock);
+	topology_crat_parsed = 0;
+
+	memset(&sys_props, 0, sizeof(sys_props));
+
+	/*
+	 * Get the CRAT image from the ACPI
+	 */
+	ret = kfd_topology_get_crat_acpi(crat_image, &image_size);
+	if (ret == 0 && image_size > 0) {
+		pr_info("Found CRAT image with size=%zu\n", image_size);
+		crat_image = kmalloc(image_size, GFP_KERNEL);
+		if (!crat_image) {
+			ret = -ENOMEM;
+			pr_err("No memory for allocating CRAT image\n");
+			goto err;
+		}
+		ret = kfd_topology_get_crat_acpi(crat_image, &image_size);
+
+		if (ret == 0) {
+			down_write(&topology_lock);
+			ret = kfd_parse_crat_table(crat_image);
+			if (ret == 0)
+				ret = kfd_topology_update_sysfs();
+			up_write(&topology_lock);
+		} else {
+			pr_err("Couldn't get CRAT table from ACPI\n");
+		}
+		kfree(crat_image);
+	} else if (ret == -ENODATA) {
+		ret = 0;
+	} else {
+		pr_err("Couldn't get CRAT table size from ACPI\n");
+	}
+
+err:
+	pr_info("Finished initializing topology ret=%d\n", ret);
+	return ret;
+}
+
+void kfd_topology_shutdown(void)
+{
+	kfd_topology_release_sysfs();
+	kfd_release_live_view();
+}
+
+static void kfd_debug_print_topology(void)
+{
+	struct kfd_topology_device *dev;
+	uint32_t i = 0;
+
+	pr_info("DEBUG PRINT OF TOPOLOGY:\n");
+	list_for_each_entry(dev, &topology_device_list, list) {
+		pr_info("Node: %u\n", i);
+		pr_info("\tGPU assigned: %s\n", (dev->gpu ? "yes" : "no"));
+		pr_info("\tCPU count: %u\n", dev->node_props.cpu_cores_count);
+		pr_info("\tSIMD count: %u\n", dev->node_props.simd_count);
+		i++;
+	}
+}
+
+static uint32_t kfd_generate_gpu_id(struct kfd_dev *gpu)
+{
+	uint32_t hashout;
+	uint32_t buf[7];
+	int i;
+
+	if (!gpu)
+		return 0;
+
+	buf[0] = gpu->pdev->devfn;
+	buf[1] = gpu->pdev->subsystem_vendor;
+	buf[2] = gpu->pdev->subsystem_device;
+	buf[3] = gpu->pdev->device;
+	buf[4] = gpu->pdev->bus->number;
+	buf[5] = (uint32_t)(kfd2kgd->get_vmem_size(gpu->kgd) & 0xffffffff);
+	buf[6] = (uint32_t)(kfd2kgd->get_vmem_size(gpu->kgd) >> 32);
+
+	for (i = 0, hashout = 0; i < 7; i++)
+		hashout ^= hash_32(buf[i], KFD_GPU_ID_HASH_WIDTH);
+
+	return hashout;
+}
+
+static struct kfd_topology_device *kfd_assign_gpu(struct kfd_dev *gpu)
+{
+	struct kfd_topology_device *dev;
+	struct kfd_topology_device *out_dev = 0;
+
+	BUG_ON(!gpu);
+
+	list_for_each_entry(dev, &topology_device_list, list)
+		if (dev->gpu == 0 && dev->node_props.simd_count > 0) {
+			dev->gpu = gpu;
+			out_dev = dev;
+			break;
+		}
+
+	return out_dev;
+}
+
+static void kfd_notify_gpu_change(uint32_t gpu_id, int arrival)
+{
+	/*
+	 * TODO: Generate an event for thunk about the arrival/removal
+	 * of the GPU
+	 */
+}
+
+int kfd_topology_add_device(struct kfd_dev *gpu)
+{
+	uint32_t gpu_id;
+	struct kfd_topology_device *dev;
+	int res;
+
+	BUG_ON(!gpu);
+
+	gpu_id = kfd_generate_gpu_id(gpu);
+
+	pr_info("Adding new GPU (ID: 0x%x) to topology\n", gpu_id);
+
+	down_write(&topology_lock);
+	/*
+	 * Try to assign the GPU to an existing topology device (generated
+	 * from the CRAT table)
+	 */
+	dev = kfd_assign_gpu(gpu);
+	if (!dev) {
+		pr_info("GPU was not found in the current topology. Extending.\n");
+		kfd_debug_print_topology();
+		dev = kfd_create_topology_device();
+		if (!dev) {
+			res = -ENOMEM;
+			goto err;
+		}
+		dev->gpu = gpu;
+
+		/*
+		 * TODO: Make a call to retrieve topology information from the
+		 * GPU vBIOS
+		 */
+
+		/*
+		 * Update the SYSFS tree, since we added another topology device
+		 */
+		if (kfd_topology_update_sysfs() < 0)
+			kfd_topology_release_sysfs();
+
+	}
+
+	dev->gpu_id = gpu_id;
+	gpu->id = gpu_id;
+	dev->node_props.vendor_id = gpu->pdev->vendor;
+	dev->node_props.device_id = gpu->pdev->device;
+	dev->node_props.location_id = (gpu->pdev->bus->number << 24) +
+			(gpu->pdev->devfn & 0xffffff);
+	/*
+	 * TODO: Retrieve max engine clock values from KGD
+	 */
+
+	res = 0;
+
+err:
+	up_write(&topology_lock);
+
+	if (res == 0)
+		kfd_notify_gpu_change(gpu_id, 1);
+
+	return res;
+}
+
+int kfd_topology_remove_device(struct kfd_dev *gpu)
+{
+	struct kfd_topology_device *dev;
+	uint32_t gpu_id;
+	int res = -ENODEV;
+
+	BUG_ON(!gpu);
+
+	down_write(&topology_lock);
+
+	list_for_each_entry(dev, &topology_device_list, list)
+		if (dev->gpu == gpu) {
+			gpu_id = dev->gpu_id;
+			kfd_remove_sysfs_node_entry(dev);
+			kfd_release_topology_device(dev);
+			res = 0;
+			if (kfd_topology_update_sysfs() < 0)
+				kfd_topology_release_sysfs();
+			break;
+		}
+
+	up_write(&topology_lock);
+
+	if (res == 0)
+		kfd_notify_gpu_change(gpu_id, 0);
+
+	return res;
+}
diff --git a/drivers/gpu/hsa/radeon/kfd_topology.h b/drivers/gpu/hsa/radeon/kfd_topology.h
new file mode 100644
index 0000000..989624b
--- /dev/null
+++ b/drivers/gpu/hsa/radeon/kfd_topology.h
@@ -0,0 +1,168 @@
+/*
+ * Copyright 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#ifndef __KFD_TOPOLOGY_H__
+#define __KFD_TOPOLOGY_H__
+
+#include <linux/types.h>
+#include <linux/list.h>
+#include "kfd_priv.h"
+
+#define KFD_TOPOLOGY_PUBLIC_NAME_SIZE 128
+
+#define HSA_CAP_HOT_PLUGGABLE			0x00000001
+#define HSA_CAP_ATS_PRESENT			0x00000002
+#define HSA_CAP_SHARED_WITH_GRAPHICS		0x00000004
+#define HSA_CAP_QUEUE_SIZE_POW2			0x00000008
+#define HSA_CAP_QUEUE_SIZE_32BIT		0x00000010
+#define HSA_CAP_QUEUE_IDLE_EVENT		0x00000020
+#define HSA_CAP_VA_LIMIT			0x00000040
+#define HSA_CAP_WATCH_POINTS_SUPPORTED		0x00000080
+#define HSA_CAP_WATCH_POINTS_TOTALBITS_MASK	0x00000f00
+#define HSA_CAP_WATCH_POINTS_TOTALBITS_SHIFT	8
+#define HSA_CAP_RESERVED			0xfffff000
+
+struct kfd_node_properties {
+	uint32_t cpu_cores_count;
+	uint32_t simd_count;
+	uint32_t mem_banks_count;
+	uint32_t caches_count;
+	uint32_t io_links_count;
+	uint32_t cpu_core_id_base;
+	uint32_t simd_id_base;
+	uint32_t capability;
+	uint32_t max_waves_per_simd;
+	uint32_t lds_size_in_kb;
+	uint32_t gds_size_in_kb;
+	uint32_t wave_front_size;
+	uint32_t array_count;
+	uint32_t simd_arrays_per_engine;
+	uint32_t cu_per_simd_array;
+	uint32_t simd_per_cu;
+	uint32_t max_slots_scratch_cu;
+	uint32_t engine_id;
+	uint32_t vendor_id;
+	uint32_t device_id;
+	uint32_t location_id;
+	uint32_t max_engine_clk_fcompute;
+	uint32_t max_engine_clk_ccompute;
+	uint16_t marketing_name[KFD_TOPOLOGY_PUBLIC_NAME_SIZE];
+};
+
+#define HSA_MEM_HEAP_TYPE_SYSTEM	0
+#define HSA_MEM_HEAP_TYPE_FB_PUBLIC	1
+#define HSA_MEM_HEAP_TYPE_FB_PRIVATE	2
+#define HSA_MEM_HEAP_TYPE_GPU_GDS	3
+#define HSA_MEM_HEAP_TYPE_GPU_LDS	4
+#define HSA_MEM_HEAP_TYPE_GPU_SCRATCH	5
+
+#define HSA_MEM_FLAGS_HOT_PLUGGABLE	0x00000001
+#define HSA_MEM_FLAGS_NON_VOLATILE	0x00000002
+#define HSA_MEM_FLAGS_RESERVED		0xfffffffc
+
+struct kfd_mem_properties {
+	struct list_head	list;
+	uint32_t		heap_type;
+	uint64_t		size_in_bytes;
+	uint32_t		flags;
+	uint32_t		width;
+	uint32_t		mem_clk_max;
+	struct kobject		*kobj;
+	struct attribute	attr;
+};
+
+#define KFD_TOPOLOGY_CPU_SIBLINGS 256
+
+#define HSA_CACHE_TYPE_DATA		0x00000001
+#define HSA_CACHE_TYPE_INSTRUCTION	0x00000002
+#define HSA_CACHE_TYPE_CPU		0x00000004
+#define HSA_CACHE_TYPE_HSACU		0x00000008
+#define HSA_CACHE_TYPE_RESERVED		0xfffffff0
+
+struct kfd_cache_properties {
+	struct list_head	list;
+	uint32_t		processor_id_low;
+	uint32_t		cache_level;
+	uint32_t		cache_size;
+	uint32_t		cacheline_size;
+	uint32_t		cachelines_per_tag;
+	uint32_t		cache_assoc;
+	uint32_t		cache_latency;
+	uint32_t		cache_type;
+	uint8_t			sibling_map[KFD_TOPOLOGY_CPU_SIBLINGS];
+	struct kobject		*kobj;
+	struct attribute	attr;
+};
+
+struct kfd_iolink_properties {
+	struct list_head	list;
+	uint32_t		iolink_type;
+	uint32_t		ver_maj;
+	uint32_t		ver_min;
+	uint32_t		node_from;
+	uint32_t		node_to;
+	uint32_t		weight;
+	uint32_t		min_latency;
+	uint32_t		max_latency;
+	uint32_t		min_bandwidth;
+	uint32_t		max_bandwidth;
+	uint32_t		rec_transfer_size;
+	uint32_t		flags;
+	struct kobject		*kobj;
+	struct attribute	attr;
+};
+
+struct kfd_topology_device {
+	struct list_head		list;
+	uint32_t			gpu_id;
+	struct kfd_node_properties	node_props;
+	uint32_t			mem_bank_count;
+	struct list_head		mem_props;
+	uint32_t			cache_count;
+	struct list_head		cache_props;
+	uint32_t			io_link_count;
+	struct list_head		io_link_props;
+	struct kfd_dev			*gpu;
+	struct kobject			*kobj_node;
+	struct kobject			*kobj_mem;
+	struct kobject			*kobj_cache;
+	struct kobject			*kobj_iolink;
+	struct attribute		attr_gpuid;
+	struct attribute		attr_name;
+	struct attribute		attr_props;
+};
+
+struct kfd_system_properties {
+	uint32_t		num_devices;     /* Number of H-NUMA nodes */
+	uint32_t		generation_count;
+	uint64_t		platform_oem;
+	uint64_t		platform_id;
+	uint64_t		platform_rev;
+	struct kobject		*kobj_topology;
+	struct kobject		*kobj_nodes;
+	struct attribute	attr_genid;
+	struct attribute	attr_props;
+};
+
+
+
+#endif /* __KFD_TOPOLOGY_H__ */
-- 
1.9.1



* [PATCH 10/83] hsa/radeon: Add initialization and unmapping of doorbell aperture
  2014-07-10 21:50 [PATCH 02/83] drm/radeon: reduce number of free VMIDs and pipes in KV Oded Gabbay
                   ` (6 preceding siblings ...)
  2014-07-10 21:50 ` [PATCH 09/83] hsa/radeon: Add code base of hsa driver for AMD's GPUs Oded Gabbay
@ 2014-07-10 21:50 ` Oded Gabbay
  2014-07-10 21:50 ` [PATCH 11/83] hsa/radeon: Add scheduler code Oded Gabbay
                   ` (18 subsequent siblings)
  26 siblings, 0 replies; 116+ messages in thread
From: Oded Gabbay @ 2014-07-10 21:50 UTC (permalink / raw)
  To: David Airlie, Alex Deucher, Jerome Glisse
  Cc: linux-kernel, dri-devel, John Bridgman, Andrew Lewycky,
	Joerg Roedel, Oded Gabbay

This patch adds initialization of the doorbell aperture when
initializing a kfd device.

It also adds a call to unmap the doorbell when a process unbinds
from the kfd.

Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
---
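Note: the aperture arithmetic done by radeon_kfd_doorbell_init() in this
patch can be tried outside the kernel with the minimal sketch below. It is
an illustration under stated assumptions, not the driver code: every
sketch_* name is hypothetical, the 4096-byte page size and the 1024-queue
limit are made-up stand-ins for PAGE_SIZE and MAX_PROCESS_QUEUES, and
doorbell_t is assumed to be 32 bits wide because the code comment describes
32-bit writes.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed values for illustration; the real ones come from kfd_priv.h. */
#define SKETCH_PAGE_SIZE          4096u
#define SKETCH_MAX_PROCESS_QUEUES 1024u
typedef uint32_t sketch_doorbell_t;	/* doorbells receive 32-bit writes */

struct sketch_doorbell_layout {
	uint64_t base;		/* physical address of the first usable doorbell */
	size_t id_offset;	/* index of the first usable doorbell */
	size_t process_limit;	/* number of per-process slices that fit */
};

static size_t sketch_roundup(size_t x, size_t y)
{
	return ((x + y - 1) / y) * y;
}

static size_t sketch_rounddown(size_t x, size_t y)
{
	return x - (x % y);
}

/* Number of doorbell bytes reserved for each process, page aligned. */
static size_t sketch_process_allocation(void)
{
	return sketch_roundup(sizeof(sketch_doorbell_t) *
			      SKETCH_MAX_PROCESS_QUEUES, SKETCH_PAGE_SIZE);
}

/* Same steps as the patch: round in bytes first, then derive the limits. */
static struct sketch_doorbell_layout
sketch_doorbell_init(uint64_t phys, size_t start_offset, size_t aperture_size)
{
	const size_t slice = sketch_process_allocation();
	struct sketch_doorbell_layout out;
	size_t start = sketch_roundup(start_offset, slice);
	size_t size = sketch_rounddown(aperture_size, slice);

	out.process_limit = (size > start) ? (size - start) / slice : 0;
	out.base = phys + start;
	out.id_offset = start / sizeof(sketch_doorbell_t);
	return out;
}
```

With these assumed constants, a start offset of 0x100 inside an 8 MiB
aperture is rounded up to one full slice (4096 bytes), so the first usable
doorbell index is 1024 and 2047 process slices remain.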
 drivers/gpu/hsa/radeon/Makefile       |  3 +-
 drivers/gpu/hsa/radeon/kfd_device.c   |  2 +
 drivers/gpu/hsa/radeon/kfd_doorbell.c | 72 +++++++++++++++++++++++++++++++++++
 3 files changed, 76 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/hsa/radeon/kfd_doorbell.c

diff --git a/drivers/gpu/hsa/radeon/Makefile b/drivers/gpu/hsa/radeon/Makefile
index ba16a09..989518a 100644
--- a/drivers/gpu/hsa/radeon/Makefile
+++ b/drivers/gpu/hsa/radeon/Makefile
@@ -3,6 +3,7 @@
 #
 
 radeon_kfd-y	:= kfd_module.o kfd_device.o kfd_chardev.o \
-		kfd_pasid.o kfd_topology.o kfd_process.o
+		kfd_pasid.o kfd_topology.o kfd_process.o \
+		kfd_doorbell.o
 
 obj-$(CONFIG_HSA_RADEON)	+= radeon_kfd.o
diff --git a/drivers/gpu/hsa/radeon/kfd_device.c b/drivers/gpu/hsa/radeon/kfd_device.c
index d122920..4e9fe6c 100644
--- a/drivers/gpu/hsa/radeon/kfd_device.c
+++ b/drivers/gpu/hsa/radeon/kfd_device.c
@@ -123,6 +123,8 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,
 
 	kfd->regs = gpu_resources->mmio_registers;
 
+	radeon_kfd_doorbell_init(kfd);
+
 	if (!device_iommu_pasid_init(kfd))
 		return false;
 
diff --git a/drivers/gpu/hsa/radeon/kfd_doorbell.c b/drivers/gpu/hsa/radeon/kfd_doorbell.c
new file mode 100644
index 0000000..79a9d4b
--- /dev/null
+++ b/drivers/gpu/hsa/radeon/kfd_doorbell.c
@@ -0,0 +1,72 @@
+/*
+ * Copyright 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#include "kfd_priv.h"
+#include <linux/mm.h>
+#include <linux/mman.h>
+
+/*
+ * Each device exposes a doorbell aperture, a PCI MMIO aperture that
+ * receives 32-bit writes that are passed to queues as wptr values.
+ * The doorbells are intended to be written by applications as part
+ * of queueing work on user-mode queues.
+ * We assign doorbells to applications in PAGE_SIZE-sized and aligned chunks.
+ * We map the doorbell address space into user-mode when a process creates
+ * its first queue on each device.
+ * Although the mapping is done by KFD, it is equivalent to an mmap of
+ * the /dev/kfd with the particular device encoded in the mmap offset.
+ * There will be other uses for mmap of /dev/kfd, so only a range of
+ * offsets (KFD_MMAP_DOORBELL_START-END) is used for doorbells.
+ */
+
+/* # of doorbell bytes allocated for each process. */
+static inline size_t doorbell_process_allocation(void)
+{
+	return roundup(sizeof(doorbell_t) * MAX_PROCESS_QUEUES, PAGE_SIZE);
+}
+
+/* Doorbell calculations for device init. */
+void radeon_kfd_doorbell_init(struct kfd_dev *kfd)
+{
+	size_t doorbell_start_offset;
+	size_t doorbell_aperture_size;
+	size_t doorbell_process_limit;
+
+	/* Calculate in bytes first; the input data might only be byte-aligned.
+	 * Only after the rounding can we assume any alignment. */
+
+	doorbell_start_offset = roundup(kfd->shared_resources.doorbell_start_offset,
+					doorbell_process_allocation());
+	doorbell_aperture_size = rounddown(kfd->shared_resources.doorbell_aperture_size,
+					doorbell_process_allocation());
+
+	if (doorbell_aperture_size > doorbell_start_offset)
+		doorbell_process_limit =
+			(doorbell_aperture_size - doorbell_start_offset) / doorbell_process_allocation();
+	else
+		doorbell_process_limit = 0;
+
+	kfd->doorbell_base = kfd->shared_resources.doorbell_physical_address + doorbell_start_offset;
+	kfd->doorbell_id_offset = doorbell_start_offset / sizeof(doorbell_t);
+	kfd->doorbell_process_limit = doorbell_process_limit;
+}
+
-- 
1.9.1



* [PATCH 11/83] hsa/radeon: Add scheduler code
  2014-07-10 21:50 [PATCH 02/83] drm/radeon: reduce number of free VMIDs and pipes in KV Oded Gabbay
                   ` (7 preceding siblings ...)
  2014-07-10 21:50 ` [PATCH 10/83] hsa/radeon: Add initialization and unmapping of doorbell aperture Oded Gabbay
@ 2014-07-10 21:50 ` Oded Gabbay
  2014-07-11 18:25     ` Jerome Glisse
  2014-07-10 21:50 ` [PATCH 12/83] hsa/radeon: Add kfd mmap handler Oded Gabbay
                   ` (17 subsequent siblings)
  26 siblings, 1 reply; 116+ messages in thread
From: Oded Gabbay @ 2014-07-10 21:50 UTC (permalink / raw)
  To: David Airlie, Alex Deucher, Jerome Glisse
  Cc: linux-kernel, dri-devel, John Bridgman, Andrew Lewycky,
	Joerg Roedel, Oded Gabbay

This patch adds the code base of the scheduler, which handles queue
creation, deletion and scheduling on the CP of the GPU.

Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
---
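Note: the SRBM_GFX_CNTL field encoders added in cik_regs.h by this patch
combine into a single select value, roughly as in the sketch below. The four
macros are copied from the hunk; the helper sketch_srbm_select() and its
masks are hypothetical. The masks are only inferred from the spacing of the
shifts and from the 8 queues per pipe mentioned earlier in the series; they
are not taken from hardware documentation.

```c
#include <stdint.h>

/* Field encoders as defined in the cik_regs.h hunk of this patch. */
#define PIPEID(x)	((x) << 0)
#define MEID(x)		((x) << 2)
#define VMID(x)		((x) << 4)
#define QUEUEID(x)	((x) << 8)

/*
 * Hypothetical helper: build the value that selects one ME, pipe, queue
 * and VMID. The field widths below are assumptions for illustration.
 */
static uint32_t sketch_srbm_select(uint32_t me, uint32_t pipe,
				   uint32_t queue, uint32_t vmid)
{
	return PIPEID(pipe & 0x3) | MEID(me & 0x3) |
	       VMID(vmid & 0xf) | QUEUEID(queue & 0x7);
}
```

For example, ME 1, pipe 2, queue 3 and VMID 8 encode to 0x386 under these
assumptions.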
 drivers/gpu/hsa/radeon/Makefile               |   3 +-
 drivers/gpu/hsa/radeon/cik_regs.h             | 213 +++++++
 drivers/gpu/hsa/radeon/kfd_device.c           |   1 +
 drivers/gpu/hsa/radeon/kfd_registers.c        |  50 ++
 drivers/gpu/hsa/radeon/kfd_sched_cik_static.c | 800 ++++++++++++++++++++++++++
 drivers/gpu/hsa/radeon/kfd_vidmem.c           |  61 ++
 6 files changed, 1127 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/hsa/radeon/cik_regs.h
 create mode 100644 drivers/gpu/hsa/radeon/kfd_registers.c
 create mode 100644 drivers/gpu/hsa/radeon/kfd_sched_cik_static.c
 create mode 100644 drivers/gpu/hsa/radeon/kfd_vidmem.c

diff --git a/drivers/gpu/hsa/radeon/Makefile b/drivers/gpu/hsa/radeon/Makefile
index 989518a..28da10c 100644
--- a/drivers/gpu/hsa/radeon/Makefile
+++ b/drivers/gpu/hsa/radeon/Makefile
@@ -4,6 +4,7 @@
 
 radeon_kfd-y	:= kfd_module.o kfd_device.o kfd_chardev.o \
 		kfd_pasid.o kfd_topology.o kfd_process.o \
-		kfd_doorbell.o
+		kfd_doorbell.o kfd_sched_cik_static.o kfd_registers.o \
+		kfd_vidmem.o
 
 obj-$(CONFIG_HSA_RADEON)	+= radeon_kfd.o
diff --git a/drivers/gpu/hsa/radeon/cik_regs.h b/drivers/gpu/hsa/radeon/cik_regs.h
new file mode 100644
index 0000000..d0cdc57
--- /dev/null
+++ b/drivers/gpu/hsa/radeon/cik_regs.h
@@ -0,0 +1,213 @@
+/*
+ * Copyright 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#ifndef CIK_REGS_H
+#define CIK_REGS_H
+
+#define BIF_DOORBELL_CNTL				0x530Cu
+
+#define	SRBM_GFX_CNTL					0xE44
+#define	PIPEID(x)					((x) << 0)
+#define	MEID(x)						((x) << 2)
+#define	VMID(x)						((x) << 4)
+#define	QUEUEID(x)					((x) << 8)
+
+#define	SQ_CONFIG					0x8C00
+
+#define	SH_MEM_BASES					0x8C28
+/* if PTR32, these are the bases for scratch and lds */
+#define	PRIVATE_BASE(x)					((x) << 0) /* scratch */
+#define	SHARED_BASE(x)					((x) << 16) /* LDS */
+#define	SH_MEM_APE1_BASE				0x8C2C
+/* if PTR32, this is the base location of GPUVM */
+#define	SH_MEM_APE1_LIMIT				0x8C30
+/* if PTR32, this is the upper limit of GPUVM */
+#define	SH_MEM_CONFIG					0x8C34
+#define	PTR32						(1 << 0)
+#define	ALIGNMENT_MODE(x)				((x) << 2)
+#define	SH_MEM_ALIGNMENT_MODE_DWORD			0
+#define	SH_MEM_ALIGNMENT_MODE_DWORD_STRICT		1
+#define	SH_MEM_ALIGNMENT_MODE_STRICT			2
+#define	SH_MEM_ALIGNMENT_MODE_UNALIGNED			3
+#define	DEFAULT_MTYPE(x)				((x) << 4)
+#define	APE1_MTYPE(x)					((x) << 7)
+
+/* valid for both DEFAULT_MTYPE and APE1_MTYPE */
+#define	MTYPE_NONCACHED					3
+
+
+#define SH_STATIC_MEM_CONFIG				0x9604u
+
+#define	TC_CFG_L1_LOAD_POLICY0				0xAC68
+#define	TC_CFG_L1_LOAD_POLICY1				0xAC6C
+#define	TC_CFG_L1_STORE_POLICY				0xAC70
+#define	TC_CFG_L2_LOAD_POLICY0				0xAC74
+#define	TC_CFG_L2_LOAD_POLICY1				0xAC78
+#define	TC_CFG_L2_STORE_POLICY0				0xAC7C
+#define	TC_CFG_L2_STORE_POLICY1				0xAC80
+#define	TC_CFG_L2_ATOMIC_POLICY				0xAC84
+#define	TC_CFG_L1_VOLATILE				0xAC88
+#define	TC_CFG_L2_VOLATILE				0xAC8C
+
+#define CP_PQ_WPTR_POLL_CNTL				0xC20C
+#define	WPTR_POLL_EN					(1 << 31)
+
+#define CP_ME1_PIPE0_INT_CNTL				0xC214
+#define CP_ME1_PIPE1_INT_CNTL				0xC218
+#define CP_ME1_PIPE2_INT_CNTL				0xC21C
+#define CP_ME1_PIPE3_INT_CNTL				0xC220
+#define CP_ME2_PIPE0_INT_CNTL				0xC224
+#define CP_ME2_PIPE1_INT_CNTL				0xC228
+#define CP_ME2_PIPE2_INT_CNTL				0xC22C
+#define CP_ME2_PIPE3_INT_CNTL				0xC230
+#define DEQUEUE_REQUEST_INT_ENABLE			(1 << 13)
+#define WRM_POLL_TIMEOUT_INT_ENABLE			(1 << 17)
+#define PRIV_REG_INT_ENABLE				(1 << 23)
+#define TIME_STAMP_INT_ENABLE				(1 << 26)
+#define GENERIC2_INT_ENABLE				(1 << 29)
+#define GENERIC1_INT_ENABLE				(1 << 30)
+#define GENERIC0_INT_ENABLE				(1 << 31)
+#define CP_ME1_PIPE0_INT_STATUS				0xC214
+#define CP_ME1_PIPE1_INT_STATUS				0xC218
+#define CP_ME1_PIPE2_INT_STATUS				0xC21C
+#define CP_ME1_PIPE3_INT_STATUS				0xC220
+#define CP_ME2_PIPE0_INT_STATUS				0xC224
+#define CP_ME2_PIPE1_INT_STATUS				0xC228
+#define CP_ME2_PIPE2_INT_STATUS				0xC22C
+#define CP_ME2_PIPE3_INT_STATUS				0xC230
+#define DEQUEUE_REQUEST_INT_STATUS			(1 << 13)
+#define WRM_POLL_TIMEOUT_INT_STATUS			(1 << 17)
+#define PRIV_REG_INT_STATUS				(1 << 23)
+#define TIME_STAMP_INT_STATUS				(1 << 26)
+#define GENERIC2_INT_STATUS				(1 << 29)
+#define GENERIC1_INT_STATUS				(1 << 30)
+#define GENERIC0_INT_STATUS				(1 << 31)
+
+#define CP_HPD_EOP_BASE_ADDR				0xC904
+#define CP_HPD_EOP_BASE_ADDR_HI				0xC908
+#define CP_HPD_EOP_VMID					0xC90C
+#define CP_HPD_EOP_CONTROL				0xC910
+#define	EOP_SIZE(x)					((x) << 0)
+#define	EOP_SIZE_MASK					(0x3f << 0)
+#define CP_MQD_BASE_ADDR				0xC914
+#define CP_MQD_BASE_ADDR_HI				0xC918
+#define CP_HQD_ACTIVE					0xC91C
+#define CP_HQD_VMID					0xC920
+
+#define CP_HQD_PERSISTENT_STATE				0xC924u
+#define	DEFAULT_CP_HQD_PERSISTENT_STATE			(0x33U << 8)
+
+#define CP_HQD_PIPE_PRIORITY				0xC928u
+#define CP_HQD_QUEUE_PRIORITY				0xC92Cu
+#define CP_HQD_QUANTUM					0xC930u
+#define	QUANTUM_EN					1U
+#define	QUANTUM_SCALE_1MS				(1U << 4)
+#define	QUANTUM_DURATION(x)				((x) << 8)
+
+#define CP_HQD_PQ_BASE					0xC934
+#define CP_HQD_PQ_BASE_HI				0xC938
+#define CP_HQD_PQ_RPTR					0xC93C
+#define CP_HQD_PQ_RPTR_REPORT_ADDR			0xC940
+#define CP_HQD_PQ_RPTR_REPORT_ADDR_HI			0xC944
+#define CP_HQD_PQ_WPTR_POLL_ADDR			0xC948
+#define CP_HQD_PQ_WPTR_POLL_ADDR_HI			0xC94C
+#define CP_HQD_PQ_DOORBELL_CONTROL			0xC950
+#define	DOORBELL_OFFSET(x)				((x) << 2)
+#define	DOORBELL_OFFSET_MASK				(0x1fffff << 2)
+#define	DOORBELL_SOURCE					(1 << 28)
+#define	DOORBELL_SCHD_HIT				(1 << 29)
+#define	DOORBELL_EN					(1 << 30)
+#define	DOORBELL_HIT					(1 << 31)
+#define CP_HQD_PQ_WPTR					0xC954
+#define CP_HQD_PQ_CONTROL				0xC958
+#define	QUEUE_SIZE(x)					((x) << 0)
+#define	QUEUE_SIZE_MASK					(0x3f << 0)
+#define	RPTR_BLOCK_SIZE(x)				((x) << 8)
+#define	RPTR_BLOCK_SIZE_MASK				(0x3f << 8)
+#define	MIN_AVAIL_SIZE(x)				((x) << 20)
+#define	PQ_ATC_EN					(1 << 23)
+#define	PQ_VOLATILE					(1 << 26)
+#define	NO_UPDATE_RPTR					(1 << 27)
+#define	UNORD_DISPATCH					(1 << 28)
+#define	ROQ_PQ_IB_FLIP					(1 << 29)
+#define	PRIV_STATE					(1 << 30)
+#define	KMD_QUEUE					(1 << 31)
+
+#define	DEFAULT_RPTR_BLOCK_SIZE				RPTR_BLOCK_SIZE(5)
+#define	DEFAULT_MIN_AVAIL_SIZE				MIN_AVAIL_SIZE(3)
+
+#define CP_HQD_IB_BASE_ADDR				0xC95Cu
+#define CP_HQD_IB_BASE_ADDR_HI				0xC960u
+#define CP_HQD_IB_RPTR					0xC964u
+#define CP_HQD_IB_CONTROL				0xC968u
+#define	IB_ATC_EN					(1U << 23)
+#define	DEFAULT_MIN_IB_AVAIL_SIZE			(3U << 20)
+
+#define CP_HQD_DEQUEUE_REQUEST				0xC974
+#define	DEQUEUE_REQUEST_DRAIN				1
+
+#define CP_HQD_SEMA_CMD					0xC97Cu
+#define CP_HQD_MSG_TYPE					0xC980u
+#define CP_HQD_ATOMIC0_PREOP_LO				0xC984u
+#define CP_HQD_ATOMIC0_PREOP_HI				0xC988u
+#define CP_HQD_ATOMIC1_PREOP_LO				0xC98Cu
+#define CP_HQD_ATOMIC1_PREOP_HI				0xC990u
+#define CP_HQD_HQ_SCHEDULER0				0xC994u
+#define CP_HQD_HQ_SCHEDULER1				0xC998u
+
+
+#define CP_MQD_CONTROL					0xC99C
+#define	MQD_VMID(x)					((x) << 0)
+#define	MQD_VMID_MASK					(0xf << 0)
+#define	MQD_CONTROL_PRIV_STATE_EN			(1U << 8)
+
+#define GRBM_GFX_INDEX					0x30800
+#define	INSTANCE_INDEX(x)				((x) << 0)
+#define	SH_INDEX(x)					((x) << 8)
+#define	SE_INDEX(x)					((x) << 16)
+#define	SH_BROADCAST_WRITES				(1 << 29)
+#define	INSTANCE_BROADCAST_WRITES			(1 << 30)
+#define	SE_BROADCAST_WRITES				(1 << 31)
+
+#define SQC_CACHES					0x30d20
+#define SQC_POLICY					0x8C38u
+#define SQC_VOLATILE					0x8C3Cu
+
+#define CP_PERFMON_CNTL					0x36020
+
+#define ATC_VMID0_PASID_MAPPING				0x339Cu
+#define	ATC_VMID_PASID_MAPPING_UPDATE_STATUS		0x3398u
+#define	ATC_VMID_PASID_MAPPING_VALID			(1U << 31)
+
+#define ATC_VM_APERTURE0_CNTL				0x3310u
+#define	ATS_ACCESS_MODE_NEVER				0
+#define	ATS_ACCESS_MODE_ALWAYS				1
+
+#define ATC_VM_APERTURE0_CNTL2				0x3318u
+#define ATC_VM_APERTURE0_HIGH_ADDR			0x3308u
+#define ATC_VM_APERTURE0_LOW_ADDR			0x3300u
+#define ATC_VM_APERTURE1_CNTL				0x3314u
+#define ATC_VM_APERTURE1_CNTL2				0x331Cu
+#define ATC_VM_APERTURE1_HIGH_ADDR			0x330Cu
+#define ATC_VM_APERTURE1_LOW_ADDR			0x3304u
+
+#endif
diff --git a/drivers/gpu/hsa/radeon/kfd_device.c b/drivers/gpu/hsa/radeon/kfd_device.c
index 4e9fe6c..465c822 100644
--- a/drivers/gpu/hsa/radeon/kfd_device.c
+++ b/drivers/gpu/hsa/radeon/kfd_device.c
@@ -28,6 +28,7 @@
 #include "kfd_scheduler.h"
 
 static const struct kfd_device_info bonaire_device_info = {
+	.scheduler_class = &radeon_kfd_cik_static_scheduler_class,
 	.max_pasid_bits = 16,
 };
 
diff --git a/drivers/gpu/hsa/radeon/kfd_registers.c b/drivers/gpu/hsa/radeon/kfd_registers.c
new file mode 100644
index 0000000..223debd
--- /dev/null
+++ b/drivers/gpu/hsa/radeon/kfd_registers.c
@@ -0,0 +1,50 @@
+/*
+ * Copyright 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#include <linux/io.h>
+#include "kfd_priv.h"
+
+/* In KFD, "reg" is the byte offset of the register. */
+static void __iomem *reg_address(struct kfd_dev *dev, uint32_t reg)
+{
+	return dev->regs + reg;
+}
+
+void radeon_kfd_write_reg(struct kfd_dev *dev, uint32_t reg, uint32_t value)
+{
+	writel(value, reg_address(dev, reg));
+}
+
+uint32_t radeon_kfd_read_reg(struct kfd_dev *dev, uint32_t reg)
+{
+	return readl(reg_address(dev, reg));
+}
+
+void radeon_kfd_lock_srbm_index(struct kfd_dev *dev)
+{
+	kfd2kgd->lock_srbm_gfx_cntl(dev->kgd);
+}
+
+void radeon_kfd_unlock_srbm_index(struct kfd_dev *dev)
+{
+	kfd2kgd->unlock_srbm_gfx_cntl(dev->kgd);
+}
diff --git a/drivers/gpu/hsa/radeon/kfd_sched_cik_static.c b/drivers/gpu/hsa/radeon/kfd_sched_cik_static.c
new file mode 100644
index 0000000..b986ff9
--- /dev/null
+++ b/drivers/gpu/hsa/radeon/kfd_sched_cik_static.c
@@ -0,0 +1,800 @@
+/*
+ * Copyright 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#include <linux/log2.h>
+#include <linux/mutex.h>
+#include <linux/slab.h>
+#include <linux/types.h>
+#include <linux/uaccess.h>
+#include "kfd_priv.h"
+#include "kfd_scheduler.h"
+#include "cik_regs.h"
+
+/* CIK CP hardware is arranged with 8 queues per pipe and 4 pipes per MEC (microengine for compute).
+ * The first MEC is ME 1 with the GFX ME as ME 0.
+ * We split the CP with the KGD: they take the first N pipes and we take the rest.
+ */
+#define CIK_QUEUES_PER_PIPE 8
+#define CIK_PIPES_PER_MEC 4
+
+#define CIK_MAX_PIPES (2 * CIK_PIPES_PER_MEC)
+
+#define CIK_NUM_VMID 16
+
+#define CIK_HPD_SIZE_LOG2 11
+#define CIK_HPD_SIZE (1U << CIK_HPD_SIZE_LOG2)
+#define CIK_HPD_ALIGNMENT 256
+#define CIK_MQD_ALIGNMENT 4
+
+#pragma pack(push, 4)
+
+struct cik_hqd_registers {
+	u32 cp_mqd_base_addr;
+	u32 cp_mqd_base_addr_hi;
+	u32 cp_hqd_active;
+	u32 cp_hqd_vmid;
+	u32 cp_hqd_persistent_state;
+	u32 cp_hqd_pipe_priority;
+	u32 cp_hqd_queue_priority;
+	u32 cp_hqd_quantum;
+	u32 cp_hqd_pq_base;
+	u32 cp_hqd_pq_base_hi;
+	u32 cp_hqd_pq_rptr;
+	u32 cp_hqd_pq_rptr_report_addr;
+	u32 cp_hqd_pq_rptr_report_addr_hi;
+	u32 cp_hqd_pq_wptr_poll_addr;
+	u32 cp_hqd_pq_wptr_poll_addr_hi;
+	u32 cp_hqd_pq_doorbell_control;
+	u32 cp_hqd_pq_wptr;
+	u32 cp_hqd_pq_control;
+	u32 cp_hqd_ib_base_addr;
+	u32 cp_hqd_ib_base_addr_hi;
+	u32 cp_hqd_ib_rptr;
+	u32 cp_hqd_ib_control;
+	u32 cp_hqd_iq_timer;
+	u32 cp_hqd_iq_rptr;
+	u32 cp_hqd_dequeue_request;
+	u32 cp_hqd_dma_offload;
+	u32 cp_hqd_sema_cmd;
+	u32 cp_hqd_msg_type;
+	u32 cp_hqd_atomic0_preop_lo;
+	u32 cp_hqd_atomic0_preop_hi;
+	u32 cp_hqd_atomic1_preop_lo;
+	u32 cp_hqd_atomic1_preop_hi;
+	u32 cp_hqd_hq_scheduler0;
+	u32 cp_hqd_hq_scheduler1;
+	u32 cp_mqd_control;
+};
+
+struct cik_mqd {
+	u32 header;
+	u32 dispatch_initiator;
+	u32 dimensions[3];
+	u32 start_idx[3];
+	u32 num_threads[3];
+	u32 pipeline_stat_enable;
+	u32 perf_counter_enable;
+	u32 pgm[2];
+	u32 tba[2];
+	u32 tma[2];
+	u32 pgm_rsrc[2];
+	u32 vmid;
+	u32 resource_limits;
+	u32 static_thread_mgmt01[2];
+	u32 tmp_ring_size;
+	u32 static_thread_mgmt23[2];
+	u32 restart[3];
+	u32 thread_trace_enable;
+	u32 reserved1;
+	u32 user_data[16];
+	u32 vgtcs_invoke_count[2];
+	struct cik_hqd_registers queue_state;
+	u32 dequeue_cntr;
+	u32 interrupt_queue[64];
+};
+
+struct cik_mqd_padded {
+	struct cik_mqd mqd;
+	u8 padding[1024 - sizeof(struct cik_mqd)]; /* Pad MQD out to 1KB. (HW requires 4-byte alignment.) */
+};
+
+#pragma pack(pop)
+
+struct cik_static_private {
+	struct kfd_dev *dev;
+
+	struct mutex mutex;
+
+	unsigned int first_pipe;
+	unsigned int num_pipes;
+
+	unsigned long free_vmid_mask; /* unsigned long to make set/clear_bit happy */
+
+	/* Everything below here is offset by first_pipe. E.g. bit 0 in
+	 * free_queues is queue 0 in pipe first_pipe
+	 */
+
+	 /* Queue q on pipe p is at bit QUEUES_PER_PIPE * p + q. */
+	unsigned long free_queues[DIV_ROUND_UP(CIK_MAX_PIPES * CIK_QUEUES_PER_PIPE, BITS_PER_LONG)];
+
+	kfd_mem_obj hpd_mem;	/* Single allocation for HPDs for all KFD pipes. */
+	kfd_mem_obj mqd_mem;	/* Single allocation for all MQDs for all KFD
+				 * pipes. This is actually struct cik_mqd_padded. */
+	uint64_t hpd_addr;	/* GPU address for hpd_mem. */
+	uint64_t mqd_addr;	/* GPU address for mqd_mem. */
+	 /*
+	  * Pointer for mqd_mem.
+	  * We keep this mapped because multiple processes may need to access it
+	  * in parallel and this is simpler than controlling concurrent kmaps
+	  */
+	struct cik_mqd_padded *mqds;
+};
+
+struct cik_static_process {
+	unsigned int vmid;
+	pasid_t pasid;
+};
+
+struct cik_static_queue {
+	unsigned int queue; /* + first_pipe * QUEUES_PER_PIPE */
+
+	uint64_t mqd_addr;
+	struct cik_mqd *mqd;
+
+	void __user *pq_addr;
+	void __user *rptr_address;
+	doorbell_t __user *wptr_address;
+	uint32_t doorbell_index;
+
+	uint32_t queue_size_encoded; /* CP_HQD_PQ_CONTROL.QUEUE_SIZE takes the queue size as log2(size) - 3. */
+};
+
+static uint32_t lower_32(uint64_t x)
+{
+	return (uint32_t)x;
+}
+
+static uint32_t upper_32(uint64_t x)
+{
+	return (uint32_t)(x >> 32);
+}
+
+/* SRBM_GFX_CNTL provides the MEC/pipe/queue and vmid for many registers that are instanced.
+ * In particular, CP_HQD_* and CP_MQD_* are instanced for each queue. CP_HPD_* are instanced for each pipe.
+ * SH_MEM_* are instanced per-VMID.
+ *
+ * We provide queue_select, pipe_select and vmid_select helpers that should be used before accessing
+ * registers from those groups. Note that these overwrite each other, e.g. after vmid_select the current
+ * selected MEC/pipe/queue is undefined.
+ *
+ * SRBM_GFX_CNTL and the registers it indexes are shared with KGD. You must be holding the srbm_gfx_cntl
+ * lock via lock_srbm_index before setting SRBM_GFX_CNTL or accessing any of the instanced registers.
+ */
+static uint32_t make_srbm_gfx_cntl_mpqv(unsigned int me, unsigned int pipe, unsigned int queue, unsigned int vmid)
+{
+	return QUEUEID(queue) | VMID(vmid) | MEID(me) | PIPEID(pipe);
+}
+
+static void pipe_select(struct cik_static_private *priv, unsigned int pipe)
+{
+	unsigned int pipe_in_mec = (pipe + priv->first_pipe) % CIK_PIPES_PER_MEC;
+	unsigned int mec = (pipe + priv->first_pipe) / CIK_PIPES_PER_MEC;
+
+	WRITE_REG(priv->dev, SRBM_GFX_CNTL, make_srbm_gfx_cntl_mpqv(mec+1, pipe_in_mec, 0, 0));
+}
+
+static void queue_select(struct cik_static_private *priv, unsigned int queue)
+{
+	unsigned int queue_in_pipe = queue % CIK_QUEUES_PER_PIPE;
+	unsigned int pipe = queue / CIK_QUEUES_PER_PIPE + priv->first_pipe;
+	unsigned int pipe_in_mec = pipe % CIK_PIPES_PER_MEC;
+	unsigned int mec = pipe / CIK_PIPES_PER_MEC;
+
+#if 0
+	dev_err(radeon_kfd_chardev(), "queue select %d = %u/%u/%u = 0x%08x\n", queue, mec+1, pipe_in_mec, queue_in_pipe,
+		make_srbm_gfx_cntl_mpqv(mec+1, pipe_in_mec, queue_in_pipe, 0));
+#endif
+
+	WRITE_REG(priv->dev, SRBM_GFX_CNTL, make_srbm_gfx_cntl_mpqv(mec+1, pipe_in_mec, queue_in_pipe, 0));
+}
+
+static void vmid_select(struct cik_static_private *priv, unsigned int vmid)
+{
+	WRITE_REG(priv->dev, SRBM_GFX_CNTL, make_srbm_gfx_cntl_mpqv(0, 0, 0, vmid));
+}
+
+static void lock_srbm_index(struct cik_static_private *priv)
+{
+	radeon_kfd_lock_srbm_index(priv->dev);
+}
+
+static void unlock_srbm_index(struct cik_static_private *priv)
+{
+	WRITE_REG(priv->dev, SRBM_GFX_CNTL, 0);	/* Be nice to KGD, reset indexed CP registers to the GFX pipe. */
+	radeon_kfd_unlock_srbm_index(priv->dev);
+}
+
+/* One-time setup for all compute pipes. They need to be programmed with the address & size of the HPD EOP buffer. */
+static void init_pipes(struct cik_static_private *priv)
+{
+	unsigned int i;
+
+	lock_srbm_index(priv);
+
+	for (i = 0; i < priv->num_pipes; i++) {
+		uint64_t pipe_hpd_addr = priv->hpd_addr + i * CIK_HPD_SIZE;
+
+		pipe_select(priv, i);
+
+		WRITE_REG(priv->dev, CP_HPD_EOP_BASE_ADDR, lower_32(pipe_hpd_addr >> 8));
+		WRITE_REG(priv->dev, CP_HPD_EOP_BASE_ADDR_HI, upper_32(pipe_hpd_addr >> 8));
+		WRITE_REG(priv->dev, CP_HPD_EOP_VMID, 0);
+		WRITE_REG(priv->dev, CP_HPD_EOP_CONTROL, CIK_HPD_SIZE_LOG2 - 1);
+	}
+
+	unlock_srbm_index(priv);
+}
+
+/* Program the VMID -> PASID mapping for one VMID.
+ * PASID 0 is special: it means to associate no PASID with that VMID.
+ * This function waits for the VMID/PASID mapping to complete.
+ */
+static void set_vmid_pasid_mapping(struct cik_static_private *priv, unsigned int vmid, pasid_t pasid)
+{
+	/* We have to assume that there is no outstanding mapping.
+	 * The ATC_VMID_PASID_MAPPING_UPDATE_STATUS bit could be 0 because a mapping
+	 * is in progress or because a mapping finished and the SW cleared it.
+	 * So the protocol is to always wait & clear.
+	 */
+
+	uint32_t pasid_mapping = (pasid == 0) ? 0 : (uint32_t)pasid | ATC_VMID_PASID_MAPPING_VALID;
+
+	WRITE_REG(priv->dev, ATC_VMID0_PASID_MAPPING + vmid*sizeof(uint32_t), pasid_mapping);
+
+	while (!(READ_REG(priv->dev, ATC_VMID_PASID_MAPPING_UPDATE_STATUS) & (1U << vmid)))
+		cpu_relax();
+	WRITE_REG(priv->dev, ATC_VMID_PASID_MAPPING_UPDATE_STATUS, 1U << vmid);
+}
+
+static uint32_t compute_sh_mem_bases_64bit(unsigned int top_address_nybble)
+{
+	/* In 64-bit mode, we can only control the top 3 bits of the LDS, scratch and GPUVM apertures.
+	 * The hardware fills in the remaining 61 bits according to the following pattern:
+	 * LDS:		X0000000'00000000 - X0000001'00000000 (4GB)
+	 * Scratch:	X0000001'00000000 - X0000002'00000000 (4GB)
+	 * GPUVM:	Y0010000'00000000 - Y0020000'00000000 (1TB)
+	 *
+	 * (where X/Y is the configurable nybble with the low-bit 0)
+	 *
+	 * LDS and scratch will have the same top nybble programmed in the top 3 bits of SH_MEM_BASES.PRIVATE_BASE.
+	 * GPUVM can have a different top nybble programmed in the top 3 bits of SH_MEM_BASES.SHARED_BASE.
+	 * We don't bother to support different top nybbles for LDS/Scratch and GPUVM.
+	 */
+
+	BUG_ON((top_address_nybble & 1) || top_address_nybble > 0xE);
+
+	return PRIVATE_BASE(top_address_nybble << 12) | SHARED_BASE(top_address_nybble << 12);
+}
+
+/* Initial programming for all ATS registers.
+ * - enable ATS for all compute VMIDs
+ * - clear the VMID/PASID mapping for all compute VMIDs
+ * - program the shader core flat address settings:
+ * -- 64-bit mode
+ * -- unaligned access allowed
+ * -- noncached (this is the only CPU-coherent mode in CIK)
+ * -- APE 1 disabled
+ */
+static void init_ats(struct cik_static_private *priv)
+{
+	unsigned int i;
+
+	/* Enable self-ringing doorbell recognition and direct the BIF to send
+	 * untranslated writes to the IOMMU before comparing to the aperture.*/
+	WRITE_REG(priv->dev, BIF_DOORBELL_CNTL, 0);
+
+	WRITE_REG(priv->dev, ATC_VM_APERTURE0_CNTL, ATS_ACCESS_MODE_ALWAYS);
+	WRITE_REG(priv->dev, ATC_VM_APERTURE0_CNTL2, priv->free_vmid_mask);
+	WRITE_REG(priv->dev, ATC_VM_APERTURE0_LOW_ADDR, 0);
+	WRITE_REG(priv->dev, ATC_VM_APERTURE0_HIGH_ADDR, 0);
+
+	WRITE_REG(priv->dev, ATC_VM_APERTURE1_CNTL, 0);
+	WRITE_REG(priv->dev, ATC_VM_APERTURE1_CNTL2, 0);
+	WRITE_REG(priv->dev, ATC_VM_APERTURE1_LOW_ADDR, 0);
+	WRITE_REG(priv->dev, ATC_VM_APERTURE1_HIGH_ADDR, 0);
+
+	lock_srbm_index(priv);
+
+	for (i = 0; i < CIK_NUM_VMID; i++) {
+		if (priv->free_vmid_mask & (1U << i)) {
+			uint32_t sh_mem_config;
+
+			set_vmid_pasid_mapping(priv, i, 0);
+
+			vmid_select(priv, i);
+
+			sh_mem_config = ALIGNMENT_MODE(SH_MEM_ALIGNMENT_MODE_UNALIGNED);
+			sh_mem_config |= DEFAULT_MTYPE(MTYPE_NONCACHED);
+
+			WRITE_REG(priv->dev, SH_MEM_CONFIG, sh_mem_config);
+
+			/* Configure apertures:
+			 * LDS:		0x60000000'00000000 - 0x60000001'00000000 (4GB)
+			 * Scratch:	0x60000001'00000000 - 0x60000002'00000000 (4GB)
+			 * GPUVM:	0x60010000'00000000 - 0x60020000'00000000 (1TB)
+			 */
+			WRITE_REG(priv->dev, SH_MEM_BASES, compute_sh_mem_bases_64bit(6));
+
+			/* Scratch aperture is not supported for now. */
+			WRITE_REG(priv->dev, SH_STATIC_MEM_CONFIG, 0);
+
+			/* APE1 disabled for now. */
+			WRITE_REG(priv->dev, SH_MEM_APE1_BASE, 1);
+			WRITE_REG(priv->dev, SH_MEM_APE1_LIMIT, 0);
+		}
+	}
+
+	unlock_srbm_index(priv);
+}
+
+static void exit_ats(struct cik_static_private *priv)
+{
+	unsigned int i;
+
+	for (i = 0; i < CIK_NUM_VMID; i++)
+		if (priv->free_vmid_mask & (1U << i))
+			set_vmid_pasid_mapping(priv, i, 0);
+
+	WRITE_REG(priv->dev, ATC_VM_APERTURE0_CNTL, ATS_ACCESS_MODE_NEVER);
+	WRITE_REG(priv->dev, ATC_VM_APERTURE0_CNTL2, 0);
+}
+
+static struct cik_static_private *kfd_scheduler_to_private(struct kfd_scheduler *scheduler)
+{
+	return (struct cik_static_private *)scheduler;
+}
+
+static struct cik_static_process *kfd_process_to_private(struct kfd_scheduler_process *process)
+{
+	return (struct cik_static_process *)process;
+}
+
+static struct cik_static_queue *kfd_queue_to_private(struct kfd_scheduler_queue *queue)
+{
+	return (struct cik_static_queue *)queue;
+}
+
+static int cik_static_create(struct kfd_dev *dev, struct kfd_scheduler **scheduler)
+{
+	struct cik_static_private *priv;
+	unsigned int i;
+	int err;
+	void *hpdptr;
+
+	priv = kmalloc(sizeof(*priv), GFP_KERNEL);
+	if (priv == NULL)
+		return -ENOMEM;
+
+	mutex_init(&priv->mutex);
+
+	priv->dev = dev;
+
+	priv->first_pipe = dev->shared_resources.first_compute_pipe;
+	priv->num_pipes = dev->shared_resources.compute_pipe_count;
+
+	for (i = 0; i < priv->num_pipes * CIK_QUEUES_PER_PIPE; i++)
+		__set_bit(i, priv->free_queues);
+
+	priv->free_vmid_mask = dev->shared_resources.compute_vmid_bitmap;
+
+	/*
+	 * Allocate memory for the HPDs. This is hardware-owned per-pipe data.
+	 * The driver never accesses this memory after zeroing it. It doesn't even have
+	 * to be saved/restored on suspend/resume because it contains no data when there
+	 * are no active queues.
+	 */
+	err = radeon_kfd_vidmem_alloc(dev,
+				      CIK_HPD_SIZE * priv->num_pipes * 2,
+				      PAGE_SIZE,
+				      KFD_MEMPOOL_SYSTEM_WRITECOMBINE,
+				      &priv->hpd_mem);
+	if (err)
+		goto err_hpd_alloc;
+
+	err = radeon_kfd_vidmem_kmap(dev, priv->hpd_mem, &hpdptr);
+	if (err)
+		goto err_hpd_kmap;
+	memset(hpdptr, 0, CIK_HPD_SIZE * priv->num_pipes);
+	radeon_kfd_vidmem_unkmap(dev, priv->hpd_mem);
+
+	/*
+	 * Allocate memory for all the MQDs.
+	 * This is per-queue data that is hardware-owned but initialized by the driver.
+	 * The driver has to copy this data into HQD registers when a
+	 * pipe is (re)activated.
+	 */
+	err = radeon_kfd_vidmem_alloc(dev,
+				      sizeof(struct cik_mqd_padded) * priv->num_pipes * CIK_QUEUES_PER_PIPE,
+				      PAGE_SIZE,
+				      KFD_MEMPOOL_SYSTEM_CACHEABLE,
+				      &priv->mqd_mem);
+	if (err)
+		goto err_mqd_alloc;
+	err = radeon_kfd_vidmem_kmap(dev, priv->mqd_mem, (void **)&priv->mqds);
+	if (err)
+		goto err_mqd_kmap;
+
+	*scheduler = (struct kfd_scheduler *)priv;
+
+	return 0;
+
+err_mqd_kmap:
+	radeon_kfd_vidmem_free(dev, priv->mqd_mem);
+err_mqd_alloc:
+err_hpd_kmap:
+	radeon_kfd_vidmem_free(dev, priv->hpd_mem);
+err_hpd_alloc:
+	mutex_destroy(&priv->mutex);
+	kfree(priv);
+	return err;
+}
+
+static void cik_static_destroy(struct kfd_scheduler *scheduler)
+{
+	struct cik_static_private *priv = kfd_scheduler_to_private(scheduler);
+
+	radeon_kfd_vidmem_unkmap(priv->dev, priv->mqd_mem);
+	radeon_kfd_vidmem_free(priv->dev, priv->mqd_mem);
+	radeon_kfd_vidmem_free(priv->dev, priv->hpd_mem);
+
+	mutex_destroy(&priv->mutex);
+
+	kfree(priv);
+}
+
+static void cik_static_start(struct kfd_scheduler *scheduler)
+{
+	struct cik_static_private *priv = kfd_scheduler_to_private(scheduler);
+
+	radeon_kfd_vidmem_gpumap(priv->dev, priv->hpd_mem, &priv->hpd_addr);
+	radeon_kfd_vidmem_gpumap(priv->dev, priv->mqd_mem, &priv->mqd_addr);
+
+	init_pipes(priv);
+	init_ats(priv);
+}
+
+static void cik_static_stop(struct kfd_scheduler *scheduler)
+{
+	struct cik_static_private *priv = kfd_scheduler_to_private(scheduler);
+
+	exit_ats(priv);
+
+	radeon_kfd_vidmem_ungpumap(priv->dev, priv->hpd_mem);
+	radeon_kfd_vidmem_ungpumap(priv->dev, priv->mqd_mem);
+}
+
+static bool allocate_vmid(struct cik_static_private *priv, unsigned int *vmid)
+{
+	bool ok = false;
+
+	mutex_lock(&priv->mutex);
+
+	if (priv->free_vmid_mask != 0) {
+		unsigned int v = __ffs64(priv->free_vmid_mask);
+
+		clear_bit(v, &priv->free_vmid_mask);
+		*vmid = v;
+
+		ok = true;
+	}
+
+	mutex_unlock(&priv->mutex);
+
+	return ok;
+}
+
+static void release_vmid(struct cik_static_private *priv, unsigned int vmid)
+{
+	/* It's okay to race against allocate_vmid because this only adds bits to free_vmid_mask.
+	 * And set_bit/clear_bit are atomic wrt each other. */
+	set_bit(vmid, &priv->free_vmid_mask);
+}
+
+static void setup_vmid_for_process(struct cik_static_private *priv, struct cik_static_process *p)
+{
+	set_vmid_pasid_mapping(priv, p->vmid, p->pasid);
+
+	/*
+	 * SH_MEM_CONFIG and others need to be programmed differently
+	 * for 32/64-bit processes. And maybe other reasons.
+	 */
+}
+
+static int
+cik_static_register_process(struct kfd_scheduler *scheduler, struct kfd_process *process,
+			    struct kfd_scheduler_process **scheduler_process)
+{
+	struct cik_static_private *priv = kfd_scheduler_to_private(scheduler);
+
+	struct cik_static_process *hwp;
+
+	hwp = kmalloc(sizeof(*hwp), GFP_KERNEL);
+	if (hwp == NULL)
+		return -ENOMEM;
+
+	if (!allocate_vmid(priv, &hwp->vmid)) {
+		kfree(hwp);
+		return -ENOMEM;
+	}
+
+	hwp->pasid = process->pasid;
+
+	setup_vmid_for_process(priv, hwp);
+
+	*scheduler_process = (struct kfd_scheduler_process *)hwp;
+
+	return 0;
+}
+
+static void cik_static_deregister_process(struct kfd_scheduler *scheduler,
+				struct kfd_scheduler_process *scheduler_process)
+{
+	struct cik_static_private *priv = kfd_scheduler_to_private(scheduler);
+	struct cik_static_process *pp = kfd_process_to_private(scheduler_process);
+
+	release_vmid(priv, pp->vmid);
+	kfree(pp);
+}
+
+static bool allocate_hqd(struct cik_static_private *priv, unsigned int *queue)
+{
+	bool ok = false;
+	unsigned int q;
+
+	mutex_lock(&priv->mutex);
+
+	q = find_first_bit(priv->free_queues, priv->num_pipes * CIK_QUEUES_PER_PIPE);
+
+	if (q != priv->num_pipes * CIK_QUEUES_PER_PIPE) {
+		clear_bit(q, priv->free_queues);
+		*queue = q;
+
+		ok = true;
+	}
+
+	mutex_unlock(&priv->mutex);
+
+	return ok;
+}
+
+static void release_hqd(struct cik_static_private *priv, unsigned int queue)
+{
+	/* It's okay to race against allocate_hqd because this only adds bits to free_queues.
+	 * And set_bit/clear_bit are atomic wrt each other. */
+	set_bit(queue, priv->free_queues);
+}
+
+static void init_mqd(const struct cik_static_queue *queue, const struct cik_static_process *process)
+{
+	struct cik_mqd *mqd = queue->mqd;
+
+	memset(mqd, 0, sizeof(*mqd));
+
+	mqd->header = 0xC0310800;
+	mqd->pipeline_stat_enable = 1;
+	mqd->static_thread_mgmt01[0] = 0xffffffff;
+	mqd->static_thread_mgmt01[1] = 0xffffffff;
+	mqd->static_thread_mgmt23[0] = 0xffffffff;
+	mqd->static_thread_mgmt23[1] = 0xffffffff;
+
+	mqd->queue_state.cp_mqd_base_addr = lower_32(queue->mqd_addr);
+	mqd->queue_state.cp_mqd_base_addr_hi = upper_32(queue->mqd_addr);
+	mqd->queue_state.cp_mqd_control = MQD_CONTROL_PRIV_STATE_EN;
+
+	mqd->queue_state.cp_hqd_pq_base = lower_32((uintptr_t)queue->pq_addr >> 8);
+	mqd->queue_state.cp_hqd_pq_base_hi = upper_32((uintptr_t)queue->pq_addr >> 8);
+	mqd->queue_state.cp_hqd_pq_control = QUEUE_SIZE(queue->queue_size_encoded) | DEFAULT_RPTR_BLOCK_SIZE
+					    | DEFAULT_MIN_AVAIL_SIZE | PQ_ATC_EN;
+	mqd->queue_state.cp_hqd_pq_rptr_report_addr = lower_32((uintptr_t)queue->rptr_address);
+	mqd->queue_state.cp_hqd_pq_rptr_report_addr_hi = upper_32((uintptr_t)queue->rptr_address);
+	mqd->queue_state.cp_hqd_pq_doorbell_control = DOORBELL_OFFSET(queue->doorbell_index) | DOORBELL_EN;
+	mqd->queue_state.cp_hqd_vmid = process->vmid;
+	mqd->queue_state.cp_hqd_active = 1;
+
+	mqd->queue_state.cp_hqd_persistent_state = DEFAULT_CP_HQD_PERSISTENT_STATE;
+
+	/* The values for these 3 are from WinKFD. */
+	mqd->queue_state.cp_hqd_quantum = QUANTUM_EN | QUANTUM_SCALE_1MS | QUANTUM_DURATION(10);
+	mqd->queue_state.cp_hqd_pipe_priority = 1;
+	mqd->queue_state.cp_hqd_queue_priority = 15;
+
+	mqd->queue_state.cp_hqd_ib_control = IB_ATC_EN | DEFAULT_MIN_IB_AVAIL_SIZE;
+}
+
+/* Write the HQD registers and activate the queue.
+ * Requires that SRBM_GFX_CNTL has already been programmed for the queue.
+ */
+static void load_hqd(struct cik_static_private *priv, struct cik_static_queue *queue)
+{
+	struct kfd_dev *dev = priv->dev;
+	const struct cik_hqd_registers *qs = &queue->mqd->queue_state;
+
+	WRITE_REG(dev, CP_MQD_BASE_ADDR, qs->cp_mqd_base_addr);
+	WRITE_REG(dev, CP_MQD_BASE_ADDR_HI, qs->cp_mqd_base_addr_hi);
+	WRITE_REG(dev, CP_MQD_CONTROL, qs->cp_mqd_control);
+
+	WRITE_REG(dev, CP_HQD_PQ_BASE, qs->cp_hqd_pq_base);
+	WRITE_REG(dev, CP_HQD_PQ_BASE_HI, qs->cp_hqd_pq_base_hi);
+	WRITE_REG(dev, CP_HQD_PQ_CONTROL, qs->cp_hqd_pq_control);
+	/* DOORBELL_CONTROL before WPTR because WPTR writes are dropped if DOORBELL_HIT is set. */
+	WRITE_REG(dev, CP_HQD_PQ_DOORBELL_CONTROL, qs->cp_hqd_pq_doorbell_control);
+	WRITE_REG(dev, CP_HQD_PQ_WPTR, qs->cp_hqd_pq_wptr);
+	WRITE_REG(dev, CP_HQD_PQ_RPTR, qs->cp_hqd_pq_rptr);
+	WRITE_REG(dev, CP_HQD_PQ_RPTR_REPORT_ADDR, qs->cp_hqd_pq_rptr_report_addr);
+	WRITE_REG(dev, CP_HQD_PQ_RPTR_REPORT_ADDR_HI, qs->cp_hqd_pq_rptr_report_addr_hi);
+
+	WRITE_REG(dev, CP_HQD_VMID, qs->cp_hqd_vmid);
+	WRITE_REG(dev, CP_HQD_PERSISTENT_STATE, qs->cp_hqd_persistent_state);
+	WRITE_REG(dev, CP_HQD_QUANTUM, qs->cp_hqd_quantum);
+	WRITE_REG(dev, CP_HQD_PIPE_PRIORITY, qs->cp_hqd_pipe_priority);
+	WRITE_REG(dev, CP_HQD_QUEUE_PRIORITY, qs->cp_hqd_queue_priority);
+
+	WRITE_REG(dev, CP_HQD_IB_CONTROL, qs->cp_hqd_ib_control);
+	WRITE_REG(dev, CP_HQD_IB_BASE_ADDR, qs->cp_hqd_ib_base_addr);
+	WRITE_REG(dev, CP_HQD_IB_BASE_ADDR_HI, qs->cp_hqd_ib_base_addr_hi);
+	WRITE_REG(dev, CP_HQD_IB_RPTR, qs->cp_hqd_ib_rptr);
+	WRITE_REG(dev, CP_HQD_SEMA_CMD, qs->cp_hqd_sema_cmd);
+	WRITE_REG(dev, CP_HQD_MSG_TYPE, qs->cp_hqd_msg_type);
+	WRITE_REG(dev, CP_HQD_ATOMIC0_PREOP_LO, qs->cp_hqd_atomic0_preop_lo);
+	WRITE_REG(dev, CP_HQD_ATOMIC0_PREOP_HI, qs->cp_hqd_atomic0_preop_hi);
+	WRITE_REG(dev, CP_HQD_ATOMIC1_PREOP_LO, qs->cp_hqd_atomic1_preop_lo);
+	WRITE_REG(dev, CP_HQD_ATOMIC1_PREOP_HI, qs->cp_hqd_atomic1_preop_hi);
+	WRITE_REG(dev, CP_HQD_HQ_SCHEDULER0, qs->cp_hqd_hq_scheduler0);
+	WRITE_REG(dev, CP_HQD_HQ_SCHEDULER1, qs->cp_hqd_hq_scheduler1);
+
+	WRITE_REG(dev, CP_HQD_ACTIVE, 1);
+}
+
+static void activate_queue(struct cik_static_private *priv, struct cik_static_queue *queue)
+{
+	bool wptr_shadow_valid;
+	doorbell_t wptr_shadow;
+
+	/* Avoid sleeping while holding the SRBM lock. */
+	wptr_shadow_valid = !get_user(wptr_shadow, queue->wptr_address);
+
+	lock_srbm_index(priv);
+	queue_select(priv, queue->queue);
+
+	load_hqd(priv, queue);
+
+	/* Doorbell and wptr are special because there is a race when reactivating a queue.
+	 * Since doorbell writes to deactivated queues are ignored by hardware, the application
+	 * shadows the doorbell into memory at queue->wptr_address.
+	 *
+	 * We want the queue to automatically resume processing as if it were always active,
+	 * so we want to copy from queue->wptr_address into the wptr/doorbell.
+	 *
+	 * The race is that the app could write a new wptr into the doorbell before we
+	 * write the shadowed wptr, resulting in an old wptr written later.
+	 *
+	 * The hardware solves this by ignoring CP_HQD_WPTR writes after a doorbell write.
+	 * So the KFD can activate the doorbell then write the shadow wptr to CP_HQD_WPTR
+	 * knowing it will be ignored if the user has written a more-recent doorbell.
+	 */
+	if (wptr_shadow_valid)
+		WRITE_REG(priv->dev, CP_HQD_PQ_WPTR, wptr_shadow);
+
+	unlock_srbm_index(priv);
+}
+
+static void drain_hqd(struct cik_static_private *priv)
+{
+	WRITE_REG(priv->dev, CP_HQD_DEQUEUE_REQUEST, DEQUEUE_REQUEST_DRAIN);
+}
+
+static void wait_hqd_inactive(struct cik_static_private *priv)
+{
+	while (READ_REG(priv->dev, CP_HQD_ACTIVE) != 0)
+		cpu_relax();
+}
+
+static void deactivate_queue(struct cik_static_private *priv, struct cik_static_queue *queue)
+{
+	lock_srbm_index(priv);
+	queue_select(priv, queue->queue);
+
+	drain_hqd(priv);
+	wait_hqd_inactive(priv);
+
+	unlock_srbm_index(priv);
+}
+
+#define BIT_MASK_64(high, low) (((1ULL << (high)) - 1) & ~((1ULL << (low)) - 1))
+#define RING_ADDRESS_BAD_BIT_MASK (~BIT_MASK_64(48, 8))
+#define RWPTR_ADDRESS_BAD_BIT_MASK (~BIT_MASK_64(48, 2))
+
+#define MAX_QUEUE_SIZE (1ULL << 32)
+#define MIN_QUEUE_SIZE (1ULL << 10)
+
+static int
+cik_static_create_queue(struct kfd_scheduler *scheduler,
+			struct kfd_scheduler_process *process,
+			struct kfd_scheduler_queue *queue,
+			void __user *ring_address,
+			uint64_t ring_size,
+			void __user *rptr_address,
+			void __user *wptr_address,
+			unsigned int doorbell)
+{
+	struct cik_static_private *priv = kfd_scheduler_to_private(scheduler);
+	struct cik_static_process *hwp = kfd_process_to_private(process);
+	struct cik_static_queue *hwq = kfd_queue_to_private(queue);
+
+	if ((uint64_t)ring_address & RING_ADDRESS_BAD_BIT_MASK
+	    || (uint64_t)rptr_address & RWPTR_ADDRESS_BAD_BIT_MASK
+	    || (uint64_t)wptr_address & RWPTR_ADDRESS_BAD_BIT_MASK)
+		return -EINVAL;
+
+	if (ring_size > MAX_QUEUE_SIZE || ring_size < MIN_QUEUE_SIZE || !is_power_of_2(ring_size))
+		return -EINVAL;
+
+	if (!allocate_hqd(priv, &hwq->queue))
+		return -ENOMEM;
+
+	hwq->mqd_addr = priv->mqd_addr + sizeof(struct cik_mqd_padded) * hwq->queue;
+	hwq->mqd = &priv->mqds[hwq->queue].mqd;
+	hwq->pq_addr = ring_address;
+	hwq->rptr_address = rptr_address;
+	hwq->wptr_address = wptr_address;
+	hwq->doorbell_index = doorbell;
+	hwq->queue_size_encoded = ilog2(ring_size) - 3;
+
+	init_mqd(hwq, hwp);
+	activate_queue(priv, hwq);
+
+	return 0;
+}
+
+static void
+cik_static_destroy_queue(struct kfd_scheduler *scheduler, struct kfd_scheduler_queue *queue)
+{
+	struct cik_static_private *priv = kfd_scheduler_to_private(scheduler);
+	struct cik_static_queue *hwq = kfd_queue_to_private(queue);
+
+	deactivate_queue(priv, hwq);
+
+	release_hqd(priv, hwq->queue);
+}
+
+const struct kfd_scheduler_class radeon_kfd_cik_static_scheduler_class = {
+	.name = "CIK static scheduler",
+	.create = cik_static_create,
+	.destroy = cik_static_destroy,
+	.start = cik_static_start,
+	.stop = cik_static_stop,
+	.register_process = cik_static_register_process,
+	.deregister_process = cik_static_deregister_process,
+	.queue_size = sizeof(struct cik_static_queue),
+	.create_queue = cik_static_create_queue,
+	.destroy_queue = cik_static_destroy_queue,
+};
diff --git a/drivers/gpu/hsa/radeon/kfd_vidmem.c b/drivers/gpu/hsa/radeon/kfd_vidmem.c
new file mode 100644
index 0000000..c8d3770
--- /dev/null
+++ b/drivers/gpu/hsa/radeon/kfd_vidmem.c
@@ -0,0 +1,61 @@
+/*
+ * Copyright 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#include "kfd_priv.h"
+
+int radeon_kfd_vidmem_alloc(struct kfd_dev *kfd, size_t size, size_t alignment,
+				enum kfd_mempool pool, kfd_mem_obj *mem_obj)
+{
+	return kfd2kgd->allocate_mem(kfd->kgd,
+					size,
+					alignment,
+					(enum kgd_memory_pool)pool,
+					(struct kgd_mem **)mem_obj);
+}
+
+void radeon_kfd_vidmem_free(struct kfd_dev *kfd, kfd_mem_obj mem_obj)
+{
+	kfd2kgd->free_mem(kfd->kgd, (struct kgd_mem *)mem_obj);
+}
+
+int radeon_kfd_vidmem_gpumap(struct kfd_dev *kfd, kfd_mem_obj mem_obj,
+				uint64_t *vmid0_address)
+{
+	return kfd2kgd->gpumap_mem(kfd->kgd,
+					(struct kgd_mem *)mem_obj,
+					vmid0_address);
+}
+
+void radeon_kfd_vidmem_ungpumap(struct kfd_dev *kfd, kfd_mem_obj mem_obj)
+{
+	kfd2kgd->ungpumap_mem(kfd->kgd, (struct kgd_mem *)mem_obj);
+}
+
+int radeon_kfd_vidmem_kmap(struct kfd_dev *kfd, kfd_mem_obj mem_obj, void **ptr)
+{
+	return kfd2kgd->kmap_mem(kfd->kgd, (struct kgd_mem *)mem_obj, ptr);
+}
+
+void radeon_kfd_vidmem_unkmap(struct kfd_dev *kfd, kfd_mem_obj mem_obj)
+{
+	kfd2kgd->unkmap_mem(kfd->kgd, (struct kgd_mem *)mem_obj);
+}
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 116+ messages in thread

* [PATCH 12/83] hsa/radeon: Add kfd mmap handler
  2014-07-10 21:50 [PATCH 02/83] drm/radeon: reduce number of free VMIDs and pipes in KV Oded Gabbay
                   ` (8 preceding siblings ...)
  2014-07-10 21:50 ` [PATCH 11/83] hsa/radeon: Add scheduler code Oded Gabbay
@ 2014-07-10 21:50 ` Oded Gabbay
  2014-07-11 18:47     ` Jerome Glisse
  2014-07-10 21:50 ` [PATCH 13/83] hsa/radeon: Add 2 new IOCTL to kfd, CREATE_QUEUE and DESTROY_QUEUE Oded Gabbay
                   ` (16 subsequent siblings)
  26 siblings, 1 reply; 116+ messages in thread
From: Oded Gabbay @ 2014-07-10 21:50 UTC (permalink / raw)
  To: David Airlie, Alex Deucher, Jerome Glisse
  Cc: linux-kernel, dri-devel, John Bridgman, Andrew Lewycky,
	Joerg Roedel, Oded Gabbay

This patch adds the kfd mmap handler that maps the physical address
of a doorbell page to a user-space virtual address. That virtual address
belongs to the process that uses the doorbell page.

This mmap handler is intended to be called only from within the kernel
(through map_doorbells), not from a user-mode mmap of /dev/kfd.
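
As a rough illustration of the address arithmetic the handler relies on,
here is a minimal user-space sketch. The SKETCH_* constants and the
sketch_* helper names are hypothetical and exist only for this example;
the real KFD_MMAP_DOORBELL_START/END values and the per-process doorbell
allocation size are defined elsewhere in the series and are not
reproduced here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical values for illustration only; not taken from kfd_priv.h. */
#define SKETCH_PAGE_SHIFT          12
#define SKETCH_MMAP_DOORBELL_START 0x1000u /* first page offset used for doorbells */
#define SKETCH_MMAP_DOORBELL_END   0x2000u /* one past the last doorbell page offset */

/* Decode the device index from an mmap page offset; -1 if out of range. */
static long sketch_pgoff_to_device(uint64_t pgoff)
{
	if (pgoff < SKETCH_MMAP_DOORBELL_START || pgoff >= SKETCH_MMAP_DOORBELL_END)
		return -1;
	return (long)(pgoff - SKETCH_MMAP_DOORBELL_START);
}

/* Byte offset handed to vm_mmap() for a device, mirroring map_doorbells(). */
static uint64_t sketch_device_to_offset(unsigned int dev_id)
{
	return ((uint64_t)SKETCH_MMAP_DOORBELL_START + dev_id) << SKETCH_PAGE_SHIFT;
}

/* Physical start of one process's doorbell slice inside the device aperture. */
static uint64_t sketch_doorbell_phys(uint64_t doorbell_base, unsigned int pasid,
				     uint64_t per_process_bytes)
{
	assert(per_process_bytes != 0);
	return doorbell_base + (uint64_t)pasid * per_process_bytes;
}
```

The device index travels in the page offset, and the process is selected
by its PASID, so each (device, process) pair gets a disjoint slice of the
doorbell aperture.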

Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
---
 drivers/gpu/hsa/radeon/kfd_chardev.c  | 20 +++++++++
 drivers/gpu/hsa/radeon/kfd_doorbell.c | 85 +++++++++++++++++++++++++++++++++++
 2 files changed, 105 insertions(+)

diff --git a/drivers/gpu/hsa/radeon/kfd_chardev.c b/drivers/gpu/hsa/radeon/kfd_chardev.c
index 7a56a8f..0b5bc74 100644
--- a/drivers/gpu/hsa/radeon/kfd_chardev.c
+++ b/drivers/gpu/hsa/radeon/kfd_chardev.c
@@ -39,6 +39,7 @@ static const struct file_operations kfd_fops = {
 	.owner = THIS_MODULE,
 	.unlocked_ioctl = kfd_ioctl,
 	.open = kfd_open,
+	.mmap = kfd_mmap,
 };
 
 static int kfd_char_dev_major = -1;
@@ -131,3 +132,22 @@ kfd_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
 
 	return err;
 }
+
+static int
+kfd_mmap(struct file *filp, struct vm_area_struct *vma)
+{
+	unsigned long pgoff = vma->vm_pgoff;
+	struct kfd_process *process;
+
+	process = radeon_kfd_get_process(current);
+	if (IS_ERR(process))
+		return PTR_ERR(process);
+
+	if (pgoff < KFD_MMAP_DOORBELL_START)
+		return -EINVAL;
+
+	if (pgoff < KFD_MMAP_DOORBELL_END)
+		return radeon_kfd_doorbell_mmap(process, vma);
+
+	return -EINVAL;
+}
diff --git a/drivers/gpu/hsa/radeon/kfd_doorbell.c b/drivers/gpu/hsa/radeon/kfd_doorbell.c
index 79a9d4b..e1d8506 100644
--- a/drivers/gpu/hsa/radeon/kfd_doorbell.c
+++ b/drivers/gpu/hsa/radeon/kfd_doorbell.c
@@ -70,3 +70,88 @@ void radeon_kfd_doorbell_init(struct kfd_dev *kfd)
 	kfd->doorbell_process_limit = doorbell_process_limit;
 }
 
+/* This is the /dev/kfd mmap (for doorbell) implementation. We intend that this is only called through map_doorbells,
+** not through user-mode mmap of /dev/kfd. */
+int radeon_kfd_doorbell_mmap(struct kfd_process *process, struct vm_area_struct *vma)
+{
+	unsigned int device_index;
+	struct kfd_dev *dev;
+	phys_addr_t start;
+
+	BUG_ON(vma->vm_pgoff < KFD_MMAP_DOORBELL_START || vma->vm_pgoff >= KFD_MMAP_DOORBELL_END);
+
+	/* For simplicity we only allow mapping of the entire doorbell allocation of a single device & process. */
+	if (vma->vm_end - vma->vm_start != doorbell_process_allocation())
+		return -EINVAL;
+
+	/* device_index must be GPU ID!! */
+	device_index = vma->vm_pgoff - KFD_MMAP_DOORBELL_START;
+
+	dev = radeon_kfd_device_by_id(device_index);
+	if (dev == NULL)
+		return -EINVAL;
+
+	vma->vm_flags |= VM_IO | VM_DONTCOPY | VM_DONTEXPAND | VM_NORESERVE | VM_DONTDUMP | VM_PFNMAP;
+	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
+
+	start = dev->doorbell_base + process->pasid * doorbell_process_allocation();
+
+	pr_debug("kfd: mapping doorbell page in radeon_kfd_doorbell_mmap\n"
+		 "     target user address == 0x%016llX\n"
+		 "     physical address    == 0x%016llX\n"
+		 "     vm_flags            == 0x%08lX\n"
+		 "     size                == 0x%08lX\n",
+		 (long long unsigned int) vma->vm_start, start, vma->vm_flags,
+		 doorbell_process_allocation());
+
+	return io_remap_pfn_range(vma,
+				vma->vm_start,
+				start >> PAGE_SHIFT,
+				doorbell_process_allocation(),
+				vma->vm_page_prot);
+}
+
+/* Map the doorbells for a single process & device. This will indirectly call radeon_kfd_doorbell_mmap.
+** This assumes that the process mutex is being held. */
+static int
+map_doorbells(struct file *devkfd, struct kfd_process *process, struct kfd_dev *dev)
+{
+	struct kfd_process_device *pdd = radeon_kfd_get_process_device_data(dev, process);
+
+	if (pdd == NULL)
+		return -ENOMEM;
+
+	if (pdd->doorbell_mapping == NULL) {
+		unsigned long offset = (KFD_MMAP_DOORBELL_START + dev->id) << PAGE_SHIFT;
+		doorbell_t __user *doorbell_mapping;
+
+		doorbell_mapping = (doorbell_t __user *)vm_mmap(devkfd, 0, doorbell_process_allocation(), PROT_WRITE,
+								MAP_SHARED, offset);
+		if (IS_ERR(doorbell_mapping))
+			return PTR_ERR(doorbell_mapping);
+
+		pdd->doorbell_mapping = doorbell_mapping;
+	}
+
+	return 0;
+}
+
+/* Get the user-mode address of a doorbell. Assumes that the process mutex is being held. */
+doorbell_t __user *radeon_kfd_get_doorbell(struct file *devkfd, struct kfd_process *process, struct kfd_dev *dev,
+					   unsigned int doorbell_index)
+{
+	struct kfd_process_device *pdd;
+	int err;
+
+	BUG_ON(doorbell_index > MAX_DOORBELL_INDEX);
+
+	err = map_doorbells(devkfd, process, dev);
+	if (err)
+		return ERR_PTR(err);
+
+	pdd = radeon_kfd_get_process_device_data(dev, process);
+	BUG_ON(pdd == NULL); /* map_doorbells would have failed otherwise */
+
+	return &pdd->doorbell_mapping[doorbell_index];
+}
+
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 116+ messages in thread

* [PATCH 13/83] hsa/radeon: Add 2 new IOCTL to kfd, CREATE_QUEUE and DESTROY_QUEUE
  2014-07-10 21:50 [PATCH 02/83] drm/radeon: reduce number of free VMIDs and pipes in KV Oded Gabbay
                   ` (9 preceding siblings ...)
  2014-07-10 21:50 ` [PATCH 12/83] hsa/radeon: Add kfd mmap handler Oded Gabbay
@ 2014-07-10 21:50 ` Oded Gabbay
  2014-07-11 19:19     ` Jerome Glisse
                     ` (2 more replies)
  2014-07-10 21:50 ` [PATCH 14/83] hsa/radeon: Update MAINTAINERS and CREDITS files Oded Gabbay
                   ` (15 subsequent siblings)
  26 siblings, 3 replies; 116+ messages in thread
From: Oded Gabbay @ 2014-07-10 21:50 UTC (permalink / raw)
  To: David Airlie, Alex Deucher, Jerome Glisse
  Cc: linux-kernel, dri-devel, John Bridgman, Andrew Lewycky,
	Joerg Roedel, Oded Gabbay, Alexey Skidanov, Ben Goz,
	Evgeny Pinchuk, linux-api

This patch adds two new IOCTLs to the kfd driver.

The first IOCTL is KFD_IOC_CREATE_QUEUE that is used by the user-mode
application to create a compute queue on the GPU.

The second IOCTL is KFD_IOC_DESTROY_QUEUE that is used by the
user-mode application to destroy an existing compute queue on the GPU.
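
For readers preparing arguments for KFD_IOC_CREATE_QUEUE, the following
user-space sketch mirrors the checks that the CIK static scheduler
applies in cik_static_create_queue (patch 11/83): ring base aligned to
256 bytes, read/write pointer addresses aligned to 4 bytes, all within
48 bits, and a power-of-two ring size between 1 KiB and 4 GiB. The
SKETCH_* macros and sketch_* function names are made up for this
example; only the masks and limits are copied from the scheduler code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bits [low, high) set; same shape as BIT_MASK_64 in the scheduler. */
#define SKETCH_BIT_MASK_64(high, low) \
	(((1ULL << (high)) - 1) & ~((1ULL << (low)) - 1))
#define SKETCH_RING_BAD_BITS  (~SKETCH_BIT_MASK_64(48, 8))
#define SKETCH_RWPTR_BAD_BITS (~SKETCH_BIT_MASK_64(48, 2))
#define SKETCH_MAX_QUEUE_SIZE (1ULL << 32)
#define SKETCH_MIN_QUEUE_SIZE (1ULL << 10)

/* Return true if the scheduler would accept these queue parameters. */
static bool sketch_queue_args_ok(uint64_t ring_base, uint64_t ring_size,
				 uint64_t rptr_addr, uint64_t wptr_addr)
{
	if ((ring_base & SKETCH_RING_BAD_BITS) ||
	    (rptr_addr & SKETCH_RWPTR_BAD_BITS) ||
	    (wptr_addr & SKETCH_RWPTR_BAD_BITS))
		return false;
	if (ring_size > SKETCH_MAX_QUEUE_SIZE || ring_size < SKETCH_MIN_QUEUE_SIZE)
		return false;
	return (ring_size & (ring_size - 1)) == 0; /* power of two */
}

/* Size encoding stored by the scheduler: ilog2(ring_size) - 3. */
static unsigned int sketch_encode_queue_size(uint64_t ring_size)
{
	unsigned int log2 = 0;

	assert(ring_size >= SKETCH_MIN_QUEUE_SIZE);
	while (ring_size >>= 1)
		log2++;
	return log2 - 3;
}
```

Checking these constraints in user space before issuing the ioctl is
optional; the kernel returns -EINVAL for parameters that fail them.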

Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
---
 drivers/gpu/hsa/radeon/kfd_chardev.c  | 155 ++++++++++++++++++++++++++++++++++
 drivers/gpu/hsa/radeon/kfd_doorbell.c |  11 +++
 include/uapi/linux/kfd_ioctl.h        |  69 +++++++++++++++
 3 files changed, 235 insertions(+)
 create mode 100644 include/uapi/linux/kfd_ioctl.h

diff --git a/drivers/gpu/hsa/radeon/kfd_chardev.c b/drivers/gpu/hsa/radeon/kfd_chardev.c
index 0b5bc74..4e7d5d0 100644
--- a/drivers/gpu/hsa/radeon/kfd_chardev.c
+++ b/drivers/gpu/hsa/radeon/kfd_chardev.c
@@ -27,11 +27,13 @@
 #include <linux/sched.h>
 #include <linux/slab.h>
 #include <linux/uaccess.h>
+#include <uapi/linux/kfd_ioctl.h>
 #include "kfd_priv.h"
 #include "kfd_scheduler.h"
 
 static long kfd_ioctl(struct file *, unsigned int, unsigned long);
 static int kfd_open(struct inode *, struct file *);
+static int kfd_mmap(struct file *, struct vm_area_struct *);
 
 static const char kfd_dev_name[] = "kfd";
 
@@ -108,17 +110,170 @@ kfd_open(struct inode *inode, struct file *filep)
 	return 0;
 }
 
+static long
+kfd_ioctl_create_queue(struct file *filep, struct kfd_process *p, void __user *arg)
+{
+	struct kfd_ioctl_create_queue_args args;
+	struct kfd_dev *dev;
+	int err = 0;
+	unsigned int queue_id;
+	struct kfd_queue *queue;
+	struct kfd_process_device *pdd;
+
+	if (copy_from_user(&args, arg, sizeof(args)))
+		return -EFAULT;
+
+	dev = radeon_kfd_device_by_id(args.gpu_id);
+	if (dev == NULL)
+		return -EINVAL;
+
+	queue = kzalloc(
+		offsetof(struct kfd_queue, scheduler_queue) + dev->device_info->scheduler_class->queue_size,
+		GFP_KERNEL);
+
+	if (!queue)
+		return -ENOMEM;
+
+	queue->dev = dev;
+
+	mutex_lock(&p->mutex);
+
+	pdd = radeon_kfd_bind_process_to_device(dev, p);
+	if (IS_ERR(pdd)) {
+		err = PTR_ERR(pdd);
+		goto err_bind_pasid;
+	}
+
+	pr_debug("kfd: creating queue number %d for PASID %d on GPU 0x%x\n",
+			pdd->queue_count,
+			p->pasid,
+			dev->id);
+
+	if (pdd->queue_count++ == 0) {
+		err = dev->device_info->scheduler_class->register_process(dev->scheduler, p, &pdd->scheduler_process);
+		if (err < 0)
+			goto err_register_process;
+	}
+
+	if (!radeon_kfd_allocate_queue_id(p, &queue_id))
+		goto err_allocate_queue_id;
+
+	err = dev->device_info->scheduler_class->create_queue(dev->scheduler, pdd->scheduler_process,
+							      &queue->scheduler_queue,
+							      (void __user *)args.ring_base_address,
+							      args.ring_size,
+							      (void __user *)args.read_pointer_address,
+							      (void __user *)args.write_pointer_address,
+							      radeon_kfd_queue_id_to_doorbell(dev, p, queue_id));
+	if (err)
+		goto err_create_queue;
+
+	radeon_kfd_install_queue(p, queue_id, queue);
+
+	args.queue_id = queue_id;
+	args.doorbell_address = (uint64_t)(uintptr_t)radeon_kfd_get_doorbell(filep, p, dev, queue_id);
+
+	if (copy_to_user(arg, &args, sizeof(args))) {
+		err = -EFAULT;
+		goto err_copy_args_out;
+	}
+
+	mutex_unlock(&p->mutex);
+
+	pr_debug("kfd: queue id %d was created successfully.\n"
+		 "     ring buffer address == 0x%016llX\n"
+		 "     read ptr address    == 0x%016llX\n"
+		 "     write ptr address   == 0x%016llX\n"
+		 "     doorbell address    == 0x%016llX\n",
+			args.queue_id,
+			args.ring_base_address,
+			args.read_pointer_address,
+			args.write_pointer_address,
+			args.doorbell_address);
+
+	return 0;
+
+err_copy_args_out:
+	dev->device_info->scheduler_class->destroy_queue(dev->scheduler, &queue->scheduler_queue);
+err_create_queue:
+	radeon_kfd_remove_queue(p, queue_id);
+err_allocate_queue_id:
+	if (--pdd->queue_count == 0) {
+		dev->device_info->scheduler_class->deregister_process(dev->scheduler, pdd->scheduler_process);
+		pdd->scheduler_process = NULL;
+	}
+err_register_process:
+err_bind_pasid:
+	kfree(queue);
+	mutex_unlock(&p->mutex);
+	return err;
+}
+
+static int
+kfd_ioctl_destroy_queue(struct file *filp, struct kfd_process *p, void __user *arg)
+{
+	struct kfd_ioctl_destroy_queue_args args;
+	struct kfd_queue *queue;
+	struct kfd_dev *dev;
+	struct kfd_process_device *pdd;
+
+	if (copy_from_user(&args, arg, sizeof(args)))
+		return -EFAULT;
+
+	mutex_lock(&p->mutex);
+
+	queue = radeon_kfd_get_queue(p, args.queue_id);
+	if (!queue) {
+		mutex_unlock(&p->mutex);
+		return -EINVAL;
+	}
+
+	dev = queue->dev;
+
+	pr_debug("kfd: destroying queue id %d for PASID %d\n",
+			args.queue_id,
+			p->pasid);
+
+	radeon_kfd_remove_queue(p, args.queue_id);
+	dev->device_info->scheduler_class->destroy_queue(dev->scheduler, &queue->scheduler_queue);
+
+	kfree(queue);
+
+	pdd = radeon_kfd_get_process_device_data(dev, p);
+	BUG_ON(pdd == NULL); /* Because a queue exists. */
+
+	if (--pdd->queue_count == 0) {
+		dev->device_info->scheduler_class->deregister_process(dev->scheduler, pdd->scheduler_process);
+		pdd->scheduler_process = NULL;
+	}
+
+	mutex_unlock(&p->mutex);
+	return 0;
+}
 
 static long
 kfd_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
 {
+	struct kfd_process *process;
 	long err = -EINVAL;
 
 	dev_info(kfd_device,
 		 "ioctl cmd 0x%x (#%d), arg 0x%lx\n",
 		 cmd, _IOC_NR(cmd), arg);
 
+	process = radeon_kfd_get_process(current);
+	if (IS_ERR(process))
+		return PTR_ERR(process);
+
 	switch (cmd) {
+	case KFD_IOC_CREATE_QUEUE:
+		err = kfd_ioctl_create_queue(filep, process, (void __user *)arg);
+		break;
+
+	case KFD_IOC_DESTROY_QUEUE:
+		err = kfd_ioctl_destroy_queue(filep, process, (void __user *)arg);
+		break;
+
 	default:
 		dev_err(kfd_device,
 			"unknown ioctl cmd 0x%x, arg 0x%lx)\n",
diff --git a/drivers/gpu/hsa/radeon/kfd_doorbell.c b/drivers/gpu/hsa/radeon/kfd_doorbell.c
index e1d8506..3de8a02 100644
--- a/drivers/gpu/hsa/radeon/kfd_doorbell.c
+++ b/drivers/gpu/hsa/radeon/kfd_doorbell.c
@@ -155,3 +155,14 @@ doorbell_t __user *radeon_kfd_get_doorbell(struct file *devkfd, struct kfd_proce
 	return &pdd->doorbell_mapping[doorbell_index];
 }
 
+/*
+ * queue_ids are in the range [0,MAX_PROCESS_QUEUES) and are mapped 1:1
+ * to doorbells within the process's doorbell page
+ */
+unsigned int radeon_kfd_queue_id_to_doorbell(struct kfd_dev *kfd, struct kfd_process *process, unsigned int queue_id)
+{
+	/* doorbell_id_offset accounts for doorbells taken by KGD.
+	 * pasid * doorbell_process_allocation/sizeof(doorbell_t) adjusts to the process's doorbells */
+	return kfd->doorbell_id_offset + process->pasid * (doorbell_process_allocation()/sizeof(doorbell_t)) + queue_id;
+}
+
diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
new file mode 100644
index 0000000..dcc5fe0
--- /dev/null
+++ b/include/uapi/linux/kfd_ioctl.h
@@ -0,0 +1,69 @@
+/*
+ * Copyright 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#ifndef KFD_IOCTL_H_INCLUDED
+#define KFD_IOCTL_H_INCLUDED
+
+#include <linux/types.h>
+#include <linux/ioctl.h>
+
+#define KFD_IOCTL_CURRENT_VERSION 1
+
+/* The 64-bit ABI is the authoritative version. */
+#pragma pack(push, 8)
+
+struct kfd_ioctl_get_version_args {
+	uint32_t min_supported_version;	/* from KFD */
+	uint32_t max_supported_version;	/* from KFD */
+};
+
+/* For kfd_ioctl_create_queue_args.queue_type. */
+#define KFD_IOC_QUEUE_TYPE_COMPUTE   0
+#define KFD_IOC_QUEUE_TYPE_SDMA      1
+
+struct kfd_ioctl_create_queue_args {
+	uint64_t ring_base_address;	/* to KFD */
+	uint32_t ring_size;		/* to KFD */
+	uint32_t gpu_id;		/* to KFD */
+	uint32_t queue_type;		/* to KFD */
+	uint32_t queue_percentage;	/* to KFD */
+	uint32_t queue_priority;	/* to KFD */
+	uint64_t write_pointer_address;	/* to KFD */
+	uint64_t read_pointer_address;	/* to KFD */
+
+	uint64_t doorbell_address;	/* from KFD */
+	uint32_t queue_id;		/* from KFD */
+};
+
+struct kfd_ioctl_destroy_queue_args {
+	uint32_t queue_id;		/* to KFD */
+};
+
+#define KFD_IOC_MAGIC 'K'
+
+#define KFD_IOC_GET_VERSION	_IOR(KFD_IOC_MAGIC, 1, struct kfd_ioctl_get_version_args)
+#define KFD_IOC_CREATE_QUEUE	_IOWR(KFD_IOC_MAGIC, 2, struct kfd_ioctl_create_queue_args)
+#define KFD_IOC_DESTROY_QUEUE	_IOWR(KFD_IOC_MAGIC, 3, struct kfd_ioctl_destroy_queue_args)
+
+#pragma pack(pop)
+
+#endif
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 116+ messages in thread

* [PATCH 14/83] hsa/radeon: Update MAINTAINERS and CREDITS files
  2014-07-10 21:50 [PATCH 02/83] drm/radeon: reduce number of free VMIDs and pipes in KV Oded Gabbay
                   ` (10 preceding siblings ...)
  2014-07-10 21:50 ` [PATCH 13/83] hsa/radeon: Add 2 new IOCTL to kfd, CREATE_QUEUE and DESTROY_QUEUE Oded Gabbay
@ 2014-07-10 21:50 ` Oded Gabbay
  2014-07-10 21:50 ` [PATCH 15/83] hsa/radeon: Add interrupt handling module Oded Gabbay
                   ` (14 subsequent siblings)
  26 siblings, 0 replies; 116+ messages in thread
From: Oded Gabbay @ 2014-07-10 21:50 UTC (permalink / raw)
  To: David Airlie, Alex Deucher, Jerome Glisse
  Cc: linux-kernel, dri-devel, John Bridgman, Andrew Lewycky,
	Joerg Roedel, Oded Gabbay, Andrew Morton, Geert Uytterhoeven,
	Jean Delvare, Jingoo Han, Jiri Kosina, Joe Perches, Chris Cheney,
	Christoph Lameter, Mauro Carvalho Chehab, Michael Opdenacker,
	Sebastian Reichel, Mikael Pettersson, David S. Miller,
	Greg Kroah-Hartman

Update MAINTAINERS and CREDITS files with kfd driver information

Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
---
 CREDITS     | 7 +++++++
 MAINTAINERS | 8 ++++++++
 2 files changed, 15 insertions(+)

diff --git a/CREDITS b/CREDITS
index 03343bf..c5f0aeae 100644
--- a/CREDITS
+++ b/CREDITS
@@ -1197,6 +1197,13 @@ S: R. Tocantins, 89 - Cristo Rei
 S: 80050-430 - Curitiba - Paraná
 S: Brazil
 
+N: Oded Gabbay
+E: oded.gabbay@gmail.com
+D: AMD HSA Radeon (KFD) driver maintainer
+S: 12 Shraga Raphaeli
+S: Petah-Tikva, 4906418
+S: Israel
+
 N: Kumar Gala
 E: galak@kernel.crashing.org
 D: Embedded PowerPC 6xx/7xx/74xx/82xx/83xx/85xx support
diff --git a/MAINTAINERS b/MAINTAINERS
index 3efbeaf..bf1081f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -592,6 +592,14 @@ F:	drivers/crypto/geode*
 F:	drivers/video/fbdev/geode/
 F:	arch/x86/include/asm/geode.h
 
+AMD HSA RADEON DRIVER (KFD)
+M:	Oded Gabbay <oded.gabbay@amd.com>
+L:	dri-devel@lists.freedesktop.org
+S:	Supported
+F:	drivers/gpu/hsa/radeon
+F:	include/linux/radeon_kfd.h
+F:	include/uapi/linux/kfd_ioctl.h
+
 AMD IOMMU (AMD-VI)
 M:	Joerg Roedel <joro@8bytes.org>
 L:	iommu@lists.linux-foundation.org
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 116+ messages in thread

* [PATCH 15/83] hsa/radeon: Add interrupt handling module
  2014-07-10 21:50 [PATCH 02/83] drm/radeon: reduce number of free VMIDs and pipes in KV Oded Gabbay
                   ` (11 preceding siblings ...)
  2014-07-10 21:50 ` [PATCH 14/83] hsa/radeon: Update MAINTAINERS and CREDITS files Oded Gabbay
@ 2014-07-10 21:50 ` Oded Gabbay
  2014-07-11 19:57     ` Jerome Glisse
  2014-07-10 21:50 ` [PATCH 16/83] hsa/radeon: Add the isr function of the KFD scehduler Oded Gabbay
                   ` (13 subsequent siblings)
  26 siblings, 1 reply; 116+ messages in thread
From: Oded Gabbay @ 2014-07-10 21:50 UTC (permalink / raw)
  To: David Airlie, Alex Deucher, Jerome Glisse
  Cc: linux-kernel, dri-devel, John Bridgman, Andrew Lewycky,
	Joerg Roedel, Oded Gabbay

This patch adds the interrupt handling module, in kfd_interrupt.c,
and its related members in different data structures to the KFD
driver.

The KFD interrupt module maintains an internal interrupt ring per kfd
device. The internal interrupt ring contains interrupts that need further
handling. The extra handling is deferred to a later time through a workqueue.

There's no acknowledgment for the interrupts we use. The hardware simply
queues a new interrupt each time without waiting.

The fixed-size internal queue means that it's possible for us to lose
interrupts because we have no back-pressure to the hardware.
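
To make the drop-on-overflow behaviour concrete, here is a simplified,
single-threaded sketch of a fixed-size ring indexed by byte offsets, in
the same spirit as enqueue_ih_ring_entry/dequeue_ih_ring_entry in this
patch. It deliberately omits the atomics, memory barriers and spinlock
that the kernel code needs; the sketch_* names and the 4-entry ring size
are assumptions made for the example (the patch uses 256 entries).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define SKETCH_ENTRY_SIZE   16u /* 4 * sizeof(uint32_t), as for Bonaire */
#define SKETCH_RING_ENTRIES 4u  /* kept tiny so overflow is easy to see */
#define SKETCH_RING_BYTES   (SKETCH_ENTRY_SIZE * SKETCH_RING_ENTRIES)

struct sketch_ring {
	unsigned char data[SKETCH_RING_BYTES];
	unsigned int rptr; /* byte offset of the next entry to read */
	unsigned int wptr; /* byte offset of the next entry to write */
};

/* Copy one entry in; returns false (entry dropped) when the ring is full. */
static bool sketch_enqueue(struct sketch_ring *r, const void *entry)
{
	assert(r->wptr % SKETCH_ENTRY_SIZE == 0);

	/* Full when writing one more entry would make wptr catch up with rptr. */
	if ((r->rptr - r->wptr) % SKETCH_RING_BYTES == SKETCH_ENTRY_SIZE)
		return false;

	memcpy(r->data + r->wptr, entry, SKETCH_ENTRY_SIZE);
	r->wptr = (r->wptr + SKETCH_ENTRY_SIZE) % SKETCH_RING_BYTES;
	return true;
}

/* Copy the oldest entry out; returns false when the ring is empty. */
static bool sketch_dequeue(struct sketch_ring *r, void *entry)
{
	if (r->rptr == r->wptr)
		return false;

	memcpy(entry, r->data + r->rptr, SKETCH_ENTRY_SIZE);
	r->rptr = (r->rptr + SKETCH_ENTRY_SIZE) % SKETCH_RING_BYTES;
	return true;
}
```

One slot is always left unused so that rptr == wptr unambiguously means
"empty"; a ring of N entries therefore holds at most N - 1 pending
interrupts before new ones are dropped.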

Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
---
 drivers/gpu/hsa/radeon/Makefile        |   2 +-
 drivers/gpu/hsa/radeon/kfd_device.c    |   1 +
 drivers/gpu/hsa/radeon/kfd_interrupt.c | 179 +++++++++++++++++++++++++++++++++
 drivers/gpu/hsa/radeon/kfd_priv.h      |  18 ++++
 drivers/gpu/hsa/radeon/kfd_scheduler.h |   3 +
 5 files changed, 202 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/hsa/radeon/kfd_interrupt.c

diff --git a/drivers/gpu/hsa/radeon/Makefile b/drivers/gpu/hsa/radeon/Makefile
index 28da10c..5422e6a 100644
--- a/drivers/gpu/hsa/radeon/Makefile
+++ b/drivers/gpu/hsa/radeon/Makefile
@@ -5,6 +5,6 @@
 radeon_kfd-y	:= kfd_module.o kfd_device.o kfd_chardev.o \
 		kfd_pasid.o kfd_topology.o kfd_process.o \
 		kfd_doorbell.o kfd_sched_cik_static.o kfd_registers.o \
-		kfd_vidmem.o
+		kfd_vidmem.o kfd_interrupt.o
 
 obj-$(CONFIG_HSA_RADEON)	+= radeon_kfd.o
diff --git a/drivers/gpu/hsa/radeon/kfd_device.c b/drivers/gpu/hsa/radeon/kfd_device.c
index 465c822..b2d2861 100644
--- a/drivers/gpu/hsa/radeon/kfd_device.c
+++ b/drivers/gpu/hsa/radeon/kfd_device.c
@@ -30,6 +30,7 @@
 static const struct kfd_device_info bonaire_device_info = {
 	.scheduler_class = &radeon_kfd_cik_static_scheduler_class,
 	.max_pasid_bits = 16,
+	.ih_ring_entry_size = 4 * sizeof(uint32_t)
 };
 
 struct kfd_deviceid {
diff --git a/drivers/gpu/hsa/radeon/kfd_interrupt.c b/drivers/gpu/hsa/radeon/kfd_interrupt.c
new file mode 100644
index 0000000..2179780
--- /dev/null
+++ b/drivers/gpu/hsa/radeon/kfd_interrupt.c
@@ -0,0 +1,179 @@
+/*
+ * Copyright 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+/*
+ * KFD Interrupts.
+ *
+ * AMD GPUs deliver interrupts by pushing an interrupt description onto the
+ * interrupt ring and then sending an interrupt. KGD receives the interrupt
+ * in ISR and sends us a pointer to each new entry on the interrupt ring.
+ *
+ * We generally can't process interrupt-signaled events from ISR, so we call
+ * out to each interrupt client module (currently only the scheduler) to ask if
+ * each interrupt is interesting. If they return true, then it requires further
+ * processing so we copy it to an internal interrupt ring and call each
+ * interrupt client again from a work-queue.
+ *
+ * There's no acknowledgment for the interrupts we use. The hardware simply
+ * queues a new interrupt each time without waiting.
+ *
+ * The fixed-size internal queue means that it's possible for us to lose
+ * interrupts because we have no back-pressure to the hardware.
+ */
+
+#include <linux/slab.h>
+#include <linux/device.h>
+#include "kfd_priv.h"
+#include "kfd_scheduler.h"
+
+#define KFD_INTERRUPT_RING_SIZE 256
+
+static void interrupt_wq(struct work_struct *);
+
+int
+radeon_kfd_interrupt_init(struct kfd_dev *kfd)
+{
+	void *interrupt_ring = kmalloc_array(KFD_INTERRUPT_RING_SIZE,
+					kfd->device_info->ih_ring_entry_size,
+					GFP_KERNEL);
+	if (!interrupt_ring)
+		return -ENOMEM;
+
+	kfd->interrupt_ring = interrupt_ring;
+	kfd->interrupt_ring_size =
+		KFD_INTERRUPT_RING_SIZE * kfd->device_info->ih_ring_entry_size;
+	atomic_set(&kfd->interrupt_ring_wptr, 0);
+	atomic_set(&kfd->interrupt_ring_rptr, 0);
+
+	spin_lock_init(&kfd->interrupt_lock);
+
+	INIT_WORK(&kfd->interrupt_work, interrupt_wq);
+
+	kfd->interrupts_active = true;
+
+	/*
+	 * After this function returns, the interrupt will be enabled. This
+	 * barrier ensures that the interrupt running on a different processor
+	 * sees all the above writes.
+	 */
+	smp_wmb();
+
+	return 0;
+}
+
+void
+radeon_kfd_interrupt_exit(struct kfd_dev *kfd)
+{
+	/*
+	 * Stop the interrupt handler from writing to the ring and scheduling
+	 * workqueue items. The spinlock ensures that any interrupt running
+	 * after we have unlocked sees interrupts_active = false.
+	 */
+	unsigned long flags;
+
+	spin_lock_irqsave(&kfd->interrupt_lock, flags);
+	kfd->interrupts_active = false;
+	spin_unlock_irqrestore(&kfd->interrupt_lock, flags);
+
+	/*
+	 * Flush_scheduled_work ensures that there are no outstanding work-queue
+	 * items that will access interrupt_ring. New work items can't be
+	 * created because we stopped interrupt handling above.
+	 */
+	flush_scheduled_work();
+
+	kfree(kfd->interrupt_ring);
+}
+
+/*
+ * This assumes that it can't be called concurrently with itself
+ * but only with dequeue_ih_ring_entry.
+ */
+static bool
+enqueue_ih_ring_entry(struct kfd_dev *kfd, const void *ih_ring_entry)
+{
+	unsigned int rptr = atomic_read(&kfd->interrupt_ring_rptr);
+	unsigned int wptr = atomic_read(&kfd->interrupt_ring_wptr);
+
+	if ((rptr - wptr) % kfd->interrupt_ring_size == kfd->device_info->ih_ring_entry_size) {
+		/* This is very bad, the system is likely to hang. */
+		dev_err_ratelimited(radeon_kfd_chardev(),
+			"Interrupt ring overflow, dropping interrupt.\n");
+		return false;
+	}
+
+	memcpy(kfd->interrupt_ring + wptr, ih_ring_entry, kfd->device_info->ih_ring_entry_size);
+	wptr = (wptr + kfd->device_info->ih_ring_entry_size) % kfd->interrupt_ring_size;
+	smp_wmb(); /* Ensure memcpy'd data is visible before wptr update. */
+	atomic_set(&kfd->interrupt_ring_wptr, wptr);
+
+	return true;
+}
+
+/*
+ * This assumes that it can't be called concurrently with itself
+ * but only with enqueue_ih_ring_entry.
+ */
+static bool
+dequeue_ih_ring_entry(struct kfd_dev *kfd, void *ih_ring_entry)
+{
+	/*
+	 * Assume that work queues have an implicit barrier, i.e. anything that
+	 * happened in the ISR before it queued work is visible.
+	 */
+
+	unsigned int wptr = atomic_read(&kfd->interrupt_ring_wptr);
+	unsigned int rptr = atomic_read(&kfd->interrupt_ring_rptr);
+
+	if (rptr == wptr)
+		return false;
+
+	memcpy(ih_ring_entry, kfd->interrupt_ring + rptr, kfd->device_info->ih_ring_entry_size);
+	rptr = (rptr + kfd->device_info->ih_ring_entry_size) % kfd->interrupt_ring_size;
+	smp_mb(); /* Ensure the rptr write update is not visible until memcpy has finished reading. */
+	atomic_set(&kfd->interrupt_ring_rptr, rptr);
+
+	return true;
+}
+
+static void interrupt_wq(struct work_struct *work)
+{
+	struct kfd_dev *dev = container_of(work, struct kfd_dev, interrupt_work);
+
+	uint32_t ih_ring_entry[DIV_ROUND_UP(dev->device_info->ih_ring_entry_size, sizeof(uint32_t))];
+
+	while (dequeue_ih_ring_entry(dev, ih_ring_entry))
+		dev->device_info->scheduler_class->interrupt_wq(dev->scheduler, ih_ring_entry);
+}
+
+/* This is called directly from KGD at ISR. */
+void kgd2kfd_interrupt(struct kfd_dev *kfd, const void *ih_ring_entry)
+{
+	spin_lock(&kfd->interrupt_lock);
+
+	if (kfd->interrupts_active
+	    && kfd->device_info->scheduler_class->interrupt_isr(kfd->scheduler, ih_ring_entry)
+	    && enqueue_ih_ring_entry(kfd, ih_ring_entry))
+		schedule_work(&kfd->interrupt_work);
+
+	spin_unlock(&kfd->interrupt_lock);
+}
diff --git a/drivers/gpu/hsa/radeon/kfd_priv.h b/drivers/gpu/hsa/radeon/kfd_priv.h
index 1d1dbcf..5b6611f 100644
--- a/drivers/gpu/hsa/radeon/kfd_priv.h
+++ b/drivers/gpu/hsa/radeon/kfd_priv.h
@@ -28,6 +28,9 @@
 #include <linux/mutex.h>
 #include <linux/radeon_kfd.h>
 #include <linux/types.h>
+#include <linux/atomic.h>
+#include <linux/workqueue.h>
+#include <linux/spinlock.h>
 
 struct kfd_scheduler_class;
 
@@ -63,6 +66,7 @@ typedef u32 doorbell_t;
 struct kfd_device_info {
 	const struct kfd_scheduler_class *scheduler_class;
 	unsigned int max_pasid_bits;
+	size_t ih_ring_entry_size;
 };
 
 struct kfd_dev {
@@ -90,6 +94,15 @@ struct kfd_dev {
 	struct kgd2kfd_shared_resources shared_resources;
 
 	struct kfd_scheduler *scheduler;
+
+	/* Interrupts of interest to KFD are copied from the HW ring into a SW ring. */
+	bool interrupts_active;
+	void *interrupt_ring;
+	size_t interrupt_ring_size;
+	atomic_t interrupt_ring_rptr;
+	atomic_t interrupt_ring_wptr;
+	struct work_struct interrupt_work;
+	spinlock_t interrupt_lock;
 };
 
 /* KGD2KFD callbacks */
@@ -229,4 +242,9 @@ struct kfd_dev *radeon_kfd_device_by_pci_dev(const struct pci_dev *pdev);
 void radeon_kfd_write_reg(struct kfd_dev *dev, uint32_t reg, uint32_t value);
 uint32_t radeon_kfd_read_reg(struct kfd_dev *dev, uint32_t reg);
 
+/* Interrupts */
+int radeon_kfd_interrupt_init(struct kfd_dev *dev);
+void radeon_kfd_interrupt_exit(struct kfd_dev *dev);
+void kgd2kfd_interrupt(struct kfd_dev *dev, const void *ih_ring_entry);
+
 #endif
diff --git a/drivers/gpu/hsa/radeon/kfd_scheduler.h b/drivers/gpu/hsa/radeon/kfd_scheduler.h
index 48a032f..e5a93c4 100644
--- a/drivers/gpu/hsa/radeon/kfd_scheduler.h
+++ b/drivers/gpu/hsa/radeon/kfd_scheduler.h
@@ -55,6 +55,9 @@ struct kfd_scheduler_class {
 			    unsigned int doorbell);
 
 	void (*destroy_queue)(struct kfd_scheduler *, struct kfd_scheduler_queue *);
+
+	bool (*interrupt_isr)(struct kfd_scheduler *, const void *ih_ring_entry);
+	void (*interrupt_wq)(struct kfd_scheduler *, const void *ih_ring_entry);
 };
 
 extern const struct kfd_scheduler_class radeon_kfd_cik_static_scheduler_class;
-- 
1.9.1


* [PATCH 16/83] hsa/radeon: Add the isr function of the KFD scheduler
  2014-07-10 21:50 [PATCH 02/83] drm/radeon: reduce number of free VMIDs and pipes in KV Oded Gabbay
                   ` (12 preceding siblings ...)
  2014-07-10 21:50 ` [PATCH 15/83] hsa/radeon: Add interrupt handling module Oded Gabbay
@ 2014-07-10 21:50 ` Oded Gabbay
  2014-07-10 21:50 ` [PATCH 17/83] hsa/radeon: Handle deactivation of queues using interrupts Oded Gabbay
                   ` (12 subsequent siblings)
  26 siblings, 0 replies; 116+ messages in thread
From: Oded Gabbay @ 2014-07-10 21:50 UTC (permalink / raw)
  To: David Airlie, Alex Deucher, Jerome Glisse
  Cc: linux-kernel, dri-devel, John Bridgman, Andrew Lewycky,
	Joerg Roedel, Oded Gabbay

This patch adds the isr function to the KFD scheduler code. This
function is called from the kgd2kfd_interrupt function, which runs in
interrupt context.

The purpose of the isr function is to determine whether the interrupt
that arrived is interesting, i.e. whether some action needs to be taken.
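
As a rough illustration of the filtering step, here is a userspace sketch of
how an entry's ME and pipe IDs map to a KFD-relative pipe index. The helper
name kfd_pipe_from_ids and the constant PIPES_PER_MEC are hypothetical; the
value 4 is an assumption taken from the "4 Pipes/MEC" note in cik.c, not from
the driver's CIK_PIPES_PER_MEC definition.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 pipes per MEC, as noted in the cik.c comment. */
#define PIPES_PER_MEC 4u

/*
 * Hypothetical helper mirroring the pipe decoding in the isr path.
 * Returns true and stores the KFD-relative pipe index when the entry
 * belongs to a compute pipe owned by KFD, false otherwise.
 */
static inline bool kfd_pipe_from_ids(uint32_t meid, uint32_t pipeid,
				     uint32_t first_pipe, uint32_t *kfd_pipe)
{
	uint32_t pipe_id;

	if (meid == 0)		/* ME 0 is the graphics ME, not compute. */
		return false;

	/* Flatten (ME, pipe) into a global compute pipe number. */
	pipe_id = (meid - 1) * PIPES_PER_MEC + pipeid;
	if (pipe_id < first_pipe)	/* Pipe is below KFD's first pipe. */
		return false;

	*kfd_pipe = pipe_id - first_pipe;
	return true;
}
```

With first_pipe set to 1, ME 1 pipe 0 is rejected and ME 2 pipe 3 maps to KFD
pipe 6.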

Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
---
 drivers/gpu/hsa/radeon/cik_int.h              | 50 ++++++++++++++++++++++++
 drivers/gpu/hsa/radeon/cik_regs.h             |  2 +
 drivers/gpu/hsa/radeon/kfd_sched_cik_static.c | 56 +++++++++++++++++++++++++++
 3 files changed, 108 insertions(+)
 create mode 100644 drivers/gpu/hsa/radeon/cik_int.h

diff --git a/drivers/gpu/hsa/radeon/cik_int.h b/drivers/gpu/hsa/radeon/cik_int.h
new file mode 100644
index 0000000..e98551d
--- /dev/null
+++ b/drivers/gpu/hsa/radeon/cik_int.h
@@ -0,0 +1,50 @@
+/*
+ * Copyright 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#ifndef HSA_RADEON_CIK_INT_H_INCLUDED
+#define HSA_RADEON_CIK_INT_H_INCLUDED
+
+#include <linux/types.h>
+
+struct cik_ih_ring_entry {
+	uint32_t source_id	: 8;
+	uint32_t reserved1	: 8;
+	uint32_t reserved2	: 16;
+
+	uint32_t data		: 28;
+	uint32_t reserved3	: 4;
+
+	/* pipeid, meid and unused3 are officially called RINGID,
+	 * but for our purposes, they always decode into pipe and ME. */
+	uint32_t pipeid		: 2;
+	uint32_t meid		: 2;
+	uint32_t reserved4	: 4;
+	uint32_t vmid		: 8;
+	uint32_t pasid		: 16;
+
+	uint32_t reserved5;
+};
+
+#define CIK_INTSRC_DEQUEUE_COMPLETE	0xC6
+
+#endif
+
diff --git a/drivers/gpu/hsa/radeon/cik_regs.h b/drivers/gpu/hsa/radeon/cik_regs.h
index d0cdc57..ef1d7ab 100644
--- a/drivers/gpu/hsa/radeon/cik_regs.h
+++ b/drivers/gpu/hsa/radeon/cik_regs.h
@@ -23,6 +23,8 @@
 #ifndef CIK_REGS_H
 #define CIK_REGS_H
 
+#define IH_VMID_0_LUT					0x3D40u
+
 #define BIF_DOORBELL_CNTL				0x530Cu
 
 #define	SRBM_GFX_CNTL					0xE44
diff --git a/drivers/gpu/hsa/radeon/kfd_sched_cik_static.c b/drivers/gpu/hsa/radeon/kfd_sched_cik_static.c
index b986ff9..f86f958 100644
--- a/drivers/gpu/hsa/radeon/kfd_sched_cik_static.c
+++ b/drivers/gpu/hsa/radeon/kfd_sched_cik_static.c
@@ -25,9 +25,12 @@
 #include <linux/slab.h>
 #include <linux/types.h>
 #include <linux/uaccess.h>
+#include <linux/device.h>
+#include <linux/sched.h>
 #include "kfd_priv.h"
 #include "kfd_scheduler.h"
 #include "cik_regs.h"
+#include "cik_int.h"
 
 /* CIK CP hardware is arranged with 8 queues per pipe and 8 pipes per MEC (microengine for compute).
  * The first MEC is ME 1 with the GFX ME as ME 0.
@@ -273,6 +276,8 @@ static void set_vmid_pasid_mapping(struct cik_static_private *priv, unsigned int
 	while (!(READ_REG(priv->dev, ATC_VMID_PASID_MAPPING_UPDATE_STATUS) & (1U << vmid)))
 		cpu_relax();
 	WRITE_REG(priv->dev, ATC_VMID_PASID_MAPPING_UPDATE_STATUS, 1U << vmid);
+
+	WRITE_REG(priv->dev, IH_VMID_0_LUT + vmid*sizeof(uint32_t), pasid);
 }
 
 static uint32_t compute_sh_mem_bases_64bit(unsigned int top_address_nybble)
@@ -786,6 +791,54 @@ cik_static_destroy_queue(struct kfd_scheduler *scheduler, struct kfd_scheduler_q
 	release_hqd(priv, hwq->queue);
 }
 
+/* Figure out the KFD compute pipe ID for an interrupt ring entry.
+ * Returns true if it's a KFD compute pipe, false otherwise. */
+static bool int_compute_pipe(const struct cik_static_private *priv,
+			     const struct cik_ih_ring_entry *ih_ring_entry,
+			     uint32_t *kfd_pipe)
+{
+	uint32_t pipe_id;
+
+	if (ih_ring_entry->meid == 0) /* Ignore graphics interrupts - compute only. */
+		return false;
+
+	pipe_id = (ih_ring_entry->meid - 1) * CIK_PIPES_PER_MEC + ih_ring_entry->pipeid;
+	if (pipe_id < priv->first_pipe)
+		return false;
+
+	pipe_id -= priv->first_pipe;
+
+	*kfd_pipe = pipe_id;
+
+	return true;
+}
+
+static bool
+cik_static_interrupt_isr(struct kfd_scheduler *scheduler, const void *ih_ring_entry)
+{
+	struct cik_static_private *priv = kfd_scheduler_to_private(scheduler);
+	const struct cik_ih_ring_entry *ihre = ih_ring_entry;
+	uint32_t source_id = ihre->source_id;
+	uint32_t pipe_id;
+
+	/* We only care about CP interrupts here, they all come with a pipe. */
+	if (!int_compute_pipe(priv, ihre, &pipe_id))
+		return false;
+
+	dev_info(radeon_kfd_chardev(), "INT(ISR): src=%02x, data=0x%x, pipe=%u, vmid=%u, pasid=%u\n",
+		 ihre->source_id, ihre->data, pipe_id, ihre->vmid, ihre->pasid);
+
+	switch (source_id) {
+	default:
+		return false; /* Not interested. */
+	}
+}
+
+static void
+cik_static_interrupt_wq(struct kfd_scheduler *scheduler, const void *ih_ring_entry)
+{
+}
+
 const struct kfd_scheduler_class radeon_kfd_cik_static_scheduler_class = {
 	.name = "CIK static scheduler",
 	.create = cik_static_create,
@@ -797,4 +850,7 @@ const struct kfd_scheduler_class radeon_kfd_cik_static_scheduler_class = {
 	.queue_size = sizeof(struct cik_static_queue),
 	.create_queue = cik_static_create_queue,
 	.destroy_queue = cik_static_destroy_queue,
+
+	.interrupt_isr = cik_static_interrupt_isr,
+	.interrupt_wq = cik_static_interrupt_wq,
 };
-- 
1.9.1


* [PATCH 17/83] hsa/radeon: Handle deactivation of queues using interrupts
  2014-07-10 21:50 [PATCH 02/83] drm/radeon: reduce number of free VMIDs and pipes in KV Oded Gabbay
                   ` (13 preceding siblings ...)
  2014-07-10 21:50 ` [PATCH 16/83] hsa/radeon: Add the isr function of the KFD scheduler Oded Gabbay
@ 2014-07-10 21:50 ` Oded Gabbay
  2014-07-10 21:50 ` [PATCH 18/83] hsa/radeon: Enable interrupts in KFD scheduler Oded Gabbay
                   ` (11 subsequent siblings)
  26 siblings, 0 replies; 116+ messages in thread
From: Oded Gabbay @ 2014-07-10 21:50 UTC (permalink / raw)
  To: David Airlie, Alex Deucher, Jerome Glisse
  Cc: linux-kernel, dri-devel, John Bridgman, Andrew Lewycky,
	Joerg Roedel, Oded Gabbay

This patch modifies the scheduler code to use interrupts to handle the
deactivation of queues. We prefer to use interrupts because the
deactivation could take a long time since we need to wait for the
wavefront to finish executing before deactivating the queue.

There is an array of waitqueues, one per pipe; each cell serves the
queues of a specific pipe. When a queue should be deactivated, the
caller waits on the waitqueue of that queue's pipe. The event that wakes
the waitqueue is a dequeue-complete interrupt that arrives through the
isr function of the scheduler.
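
For reference, a small sketch of how a global queue index selects its pipe's
waitqueue slot, assuming 8 queues per pipe. The names QUEUES_PER_PIPE,
queue_to_pipe and queue_on_pipe are hypothetical stand-ins, not the driver's
identifiers.

```c
/* Assumption: 8 queues per pipe, as described for CIK hardware. */
#define QUEUES_PER_PIPE 8u

/* Index into the per-pipe waitqueue array for a global queue index. */
static inline unsigned int queue_to_pipe(unsigned int queue)
{
	return queue / QUEUES_PER_PIPE;
}

/* Position of the queue within its pipe. */
static inline unsigned int queue_on_pipe(unsigned int queue)
{
	return queue % QUEUES_PER_PIPE;
}
```

Because several queues share one waitqueue, a wake-up only says that some
queue on the pipe finished; each waiter still has to re-check its own queue.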

Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
---
 drivers/gpu/hsa/radeon/cik_regs.h             |  1 +
 drivers/gpu/hsa/radeon/kfd_sched_cik_static.c | 45 +++++++++++++++++++++------
 2 files changed, 37 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/hsa/radeon/cik_regs.h b/drivers/gpu/hsa/radeon/cik_regs.h
index ef1d7ab..9c3ce97 100644
--- a/drivers/gpu/hsa/radeon/cik_regs.h
+++ b/drivers/gpu/hsa/radeon/cik_regs.h
@@ -166,6 +166,7 @@
 
 #define CP_HQD_DEQUEUE_REQUEST				0xC974
 #define	DEQUEUE_REQUEST_DRAIN				1
+#define		DEQUEUE_INT					(1U << 8)
 
 #define CP_HQD_SEMA_CMD					0xC97Cu
 #define CP_HQD_MSG_TYPE					0xC980u
diff --git a/drivers/gpu/hsa/radeon/kfd_sched_cik_static.c b/drivers/gpu/hsa/radeon/kfd_sched_cik_static.c
index f86f958..5d42e88 100644
--- a/drivers/gpu/hsa/radeon/kfd_sched_cik_static.c
+++ b/drivers/gpu/hsa/radeon/kfd_sched_cik_static.c
@@ -139,6 +139,13 @@ struct cik_static_private {
 	 /* Queue q on pipe p is at bit QUEUES_PER_PIPE * p + q. */
 	unsigned long free_queues[DIV_ROUND_UP(CIK_MAX_PIPES * CIK_QUEUES_PER_PIPE, BITS_PER_LONG)];
 
+	/*
+	 * Dequeue waits for waves to finish so it could take a long time. We
+	 * defer through an interrupt. dequeue_wait is woken when a dequeue-
+	 * complete interrupt comes for that pipe.
+	 */
+	wait_queue_head_t dequeue_wait[CIK_MAX_PIPES];
+
 	kfd_mem_obj hpd_mem;	/* Single allocation for HPDs for all KFD pipes. */
 	kfd_mem_obj mqd_mem;	/* Single allocation for all MQDs for all KFD
 				 * pipes. This is actually struct cik_mqd_padded. */
@@ -411,6 +418,9 @@ static int cik_static_create(struct kfd_dev *dev, struct kfd_scheduler **schedul
 
 	priv->free_vmid_mask = dev->shared_resources.compute_vmid_bitmap;
 
+	for (i = 0; i < priv->num_pipes; i++)
+		init_waitqueue_head(&priv->dequeue_wait[i]);
+
 	/*
 	 * Allocate memory for the HPDs. This is hardware-owned per-pipe data.
 	 * The driver never accesses this memory after zeroing it. It doesn't even have
@@ -712,15 +722,18 @@ static void activate_queue(struct cik_static_private *priv, struct cik_static_qu
 	unlock_srbm_index(priv);
 }
 
-static void drain_hqd(struct cik_static_private *priv)
+static bool queue_inactive(struct cik_static_private *priv, struct cik_static_queue *queue)
 {
-	WRITE_REG(priv->dev, CP_HQD_DEQUEUE_REQUEST, DEQUEUE_REQUEST_DRAIN);
-}
+	bool inactive;
 
-static void wait_hqd_inactive(struct cik_static_private *priv)
-{
-	while (READ_REG(priv->dev, CP_HQD_ACTIVE) != 0)
-		cpu_relax();
+	lock_srbm_index(priv);
+	queue_select(priv, queue->queue);
+
+	inactive = (READ_REG(priv->dev, CP_HQD_ACTIVE) == 0);
+
+	unlock_srbm_index(priv);
+
+	return inactive;
 }
 
 static void deactivate_queue(struct cik_static_private *priv, struct cik_static_queue *queue)
@@ -728,10 +741,12 @@ static void deactivate_queue(struct cik_static_private *priv, struct cik_static_
 	lock_srbm_index(priv);
 	queue_select(priv, queue->queue);
 
-	drain_hqd(priv);
-	wait_hqd_inactive(priv);
+	WRITE_REG(priv->dev, CP_HQD_DEQUEUE_REQUEST, DEQUEUE_REQUEST_DRAIN | DEQUEUE_INT);
 
 	unlock_srbm_index(priv);
+
+	wait_event(priv->dequeue_wait[queue->queue/CIK_QUEUES_PER_PIPE],
+		   queue_inactive(priv, queue));
 }
 
 #define BIT_MASK_64(high, low) (((1ULL << (high)) - 1) & ~((1ULL << (low)) - 1))
@@ -791,6 +806,14 @@ cik_static_destroy_queue(struct kfd_scheduler *scheduler, struct kfd_scheduler_q
 	release_hqd(priv, hwq->queue);
 }
 
+static void
+dequeue_int_received(struct cik_static_private *priv, uint32_t pipe_id)
+{
+	/* The waiting threads will check CP_HQD_ACTIVE to see whether their
+	 * queue completed. */
+	wake_up_all(&priv->dequeue_wait[pipe_id]);
+}
+
 /* Figure out the KFD compute pipe ID for an interrupt ring entry.
  * Returns true if it's a KFD compute pipe, false otherwise. */
 static bool int_compute_pipe(const struct cik_static_private *priv,
@@ -829,6 +852,10 @@ cik_static_interrupt_isr(struct kfd_scheduler *scheduler, const void *ih_ring_en
 		 ihre->source_id, ihre->data, pipe_id, ihre->vmid, ihre->pasid);
 
 	switch (source_id) {
+	case CIK_INTSRC_DEQUEUE_COMPLETE:
+		dequeue_int_received(priv, pipe_id);
+		return false; /* Already handled. */
+
 	default:
 		return false; /* Not interested. */
 	}
-- 
1.9.1


* [PATCH 18/83] hsa/radeon: Enable interrupts in KFD scheduler
  2014-07-10 21:50 [PATCH 02/83] drm/radeon: reduce number of free VMIDs and pipes in KV Oded Gabbay
                   ` (14 preceding siblings ...)
  2014-07-10 21:50 ` [PATCH 17/83] hsa/radeon: Handle deactivation of queues using interrupts Oded Gabbay
@ 2014-07-10 21:50 ` Oded Gabbay
  2014-07-10 21:50 ` [PATCH 19/83] hsa/radeon: Enable/Disable KFD interrupt module Oded Gabbay
                   ` (10 subsequent siblings)
  26 siblings, 0 replies; 116+ messages in thread
From: Oded Gabbay @ 2014-07-10 21:50 UTC (permalink / raw)
  To: David Airlie, Alex Deucher, Jerome Glisse
  Cc: linux-kernel, dri-devel, John Bridgman, Andrew Lewycky,
	Joerg Roedel, Oded Gabbay

This patch enables the use of interrupts in the KFD scheduler when the
scheduler performs its initialization.

It also disables the interrupts when the scheduler stops its work.

Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
---
 drivers/gpu/hsa/radeon/kfd_sched_cik_static.c | 28 +++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/drivers/gpu/hsa/radeon/kfd_sched_cik_static.c b/drivers/gpu/hsa/radeon/kfd_sched_cik_static.c
index 5d42e88..9add5e5 100644
--- a/drivers/gpu/hsa/radeon/kfd_sched_cik_static.c
+++ b/drivers/gpu/hsa/radeon/kfd_sched_cik_static.c
@@ -486,6 +486,32 @@ static void cik_static_destroy(struct kfd_scheduler *scheduler)
 	kfree(priv);
 }
 
+static void
+enable_interrupts(struct cik_static_private *priv)
+{
+	unsigned int i;
+
+	lock_srbm_index(priv);
+	for (i = 0; i < priv->num_pipes; i++) {
+		pipe_select(priv, i);
+		WRITE_REG(priv->dev, CPC_INT_CNTL, DEQUEUE_REQUEST_INT_ENABLE);
+	}
+	unlock_srbm_index(priv);
+}
+
+static void
+disable_interrupts(struct cik_static_private *priv)
+{
+	unsigned int i;
+
+	lock_srbm_index(priv);
+	for (i = 0; i < priv->num_pipes; i++) {
+		pipe_select(priv, i);
+		WRITE_REG(priv->dev, CPC_INT_CNTL, 0);
+	}
+	unlock_srbm_index(priv);
+}
+
 static void cik_static_start(struct kfd_scheduler *scheduler)
 {
 	struct cik_static_private *priv = kfd_scheduler_to_private(scheduler);
@@ -495,6 +521,7 @@ static void cik_static_start(struct kfd_scheduler *scheduler)
 
 	init_pipes(priv);
 	init_ats(priv);
+	enable_interrupts(priv);
 }
 
 static void cik_static_stop(struct kfd_scheduler *scheduler)
@@ -502,6 +529,7 @@ static void cik_static_stop(struct kfd_scheduler *scheduler)
 	struct cik_static_private *priv = kfd_scheduler_to_private(scheduler);
 
 	exit_ats(priv);
+	disable_interrupts(priv);
 
 	radeon_kfd_vidmem_ungpumap(priv->dev, priv->hpd_mem);
 	radeon_kfd_vidmem_ungpumap(priv->dev, priv->mqd_mem);
-- 
1.9.1


* [PATCH 19/83] hsa/radeon: Enable/Disable KFD interrupt module
  2014-07-10 21:50 [PATCH 02/83] drm/radeon: reduce number of free VMIDs and pipes in KV Oded Gabbay
                   ` (15 preceding siblings ...)
  2014-07-10 21:50 ` [PATCH 18/83] hsa/radeon: Enable interrupts in KFD scheduler Oded Gabbay
@ 2014-07-10 21:50 ` Oded Gabbay
  2014-07-10 21:50 ` [PATCH 20/83] hsa/radeon: Add interrupt callback function to kgd2kfd interface Oded Gabbay
                   ` (9 subsequent siblings)
  26 siblings, 0 replies; 116+ messages in thread
From: Oded Gabbay @ 2014-07-10 21:50 UTC (permalink / raw)
  To: David Airlie, Alex Deucher, Jerome Glisse
  Cc: linux-kernel, dri-devel, John Bridgman, Andrew Lewycky,
	Joerg Roedel, Oded Gabbay

This patch adds calls to initialize and finalize the KFD interrupt
module.

The calls are made on per-device initialization and finalization inside
the kgd-->kfd interface.

Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
---
 drivers/gpu/hsa/radeon/cik_regs.h   |  1 +
 drivers/gpu/hsa/radeon/kfd_device.c | 10 ++++++++--
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/hsa/radeon/cik_regs.h b/drivers/gpu/hsa/radeon/cik_regs.h
index 9c3ce97..813cdc4 100644
--- a/drivers/gpu/hsa/radeon/cik_regs.h
+++ b/drivers/gpu/hsa/radeon/cik_regs.h
@@ -73,6 +73,7 @@
 #define CP_PQ_WPTR_POLL_CNTL				0xC20C
 #define	WPTR_POLL_EN					(1 << 31)
 
+#define CPC_INT_CNTL					0xC2D0
 #define CP_ME1_PIPE0_INT_CNTL				0xC214
 #define CP_ME1_PIPE1_INT_CNTL				0xC218
 #define CP_ME1_PIPE2_INT_CNTL				0xC21C
diff --git a/drivers/gpu/hsa/radeon/kfd_device.c b/drivers/gpu/hsa/radeon/kfd_device.c
index b2d2861..b627e57 100644
--- a/drivers/gpu/hsa/radeon/kfd_device.c
+++ b/drivers/gpu/hsa/radeon/kfd_device.c
@@ -127,6 +127,9 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,
 
 	radeon_kfd_doorbell_init(kfd);
 
+	if (radeon_kfd_interrupt_init(kfd))
+		return false;
+
 	if (!device_iommu_pasid_init(kfd))
 		return false;
 
@@ -155,10 +158,13 @@ void kgd2kfd_device_exit(struct kfd_dev *kfd)
 
 	BUG_ON(err != 0);
 
-	if (kfd->init_complete) {
+	if (kfd->init_complete)
 		kfd->device_info->scheduler_class->stop(kfd->scheduler);
-		kfd->device_info->scheduler_class->destroy(kfd->scheduler);
 
+	radeon_kfd_interrupt_exit(kfd);
+
+	if (kfd->init_complete) {
+		kfd->device_info->scheduler_class->destroy(kfd->scheduler);
 		amd_iommu_free_device(kfd->pdev);
 	}
 
-- 
1.9.1


* [PATCH 20/83] hsa/radeon: Add interrupt callback function to kgd2kfd interface
  2014-07-10 21:50 [PATCH 02/83] drm/radeon: reduce number of free VMIDs and pipes in KV Oded Gabbay
                   ` (16 preceding siblings ...)
  2014-07-10 21:50 ` [PATCH 19/83] hsa/radeon: Enable/Disable KFD interrupt module Oded Gabbay
@ 2014-07-10 21:50 ` Oded Gabbay
  2014-07-10 21:50 ` [PATCH 21/83] hsa/radeon: Add kgd-->kfd interfaces for suspend and resume Oded Gabbay
                   ` (8 subsequent siblings)
  26 siblings, 0 replies; 116+ messages in thread
From: Oded Gabbay @ 2014-07-10 21:50 UTC (permalink / raw)
  To: David Airlie, Alex Deucher, Jerome Glisse
  Cc: linux-kernel, dri-devel, John Bridgman, Andrew Lewycky,
	Joerg Roedel, Oded Gabbay

This patch adds a new callback function to the kgd2kfd interface. The
new callback propagates interrupts from the radeon driver to the kfd
driver.
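
On the kfd side the callback lands in kgd2kfd_interrupt, which copies
interesting entries into a software ring that a work item drains later. Below
is a simplified userspace sketch of that ring's full/empty arithmetic. It is
single-threaded and omits the atomics, barriers and spinlock of the real
code; struct sw_ring, ring_enqueue, ring_dequeue and the sizes are
hypothetical.

```c
#include <stdbool.h>
#include <string.h>

#define ENTRY_SIZE   16u	/* assumed entry size in bytes */
#define RING_ENTRIES 4u		/* tiny ring, for illustration only */
#define RING_SIZE    (RING_ENTRIES * ENTRY_SIZE)

struct sw_ring {
	unsigned char buf[RING_SIZE];
	unsigned int rptr;	/* byte offset of the next entry to read */
	unsigned int wptr;	/* byte offset of the next entry to write */
};

/* Producer side (the ISR in the driver). Returns false if the ring is full. */
static inline bool ring_enqueue(struct sw_ring *r, const void *entry)
{
	/* One slot stays empty so that "full" differs from "empty". */
	if ((r->wptr + ENTRY_SIZE) % RING_SIZE == r->rptr)
		return false;

	memcpy(r->buf + r->wptr, entry, ENTRY_SIZE);
	r->wptr = (r->wptr + ENTRY_SIZE) % RING_SIZE;
	return true;
}

/* Consumer side (the work item in the driver). Returns false if empty. */
static inline bool ring_dequeue(struct sw_ring *r, void *entry)
{
	if (r->rptr == r->wptr)
		return false;

	memcpy(entry, r->buf + r->rptr, ENTRY_SIZE);
	r->rptr = (r->rptr + ENTRY_SIZE) % RING_SIZE;
	return true;
}
```

Keeping one slot unused means a ring of N slots holds at most N - 1 entries,
but it lets the reader and writer each own exactly one pointer.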

Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
---
 drivers/gpu/hsa/radeon/kfd_module.c | 1 +
 include/linux/radeon_kfd.h          | 1 +
 2 files changed, 2 insertions(+)

diff --git a/drivers/gpu/hsa/radeon/kfd_module.c b/drivers/gpu/hsa/radeon/kfd_module.c
index 6978bc0..ad21c6d 100644
--- a/drivers/gpu/hsa/radeon/kfd_module.c
+++ b/drivers/gpu/hsa/radeon/kfd_module.c
@@ -38,6 +38,7 @@ static const struct kgd2kfd_calls kgd2kfd = {
 	.probe		= kgd2kfd_probe,
 	.device_init	= kgd2kfd_device_init,
 	.device_exit	= kgd2kfd_device_exit,
+	.interrupt	= kgd2kfd_interrupt,
 };
 
 bool kgd2kfd_init(unsigned interface_version,
diff --git a/include/linux/radeon_kfd.h b/include/linux/radeon_kfd.h
index 40b691c..2f4f7c0 100644
--- a/include/linux/radeon_kfd.h
+++ b/include/linux/radeon_kfd.h
@@ -62,6 +62,7 @@ struct kgd2kfd_calls {
 	struct kfd_dev* (*probe)(struct kgd_dev *kgd, struct pci_dev *pdev);
 	bool (*device_init)(struct kfd_dev *kfd, const struct kgd2kfd_shared_resources *gpu_resources);
 	void (*device_exit)(struct kfd_dev *kfd);
+	void (*interrupt)(struct kfd_dev *kfd, const void *ih_ring_entry);
 };
 
 struct kfd2kgd_calls {
-- 
1.9.1


* [PATCH 21/83] hsa/radeon: Add kgd-->kfd interfaces for suspend and resume
  2014-07-10 21:50 [PATCH 02/83] drm/radeon: reduce number of free VMIDs and pipes in KV Oded Gabbay
                   ` (17 preceding siblings ...)
  2014-07-10 21:50 ` [PATCH 20/83] hsa/radeon: Add interrupt callback function to kgd2kfd interface Oded Gabbay
@ 2014-07-10 21:50 ` Oded Gabbay
  2014-07-10 21:50 ` [PATCH 22/83] drm/radeon: Add calls to suspend and resume of kfd driver Oded Gabbay
                   ` (7 subsequent siblings)
  26 siblings, 0 replies; 116+ messages in thread
From: Oded Gabbay @ 2014-07-10 21:50 UTC (permalink / raw)
  To: David Airlie, Alex Deucher, Jerome Glisse
  Cc: linux-kernel, dri-devel, John Bridgman, Andrew Lewycky,
	Joerg Roedel, Oded Gabbay

This patch adds two new interfaces to the kgd2kfd structure. They
suspend and resume a kfd device when its matching radeon device is
suspended or resumed.

Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
---
 drivers/gpu/hsa/radeon/Makefile     |  2 +-
 drivers/gpu/hsa/radeon/kfd_module.c |  2 ++
 drivers/gpu/hsa/radeon/kfd_pm.c     | 43 +++++++++++++++++++++++++++++++++++++
 drivers/gpu/hsa/radeon/kfd_priv.h   |  4 ++++
 include/linux/radeon_kfd.h          |  2 ++
 5 files changed, 52 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/hsa/radeon/kfd_pm.c

diff --git a/drivers/gpu/hsa/radeon/Makefile b/drivers/gpu/hsa/radeon/Makefile
index 5422e6a..935f9b7 100644
--- a/drivers/gpu/hsa/radeon/Makefile
+++ b/drivers/gpu/hsa/radeon/Makefile
@@ -5,6 +5,6 @@
 radeon_kfd-y	:= kfd_module.o kfd_device.o kfd_chardev.o \
 		kfd_pasid.o kfd_topology.o kfd_process.o \
 		kfd_doorbell.o kfd_sched_cik_static.o kfd_registers.o \
-		kfd_vidmem.o kfd_interrupt.o
+		kfd_vidmem.o kfd_interrupt.o kfd_pm.o
 
 obj-$(CONFIG_HSA_RADEON)	+= radeon_kfd.o
diff --git a/drivers/gpu/hsa/radeon/kfd_module.c b/drivers/gpu/hsa/radeon/kfd_module.c
index ad21c6d..a03743a 100644
--- a/drivers/gpu/hsa/radeon/kfd_module.c
+++ b/drivers/gpu/hsa/radeon/kfd_module.c
@@ -39,6 +39,8 @@ static const struct kgd2kfd_calls kgd2kfd = {
 	.device_init	= kgd2kfd_device_init,
 	.device_exit	= kgd2kfd_device_exit,
 	.interrupt	= kgd2kfd_interrupt,
+	.suspend	= kgd2kfd_suspend,
+	.resume		= kgd2kfd_resume,
 };
 
 bool kgd2kfd_init(unsigned interface_version,
diff --git a/drivers/gpu/hsa/radeon/kfd_pm.c b/drivers/gpu/hsa/radeon/kfd_pm.c
new file mode 100644
index 0000000..783311f
--- /dev/null
+++ b/drivers/gpu/hsa/radeon/kfd_pm.c
@@ -0,0 +1,43 @@
+/*
+ * Copyright 2014 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ * Author: Oded Gabbay
+ */
+
+#include <linux/device.h>
+#include "kfd_priv.h"
+#include "kfd_scheduler.h"
+
+void kgd2kfd_suspend(struct kfd_dev *kfd)
+{
+	BUG_ON(kfd == NULL);
+
+	kfd->device_info->scheduler_class->stop(kfd->scheduler);
+}
+
+int kgd2kfd_resume(struct kfd_dev *kfd)
+{
+	BUG_ON(kfd == NULL);
+
+	kfd->device_info->scheduler_class->start(kfd->scheduler);
+
+	return 0;
+}
diff --git a/drivers/gpu/hsa/radeon/kfd_priv.h b/drivers/gpu/hsa/radeon/kfd_priv.h
index 5b6611f..630d690 100644
--- a/drivers/gpu/hsa/radeon/kfd_priv.h
+++ b/drivers/gpu/hsa/radeon/kfd_priv.h
@@ -247,4 +247,8 @@ int radeon_kfd_interrupt_init(struct kfd_dev *dev);
 void radeon_kfd_interrupt_exit(struct kfd_dev *dev);
 void kgd2kfd_interrupt(struct kfd_dev *dev, const void *ih_ring_entry);
 
+/* Power Management */
+void kgd2kfd_suspend(struct kfd_dev *dev);
+int kgd2kfd_resume(struct kfd_dev *dev);
+
 #endif
diff --git a/include/linux/radeon_kfd.h b/include/linux/radeon_kfd.h
index 2f4f7c0..63b7bac 100644
--- a/include/linux/radeon_kfd.h
+++ b/include/linux/radeon_kfd.h
@@ -63,6 +63,8 @@ struct kgd2kfd_calls {
 	bool (*device_init)(struct kfd_dev *kfd, const struct kgd2kfd_shared_resources *gpu_resources);
 	void (*device_exit)(struct kfd_dev *kfd);
 	void (*interrupt)(struct kfd_dev *kfd, const void *ih_ring_entry);
+	void (*suspend)(struct kfd_dev *kfd);
+	int (*resume)(struct kfd_dev *kfd);
 };
 
 struct kfd2kgd_calls {
-- 
1.9.1


* [PATCH 22/83] drm/radeon: Add calls to suspend and resume of kfd driver
  2014-07-10 21:50 [PATCH 02/83] drm/radeon: reduce number of free VMIDs and pipes in KV Oded Gabbay
                   ` (18 preceding siblings ...)
  2014-07-10 21:50 ` [PATCH 21/83] hsa/radeon: Add kgd-->kfd interfaces for suspend and resume Oded Gabbay
@ 2014-07-10 21:50 ` Oded Gabbay
  2014-07-10 21:50 ` [PATCH 23/83] drm/radeon/cik: Don't touch int of pipes 1-7 Oded Gabbay
                   ` (6 subsequent siblings)
  26 siblings, 0 replies; 116+ messages in thread
From: Oded Gabbay @ 2014-07-10 21:50 UTC (permalink / raw)
  To: David Airlie, Alex Deucher, Jerome Glisse
  Cc: linux-kernel, dri-devel, John Bridgman, Andrew Lewycky,
	Joerg Roedel, Oded Gabbay, Christian König

The radeon driver can suspend and resume its device. For each device it
suspends or resumes, it should inform kfd, so that kfd can perform the
relevant actions for that device.

This patch adds the calls to kfd's suspend and resume functions. The
device is passed as an argument.
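
The calling pattern can be sketched in userspace as an ops table plus thin
wrappers that do nothing when no kfd device is attached. All names here
(kfd_dev_stub, kfd_ops_stub, host_suspend, host_resume) are hypothetical
stand-ins for the real structures.

```c
#include <stddef.h>

/* Stand-in for the kfd device; only tracks whether it is running. */
struct kfd_dev_stub {
	int running;
};

/* Stand-in for the suspend/resume part of the kgd2kfd call table. */
struct kfd_ops_stub {
	void (*suspend)(struct kfd_dev_stub *kfd);
	int (*resume)(struct kfd_dev_stub *kfd);
};

static void stub_suspend(struct kfd_dev_stub *kfd)
{
	kfd->running = 0;
}

static int stub_resume(struct kfd_dev_stub *kfd)
{
	kfd->running = 1;
	return 0;
}

static const struct kfd_ops_stub stub_ops = {
	.suspend = stub_suspend,
	.resume  = stub_resume,
};

/* Host-side wrapper: skip the call if no kfd device is attached. */
static inline void host_suspend(const struct kfd_ops_stub *ops,
				struct kfd_dev_stub *kfd)
{
	if (kfd)
		ops->suspend(kfd);
}

/* Host-side wrapper: report success when there is nothing to resume. */
static inline int host_resume(const struct kfd_ops_stub *ops,
			      struct kfd_dev_stub *kfd)
{
	int r = 0;

	if (kfd)
		r = ops->resume(kfd);

	return r;
}
```

Returning 0 when no device is attached keeps the host's startup path
unchanged on hardware without a kfd device.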

Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
---
 drivers/gpu/drm/radeon/cik.c        |  7 +++++++
 drivers/gpu/drm/radeon/radeon_kfd.c | 16 ++++++++++++++++
 2 files changed, 23 insertions(+)

diff --git a/drivers/gpu/drm/radeon/cik.c b/drivers/gpu/drm/radeon/cik.c
index e0c8052..b1c50f4 100644
--- a/drivers/gpu/drm/radeon/cik.c
+++ b/drivers/gpu/drm/radeon/cik.c
@@ -138,6 +138,8 @@ static void cik_fini_pg(struct radeon_device *rdev);
 static void cik_fini_cg(struct radeon_device *rdev);
 static void cik_enable_gui_idle_interrupt(struct radeon_device *rdev,
 					  bool enable);
+extern void radeon_kfd_suspend(struct radeon_device *rdev);
+extern int radeon_kfd_resume(struct radeon_device *rdev);
 
 /* get temperature in millidegrees */
 int ci_get_temp(struct radeon_device *rdev)
@@ -8429,6 +8431,10 @@ static int cik_startup(struct radeon_device *rdev)
 	if (r)
 		return r;
 
+	r = radeon_kfd_resume(rdev);
+	if (r)
+		return r;
+
 	return 0;
 }
 
@@ -8477,6 +8483,7 @@ int cik_resume(struct radeon_device *rdev)
  */
 int cik_suspend(struct radeon_device *rdev)
 {
+	radeon_kfd_suspend(rdev);
 	radeon_pm_suspend(rdev);
 	dce6_audio_fini(rdev);
 	radeon_vm_manager_fini(rdev);
diff --git a/drivers/gpu/drm/radeon/radeon_kfd.c b/drivers/gpu/drm/radeon/radeon_kfd.c
index 594020e..e3af85b 100644
--- a/drivers/gpu/drm/radeon/radeon_kfd.c
+++ b/drivers/gpu/drm/radeon/radeon_kfd.c
@@ -124,6 +124,22 @@ void radeon_kfd_device_fini(struct radeon_device *rdev)
 	}
 }
 
+void radeon_kfd_suspend(struct radeon_device *rdev)
+{
+	if (rdev->kfd)
+		kgd2kfd->suspend(rdev->kfd);
+}
+
+int radeon_kfd_resume(struct radeon_device *rdev)
+{
+	int r = 0;
+
+	if (rdev->kfd)
+		r = kgd2kfd->resume(rdev->kfd);
+
+	return r;
+}
+
 static u32 pool_to_domain(enum kgd_memory_pool p)
 {
 	switch (p) {
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 116+ messages in thread
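
The wrappers added in patch 22 follow a guard-and-delegate pattern: do
nothing when no kfd device is attached, otherwise forward to the kfd
callback table, and let a resume error fail cik_startup(). Below is a
minimal standalone sketch of that pattern. The struct layouts, the mock_*
callbacks and startup_tail() are hypothetical stand-ins, not the real
kernel definitions; only radeon_kfd_suspend(), radeon_kfd_resume() and the
kgd2kfd->suspend/resume calls come from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; only the shape matters. */
struct kfd_dev {
	int suspended;
	int resume_result;	/* value the mock resume callback returns */
};

struct kgd2kfd_calls {
	void (*suspend)(struct kfd_dev *kfd);
	int (*resume)(struct kfd_dev *kfd);
};

struct radeon_device {
	struct kfd_dev *kfd;	/* NULL when no kfd device is attached */
};

/* Mock callbacks standing in for the kfd side of the interface. */
static void mock_suspend(struct kfd_dev *kfd)
{
	kfd->suspended = 1;
}

static int mock_resume(struct kfd_dev *kfd)
{
	kfd->suspended = 0;
	return kfd->resume_result;
}

static const struct kgd2kfd_calls mock_calls = {
	.suspend = mock_suspend,
	.resume = mock_resume,
};
static const struct kgd2kfd_calls *kgd2kfd = &mock_calls;

/* Same bodies as in the patch: skip the call when rdev->kfd is NULL. */
void radeon_kfd_suspend(struct radeon_device *rdev)
{
	if (rdev->kfd)
		kgd2kfd->suspend(rdev->kfd);
}

int radeon_kfd_resume(struct radeon_device *rdev)
{
	int r = 0;

	if (rdev->kfd)
		r = kgd2kfd->resume(rdev->kfd);

	return r;
}

/* Mirrors the tail of cik_startup(): a failing kfd resume fails startup. */
int startup_tail(struct radeon_device *rdev)
{
	int r = radeon_kfd_resume(rdev);

	if (r)
		return r;

	return 0;
}
```

A device without kfd goes through both wrappers as a no-op, which is why
cik_suspend() can call radeon_kfd_suspend() unconditionally.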

* [PATCH 23/83] drm/radeon/cik: Don't touch int of pipes 1-7
  2014-07-10 21:50 [PATCH 02/83] drm/radeon: reduce number of free VMIDs and pipes in KV Oded Gabbay
                   ` (19 preceding siblings ...)
  2014-07-10 21:50 ` [PATCH 22/83] drm/radeon: Add calls to suspend and resume of kfd driver Oded Gabbay
@ 2014-07-10 21:50 ` Oded Gabbay
  2014-07-10 21:50 ` [PATCH 24/83] drm/radeon/cik: Call kfd isr function Oded Gabbay
                   ` (5 subsequent siblings)
  26 siblings, 0 replies; 116+ messages in thread
From: Oded Gabbay @ 2014-07-10 21:50 UTC (permalink / raw)
  To: David Airlie, Alex Deucher, Jerome Glisse
  Cc: linux-kernel, dri-devel, John Bridgman, Andrew Lewycky,
	Joerg Roedel, Oded Gabbay, Christian König

The HSA radeon driver (kfd) is responsible for setting up the interrupts
of pipes 1-7, so radeon must no longer touch them.

Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
---
 drivers/gpu/drm/radeon/cik.c | 71 +-------------------------------------------
 1 file changed, 1 insertion(+), 70 deletions(-)

diff --git a/drivers/gpu/drm/radeon/cik.c b/drivers/gpu/drm/radeon/cik.c
index b1c50f4..803d0cb 100644
--- a/drivers/gpu/drm/radeon/cik.c
+++ b/drivers/gpu/drm/radeon/cik.c
@@ -7272,8 +7272,7 @@ static int cik_irq_init(struct radeon_device *rdev)
 int cik_irq_set(struct radeon_device *rdev)
 {
 	u32 cp_int_cntl;
-	u32 cp_m1p0, cp_m1p1, cp_m1p2, cp_m1p3;
-	u32 cp_m2p0, cp_m2p1, cp_m2p2, cp_m2p3;
+	u32 cp_m1p0;
 	u32 crtc1 = 0, crtc2 = 0, crtc3 = 0, crtc4 = 0, crtc5 = 0, crtc6 = 0;
 	u32 hpd1, hpd2, hpd3, hpd4, hpd5, hpd6;
 	u32 grbm_int_cntl = 0;
@@ -7307,13 +7306,6 @@ int cik_irq_set(struct radeon_device *rdev)
 	dma_cntl1 = RREG32(SDMA0_CNTL + SDMA1_REGISTER_OFFSET) & ~TRAP_ENABLE;
 
 	cp_m1p0 = RREG32(CP_ME1_PIPE0_INT_CNTL) & ~TIME_STAMP_INT_ENABLE;
-	cp_m1p1 = RREG32(CP_ME1_PIPE1_INT_CNTL) & ~TIME_STAMP_INT_ENABLE;
-	cp_m1p2 = RREG32(CP_ME1_PIPE2_INT_CNTL) & ~TIME_STAMP_INT_ENABLE;
-	cp_m1p3 = RREG32(CP_ME1_PIPE3_INT_CNTL) & ~TIME_STAMP_INT_ENABLE;
-	cp_m2p0 = RREG32(CP_ME2_PIPE0_INT_CNTL) & ~TIME_STAMP_INT_ENABLE;
-	cp_m2p1 = RREG32(CP_ME2_PIPE1_INT_CNTL) & ~TIME_STAMP_INT_ENABLE;
-	cp_m2p2 = RREG32(CP_ME2_PIPE2_INT_CNTL) & ~TIME_STAMP_INT_ENABLE;
-	cp_m2p3 = RREG32(CP_ME2_PIPE3_INT_CNTL) & ~TIME_STAMP_INT_ENABLE;
 
 	if (rdev->flags & RADEON_IS_IGP)
 		thermal_int = RREG32_SMC(CG_THERMAL_INT_CTRL) &
@@ -7335,33 +7327,6 @@ int cik_irq_set(struct radeon_device *rdev)
 			case 0:
 				cp_m1p0 |= TIME_STAMP_INT_ENABLE;
 				break;
-			case 1:
-				cp_m1p1 |= TIME_STAMP_INT_ENABLE;
-				break;
-			case 2:
-				cp_m1p2 |= TIME_STAMP_INT_ENABLE;
-				break;
-			case 3:
-				cp_m1p2 |= TIME_STAMP_INT_ENABLE;
-				break;
-			default:
-				DRM_DEBUG("si_irq_set: sw int cp1 invalid pipe %d\n", ring->pipe);
-				break;
-			}
-		} else if (ring->me == 2) {
-			switch (ring->pipe) {
-			case 0:
-				cp_m2p0 |= TIME_STAMP_INT_ENABLE;
-				break;
-			case 1:
-				cp_m2p1 |= TIME_STAMP_INT_ENABLE;
-				break;
-			case 2:
-				cp_m2p2 |= TIME_STAMP_INT_ENABLE;
-				break;
-			case 3:
-				cp_m2p2 |= TIME_STAMP_INT_ENABLE;
-				break;
 			default:
 				DRM_DEBUG("si_irq_set: sw int cp1 invalid pipe %d\n", ring->pipe);
 				break;
@@ -7378,33 +7343,6 @@ int cik_irq_set(struct radeon_device *rdev)
 			case 0:
 				cp_m1p0 |= TIME_STAMP_INT_ENABLE;
 				break;
-			case 1:
-				cp_m1p1 |= TIME_STAMP_INT_ENABLE;
-				break;
-			case 2:
-				cp_m1p2 |= TIME_STAMP_INT_ENABLE;
-				break;
-			case 3:
-				cp_m1p2 |= TIME_STAMP_INT_ENABLE;
-				break;
-			default:
-				DRM_DEBUG("si_irq_set: sw int cp2 invalid pipe %d\n", ring->pipe);
-				break;
-			}
-		} else if (ring->me == 2) {
-			switch (ring->pipe) {
-			case 0:
-				cp_m2p0 |= TIME_STAMP_INT_ENABLE;
-				break;
-			case 1:
-				cp_m2p1 |= TIME_STAMP_INT_ENABLE;
-				break;
-			case 2:
-				cp_m2p2 |= TIME_STAMP_INT_ENABLE;
-				break;
-			case 3:
-				cp_m2p2 |= TIME_STAMP_INT_ENABLE;
-				break;
 			default:
 				DRM_DEBUG("si_irq_set: sw int cp2 invalid pipe %d\n", ring->pipe);
 				break;
@@ -7487,13 +7425,6 @@ int cik_irq_set(struct radeon_device *rdev)
 	WREG32(SDMA0_CNTL + SDMA1_REGISTER_OFFSET, dma_cntl1);
 
 	WREG32(CP_ME1_PIPE0_INT_CNTL, cp_m1p0);
-	WREG32(CP_ME1_PIPE1_INT_CNTL, cp_m1p1);
-	WREG32(CP_ME1_PIPE2_INT_CNTL, cp_m1p2);
-	WREG32(CP_ME1_PIPE3_INT_CNTL, cp_m1p3);
-	WREG32(CP_ME2_PIPE0_INT_CNTL, cp_m2p0);
-	WREG32(CP_ME2_PIPE1_INT_CNTL, cp_m2p1);
-	WREG32(CP_ME2_PIPE2_INT_CNTL, cp_m2p2);
-	WREG32(CP_ME2_PIPE3_INT_CNTL, cp_m2p3);
 
 	WREG32(GRBM_INT_CNTL, grbm_int_cntl);
 
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 116+ messages in thread

* [PATCH 24/83] drm/radeon/cik: Call kfd isr function
  2014-07-10 21:50 [PATCH 02/83] drm/radeon: reduce number of free VMIDs and pipes in KV Oded Gabbay
                   ` (20 preceding siblings ...)
  2014-07-10 21:50 ` [PATCH 23/83] drm/radeon/cik: Don't touch int of pipes 1-7 Oded Gabbay
@ 2014-07-10 21:50 ` Oded Gabbay
  2014-07-10 21:50 ` [PATCH 25/83] hsa/radeon: fix the OEMID assignment in kfd_topology Oded Gabbay
                   ` (4 subsequent siblings)
  26 siblings, 0 replies; 116+ messages in thread
From: Oded Gabbay @ 2014-07-10 21:50 UTC (permalink / raw)
  To: David Airlie, Alex Deucher, Jerome Glisse
  Cc: linux-kernel, dri-devel, John Bridgman, Andrew Lewycky,
	Joerg Roedel, Oded Gabbay, Christian König

When radeon handles interrupts for cik, propagate each interrupt to kfd.

Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
---
 drivers/gpu/drm/radeon/cik.c        | 4 ++++
 drivers/gpu/drm/radeon/radeon_kfd.c | 6 ++++++
 2 files changed, 10 insertions(+)

diff --git a/drivers/gpu/drm/radeon/cik.c b/drivers/gpu/drm/radeon/cik.c
index 803d0cb..6f4999a 100644
--- a/drivers/gpu/drm/radeon/cik.c
+++ b/drivers/gpu/drm/radeon/cik.c
@@ -140,6 +140,7 @@ static void cik_enable_gui_idle_interrupt(struct radeon_device *rdev,
 					  bool enable);
 extern void radeon_kfd_suspend(struct radeon_device *rdev);
 extern int radeon_kfd_resume(struct radeon_device *rdev);
+extern void radeon_kfd_interrupt(struct radeon_device *rdev, const void *ih_ring_entry);
 
 /* get temperature in millidegrees */
 int ci_get_temp(struct radeon_device *rdev)
@@ -7703,6 +7704,9 @@ restart_ih:
 	while (rptr != wptr) {
 		/* wptr/rptr are in bytes! */
 		ring_index = rptr / 4;
+
+		radeon_kfd_interrupt(rdev, (const void *) &rdev->ih.ring[ring_index]);
+
 		src_id =  le32_to_cpu(rdev->ih.ring[ring_index]) & 0xff;
 		src_data = le32_to_cpu(rdev->ih.ring[ring_index + 1]) & 0xfffffff;
 		ring_id = le32_to_cpu(rdev->ih.ring[ring_index + 2]) & 0xff;
diff --git a/drivers/gpu/drm/radeon/radeon_kfd.c b/drivers/gpu/drm/radeon/radeon_kfd.c
index e3af85b..f4cc3c5 100644
--- a/drivers/gpu/drm/radeon/radeon_kfd.c
+++ b/drivers/gpu/drm/radeon/radeon_kfd.c
@@ -124,6 +124,12 @@ void radeon_kfd_device_fini(struct radeon_device *rdev)
 	}
 }
 
+void radeon_kfd_interrupt(struct radeon_device *rdev, const void *ih_ring_entry)
+{
+	if (rdev->kfd)
+		kgd2kfd->interrupt(rdev->kfd, ih_ring_entry);
+}
+
 void radeon_kfd_suspend(struct radeon_device *rdev)
 {
 	if (rdev->kfd)
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 116+ messages in thread
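
Patch 24 hands kfd a pointer to the raw IH ring entry at the current read
pointer, before radeon decodes it. What kfd does with the entry is not
part of this patch. The sketch below only illustrates the three fields
radeon itself reads right after the call, using the same masks as the
context lines of the hunk. It works on a byte buffer instead of the
driver's u32 ring, and struct ih_entry, load_le32() and decode_ih_entry()
are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical decoded view of one IH ring entry. */
struct ih_entry {
	uint32_t src_id;
	uint32_t src_data;
	uint32_t ring_id;
};

/* Portable stand-in for le32_to_cpu() applied to raw ring bytes. */
static uint32_t load_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * rptr is a byte offset into the ring ("wptr/rptr are in bytes!"), so the
 * three dwords read by the driver sit at rptr, rptr + 4 and rptr + 8.
 */
struct ih_entry decode_ih_entry(const uint8_t *ring, uint32_t rptr)
{
	const uint8_t *e = ring + rptr;
	struct ih_entry out;

	out.src_id = load_le32(e) & 0xff;
	out.src_data = load_le32(e + 4) & 0xfffffff;
	out.ring_id = load_le32(e + 8) & 0xff;

	return out;
}
```

Because the pointer is passed as const void *, kfd is free to apply its
own interpretation of the entry without radeon exposing its ring layout
as a type.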

* [PATCH 25/83] hsa/radeon: fix the OEMID assignment in kfd_topology
  2014-07-10 21:50 [PATCH 02/83] drm/radeon: reduce number of free VMIDs and pipes in KV Oded Gabbay
                   ` (21 preceding siblings ...)
  2014-07-10 21:50 ` [PATCH 24/83] drm/radeon/cik: Call kfd isr function Oded Gabbay
@ 2014-07-10 21:50 ` Oded Gabbay
  2014-07-10 21:50 ` [PATCH 26/83] hsa/radeon: Make binding of process to device permanent Oded Gabbay
                   ` (3 subsequent siblings)
  26 siblings, 0 replies; 116+ messages in thread
From: Oded Gabbay @ 2014-07-10 21:50 UTC (permalink / raw)
  To: David Airlie, Alex Deucher, Jerome Glisse
  Cc: linux-kernel, dri-devel, John Bridgman, Andrew Lewycky,
	Joerg Roedel, Evgeny Pinchuk, Oded Gabbay

From: Evgeny Pinchuk <evgeny.pinchuk@amd.com>

The OEMID from the CRAT table is assigned into a 64-bit variable, but the
OEMID is only 48 bits wide in the CRAT. This fix makes sure that only those
48 bits are assigned to the OEMID value from the CRAT table.

Signed-off-by: Evgeny Pinchuk <evgeny.pinchuk@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
---
 drivers/gpu/hsa/radeon/kfd_crat.h     | 2 ++
 drivers/gpu/hsa/radeon/kfd_topology.c | 4 ++--
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/hsa/radeon/kfd_crat.h b/drivers/gpu/hsa/radeon/kfd_crat.h
index 587455d..a374fa3 100644
--- a/drivers/gpu/hsa/radeon/kfd_crat.h
+++ b/drivers/gpu/hsa/radeon/kfd_crat.h
@@ -42,6 +42,8 @@
 #define CRAT_OEMTABLEID_LENGTH	8
 #define CRAT_RESERVED_LENGTH	6
 
+#define CRAT_OEMID_64BIT_MASK ((1ULL << (CRAT_OEMID_LENGTH * 8)) - 1)
+
 struct crat_header {
 	uint32_t	signature;
 	uint32_t	length;
diff --git a/drivers/gpu/hsa/radeon/kfd_topology.c b/drivers/gpu/hsa/radeon/kfd_topology.c
index 6acac25..2ee5444 100644
--- a/drivers/gpu/hsa/radeon/kfd_topology.c
+++ b/drivers/gpu/hsa/radeon/kfd_topology.c
@@ -467,10 +467,10 @@ static int kfd_parse_crat_table(void *crat_image)
 		if (!top_dev) {
 			kfd_release_live_view();
 			return -ENOMEM;
+		}
 	}
-}
 
-	sys_props.platform_id = *((uint64_t *)crat_table->oem_id);
+	sys_props.platform_id = (*((uint64_t *)crat_table->oem_id)) & CRAT_OEMID_64BIT_MASK;
 	sys_props.platform_oem = *((uint64_t *)crat_table->oem_table_id);
 	sys_props.platform_rev = crat_table->revision;
 
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 116+ messages in thread
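
To make the masking in patch 25 concrete: a 64-bit load that starts at the
6-byte OEMID field also picks up the two bytes that follow it, and the new
mask clears them. The sketch below is standalone and assumes
CRAT_OEMID_LENGTH is 6 (matching the 48 bits named in the commit message)
and a little-endian layout, as the cast in the patch effectively does on
x86. oemid_to_platform_id() is a hypothetical helper, not a function from
the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value: 6 bytes, i.e. the 48 bits the commit message mentions. */
#define CRAT_OEMID_LENGTH 6

/* Same definition as the one added to kfd_crat.h by the patch. */
#define CRAT_OEMID_64BIT_MASK ((1ULL << (CRAT_OEMID_LENGTH * 8)) - 1)

/*
 * Assemble 8 little-endian bytes starting at the OEMID field, then drop
 * the top 16 bits, which belong to whatever follows the OEMID.
 */
uint64_t oemid_to_platform_id(const uint8_t *oem_id)
{
	uint64_t raw = 0;
	int i;

	for (i = 7; i >= 0; i--)
		raw = (raw << 8) | oem_id[i];

	return raw & CRAT_OEMID_64BIT_MASK;
}
```

With the bytes "AMD   " followed by two unrelated bytes, the result keeps
only the six OEMID characters, whatever the trailing bytes contain.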

* [PATCH 26/83] hsa/radeon: Make binding of process to device permanent
  2014-07-10 21:50 [PATCH 02/83] drm/radeon: reduce number of free VMIDs and pipes in KV Oded Gabbay
                   ` (22 preceding siblings ...)
  2014-07-10 21:50 ` [PATCH 25/83] hsa/radeon: fix the OEMID assignment in kfd_topology Oded Gabbay
@ 2014-07-10 21:50 ` Oded Gabbay
  2014-07-10 21:50   ` Oded Gabbay
                   ` (2 subsequent siblings)
  26 siblings, 0 replies; 116+ messages in thread
From: Oded Gabbay @ 2014-07-10 21:50 UTC (permalink / raw)
  To: David Airlie, Alex Deucher, Jerome Glisse
  Cc: linux-kernel, dri-devel, John Bridgman, Andrew Lewycky,
	Joerg Roedel, Oded Gabbay

From: Andrew Lewycky <Andrew.Lewycky@amd.com>

Permanently bind the process to the device.
The binding survives even when all queues are destroyed.
Process exit and device removal terminate the binding.

Signed-off-by: Andrew Lewycky <Andrew.Lewycky@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
---
 drivers/gpu/hsa/radeon/kfd_chardev.c | 27 +++------------------------
 drivers/gpu/hsa/radeon/kfd_priv.h    |  3 ---
 drivers/gpu/hsa/radeon/kfd_process.c | 21 ++++++++++-----------
 3 files changed, 13 insertions(+), 38 deletions(-)

diff --git a/drivers/gpu/hsa/radeon/kfd_chardev.c b/drivers/gpu/hsa/radeon/kfd_chardev.c
index 4e7d5d0..e0b276d 100644
--- a/drivers/gpu/hsa/radeon/kfd_chardev.c
+++ b/drivers/gpu/hsa/radeon/kfd_chardev.c
@@ -141,20 +141,13 @@ kfd_ioctl_create_queue(struct file *filep, struct kfd_process *p, void __user *a
 	pdd = radeon_kfd_bind_process_to_device(dev, p);
 	if (IS_ERR(pdd) < 0) {
 		err = PTR_ERR(pdd);
-		goto err_bind_pasid;
+		goto err_bind_process;
 	}
 
-	pr_debug("kfd: creating queue number %d for PASID %d on GPU 0x%x\n",
-			pdd->queue_count,
+	pr_debug("kfd: creating queue for PASID %d on GPU 0x%x\n",
 			p->pasid,
 			dev->id);
 
-	if (pdd->queue_count++ == 0) {
-		err = dev->device_info->scheduler_class->register_process(dev->scheduler, p, &pdd->scheduler_process);
-		if (err < 0)
-			goto err_register_process;
-	}
-
 	if (!radeon_kfd_allocate_queue_id(p, &queue_id))
 		goto err_allocate_queue_id;
 
@@ -198,12 +191,7 @@ err_copy_args_out:
 err_create_queue:
 	radeon_kfd_remove_queue(p, queue_id);
 err_allocate_queue_id:
-	if (--pdd->queue_count == 0) {
-		dev->device_info->scheduler_class->deregister_process(dev->scheduler, pdd->scheduler_process);
-		pdd->scheduler_process = NULL;
-	}
-err_register_process:
-err_bind_pasid:
+err_bind_process:
 	kfree(queue);
 	mutex_unlock(&p->mutex);
 	return err;
@@ -215,7 +203,6 @@ kfd_ioctl_destroy_queue(struct file *filp, struct kfd_process *p, void __user *a
 	struct kfd_ioctl_destroy_queue_args args;
 	struct kfd_queue *queue;
 	struct kfd_dev *dev;
-	struct kfd_process_device *pdd;
 
 	if (copy_from_user(&args, arg, sizeof(args)))
 		return -EFAULT;
@@ -239,14 +226,6 @@ kfd_ioctl_destroy_queue(struct file *filp, struct kfd_process *p, void __user *a
 
 	kfree(queue);
 
-	pdd = radeon_kfd_get_process_device_data(dev, p);
-	BUG_ON(pdd == NULL); /* Because a queue exists. */
-
-	if (--pdd->queue_count == 0) {
-		dev->device_info->scheduler_class->deregister_process(dev->scheduler, pdd->scheduler_process);
-		pdd->scheduler_process = NULL;
-	}
-
 	mutex_unlock(&p->mutex);
 	return 0;
 }
diff --git a/drivers/gpu/hsa/radeon/kfd_priv.h b/drivers/gpu/hsa/radeon/kfd_priv.h
index 630d690..bca9cce 100644
--- a/drivers/gpu/hsa/radeon/kfd_priv.h
+++ b/drivers/gpu/hsa/radeon/kfd_priv.h
@@ -166,9 +166,6 @@ struct kfd_process_device {
 	/* The user-mode address of the doorbell mapping for this device. */
 	doorbell_t __user *doorbell_mapping;
 
-	/* The number of queues created by this process for this device. */
-	uint32_t queue_count;
-
 	/* Scheduler process data for this device. */
 	struct kfd_scheduler_process *scheduler_process;
 
diff --git a/drivers/gpu/hsa/radeon/kfd_process.c b/drivers/gpu/hsa/radeon/kfd_process.c
index 145ee38..f89f855 100644
--- a/drivers/gpu/hsa/radeon/kfd_process.c
+++ b/drivers/gpu/hsa/radeon/kfd_process.c
@@ -120,15 +120,6 @@ destroy_queues(struct kfd_process *p, struct kfd_dev *dev_filter)
 			dev->device_info->scheduler_class->destroy_queue(dev->scheduler, &queue->scheduler_queue);
 
 			kfree(queue);
-
-			BUG_ON(pdd->queue_count == 0);
-			BUG_ON(pdd->scheduler_process == NULL);
-
-			if (--pdd->queue_count == 0) {
-				dev->device_info->scheduler_class->deregister_process(dev->scheduler,
-							pdd->scheduler_process);
-				pdd->scheduler_process = NULL;
-			}
 		}
 	}
 }
@@ -144,6 +135,8 @@ static void free_process(struct kfd_process *p)
 	/* doorbell mappings: automatic */
 
 	list_for_each_entry_safe(pdd, temp, &p->per_device_data, per_device_list) {
+		pdd->dev->device_info->scheduler_class->deregister_process(pdd->dev->scheduler, pdd->scheduler_process);
+		pdd->scheduler_process = NULL;
 		amd_iommu_unbind_pasid(pdd->dev->pdev, p->pasid);
 		list_del(&pdd->per_device_list);
 		kfree(pdd);
@@ -255,6 +248,12 @@ struct kfd_process_device *radeon_kfd_bind_process_to_device(struct kfd_dev *dev
 	if (err < 0)
 		return ERR_PTR(err);
 
+	err = dev->device_info->scheduler_class->register_process(dev->scheduler, p, &pdd->scheduler_process);
+	if (err < 0) {
+		amd_iommu_unbind_pasid(dev->pdev, p->pasid);
+		return ERR_PTR(err);
+	}
+
 	pdd->bound = true;
 
 	return pdd;
@@ -285,8 +284,8 @@ void radeon_kfd_unbind_process_from_device(struct kfd_dev *dev, pasid_t pasid)
 
 	destroy_queues(p, dev);
 
-	/* All queues just got destroyed so this should be gone. */
-	BUG_ON(pdd->scheduler_process != NULL);
+	dev->device_info->scheduler_class->deregister_process(dev->scheduler, pdd->scheduler_process);
+	pdd->scheduler_process = NULL;
 
 	/*
 	 * Just mark pdd as unbound, because we still need it to call
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 116+ messages in thread
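
The lifetime change in patch 26 can be summarised as: registration with
the scheduler happens once, when the process is first bound to the device,
and is undone only at process exit or device removal, never when the
queue count drops to zero. The sketch below models that with a reduced,
hypothetical struct pdd and mock register/deregister functions. The early
return for an already bound process and the bound check in free_process()
are simplifying assumptions of this sketch, not code shown in the hunks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily reduced per-process/per-device record. */
struct pdd {
	bool bound;
	int registered;		/* net number of scheduler registrations */
};

static int register_calls;
static int deregister_calls;

static int mock_register_process(struct pdd *pdd)
{
	register_calls++;
	pdd->registered++;
	return 0;
}

static void mock_deregister_process(struct pdd *pdd)
{
	deregister_calls++;
	pdd->registered--;
}

/* Bind once; later calls find the existing binding and do nothing. */
int bind_process_to_device(struct pdd *pdd)
{
	int err;

	if (pdd->bound)
		return 0;

	err = mock_register_process(pdd);
	if (err < 0)
		return err;

	pdd->bound = true;
	return 0;
}

/* Queue creation only needs the binding to exist. */
int create_queue(struct pdd *pdd)
{
	return bind_process_to_device(pdd);
}

/* Queue destruction no longer touches the scheduler registration. */
void destroy_queue(struct pdd *pdd)
{
	(void)pdd;
}

/* Only process teardown (or device removal) ends the binding. */
void free_process(struct pdd *pdd)
{
	if (pdd->bound) {
		mock_deregister_process(pdd);
		pdd->bound = false;
	}
}
```

This is also why the patch can drop queue_count from struct
kfd_process_device: nothing needs to know when the last queue goes away.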

* [PATCH 27/83] hsa/radeon: Implement hsaKmtSetMemoryPolicy
@ 2014-07-10 21:50   ` Oded Gabbay
  0 siblings, 0 replies; 116+ messages in thread
From: Oded Gabbay @ 2014-07-10 21:50 UTC (permalink / raw)
  To: David Airlie, Alex Deucher, Jerome Glisse
  Cc: linux-kernel, dri-devel, John Bridgman, Andrew Lewycky,
	Joerg Roedel, Oded Gabbay, Ben Goz, Evgeny Pinchuk,
	Alexey Skidanov, linux-api

From: Andrew Lewycky <Andrew.Lewycky@amd.com>

This patch adds support in KFD for the hsaKmtSetMemoryPolicy
HSA thunk API call.

Signed-off-by: Andrew Lewycky <Andrew.Lewycky@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
---
 drivers/gpu/hsa/radeon/cik_regs.h             |  1 +
 drivers/gpu/hsa/radeon/kfd_chardev.c          | 59 +++++++++++++++++
 drivers/gpu/hsa/radeon/kfd_sched_cik_static.c | 91 +++++++++++++++++++++++++--
 drivers/gpu/hsa/radeon/kfd_scheduler.h        | 12 ++++
 include/uapi/linux/kfd_ioctl.h                | 13 ++++
 5 files changed, 172 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/hsa/radeon/cik_regs.h b/drivers/gpu/hsa/radeon/cik_regs.h
index 813cdc4..93f7b34 100644
--- a/drivers/gpu/hsa/radeon/cik_regs.h
+++ b/drivers/gpu/hsa/radeon/cik_regs.h
@@ -54,6 +54,7 @@
 #define	APE1_MTYPE(x)					((x) << 7)
 
 /* valid for both DEFAULT_MTYPE and APE1_MTYPE */
+#define	MTYPE_CACHED					0
 #define	MTYPE_NONCACHED					3
 
 
diff --git a/drivers/gpu/hsa/radeon/kfd_chardev.c b/drivers/gpu/hsa/radeon/kfd_chardev.c
index e0b276d..ddaf357 100644
--- a/drivers/gpu/hsa/radeon/kfd_chardev.c
+++ b/drivers/gpu/hsa/radeon/kfd_chardev.c
@@ -231,6 +231,61 @@ kfd_ioctl_destroy_queue(struct file *filp, struct kfd_process *p, void __user *a
 }
 
 static long
+kfd_ioctl_set_memory_policy(struct file *filep, struct kfd_process *p, void __user *arg)
+{
+	struct kfd_ioctl_set_memory_policy_args args;
+	struct kfd_dev *dev;
+	int err = 0;
+	struct kfd_process_device *pdd;
+	enum cache_policy default_policy, alternate_policy;
+
+	if (copy_from_user(&args, arg, sizeof(args)))
+		return -EFAULT;
+
+	if (args.default_policy != KFD_IOC_CACHE_POLICY_COHERENT
+	    && args.default_policy != KFD_IOC_CACHE_POLICY_NONCOHERENT) {
+		return -EINVAL;
+	}
+
+	if (args.alternate_policy != KFD_IOC_CACHE_POLICY_COHERENT
+	    && args.alternate_policy != KFD_IOC_CACHE_POLICY_NONCOHERENT) {
+		return -EINVAL;
+	}
+
+	dev = radeon_kfd_device_by_id(args.gpu_id);
+	if (dev == NULL)
+		return -EINVAL;
+
+	mutex_lock(&p->mutex);
+
+	pdd = radeon_kfd_bind_process_to_device(dev, p);
+	if (IS_ERR(pdd)) {
+		err = PTR_ERR(pdd);
+		goto out;
+	}
+
+	default_policy = (args.default_policy == KFD_IOC_CACHE_POLICY_COHERENT)
+			 ? cache_policy_coherent : cache_policy_noncoherent;
+
+	alternate_policy = (args.alternate_policy == KFD_IOC_CACHE_POLICY_COHERENT)
+			   ? cache_policy_coherent : cache_policy_noncoherent;
+
+	if (!dev->device_info->scheduler_class->set_cache_policy(dev->scheduler,
+								 pdd->scheduler_process,
+								 default_policy,
+								 alternate_policy,
+								 (void __user *)args.alternate_aperture_base,
+								 args.alternate_aperture_size))
+		err = -EINVAL;
+
+out:
+	mutex_unlock(&p->mutex);
+
+	return err;
+}
+
+
+static long
 kfd_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
 {
 	struct kfd_process *process;
@@ -253,6 +308,10 @@ kfd_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
 		err = kfd_ioctl_destroy_queue(filep, process, (void __user *)arg);
 		break;
 
+	case KFD_IOC_SET_MEMORY_POLICY:
+		err = kfd_ioctl_set_memory_policy(filep, process, (void __user *)arg);
+		break;
+
 	default:
 		dev_err(kfd_device,
 			"unknown ioctl cmd 0x%x, arg 0x%lx)\n",
diff --git a/drivers/gpu/hsa/radeon/kfd_sched_cik_static.c b/drivers/gpu/hsa/radeon/kfd_sched_cik_static.c
index 9add5e5..3c3e7d6 100644
--- a/drivers/gpu/hsa/radeon/kfd_sched_cik_static.c
+++ b/drivers/gpu/hsa/radeon/kfd_sched_cik_static.c
@@ -162,6 +162,10 @@ struct cik_static_private {
 struct cik_static_process {
 	unsigned int vmid;
 	pasid_t pasid;
+
+	uint32_t sh_mem_config;
+	uint32_t ape1_base;
+	uint32_t ape1_limit;
 };
 
 struct cik_static_queue {
@@ -346,6 +350,7 @@ static void init_ats(struct cik_static_private *priv)
 
 			sh_mem_config = ALIGNMENT_MODE(SH_MEM_ALIGNMENT_MODE_UNALIGNED);
 			sh_mem_config |= DEFAULT_MTYPE(MTYPE_NONCACHED);
+			sh_mem_config |= APE1_MTYPE(MTYPE_NONCACHED);
 
 			WRITE_REG(priv->dev, SH_MEM_CONFIG, sh_mem_config);
 
@@ -562,14 +567,26 @@ static void release_vmid(struct cik_static_private *priv, unsigned int vmid)
 	set_bit(vmid, &priv->free_vmid_mask);
 }
 
+static void program_sh_mem_settings(struct cik_static_private *sched,
+				    struct cik_static_process *proc)
+{
+	lock_srbm_index(sched);
+
+	vmid_select(sched, proc->vmid);
+
+	WRITE_REG(sched->dev, SH_MEM_CONFIG, proc->sh_mem_config);
+
+	WRITE_REG(sched->dev, SH_MEM_APE1_BASE, proc->ape1_base);
+	WRITE_REG(sched->dev, SH_MEM_APE1_LIMIT, proc->ape1_limit);
+
+	unlock_srbm_index(sched);
+}
+
 static void setup_vmid_for_process(struct cik_static_private *priv, struct cik_static_process *p)
 {
 	set_vmid_pasid_mapping(priv, p->vmid, p->pasid);
 
-	/*
-	 * SH_MEM_CONFIG and others need to be programmed differently
-	 * for 32/64-bit processes. And maybe other reasons.
-	 */
+	program_sh_mem_settings(priv, p);
 }
 
 static int
@@ -591,6 +608,12 @@ cik_static_register_process(struct kfd_scheduler *scheduler, struct kfd_process
 
 	hwp->pasid = process->pasid;
 
+	hwp->sh_mem_config = ALIGNMENT_MODE(SH_MEM_ALIGNMENT_MODE_UNALIGNED)
+			     | DEFAULT_MTYPE(MTYPE_NONCACHED)
+			     | APE1_MTYPE(MTYPE_NONCACHED);
+	hwp->ape1_base = 1;
+	hwp->ape1_limit = 0;
+
 	setup_vmid_for_process(priv, hwp);
 
 	*scheduler_process = (struct kfd_scheduler_process *)hwp;
@@ -894,6 +917,64 @@ cik_static_interrupt_wq(struct kfd_scheduler *scheduler, const void *ih_ring_ent
 {
 }
 
+/* Low bits must be 0000/FFFF as required by HW, high bits must be 0 to stay in user mode. */
+#define APE1_FIXED_BITS_MASK 0xFFFF80000000FFFFULL
+#define APE1_LIMIT_ALIGNMENT 0xFFFF /* APE1 limit is inclusive and 64K aligned. */
+
+static bool cik_static_set_cache_policy(struct kfd_scheduler *scheduler,
+					struct kfd_scheduler_process *process,
+					enum cache_policy default_policy,
+					enum cache_policy alternate_policy,
+					void __user *alternate_aperture_base,
+					uint64_t alternate_aperture_size)
+{
+	struct cik_static_private *sched = kfd_scheduler_to_private(scheduler);
+	struct cik_static_process *proc = kfd_process_to_private(process);
+
+	uint32_t default_mtype;
+	uint32_t ape1_mtype;
+
+	if (alternate_aperture_size == 0) {
+		/* base > limit disables APE1 */
+		proc->ape1_base = 1;
+		proc->ape1_limit = 0;
+	} else {
+		/*
+		 * In FSA64, APE1_Base[63:0] = { 16{SH_MEM_APE1_BASE[31]}, SH_MEM_APE1_BASE[31:0], 0x0000 }
+		 * APE1_Limit[63:0] = { 16{SH_MEM_APE1_LIMIT[31]}, SH_MEM_APE1_LIMIT[31:0], 0xFFFF }
+		 * Verify that the base and size parameters can be represented in this format
+		 * and convert them. Additionally restrict APE1 to user-mode addresses.
+		 */
+
+		uint64_t base = (uintptr_t)alternate_aperture_base;
+		uint64_t limit = base + alternate_aperture_size - 1;
+
+		if (limit <= base)
+			return false;
+
+		if ((base & APE1_FIXED_BITS_MASK) != 0)
+			return false;
+
+		if ((limit & APE1_FIXED_BITS_MASK) != APE1_LIMIT_ALIGNMENT)
+			return false;
+
+		proc->ape1_base = base >> 16;
+		proc->ape1_limit = limit >> 16;
+	}
+
+	default_mtype = (default_policy == cache_policy_coherent) ? MTYPE_NONCACHED : MTYPE_CACHED;
+	ape1_mtype = (alternate_policy == cache_policy_coherent) ? MTYPE_NONCACHED : MTYPE_CACHED;
+
+	proc->sh_mem_config = ALIGNMENT_MODE(SH_MEM_ALIGNMENT_MODE_UNALIGNED)
+			      | DEFAULT_MTYPE(default_mtype)
+			      | APE1_MTYPE(ape1_mtype);
+
+	program_sh_mem_settings(sched, proc);
+
+	return true;
+}
+
+
 const struct kfd_scheduler_class radeon_kfd_cik_static_scheduler_class = {
 	.name = "CIK static scheduler",
 	.create = cik_static_create,
@@ -908,4 +989,6 @@ const struct kfd_scheduler_class radeon_kfd_cik_static_scheduler_class = {
 
 	.interrupt_isr = cik_static_interrupt_isr,
 	.interrupt_wq = cik_static_interrupt_wq,
+
+	.set_cache_policy = cik_static_set_cache_policy,
 };
diff --git a/drivers/gpu/hsa/radeon/kfd_scheduler.h b/drivers/gpu/hsa/radeon/kfd_scheduler.h
index e5a93c4..9dc2994 100644
--- a/drivers/gpu/hsa/radeon/kfd_scheduler.h
+++ b/drivers/gpu/hsa/radeon/kfd_scheduler.h
@@ -31,6 +31,11 @@ struct kfd_scheduler;
 struct kfd_scheduler_process;
 struct kfd_scheduler_queue;
 
+enum cache_policy {
+	cache_policy_coherent,
+	cache_policy_noncoherent
+};
+
 struct kfd_scheduler_class {
 	const char *name;
 
@@ -58,6 +63,13 @@ struct kfd_scheduler_class {
 
 	bool (*interrupt_isr)(struct kfd_scheduler *, const void *ih_ring_entry);
 	void (*interrupt_wq)(struct kfd_scheduler *, const void *ih_ring_entry);
+
+	bool (*set_cache_policy)(struct kfd_scheduler *scheduler,
+				 struct kfd_scheduler_process *process,
+				 enum cache_policy default_policy,
+				 enum cache_policy alternate_policy,
+				 void __user *alternate_aperture_base,
+				 uint64_t alternate_aperture_size);
 };
 
 extern const struct kfd_scheduler_class radeon_kfd_cik_static_scheduler_class;
diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
index dcc5fe0..928e628 100644
--- a/include/uapi/linux/kfd_ioctl.h
+++ b/include/uapi/linux/kfd_ioctl.h
@@ -58,11 +58,24 @@ struct kfd_ioctl_destroy_queue_args {
 	uint32_t queue_id;		/* to KFD */
 };
 
+/* For kfd_ioctl_set_memory_policy_args.default_policy and alternate_policy */
+#define KFD_IOC_CACHE_POLICY_COHERENT 0
+#define KFD_IOC_CACHE_POLICY_NONCOHERENT 1
+
+struct kfd_ioctl_set_memory_policy_args {
+	uint32_t gpu_id;			/* to KFD */
+	uint32_t default_policy;		/* to KFD */
+	uint32_t alternate_policy;		/* to KFD */
+	uint64_t alternate_aperture_base;	/* to KFD */
+	uint64_t alternate_aperture_size;	/* to KFD */
+};
+
 #define KFD_IOC_MAGIC 'K'
 
 #define KFD_IOC_GET_VERSION	_IOR(KFD_IOC_MAGIC, 1, struct kfd_ioctl_get_version_args)
 #define KFD_IOC_CREATE_QUEUE	_IOWR(KFD_IOC_MAGIC, 2, struct kfd_ioctl_create_queue_args)
 #define KFD_IOC_DESTROY_QUEUE	_IOWR(KFD_IOC_MAGIC, 3, struct kfd_ioctl_destroy_queue_args)
+#define KFD_IOC_SET_MEMORY_POLICY	_IOW(KFD_IOC_MAGIC, 4, struct kfd_ioctl_set_memory_policy_args)
 
 #pragma pack(pop)
 
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 116+ messages in thread
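
The APE1 range check in cik_static_set_cache_policy() from patch 27 is
easier to follow as a pure function. The sketch below reuses the two
constants and the three checks from the patch, and returns the values that
would go into SH_MEM_APE1_BASE and SH_MEM_APE1_LIMIT. struct ape1_regs and
ape1_encode() are hypothetical names for illustration; the real code
stores the values in struct cik_static_process and programs the registers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Constants copied from the patch. */
#define APE1_FIXED_BITS_MASK 0xFFFF80000000FFFFULL
#define APE1_LIMIT_ALIGNMENT 0xFFFF

/* Hypothetical holder for the two 32-bit register values. */
struct ape1_regs {
	uint32_t base;
	uint32_t limit;
};

/* Returns false when base/size cannot be expressed in the register format. */
bool ape1_encode(uint64_t base, uint64_t size, struct ape1_regs *out)
{
	uint64_t limit;

	if (size == 0) {
		/* base > limit disables APE1 */
		out->base = 1;
		out->limit = 0;
		return true;
	}

	limit = base + size - 1;	/* the limit is inclusive */

	/* Rejects wrap-around and one-byte ranges. */
	if (limit <= base)
		return false;

	/* Base: low 16 bits zero, and bits 47 and up zero (user mode). */
	if ((base & APE1_FIXED_BITS_MASK) != 0)
		return false;

	/* Limit: low 16 bits all ones, same high-bit restriction. */
	if ((limit & APE1_FIXED_BITS_MASK) != APE1_LIMIT_ALIGNMENT)
		return false;

	out->base = (uint32_t)(base >> 16);
	out->limit = (uint32_t)(limit >> 16);
	return true;
}
```

In effect the aperture must start on a 64K boundary, span a whole number
of 64K blocks, and lie entirely in the lower (user-mode) half of the
address space.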

* [PATCH 27/83] hsa/radeon: Implement hsaKmtSetMemoryPolicy
@ 2014-07-10 21:50   ` Oded Gabbay
  0 siblings, 0 replies; 116+ messages in thread
From: Oded Gabbay @ 2014-07-10 21:50 UTC (permalink / raw)
  To: David Airlie, Alex Deucher, Jerome Glisse
  Cc: linux-kernel, dri-devel, John Bridgman, Andrew Lewycky,
	Joerg Roedel, Oded Gabbay, Ben Goz, Evgeny Pinchuk,
	Alexey Skidanov, linux-api

From: Andrew Lewycky <Andrew.Lewycky@amd.com>

This patch adds support in KFD for the hsaKmtSetMemoryPolicy
HSA thunk API call.

Signed-off-by: Andrew Lewycky <Andrew.Lewycky@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
---
 drivers/gpu/hsa/radeon/cik_regs.h             |  1 +
 drivers/gpu/hsa/radeon/kfd_chardev.c          | 59 +++++++++++++++++
 drivers/gpu/hsa/radeon/kfd_sched_cik_static.c | 91 +++++++++++++++++++++++++--
 drivers/gpu/hsa/radeon/kfd_scheduler.h        | 12 ++++
 include/uapi/linux/kfd_ioctl.h                | 13 ++++
 5 files changed, 172 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/hsa/radeon/cik_regs.h b/drivers/gpu/hsa/radeon/cik_regs.h
index 813cdc4..93f7b34 100644
--- a/drivers/gpu/hsa/radeon/cik_regs.h
+++ b/drivers/gpu/hsa/radeon/cik_regs.h
@@ -54,6 +54,7 @@
 #define	APE1_MTYPE(x)					((x) << 7)
 
 /* valid for both DEFAULT_MTYPE and APE1_MTYPE */
+#define	MTYPE_CACHED					0
 #define	MTYPE_NONCACHED					3
 
 
diff --git a/drivers/gpu/hsa/radeon/kfd_chardev.c b/drivers/gpu/hsa/radeon/kfd_chardev.c
index e0b276d..ddaf357 100644
--- a/drivers/gpu/hsa/radeon/kfd_chardev.c
+++ b/drivers/gpu/hsa/radeon/kfd_chardev.c
@@ -231,6 +231,61 @@ kfd_ioctl_destroy_queue(struct file *filp, struct kfd_process *p, void __user *a
 }
 
 static long
+kfd_ioctl_set_memory_policy(struct file *filep, struct kfd_process *p, void __user *arg)
+{
+	struct kfd_ioctl_set_memory_policy_args args;
+	struct kfd_dev *dev;
+	int err = 0;
+	struct kfd_process_device *pdd;
+	enum cache_policy default_policy, alternate_policy;
+
+	if (copy_from_user(&args, arg, sizeof(args)))
+		return -EFAULT;
+
+	if (args.default_policy != KFD_IOC_CACHE_POLICY_COHERENT
+	    && args.default_policy != KFD_IOC_CACHE_POLICY_NONCOHERENT) {
+		return -EINVAL;
+	}
+
+	if (args.alternate_policy != KFD_IOC_CACHE_POLICY_COHERENT
+	    && args.alternate_policy != KFD_IOC_CACHE_POLICY_NONCOHERENT) {
+		return -EINVAL;
+	}
+
+	dev = radeon_kfd_device_by_id(args.gpu_id);
+	if (dev == NULL)
+		return -EINVAL;
+
+	mutex_lock(&p->mutex);
+
+	pdd = radeon_kfd_bind_process_to_device(dev, p);
+	if (IS_ERR(pdd)) {
+		err = PTR_ERR(pdd);
+		goto out;
+	}
+
+	default_policy = (args.default_policy == KFD_IOC_CACHE_POLICY_COHERENT)
+			 ? cache_policy_coherent : cache_policy_noncoherent;
+
+	alternate_policy = (args.alternate_policy == KFD_IOC_CACHE_POLICY_COHERENT)
+			   ? cache_policy_coherent : cache_policy_noncoherent;
+
+	if (!dev->device_info->scheduler_class->set_cache_policy(dev->scheduler,
+								 pdd->scheduler_process,
+								 default_policy,
+								 alternate_policy,
+								 (void __user *)args.alternate_aperture_base,
+								 args.alternate_aperture_size))
+		err = -EINVAL;
+
+out:
+	mutex_unlock(&p->mutex);
+
+	return err;
+}
+
+
+static long
 kfd_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
 {
 	struct kfd_process *process;
@@ -253,6 +308,10 @@ kfd_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
 		err = kfd_ioctl_destroy_queue(filep, process, (void __user *)arg);
 		break;
 
+	case KFD_IOC_SET_MEMORY_POLICY:
+		err = kfd_ioctl_set_memory_policy(filep, process, (void __user *)arg);
+		break;
+
 	default:
 		dev_err(kfd_device,
 			"unknown ioctl cmd 0x%x, arg 0x%lx)\n",
diff --git a/drivers/gpu/hsa/radeon/kfd_sched_cik_static.c b/drivers/gpu/hsa/radeon/kfd_sched_cik_static.c
index 9add5e5..3c3e7d6 100644
--- a/drivers/gpu/hsa/radeon/kfd_sched_cik_static.c
+++ b/drivers/gpu/hsa/radeon/kfd_sched_cik_static.c
@@ -162,6 +162,10 @@ struct cik_static_private {
 struct cik_static_process {
 	unsigned int vmid;
 	pasid_t pasid;
+
+	uint32_t sh_mem_config;
+	uint32_t ape1_base;
+	uint32_t ape1_limit;
 };
 
 struct cik_static_queue {
@@ -346,6 +350,7 @@ static void init_ats(struct cik_static_private *priv)
 
 			sh_mem_config = ALIGNMENT_MODE(SH_MEM_ALIGNMENT_MODE_UNALIGNED);
 			sh_mem_config |= DEFAULT_MTYPE(MTYPE_NONCACHED);
+			sh_mem_config |= APE1_MTYPE(MTYPE_NONCACHED);
 
 			WRITE_REG(priv->dev, SH_MEM_CONFIG, sh_mem_config);
 
@@ -562,14 +567,26 @@ static void release_vmid(struct cik_static_private *priv, unsigned int vmid)
 	set_bit(vmid, &priv->free_vmid_mask);
 }
 
+static void program_sh_mem_settings(struct cik_static_private *sched,
+				    struct cik_static_process *proc)
+{
+	lock_srbm_index(sched);
+
+	vmid_select(sched, proc->vmid);
+
+	WRITE_REG(sched->dev, SH_MEM_CONFIG, proc->sh_mem_config);
+
+	WRITE_REG(sched->dev, SH_MEM_APE1_BASE, proc->ape1_base);
+	WRITE_REG(sched->dev, SH_MEM_APE1_LIMIT, proc->ape1_limit);
+
+	unlock_srbm_index(sched);
+}
+
 static void setup_vmid_for_process(struct cik_static_private *priv, struct cik_static_process *p)
 {
 	set_vmid_pasid_mapping(priv, p->vmid, p->pasid);
 
-	/*
-	 * SH_MEM_CONFIG and others need to be programmed differently
-	 * for 32/64-bit processes. And maybe other reasons.
-	 */
+	program_sh_mem_settings(priv, p);
 }
 
 static int
@@ -591,6 +608,12 @@ cik_static_register_process(struct kfd_scheduler *scheduler, struct kfd_process
 
 	hwp->pasid = process->pasid;
 
+	hwp->sh_mem_config = ALIGNMENT_MODE(SH_MEM_ALIGNMENT_MODE_UNALIGNED)
+			     | DEFAULT_MTYPE(MTYPE_NONCACHED)
+			     | APE1_MTYPE(MTYPE_NONCACHED);
+	hwp->ape1_base = 1;
+	hwp->ape1_limit = 0;
+
 	setup_vmid_for_process(priv, hwp);
 
 	*scheduler_process = (struct kfd_scheduler_process *)hwp;
@@ -894,6 +917,64 @@ cik_static_interrupt_wq(struct kfd_scheduler *scheduler, const void *ih_ring_ent
 {
 }
 
+/* Low bits must be 0000/FFFF as required by HW, high bits must be 0 to stay in user mode. */
+#define APE1_FIXED_BITS_MASK 0xFFFF80000000FFFFULL
+#define APE1_LIMIT_ALIGNMENT 0xFFFF /* APE1 limit is inclusive and 64K aligned. */
+
+static bool cik_static_set_cache_policy(struct kfd_scheduler *scheduler,
+					struct kfd_scheduler_process *process,
+					enum cache_policy default_policy,
+					enum cache_policy alternate_policy,
+					void __user *alternate_aperture_base,
+					uint64_t alternate_aperture_size)
+{
+	struct cik_static_private *sched = kfd_scheduler_to_private(scheduler);
+	struct cik_static_process *proc = kfd_process_to_private(process);
+
+	uint32_t default_mtype;
+	uint32_t ape1_mtype;
+
+	if (alternate_aperture_size == 0) {
+		/* base > limit disables APE1 */
+		proc->ape1_base = 1;
+		proc->ape1_limit = 0;
+	} else {
+		/*
+		 * In FSA64, APE1_Base[63:0] = { 16{SH_MEM_APE1_BASE[31]}, SH_MEM_APE1_BASE[31:0], 0x0000 }
+		 * APE1_Limit[63:0] = { 16{SH_MEM_APE1_LIMIT[31]}, SH_MEM_APE1_LIMIT[31:0], 0xFFFF }
+		 * Verify that the base and size parameters can be represented in this format
+		 * and convert them. Additionally restrict APE1 to user-mode addresses.
+		 */
+
+		uint64_t base = (uintptr_t)alternate_aperture_base;
+		uint64_t limit = base + alternate_aperture_size - 1;
+
+		if (limit <= base)
+			return false;
+
+		if ((base & APE1_FIXED_BITS_MASK) != 0)
+			return false;
+
+		if ((limit & APE1_FIXED_BITS_MASK) != APE1_LIMIT_ALIGNMENT)
+			return false;
+
+		proc->ape1_base = base >> 16;
+		proc->ape1_limit = limit >> 16;
+	}
+
+	default_mtype = (default_policy == cache_policy_coherent) ? MTYPE_NONCACHED : MTYPE_CACHED;
+	ape1_mtype = (alternate_policy == cache_policy_coherent) ? MTYPE_NONCACHED : MTYPE_CACHED;
+
+	proc->sh_mem_config = ALIGNMENT_MODE(SH_MEM_ALIGNMENT_MODE_UNALIGNED)
+			      | DEFAULT_MTYPE(default_mtype)
+			      | APE1_MTYPE(ape1_mtype);
+
+	program_sh_mem_settings(sched, proc);
+
+	return true;
+}
+
+
 const struct kfd_scheduler_class radeon_kfd_cik_static_scheduler_class = {
 	.name = "CIK static scheduler",
 	.create = cik_static_create,
@@ -908,4 +989,6 @@ const struct kfd_scheduler_class radeon_kfd_cik_static_scheduler_class = {
 
 	.interrupt_isr = cik_static_interrupt_isr,
 	.interrupt_wq = cik_static_interrupt_wq,
+
+	.set_cache_policy = cik_static_set_cache_policy,
 };
diff --git a/drivers/gpu/hsa/radeon/kfd_scheduler.h b/drivers/gpu/hsa/radeon/kfd_scheduler.h
index e5a93c4..9dc2994 100644
--- a/drivers/gpu/hsa/radeon/kfd_scheduler.h
+++ b/drivers/gpu/hsa/radeon/kfd_scheduler.h
@@ -31,6 +31,11 @@ struct kfd_scheduler;
 struct kfd_scheduler_process;
 struct kfd_scheduler_queue;
 
+enum cache_policy {
+	cache_policy_coherent,
+	cache_policy_noncoherent
+};
+
 struct kfd_scheduler_class {
 	const char *name;
 
@@ -58,6 +63,13 @@ struct kfd_scheduler_class {
 
 	bool (*interrupt_isr)(struct kfd_scheduler *, const void *ih_ring_entry);
 	void (*interrupt_wq)(struct kfd_scheduler *, const void *ih_ring_entry);
+
+	bool (*set_cache_policy)(struct kfd_scheduler *scheduler,
+				 struct kfd_scheduler_process *process,
+				 enum cache_policy default_policy,
+				 enum cache_policy alternate_policy,
+				 void __user *alternate_aperture_base,
+				 uint64_t alternate_aperture_size);
 };
 
 extern const struct kfd_scheduler_class radeon_kfd_cik_static_scheduler_class;
diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
index dcc5fe0..928e628 100644
--- a/include/uapi/linux/kfd_ioctl.h
+++ b/include/uapi/linux/kfd_ioctl.h
@@ -58,11 +58,24 @@ struct kfd_ioctl_destroy_queue_args {
 	uint32_t queue_id;		/* to KFD */
 };
 
+/* For kfd_ioctl_set_memory_policy_args.default_policy and alternate_policy */
+#define KFD_IOC_CACHE_POLICY_COHERENT 0
+#define KFD_IOC_CACHE_POLICY_NONCOHERENT 1
+
+struct kfd_ioctl_set_memory_policy_args {
+	uint32_t gpu_id;			/* to KFD */
+	uint32_t default_policy;		/* to KFD */
+	uint32_t alternate_policy;		/* to KFD */
+	uint64_t alternate_aperture_base;	/* to KFD */
+	uint64_t alternate_aperture_size;	/* to KFD */
+};
+
 #define KFD_IOC_MAGIC 'K'
 
 #define KFD_IOC_GET_VERSION	_IOR(KFD_IOC_MAGIC, 1, struct kfd_ioctl_get_version_args)
 #define KFD_IOC_CREATE_QUEUE	_IOWR(KFD_IOC_MAGIC, 2, struct kfd_ioctl_create_queue_args)
 #define KFD_IOC_DESTROY_QUEUE	_IOWR(KFD_IOC_MAGIC, 3, struct kfd_ioctl_destroy_queue_args)
+#define KFD_IOC_SET_MEMORY_POLICY	_IOW(KFD_IOC_MAGIC, 4, struct kfd_ioctl_set_memory_policy_args)
 
 #pragma pack(pop)
 
-- 
1.9.1
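
The APE1 range check in cik_static_set_cache_policy() above can be tried out in user space. What follows is only a minimal sketch, assuming the same mask and alignment constants as the patch; ape1_encode() is a hypothetical helper name, not a function from the driver, and it returns the two register values through out-parameters instead of storing them in a process structure.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same values as in the patch: bits 63:47 and 15:0 are fixed. */
#define APE1_FIXED_BITS_MASK 0xFFFF80000000FFFFULL
#define APE1_LIMIT_ALIGNMENT 0xFFFFULL

/*
 * Hypothetical helper (not part of the patch): convert a user-space
 * (base, size) pair into the 32-bit SH_MEM_APE1_BASE/LIMIT encodings.
 * Returns false if the range cannot be represented.
 */
static bool ape1_encode(uint64_t base, uint64_t size,
			uint32_t *reg_base, uint32_t *reg_limit)
{
	uint64_t limit;

	if (size == 0) {
		/* base > limit disables APE1 */
		*reg_base = 1;
		*reg_limit = 0;
		return true;
	}

	limit = base + size - 1;	/* the limit is inclusive */

	/* Catches wrap-around as well as a one-byte range. */
	if (limit <= base)
		return false;

	/* Base must be 64K aligned and a user-mode address. */
	if ((base & APE1_FIXED_BITS_MASK) != 0)
		return false;

	/* Limit must end on 0xFFFF and be a user-mode address. */
	if ((limit & APE1_FIXED_BITS_MASK) != APE1_LIMIT_ALIGNMENT)
		return false;

	*reg_base = (uint32_t)(base >> 16);
	*reg_limit = (uint32_t)(limit >> 16);
	return true;
}
```

With these checks, a base of 0x10000 and a size of 0x20000 is accepted and encodes as base 1, limit 2, while a size of 0x1000 at the same base is rejected because the inclusive limit does not end on a 64K boundary.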

^ permalink raw reply related	[flat|nested] 116+ messages in thread

* Re: [PATCH 04/83] drm/radeon: Add radeon <--> kfd interface
  2014-07-10 21:50 ` [PATCH 04/83] drm/radeon: Add radeon <--> kfd interface Oded Gabbay
@ 2014-07-10 22:38     ` Joe Perches
  0 siblings, 0 replies; 116+ messages in thread
From: Joe Perches @ 2014-07-10 22:38 UTC (permalink / raw)
  To: Oded Gabbay
  Cc: David Airlie, Alex Deucher, Jerome Glisse, linux-kernel,
	dri-devel, John Bridgman, Andrew Lewycky, Joerg Roedel,
	Oded Gabbay, Christian König

On Fri, 2014-07-11 at 00:50 +0300, Oded Gabbay wrote:
> This patch adds the interface between the radeon driver and the kfd
> driver. The interface implementation is contained in
> radeon_kfd.c and radeon_kfd.h.
[]
>  include/linux/radeon_kfd.h          | 67 ++++++++++++++++++++++++++

Is there a good reason to put this file in include/linux?



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 02/83] drm/radeon: reduce number of free VMIDs and pipes in KV
  2014-07-10 21:50 [PATCH 02/83] drm/radeon: reduce number of free VMIDs and pipes in KV Oded Gabbay
@ 2014-07-11 16:05   ` Jerome Glisse
  2014-07-10 21:50 ` [PATCH 04/83] drm/radeon: Add radeon <--> kfd interface Oded Gabbay
                     ` (25 subsequent siblings)
  26 siblings, 0 replies; 116+ messages in thread
From: Jerome Glisse @ 2014-07-11 16:05 UTC (permalink / raw)
  To: Oded Gabbay
  Cc: David Airlie, Alex Deucher, linux-kernel, dri-devel,
	John Bridgman, Andrew Lewycky, Joerg Roedel, Oded Gabbay,
	Christian König

On Fri, Jul 11, 2014 at 12:50:02AM +0300, Oded Gabbay wrote:
> To support HSA on KV, we need to limit the number of vmids and pipes
> that are available for radeon's use with KV.
> 
> This patch reserves VMIDs 8-15 for KFD (so radeon can only use VMIDs
> 0-7) and also makes radeon think that KV has only a single MEC with a single
> pipe in it.
> 
> Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>

Reviewed-by: Jérôme Glisse <jglisse@redhat.com>

> ---
>  drivers/gpu/drm/radeon/cik.c | 48 ++++++++++++++++++++++----------------------
>  1 file changed, 24 insertions(+), 24 deletions(-)
> 
> diff --git a/drivers/gpu/drm/radeon/cik.c b/drivers/gpu/drm/radeon/cik.c
> index 4bfc2c0..e0c8052 100644
> --- a/drivers/gpu/drm/radeon/cik.c
> +++ b/drivers/gpu/drm/radeon/cik.c
> @@ -4662,12 +4662,11 @@ static int cik_mec_init(struct radeon_device *rdev)
>  	/*
>  	 * KV:    2 MEC, 4 Pipes/MEC, 8 Queues/Pipe - 64 Queues total
>  	 * CI/KB: 1 MEC, 4 Pipes/MEC, 8 Queues/Pipe - 32 Queues total
> +	 * Nonetheless, we assign only 1 pipe because all other pipes will
> +	 * be handled by KFD
>  	 */
> -	if (rdev->family == CHIP_KAVERI)
> -		rdev->mec.num_mec = 2;
> -	else
> -		rdev->mec.num_mec = 1;
> -	rdev->mec.num_pipe = 4;
> +	rdev->mec.num_mec = 1;
> +	rdev->mec.num_pipe = 1;
>  	rdev->mec.num_queue = rdev->mec.num_mec * rdev->mec.num_pipe * 8;
>  
>  	if (rdev->mec.hpd_eop_obj == NULL) {
> @@ -4809,28 +4808,24 @@ static int cik_cp_compute_resume(struct radeon_device *rdev)
>  
>  	/* init the pipes */
>  	mutex_lock(&rdev->srbm_mutex);
> -	for (i = 0; i < (rdev->mec.num_pipe * rdev->mec.num_mec); i++) {
> -		int me = (i < 4) ? 1 : 2;
> -		int pipe = (i < 4) ? i : (i - 4);
>  
> -		eop_gpu_addr = rdev->mec.hpd_eop_gpu_addr + (i * MEC_HPD_SIZE * 2);
> +	eop_gpu_addr = rdev->mec.hpd_eop_gpu_addr;
>  
> -		cik_srbm_select(rdev, me, pipe, 0, 0);
> +	cik_srbm_select(rdev, 0, 0, 0, 0);
>  
> -		/* write the EOP addr */
> -		WREG32(CP_HPD_EOP_BASE_ADDR, eop_gpu_addr >> 8);
> -		WREG32(CP_HPD_EOP_BASE_ADDR_HI, upper_32_bits(eop_gpu_addr) >> 8);
> +	/* write the EOP addr */
> +	WREG32(CP_HPD_EOP_BASE_ADDR, eop_gpu_addr >> 8);
> +	WREG32(CP_HPD_EOP_BASE_ADDR_HI, upper_32_bits(eop_gpu_addr) >> 8);
>  
> -		/* set the VMID assigned */
> -		WREG32(CP_HPD_EOP_VMID, 0);
> +	/* set the VMID assigned */
> +	WREG32(CP_HPD_EOP_VMID, 0);
> +
> +	/* set the EOP size, register value is 2^(EOP_SIZE+1) dwords */
> +	tmp = RREG32(CP_HPD_EOP_CONTROL);
> +	tmp &= ~EOP_SIZE_MASK;
> +	tmp |= order_base_2(MEC_HPD_SIZE / 8);
> +	WREG32(CP_HPD_EOP_CONTROL, tmp);
>  
> -		/* set the EOP size, register value is 2^(EOP_SIZE+1) dwords */
> -		tmp = RREG32(CP_HPD_EOP_CONTROL);
> -		tmp &= ~EOP_SIZE_MASK;
> -		tmp |= order_base_2(MEC_HPD_SIZE / 8);
> -		WREG32(CP_HPD_EOP_CONTROL, tmp);
> -	}
> -	cik_srbm_select(rdev, 0, 0, 0, 0);
>  	mutex_unlock(&rdev->srbm_mutex);
>  
>  	/* init the queues.  Just two for now. */
> @@ -5876,8 +5871,13 @@ int cik_ib_parse(struct radeon_device *rdev, struct radeon_ib *ib)
>   */
>  int cik_vm_init(struct radeon_device *rdev)
>  {
> -	/* number of VMs */
> -	rdev->vm_manager.nvm = 16;
> +	/*
> +	 * number of VMs
> +	 * VMID 0 is reserved for Graphics
> +	 * radeon compute will use VMIDs 1-7
> +	 * KFD will use VMIDs 8-15
> +	 */
> +	rdev->vm_manager.nvm = 8;
>  	/* base offset of vram pages */
>  	if (rdev->flags & RADEON_IS_IGP) {
>  		u64 tmp = RREG32(MC_VM_FB_OFFSET);
> -- 
> 1.9.1
> 

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 03/83] drm/radeon: Report doorbell configuration to kfd
  2014-07-10 21:50 ` [PATCH 03/83] drm/radeon: Report doorbell configuration to kfd Oded Gabbay
@ 2014-07-11 16:16     ` Jerome Glisse
  0 siblings, 0 replies; 116+ messages in thread
From: Jerome Glisse @ 2014-07-11 16:16 UTC (permalink / raw)
  To: Oded Gabbay
  Cc: David Airlie, Alex Deucher, linux-kernel, dri-devel,
	John Bridgman, Andrew Lewycky, Joerg Roedel, Oded Gabbay,
	Christian König

On Fri, Jul 11, 2014 at 12:50:03AM +0300, Oded Gabbay wrote:
> Radeon and KFD share the doorbell aperture.
> Radeon sets it up, takes the doorbells required for its own rings
> and reports the setup to KFD.
> Radeon reserved doorbells are at the start of the doorbell aperture.
> 
> Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>

I would need a refresher on doorbells. You want to map it to userspace,
but at the same time it is used by the radeon kernel driver when dispatching
on the compute ring (iirc the gfx ring does not use it).

So now my worry is: given that user mapping is done at page granularity, what
would block one process from writing to another process's doorbell? Again,
iirc the doorbell is actually the wptr for the ring buffer associated with
said doorbell (though I forget how doorbells are associated with a ring).

This sounds really bad.

Cheers,
Jérôme

> ---
>  drivers/gpu/drm/radeon/radeon.h        |  4 ++++
>  drivers/gpu/drm/radeon/radeon_device.c | 31 +++++++++++++++++++++++++++++++
>  2 files changed, 35 insertions(+)
> 
> diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
> index 7cda75d..4e7e41f 100644
> --- a/drivers/gpu/drm/radeon/radeon.h
> +++ b/drivers/gpu/drm/radeon/radeon.h
> @@ -676,6 +676,10 @@ struct radeon_doorbell {
>  
>  int radeon_doorbell_get(struct radeon_device *rdev, u32 *page);
>  void radeon_doorbell_free(struct radeon_device *rdev, u32 doorbell);
> +void radeon_doorbell_get_kfd_info(struct radeon_device *rdev,
> +				  phys_addr_t *aperture_base,
> +				  size_t *aperture_size,
> +				  size_t *start_offset);
>  
>  /*
>   * IRQS.
> diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
> index 03686fa..98538d2 100644
> --- a/drivers/gpu/drm/radeon/radeon_device.c
> +++ b/drivers/gpu/drm/radeon/radeon_device.c
> @@ -328,6 +328,37 @@ void radeon_doorbell_free(struct radeon_device *rdev, u32 doorbell)
>  		__clear_bit(doorbell, rdev->doorbell.used);
>  }
>  
> +/**
> + * radeon_doorbell_get_kfd_info - Report doorbell configuration required to
> + *                                setup KFD
> + *
> + * @rdev: radeon_device pointer
> + * @aperture_base: output returning doorbell aperture base physical address
> + * @aperture_size: output returning doorbell aperture size in bytes
> + * @start_offset: output returning # of doorbell bytes reserved for radeon.
> + *
> + * Radeon and the KFD share the doorbell aperture. Radeon sets it up,
> + * takes doorbells required for its own rings and reports the setup to KFD.
> + * Radeon reserved doorbells are at the start of the doorbell aperture.
> + */
> +void radeon_doorbell_get_kfd_info(struct radeon_device *rdev,
> +				  phys_addr_t *aperture_base,
> +				  size_t *aperture_size,
> +				  size_t *start_offset)
> +{
> +	/* The first num_doorbells are used by radeon.
> +	 * KFD takes whatever's left in the aperture. */
> +	if (rdev->doorbell.size > rdev->doorbell.num_doorbells * sizeof(u32)) {
> +		*aperture_base = rdev->doorbell.base;
> +		*aperture_size = rdev->doorbell.size;
> +		*start_offset = rdev->doorbell.num_doorbells * sizeof(u32);
> +	} else {
> +		*aperture_base = 0;
> +		*aperture_size = 0;
> +		*start_offset = 0;
> +	}
> +}
> +
>  /*
>   * radeon_wb_*()
>   * Writeback is the the method by which the the GPU updates special pages
> -- 
> 1.9.1
> 

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 02/83] drm/radeon: reduce number of free VMIDs and pipes in KV
  2014-07-11 16:05   ` Jerome Glisse
@ 2014-07-11 16:18     ` Christian König
  -1 siblings, 0 replies; 116+ messages in thread
From: Christian König @ 2014-07-11 16:18 UTC (permalink / raw)
  To: Jerome Glisse, Oded Gabbay
  Cc: David Airlie, Alex Deucher, linux-kernel, dri-devel,
	John Bridgman, Andrew Lewycky, Joerg Roedel, Oded Gabbay

On 11.07.2014 18:05, Jerome Glisse wrote:
> On Fri, Jul 11, 2014 at 12:50:02AM +0300, Oded Gabbay wrote:
>> To support HSA on KV, we need to limit the number of vmids and pipes
>> that are available for radeon's use with KV.
>>
>> This patch reserves VMIDs 8-15 for KFD (so radeon can only use VMIDs
>> 0-7) and also makes radeon think that KV has only a single MEC with a single
>> pipe in it.
>>
>> Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
> Reviewed-by: Jérôme Glisse <jglisse@redhat.com>

At least for the VMIDs, on-demand allocation should be trivial to
implement, so I would prefer that to a fixed assignment.

Christian.

>
>> ---
>>   drivers/gpu/drm/radeon/cik.c | 48 ++++++++++++++++++++++----------------------
>>   1 file changed, 24 insertions(+), 24 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/radeon/cik.c b/drivers/gpu/drm/radeon/cik.c
>> index 4bfc2c0..e0c8052 100644
>> --- a/drivers/gpu/drm/radeon/cik.c
>> +++ b/drivers/gpu/drm/radeon/cik.c
>> @@ -4662,12 +4662,11 @@ static int cik_mec_init(struct radeon_device *rdev)
>>   	/*
>>   	 * KV:    2 MEC, 4 Pipes/MEC, 8 Queues/Pipe - 64 Queues total
>>   	 * CI/KB: 1 MEC, 4 Pipes/MEC, 8 Queues/Pipe - 32 Queues total
>> +	 * Nonetheless, we assign only 1 pipe because all other pipes will
>> +	 * be handled by KFD
>>   	 */
>> -	if (rdev->family == CHIP_KAVERI)
>> -		rdev->mec.num_mec = 2;
>> -	else
>> -		rdev->mec.num_mec = 1;
>> -	rdev->mec.num_pipe = 4;
>> +	rdev->mec.num_mec = 1;
>> +	rdev->mec.num_pipe = 1;
>>   	rdev->mec.num_queue = rdev->mec.num_mec * rdev->mec.num_pipe * 8;
>>   
>>   	if (rdev->mec.hpd_eop_obj == NULL) {
>> @@ -4809,28 +4808,24 @@ static int cik_cp_compute_resume(struct radeon_device *rdev)
>>   
>>   	/* init the pipes */
>>   	mutex_lock(&rdev->srbm_mutex);
>> -	for (i = 0; i < (rdev->mec.num_pipe * rdev->mec.num_mec); i++) {
>> -		int me = (i < 4) ? 1 : 2;
>> -		int pipe = (i < 4) ? i : (i - 4);
>>   
>> -		eop_gpu_addr = rdev->mec.hpd_eop_gpu_addr + (i * MEC_HPD_SIZE * 2);
>> +	eop_gpu_addr = rdev->mec.hpd_eop_gpu_addr;
>>   
>> -		cik_srbm_select(rdev, me, pipe, 0, 0);
>> +	cik_srbm_select(rdev, 0, 0, 0, 0);
>>   
>> -		/* write the EOP addr */
>> -		WREG32(CP_HPD_EOP_BASE_ADDR, eop_gpu_addr >> 8);
>> -		WREG32(CP_HPD_EOP_BASE_ADDR_HI, upper_32_bits(eop_gpu_addr) >> 8);
>> +	/* write the EOP addr */
>> +	WREG32(CP_HPD_EOP_BASE_ADDR, eop_gpu_addr >> 8);
>> +	WREG32(CP_HPD_EOP_BASE_ADDR_HI, upper_32_bits(eop_gpu_addr) >> 8);
>>   
>> -		/* set the VMID assigned */
>> -		WREG32(CP_HPD_EOP_VMID, 0);
>> +	/* set the VMID assigned */
>> +	WREG32(CP_HPD_EOP_VMID, 0);
>> +
>> +	/* set the EOP size, register value is 2^(EOP_SIZE+1) dwords */
>> +	tmp = RREG32(CP_HPD_EOP_CONTROL);
>> +	tmp &= ~EOP_SIZE_MASK;
>> +	tmp |= order_base_2(MEC_HPD_SIZE / 8);
>> +	WREG32(CP_HPD_EOP_CONTROL, tmp);
>>   
>> -		/* set the EOP size, register value is 2^(EOP_SIZE+1) dwords */
>> -		tmp = RREG32(CP_HPD_EOP_CONTROL);
>> -		tmp &= ~EOP_SIZE_MASK;
>> -		tmp |= order_base_2(MEC_HPD_SIZE / 8);
>> -		WREG32(CP_HPD_EOP_CONTROL, tmp);
>> -	}
>> -	cik_srbm_select(rdev, 0, 0, 0, 0);
>>   	mutex_unlock(&rdev->srbm_mutex);
>>   
>>   	/* init the queues.  Just two for now. */
>> @@ -5876,8 +5871,13 @@ int cik_ib_parse(struct radeon_device *rdev, struct radeon_ib *ib)
>>    */
>>   int cik_vm_init(struct radeon_device *rdev)
>>   {
>> -	/* number of VMs */
>> -	rdev->vm_manager.nvm = 16;
>> +	/*
>> +	 * number of VMs
>> +	 * VMID 0 is reserved for Graphics
>> +	 * radeon compute will use VMIDs 1-7
>> +	 * KFD will use VMIDs 8-15
>> +	 */
>> +	rdev->vm_manager.nvm = 8;
>>   	/* base offset of vram pages */
>>   	if (rdev->flags & RADEON_IS_IGP) {
>>   		u64 tmp = RREG32(MC_VM_FB_OFFSET);
>> -- 
>> 1.9.1
>>


^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 02/83] drm/radeon: reduce number of free VMIDs and pipes in KV
  2014-07-11 16:18     ` Christian König
@ 2014-07-11 16:22       ` Alex Deucher
  -1 siblings, 0 replies; 116+ messages in thread
From: Alex Deucher @ 2014-07-11 16:22 UTC (permalink / raw)
  To: Christian König
  Cc: Jerome Glisse, Oded Gabbay, Andrew Lewycky, LKML,
	Maling list - DRI developers, Alex Deucher

On Fri, Jul 11, 2014 at 12:18 PM, Christian König
<christian.koenig@amd.com> wrote:
> On 11.07.2014 18:05, Jerome Glisse wrote:
>
>> On Fri, Jul 11, 2014 at 12:50:02AM +0300, Oded Gabbay wrote:
>>>
>>> To support HSA on KV, we need to limit the number of vmids and pipes
>>> that are available for radeon's use with KV.
>>>
>>> This patch reserves VMIDs 8-15 for KFD (so radeon can only use VMIDs
>>> 0-7) and also makes radeon think that KV has only a single MEC with a
>>> single pipe in it.
>>>
>>> Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
>>
>> Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
>
>
> At least for the VMIDs, on-demand allocation should be trivial to implement,
> so I would prefer that to a fixed assignment.

IIRC, the way the CP hw scheduler works you have to give it a range of
vmids and it assigns them dynamically as queues are mapped so
effectively they are potentially in use once the CP scheduler is set
up.

Alex
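
The fixed split discussed here (radeon keeps VMIDs 0-7, KFD gets 8-15) can be pictured as one contiguous range described by a bitmask. The block below is a minimal sketch under that assumption only. kfd_vmid_mask() and acquire_vmid() are hypothetical names chosen for illustration, the release side mirrors the set_bit() on free_vmid_mask seen elsewhere in this series, and none of it models the CP hardware scheduler itself.

```c
#include <stdint.h>

/* Range taken from the patch description: VMIDs 8-15 belong to KFD. */
#define KFD_FIRST_VMID 8
#define KFD_LAST_VMID 15

/* Build the mask of VMIDs handed to KFD as one contiguous range. */
static uint32_t kfd_vmid_mask(void)
{
	uint32_t mask = 0;
	int vmid;

	for (vmid = KFD_FIRST_VMID; vmid <= KFD_LAST_VMID; vmid++)
		mask |= 1u << vmid;
	return mask;
}

/* Hypothetical allocator: take the lowest free VMID, or -1 if none is left. */
static int acquire_vmid(uint32_t *free_mask)
{
	int vmid;

	for (vmid = KFD_FIRST_VMID; vmid <= KFD_LAST_VMID; vmid++) {
		if (*free_mask & (1u << vmid)) {
			*free_mask &= ~(1u << vmid);
			return vmid;
		}
	}
	return -1;
}

/* Return a VMID to the pool by setting its bit again. */
static void release_vmid(uint32_t *free_mask, int vmid)
{
	*free_mask |= 1u << vmid;
}
```

A software allocator like this only works if the range itself never changes hands. If the hardware scheduler may pick any VMID in the range once it is set up, as described above, radeon cannot borrow one of them on demand.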


>
> Christian.
>
>
>>
>>> ---
>>>  drivers/gpu/drm/radeon/cik.c | 48 ++++++++++++++++++++++----------------------
>>>  1 file changed, 24 insertions(+), 24 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/radeon/cik.c b/drivers/gpu/drm/radeon/cik.c
>>> index 4bfc2c0..e0c8052 100644
>>> --- a/drivers/gpu/drm/radeon/cik.c
>>> +++ b/drivers/gpu/drm/radeon/cik.c
>>> @@ -4662,12 +4662,11 @@ static int cik_mec_init(struct radeon_device *rdev)
>>>  	/*
>>>  	 * KV:    2 MEC, 4 Pipes/MEC, 8 Queues/Pipe - 64 Queues total
>>>  	 * CI/KB: 1 MEC, 4 Pipes/MEC, 8 Queues/Pipe - 32 Queues total
>>> +	 * Nonetheless, we assign only 1 pipe because all other pipes will
>>> +	 * be handled by KFD
>>>  	 */
>>> -	if (rdev->family == CHIP_KAVERI)
>>> -		rdev->mec.num_mec = 2;
>>> -	else
>>> -		rdev->mec.num_mec = 1;
>>> -	rdev->mec.num_pipe = 4;
>>> +	rdev->mec.num_mec = 1;
>>> +	rdev->mec.num_pipe = 1;
>>>  	rdev->mec.num_queue = rdev->mec.num_mec * rdev->mec.num_pipe * 8;
>>>  
>>>  	if (rdev->mec.hpd_eop_obj == NULL) {
>>> @@ -4809,28 +4808,24 @@ static int cik_cp_compute_resume(struct radeon_device *rdev)
>>>  
>>>  	/* init the pipes */
>>>  	mutex_lock(&rdev->srbm_mutex);
>>> -	for (i = 0; i < (rdev->mec.num_pipe * rdev->mec.num_mec); i++) {
>>> -		int me = (i < 4) ? 1 : 2;
>>> -		int pipe = (i < 4) ? i : (i - 4);
>>>  
>>> -		eop_gpu_addr = rdev->mec.hpd_eop_gpu_addr + (i * MEC_HPD_SIZE * 2);
>>> +	eop_gpu_addr = rdev->mec.hpd_eop_gpu_addr;
>>>  
>>> -		cik_srbm_select(rdev, me, pipe, 0, 0);
>>> +	cik_srbm_select(rdev, 0, 0, 0, 0);
>>>  
>>> -		/* write the EOP addr */
>>> -		WREG32(CP_HPD_EOP_BASE_ADDR, eop_gpu_addr >> 8);
>>> -		WREG32(CP_HPD_EOP_BASE_ADDR_HI, upper_32_bits(eop_gpu_addr) >> 8);
>>> +	/* write the EOP addr */
>>> +	WREG32(CP_HPD_EOP_BASE_ADDR, eop_gpu_addr >> 8);
>>> +	WREG32(CP_HPD_EOP_BASE_ADDR_HI, upper_32_bits(eop_gpu_addr) >> 8);
>>>  
>>> -		/* set the VMID assigned */
>>> -		WREG32(CP_HPD_EOP_VMID, 0);
>>> +	/* set the VMID assigned */
>>> +	WREG32(CP_HPD_EOP_VMID, 0);
>>> +
>>> +	/* set the EOP size, register value is 2^(EOP_SIZE+1) dwords */
>>> +	tmp = RREG32(CP_HPD_EOP_CONTROL);
>>> +	tmp &= ~EOP_SIZE_MASK;
>>> +	tmp |= order_base_2(MEC_HPD_SIZE / 8);
>>> +	WREG32(CP_HPD_EOP_CONTROL, tmp);
>>>   -             /* set the EOP size, register value is 2^(EOP_SIZE+1)
>>> dwords */
>>> -               tmp = RREG32(CP_HPD_EOP_CONTROL);
>>> -               tmp &= ~EOP_SIZE_MASK;
>>> -               tmp |= order_base_2(MEC_HPD_SIZE / 8);
>>> -               WREG32(CP_HPD_EOP_CONTROL, tmp);
>>> -       }
>>> -       cik_srbm_select(rdev, 0, 0, 0, 0);
>>>         mutex_unlock(&rdev->srbm_mutex);
>>>         /* init the queues.  Just two for now. */
>>> @@ -5876,8 +5871,13 @@ int cik_ib_parse(struct radeon_device *rdev,
>>> struct radeon_ib *ib)
>>>    */
>>>   int cik_vm_init(struct radeon_device *rdev)
>>>   {
>>> -       /* number of VMs */
>>> -       rdev->vm_manager.nvm = 16;
>>> +       /*
>>> +        * number of VMs
>>> +        * VMID 0 is reserved for Graphics
>>> +        * radeon compute will use VMIDs 1-7
>>> +        * KFD will use VMIDs 8-15
>>> +        */
>>> +       rdev->vm_manager.nvm = 8;
>>>         /* base offset of vram pages */
>>>         if (rdev->flags & RADEON_IS_IGP) {
>>>                 u64 tmp = RREG32(MC_VM_FB_OFFSET);
>>> --
>>> 1.9.1
>>>
>
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 02/83] drm/radeon: reduce number of free VMIDs and pipes in KV
@ 2014-07-11 16:22       ` Alex Deucher
  0 siblings, 0 replies; 116+ messages in thread
From: Alex Deucher @ 2014-07-11 16:22 UTC (permalink / raw)
  To: Christian König
  Cc: Oded Gabbay, Andrew Lewycky, LKML, Maling list - DRI developers,
	Alex Deucher

On Fri, Jul 11, 2014 at 12:18 PM, Christian König
<christian.koenig@amd.com> wrote:
> Am 11.07.2014 18:05, schrieb Jerome Glisse:
>
>> On Fri, Jul 11, 2014 at 12:50:02AM +0300, Oded Gabbay wrote:
>>>
>>> To support HSA on KV, we need to limit the number of vmids and pipes
>>> that are available for radeon's use with KV.
>>>
>>> This patch reserves VMIDs 8-15 for KFD (so radeon can only use VMIDs
>>> 0-7) and also makes radeon thinks that KV has only a single MEC with a
>>> single
>>> pipe in it
>>>
>>> Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
>>
>> Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
>
>
> At least for the VMIDs, on-demand allocation should be trivial to implement,
> so I would prefer that over a fixed assignment.

IIRC, the way the CP hw scheduler works, you have to give it a range of
vmids, and it assigns them dynamically as queues are mapped. Effectively,
any vmid in that range is potentially in use once the CP scheduler is set
up.

Alex


>
> Christian.
>
>
>>
>>> ---
>>>   drivers/gpu/drm/radeon/cik.c | 48
>>> ++++++++++++++++++++++----------------------
>>>   1 file changed, 24 insertions(+), 24 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/radeon/cik.c b/drivers/gpu/drm/radeon/cik.c
>>> index 4bfc2c0..e0c8052 100644
>>> --- a/drivers/gpu/drm/radeon/cik.c
>>> +++ b/drivers/gpu/drm/radeon/cik.c
>>> @@ -4662,12 +4662,11 @@ static int cik_mec_init(struct radeon_device
>>> *rdev)
>>>         /*
>>>          * KV:    2 MEC, 4 Pipes/MEC, 8 Queues/Pipe - 64 Queues total
>>>          * CI/KB: 1 MEC, 4 Pipes/MEC, 8 Queues/Pipe - 32 Queues total
>>> +        * Nonetheless, we assign only 1 pipe because all other pipes
>>> will
>>> +        * be handled by KFD
>>>          */
>>> -       if (rdev->family == CHIP_KAVERI)
>>> -               rdev->mec.num_mec = 2;
>>> -       else
>>> -               rdev->mec.num_mec = 1;
>>> -       rdev->mec.num_pipe = 4;
>>> +       rdev->mec.num_mec = 1;
>>> +       rdev->mec.num_pipe = 1;
>>>         rdev->mec.num_queue = rdev->mec.num_mec * rdev->mec.num_pipe * 8;
>>>         if (rdev->mec.hpd_eop_obj == NULL) {
>>> @@ -4809,28 +4808,24 @@ static int cik_cp_compute_resume(struct
>>> radeon_device *rdev)
>>>         /* init the pipes */
>>>         mutex_lock(&rdev->srbm_mutex);
>>> -       for (i = 0; i < (rdev->mec.num_pipe * rdev->mec.num_mec); i++) {
>>> -               int me = (i < 4) ? 1 : 2;
>>> -               int pipe = (i < 4) ? i : (i - 4);
>>>   -             eop_gpu_addr = rdev->mec.hpd_eop_gpu_addr + (i *
>>> MEC_HPD_SIZE * 2);
>>> +       eop_gpu_addr = rdev->mec.hpd_eop_gpu_addr;
>>>   -             cik_srbm_select(rdev, me, pipe, 0, 0);
>>> +       cik_srbm_select(rdev, 0, 0, 0, 0);
>>>   -             /* write the EOP addr */
>>> -               WREG32(CP_HPD_EOP_BASE_ADDR, eop_gpu_addr >> 8);
>>> -               WREG32(CP_HPD_EOP_BASE_ADDR_HI,
>>> upper_32_bits(eop_gpu_addr) >> 8);
>>> +       /* write the EOP addr */
>>> +       WREG32(CP_HPD_EOP_BASE_ADDR, eop_gpu_addr >> 8);
>>> +       WREG32(CP_HPD_EOP_BASE_ADDR_HI, upper_32_bits(eop_gpu_addr) >>
>>> 8);
>>>   -             /* set the VMID assigned */
>>> -               WREG32(CP_HPD_EOP_VMID, 0);
>>> +       /* set the VMID assigned */
>>> +       WREG32(CP_HPD_EOP_VMID, 0);
>>> +
>>> +       /* set the EOP size, register value is 2^(EOP_SIZE+1) dwords */
>>> +       tmp = RREG32(CP_HPD_EOP_CONTROL);
>>> +       tmp &= ~EOP_SIZE_MASK;
>>> +       tmp |= order_base_2(MEC_HPD_SIZE / 8);
>>> +       WREG32(CP_HPD_EOP_CONTROL, tmp);
>>>   -             /* set the EOP size, register value is 2^(EOP_SIZE+1)
>>> dwords */
>>> -               tmp = RREG32(CP_HPD_EOP_CONTROL);
>>> -               tmp &= ~EOP_SIZE_MASK;
>>> -               tmp |= order_base_2(MEC_HPD_SIZE / 8);
>>> -               WREG32(CP_HPD_EOP_CONTROL, tmp);
>>> -       }
>>> -       cik_srbm_select(rdev, 0, 0, 0, 0);
>>>         mutex_unlock(&rdev->srbm_mutex);
>>>         /* init the queues.  Just two for now. */
>>> @@ -5876,8 +5871,13 @@ int cik_ib_parse(struct radeon_device *rdev,
>>> struct radeon_ib *ib)
>>>    */
>>>   int cik_vm_init(struct radeon_device *rdev)
>>>   {
>>> -       /* number of VMs */
>>> -       rdev->vm_manager.nvm = 16;
>>> +       /*
>>> +        * number of VMs
>>> +        * VMID 0 is reserved for Graphics
>>> +        * radeon compute will use VMIDs 1-7
>>> +        * KFD will use VMIDs 8-15
>>> +        */
>>> +       rdev->vm_manager.nvm = 8;
>>>         /* base offset of vram pages */
>>>         if (rdev->flags & RADEON_IS_IGP) {
>>>                 u64 tmp = RREG32(MC_VM_FB_OFFSET);
>>> --
>>> 1.9.1
>>>
>
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 04/83] drm/radeon: Add radeon <--> kfd interface
  2014-07-10 22:38     ` Joe Perches
@ 2014-07-11 16:24       ` Jerome Glisse
  -1 siblings, 0 replies; 116+ messages in thread
From: Jerome Glisse @ 2014-07-11 16:24 UTC (permalink / raw)
  To: Joe Perches
  Cc: Oded Gabbay, David Airlie, Alex Deucher, linux-kernel, dri-devel,
	John Bridgman, Andrew Lewycky, Joerg Roedel, Oded Gabbay,
	Christian König

On Thu, Jul 10, 2014 at 03:38:33PM -0700, Joe Perches wrote:
> On Fri, 2014-07-11 at 00:50 +0300, Oded Gabbay wrote:
> > This patch adds the interface between the radeon driver and the kfd
> > driver. The interface implementation is contained in
> > radeon_kfd.c and radeon_kfd.h.
> []
> >  include/linux/radeon_kfd.h          | 67 ++++++++++++++++++++++++++
> 
> Is there a good reason to put this file in include/linux?
> 

Agreed, we do not want to clutter include/linux/ with driver-specific
includes. I think that is one of the rules, even though there are some hw
headers already in there.

I would rather see this either in a new dir include/hsa or inside include/drm.

Cheers,
Jérôme

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 04/83] drm/radeon: Add radeon <--> kfd interface
@ 2014-07-11 16:24       ` Jerome Glisse
  0 siblings, 0 replies; 116+ messages in thread
From: Jerome Glisse @ 2014-07-11 16:24 UTC (permalink / raw)
  To: Joe Perches
  Cc: Oded Gabbay, Andrew Lewycky, linux-kernel, dri-devel,
	Alex Deucher, Christian König

On Thu, Jul 10, 2014 at 03:38:33PM -0700, Joe Perches wrote:
> On Fri, 2014-07-11 at 00:50 +0300, Oded Gabbay wrote:
> > This patch adds the interface between the radeon driver and the kfd
> > driver. The interface implementation is contained in
> > radeon_kfd.c and radeon_kfd.h.
> []
> >  include/linux/radeon_kfd.h          | 67 ++++++++++++++++++++++++++
> 
> Is there a good reason to put this file in include/linux?
> 

Agreed, we do not want to clutter include/linux/ with driver-specific
includes. I think that is one of the rules, even though there are some hw
headers already in there.

I would rather see this either in a new dir include/hsa or inside include/drm.

Cheers,
Jérôme

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 05/83] drm/radeon: Add kfd-->kgd interface to get virtual ram size
  2014-07-10 21:50 ` [PATCH 05/83] drm/radeon: Add kfd-->kgd interface to get virtual ram size Oded Gabbay
@ 2014-07-11 16:27     ` Jerome Glisse
  0 siblings, 0 replies; 116+ messages in thread
From: Jerome Glisse @ 2014-07-11 16:27 UTC (permalink / raw)
  To: Oded Gabbay
  Cc: David Airlie, Alex Deucher, linux-kernel, dri-devel,
	John Bridgman, Andrew Lewycky, Joerg Roedel, Oded Gabbay,
	Christian König

On Fri, Jul 11, 2014 at 12:50:05AM +0300, Oded Gabbay wrote:
> This patch adds a new interface to kfd2kgd_calls structure so that
> the kfd driver could get the virtual ram size of a specific
> radeon device.
> 
> Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>

What is vmem_size? This needs to be documented. I assume this is the
number of bits the gpu can handle, and I would assume that the minimum
requirement is that the device has at least as many bits as the cpu,
i.e. on 48-bit x86-64 the hardware also needs to support 48 bits.

Otherwise this sounds like broken things can happen.
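
To make the concern concrete, here is a small sketch in plain C. Note that
the patch as posted returns rdev->mc.real_vram_size, which by its name
looks like a size rather than a bit count, so the check below is only an
illustration of the requirement I am assuming. Both helpers are
hypothetical and are not part of radeon or kfd.

```c
#include <stdint.h>

/*
 * Hypothetical helper: can a device with gpu_va_bits of virtual address
 * space reach every address a CPU with cpu_va_bits can generate?
 */
static int gpu_covers_cpu_va(unsigned gpu_va_bits, unsigned cpu_va_bits)
{
	return gpu_va_bits >= cpu_va_bits;
}

/* Highest address reachable with the given number of address bits. */
static uint64_t va_limit(unsigned bits)
{
	if (bits >= 64)
		return UINT64_MAX;
	return (UINT64_C(1) << bits) - 1;
}
```

With these, a 40-bit device paired with a 48-bit cpu fails the check,
which is the case where a process pointer could fall outside what the
gpu can address.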

Cheers,
Jérôme

> ---
>  drivers/gpu/drm/radeon/radeon_kfd.c | 12 ++++++++++++
>  include/linux/radeon_kfd.h          |  1 +
>  2 files changed, 13 insertions(+)
> 
> diff --git a/drivers/gpu/drm/radeon/radeon_kfd.c b/drivers/gpu/drm/radeon/radeon_kfd.c
> index 7c7f808..1b859b5 100644
> --- a/drivers/gpu/drm/radeon/radeon_kfd.c
> +++ b/drivers/gpu/drm/radeon/radeon_kfd.c
> @@ -25,7 +25,10 @@
>  #include <drm/drmP.h>
>  #include "radeon.h"
>  
> +static uint64_t get_vmem_size(struct kgd_dev *kgd);
> +
>  static const struct kfd2kgd_calls kfd2kgd = {
> +	.get_vmem_size = get_vmem_size,
>  };
>  
>  static const struct kgd2kfd_calls *kgd2kfd;
> @@ -92,3 +95,12 @@ void radeon_kfd_device_fini(struct radeon_device *rdev)
>  		rdev->kfd = NULL;
>  	}
>  }
> +
> +static uint64_t get_vmem_size(struct kgd_dev *kgd)
> +{
> +	struct radeon_device *rdev = (struct radeon_device *)kgd;
> +
> +	BUG_ON(kgd == NULL);
> +
> +	return rdev->mc.real_vram_size;
> +}
> diff --git a/include/linux/radeon_kfd.h b/include/linux/radeon_kfd.h
> index 59785e9..28cddf5 100644
> --- a/include/linux/radeon_kfd.h
> +++ b/include/linux/radeon_kfd.h
> @@ -57,6 +57,7 @@ struct kgd2kfd_calls {
>  };
>  
>  struct kfd2kgd_calls {
> +	uint64_t (*get_vmem_size)(struct kgd_dev *kgd);
>  };
>  
>  bool kgd2kfd_init(unsigned interface_version,
> -- 
> 1.9.1
> 

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 05/83] drm/radeon: Add kfd-->kgd interface to get virtual ram size
@ 2014-07-11 16:27     ` Jerome Glisse
  0 siblings, 0 replies; 116+ messages in thread
From: Jerome Glisse @ 2014-07-11 16:27 UTC (permalink / raw)
  To: Oded Gabbay
  Cc: Andrew Lewycky, linux-kernel, dri-devel, Alex Deucher,
	Christian König

On Fri, Jul 11, 2014 at 12:50:05AM +0300, Oded Gabbay wrote:
> This patch adds a new interface to kfd2kgd_calls structure so that
> the kfd driver could get the virtual ram size of a specific
> radeon device.
> 
> Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>

What is vmem_size? This needs to be documented. I assume this is the
number of bits the gpu can handle, and I would assume that the minimum
requirement is that the device has at least as many bits as the cpu,
i.e. on 48-bit x86-64 the hardware also needs to support 48 bits.

Otherwise this sounds like broken things can happen.

Cheers,
Jérôme

> ---
>  drivers/gpu/drm/radeon/radeon_kfd.c | 12 ++++++++++++
>  include/linux/radeon_kfd.h          |  1 +
>  2 files changed, 13 insertions(+)
> 
> diff --git a/drivers/gpu/drm/radeon/radeon_kfd.c b/drivers/gpu/drm/radeon/radeon_kfd.c
> index 7c7f808..1b859b5 100644
> --- a/drivers/gpu/drm/radeon/radeon_kfd.c
> +++ b/drivers/gpu/drm/radeon/radeon_kfd.c
> @@ -25,7 +25,10 @@
>  #include <drm/drmP.h>
>  #include "radeon.h"
>  
> +static uint64_t get_vmem_size(struct kgd_dev *kgd);
> +
>  static const struct kfd2kgd_calls kfd2kgd = {
> +	.get_vmem_size = get_vmem_size,
>  };
>  
>  static const struct kgd2kfd_calls *kgd2kfd;
> @@ -92,3 +95,12 @@ void radeon_kfd_device_fini(struct radeon_device *rdev)
>  		rdev->kfd = NULL;
>  	}
>  }
> +
> +static uint64_t get_vmem_size(struct kgd_dev *kgd)
> +{
> +	struct radeon_device *rdev = (struct radeon_device *)kgd;
> +
> +	BUG_ON(kgd == NULL);
> +
> +	return rdev->mc.real_vram_size;
> +}
> diff --git a/include/linux/radeon_kfd.h b/include/linux/radeon_kfd.h
> index 59785e9..28cddf5 100644
> --- a/include/linux/radeon_kfd.h
> +++ b/include/linux/radeon_kfd.h
> @@ -57,6 +57,7 @@ struct kgd2kfd_calls {
>  };
>  
>  struct kfd2kgd_calls {
> +	uint64_t (*get_vmem_size)(struct kgd_dev *kgd);
>  };
>  
>  bool kgd2kfd_init(unsigned interface_version,
> -- 
> 1.9.1
> 

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 06/83] drm/radeon: Add kfd-->kgd interfaces of memory allocation/mapping
  2014-07-10 21:50 ` [PATCH 06/83] drm/radeon: Add kfd-->kgd interfaces of memory allocation/mapping Oded Gabbay
@ 2014-07-11 16:32     ` Jerome Glisse
  0 siblings, 0 replies; 116+ messages in thread
From: Jerome Glisse @ 2014-07-11 16:32 UTC (permalink / raw)
  To: Oded Gabbay
  Cc: David Airlie, Alex Deucher, linux-kernel, dri-devel,
	John Bridgman, Andrew Lewycky, Joerg Roedel, Oded Gabbay,
	Christian König

On Fri, Jul 11, 2014 at 12:50:06AM +0300, Oded Gabbay wrote:
> This patch adds new interfaces to kfd2kgd_calls structure.
> 
> The new interfaces allow the kfd driver to :
> 
> 1. Allocated video memory through the radeon driver
> 2. Map and unmap video memory with GPUVM through the radeon driver
> 3. Map and unmap system memory with GPUVM through the radeon driver
> 
> Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
> ---
>  drivers/gpu/drm/radeon/radeon_kfd.c | 129 ++++++++++++++++++++++++++++++++++++
>  include/linux/radeon_kfd.h          |  23 +++++++
>  2 files changed, 152 insertions(+)
> 
> diff --git a/drivers/gpu/drm/radeon/radeon_kfd.c b/drivers/gpu/drm/radeon/radeon_kfd.c
> index 1b859b5..66ee36b 100644
> --- a/drivers/gpu/drm/radeon/radeon_kfd.c
> +++ b/drivers/gpu/drm/radeon/radeon_kfd.c
> @@ -25,9 +25,31 @@
>  #include <drm/drmP.h>
>  #include "radeon.h"
>  
> +struct kgd_mem {
> +	struct radeon_bo *bo;
> +	u32 domain;
> +};
> +
> +static int allocate_mem(struct kgd_dev *kgd, size_t size, size_t alignment,
> +		enum kgd_memory_pool pool, struct kgd_mem **memory_handle);
> +
> +static void free_mem(struct kgd_dev *kgd, struct kgd_mem *memory_handle);
> +
> +static int gpumap_mem(struct kgd_dev *kgd, struct kgd_mem *mem, uint64_t *vmid0_address);
> +static void ungpumap_mem(struct kgd_dev *kgd, struct kgd_mem *mem);
> +
> +static int kmap_mem(struct kgd_dev *kgd, struct kgd_mem *mem, void **ptr);
> +static void unkmap_mem(struct kgd_dev *kgd, struct kgd_mem *mem);
> +
>  static uint64_t get_vmem_size(struct kgd_dev *kgd);
>  
>  static const struct kfd2kgd_calls kfd2kgd = {
> +	.allocate_mem = allocate_mem,
> +	.free_mem = free_mem,
> +	.gpumap_mem = gpumap_mem,
> +	.ungpumap_mem = ungpumap_mem,
> +	.kmap_mem = kmap_mem,
> +	.unkmap_mem = unkmap_mem,
>  	.get_vmem_size = get_vmem_size,
>  };
>  
> @@ -96,6 +118,113 @@ void radeon_kfd_device_fini(struct radeon_device *rdev)
>  	}
>  }
>  
> +static u32 pool_to_domain(enum kgd_memory_pool p)
> +{
> +	switch (p) {
> +	case KGD_POOL_FRAMEBUFFER: return RADEON_GEM_DOMAIN_VRAM;
> +	default: return RADEON_GEM_DOMAIN_GTT;
> +	}
> +}
> +
> +static int allocate_mem(struct kgd_dev *kgd, size_t size, size_t alignment,
> +		enum kgd_memory_pool pool, struct kgd_mem **memory_handle)
> +{
> +	struct radeon_device *rdev = (struct radeon_device *)kgd;
> +	struct kgd_mem *mem;
> +	int r;
> +
> +	mem = kzalloc(sizeof(struct kgd_mem), GFP_KERNEL);
> +	if (!mem)
> +		return -ENOMEM;
> +
> +	mem->domain = pool_to_domain(pool);
> +
> +	r = radeon_bo_create(rdev, size, alignment, true, mem->domain, NULL, &mem->bo);
> +	if (r) {
> +		kfree(mem);
> +		return r;
> +	}
> +
> +	*memory_handle = mem;
> +	return 0;
> +}
> +
> +static void free_mem(struct kgd_dev *kgd, struct kgd_mem *mem)
> +{
> +	/* Assume that KFD will never free gpumapped or kmapped memory. This is not quite settled. */
> +	radeon_bo_unref(&mem->bo);
> +	kfree(mem);
> +}
> +
> +static int gpumap_mem(struct kgd_dev *kgd, struct kgd_mem *mem, uint64_t *vmid0_address)
> +{
> +	int r;
> +
> +	r = radeon_bo_reserve(mem->bo, true);
> +
> +	/*
> +	 * ttm_bo_reserve can only fail if the buffer reservation lock
> +	 * is held in circumstances that would deadlock
> +	 */
> +	BUG_ON(r != 0);
> +	r = radeon_bo_pin(mem->bo, mem->domain, vmid0_address);
> +	radeon_bo_unreserve(mem->bo);
> +
> +	return r;
> +}

What is the lifetime of such objects? Are they limited in size and number?
How can the GFX side, i.e. radeon, force unmap them?

Pinning is a big NO: we only pin a handful of buffers, and the only
thing we allow userspace to pin are front buffers associated with a crtc,
which means there is a limited number of such buffers and there is a
legitimate use for pinning them.

To me this looks like anyone can pin vram and thus starve the GFX side,
read: DoS.
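
One way to bound the damage would be to account for pinned bytes against
a fixed budget. The sketch below is plain C and purely hypothetical:
struct pin_budget and its helpers do not exist in radeon or in this patch
series, and where the limit would come from is left open.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical per-device accounting of bytes pinned on behalf of kfd. */
struct pin_budget {
	uint64_t limit;   /* maximum bytes that may be pinned at once */
	uint64_t pinned;  /* bytes currently pinned, always <= limit */
};

/* Charge size bytes against the budget; -ENOMEM if it would exceed the limit. */
static int pin_budget_charge(struct pin_budget *b, uint64_t size)
{
	if (size > b->limit - b->pinned)
		return -ENOMEM;
	b->pinned += size;
	return 0;
}

/* Return size bytes to the budget when the buffer is unpinned. */
static void pin_budget_uncharge(struct pin_budget *b, uint64_t size)
{
	b->pinned -= (size > b->pinned) ? b->pinned : size;
}
```

A gpumap_mem() style call would charge the budget before pinning and
uncharge on unpin, so a caller that keeps pinning hits -ENOMEM instead of
eating all of vram.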

Cheers,
Jérôme

> +
> +static void ungpumap_mem(struct kgd_dev *kgd, struct kgd_mem *mem)
> +{
> +	int r;
> +
> +	r = radeon_bo_reserve(mem->bo, true);
> +
> +	/*
> +	 * ttm_bo_reserve can only fail if the buffer reservation lock
> +	 * is held in circumstances that would deadlock
> +	 */
> +	BUG_ON(r != 0);
> +	r = radeon_bo_unpin(mem->bo);
> +
> +	/*
> +	 * This unpin only removed NO_EVICT placement flags
> +	 * and should never fail
> +	 */
> +	BUG_ON(r != 0);
> +	radeon_bo_unreserve(mem->bo);
> +}
> +
> +static int kmap_mem(struct kgd_dev *kgd, struct kgd_mem *mem, void **ptr)
> +{
> +	int r;
> +
> +	r = radeon_bo_reserve(mem->bo, true);
> +
> +	/*
> +	 * ttm_bo_reserve can only fail if the buffer reservation lock
> +	 * is held in circumstances that would deadlock
> +	 */
> +	BUG_ON(r != 0);
> +	r = radeon_bo_kmap(mem->bo, ptr);
> +	radeon_bo_unreserve(mem->bo);
> +
> +	return r;
> +}
> +
> +static void unkmap_mem(struct kgd_dev *kgd, struct kgd_mem *mem)
> +{
> +	int r;
> +
> +	r = radeon_bo_reserve(mem->bo, true);
> +	/*
> +	 * ttm_bo_reserve can only fail if the buffer reservation lock
> +	 * is held in circumstances that would deadlock
> +	 */
> +	BUG_ON(r != 0);
> +	radeon_bo_kunmap(mem->bo);
> +	radeon_bo_unreserve(mem->bo);
> +}
> +
>  static uint64_t get_vmem_size(struct kgd_dev *kgd)
>  {
>  	struct radeon_device *rdev = (struct radeon_device *)kgd;
> diff --git a/include/linux/radeon_kfd.h b/include/linux/radeon_kfd.h
> index 28cddf5..c7997d4 100644
> --- a/include/linux/radeon_kfd.h
> +++ b/include/linux/radeon_kfd.h
> @@ -36,6 +36,14 @@ struct pci_dev;
>  struct kfd_dev;
>  struct kgd_dev;
>  
> +struct kgd_mem;
> +
> +enum kgd_memory_pool {
> +	KGD_POOL_SYSTEM_CACHEABLE = 1,
> +	KGD_POOL_SYSTEM_WRITECOMBINE = 2,
> +	KGD_POOL_FRAMEBUFFER = 3,
> +};
> +
>  struct kgd2kfd_shared_resources {
>  	void __iomem *mmio_registers; /* Mapped pointer to GFX MMIO registers. */
>  
> @@ -57,6 +65,21 @@ struct kgd2kfd_calls {
>  };
>  
>  struct kfd2kgd_calls {
> +	/* Memory management. */
> +	int (*allocate_mem)(struct kgd_dev *kgd,
> +				size_t size,
> +				size_t alignment,
> +				enum kgd_memory_pool pool,
> +				struct kgd_mem **memory_handle);
> +
> +	void (*free_mem)(struct kgd_dev *kgd, struct kgd_mem *memory_handle);
> +
> +	int (*gpumap_mem)(struct kgd_dev *kgd, struct kgd_mem *mem, uint64_t *vmid0_address);
> +	void (*ungpumap_mem)(struct kgd_dev *kgd, struct kgd_mem *mem);
> +
> +	int (*kmap_mem)(struct kgd_dev *kgd, struct kgd_mem *mem, void **ptr);
> +	void (*unkmap_mem)(struct kgd_dev *kgd, struct kgd_mem *mem);
> +
>  	uint64_t (*get_vmem_size)(struct kgd_dev *kgd);
>  };
>  
> -- 
> 1.9.1
> 

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 06/83] drm/radeon: Add kfd-->kgd interfaces of memory allocation/mapping
@ 2014-07-11 16:32     ` Jerome Glisse
  0 siblings, 0 replies; 116+ messages in thread
From: Jerome Glisse @ 2014-07-11 16:32 UTC (permalink / raw)
  To: Oded Gabbay
  Cc: Andrew Lewycky, linux-kernel, dri-devel, Alex Deucher,
	Christian König

On Fri, Jul 11, 2014 at 12:50:06AM +0300, Oded Gabbay wrote:
> This patch adds new interfaces to kfd2kgd_calls structure.
> 
> The new interfaces allow the kfd driver to :
> 
> 1. Allocated video memory through the radeon driver
> 2. Map and unmap video memory with GPUVM through the radeon driver
> 3. Map and unmap system memory with GPUVM through the radeon driver
> 
> Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
> ---
>  drivers/gpu/drm/radeon/radeon_kfd.c | 129 ++++++++++++++++++++++++++++++++++++
>  include/linux/radeon_kfd.h          |  23 +++++++
>  2 files changed, 152 insertions(+)
> 
> diff --git a/drivers/gpu/drm/radeon/radeon_kfd.c b/drivers/gpu/drm/radeon/radeon_kfd.c
> index 1b859b5..66ee36b 100644
> --- a/drivers/gpu/drm/radeon/radeon_kfd.c
> +++ b/drivers/gpu/drm/radeon/radeon_kfd.c
> @@ -25,9 +25,31 @@
>  #include <drm/drmP.h>
>  #include "radeon.h"
>  
> +struct kgd_mem {
> +	struct radeon_bo *bo;
> +	u32 domain;
> +};
> +
> +static int allocate_mem(struct kgd_dev *kgd, size_t size, size_t alignment,
> +		enum kgd_memory_pool pool, struct kgd_mem **memory_handle);
> +
> +static void free_mem(struct kgd_dev *kgd, struct kgd_mem *memory_handle);
> +
> +static int gpumap_mem(struct kgd_dev *kgd, struct kgd_mem *mem, uint64_t *vmid0_address);
> +static void ungpumap_mem(struct kgd_dev *kgd, struct kgd_mem *mem);
> +
> +static int kmap_mem(struct kgd_dev *kgd, struct kgd_mem *mem, void **ptr);
> +static void unkmap_mem(struct kgd_dev *kgd, struct kgd_mem *mem);
> +
>  static uint64_t get_vmem_size(struct kgd_dev *kgd);
>  
>  static const struct kfd2kgd_calls kfd2kgd = {
> +	.allocate_mem = allocate_mem,
> +	.free_mem = free_mem,
> +	.gpumap_mem = gpumap_mem,
> +	.ungpumap_mem = ungpumap_mem,
> +	.kmap_mem = kmap_mem,
> +	.unkmap_mem = unkmap_mem,
>  	.get_vmem_size = get_vmem_size,
>  };
>  
> @@ -96,6 +118,113 @@ void radeon_kfd_device_fini(struct radeon_device *rdev)
>  	}
>  }
>  
> +static u32 pool_to_domain(enum kgd_memory_pool p)
> +{
> +	switch (p) {
> +	case KGD_POOL_FRAMEBUFFER: return RADEON_GEM_DOMAIN_VRAM;
> +	default: return RADEON_GEM_DOMAIN_GTT;
> +	}
> +}
> +
> +static int allocate_mem(struct kgd_dev *kgd, size_t size, size_t alignment,
> +		enum kgd_memory_pool pool, struct kgd_mem **memory_handle)
> +{
> +	struct radeon_device *rdev = (struct radeon_device *)kgd;
> +	struct kgd_mem *mem;
> +	int r;
> +
> +	mem = kzalloc(sizeof(struct kgd_mem), GFP_KERNEL);
> +	if (!mem)
> +		return -ENOMEM;
> +
> +	mem->domain = pool_to_domain(pool);
> +
> +	r = radeon_bo_create(rdev, size, alignment, true, mem->domain, NULL, &mem->bo);
> +	if (r) {
> +		kfree(mem);
> +		return r;
> +	}
> +
> +	*memory_handle = mem;
> +	return 0;
> +}
> +
> +static void free_mem(struct kgd_dev *kgd, struct kgd_mem *mem)
> +{
> +	/* Assume that KFD will never free gpumapped or kmapped memory. This is not quite settled. */
> +	radeon_bo_unref(&mem->bo);
> +	kfree(mem);
> +}
> +
> +static int gpumap_mem(struct kgd_dev *kgd, struct kgd_mem *mem, uint64_t *vmid0_address)
> +{
> +	int r;
> +
> +	r = radeon_bo_reserve(mem->bo, true);
> +
> +	/*
> +	 * ttm_bo_reserve can only fail if the buffer reservation lock
> +	 * is held in circumstances that would deadlock
> +	 */
> +	BUG_ON(r != 0);
> +	r = radeon_bo_pin(mem->bo, mem->domain, vmid0_address);
> +	radeon_bo_unreserve(mem->bo);
> +
> +	return r;
> +}

What is the lifetime of such objects? Are they limited in size and number?
How can the GFX side, i.e. radeon, force unmap them?

Pinning is a big NO: we only pin a handful of buffers, and the only
thing we allow userspace to pin are front buffers associated with a crtc,
which means there is a limited number of such buffers and there is a
legitimate use for pinning them.

To me this looks like anyone can pin vram and thus starve the GFX side,
read: DoS.

Cheers,
Jérôme

> +
> +static void ungpumap_mem(struct kgd_dev *kgd, struct kgd_mem *mem)
> +{
> +	int r;
> +
> +	r = radeon_bo_reserve(mem->bo, true);
> +
> +	/*
> +	 * ttm_bo_reserve can only fail if the buffer reservation lock
> +	 * is held in circumstances that would deadlock
> +	 */
> +	BUG_ON(r != 0);
> +	r = radeon_bo_unpin(mem->bo);
> +
> +	/*
> +	 * This unpin only removed NO_EVICT placement flags
> +	 * and should never fail
> +	 */
> +	BUG_ON(r != 0);
> +	radeon_bo_unreserve(mem->bo);
> +}
> +
> +static int kmap_mem(struct kgd_dev *kgd, struct kgd_mem *mem, void **ptr)
> +{
> +	int r;
> +
> +	r = radeon_bo_reserve(mem->bo, true);
> +
> +	/*
> +	 * ttm_bo_reserve can only fail if the buffer reservation lock
> +	 * is held in circumstances that would deadlock
> +	 */
> +	BUG_ON(r != 0);
> +	r = radeon_bo_kmap(mem->bo, ptr);
> +	radeon_bo_unreserve(mem->bo);
> +
> +	return r;
> +}
> +
> +static void unkmap_mem(struct kgd_dev *kgd, struct kgd_mem *mem)
> +{
> +	int r;
> +
> +	r = radeon_bo_reserve(mem->bo, true);
> +	/*
> +	 * ttm_bo_reserve can only fail if the buffer reservation lock
> +	 * is held in circumstances that would deadlock
> +	 */
> +	BUG_ON(r != 0);
> +	radeon_bo_kunmap(mem->bo);
> +	radeon_bo_unreserve(mem->bo);
> +}
> +
>  static uint64_t get_vmem_size(struct kgd_dev *kgd)
>  {
>  	struct radeon_device *rdev = (struct radeon_device *)kgd;
> diff --git a/include/linux/radeon_kfd.h b/include/linux/radeon_kfd.h
> index 28cddf5..c7997d4 100644
> --- a/include/linux/radeon_kfd.h
> +++ b/include/linux/radeon_kfd.h
> @@ -36,6 +36,14 @@ struct pci_dev;
>  struct kfd_dev;
>  struct kgd_dev;
>  
> +struct kgd_mem;
> +
> +enum kgd_memory_pool {
> +	KGD_POOL_SYSTEM_CACHEABLE = 1,
> +	KGD_POOL_SYSTEM_WRITECOMBINE = 2,
> +	KGD_POOL_FRAMEBUFFER = 3,
> +};
> +
>  struct kgd2kfd_shared_resources {
>  	void __iomem *mmio_registers; /* Mapped pointer to GFX MMIO registers. */
>  
> @@ -57,6 +65,21 @@ struct kgd2kfd_calls {
>  };
>  
>  struct kfd2kgd_calls {
> +	/* Memory management. */
> +	int (*allocate_mem)(struct kgd_dev *kgd,
> +				size_t size,
> +				size_t alignment,
> +				enum kgd_memory_pool pool,
> +				struct kgd_mem **memory_handle);
> +
> +	void (*free_mem)(struct kgd_dev *kgd, struct kgd_mem *memory_handle);
> +
> +	int (*gpumap_mem)(struct kgd_dev *kgd, struct kgd_mem *mem, uint64_t *vmid0_address);
> +	void (*ungpumap_mem)(struct kgd_dev *kgd, struct kgd_mem *mem);
> +
> +	int (*kmap_mem)(struct kgd_dev *kgd, struct kgd_mem *mem, void **ptr);
> +	void (*unkmap_mem)(struct kgd_dev *kgd, struct kgd_mem *mem);
> +
>  	uint64_t (*get_vmem_size)(struct kgd_dev *kgd);
>  };
>  
> -- 
> 1.9.1
> 

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 07/83] drm/radeon: Add kfd-->kgd interface of locking srbm_gfx_cntl register
  2014-07-10 21:50 ` [PATCH 07/83] drm/radeon: Add kfd-->kgd interface of locking srbm_gfx_cntl register Oded Gabbay
@ 2014-07-11 16:34     ` Jerome Glisse
  0 siblings, 0 replies; 116+ messages in thread
From: Jerome Glisse @ 2014-07-11 16:34 UTC (permalink / raw)
  To: Oded Gabbay
  Cc: David Airlie, Alex Deucher, linux-kernel, dri-devel,
	John Bridgman, Andrew Lewycky, Joerg Roedel, Oded Gabbay,
	Christian König

On Fri, Jul 11, 2014 at 12:50:07AM +0300, Oded Gabbay wrote:
> This patch adds a new interface to kfd2kgd_calls structure, which
> allows the kfd to lock and unlock the srbm_gfx_cntl register

Why does kfd need to lock this register if kfd cannot access
any of those registers? This sounds broken to me; exposing a
driver-internal mutex to another driver is not something I am a
fan of.
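
As an illustration of the alternative I have in mind, the owning driver
could export one operation that takes the lock internally, rather than
raw lock/unlock callbacks. The sketch below is plain C with a mock
device; struct fake_dev, the flag standing in for the mutex, and
dev_write_banked() are all hypothetical and not radeon code.

```c
#include <stdint.h>

#define NUM_BANKS 4

/* Hypothetical stand-in for the owning driver's device state. */
struct fake_dev {
	int      lock_held;             /* models the driver-internal mutex */
	unsigned selected_bank;         /* models the bank-select register */
	uint32_t banked_reg[NUM_BANKS]; /* one register instance per bank */
};

static void dev_lock(struct fake_dev *d)   { d->lock_held = 1; }
static void dev_unlock(struct fake_dev *d) { d->lock_held = 0; }

/*
 * Single entry point the owning driver could export: select the bank,
 * write the value, restore the default selection, all under the lock.
 * The caller never sees the lock itself.
 */
static int dev_write_banked(struct fake_dev *d, unsigned bank, uint32_t val)
{
	if (bank >= NUM_BANKS)
		return -1;

	dev_lock(d);
	d->selected_bank = bank;
	d->banked_reg[d->selected_bank] = val;
	d->selected_bank = 0;   /* restore the default selection */
	dev_unlock(d);
	return 0;
}
```

This keeps the locking rules in one driver; the other driver cannot
forget to unlock or hold the lock across something slow.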

Cheers,
Jérôme

> 
> Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
> ---
>  drivers/gpu/drm/radeon/radeon_kfd.c | 20 ++++++++++++++++++++
>  include/linux/radeon_kfd.h          |  4 ++++
>  2 files changed, 24 insertions(+)
> 
> diff --git a/drivers/gpu/drm/radeon/radeon_kfd.c b/drivers/gpu/drm/radeon/radeon_kfd.c
> index 66ee36b..594020e 100644
> --- a/drivers/gpu/drm/radeon/radeon_kfd.c
> +++ b/drivers/gpu/drm/radeon/radeon_kfd.c
> @@ -43,6 +43,10 @@ static void unkmap_mem(struct kgd_dev *kgd, struct kgd_mem *mem);
>  
>  static uint64_t get_vmem_size(struct kgd_dev *kgd);
>  
> +static void lock_srbm_gfx_cntl(struct kgd_dev *kgd);
> +static void unlock_srbm_gfx_cntl(struct kgd_dev *kgd);
> +
> +
>  static const struct kfd2kgd_calls kfd2kgd = {
>  	.allocate_mem = allocate_mem,
>  	.free_mem = free_mem,
> @@ -51,6 +55,8 @@ static const struct kfd2kgd_calls kfd2kgd = {
>  	.kmap_mem = kmap_mem,
>  	.unkmap_mem = unkmap_mem,
>  	.get_vmem_size = get_vmem_size,
> +	.lock_srbm_gfx_cntl = lock_srbm_gfx_cntl,
> +	.unlock_srbm_gfx_cntl = unlock_srbm_gfx_cntl,
>  };
>  
>  static const struct kgd2kfd_calls *kgd2kfd;
> @@ -233,3 +239,17 @@ static uint64_t get_vmem_size(struct kgd_dev *kgd)
>  
>  	return rdev->mc.real_vram_size;
>  }
> +
> +static void lock_srbm_gfx_cntl(struct kgd_dev *kgd)
> +{
> +	struct radeon_device *rdev = (struct radeon_device *)kgd;
> +
> +	mutex_lock(&rdev->srbm_mutex);
> +}
> +
> +static void unlock_srbm_gfx_cntl(struct kgd_dev *kgd)
> +{
> +	struct radeon_device *rdev = (struct radeon_device *)kgd;
> +
> +	mutex_unlock(&rdev->srbm_mutex);
> +}
> diff --git a/include/linux/radeon_kfd.h b/include/linux/radeon_kfd.h
> index c7997d4..40b691c 100644
> --- a/include/linux/radeon_kfd.h
> +++ b/include/linux/radeon_kfd.h
> @@ -81,6 +81,10 @@ struct kfd2kgd_calls {
>  	void (*unkmap_mem)(struct kgd_dev *kgd, struct kgd_mem *mem);
>  
>  	uint64_t (*get_vmem_size)(struct kgd_dev *kgd);
> +
> +	/* SRBM_GFX_CNTL mutex */
> +	void (*lock_srbm_gfx_cntl)(struct kgd_dev *kgd);
> +	void (*unlock_srbm_gfx_cntl)(struct kgd_dev *kgd);
>  };
>  
>  bool kgd2kfd_init(unsigned interface_version,
> -- 
> 1.9.1
> 


* Re: [PATCH 08/83] drm/radeon: Add calls to initialize and finalize kfd from radeon
  2014-07-10 21:50 ` [PATCH 08/83] drm/radeon: Add calls to initialize and finalize kfd from radeon Oded Gabbay
@ 2014-07-11 16:36     ` Jerome Glisse
  0 siblings, 0 replies; 116+ messages in thread
From: Jerome Glisse @ 2014-07-11 16:36 UTC (permalink / raw)
  To: Oded Gabbay
  Cc: David Airlie, Alex Deucher, linux-kernel, dri-devel,
	John Bridgman, Andrew Lewycky, Joerg Roedel, Oded Gabbay,
	Christian König

On Fri, Jul 11, 2014 at 12:50:08AM +0300, Oded Gabbay wrote:
> The KFD driver should be loaded when the radeon driver is loaded and
> should be finalized when the radeon driver is removed.
> 
> This patch adds a function call to initialize kfd from radeon_init
> and a function call to finalize kfd from radeon_exit.
> 
> If the KFD driver is not present in the system, the initialize call
> fails and the radeon driver continues normally.
> 
> This patch also adds calls to probe, initialize and finalize a kfd device
> per radeon device using the kgd-->kfd interface.
> 
> Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>

It would be nice to be able to build radeon without HSA, so I think a
CONFIG_HSA option should be added and the other things should depend on it.
Otherwise this one is:

Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
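
[Editorial sketch] The usual way to make such a dependency optional is to
provide empty static inline stubs when the option is off, the same pattern
the quoted radeon_drv.c context already uses for
radeon_register_atpx_handler(). The block below is a minimal userspace
illustration, assuming a hypothetical TOY_CONFIG_HSA macro in place of a
real Kconfig symbol and hypothetical toy_* names in place of the radeon_kfd_*
functions; it is not the code that was eventually merged.

```c
#include <stdbool.h>

/*
 * TOY_CONFIG_HSA is a hypothetical stand-in for a Kconfig symbol.
 * When it is defined, the real implementation is linked in from elsewhere.
 */
#ifdef TOY_CONFIG_HSA
bool toy_kfd_init(void);
void toy_kfd_fini(void);
#else
/* Stubs: callers need no #ifdef, and the calls compile away to nothing. */
static inline bool toy_kfd_init(void) { return false; }
static inline void toy_kfd_fini(void) { }
#endif

/* Models radeon_init(): the same source builds in both configurations. */
static int toy_driver_init(void)
{
	toy_kfd_init();	/* a failure here is not fatal to the driver */
	return 0;
}

/* Models radeon_exit(). */
static void toy_driver_exit(void)
{
	toy_kfd_fini();
}
```

Keeping the #ifdef in one header means the call sites in the init and exit
paths stay free of conditional compilation.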


> ---
>  drivers/gpu/drm/radeon/radeon_drv.c | 6 ++++++
>  drivers/gpu/drm/radeon/radeon_kms.c | 9 +++++++++
>  2 files changed, 15 insertions(+)
> 
> diff --git a/drivers/gpu/drm/radeon/radeon_drv.c b/drivers/gpu/drm/radeon/radeon_drv.c
> index cb14213..88a45a0 100644
> --- a/drivers/gpu/drm/radeon/radeon_drv.c
> +++ b/drivers/gpu/drm/radeon/radeon_drv.c
> @@ -151,6 +151,9 @@ static inline void radeon_register_atpx_handler(void) {}
>  static inline void radeon_unregister_atpx_handler(void) {}
>  #endif
>  
> +extern bool radeon_kfd_init(void);
> +extern void radeon_kfd_fini(void);
> +
>  int radeon_no_wb;
>  int radeon_modeset = -1;
>  int radeon_dynclks = -1;
> @@ -630,12 +633,15 @@ static int __init radeon_init(void)
>  #endif
>  	}
>  
> +	radeon_kfd_init();
> +
>  	/* let modprobe override vga console setting */
>  	return drm_pci_init(driver, pdriver);
>  }
>  
>  static void __exit radeon_exit(void)
>  {
> +	radeon_kfd_fini();
>  	drm_pci_exit(driver, pdriver);
>  	radeon_unregister_atpx_handler();
>  }
> diff --git a/drivers/gpu/drm/radeon/radeon_kms.c b/drivers/gpu/drm/radeon/radeon_kms.c
> index 35d9318..0748284 100644
> --- a/drivers/gpu/drm/radeon/radeon_kms.c
> +++ b/drivers/gpu/drm/radeon/radeon_kms.c
> @@ -34,6 +34,10 @@
>  #include <linux/slab.h>
>  #include <linux/pm_runtime.h>
>  
> +extern void radeon_kfd_device_probe(struct radeon_device *rdev);
> +extern void radeon_kfd_device_init(struct radeon_device *rdev);
> +extern void radeon_kfd_device_fini(struct radeon_device *rdev);
> +
>  #if defined(CONFIG_VGA_SWITCHEROO)
>  bool radeon_has_atpx(void);
>  #else
> @@ -63,6 +67,8 @@ int radeon_driver_unload_kms(struct drm_device *dev)
>  
>  	pm_runtime_get_sync(dev->dev);
>  
> +	radeon_kfd_device_fini(rdev);
> +
>  	radeon_acpi_fini(rdev);
>  	
>  	radeon_modeset_fini(rdev);
> @@ -142,6 +148,9 @@ int radeon_driver_load_kms(struct drm_device *dev, unsigned long flags)
>  				"Error during ACPI methods call\n");
>  	}
>  
> +	radeon_kfd_device_probe(rdev);
> +	radeon_kfd_device_init(rdev);
> +
>  	if (radeon_is_px(dev)) {
>  		pm_runtime_use_autosuspend(dev->dev);
>  		pm_runtime_set_autosuspend_delay(dev->dev, 5000);
> -- 
> 1.9.1
> 


* Re: [PATCH 09/83] hsa/radeon: Add code base of hsa driver for AMD's GPUs
  2014-07-10 21:50 ` [PATCH 09/83] hsa/radeon: Add code base of hsa driver for AMD's GPUs Oded Gabbay
@ 2014-07-11 17:04     ` Jerome Glisse
  0 siblings, 0 replies; 116+ messages in thread
From: Jerome Glisse @ 2014-07-11 17:04 UTC (permalink / raw)
  To: Oded Gabbay
  Cc: David Airlie, Alex Deucher, linux-kernel, dri-devel,
	John Bridgman, Andrew Lewycky, Joerg Roedel, Oded Gabbay,
	Greg Kroah-Hartman, Rafael J. Wysocki, Kishon Vijay Abraham I,
	Sandeep Nair, Kenneth Heitke, Srinivas Pandruvada,
	Santosh Shilimkar, Andreas Noever, Lucas Stach, Philipp Zabel

On Fri, Jul 11, 2014 at 12:50:09AM +0300, Oded Gabbay wrote:
> This patch adds the code base of the hsa driver for
> AMD's GPUs.
> 
> This driver is called kfd.
> 
> This initial version supports the first HSA chip, Kaveri.
> 
> This driver is located in a new directory structure under drivers/gpu.
> 
> Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>

There are too many coding style issues. While we have been lax about enforcing
the scripts/checkpatch.pl rules, I think there is a limit to that. I am not
strict about 80 chars per line, but other things need fixing so we stay in line.

Also, I am a bit worried about the license: given the top comment in each of
the files, I am not sure this is GPL2 compatible. I would need to ask a lawyer
to review that.

Other comments inline.


> ---
>  drivers/Kconfig                        |    2 +
>  drivers/gpu/Makefile                   |    1 +
>  drivers/gpu/hsa/Kconfig                |   20 +
>  drivers/gpu/hsa/Makefile               |    1 +
>  drivers/gpu/hsa/radeon/Makefile        |    8 +
>  drivers/gpu/hsa/radeon/kfd_chardev.c   |  133 ++++
>  drivers/gpu/hsa/radeon/kfd_crat.h      |  292 ++++++++
>  drivers/gpu/hsa/radeon/kfd_device.c    |  162 +++++
>  drivers/gpu/hsa/radeon/kfd_module.c    |  117 ++++
>  drivers/gpu/hsa/radeon/kfd_pasid.c     |   92 +++
>  drivers/gpu/hsa/radeon/kfd_priv.h      |  232 ++++++
>  drivers/gpu/hsa/radeon/kfd_process.c   |  400 +++++++++++
>  drivers/gpu/hsa/radeon/kfd_scheduler.h |   62 ++
>  drivers/gpu/hsa/radeon/kfd_topology.c  | 1201 ++++++++++++++++++++++++++++++++
>  drivers/gpu/hsa/radeon/kfd_topology.h  |  168 +++++
>  15 files changed, 2891 insertions(+)
>  create mode 100644 drivers/gpu/hsa/Kconfig
>  create mode 100644 drivers/gpu/hsa/Makefile
>  create mode 100644 drivers/gpu/hsa/radeon/Makefile
>  create mode 100644 drivers/gpu/hsa/radeon/kfd_chardev.c
>  create mode 100644 drivers/gpu/hsa/radeon/kfd_crat.h
>  create mode 100644 drivers/gpu/hsa/radeon/kfd_device.c
>  create mode 100644 drivers/gpu/hsa/radeon/kfd_module.c
>  create mode 100644 drivers/gpu/hsa/radeon/kfd_pasid.c
>  create mode 100644 drivers/gpu/hsa/radeon/kfd_priv.h
>  create mode 100644 drivers/gpu/hsa/radeon/kfd_process.c
>  create mode 100644 drivers/gpu/hsa/radeon/kfd_scheduler.h
>  create mode 100644 drivers/gpu/hsa/radeon/kfd_topology.c
>  create mode 100644 drivers/gpu/hsa/radeon/kfd_topology.h
> 
> diff --git a/drivers/Kconfig b/drivers/Kconfig
> index 9b2dcc2..c1ac8f8 100644
> --- a/drivers/Kconfig
> +++ b/drivers/Kconfig
> @@ -178,4 +178,6 @@ source "drivers/mcb/Kconfig"
>  
>  source "drivers/thunderbolt/Kconfig"
>  
> +source "drivers/gpu/hsa/Kconfig"
> +
>  endmenu
> diff --git a/drivers/gpu/Makefile b/drivers/gpu/Makefile
> index 70da9eb..749a7ea 100644
> --- a/drivers/gpu/Makefile
> +++ b/drivers/gpu/Makefile
> @@ -1,3 +1,4 @@
>  obj-y			+= drm/ vga/
>  obj-$(CONFIG_TEGRA_HOST1X)	+= host1x/
>  obj-$(CONFIG_IMX_IPUV3_CORE)	+= ipu-v3/
> +obj-$(CONFIG_HSA)	+= hsa/
> \ No newline at end of file
> diff --git a/drivers/gpu/hsa/Kconfig b/drivers/gpu/hsa/Kconfig
> new file mode 100644
> index 0000000..ee7bb28
> --- /dev/null
> +++ b/drivers/gpu/hsa/Kconfig
> @@ -0,0 +1,20 @@
> +#
> +# Heterogenous system architecture configuration
> +#
> +
> +menuconfig HSA
> +	bool "Heterogenous System Architecture"
> +	default y
> +	help
> +	  Say Y here if you want Heterogenous System Architecture support.

Maybe be a bit more chatty here; there are already enough kernel options that
are cryptic even to kernel developers. Not everyone is well aware of all
the fancy three-letter acronyms GPUs use :)

> +
> +if HSA
> +
> +config HSA_RADEON
> +	tristate "HSA kernel driver for AMD Radeon devices"
> +	depends on HSA && AMD_IOMMU_V2 && X86_64
> +	default m
> +	help
> +	  Enable this if you want to support HSA on AMD Radeon devices.
> +
> +endif # HSA
> diff --git a/drivers/gpu/hsa/Makefile b/drivers/gpu/hsa/Makefile
> new file mode 100644
> index 0000000..0951584
> --- /dev/null
> +++ b/drivers/gpu/hsa/Makefile
> @@ -0,0 +1 @@
> +obj-$(CONFIG_HSA_RADEON)	+= radeon/
> diff --git a/drivers/gpu/hsa/radeon/Makefile b/drivers/gpu/hsa/radeon/Makefile
> new file mode 100644
> index 0000000..ba16a09
> --- /dev/null
> +++ b/drivers/gpu/hsa/radeon/Makefile
> @@ -0,0 +1,8 @@
> +#
> +# Makefile for Heterogenous System Architecture support for AMD Radeon devices
> +#
> +
> +radeon_kfd-y	:= kfd_module.o kfd_device.o kfd_chardev.o \
> +		kfd_pasid.o kfd_topology.o kfd_process.o
> +
> +obj-$(CONFIG_HSA_RADEON)	+= radeon_kfd.o
> diff --git a/drivers/gpu/hsa/radeon/kfd_chardev.c b/drivers/gpu/hsa/radeon/kfd_chardev.c
> new file mode 100644
> index 0000000..7a56a8f
> --- /dev/null
> +++ b/drivers/gpu/hsa/radeon/kfd_chardev.c
> @@ -0,0 +1,133 @@
> +/*
> + * Copyright 2014 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + */
> +
> +#include <linux/device.h>
> +#include <linux/export.h>
> +#include <linux/err.h>
> +#include <linux/fs.h>
> +#include <linux/sched.h>
> +#include <linux/slab.h>
> +#include <linux/uaccess.h>
> +#include "kfd_priv.h"
> +#include "kfd_scheduler.h"
> +
> +static long kfd_ioctl(struct file *, unsigned int, unsigned long);

Nitpick: avoid unsigned int, just use unsigned.

> +static int kfd_open(struct inode *, struct file *);
> +
> +static const char kfd_dev_name[] = "kfd";
> +
> +static const struct file_operations kfd_fops = {
> +	.owner = THIS_MODULE,
> +	.unlocked_ioctl = kfd_ioctl,
> +	.open = kfd_open,
> +};
> +
> +static int kfd_char_dev_major = -1;
> +static struct class *kfd_class;
> +struct device *kfd_device;
> +
> +int
> +radeon_kfd_chardev_init(void)
> +{
> +	int err = 0;
> +
> +	kfd_char_dev_major = register_chrdev(0, kfd_dev_name, &kfd_fops);
> +	err = kfd_char_dev_major;
> +	if (err < 0)
> +		goto err_register_chrdev;
> +
> +	kfd_class = class_create(THIS_MODULE, kfd_dev_name);
> +	err = PTR_ERR(kfd_class);
> +	if (IS_ERR(kfd_class))
> +		goto err_class_create;
> +
> +	kfd_device = device_create(kfd_class, NULL, MKDEV(kfd_char_dev_major, 0), NULL, kfd_dev_name);
> +	err = PTR_ERR(kfd_device);
> +	if (IS_ERR(kfd_device))
> +		goto err_device_create;
> +
> +	return 0;
> +
> +err_device_create:
> +	class_destroy(kfd_class);
> +err_class_create:
> +	unregister_chrdev(kfd_char_dev_major, kfd_dev_name);
> +err_register_chrdev:
> +	return err;
> +}
> +
> +void
> +radeon_kfd_chardev_exit(void)
> +{
> +	device_destroy(kfd_class, MKDEV(kfd_char_dev_major, 0));
> +	class_destroy(kfd_class);
> +	unregister_chrdev(kfd_char_dev_major, kfd_dev_name);
> +}
> +
> +struct device*
> +radeon_kfd_chardev(void)
> +{
> +	return kfd_device;
> +}
> +
> +
> +static int
> +kfd_open(struct inode *inode, struct file *filep)
> +{
> +	struct kfd_process *process;
> +
> +	if (iminor(inode) != 0)
> +		return -ENODEV;
> +
> +	process = radeon_kfd_create_process(current);
> +	if (IS_ERR(process))
> +		return PTR_ERR(process);
> +
> +	pr_debug("\nkfd: process %d opened dev/kfd", process->pasid);
> +
> +	return 0;
> +}
> +
> +
> +static long
> +kfd_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
> +{
> +	long err = -EINVAL;
> +
> +	dev_info(kfd_device,
> +		 "ioctl cmd 0x%x (#%d), arg 0x%lx\n",
> +		 cmd, _IOC_NR(cmd), arg);
> +
> +	switch (cmd) {
> +	default:
> +		dev_err(kfd_device,
> +			"unknown ioctl cmd 0x%x, arg 0x%lx)\n",
> +			cmd, arg);
> +		err = -EINVAL;
> +		break;
> +	}
> +
> +	if (err < 0)
> +		dev_err(kfd_device, "ioctl error %ld\n", err);
> +
> +	return err;
> +}
> diff --git a/drivers/gpu/hsa/radeon/kfd_crat.h b/drivers/gpu/hsa/radeon/kfd_crat.h
> new file mode 100644
> index 0000000..587455d
> --- /dev/null
> +++ b/drivers/gpu/hsa/radeon/kfd_crat.h
> @@ -0,0 +1,292 @@
> +/*
> + * Copyright 2014 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + */
> +
> +#ifndef KFD_CRAT_H_INCLUDED
> +#define KFD_CRAT_H_INCLUDED
> +
> +#include <linux/types.h>
> +
> +#pragma pack(1)
> +
> +/*
> + * 4CC signature values for the CRAT and CDIT ACPI tables
> + */
> +
> +#define CRAT_SIGNATURE	"CRAT"
> +#define CDIT_SIGNATURE	"CDIT"
> +
> +/*
> + * Component Resource Association Table (CRAT)
> + */
> +
> +#define CRAT_OEMID_LENGTH	6
> +#define CRAT_OEMTABLEID_LENGTH	8
> +#define CRAT_RESERVED_LENGTH	6
> +
> +struct crat_header {
> +	uint32_t	signature;
> +	uint32_t	length;
> +	uint8_t		revision;
> +	uint8_t		checksum;
> +	uint8_t		oem_id[CRAT_OEMID_LENGTH];
> +	uint8_t		oem_table_id[CRAT_OEMTABLEID_LENGTH];
> +	uint32_t	oem_revision;
> +	uint32_t	creator_id;
> +	uint32_t	creator_revision;
> +	uint32_t	total_entries;
> +	uint16_t	num_domains;
> +	uint8_t		reserved[CRAT_RESERVED_LENGTH];
> +};
> +
> +/*
> + * The header structure is immediately followed by total_entries of the
> + * data definitions
> + */
> +
> +/*
> + * The currently defined subtype entries in the CRAT
> + */
> +#define CRAT_SUBTYPE_COMPUTEUNIT_AFFINITY	0
> +#define CRAT_SUBTYPE_MEMORY_AFFINITY		1
> +#define CRAT_SUBTYPE_CACHE_AFFINITY		2
> +#define CRAT_SUBTYPE_TLB_AFFINITY		3
> +#define CRAT_SUBTYPE_CCOMPUTE_AFFINITY		4
> +#define CRAT_SUBTYPE_IOLINK_AFFINITY		5
> +#define CRAT_SUBTYPE_MAX			6
> +
> +#define CRAT_SIBLINGMAP_SIZE	32
> +
> +/*
> + * ComputeUnit Affinity structure and definitions
> + */
> +#define CRAT_CU_FLAGS_ENABLED		0x00000001
> +#define CRAT_CU_FLAGS_HOT_PLUGGABLE	0x00000002
> +#define CRAT_CU_FLAGS_CPU_PRESENT	0x00000004
> +#define CRAT_CU_FLAGS_GPU_PRESENT	0x00000008
> +#define CRAT_CU_FLAGS_IOMMU_PRESENT	0x00000010
> +#define CRAT_CU_FLAGS_RESERVED		0xffffffe0
> +
> +#define CRAT_COMPUTEUNIT_RESERVED_LENGTH 4
> +
> +struct crat_subtype_computeunit {
> +	uint8_t		type;
> +	uint8_t		length;
> +	uint16_t	reserved;
> +	uint32_t	flags;
> +	uint32_t	proximity_domain;
> +	uint32_t	processor_id_low;
> +	uint16_t	num_cpu_cores;
> +	uint16_t	num_simd_cores;
> +	uint16_t	max_waves_simd;
> +	uint16_t	io_count;
> +	uint16_t	hsa_capability;
> +	uint16_t	lds_size_in_kb;
> +	uint8_t		wave_front_size;
> +	uint8_t		num_banks;
> +	uint16_t	micro_engine_id;
> +	uint8_t		num_arrays;
> +	uint8_t		num_cu_per_array;
> +	uint8_t		num_simd_per_cu;
> +	uint8_t		max_slots_scatch_cu;
> +	uint8_t		reserved2[CRAT_COMPUTEUNIT_RESERVED_LENGTH];
> +};
> +
> +/*
> + * HSA Memory Affinity structure and definitions
> + */
> +#define CRAT_MEM_FLAGS_ENABLED		0x00000001
> +#define CRAT_MEM_FLAGS_HOT_PLUGGABLE	0x00000002
> +#define CRAT_MEM_FLAGS_NON_VOLATILE	0x00000004
> +#define CRAT_MEM_FLAGS_RESERVED		0xfffffff8
> +
> +#define CRAT_MEMORY_RESERVED_LENGTH 8
> +
> +struct crat_subtype_memory {
> +	uint8_t		type;
> +	uint8_t		length;
> +	uint16_t	reserved;
> +	uint32_t	flags;
> +	uint32_t	promixity_domain;
> +	uint32_t	base_addr_low;
> +	uint32_t	base_addr_high;
> +	uint32_t	length_low;
> +	uint32_t	length_high;
> +	uint32_t	width;
> +	uint8_t		reserved2[CRAT_MEMORY_RESERVED_LENGTH];
> +};
> +
> +/*
> + * HSA Cache Affinity structure and definitions
> + */
> +#define CRAT_CACHE_FLAGS_ENABLED	0x00000001
> +#define CRAT_CACHE_FLAGS_DATA_CACHE	0x00000002
> +#define CRAT_CACHE_FLAGS_INST_CACHE	0x00000004
> +#define CRAT_CACHE_FLAGS_CPU_CACHE	0x00000008
> +#define CRAT_CACHE_FLAGS_SIMD_CACHE	0x00000010
> +#define CRAT_CACHE_FLAGS_RESERVED	0xffffffe0
> +
> +#define CRAT_CACHE_RESERVED_LENGTH 8
> +
> +struct crat_subtype_cache {
> +	uint8_t		type;
> +	uint8_t		length;
> +	uint16_t	reserved;
> +	uint32_t	flags;
> +	uint32_t	processor_id_low;
> +	uint8_t		sibling_map[CRAT_SIBLINGMAP_SIZE];
> +	uint32_t	cache_size;
> +	uint8_t		cache_level;
> +	uint8_t		lines_per_tag;
> +	uint16_t	cache_line_size;
> +	uint8_t		associativity;
> +	uint8_t		cache_properties;
> +	uint16_t	cache_latency;
> +	uint8_t		reserved2[CRAT_CACHE_RESERVED_LENGTH];
> +};
> +
> +/*
> + * HSA TLB Affinity structure and definitions
> + */
> +#define CRAT_TLB_FLAGS_ENABLED	0x00000001
> +#define CRAT_TLB_FLAGS_DATA_TLB	0x00000002
> +#define CRAT_TLB_FLAGS_INST_TLB	0x00000004
> +#define CRAT_TLB_FLAGS_CPU_TLB	0x00000008
> +#define CRAT_TLB_FLAGS_SIMD_TLB	0x00000010
> +#define CRAT_TLB_FLAGS_RESERVED	0xffffffe0
> +
> +#define CRAT_TLB_RESERVED_LENGTH 4
> +
> +struct crat_subtype_tlb {
> +	uint8_t		type;
> +	uint8_t		length;
> +	uint16_t	reserved;
> +	uint32_t	flags;
> +	uint32_t	processor_id_low;
> +	uint8_t		sibling_map[CRAT_SIBLINGMAP_SIZE];
> +	uint32_t	tlb_level;
> +	uint8_t		data_tlb_associativity_2mb;
> +	uint8_t		data_tlb_size_2mb;
> +	uint8_t		instruction_tlb_associativity_2mb;
> +	uint8_t		instruction_tlb_size_2mb;
> +	uint8_t		data_tlb_associativity_4k;
> +	uint8_t		data_tlb_size_4k;
> +	uint8_t		instruction_tlb_associativity_4k;
> +	uint8_t		instruction_tlb_size_4k;
> +	uint8_t		data_tlb_associativity_1gb;
> +	uint8_t		data_tlb_size_1gb;
> +	uint8_t		instruction_tlb_associativity_1gb;
> +	uint8_t		instruction_tlb_size_1gb;
> +	uint8_t		reserved2[CRAT_TLB_RESERVED_LENGTH];
> +};
> +
> +/*
> + * HSA CCompute/APU Affinity structure and definitions
> + */
> +#define CRAT_CCOMPUTE_FLAGS_ENABLED	0x00000001
> +#define CRAT_CCOMPUTE_FLAGS_RESERVED	0xfffffffe
> +
> +#define CRAT_CCOMPUTE_RESERVED_LENGTH 16
> +
> +struct crat_subtype_ccompute {
> +	uint8_t		type;
> +	uint8_t		length;
> +	uint16_t	reserved;
> +	uint32_t	flags;
> +	uint32_t	processor_id_low;
> +	uint8_t		sibling_map[CRAT_SIBLINGMAP_SIZE];
> +	uint32_t	apu_size;
> +	uint8_t		reserved2[CRAT_CCOMPUTE_RESERVED_LENGTH];
> +};
> +
> +/*
> + * HSA IO Link Affinity structure and definitions
> + */
> +#define CRAT_IOLINK_FLAGS_ENABLED	0x00000001
> +#define CRAT_IOLINK_FLAGS_COHERENCY	0x00000002
> +#define CRAT_IOLINK_FLAGS_RESERVED	0xfffffffc
> +
> +/*
> + * IO interface types
> + */
> +#define CRAT_IOLINK_TYPE_UNDEFINED	0
> +#define CRAT_IOLINK_TYPE_HYPERTRANSPORT	1
> +#define CRAT_IOLINK_TYPE_PCIEXPRESS	2
> +#define CRAT_IOLINK_TYPE_OTHER		3
> +#define CRAT_IOLINK_TYPE_MAX		255
> +
> +#define CRAT_IOLINK_RESERVED_LENGTH 24
> +
> +struct crat_subtype_iolink {
> +	uint8_t		type;
> +	uint8_t		length;
> +	uint16_t	reserved;
> +	uint32_t	flags;
> +	uint32_t	proximity_domain_from;
> +	uint32_t	proximity_domain_to;
> +	uint8_t		io_interface_type;
> +	uint8_t		version_major;
> +	uint16_t	version_minor;
> +	uint32_t	minimum_latency;
> +	uint32_t	maximum_latency;
> +	uint32_t	minimum_bandwidth_mbs;
> +	uint32_t	maximum_bandwidth_mbs;
> +	uint32_t	recommended_transfer_size;
> +	uint8_t		reserved2[CRAT_IOLINK_RESERVED_LENGTH];
> +};
> +
> +/*
> + * HSA generic sub-type header
> + */
> +
> +#define CRAT_SUBTYPE_FLAGS_ENABLED 0x00000001
> +
> +struct crat_subtype_generic {
> +	uint8_t		type;
> +	uint8_t		length;
> +	uint16_t	reserved;
> +	uint32_t	flags;
> +};
> +
> +/*
> + * Component Locality Distance Information Table (CDIT)
> + */
> +#define CDIT_OEMID_LENGTH	6
> +#define CDIT_OEMTABLEID_LENGTH	8
> +
> +struct cdit_header {
> +	uint32_t	signature;
> +	uint32_t	length;
> +	uint8_t		revision;
> +	uint8_t		checksum;
> +	uint8_t		oem_id[CDIT_OEMID_LENGTH];
> +	uint8_t		oem_table_id[CDIT_OEMTABLEID_LENGTH];
> +	uint32_t	oem_revision;
> +	uint32_t	creator_id;
> +	uint32_t	creator_revision;
> +	uint32_t	total_entries;
> +	uint16_t	num_domains;
> +	uint8_t		entry[1];
> +};
> +
> +#pragma pack()
> +
> +#endif /* KFD_CRAT_H_INCLUDED */
> diff --git a/drivers/gpu/hsa/radeon/kfd_device.c b/drivers/gpu/hsa/radeon/kfd_device.c
> new file mode 100644
> index 0000000..d122920
> --- /dev/null
> +++ b/drivers/gpu/hsa/radeon/kfd_device.c
> @@ -0,0 +1,162 @@
> +/*
> + * Copyright 2014 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + */
> +
> +#include <linux/amd-iommu.h>
> +#include <linux/bsearch.h>
> +#include <linux/pci.h>
> +#include <linux/slab.h>
> +#include "kfd_priv.h"
> +#include "kfd_scheduler.h"
> +
> +static const struct kfd_device_info bonaire_device_info = {
> +	.max_pasid_bits = 16,
> +};
> +
> +struct kfd_deviceid {
> +	unsigned short did;
> +	const struct kfd_device_info *device_info;
> +};
> +
> +/* Please keep this sorted by increasing device id. */
> +static const struct kfd_deviceid supported_devices[] = {
> +	{ 0x1305, &bonaire_device_info },	/* Kaveri */
> +	{ 0x1307, &bonaire_device_info },	/* Kaveri */
> +	{ 0x130F, &bonaire_device_info },	/* Kaveri */
> +	{ 0x665C, &bonaire_device_info },	/* Bonaire */
> +};
> +
> +static const struct kfd_device_info *
> +lookup_device_info(unsigned short did)
> +{
> +	size_t i;
> +
> +	for (i = 0; i < ARRAY_SIZE(supported_devices); i++) {
> +		if (supported_devices[i].did == did) {
> +			BUG_ON(supported_devices[i].device_info == NULL);
> +			return supported_devices[i].device_info;
> +		}
> +	}
> +
> +	return NULL;
> +}
> +
> +struct kfd_dev *kgd2kfd_probe(struct kgd_dev *kgd, struct pci_dev *pdev)
> +{
> +	struct kfd_dev *kfd;
> +
> +	const struct kfd_device_info *device_info = lookup_device_info(pdev->device);
> +
> +	if (!device_info)
> +		return NULL;
> +
> +	kfd = kzalloc(sizeof(*kfd), GFP_KERNEL);
> +	kfd->kgd = kgd;
> +	kfd->device_info = device_info;
> +	kfd->pdev = pdev;
> +
> +	return kfd;
> +}
> +
> +static bool
> +device_iommu_pasid_init(struct kfd_dev *kfd)
> +{
> +	const u32 required_iommu_flags = AMD_IOMMU_DEVICE_FLAG_ATS_SUP | AMD_IOMMU_DEVICE_FLAG_PRI_SUP
> +					| AMD_IOMMU_DEVICE_FLAG_PASID_SUP;
> +
> +	struct amd_iommu_device_info iommu_info;
> +	pasid_t pasid_limit;
> +	int err;
> +
> +	err = amd_iommu_device_info(kfd->pdev, &iommu_info);
> +	if (err < 0)
> +		return false;
> +
> +	if ((iommu_info.flags & required_iommu_flags) != required_iommu_flags)
> +		return false;
> +
> +	pasid_limit = min_t(pasid_t, (pasid_t)1 << kfd->device_info->max_pasid_bits, iommu_info.max_pasids);
> +	pasid_limit = min_t(pasid_t, pasid_limit, kfd->doorbell_process_limit);
> +
> +	err = amd_iommu_init_device(kfd->pdev, pasid_limit);
> +	if (err < 0)
> +		return false;
> +
> +	if (!radeon_kfd_set_pasid_limit(pasid_limit)) {
> +		amd_iommu_free_device(kfd->pdev);
> +		return false;
> +	}
> +
> +	return true;
> +}
> +
> +static void iommu_pasid_shutdown_callback(struct pci_dev *pdev, int pasid)
> +{
> +	struct kfd_dev *dev = radeon_kfd_device_by_pci_dev(pdev);
> +
> +	if (dev)
> +		radeon_kfd_unbind_process_from_device(dev, pasid);
> +}
> +
> +bool kgd2kfd_device_init(struct kfd_dev *kfd,
> +			 const struct kgd2kfd_shared_resources *gpu_resources)
> +{
> +	kfd->shared_resources = *gpu_resources;
> +
> +	kfd->regs = gpu_resources->mmio_registers;
> +
> +	if (!device_iommu_pasid_init(kfd))
> +		return false;
> +
> +	if (kfd_topology_add_device(kfd) != 0) {
> +		amd_iommu_free_device(kfd->pdev);
> +		return false;
> +	}
> +
> +	amd_iommu_set_invalidate_ctx_cb(kfd->pdev, iommu_pasid_shutdown_callback);
> +
> +	if (kfd->device_info->scheduler_class->create(kfd, &kfd->scheduler)) {
> +		amd_iommu_free_device(kfd->pdev);
> +		return false;
> +	}
> +
> +	kfd->device_info->scheduler_class->start(kfd->scheduler);
> +
> +	kfd->init_complete = true;
> +
> +	return true;
> +}
> +
> +void kgd2kfd_device_exit(struct kfd_dev *kfd)
> +{
> +	int err = kfd_topology_remove_device(kfd);
> +
> +	BUG_ON(err != 0);
> +
> +	if (kfd->init_complete) {
> +		kfd->device_info->scheduler_class->stop(kfd->scheduler);
> +		kfd->device_info->scheduler_class->destroy(kfd->scheduler);
> +
> +		amd_iommu_free_device(kfd->pdev);
> +	}
> +
> +	kfree(kfd);
> +}
> diff --git a/drivers/gpu/hsa/radeon/kfd_module.c b/drivers/gpu/hsa/radeon/kfd_module.c
> new file mode 100644
> index 0000000..6978bc0
> --- /dev/null
> +++ b/drivers/gpu/hsa/radeon/kfd_module.c
> @@ -0,0 +1,117 @@
> +/*
> + * Copyright 2014 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + */
> +
> +#include <linux/module.h>
> +#include <linux/sched.h>
> +#include <linux/notifier.h>
> +
> +#include "kfd_priv.h"
> +
> +#define DRIVER_AUTHOR		"Andrew Lewycky, Oded Gabbay, Evgeny Pinchuk, others."
> +
> +#define DRIVER_NAME		"kfd"
> +#define DRIVER_DESC		"AMD HSA Kernel Fusion Driver"
> +#define DRIVER_DATE		"20140127"
> +
> +const struct kfd2kgd_calls *kfd2kgd;
> +static const struct kgd2kfd_calls kgd2kfd = {
> +	.exit		= kgd2kfd_exit,
> +	.probe		= kgd2kfd_probe,
> +	.device_init	= kgd2kfd_device_init,
> +	.device_exit	= kgd2kfd_device_exit,
> +};
> +
> +bool kgd2kfd_init(unsigned interface_version,
> +		  const struct kfd2kgd_calls *f2g,
> +		  const struct kgd2kfd_calls **g2f)
> +{
> +	/* Only one interface version is supported, no kfd/kgd version skew allowed. */
> +	if (interface_version != KFD_INTERFACE_VERSION)
> +		return false;
> +
> +	kfd2kgd = f2g;
> +	*g2f = &kgd2kfd;
> +
> +	return true;
> +}
> +EXPORT_SYMBOL(kgd2kfd_init);
> +
> +void kgd2kfd_exit(void)
> +{
> +}
> +
> +extern int kfd_process_exit(struct notifier_block *nb,
> +				unsigned long action, void *data);
> +
> +static struct notifier_block kfd_mmput_nb = {
> +	.notifier_call		= kfd_process_exit,
> +	.priority		= 3,
> +};
> +
> +static int __init kfd_module_init(void)
> +{
> +	int err;
> +
> +	err = radeon_kfd_pasid_init();
> +	if (err < 0)
> +		goto err_pasid;
> +
> +	err = radeon_kfd_chardev_init();
> +	if (err < 0)
> +		goto err_ioctl;
> +
> +	err = mmput_register_notifier(&kfd_mmput_nb);
> +	if (err)
> +		goto err_mmu_notifier;
> +
> +	err = kfd_topology_init();
> +	if (err < 0)
> +		goto err_topology;
> +
> +	pr_info("[hsa] Initialized kfd module");
> +
> +	return 0;
> +err_topology:
> +	mmput_unregister_notifier(&kfd_mmput_nb);
> +err_mmu_notifier:
> +	radeon_kfd_chardev_exit();
> +err_ioctl:
> +	radeon_kfd_pasid_exit();
> +err_pasid:
> +	return err;
> +}
> +
> +static void __exit kfd_module_exit(void)
> +{
> +	kfd_topology_shutdown();
> +	mmput_unregister_notifier(&kfd_mmput_nb);
> +	radeon_kfd_chardev_exit();
> +	radeon_kfd_pasid_exit();
> +	pr_info("[hsa] Removed kfd module");
> +}
> +
> +module_init(kfd_module_init);
> +module_exit(kfd_module_exit);
> +
> +MODULE_AUTHOR(DRIVER_AUTHOR);
> +MODULE_DESCRIPTION(DRIVER_DESC);
> +MODULE_LICENSE("GPL");

If this is GPL, then the comment at the top of every file must reflect that
rather than using some specially worded license.

> diff --git a/drivers/gpu/hsa/radeon/kfd_pasid.c b/drivers/gpu/hsa/radeon/kfd_pasid.c
> new file mode 100644
> index 0000000..d78bd00
> --- /dev/null
> +++ b/drivers/gpu/hsa/radeon/kfd_pasid.c
> @@ -0,0 +1,92 @@
> +/*
> + * Copyright 2014 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + */
> +
> +#include <linux/slab.h>
> +#include <linux/types.h>
> +#include "kfd_priv.h"
> +
> +#define INITIAL_PASID_LIMIT (1<<20)
> +
> +static unsigned long *pasid_bitmap;
> +static pasid_t pasid_limit;
> +static DEFINE_MUTEX(pasid_mutex);
> +
> +int radeon_kfd_pasid_init(void)
> +{
> +	pasid_limit = INITIAL_PASID_LIMIT;
> +
> +	pasid_bitmap = kzalloc(DIV_ROUND_UP(INITIAL_PASID_LIMIT, BITS_PER_BYTE), GFP_KERNEL);
> +	if (!pasid_bitmap)
> +		return -ENOMEM;
> +
> +	set_bit(0, pasid_bitmap); /* PASID 0 is reserved. */
> +
> +	return 0;
> +}
> +
> +void radeon_kfd_pasid_exit(void)
> +{
> +	kfree(pasid_bitmap);
> +}
> +
> +bool radeon_kfd_set_pasid_limit(pasid_t new_limit)
> +{
> +	if (new_limit < pasid_limit) {
> +		bool ok;
> +
> +		mutex_lock(&pasid_mutex);
> +
> +		/* ensure that no pasids >= new_limit are in-use */
> +		ok = (find_next_bit(pasid_bitmap, pasid_limit, new_limit) == pasid_limit);
> +		if (ok)
> +			pasid_limit = new_limit;
> +
> +		mutex_unlock(&pasid_mutex);
> +
> +		return ok;
> +	}
> +
> +	return true;
> +}
> +
> +pasid_t radeon_kfd_pasid_alloc(void)
> +{
> +	pasid_t found;
> +
> +	mutex_lock(&pasid_mutex);
> +
> +	found = find_first_zero_bit(pasid_bitmap, pasid_limit);
> +	if (found == pasid_limit)
> +		found = 0;
> +	else
> +		set_bit(found, pasid_bitmap);
> +
> +	mutex_unlock(&pasid_mutex);
> +
> +	return found;
> +}
> +
> +void radeon_kfd_pasid_free(pasid_t pasid)
> +{
> +	BUG_ON(pasid == 0 || pasid >= pasid_limit);
> +	clear_bit(pasid, pasid_bitmap);
> +}
> diff --git a/drivers/gpu/hsa/radeon/kfd_priv.h b/drivers/gpu/hsa/radeon/kfd_priv.h
> new file mode 100644
> index 0000000..1d1dbcf
> --- /dev/null
> +++ b/drivers/gpu/hsa/radeon/kfd_priv.h
> @@ -0,0 +1,232 @@
> +/*
> + * Copyright 2014 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + */
> +
> +#ifndef KFD_PRIV_H_INCLUDED
> +#define KFD_PRIV_H_INCLUDED
> +
> +#include <linux/hashtable.h>
> +#include <linux/mmu_notifier.h>
> +#include <linux/mutex.h>
> +#include <linux/radeon_kfd.h>
> +#include <linux/types.h>
> +
> +struct kfd_scheduler_class;
> +
> +#define MAX_KFD_DEVICES 16	/* Global limit - only MAX_KFD_DEVICES will be supported by KFD. */
> +
> +/*
> + * Per-process limit. Each process can only
> + * create MAX_PROCESS_QUEUES across all devices
> + */
> +#define MAX_PROCESS_QUEUES 1024
> +
> +#define MAX_DOORBELL_INDEX MAX_PROCESS_QUEUES
> +#define KFD_SYSFS_FILE_MODE 0444
> +
> +/* We multiplex different sorts of mmap-able memory onto /dev/kfd.
> +** We figure out what type of memory the caller wanted by comparing the mmap page offset to known ranges. */
> +#define KFD_MMAP_DOORBELL_START	(((1ULL << 32)*1) >> PAGE_SHIFT)
> +#define KFD_MMAP_DOORBELL_END	(((1ULL << 32)*2) >> PAGE_SHIFT)
> +
> +/* GPU ID hash width in bits */
> +#define KFD_GPU_ID_HASH_WIDTH 16
> +
> +/* Macro for allocating structures */
> +#define kfd_alloc_struct(ptr_to_struct)	((typeof(ptr_to_struct)) kzalloc(sizeof(*ptr_to_struct), GFP_KERNEL))
> +
> +/* Large enough to hold the maximum usable pasid + 1.
> +** It must also be able to store the number of doorbells reported by a KFD device. */
> +typedef unsigned int pasid_t;

Same on unsigned int.

> +
> +/* Type that represents a HW doorbell slot. */
> +typedef u32 doorbell_t;
> +
> +struct kfd_device_info {
> +	const struct kfd_scheduler_class *scheduler_class;
> +	unsigned int max_pasid_bits;
> +};
> +
> +struct kfd_dev {
> +	struct kgd_dev *kgd;
> +
> +	const struct kfd_device_info *device_info;
> +	struct pci_dev *pdev;
> +
> +	void __iomem *regs;
> +
> +	bool init_complete;
> +
> +	unsigned int id;		/* topology stub index */
> +
> +	phys_addr_t doorbell_base;	/* Start of actual doorbells used by
> +					 * KFD. It is aligned for mapping
> +					 * into user mode
> +					 */
> +	size_t doorbell_id_offset;	/* Doorbell offset (from KFD doorbell
> +					 * to HW doorbell, GFX reserved some
> +					 * at the start)
> +					 */
> +	size_t doorbell_process_limit;	/* Number of processes we have doorbell space for. */
> +
> +	struct kgd2kfd_shared_resources shared_resources;
> +
> +	struct kfd_scheduler *scheduler;
> +};
> +
> +/* KGD2KFD callbacks */
> +void kgd2kfd_exit(void);
> +struct kfd_dev *kgd2kfd_probe(struct kgd_dev *kgd, struct pci_dev *pdev);
> +bool kgd2kfd_device_init(struct kfd_dev *kfd,
> +			 const struct kgd2kfd_shared_resources *gpu_resources);
> +void kgd2kfd_device_exit(struct kfd_dev *kfd);
> +
> +extern const struct kfd2kgd_calls *kfd2kgd;
> +
> +
> +/* KFD2KGD callback wrappers */
> +void radeon_kfd_lock_srbm_index(struct kfd_dev *kfd);
> +void radeon_kfd_unlock_srbm_index(struct kfd_dev *kfd);
> +
> +enum kfd_mempool {
> +	KFD_MEMPOOL_SYSTEM_CACHEABLE = 1,
> +	KFD_MEMPOOL_SYSTEM_WRITECOMBINE = 2,
> +	KFD_MEMPOOL_FRAMEBUFFER = 3,
> +};
> +
> +struct kfd_mem_obj_s; /* Dummy struct just to make kfd_mem_obj* a unique pointer type. */
> +typedef struct kfd_mem_obj_s *kfd_mem_obj;
> +
> +int radeon_kfd_vidmem_alloc(struct kfd_dev *kfd, size_t size, size_t alignment,
> +				enum kfd_mempool pool, kfd_mem_obj *mem_obj);
> +void radeon_kfd_vidmem_free(struct kfd_dev *kfd, kfd_mem_obj mem_obj);
> +int radeon_kfd_vidmem_gpumap(struct kfd_dev *kfd, kfd_mem_obj mem_obj, uint64_t *vmid0_address);
> +void radeon_kfd_vidmem_ungpumap(struct kfd_dev *kfd, kfd_mem_obj mem_obj);
> +int radeon_kfd_vidmem_kmap(struct kfd_dev *kfd, kfd_mem_obj mem_obj, void **ptr);
> +void radeon_kfd_vidmem_unkmap(struct kfd_dev *kfd, kfd_mem_obj mem_obj);
> +
> +/* Character device interface */
> +int radeon_kfd_chardev_init(void);
> +void radeon_kfd_chardev_exit(void);
> +struct device *radeon_kfd_chardev(void);
> +
> +/* Scheduler */
> +struct kfd_scheduler;
> +struct kfd_scheduler_process;
> +struct kfd_scheduler_queue {
> +	uint64_t dummy;
> +};
> +
> +struct kfd_queue {
> +	struct kfd_dev *dev;
> +
> +	/* scheduler_queue must be last. It is variable sized (dev->device_info->scheduler_class->queue_size) */
> +	struct kfd_scheduler_queue scheduler_queue;
> +};
> +
> +/* Data that is per-process-per device. */
> +struct kfd_process_device {
> +	/* List of all per-device data for a process. Starts from kfd_process.per_device_data. */
> +	struct list_head per_device_list;
> +
> +	/* The device that owns this data. */
> +	struct kfd_dev *dev;
> +
> +	/* The user-mode address of the doorbell mapping for this device. */
> +	doorbell_t __user *doorbell_mapping;
> +
> +	/* The number of queues created by this process for this device. */
> +	uint32_t queue_count;
> +
> +	/* Scheduler process data for this device. */
> +	struct kfd_scheduler_process *scheduler_process;
> +
> +	/* Is this process/pasid bound to this device? (amd_iommu_bind_pasid) */
> +	bool bound;
> +};
> +
> +/* Process data */
> +struct kfd_process {
> +	struct list_head processes_list;
> +
> +	struct mm_struct *mm;
> +
> +	struct mutex mutex;
> +
> +	/* In any process, the thread that started main() is the lead thread and outlives the rest.
> +	 * It is here because amd_iommu_bind_pasid wants a task_struct. */
> +	struct task_struct *lead_thread;
> +
> +	pasid_t pasid;
> +
> +	/* List of kfd_process_device structures, one for each device the process is using. */
> +	struct list_head per_device_data;
> +
> +	/* The process's queues. */
> +	size_t queue_array_size;
> +	struct kfd_queue **queues;	/* Size is queue_array_size, up to MAX_PROCESS_QUEUES. */
> +	unsigned long allocated_queue_bitmap[DIV_ROUND_UP(MAX_PROCESS_QUEUES, BITS_PER_LONG)];
> +};
> +
> +struct kfd_process *radeon_kfd_create_process(const struct task_struct *);
> +struct kfd_process *radeon_kfd_get_process(const struct task_struct *);
> +
> +struct kfd_process_device *radeon_kfd_bind_process_to_device(struct kfd_dev *dev, struct kfd_process *p);
> +void radeon_kfd_unbind_process_from_device(struct kfd_dev *dev, pasid_t pasid);
> +struct kfd_process_device *radeon_kfd_get_process_device_data(struct kfd_dev *dev, struct kfd_process *p);
> +
> +bool radeon_kfd_allocate_queue_id(struct kfd_process *p, unsigned int *queue_id);
> +void radeon_kfd_install_queue(struct kfd_process *p, unsigned int queue_id, struct kfd_queue *queue);
> +void radeon_kfd_remove_queue(struct kfd_process *p, unsigned int queue_id);
> +struct kfd_queue *radeon_kfd_get_queue(struct kfd_process *p, unsigned int queue_id);
> +
> +
> +/* PASIDs */
> +int radeon_kfd_pasid_init(void);
> +void radeon_kfd_pasid_exit(void);
> +bool radeon_kfd_set_pasid_limit(pasid_t new_limit);
> +pasid_t radeon_kfd_pasid_alloc(void);
> +void radeon_kfd_pasid_free(pasid_t pasid);
> +
> +/* Doorbells */
> +void radeon_kfd_doorbell_init(struct kfd_dev *kfd);
> +int radeon_kfd_doorbell_mmap(struct kfd_process *process, struct vm_area_struct *vma);
> +doorbell_t __user *radeon_kfd_get_doorbell(struct file *devkfd, struct kfd_process *process, struct kfd_dev *dev,
> +					   unsigned int doorbell_index);
> +unsigned int radeon_kfd_queue_id_to_doorbell(struct kfd_dev *kfd, struct kfd_process *process, unsigned int queue_id);
> +
> +extern struct device *kfd_device;
> +
> +/* Topology */
> +int kfd_topology_init(void);
> +void kfd_topology_shutdown(void);
> +int kfd_topology_add_device(struct kfd_dev *gpu);
> +int kfd_topology_remove_device(struct kfd_dev *gpu);
> +struct kfd_dev *radeon_kfd_device_by_id(uint32_t gpu_id);
> +struct kfd_dev *radeon_kfd_device_by_pci_dev(const struct pci_dev *pdev);
> +
> +/* MMIO registers */
> +#define WRITE_REG(dev, reg, value) radeon_kfd_write_reg((dev), (reg), (value))
> +#define READ_REG(dev, reg) radeon_kfd_read_reg((dev), (reg))
> +void radeon_kfd_write_reg(struct kfd_dev *dev, uint32_t reg, uint32_t value);
> +uint32_t radeon_kfd_read_reg(struct kfd_dev *dev, uint32_t reg);
> +
> +#endif
> diff --git a/drivers/gpu/hsa/radeon/kfd_process.c b/drivers/gpu/hsa/radeon/kfd_process.c
> new file mode 100644
> index 0000000..145ee38
> --- /dev/null
> +++ b/drivers/gpu/hsa/radeon/kfd_process.c
> @@ -0,0 +1,400 @@
> +/*
> + * Copyright 2014 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + */
> +
> +#include <linux/mutex.h>
> +#include <linux/log2.h>
> +#include <linux/sched.h>
> +#include <linux/slab.h>
> +#include <linux/amd-iommu.h>
> +#include <linux/notifier.h>
> +struct mm_struct;
> +
> +#include "kfd_priv.h"
> +#include "kfd_scheduler.h"
> +
> +/* Initial size for the array of queues.
> + * The allocated size is doubled each time it is exceeded up to MAX_PROCESS_QUEUES. */
> +#define INITIAL_QUEUE_ARRAY_SIZE 16
> +
> +/* List of struct kfd_process */
> +static struct list_head kfd_processes_list = LIST_HEAD_INIT(kfd_processes_list);
> +
> +static DEFINE_MUTEX(kfd_processes_mutex);
> +
> +static struct kfd_process *create_process(const struct task_struct *thread);
> +
> +struct kfd_process*
> +radeon_kfd_create_process(const struct task_struct *thread)
> +{
> +	struct kfd_process *process;
> +
> +	if (thread->mm == NULL)
> +		return ERR_PTR(-EINVAL);
> +
> +	/* Only the pthreads threading model is supported. */
> +	if (thread->group_leader->mm != thread->mm)
> +		return ERR_PTR(-EINVAL);
> +
> +	/*
> +	 * take kfd processes mutex before starting of process creation
> +	 * so there won't be a case where two threads of the same process
> +	 * create two kfd_process structures
> +	 */
> +	mutex_lock(&kfd_processes_mutex);

Given that this is to protect mm->kfd_process, I would rather you use
some mm lock, so that if other non-kfd code ever needs to check this
variable in a sensible way, it can protect itself with an mm lock.

But again, I believe that mm_struct should not have a new kfd field but
rather some generic iommu pasid field that can then forward things to
kfd through generic iommu code.
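
To illustrate the locking point only: below is a minimal userspace sketch in
which the lock lives next to the field it guards, so any code that can see
the field can also take the right lock. Everything in it is hypothetical
(fake_mm, accel_ctx and the helper names are stand-ins, and a pthread mutex
stands in for whatever mm lock would be appropriate); it is not a proposal
for the actual mm_struct layout.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-process accelerator state. */
struct accel_ctx {
	unsigned int pasid;
};

/* Hypothetical stand-in for mm_struct: the lock sits beside the field. */
struct fake_mm {
	pthread_mutex_t lock;
	struct accel_ctx *ctx;
};

static void fake_mm_init(struct fake_mm *mm)
{
	pthread_mutex_init(&mm->lock, NULL);
	mm->ctx = NULL;
}

/* Return the existing context, or create one; NULL if allocation fails. */
static struct accel_ctx *fake_mm_get_or_create_ctx(struct fake_mm *mm,
						   unsigned int pasid)
{
	struct accel_ctx *ctx;

	pthread_mutex_lock(&mm->lock);
	ctx = mm->ctx;
	if (!ctx) {
		ctx = calloc(1, sizeof(*ctx));
		if (ctx) {
			ctx->pasid = pasid;
			mm->ctx = ctx;
		}
	}
	pthread_mutex_unlock(&mm->lock);

	return ctx;
}

/* Unrelated code can inspect the field safely under the same lock. */
static int fake_mm_has_ctx(struct fake_mm *mm)
{
	int present;

	pthread_mutex_lock(&mm->lock);
	present = (mm->ctx != NULL);
	pthread_mutex_unlock(&mm->lock);

	return present;
}
```

With a driver-private global mutex, as in the patch, a second caller of
fake_mm_has_ctx() outside the driver would have no lock it could take.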

> +
> +	/* A prior open of /dev/kfd could have already created the process. */
> +	process = thread->mm->kfd_process;
> +	if (process)
> +		pr_debug("kfd: process already found\n");
> +
> +	if (!process)
> +		process = create_process(thread);
> +
> +	mutex_unlock(&kfd_processes_mutex);
> +
> +	return process;
> +}
> +
> +struct kfd_process*
> +radeon_kfd_get_process(const struct task_struct *thread)
> +{
> +	struct kfd_process *process;
> +
> +	if (thread->mm == NULL)
> +		return ERR_PTR(-EINVAL);
> +
> +	/* Only the pthreads threading model is supported. */
> +	if (thread->group_leader->mm != thread->mm)
> +		return ERR_PTR(-EINVAL);
> +
> +	process = thread->mm->kfd_process;
> +
> +	return process;
> +}
> +
> +/* Assumes that the kfd_process mutex is held.
> + * (Or that it doesn't need to be held because the process is exiting.)
> + *
> + * dev_filter can be set to only destroy queues for one device.
> + * Otherwise all queues for the process are destroyed.
> + */
> +static void
> +destroy_queues(struct kfd_process *p, struct kfd_dev *dev_filter)
> +{
> +	unsigned long queue_id;
> +
> +	for_each_set_bit(queue_id, p->allocated_queue_bitmap, MAX_PROCESS_QUEUES) {
> +
> +		struct kfd_queue *queue = radeon_kfd_get_queue(p, queue_id);
> +		struct kfd_dev *dev;
> +
> +		BUG_ON(queue == NULL);
> +
> +		dev = queue->dev;
> +
> +		if (!dev_filter || dev == dev_filter) {
> +			struct kfd_process_device *pdd = radeon_kfd_get_process_device_data(dev, p);
> +
> +			BUG_ON(pdd == NULL); /* A queue exists so pdd must. */
> +
> +			radeon_kfd_remove_queue(p, queue_id);
> +			dev->device_info->scheduler_class->destroy_queue(dev->scheduler, &queue->scheduler_queue);
> +
> +			kfree(queue);
> +
> +			BUG_ON(pdd->queue_count == 0);
> +			BUG_ON(pdd->scheduler_process == NULL);
> +
> +			if (--pdd->queue_count == 0) {
> +				dev->device_info->scheduler_class->deregister_process(dev->scheduler,
> +							pdd->scheduler_process);
> +				pdd->scheduler_process = NULL;
> +			}
> +		}
> +	}
> +}
> +
> +static void free_process(struct kfd_process *p)
> +{
> +	struct kfd_process_device *pdd, *temp;
> +
> +	BUG_ON(p == NULL);
> +
> +	destroy_queues(p, NULL);
> +
> +	/* doorbell mappings: automatic */
> +
> +	list_for_each_entry_safe(pdd, temp, &p->per_device_data, per_device_list) {
> +		amd_iommu_unbind_pasid(pdd->dev->pdev, p->pasid);
> +		list_del(&pdd->per_device_list);
> +		kfree(pdd);
> +	}
> +
> +	radeon_kfd_pasid_free(p->pasid);
> +
> +	mutex_destroy(&p->mutex);
> +
> +	kfree(p->queues);
> +
> +	list_del(&p->processes_list);
> +
> +	kfree(p);
> +}
> +
> +int kfd_process_exit(struct notifier_block *nb,
> +			unsigned long action, void *data)
> +{
> +	struct mm_struct *mm = data;
> +	struct kfd_process *p;
> +
> +	mutex_lock(&kfd_processes_mutex);
> +
> +	p = mm->kfd_process;
> +	if (p) {
> +		free_process(p);
> +		mm->kfd_process = NULL;
> +	}
> +
> +	mutex_unlock(&kfd_processes_mutex);
> +
> +	return 0;
> +}
> +
> +static struct kfd_process *create_process(const struct task_struct *thread)
> +{
> +	struct kfd_process *process;
> +	int err = -ENOMEM;
> +
> +	process = kzalloc(sizeof(*process), GFP_KERNEL);
> +
> +	if (!process)
> +		goto err_alloc;
> +
> +	process->queues = kmalloc_array(INITIAL_QUEUE_ARRAY_SIZE, sizeof(process->queues[0]), GFP_KERNEL);
> +	if (!process->queues)
> +		goto err_alloc;
> +
> +	process->pasid = radeon_kfd_pasid_alloc();
> +	if (process->pasid == 0)
> +		goto err_alloc;
> +
> +	mutex_init(&process->mutex);
> +
> +	process->mm = thread->mm;
> +	thread->mm->kfd_process = process;
> +	list_add_tail(&process->processes_list, &kfd_processes_list);
> +
> +	process->lead_thread = thread->group_leader;
> +
> +	process->queue_array_size = INITIAL_QUEUE_ARRAY_SIZE;
> +
> +	INIT_LIST_HEAD(&process->per_device_data);
> +
> +	return process;
> +
> +err_alloc:
> +	kfree(process->queues);
> +	kfree(process);
> +	return ERR_PTR(err);
> +}
> +
> +struct kfd_process_device *
> +radeon_kfd_get_process_device_data(struct kfd_dev *dev, struct kfd_process *p)
> +{
> +	struct kfd_process_device *pdd;
> +
> +	list_for_each_entry(pdd, &p->per_device_data, per_device_list)
> +		if (pdd->dev == dev)
> +			return pdd;
> +
> +	pdd = kzalloc(sizeof(*pdd), GFP_KERNEL);
> +	if (pdd != NULL) {
> +		pdd->dev = dev;
> +		list_add(&pdd->per_device_list, &p->per_device_data);
> +	}
> +
> +	return pdd;
> +}
> +
> +/* Direct the IOMMU to bind the process (specifically the pasid->mm) to the device.
> + * Unbinding occurs when the process dies or the device is removed.
> + *
> + * Assumes that the process lock is held.
> + */
> +struct kfd_process_device *radeon_kfd_bind_process_to_device(struct kfd_dev *dev, struct kfd_process *p)
> +{
> +	struct kfd_process_device *pdd = radeon_kfd_get_process_device_data(dev, p);
> +	int err;
> +
> +	if (pdd == NULL)
> +		return ERR_PTR(-ENOMEM);
> +
> +	if (pdd->bound)
> +		return pdd;
> +
> +	err = amd_iommu_bind_pasid(dev->pdev, p->pasid, p->lead_thread);

Are we to assume that, for eternity, this will not work on IOMMUs that do
support PASID/ATS but are not from AMD? If it were an APU-specific function I
would understand, but it seems that the IOMMU API needs to grow. I am pretty
sure Intel will have an ATS/PASID IOMMU.
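
As a rough sketch of what "growing the API" could look like, assuming a
per-device ops table: the driver calls a generic entry point, and each IOMMU
vendor supplies its own backend. All names below (pasid_ops, fake_dev,
generic_bind_pasid, vendor_a_*) are invented for illustration and do not
correspond to any existing kernel interface.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct fake_dev; /* hypothetical device handle */

/* Hypothetical vendor-neutral PASID operations. */
struct pasid_ops {
	int (*bind)(struct fake_dev *dev, unsigned int pasid);
	void (*unbind)(struct fake_dev *dev, unsigned int pasid);
};

struct fake_dev {
	const struct pasid_ops *ops; /* NULL if the IOMMU has no PASID support */
	unsigned int bound_pasid;    /* 0 means nothing is bound */
};

/* Generic entry points a driver would call instead of a vendor function. */
static int generic_bind_pasid(struct fake_dev *dev, unsigned int pasid)
{
	if (!dev->ops || !dev->ops->bind)
		return -ENODEV;
	return dev->ops->bind(dev, pasid);
}

static void generic_unbind_pasid(struct fake_dev *dev, unsigned int pasid)
{
	if (dev->ops && dev->ops->unbind)
		dev->ops->unbind(dev, pasid);
}

/* One possible backend; another vendor would register its own table. */
static int vendor_a_bind(struct fake_dev *dev, unsigned int pasid)
{
	if (pasid == 0) /* PASID 0 is treated as reserved, as in the patch */
		return -EINVAL;
	dev->bound_pasid = pasid;
	return 0;
}

static void vendor_a_unbind(struct fake_dev *dev, unsigned int pasid)
{
	if (dev->bound_pasid == pasid)
		dev->bound_pasid = 0;
}

static const struct pasid_ops vendor_a_pasid_ops = {
	.bind	= vendor_a_bind,
	.unbind	= vendor_a_unbind,
};
```

A caller such as radeon_kfd_bind_process_to_device() would then depend only
on the generic entry point and would get -ENODEV on hardware without PASID
support.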

> +	if (err < 0)
> +		return ERR_PTR(err);
> +
> +	pdd->bound = true;
> +
> +	return pdd;
> +}
> +
> +void radeon_kfd_unbind_process_from_device(struct kfd_dev *dev, pasid_t pasid)
> +{
> +	struct kfd_process *p;
> +	struct kfd_process_device *pdd;
> +
> +	BUG_ON(dev == NULL);
> +
> +	mutex_lock(&kfd_processes_mutex);
> +
> +	list_for_each_entry(p, &kfd_processes_list, processes_list)
> +		if (p->pasid == pasid)
> +			break;
> +
> +	mutex_unlock(&kfd_processes_mutex);
> +
> +	BUG_ON(p->pasid != pasid);
> +
> +	pdd = radeon_kfd_get_process_device_data(dev, p);
> +
> +	BUG_ON(pdd == NULL);
> +
> +	mutex_lock(&p->mutex);
> +
> +	destroy_queues(p, dev);
> +
> +	/* All queues just got destroyed so this should be gone. */
> +	BUG_ON(pdd->scheduler_process != NULL);
> +
> +	/*
> +	 * Just mark pdd as unbound, because we still need it to call
> +	 * amd_iommu_unbind_pasid() in when the process exits.
> +	 * We don't call amd_iommu_unbind_pasid() here
> +	 * because the IOMMU called us.
> +	 */
> +	pdd->bound = false;
> +
> +	mutex_unlock(&p->mutex);
> +}
> +
> +/* Ensure that the process's queue array is large enough to hold the queue at queue_id.
> + * Assumes that the process lock is held. */
> +static bool ensure_queue_array_size(struct kfd_process *p, unsigned int queue_id)
> +{
> +	size_t desired_size;
> +	struct kfd_queue **new_queues;
> +
> +	compiletime_assert(INITIAL_QUEUE_ARRAY_SIZE > 0, "INITIAL_QUEUE_ARRAY_SIZE must not be 0");
> +	compiletime_assert(INITIAL_QUEUE_ARRAY_SIZE <= MAX_PROCESS_QUEUES,
> +			   "INITIAL_QUEUE_ARRAY_SIZE must be less than MAX_PROCESS_QUEUES");
> +	/* Ensure that doubling the current size won't ever overflow. */
> +	compiletime_assert(MAX_PROCESS_QUEUES < SIZE_MAX / 2, "MAX_PROCESS_QUEUES must be less than SIZE_MAX/2");
> +
> +	/*
> +	 * These & queue_id < MAX_PROCESS_QUEUES guarantee that
> +	 * the desired_size calculation will end up <= MAX_PROCESS_QUEUES
> +	 */
> +	compiletime_assert(is_power_of_2(INITIAL_QUEUE_ARRAY_SIZE), "INITIAL_QUEUE_ARRAY_SIZE must be power of 2.");
> +	compiletime_assert(MAX_PROCESS_QUEUES % INITIAL_QUEUE_ARRAY_SIZE == 0,
> +			   "MAX_PROCESS_QUEUES must be multiple of INITIAL_QUEUE_ARRAY_SIZE.");
> +	compiletime_assert(is_power_of_2(MAX_PROCESS_QUEUES / INITIAL_QUEUE_ARRAY_SIZE),
> +			   "MAX_PROCESS_QUEUES must be a power-of-2 multiple of INITIAL_QUEUE_ARRAY_SIZE.");
> +
> +	if (queue_id < p->queue_array_size)
> +		return true;
> +
> +	if (queue_id >= MAX_PROCESS_QUEUES)
> +		return false;
> +
> +	desired_size = p->queue_array_size;
> +	while (desired_size <= queue_id)
> +		desired_size *= 2;
> +
> +	BUG_ON(desired_size < queue_id || desired_size > MAX_PROCESS_QUEUES);
> +	BUG_ON(desired_size % INITIAL_QUEUE_ARRAY_SIZE != 0 || !is_power_of_2(desired_size / INITIAL_QUEUE_ARRAY_SIZE));
> +
> +	new_queues = kmalloc_array(desired_size, sizeof(p->queues[0]), GFP_KERNEL);
> +	if (!new_queues)
> +		return false;
> +
> +	memcpy(new_queues, p->queues, p->queue_array_size * sizeof(p->queues[0]));
> +
> +	kfree(p->queues);
> +	p->queues = new_queues;
> +	p->queue_array_size = desired_size;
> +
> +	return true;
> +}
> +
> +/* Assumes that the process lock is held. */
> +bool radeon_kfd_allocate_queue_id(struct kfd_process *p, unsigned int *queue_id)
> +{
> +	unsigned int qid = find_first_zero_bit(p->allocated_queue_bitmap, MAX_PROCESS_QUEUES);
> +
> +	if (qid >= MAX_PROCESS_QUEUES)
> +		return false;
> +
> +	if (!ensure_queue_array_size(p, qid))
> +		return false;
> +
> +	__set_bit(qid, p->allocated_queue_bitmap);
> +
> +	p->queues[qid] = NULL;
> +	*queue_id = qid;
> +
> +	return true;
> +}
> +
> +/* Install a queue into a previously-allocated queue id.
> + *  Assumes that the process lock is held. */
> +void radeon_kfd_install_queue(struct kfd_process *p, unsigned int queue_id, struct kfd_queue *queue)
> +{
> +	BUG_ON(queue_id >= p->queue_array_size); /* Have to call allocate_queue_id before install_queue. */
> +	BUG_ON(queue == NULL);
> +
> +	p->queues[queue_id] = queue;
> +}
> +
> +/* Remove a queue from the open queue list and deallocate the queue id.
> + * This can be called whether or not a queue was installed.
> + * Assumes that the process lock is held. */
> +void radeon_kfd_remove_queue(struct kfd_process *p, unsigned int queue_id)
> +{
> +	BUG_ON(!test_bit(queue_id, p->allocated_queue_bitmap));
> +	BUG_ON(queue_id >= p->queue_array_size);
> +
> +	__clear_bit(queue_id, p->allocated_queue_bitmap);
> +}
> +
> +/* Assumes that the process lock is held. */
> +struct kfd_queue *radeon_kfd_get_queue(struct kfd_process *p, unsigned int queue_id)
> +{
> +	/* test_bit because the contents of unallocated queue slots are undefined.
> +	 * Otherwise ensure_queue_array_size would have to clear new entries and
> +	 * remove_queue would have to NULL removed queues. */
> +	return (queue_id < p->queue_array_size &&
> +		test_bit(queue_id, p->allocated_queue_bitmap)) ?
> +			p->queues[queue_id] : NULL;
> +}
> diff --git a/drivers/gpu/hsa/radeon/kfd_scheduler.h b/drivers/gpu/hsa/radeon/kfd_scheduler.h
> new file mode 100644
> index 0000000..48a032f
> --- /dev/null
> +++ b/drivers/gpu/hsa/radeon/kfd_scheduler.h
> @@ -0,0 +1,62 @@
> +/*
> + * Copyright 2014 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + */
> +
> +#ifndef KFD_SCHEDULER_H_INCLUDED
> +#define KFD_SCHEDULER_H_INCLUDED
> +
> +#include <linux/types.h>
> +struct kfd_process;
> +
> +/* Opaque types for scheduler private data. */
> +struct kfd_scheduler;
> +struct kfd_scheduler_process;
> +struct kfd_scheduler_queue;
> +
> +struct kfd_scheduler_class {
> +	const char *name;
> +
> +	int (*create)(struct kfd_dev *, struct kfd_scheduler **);
> +	void (*destroy)(struct kfd_scheduler *);
> +
> +	void (*start)(struct kfd_scheduler *);
> +	void (*stop)(struct kfd_scheduler *);
> +
> +	int (*register_process)(struct kfd_scheduler *, struct kfd_process *, struct kfd_scheduler_process **);
> +	void (*deregister_process)(struct kfd_scheduler *, struct kfd_scheduler_process *);
> +
> +	size_t queue_size;
> +
> +	int (*create_queue)(struct kfd_scheduler *scheduler,
> +			    struct kfd_scheduler_process *process,
> +			    struct kfd_scheduler_queue *queue,
> +			    void __user *ring_address,
> +			    uint64_t ring_size,
> +			    void __user *rptr_address,
> +			    void __user *wptr_address,
> +			    unsigned int doorbell);
> +
> +	void (*destroy_queue)(struct kfd_scheduler *, struct kfd_scheduler_queue *);
> +};
> +
> +extern const struct kfd_scheduler_class radeon_kfd_cik_static_scheduler_class;
> +
> +#endif
> diff --git a/drivers/gpu/hsa/radeon/kfd_topology.c b/drivers/gpu/hsa/radeon/kfd_topology.c
> new file mode 100644
> index 0000000..6acac25
> --- /dev/null
> +++ b/drivers/gpu/hsa/radeon/kfd_topology.c
> @@ -0,0 +1,1201 @@
> +/*
> + * Copyright 2014 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + */
> +
> +#include <linux/types.h>
> +#include <linux/kernel.h>
> +#include <linux/pci.h>
> +#include <linux/errno.h>
> +#include <linux/acpi.h>
> +#include <linux/hash.h>
> +
> +#include "kfd_priv.h"
> +#include "kfd_crat.h"
> +#include "kfd_topology.h"
> +
> +static struct list_head topology_device_list;
> +static int topology_crat_parsed;
> +static struct kfd_system_properties sys_props;
> +
> +static DECLARE_RWSEM(topology_lock);
> +
> +
> +static uint8_t checksum_image(const void *buf, size_t len)
> +{
> +	uint8_t *p = (uint8_t *)buf;
> +	uint8_t sum = 0;
> +
> +	if (!buf)
> +		return 0;
> +
> +	while (len-- > 0)
> +		sum += *p++;
> +
> +	return sum;
> +}
> +
> +struct kfd_dev *radeon_kfd_device_by_id(uint32_t gpu_id)
> +{
> +	struct kfd_topology_device *top_dev;
> +	struct kfd_dev *device = NULL;
> +
> +	down_read(&topology_lock);
> +
> +	list_for_each_entry(top_dev, &topology_device_list, list)
> +		if (top_dev->gpu_id == gpu_id) {
> +			device = top_dev->gpu;
> +			break;
> +		}
> +
> +	up_read(&topology_lock);
> +
> +	return device;
> +}
> +
> +struct kfd_dev *radeon_kfd_device_by_pci_dev(const struct pci_dev *pdev)
> +{
> +	struct kfd_topology_device *top_dev;
> +	struct kfd_dev *device = NULL;
> +
> +	down_read(&topology_lock);
> +
> +	list_for_each_entry(top_dev, &topology_device_list, list)
> +		if (top_dev->gpu->pdev == pdev) {
> +			device = top_dev->gpu;
> +			break;
> +		}
> +
> +	up_read(&topology_lock);
> +
> +	return device;
> +}
> +
> +static int kfd_topology_get_crat_acpi(void *crat_image, size_t *size)
> +{
> +	struct acpi_table_header *crat_table;
> +	acpi_status status;
> +
> +	if (!size)
> +		return -EINVAL;
> +
> +	/*
> +	 * Fetch the CRAT table from ACPI
> +	 */
> +	status = acpi_get_table(CRAT_SIGNATURE, 0, &crat_table);
> +	if (status == AE_NOT_FOUND) {
> +		pr_warn("CRAT table not found\n");
> +		return -ENODATA;
> +	} else if (ACPI_FAILURE(status)) {
> +		const char *err = acpi_format_exception(status);
> +
> +		pr_err("CRAT table error: %s\n", err);
> +		return -EINVAL;
> +	}
> +
> +	/*
> +	 * The checksum of the table should be verified
> +	 */
> +	if (checksum_image(crat_table, crat_table->length) != 0) {
> +		pr_err("Bad checksum for the CRAT table\n");
> +		return -EINVAL;
> +	}
> +
> +
> +	if (*size >= crat_table->length && crat_image != 0)
> +		memcpy(crat_image, crat_table, crat_table->length);
> +
> +	*size = crat_table->length;
> +
> +	return 0;
> +}
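
As background for the checksum step: ACPI defines a system description table as valid when all of its bytes, checksum byte included, sum to zero modulo 256. A minimal userspace sketch of that rule follows. `table_byte_sum`, `table_checksum_ok`, and `table_checksum_for` are hypothetical names, not kernel or ACPICA functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Sum every byte of the table; uint8_t arithmetic wraps modulo 256. */
static uint8_t table_byte_sum(const void *buf, size_t len)
{
	const uint8_t *p = buf;
	uint8_t sum = 0;

	while (len-- > 0)
		sum += *p++;
	return sum;
}

/* A table is valid when the sum over the whole image is zero. */
static bool table_checksum_ok(const void *buf, size_t len)
{
	assert(buf != NULL);
	return table_byte_sum(buf, len) == 0;
}

/*
 * Compute the checksum byte that makes the table sum to zero.
 * The checksum field at checksum_off is zeroed first so it does not
 * contribute to the sum.
 */
static uint8_t table_checksum_for(uint8_t *table, size_t len,
				  size_t checksum_off)
{
	assert(table != NULL && checksum_off < len);
	table[checksum_off] = 0;
	return (uint8_t)(0u - table_byte_sum(table, len));
}
```

In the standard ACPI table header the checksum byte sits at offset 9, after the 4-byte signature, 4-byte length, and 1-byte revision.
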
> +
> +static void kfd_populated_cu_info_cpu(struct kfd_topology_device *dev,
> +		struct crat_subtype_computeunit *cu)
> +{
> +	BUG_ON(!dev);
> +	BUG_ON(!cu);
> +
> +	dev->node_props.cpu_cores_count = cu->num_cpu_cores;
> +	dev->node_props.cpu_core_id_base = cu->processor_id_low;
> +	if (cu->hsa_capability & CRAT_CU_FLAGS_IOMMU_PRESENT)
> +		dev->node_props.capability |= HSA_CAP_ATS_PRESENT;
> +
> +	pr_info("CU CPU: cores=%d id_base=%d\n", cu->num_cpu_cores,
> +			cu->processor_id_low);
> +}
> +
> +static void kfd_populated_cu_info_gpu(struct kfd_topology_device *dev,
> +		struct crat_subtype_computeunit *cu)
> +{
> +	BUG_ON(!dev);
> +	BUG_ON(!cu);
> +
> +	dev->node_props.simd_id_base = cu->processor_id_low;
> +	dev->node_props.simd_count = cu->num_simd_cores;
> +	dev->node_props.lds_size_in_kb = cu->lds_size_in_kb;
> +	dev->node_props.max_waves_per_simd = cu->max_waves_simd;
> +	dev->node_props.wave_front_size = cu->wave_front_size;
> +	dev->node_props.mem_banks_count = cu->num_banks;
> +	dev->node_props.array_count = cu->num_arrays;
> +	dev->node_props.cu_per_simd_array = cu->num_cu_per_array;
> +	dev->node_props.simd_per_cu = cu->num_simd_per_cu;
> +	dev->node_props.max_slots_scratch_cu = cu->max_slots_scatch_cu;
> +	if (cu->hsa_capability & CRAT_CU_FLAGS_HOT_PLUGGABLE)
> +		dev->node_props.capability |= HSA_CAP_HOT_PLUGGABLE;
> +	pr_info("CU GPU: simds=%d id_base=%d\n", cu->num_simd_cores,
> +				cu->processor_id_low);
> +}
> +
> +/* kfd_parse_subtype_cu is called with topology_lock already held */
> +static int kfd_parse_subtype_cu(struct crat_subtype_computeunit *cu)
> +{
> +	struct kfd_topology_device *dev;
> +	int i = 0;
> +
> +	BUG_ON(!cu);
> +
> +	pr_info("Found CU entry in CRAT table with proximity_domain=%d caps=%x\n",
> +			cu->proximity_domain, cu->hsa_capability);
> +	list_for_each_entry(dev, &topology_device_list, list) {
> +		if (cu->proximity_domain == i) {
> +			if (cu->flags & CRAT_CU_FLAGS_CPU_PRESENT)
> +				kfd_populated_cu_info_cpu(dev, cu);
> +
> +			if (cu->flags & CRAT_CU_FLAGS_GPU_PRESENT)
> +				kfd_populated_cu_info_gpu(dev, cu);
> +			break;
> +		}
> +		i++;
> +	}
> +
> +	return 0;
> +}
> +
> +/* kfd_parse_subtype_mem is called with topology_lock already held */
> +static int kfd_parse_subtype_mem(struct crat_subtype_memory *mem)
> +{
> +	struct kfd_mem_properties *props;
> +	struct kfd_topology_device *dev;
> +	int i = 0;
> +
> +	BUG_ON(!mem);
> +
> +	pr_info("Found memory entry in CRAT table with proximity_domain=%d\n",
> +			mem->promixity_domain);
> +	list_for_each_entry(dev, &topology_device_list, list) {
> +		if (mem->promixity_domain == i) {
> +			props = kfd_alloc_struct(props);
> +			if (props == 0)
> +				return -ENOMEM;
> +
> +			if (dev->node_props.cpu_cores_count == 0)
> +				props->heap_type = HSA_MEM_HEAP_TYPE_FB_PRIVATE;
> +			else
> +				props->heap_type = HSA_MEM_HEAP_TYPE_SYSTEM;
> +
> +			if (mem->flags & CRAT_MEM_FLAGS_HOT_PLUGGABLE)
> +				props->flags |= HSA_MEM_FLAGS_HOT_PLUGGABLE;
> +			if (mem->flags & CRAT_MEM_FLAGS_NON_VOLATILE)
> +				props->flags |= HSA_MEM_FLAGS_NON_VOLATILE;
> +
> +			props->size_in_bytes = ((uint64_t)mem->length_high << 32) +
> +						mem->length_low;
> +			props->width = mem->width;
> +
> +			dev->mem_bank_count++;
> +			list_add_tail(&props->list, &dev->mem_props);
> +
> +			break;
> +		}
> +		i++;
> +	}
> +
> +	return 0;
> +}
> +
> +/* kfd_parse_subtype_cache is called with topology_lock already held */
> +static int kfd_parse_subtype_cache(struct crat_subtype_cache *cache)
> +{
> +	struct kfd_cache_properties *props;
> +	struct kfd_topology_device *dev;
> +	uint32_t id;
> +
> +	BUG_ON(!cache);
> +
> +	id = cache->processor_id_low;
> +
> +	pr_info("Found cache entry in CRAT table with processor_id=%d\n", id);
> +	list_for_each_entry(dev, &topology_device_list, list)
> +		if (id == dev->node_props.cpu_core_id_base ||
> +		    id == dev->node_props.simd_id_base) {
> +			props = kfd_alloc_struct(props);
> +			if (props == 0)
> +				return -ENOMEM;
> +
> +			props->processor_id_low = id;
> +			props->cache_level = cache->cache_level;
> +			props->cache_size = cache->cache_size;
> +			props->cacheline_size = cache->cache_line_size;
> +			props->cachelines_per_tag = cache->lines_per_tag;
> +			props->cache_assoc = cache->associativity;
> +			props->cache_latency = cache->cache_latency;
> +
> +			if (cache->flags & CRAT_CACHE_FLAGS_DATA_CACHE)
> +				props->cache_type |= HSA_CACHE_TYPE_DATA;
> +			if (cache->flags & CRAT_CACHE_FLAGS_INST_CACHE)
> +				props->cache_type |= HSA_CACHE_TYPE_INSTRUCTION;
> +			if (cache->flags & CRAT_CACHE_FLAGS_CPU_CACHE)
> +				props->cache_type |= HSA_CACHE_TYPE_CPU;
> +			if (cache->flags & CRAT_CACHE_FLAGS_SIMD_CACHE)
> +				props->cache_type |= HSA_CACHE_TYPE_HSACU;
> +
> +			dev->cache_count++;
> +			dev->node_props.caches_count++;
> +			list_add_tail(&props->list, &dev->cache_props);
> +
> +			break;
> +		}
> +
> +	return 0;
> +}
> +
> +/* kfd_parse_subtype_iolink is called with topology_lock already held */
> +static int kfd_parse_subtype_iolink(struct crat_subtype_iolink *iolink)
> +{
> +	struct kfd_iolink_properties *props;
> +	struct kfd_topology_device *dev;
> +	uint32_t i = 0;
> +	uint32_t id_from;
> +	uint32_t id_to;
> +
> +	BUG_ON(!iolink);
> +
> +	id_from = iolink->proximity_domain_from;
> +	id_to = iolink->proximity_domain_to;
> +
> +	pr_info("Found IO link entry in CRAT table with id_from=%d\n", id_from);
> +	list_for_each_entry(dev, &topology_device_list, list) {
> +		if (id_from == i) {
> +			props = kfd_alloc_struct(props);
> +			if (props == 0)
> +				return -ENOMEM;
> +
> +			props->node_from = id_from;
> +			props->node_to = id_to;
> +			props->ver_maj = iolink->version_major;
> +			props->ver_min = iolink->version_minor;
> +
> +			/*
> +			 * weight factor (derived from CDIR), currently always 1
> +			 */
> +			props->weight = 1;
> +
> +			props->min_latency = iolink->minimum_latency;
> +			props->max_latency = iolink->maximum_latency;
> +			props->min_bandwidth = iolink->minimum_bandwidth_mbs;
> +			props->max_bandwidth = iolink->maximum_bandwidth_mbs;
> +			props->rec_transfer_size =
> +					iolink->recommended_transfer_size;
> +
> +			dev->io_link_count++;
> +			dev->node_props.io_links_count++;
> +			list_add_tail(&props->list, &dev->io_link_props);
> +
> +			break;
> +		}
> +		i++;
> +	}
> +
> +	return 0;
> +}
> +
> +static int kfd_parse_subtype(struct crat_subtype_generic *sub_type_hdr)
> +{
> +	struct crat_subtype_computeunit *cu;
> +	struct crat_subtype_memory *mem;
> +	struct crat_subtype_cache *cache;
> +	struct crat_subtype_iolink *iolink;
> +	int ret = 0;
> +
> +	BUG_ON(!sub_type_hdr);
> +
> +	switch (sub_type_hdr->type) {
> +	case CRAT_SUBTYPE_COMPUTEUNIT_AFFINITY:
> +		cu = (struct crat_subtype_computeunit *)sub_type_hdr;
> +		ret = kfd_parse_subtype_cu(cu);
> +		break;
> +	case CRAT_SUBTYPE_MEMORY_AFFINITY:
> +		mem = (struct crat_subtype_memory *)sub_type_hdr;
> +		ret = kfd_parse_subtype_mem(mem);
> +		break;
> +	case CRAT_SUBTYPE_CACHE_AFFINITY:
> +		cache = (struct crat_subtype_cache *)sub_type_hdr;
> +		ret = kfd_parse_subtype_cache(cache);
> +		break;
> +	case CRAT_SUBTYPE_TLB_AFFINITY:
> +		/*
> +		 * For now, nothing to do here
> +		 */
> +		pr_info("Found TLB entry in CRAT table (not processing)\n");
> +		break;
> +	case CRAT_SUBTYPE_CCOMPUTE_AFFINITY:
> +		/*
> +		 * For now, nothing to do here
> +		 */
> +		pr_info("Found CCOMPUTE entry in CRAT table (not processing)\n");
> +		break;
> +	case CRAT_SUBTYPE_IOLINK_AFFINITY:
> +		iolink = (struct crat_subtype_iolink *)sub_type_hdr;
> +		ret = kfd_parse_subtype_iolink(iolink);
> +		break;
> +	default:
> +		pr_warn("Unknown subtype (%d) in CRAT\n",
> +				sub_type_hdr->type);
> +	}
> +
> +	return ret;
> +}
> +
> +static void kfd_release_topology_device(struct kfd_topology_device *dev)
> +{
> +	struct kfd_mem_properties *mem;
> +	struct kfd_cache_properties *cache;
> +	struct kfd_iolink_properties *iolink;
> +
> +	BUG_ON(!dev);
> +
> +	list_del(&dev->list);
> +
> +	while (dev->mem_props.next != &dev->mem_props) {
> +		mem = container_of(dev->mem_props.next,
> +				struct kfd_mem_properties, list);
> +		list_del(&mem->list);
> +		kfree(mem);
> +	}
> +
> +	while (dev->cache_props.next != &dev->cache_props) {
> +		cache = container_of(dev->cache_props.next,
> +				struct kfd_cache_properties, list);
> +		list_del(&cache->list);
> +		kfree(cache);
> +	}
> +
> +	while (dev->io_link_props.next != &dev->io_link_props) {
> +		iolink = container_of(dev->io_link_props.next,
> +				struct kfd_iolink_properties, list);
> +		list_del(&iolink->list);
> +		kfree(iolink);
> +	}
> +
> +	kfree(dev);
> +
> +	sys_props.num_devices--;
> +}
> +
> +static void kfd_release_live_view(void)
> +{
> +	struct kfd_topology_device *dev;
> +
> +	while (topology_device_list.next != &topology_device_list) {
> +		dev = container_of(topology_device_list.next,
> +				 struct kfd_topology_device, list);
> +		kfd_release_topology_device(dev);
> +	}
> +
> +	memset(&sys_props, 0, sizeof(sys_props));
> +}
> +
> +static struct kfd_topology_device *kfd_create_topology_device(void)
> +{
> +	struct kfd_topology_device *dev;
> +
> +	dev = kfd_alloc_struct(dev);
> +	if (dev == 0) {
> +		pr_err("No memory to allocate a topology device\n");
> +		return 0;
> +	}
> +
> +	INIT_LIST_HEAD(&dev->mem_props);
> +	INIT_LIST_HEAD(&dev->cache_props);
> +	INIT_LIST_HEAD(&dev->io_link_props);
> +
> +	list_add_tail(&dev->list, &topology_device_list);
> +	sys_props.num_devices++;
> +
> +	return dev;
> +}
> +
> +static int kfd_parse_crat_table(void *crat_image)
> +{
> +	struct kfd_topology_device *top_dev;
> +	struct crat_subtype_generic *sub_type_hdr;
> +	uint16_t node_id;
> +	int ret;
> +	struct crat_header *crat_table = (struct crat_header *)crat_image;
> +	uint16_t num_nodes;
> +	uint32_t image_len;
> +
> +	if (!crat_image)
> +		return -EINVAL;
> +
> +	num_nodes = crat_table->num_domains;
> +	image_len = crat_table->length;
> +
> +	pr_info("Parsing CRAT table with %d nodes\n", num_nodes);
> +
> +	for (node_id = 0; node_id < num_nodes; node_id++) {
> +		top_dev = kfd_create_topology_device();
> +		if (!top_dev) {
> +			kfd_release_live_view();
> +			return -ENOMEM;
> +		}
> +	}
> +
> +	sys_props.platform_id = *((uint64_t *)crat_table->oem_id);
> +	sys_props.platform_oem = *((uint64_t *)crat_table->oem_table_id);
> +	sys_props.platform_rev = crat_table->revision;
> +
> +	sub_type_hdr = (struct crat_subtype_generic *)(crat_table+1);
> +	while ((char *)sub_type_hdr + sizeof(struct crat_subtype_generic) <
> +			((char *)crat_image) + image_len) {
> +		if (sub_type_hdr->flags & CRAT_SUBTYPE_FLAGS_ENABLED) {
> +			ret = kfd_parse_subtype(sub_type_hdr);
> +			if (ret != 0) {
> +				kfd_release_live_view();
> +				return ret;
> +			}
> +		}
> +
> +		sub_type_hdr = (typeof(sub_type_hdr))((char *)sub_type_hdr +
> +				sub_type_hdr->length);
> +	}
> +
> +	sys_props.generation_count++;
> +	topology_crat_parsed = 1;
> +
> +	return 0;
> +}
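
The subtype walk above advances by each record's own length field. Below is a minimal userspace sketch of that kind of walk, with the added assumption that a record shorter than its header or longer than the remaining image should be rejected rather than followed. `rec_hdr` and `count_records` are hypothetical, and the two-byte header is a simplified stand-in for the CRAT subtype header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical generic record header: type, then total record length. */
struct rec_hdr {
	uint8_t type;
	uint8_t length;	/* length of the whole record, header included */
};

/*
 * Count well-formed records in buf[0..len). Returns -1 if a record is
 * shorter than its own header or runs past the end of the image.
 * Trailing bytes too short to hold a header are ignored.
 */
static int count_records(const uint8_t *buf, size_t len)
{
	size_t off = 0;
	int n = 0;

	assert(buf != NULL);
	while (len - off >= sizeof(struct rec_hdr)) {
		struct rec_hdr hdr;

		/* Copy the header out instead of casting the pointer. */
		memcpy(&hdr, buf + off, sizeof(hdr));

		/* A zero or short length would never advance the cursor. */
		if (hdr.length < sizeof(hdr) || hdr.length > len - off)
			return -1;

		off += hdr.length;
		n++;
	}
	return n;
}
```

The length test is the important part: without it, a record with length 0 keeps the cursor in place, and a record with an oversized length moves it outside the image.
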
> +
> +
> +#define sysfs_show_gen_prop(buffer, fmt, ...) \
> +		snprintf(buffer, PAGE_SIZE, "%s"fmt, buffer, __VA_ARGS__)
> +#define sysfs_show_32bit_prop(buffer, name, value) \
> +		sysfs_show_gen_prop(buffer, "%s %u\n", name, value)
> +#define sysfs_show_64bit_prop(buffer, name, value) \
> +		sysfs_show_gen_prop(buffer, "%s %llu\n", name, value)
> +#define sysfs_show_32bit_val(buffer, value) \
> +		sysfs_show_gen_prop(buffer, "%u\n", value)
> +#define sysfs_show_str_val(buffer, value) \
> +		sysfs_show_gen_prop(buffer, "%s\n", value)
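
The macros above append by passing `buffer` to snprintf() as both the destination and a `%s` source. The C standard leaves snprintf() undefined when source and destination overlap, so one alternative is to track a write offset instead. Below is a minimal userspace sketch, assuming a hypothetical helper `buf_appendf`; in kernel code scnprintf() would be the natural building block.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Append formatted text at buf + offs without reading buf back through
 * "%s". Returns the new offset. Output that does not fit is truncated,
 * and buf stays NUL-terminated.
 */
static size_t buf_appendf(char *buf, size_t size, size_t offs,
			  const char *fmt, ...)
{
	va_list ap;
	int n;

	assert(buf != NULL && offs < size);

	va_start(ap, fmt);
	n = vsnprintf(buf + offs, size - offs, fmt, ap);
	va_end(ap);

	if (n < 0)
		return offs;		/* encoding error: nothing appended */
	if ((size_t)n >= size - offs)
		return size - 1;	/* truncated: offset of the final NUL */
	return offs + (size_t)n;
}
```

A show() callback written this way would carry the offset from one property to the next and return the final offset as the byte count.
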
> +
> +static ssize_t sysprops_show(struct kobject *kobj, struct attribute *attr,
> +		char *buffer)
> +{
> +	ssize_t ret;
> +
> +	/* Making sure that the buffer is an empty string */
> +	buffer[0] = 0;
> +
> +	if (attr == &sys_props.attr_genid) {
> +		ret = sysfs_show_32bit_val(buffer, sys_props.generation_count);
> +	} else if (attr == &sys_props.attr_props) {
> +		sysfs_show_64bit_prop(buffer, "platform_oem",
> +				sys_props.platform_oem);
> +		sysfs_show_64bit_prop(buffer, "platform_id",
> +				sys_props.platform_id);
> +		ret = sysfs_show_64bit_prop(buffer, "platform_rev",
> +				sys_props.platform_rev);
> +	} else {
> +		ret = -EINVAL;
> +	}
> +
> +	return ret;
> +}
> +
> +static const struct sysfs_ops sysprops_ops = {
> +	.show = sysprops_show,
> +};
> +
> +static struct kobj_type sysprops_type = {
> +	.sysfs_ops = &sysprops_ops,
> +};
> +
> +static ssize_t iolink_show(struct kobject *kobj, struct attribute *attr,
> +		char *buffer)
> +{
> +	ssize_t ret;
> +	struct kfd_iolink_properties *iolink;
> +
> +	/* Making sure that the buffer is an empty string */
> +	buffer[0] = 0;
> +
> +	iolink = container_of(attr, struct kfd_iolink_properties, attr);
> +	sysfs_show_32bit_prop(buffer, "type", iolink->iolink_type);
> +	sysfs_show_32bit_prop(buffer, "version_major", iolink->ver_maj);
> +	sysfs_show_32bit_prop(buffer, "version_minor", iolink->ver_min);
> +	sysfs_show_32bit_prop(buffer, "node_from", iolink->node_from);
> +	sysfs_show_32bit_prop(buffer, "node_to", iolink->node_to);
> +	sysfs_show_32bit_prop(buffer, "weight", iolink->weight);
> +	sysfs_show_32bit_prop(buffer, "min_latency", iolink->min_latency);
> +	sysfs_show_32bit_prop(buffer, "max_latency", iolink->max_latency);
> +	sysfs_show_32bit_prop(buffer, "min_bandwidth", iolink->min_bandwidth);
> +	sysfs_show_32bit_prop(buffer, "max_bandwidth", iolink->max_bandwidth);
> +	sysfs_show_32bit_prop(buffer, "recommended_transfer_size",
> +			iolink->rec_transfer_size);
> +	ret = sysfs_show_32bit_prop(buffer, "flags", iolink->flags);
> +
> +	return ret;
> +}
> +
> +static const struct sysfs_ops iolink_ops = {
> +	.show = iolink_show,
> +};
> +
> +static struct kobj_type iolink_type = {
> +	.sysfs_ops = &iolink_ops,
> +};
> +
> +static ssize_t mem_show(struct kobject *kobj, struct attribute *attr,
> +		char *buffer)
> +{
> +	ssize_t ret;
> +	struct kfd_mem_properties *mem;
> +
> +	/* Making sure that the buffer is an empty string */
> +	buffer[0] = 0;
> +
> +	mem = container_of(attr, struct kfd_mem_properties, attr);
> +	sysfs_show_32bit_prop(buffer, "heap_type", mem->heap_type);
> +	sysfs_show_64bit_prop(buffer, "size_in_bytes", mem->size_in_bytes);
> +	sysfs_show_32bit_prop(buffer, "flags", mem->flags);
> +	sysfs_show_32bit_prop(buffer, "width", mem->width);
> +	ret = sysfs_show_32bit_prop(buffer, "mem_clk_max", mem->mem_clk_max);
> +
> +	return ret;
> +}
> +
> +static const struct sysfs_ops mem_ops = {
> +	.show = mem_show,
> +};
> +
> +static struct kobj_type mem_type = {
> +	.sysfs_ops = &mem_ops,
> +};
> +
> +static ssize_t kfd_cache_show(struct kobject *kobj, struct attribute *attr,
> +		char *buffer)
> +{
> +	ssize_t ret;
> +	uint32_t i;
> +	struct kfd_cache_properties *cache;
> +
> +	/* Making sure that the buffer is an empty string */
> +	buffer[0] = 0;
> +
> +	cache = container_of(attr, struct kfd_cache_properties, attr);
> +	sysfs_show_32bit_prop(buffer, "processor_id_low",
> +			cache->processor_id_low);
> +	sysfs_show_32bit_prop(buffer, "level", cache->cache_level);
> +	sysfs_show_32bit_prop(buffer, "size", cache->cache_size);
> +	sysfs_show_32bit_prop(buffer, "cache_line_size", cache->cacheline_size);
> +	sysfs_show_32bit_prop(buffer, "cache_lines_per_tag",
> +			cache->cachelines_per_tag);
> +	sysfs_show_32bit_prop(buffer, "association", cache->cache_assoc);
> +	sysfs_show_32bit_prop(buffer, "latency", cache->cache_latency);
> +	sysfs_show_32bit_prop(buffer, "type", cache->cache_type);
> +	snprintf(buffer, PAGE_SIZE, "%ssibling_map ", buffer);
> +	for (i = 0; i < KFD_TOPOLOGY_CPU_SIBLINGS; i++)
> +		ret = snprintf(buffer, PAGE_SIZE, "%s%d%s",
> +				buffer, cache->sibling_map[i],
> +				(i == KFD_TOPOLOGY_CPU_SIBLINGS-1) ?
> +						"\n" : ",");
> +
> +	return ret;
> +}
> +
> +static const struct sysfs_ops cache_ops = {
> +	.show = kfd_cache_show,
> +};
> +
> +static struct kobj_type cache_type = {
> +	.sysfs_ops = &cache_ops,
> +};
> +
> +static ssize_t node_show(struct kobject *kobj, struct attribute *attr,
> +		char *buffer)
> +{
> +	ssize_t ret;
> +	struct kfd_topology_device *dev;
> +	char public_name[KFD_TOPOLOGY_PUBLIC_NAME_SIZE];
> +	uint32_t i;
> +
> +	/* Making sure that the buffer is an empty string */
> +	buffer[0] = 0;
> +
> +	if (strcmp(attr->name, "gpu_id") == 0) {
> +		dev = container_of(attr, struct kfd_topology_device,
> +				attr_gpuid);
> +		ret = sysfs_show_32bit_val(buffer, dev->gpu_id);
> +	} else if (strcmp(attr->name, "name") == 0) {
> +		dev = container_of(attr, struct kfd_topology_device,
> +				attr_name);
> +		for (i = 0; i < KFD_TOPOLOGY_PUBLIC_NAME_SIZE; i++) {
> +			public_name[i] =
> +					(char)dev->node_props.marketing_name[i];
> +			if (dev->node_props.marketing_name[i] == 0)
> +				break;
> +		}
> +		public_name[KFD_TOPOLOGY_PUBLIC_NAME_SIZE-1] = 0x0;
> +		ret = sysfs_show_str_val(buffer, public_name);
> +	} else {
> +		dev = container_of(attr, struct kfd_topology_device,
> +				attr_props);
> +		sysfs_show_32bit_prop(buffer, "cpu_cores_count",
> +				dev->node_props.cpu_cores_count);
> +		sysfs_show_32bit_prop(buffer, "simd_count",
> +				dev->node_props.simd_count);
> +		sysfs_show_32bit_prop(buffer, "mem_banks_count",
> +				dev->node_props.mem_banks_count);
> +		sysfs_show_32bit_prop(buffer, "caches_count",
> +				dev->node_props.caches_count);
> +		sysfs_show_32bit_prop(buffer, "io_links_count",
> +				dev->node_props.io_links_count);
> +		sysfs_show_32bit_prop(buffer, "cpu_core_id_base",
> +				dev->node_props.cpu_core_id_base);
> +		sysfs_show_32bit_prop(buffer, "simd_id_base",
> +				dev->node_props.simd_id_base);
> +		sysfs_show_32bit_prop(buffer, "capability",
> +				dev->node_props.capability);
> +		sysfs_show_32bit_prop(buffer, "max_waves_per_simd",
> +				dev->node_props.max_waves_per_simd);
> +		sysfs_show_32bit_prop(buffer, "lds_size_in_kb",
> +				dev->node_props.lds_size_in_kb);
> +		sysfs_show_32bit_prop(buffer, "gds_size_in_kb",
> +				dev->node_props.gds_size_in_kb);
> +		sysfs_show_32bit_prop(buffer, "wave_front_size",
> +				dev->node_props.wave_front_size);
> +		sysfs_show_32bit_prop(buffer, "array_count",
> +				dev->node_props.array_count);
> +		sysfs_show_32bit_prop(buffer, "simd_arrays_per_engine",
> +				dev->node_props.simd_arrays_per_engine);
> +		sysfs_show_32bit_prop(buffer, "cu_per_simd_array",
> +				dev->node_props.cu_per_simd_array);
> +		sysfs_show_32bit_prop(buffer, "simd_per_cu",
> +				dev->node_props.simd_per_cu);
> +		sysfs_show_32bit_prop(buffer, "max_slots_scratch_cu",
> +				dev->node_props.max_slots_scratch_cu);
> +		sysfs_show_32bit_prop(buffer, "engine_id",
> +				dev->node_props.engine_id);
> +		sysfs_show_32bit_prop(buffer, "vendor_id",
> +				dev->node_props.vendor_id);
> +		sysfs_show_32bit_prop(buffer, "device_id",
> +				dev->node_props.device_id);
> +		sysfs_show_32bit_prop(buffer, "location_id",
> +				dev->node_props.location_id);
> +		sysfs_show_32bit_prop(buffer, "max_engine_clk_fcompute",
> +				dev->node_props.max_engine_clk_fcompute);
> +		ret = sysfs_show_32bit_prop(buffer, "max_engine_clk_ccompute",
> +				dev->node_props.max_engine_clk_ccompute);
> +	}
> +
> +	return ret;
> +}
> +
> +static const struct sysfs_ops node_ops = {
> +	.show = node_show,
> +};
> +
> +static struct kobj_type node_type = {
> +	.sysfs_ops = &node_ops,
> +};
> +
> +static void kfd_remove_sysfs_file(struct kobject *kobj, struct attribute *attr)
> +{
> +	sysfs_remove_file(kobj, attr);
> +	kobject_del(kobj);
> +	kobject_put(kobj);
> +}
> +
> +static void kfd_remove_sysfs_node_entry(struct kfd_topology_device *dev)
> +{
> +	struct kfd_iolink_properties *iolink;
> +	struct kfd_cache_properties *cache;
> +	struct kfd_mem_properties *mem;
> +
> +	BUG_ON(!dev);
> +
> +	if (dev->kobj_iolink) {
> +		list_for_each_entry(iolink, &dev->io_link_props, list)
> +			if (iolink->kobj) {
> +				kfd_remove_sysfs_file(iolink->kobj, &iolink->attr);
> +				iolink->kobj = 0;
> +			}
> +		kobject_del(dev->kobj_iolink);
> +		kobject_put(dev->kobj_iolink);
> +		dev->kobj_iolink = 0;
> +	}
> +
> +	if (dev->kobj_cache) {
> +		list_for_each_entry(cache, &dev->cache_props, list)
> +			if (cache->kobj) {
> +				kfd_remove_sysfs_file(cache->kobj, &cache->attr);
> +				cache->kobj = 0;
> +			}
> +		kobject_del(dev->kobj_cache);
> +		kobject_put(dev->kobj_cache);
> +		dev->kobj_cache = 0;
> +	}
> +
> +	if (dev->kobj_mem) {
> +		list_for_each_entry(mem, &dev->mem_props, list)
> +			if (mem->kobj) {
> +				kfd_remove_sysfs_file(mem->kobj, &mem->attr);
> +				mem->kobj = 0;
> +			}
> +		kobject_del(dev->kobj_mem);
> +		kobject_put(dev->kobj_mem);
> +		dev->kobj_mem = 0;
> +	}
> +
> +	if (dev->kobj_node) {
> +		sysfs_remove_file(dev->kobj_node, &dev->attr_gpuid);
> +		sysfs_remove_file(dev->kobj_node, &dev->attr_name);
> +		sysfs_remove_file(dev->kobj_node, &dev->attr_props);
> +		kobject_del(dev->kobj_node);
> +		kobject_put(dev->kobj_node);
> +		dev->kobj_node = 0;
> +	}
> +}
> +
> +static int kfd_build_sysfs_node_entry(struct kfd_topology_device *dev,
> +		uint32_t id)
> +{
> +	struct kfd_iolink_properties *iolink;
> +	struct kfd_cache_properties *cache;
> +	struct kfd_mem_properties *mem;
> +	int ret;
> +	uint32_t i;
> +
> +	BUG_ON(!dev);
> +
> +	/*
> +	 * Creating the sysfs folders
> +	 */
> +	BUG_ON(dev->kobj_node);
> +	dev->kobj_node = kfd_alloc_struct(dev->kobj_node);
> +	if (!dev->kobj_node)
> +		return -ENOMEM;
> +
> +	ret = kobject_init_and_add(dev->kobj_node, &node_type,
> +			sys_props.kobj_nodes, "%d", id);
> +	if (ret < 0)
> +		return ret;
> +
> +	dev->kobj_mem = kobject_create_and_add("mem_banks", dev->kobj_node);
> +	if (!dev->kobj_mem)
> +		return -ENOMEM;
> +
> +	dev->kobj_cache = kobject_create_and_add("caches", dev->kobj_node);
> +	if (!dev->kobj_cache)
> +		return -ENOMEM;
> +
> +	dev->kobj_iolink = kobject_create_and_add("io_links", dev->kobj_node);
> +	if (!dev->kobj_iolink)
> +		return -ENOMEM;
> +
> +	/*
> +	 * Creating sysfs files for node properties
> +	 */
> +	dev->attr_gpuid.name = "gpu_id";
> +	dev->attr_gpuid.mode = KFD_SYSFS_FILE_MODE;
> +	sysfs_attr_init(&dev->attr_gpuid);
> +	dev->attr_name.name = "name";
> +	dev->attr_name.mode = KFD_SYSFS_FILE_MODE;
> +	sysfs_attr_init(&dev->attr_name);
> +	dev->attr_props.name = "properties";
> +	dev->attr_props.mode = KFD_SYSFS_FILE_MODE;
> +	sysfs_attr_init(&dev->attr_props);
> +	ret = sysfs_create_file(dev->kobj_node, &dev->attr_gpuid);
> +	if (ret < 0)
> +		return ret;
> +	ret = sysfs_create_file(dev->kobj_node, &dev->attr_name);
> +	if (ret < 0)
> +		return ret;
> +	ret = sysfs_create_file(dev->kobj_node, &dev->attr_props);
> +	if (ret < 0)
> +		return ret;
> +
> +	i = 0;
> +	list_for_each_entry(mem, &dev->mem_props, list) {
> +		mem->kobj = kzalloc(sizeof(struct kobject), GFP_KERNEL);
> +		if (!mem->kobj)
> +			return -ENOMEM;
> +		ret = kobject_init_and_add(mem->kobj, &mem_type,
> +				dev->kobj_mem, "%d", i);
> +		if (ret < 0)
> +			return ret;
> +
> +		mem->attr.name = "properties";
> +		mem->attr.mode = KFD_SYSFS_FILE_MODE;
> +		sysfs_attr_init(&mem->attr);
> +		ret = sysfs_create_file(mem->kobj, &mem->attr);
> +		if (ret < 0)
> +			return ret;
> +		i++;
> +	}
> +
> +	i = 0;
> +	list_for_each_entry(cache, &dev->cache_props, list) {
> +		cache->kobj = kzalloc(sizeof(struct kobject), GFP_KERNEL);
> +		if (!cache->kobj)
> +			return -ENOMEM;
> +		ret = kobject_init_and_add(cache->kobj, &cache_type,
> +				dev->kobj_cache, "%d", i);
> +		if (ret < 0)
> +			return ret;
> +
> +		cache->attr.name = "properties";
> +		cache->attr.mode = KFD_SYSFS_FILE_MODE;
> +		sysfs_attr_init(&cache->attr);
> +		ret = sysfs_create_file(cache->kobj, &cache->attr);
> +		if (ret < 0)
> +			return ret;
> +		i++;
> +	}
> +
> +	i = 0;
> +	list_for_each_entry(iolink, &dev->io_link_props, list) {
> +		iolink->kobj = kzalloc(sizeof(struct kobject), GFP_KERNEL);
> +		if (!iolink->kobj)
> +			return -ENOMEM;
> +		ret = kobject_init_and_add(iolink->kobj, &iolink_type,
> +				dev->kobj_iolink, "%d", i);
> +		if (ret < 0)
> +			return ret;
> +
> +		iolink->attr.name = "properties";
> +		iolink->attr.mode = KFD_SYSFS_FILE_MODE;
> +		sysfs_attr_init(&iolink->attr);
> +		ret = sysfs_create_file(iolink->kobj, &iolink->attr);
> +		if (ret < 0)
> +			return ret;
> +		i++;
> +	}
> +
> +	return 0;
> +}
> +
> +static int kfd_build_sysfs_node_tree(void)
> +{
> +	struct kfd_topology_device *dev;
> +	int ret;
> +	uint32_t i = 0;
> +
> +	list_for_each_entry(dev, &topology_device_list, list) {
> +		ret = kfd_build_sysfs_node_entry(dev, i);
> +		if (ret < 0)
> +			return ret;
> +		i++;
> +	}
> +
> +	return 0;
> +}
> +
> +static void kfd_remove_sysfs_node_tree(void)
> +{
> +	struct kfd_topology_device *dev;
> +
> +	list_for_each_entry(dev, &topology_device_list, list)
> +		kfd_remove_sysfs_node_entry(dev);
> +}
> +
> +static int kfd_topology_update_sysfs(void)
> +{
> +	int ret;
> +
> +	pr_info("Creating topology SYSFS entries\n");
> +	if (sys_props.kobj_topology == 0) {
> +		sys_props.kobj_topology = kfd_alloc_struct(sys_props.kobj_topology);
> +		if (!sys_props.kobj_topology)
> +			return -ENOMEM;
> +
> +		ret = kobject_init_and_add(sys_props.kobj_topology,
> +				&sysprops_type,  &kfd_device->kobj,
> +				"topology");
> +		if (ret < 0)
> +			return ret;
> +
> +		sys_props.kobj_nodes = kobject_create_and_add("nodes",
> +				sys_props.kobj_topology);
> +		if (!sys_props.kobj_nodes)
> +			return -ENOMEM;
> +
> +		sys_props.attr_genid.name = "generation_id";
> +		sys_props.attr_genid.mode = KFD_SYSFS_FILE_MODE;
> +		sysfs_attr_init(&sys_props.attr_genid);
> +		ret = sysfs_create_file(sys_props.kobj_topology,
> +				&sys_props.attr_genid);
> +		if (ret < 0)
> +			return ret;
> +
> +		sys_props.attr_props.name = "system_properties";
> +		sys_props.attr_props.mode = KFD_SYSFS_FILE_MODE;
> +		sysfs_attr_init(&sys_props.attr_props);
> +		ret = sysfs_create_file(sys_props.kobj_topology,
> +				&sys_props.attr_props);
> +		if (ret < 0)
> +			return ret;
> +	}
> +
> +	kfd_remove_sysfs_node_tree();
> +
> +	return kfd_build_sysfs_node_tree();
> +}
> +
> +static void kfd_topology_release_sysfs(void)
> +{
> +	kfd_remove_sysfs_node_tree();
> +	if (sys_props.kobj_topology) {
> +		sysfs_remove_file(sys_props.kobj_topology,
> +				&sys_props.attr_genid);
> +		sysfs_remove_file(sys_props.kobj_topology,
> +				&sys_props.attr_props);
> +		if (sys_props.kobj_nodes) {
> +			kobject_del(sys_props.kobj_nodes);
> +			kobject_put(sys_props.kobj_nodes);
> +			sys_props.kobj_nodes = 0;
> +		}
> +		kobject_del(sys_props.kobj_topology);
> +		kobject_put(sys_props.kobj_topology);
> +		sys_props.kobj_topology = 0;
> +	}
> +}
> +
> +int kfd_topology_init(void)
> +{
> +	void *crat_image = 0;
> +	size_t image_size = 0;
> +	int ret;
> +
> +	/*
> +	 * Initialize the head for the topology device list
> +	 */
> +	INIT_LIST_HEAD(&topology_device_list);
> +	init_rwsem(&topology_lock);
> +	topology_crat_parsed = 0;
> +
> +	memset(&sys_props, 0, sizeof(sys_props));
> +
> +	/*
> +	 * Get the CRAT image from the ACPI
> +	 */
> +	ret = kfd_topology_get_crat_acpi(crat_image, &image_size);
> +	if (ret == 0 && image_size > 0) {
> +		pr_info("Found CRAT image with size=%zd\n", image_size);
> +		crat_image = kmalloc(image_size, GFP_KERNEL);
> +		if (!crat_image) {
> +			ret = -ENOMEM;
> +			pr_err("No memory for allocating CRAT image\n");
> +			goto err;
> +		}
> +		ret = kfd_topology_get_crat_acpi(crat_image, &image_size);
> +
> +		if (ret == 0) {
> +			down_write(&topology_lock);
> +			ret = kfd_parse_crat_table(crat_image);
> +			if (ret == 0)
> +				ret = kfd_topology_update_sysfs();
> +			up_write(&topology_lock);
> +		} else {
> +			pr_err("Couldn't get CRAT table size from ACPI\n");
> +		}
> +		kfree(crat_image);
> +	} else if (ret == -ENODATA) {
> +		ret = 0;
> +	} else {
> +		pr_err("Couldn't get CRAT table size from ACPI\n");
> +	}
> +
> +err:
> +	pr_info("Finished initializing topology ret=%d\n", ret);
> +	return ret;
> +}
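
For illustration, the init path above appears to follow a common two-call
pattern: ask for the size with a NULL buffer, allocate, then call again to
fill the buffer. A minimal user-space sketch of that pattern follows. The
names get_table(), load_table() and fake_table are hypothetical stand-ins,
malloc() stands in for kmalloc(), and this is not the actual
kfd_topology_get_crat_acpi() implementation, which is not shown here.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a firmware table; not real CRAT data. */
static const unsigned char fake_table[] = { 'C', 'R', 'A', 'T', 0x10, 0x00 };

/*
 * Two-call getter: with buf == NULL it only reports the required size,
 * otherwise it copies the table if the caller's buffer is large enough.
 */
static int get_table(void *buf, size_t *size)
{
	if (!size)
		return -EINVAL;
	if (!buf) {
		*size = sizeof(fake_table);
		return 0;
	}
	if (*size < sizeof(fake_table))
		return -EINVAL;
	memcpy(buf, fake_table, sizeof(fake_table));
	*size = sizeof(fake_table);
	return 0;
}

/* Query the size, allocate, fetch; the caller frees *out on success. */
static int load_table(void **out, size_t *out_size)
{
	size_t size = 0;
	void *buf;
	int ret;

	assert(out && out_size);

	ret = get_table(NULL, &size);
	if (ret)
		return ret;

	buf = malloc(size);
	if (!buf)
		return -ENOMEM;

	ret = get_table(buf, &size);
	if (ret) {
		free(buf);	/* do not leak the buffer on the error path */
		return ret;
	}

	*out = buf;
	*out_size = size;
	return 0;
}
```

Keeping the allocation and both calls in one helper leaves a single place
that owns the buffer on every error path.
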
> +
> +void kfd_topology_shutdown(void)
> +{
> +	kfd_topology_release_sysfs();
> +	kfd_release_live_view();
> +}
> +
> +static void kfd_debug_print_topology(void)
> +{
> +	struct kfd_topology_device *dev;
> +	uint32_t i = 0;
> +
> +	pr_info("DEBUG PRINT OF TOPOLOGY:");
> +	list_for_each_entry(dev, &topology_device_list, list) {
> +		pr_info("Node: %d\n", i);
> +		pr_info("\tGPU assigned: %s\n", (dev->gpu ? "yes" : "no"));
> +		pr_info("\tCPU count: %d\n", dev->node_props.cpu_cores_count);
> +		pr_info("\tSIMD count: %d", dev->node_props.simd_count);
> +		i++;
> +	}
> +}
> +
> +static uint32_t kfd_generate_gpu_id(struct kfd_dev *gpu)
> +{
> +	uint32_t hashout;
> +	uint32_t buf[7];
> +	int i;
> +
> +	if (!gpu)
> +		return 0;
> +
> +	buf[0] = gpu->pdev->devfn;
> +	buf[1] = gpu->pdev->subsystem_vendor;
> +	buf[2] = gpu->pdev->subsystem_device;
> +	buf[3] = gpu->pdev->device;
> +	buf[4] = gpu->pdev->bus->number;
> +	buf[5] = (uint32_t)(kfd2kgd->get_vmem_size(gpu->kgd) & 0xffffffff);
> +	buf[6] = (uint32_t)(kfd2kgd->get_vmem_size(gpu->kgd) >> 32);
> +
> +	for (i = 0, hashout = 0; i < 7; i++)
> +		hashout ^= hash_32(buf[i], KFD_GPU_ID_HASH_WIDTH);
> +
> +	return hashout;
> +}
> +
> +static struct kfd_topology_device *kfd_assign_gpu(struct kfd_dev *gpu)
> +{
> +	struct kfd_topology_device *dev;
> +	struct kfd_topology_device *out_dev = 0;
> +
> +	BUG_ON(!gpu);
> +
> +	list_for_each_entry(dev, &topology_device_list, list)
> +		if (dev->gpu == 0 && dev->node_props.simd_count > 0) {
> +			dev->gpu = gpu;
> +			out_dev = dev;
> +			break;
> +		}
> +
> +	return out_dev;
> +}
> +
> +static void kfd_notify_gpu_change(uint32_t gpu_id, int arrival)
> +{
> +	/*
> +	 * TODO: Generate an event for thunk about the arrival/removal
> +	 * of the GPU
> +	 */
> +}
> +
> +int kfd_topology_add_device(struct kfd_dev *gpu)
> +{
> +	uint32_t gpu_id;
> +	struct kfd_topology_device *dev;
> +	int res;
> +
> +	BUG_ON(!gpu);
> +
> +	gpu_id = kfd_generate_gpu_id(gpu);
> +
> +	pr_info("Adding new GPU (ID: 0x%x) to topology\n", gpu_id);
> +
> +	down_write(&topology_lock);
> +	/*
> +	 * Try to assign the GPU to existing topology device (generated from
> +	 * CRAT table
> +	 */
> +	dev = kfd_assign_gpu(gpu);
> +	if (!dev) {
> +		pr_info("GPU was not found in the current topology. Extending.\n");
> +		kfd_debug_print_topology();
> +		dev = kfd_create_topology_device();
> +		if (!dev) {
> +			res = -ENOMEM;
> +			goto err;
> +		}
> +		dev->gpu = gpu;
> +
> +		/*
> +		 * TODO: Make a call to retrieve topology information from the
> +		 * GPU vBIOS
> +		 */
> +
> +		/*
> +		 * Update the SYSFS tree, since we added another topology device
> +		 */
> +		if (kfd_topology_update_sysfs() < 0)
> +			kfd_topology_release_sysfs();
> +
> +	}
> +
> +	dev->gpu_id = gpu_id;
> +	gpu->id = gpu_id;
> +	dev->node_props.vendor_id = gpu->pdev->vendor;
> +	dev->node_props.device_id = gpu->pdev->device;
> +	dev->node_props.location_id = (gpu->pdev->bus->number << 24) +
> +			(gpu->pdev->devfn & 0xffffff);
> +	/*
> +	 * TODO: Retrieve max engine clock values from KGD
> +	 */
> +
> +	res = 0;
> +
> +err:
> +	up_write(&topology_lock);
> +
> +	if (res == 0)
> +		kfd_notify_gpu_change(gpu_id, 1);
> +
> +	return res;
> +}
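
For illustration, the location_id computed above packs the bus number into
the top 8 bits and devfn into the low 24 bits. Since bus->number is an
unsigned char, it is promoted to int before the shift, so a bus number of
0x80 or above shifts into the sign bit; casting to a 32-bit unsigned type
first avoids relying on that. A minimal sketch, with hypothetical helper
names that are not part of the patch:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper mirroring the location_id expression in the patch. */
static uint32_t pack_location_id(uint8_t bus, uint32_t devfn)
{
	/* Cast first so the shift is done on an unsigned 32-bit value. */
	return ((uint32_t)bus << 24) + (devfn & 0xffffff);
}

/* Bus number lives in bits 31..24. */
static uint8_t location_id_bus(uint32_t location_id)
{
	return (uint8_t)(location_id >> 24);
}

/* devfn lives in bits 23..0. */
static uint32_t location_id_devfn(uint32_t location_id)
{
	return location_id & 0xffffff;
}

/* Round-trip check for one bus/devfn pair. */
static void check_round_trip(uint8_t bus, uint32_t devfn)
{
	uint32_t id = pack_location_id(bus, devfn);

	assert(location_id_bus(id) == bus);
	assert(location_id_devfn(id) == (devfn & 0xffffff));
}
```
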
> +
> +int kfd_topology_remove_device(struct kfd_dev *gpu)
> +{
> +	struct kfd_topology_device *dev;
> +	uint32_t gpu_id;
> +	int res = -ENODEV;
> +
> +	BUG_ON(!gpu);
> +
> +	down_write(&topology_lock);
> +
> +	list_for_each_entry(dev, &topology_device_list, list)
> +		if (dev->gpu == gpu) {
> +			gpu_id = dev->gpu_id;
> +			kfd_remove_sysfs_node_entry(dev);
> +			kfd_release_topology_device(dev);
> +			res = 0;
> +			if (kfd_topology_update_sysfs() < 0)
> +				kfd_topology_release_sysfs();
> +			break;
> +		}
> +
> +	up_write(&topology_lock);
> +
> +	if (res == 0)
> +		kfd_notify_gpu_change(gpu_id, 0);
> +
> +	return res;
> +}

I am not convinced that sysfs is the right place to expose this.
I need to think about that a bit.
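
For context, a rough sketch of what consuming this from userspace might
look like. It assumes the generation_id file holds a single unsigned
decimal value, which this part of the patch does not show, and that the
file ends up at something like /sys/class/kfd/kfd/topology/generation_id
(a path inferred from the class, device and kobject names in the patch,
not confirmed). The helper name is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Parse a single unsigned decimal value from a sysfs-style text file.
 * Returns 0 on success and -1 if the file cannot be opened or parsed.
 */
static int read_generation_id(const char *path, unsigned long *out)
{
	FILE *f;
	int matched;

	assert(path && out);

	f = fopen(path, "r");
	if (!f)
		return -1;

	/* Assumption: the attribute is one decimal number, newline optional. */
	matched = fscanf(f, "%lu", out);
	fclose(f);

	return matched == 1 ? 0 : -1;
}
```

A client would presumably re-read this value and rescan the nodes
directory whenever it changes, which is part of what makes the choice of
interface worth some thought.
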

> diff --git a/drivers/gpu/hsa/radeon/kfd_topology.h b/drivers/gpu/hsa/radeon/kfd_topology.h
> new file mode 100644
> index 0000000..989624b
> --- /dev/null
> +++ b/drivers/gpu/hsa/radeon/kfd_topology.h
> @@ -0,0 +1,168 @@
> +/*
> + * Copyright 2014 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + */
> +
> +#ifndef __KFD_TOPOLOGY_H__
> +#define __KFD_TOPOLOGY_H__
> +
> +#include <linux/types.h>
> +#include <linux/list.h>
> +#include "kfd_priv.h"
> +
> +#define KFD_TOPOLOGY_PUBLIC_NAME_SIZE 128
> +
> +#define HSA_CAP_HOT_PLUGGABLE			0x00000001
> +#define HSA_CAP_ATS_PRESENT			0x00000002
> +#define HSA_CAP_SHARED_WITH_GRAPHICS		0x00000004
> +#define HSA_CAP_QUEUE_SIZE_POW2			0x00000008
> +#define HSA_CAP_QUEUE_SIZE_32BIT		0x00000010
> +#define HSA_CAP_QUEUE_IDLE_EVENT		0x00000020
> +#define HSA_CAP_VA_LIMIT			0x00000040
> +#define HSA_CAP_WATCH_POINTS_SUPPORTED		0x00000080
> +#define HSA_CAP_WATCH_POINTS_TOTALBITS_MASK	0x00000f00
> +#define HSA_CAP_WATCH_POINTS_TOTALBITS_SHIFT	8
> +#define HSA_CAP_RESERVED			0xfffff000
> +
> +struct kfd_node_properties {
> +	uint32_t cpu_cores_count;
> +	uint32_t simd_count;
> +	uint32_t mem_banks_count;
> +	uint32_t caches_count;
> +	uint32_t io_links_count;
> +	uint32_t cpu_core_id_base;
> +	uint32_t simd_id_base;
> +	uint32_t capability;
> +	uint32_t max_waves_per_simd;
> +	uint32_t lds_size_in_kb;
> +	uint32_t gds_size_in_kb;
> +	uint32_t wave_front_size;
> +	uint32_t array_count;
> +	uint32_t simd_arrays_per_engine;
> +	uint32_t cu_per_simd_array;
> +	uint32_t simd_per_cu;
> +	uint32_t max_slots_scratch_cu;
> +	uint32_t engine_id;
> +	uint32_t vendor_id;
> +	uint32_t device_id;
> +	uint32_t location_id;
> +	uint32_t max_engine_clk_fcompute;
> +	uint32_t max_engine_clk_ccompute;
> +	uint16_t marketing_name[KFD_TOPOLOGY_PUBLIC_NAME_SIZE];
> +};
> +
> +#define HSA_MEM_HEAP_TYPE_SYSTEM	0
> +#define HSA_MEM_HEAP_TYPE_FB_PUBLIC	1
> +#define HSA_MEM_HEAP_TYPE_FB_PRIVATE	2
> +#define HSA_MEM_HEAP_TYPE_GPU_GDS	3
> +#define HSA_MEM_HEAP_TYPE_GPU_LDS	4
> +#define HSA_MEM_HEAP_TYPE_GPU_SCRATCH	5
> +
> +#define HSA_MEM_FLAGS_HOT_PLUGGABLE	0x00000001
> +#define HSA_MEM_FLAGS_NON_VOLATILE	0x00000002
> +#define HSA_MEM_FLAGS_RESERVED		0xfffffffc
> +
> +struct kfd_mem_properties {
> +	struct list_head	list;
> +	uint32_t		heap_type;
> +	uint64_t		size_in_bytes;
> +	uint32_t		flags;
> +	uint32_t		width;
> +	uint32_t		mem_clk_max;
> +	struct kobject		*kobj;
> +	struct attribute	attr;
> +};
> +
> +#define KFD_TOPOLOGY_CPU_SIBLINGS 256
> +
> +#define HSA_CACHE_TYPE_DATA		0x00000001
> +#define HSA_CACHE_TYPE_INSTRUCTION	0x00000002
> +#define HSA_CACHE_TYPE_CPU		0x00000004
> +#define HSA_CACHE_TYPE_HSACU		0x00000008
> +#define HSA_CACHE_TYPE_RESERVED		0xfffffff0
> +
> +struct kfd_cache_properties {
> +	struct list_head	list;
> +	uint32_t		processor_id_low;
> +	uint32_t		cache_level;
> +	uint32_t		cache_size;
> +	uint32_t		cacheline_size;
> +	uint32_t		cachelines_per_tag;
> +	uint32_t		cache_assoc;
> +	uint32_t		cache_latency;
> +	uint32_t		cache_type;
> +	uint8_t			sibling_map[KFD_TOPOLOGY_CPU_SIBLINGS];
> +	struct kobject		*kobj;
> +	struct attribute	attr;
> +};
> +
> +struct kfd_iolink_properties {
> +	struct list_head	list;
> +	uint32_t		iolink_type;
> +	uint32_t		ver_maj;
> +	uint32_t		ver_min;
> +	uint32_t		node_from;
> +	uint32_t		node_to;
> +	uint32_t		weight;
> +	uint32_t		min_latency;
> +	uint32_t		max_latency;
> +	uint32_t		min_bandwidth;
> +	uint32_t		max_bandwidth;
> +	uint32_t		rec_transfer_size;
> +	uint32_t		flags;
> +	struct kobject		*kobj;
> +	struct attribute	attr;
> +};
> +
> +struct kfd_topology_device {
> +	struct list_head		list;
> +	uint32_t			gpu_id;
> +	struct kfd_node_properties	node_props;
> +	uint32_t			mem_bank_count;
> +	struct list_head		mem_props;
> +	uint32_t			cache_count;
> +	struct list_head		cache_props;
> +	uint32_t			io_link_count;
> +	struct list_head		io_link_props;
> +	struct kfd_dev			*gpu;
> +	struct kobject			*kobj_node;
> +	struct kobject			*kobj_mem;
> +	struct kobject			*kobj_cache;
> +	struct kobject			*kobj_iolink;
> +	struct attribute		attr_gpuid;
> +	struct attribute		attr_name;
> +	struct attribute		attr_props;
> +};
> +
> +struct kfd_system_properties {
> +	uint32_t		num_devices;     /* Number of H-NUMA nodes */
> +	uint32_t		generation_count;
> +	uint64_t		platform_oem;
> +	uint64_t		platform_id;
> +	uint64_t		platform_rev;
> +	struct kobject		*kobj_topology;
> +	struct kobject		*kobj_nodes;
> +	struct attribute	attr_genid;
> +	struct attribute	attr_props;
> +};
> +
> +
> +
> +#endif /* __KFD_TOPOLOGY_H__ */
> -- 
> 1.9.1
> 

* Re: [PATCH 09/83] hsa/radeon: Add code base of hsa driver for AMD's GPUs
@ 2014-07-11 17:04     ` Jerome Glisse
  0 siblings, 0 replies; 116+ messages in thread
From: Jerome Glisse @ 2014-07-11 17:04 UTC (permalink / raw)
  To: Oded Gabbay
  Cc: Sandeep Nair, Andrew Lewycky, Greg Kroah-Hartman,
	Rafael J. Wysocki, linux-kernel, dri-devel,
	Kishon Vijay Abraham I, Andreas Noever, Kenneth Heitke,
	Santosh Shilimkar, Srinivas Pandruvada, Alex Deucher

On Fri, Jul 11, 2014 at 12:50:09AM +0300, Oded Gabbay wrote:
> This patch adds the code base of the hsa driver for
> AMD's GPUs.
> 
> This driver is called kfd.
> 
> This initial version supports the first HSA chip, Kaveri.
> 
> This driver is located in a new directory structure under drivers/gpu.
> 
> Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>

There are too many coding style issues. While we have been lax on enforcing
the scripts/checkpatch.pl rules, I think there is a limit to that. I am not
strict on the 80 chars per line, but other things need fixing so we stay in line.

Also, I am a bit worried about the license. Given the top comment in each of
the files, I am not sure this is GPL2 compatible. I would need to ask a lawyer
to review that.

Other comments inline.


> ---
>  drivers/Kconfig                        |    2 +
>  drivers/gpu/Makefile                   |    1 +
>  drivers/gpu/hsa/Kconfig                |   20 +
>  drivers/gpu/hsa/Makefile               |    1 +
>  drivers/gpu/hsa/radeon/Makefile        |    8 +
>  drivers/gpu/hsa/radeon/kfd_chardev.c   |  133 ++++
>  drivers/gpu/hsa/radeon/kfd_crat.h      |  292 ++++++++
>  drivers/gpu/hsa/radeon/kfd_device.c    |  162 +++++
>  drivers/gpu/hsa/radeon/kfd_module.c    |  117 ++++
>  drivers/gpu/hsa/radeon/kfd_pasid.c     |   92 +++
>  drivers/gpu/hsa/radeon/kfd_priv.h      |  232 ++++++
>  drivers/gpu/hsa/radeon/kfd_process.c   |  400 +++++++++++
>  drivers/gpu/hsa/radeon/kfd_scheduler.h |   62 ++
>  drivers/gpu/hsa/radeon/kfd_topology.c  | 1201 ++++++++++++++++++++++++++++++++
>  drivers/gpu/hsa/radeon/kfd_topology.h  |  168 +++++
>  15 files changed, 2891 insertions(+)
>  create mode 100644 drivers/gpu/hsa/Kconfig
>  create mode 100644 drivers/gpu/hsa/Makefile
>  create mode 100644 drivers/gpu/hsa/radeon/Makefile
>  create mode 100644 drivers/gpu/hsa/radeon/kfd_chardev.c
>  create mode 100644 drivers/gpu/hsa/radeon/kfd_crat.h
>  create mode 100644 drivers/gpu/hsa/radeon/kfd_device.c
>  create mode 100644 drivers/gpu/hsa/radeon/kfd_module.c
>  create mode 100644 drivers/gpu/hsa/radeon/kfd_pasid.c
>  create mode 100644 drivers/gpu/hsa/radeon/kfd_priv.h
>  create mode 100644 drivers/gpu/hsa/radeon/kfd_process.c
>  create mode 100644 drivers/gpu/hsa/radeon/kfd_scheduler.h
>  create mode 100644 drivers/gpu/hsa/radeon/kfd_topology.c
>  create mode 100644 drivers/gpu/hsa/radeon/kfd_topology.h
> 
> diff --git a/drivers/Kconfig b/drivers/Kconfig
> index 9b2dcc2..c1ac8f8 100644
> --- a/drivers/Kconfig
> +++ b/drivers/Kconfig
> @@ -178,4 +178,6 @@ source "drivers/mcb/Kconfig"
>  
>  source "drivers/thunderbolt/Kconfig"
>  
> +source "drivers/gpu/hsa/Kconfig"
> +
>  endmenu
> diff --git a/drivers/gpu/Makefile b/drivers/gpu/Makefile
> index 70da9eb..749a7ea 100644
> --- a/drivers/gpu/Makefile
> +++ b/drivers/gpu/Makefile
> @@ -1,3 +1,4 @@
>  obj-y			+= drm/ vga/
>  obj-$(CONFIG_TEGRA_HOST1X)	+= host1x/
>  obj-$(CONFIG_IMX_IPUV3_CORE)	+= ipu-v3/
> +obj-$(CONFIG_HSA)	+= hsa/
> \ No newline at end of file
> diff --git a/drivers/gpu/hsa/Kconfig b/drivers/gpu/hsa/Kconfig
> new file mode 100644
> index 0000000..ee7bb28
> --- /dev/null
> +++ b/drivers/gpu/hsa/Kconfig
> @@ -0,0 +1,20 @@
> +#
> +# Heterogenous system architecture configuration
> +#
> +
> +menuconfig HSA
> +	bool "Heterogenous System Architecture"
> +	default y
> +	help
> +	  Say Y here if you want Heterogenous System Architecture support.

Maybe be a bit more chatty here; there are already enough kernel options that
are cryptic even to kernel developers. Not everyone is well aware of all
the fancy 3-letter acronyms GPUs use :)

> +
> +if HSA
> +
> +config HSA_RADEON
> +	tristate "HSA kernel driver for AMD Radeon devices"
> +	depends on HSA && AMD_IOMMU_V2 && X86_64
> +	default m
> +	help
> +	  Enable this if you want to support HSA on AMD Radeon devices.
> +
> +endif # HSA
> diff --git a/drivers/gpu/hsa/Makefile b/drivers/gpu/hsa/Makefile
> new file mode 100644
> index 0000000..0951584
> --- /dev/null
> +++ b/drivers/gpu/hsa/Makefile
> @@ -0,0 +1 @@
> +obj-$(CONFIG_HSA_RADEON)	+= radeon/
> diff --git a/drivers/gpu/hsa/radeon/Makefile b/drivers/gpu/hsa/radeon/Makefile
> new file mode 100644
> index 0000000..ba16a09
> --- /dev/null
> +++ b/drivers/gpu/hsa/radeon/Makefile
> @@ -0,0 +1,8 @@
> +#
> +# Makefile for Heterogenous System Architecture support for AMD Radeon devices
> +#
> +
> +radeon_kfd-y	:= kfd_module.o kfd_device.o kfd_chardev.o \
> +		kfd_pasid.o kfd_topology.o kfd_process.o
> +
> +obj-$(CONFIG_HSA_RADEON)	+= radeon_kfd.o
> diff --git a/drivers/gpu/hsa/radeon/kfd_chardev.c b/drivers/gpu/hsa/radeon/kfd_chardev.c
> new file mode 100644
> index 0000000..7a56a8f
> --- /dev/null
> +++ b/drivers/gpu/hsa/radeon/kfd_chardev.c
> @@ -0,0 +1,133 @@
> +/*
> + * Copyright 2014 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + */
> +
> +#include <linux/device.h>
> +#include <linux/export.h>
> +#include <linux/err.h>
> +#include <linux/fs.h>
> +#include <linux/sched.h>
> +#include <linux/slab.h>
> +#include <linux/uaccess.h>
> +#include "kfd_priv.h"
> +#include "kfd_scheduler.h"
> +
> +static long kfd_ioctl(struct file *, unsigned int, unsigned long);

Nitpick: avoid unsigned int, just use unsigned.
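
For illustration, both spellings name the same type, so the choice is
purely stylistic and a declaration using one matches a definition using
the other. A minimal sketch, assuming a C11 compiler; demo_ioctl() and
IS_UNSIGNED_INT() are hypothetical names, not part of the patch:

```c
#include <assert.h>

/* 1 when the argument's type is exactly unsigned int, 0 otherwise. */
#define IS_UNSIGNED_INT(x) _Generic((x), unsigned int: 1, default: 0)

/* Bare "unsigned" is shorthand for "unsigned int" in ISO C. */
static_assert(IS_UNSIGNED_INT((unsigned)0), "unsigned is unsigned int");

/* Declared with one spelling ... */
long demo_ioctl(unsigned cmd, unsigned long arg);

/* ... and defined with the other; the compiler sees the same signature. */
long demo_ioctl(unsigned int cmd, unsigned long arg)
{
	return (long)(cmd + arg);
}
```
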

> +static int kfd_open(struct inode *, struct file *);
> +
> +static const char kfd_dev_name[] = "kfd";
> +
> +static const struct file_operations kfd_fops = {
> +	.owner = THIS_MODULE,
> +	.unlocked_ioctl = kfd_ioctl,
> +	.open = kfd_open,
> +};
> +
> +static int kfd_char_dev_major = -1;
> +static struct class *kfd_class;
> +struct device *kfd_device;
> +
> +int
> +radeon_kfd_chardev_init(void)
> +{
> +	int err = 0;
> +
> +	kfd_char_dev_major = register_chrdev(0, kfd_dev_name, &kfd_fops);
> +	err = kfd_char_dev_major;
> +	if (err < 0)
> +		goto err_register_chrdev;
> +
> +	kfd_class = class_create(THIS_MODULE, kfd_dev_name);
> +	err = PTR_ERR(kfd_class);
> +	if (IS_ERR(kfd_class))
> +		goto err_class_create;
> +
> +	kfd_device = device_create(kfd_class, NULL, MKDEV(kfd_char_dev_major, 0), NULL, kfd_dev_name);
> +	err = PTR_ERR(kfd_device);
> +	if (IS_ERR(kfd_device))
> +		goto err_device_create;
> +
> +	return 0;
> +
> +err_device_create:
> +	class_destroy(kfd_class);
> +err_class_create:
> +	unregister_chrdev(kfd_char_dev_major, kfd_dev_name);
> +err_register_chrdev:
> +	return err;
> +}
> +
> +void
> +radeon_kfd_chardev_exit(void)
> +{
> +	device_destroy(kfd_class, MKDEV(kfd_char_dev_major, 0));
> +	class_destroy(kfd_class);
> +	unregister_chrdev(kfd_char_dev_major, kfd_dev_name);
> +}
> +
> +struct device*
> +radeon_kfd_chardev(void)
> +{
> +	return kfd_device;
> +}
> +
> +
> +static int
> +kfd_open(struct inode *inode, struct file *filep)
> +{
> +	struct kfd_process *process;
> +
> +	if (iminor(inode) != 0)
> +		return -ENODEV;
> +
> +	process = radeon_kfd_create_process(current);
> +	if (IS_ERR(process))
> +		return PTR_ERR(process);
> +
> +	pr_debug("\nkfd: process %d opened dev/kfd", process->pasid);
> +
> +	return 0;
> +}
> +
> +
> +static long
> +kfd_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
> +{
> +	long err = -EINVAL;
> +
> +	dev_info(kfd_device,
> +		 "ioctl cmd 0x%x (#%d), arg 0x%lx\n",
> +		 cmd, _IOC_NR(cmd), arg);
> +
> +	switch (cmd) {
> +	default:
> +		dev_err(kfd_device,
> +			"unknown ioctl cmd 0x%x, arg 0x%lx)\n",
> +			cmd, arg);
> +		err = -EINVAL;
> +		break;
> +	}
> +
> +	if (err < 0)
> +		dev_err(kfd_device, "ioctl error %ld\n", err);
> +
> +	return err;
> +}
> diff --git a/drivers/gpu/hsa/radeon/kfd_crat.h b/drivers/gpu/hsa/radeon/kfd_crat.h
> new file mode 100644
> index 0000000..587455d
> --- /dev/null
> +++ b/drivers/gpu/hsa/radeon/kfd_crat.h
> @@ -0,0 +1,292 @@
> +/*
> + * Copyright 2014 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + */
> +
> +#ifndef KFD_CRAT_H_INCLUDED
> +#define KFD_CRAT_H_INCLUDED
> +
> +#include <linux/types.h>
> +
> +#pragma pack(1)
> +
> +/*
> + * 4CC signature values for the CRAT and CDIT ACPI tables
> + */
> +
> +#define CRAT_SIGNATURE	"CRAT"
> +#define CDIT_SIGNATURE	"CDIT"
> +
> +/*
> + * Component Resource Association Table (CRAT)
> + */
> +
> +#define CRAT_OEMID_LENGTH	6
> +#define CRAT_OEMTABLEID_LENGTH	8
> +#define CRAT_RESERVED_LENGTH	6
> +
> +struct crat_header {
> +	uint32_t	signature;
> +	uint32_t	length;
> +	uint8_t		revision;
> +	uint8_t		checksum;
> +	uint8_t		oem_id[CRAT_OEMID_LENGTH];
> +	uint8_t		oem_table_id[CRAT_OEMTABLEID_LENGTH];
> +	uint32_t	oem_revision;
> +	uint32_t	creator_id;
> +	uint32_t	creator_revision;
> +	uint32_t	total_entries;
> +	uint16_t	num_domains;
> +	uint8_t		reserved[CRAT_RESERVED_LENGTH];
> +};
> +
> +/*
> + * The header structure is immediately followed by total_entries of the
> + * data definitions
> + */
> +
> +/*
> + * The currently defined subtype entries in the CRAT
> + */
> +#define CRAT_SUBTYPE_COMPUTEUNIT_AFFINITY	0
> +#define CRAT_SUBTYPE_MEMORY_AFFINITY		1
> +#define CRAT_SUBTYPE_CACHE_AFFINITY		2
> +#define CRAT_SUBTYPE_TLB_AFFINITY		3
> +#define CRAT_SUBTYPE_CCOMPUTE_AFFINITY		4
> +#define CRAT_SUBTYPE_IOLINK_AFFINITY		5
> +#define CRAT_SUBTYPE_MAX			6
> +
> +#define CRAT_SIBLINGMAP_SIZE	32
> +
> +/*
> + * ComputeUnit Affinity structure and definitions
> + */
> +#define CRAT_CU_FLAGS_ENABLED		0x00000001
> +#define CRAT_CU_FLAGS_HOT_PLUGGABLE	0x00000002
> +#define CRAT_CU_FLAGS_CPU_PRESENT	0x00000004
> +#define CRAT_CU_FLAGS_GPU_PRESENT	0x00000008
> +#define CRAT_CU_FLAGS_IOMMU_PRESENT	0x00000010
> +#define CRAT_CU_FLAGS_RESERVED		0xffffffe0
> +
> +#define CRAT_COMPUTEUNIT_RESERVED_LENGTH 4
> +
> +struct crat_subtype_computeunit {
> +	uint8_t		type;
> +	uint8_t		length;
> +	uint16_t	reserved;
> +	uint32_t	flags;
> +	uint32_t	proximity_domain;
> +	uint32_t	processor_id_low;
> +	uint16_t	num_cpu_cores;
> +	uint16_t	num_simd_cores;
> +	uint16_t	max_waves_simd;
> +	uint16_t	io_count;
> +	uint16_t	hsa_capability;
> +	uint16_t	lds_size_in_kb;
> +	uint8_t		wave_front_size;
> +	uint8_t		num_banks;
> +	uint16_t	micro_engine_id;
> +	uint8_t		num_arrays;
> +	uint8_t		num_cu_per_array;
> +	uint8_t		num_simd_per_cu;
> +	uint8_t		max_slots_scatch_cu;
> +	uint8_t		reserved2[CRAT_COMPUTEUNIT_RESERVED_LENGTH];
> +};
> +
> +/*
> + * HSA Memory Affinity structure and definitions
> + */
> +#define CRAT_MEM_FLAGS_ENABLED		0x00000001
> +#define CRAT_MEM_FLAGS_HOT_PLUGGABLE	0x00000002
> +#define CRAT_MEM_FLAGS_NON_VOLATILE	0x00000004
> +#define CRAT_MEM_FLAGS_RESERVED		0xfffffff8
> +
> +#define CRAT_MEMORY_RESERVED_LENGTH 8
> +
> +struct crat_subtype_memory {
> +	uint8_t		type;
> +	uint8_t		length;
> +	uint16_t	reserved;
> +	uint32_t	flags;
> +	uint32_t	promixity_domain;
> +	uint32_t	base_addr_low;
> +	uint32_t	base_addr_high;
> +	uint32_t	length_low;
> +	uint32_t	length_high;
> +	uint32_t	width;
> +	uint8_t		reserved2[CRAT_MEMORY_RESERVED_LENGTH];
> +};
> +
> +/*
> + * HSA Cache Affinity structure and definitions
> + */
> +#define CRAT_CACHE_FLAGS_ENABLED	0x00000001
> +#define CRAT_CACHE_FLAGS_DATA_CACHE	0x00000002
> +#define CRAT_CACHE_FLAGS_INST_CACHE	0x00000004
> +#define CRAT_CACHE_FLAGS_CPU_CACHE	0x00000008
> +#define CRAT_CACHE_FLAGS_SIMD_CACHE	0x00000010
> +#define CRAT_CACHE_FLAGS_RESERVED	0xffffffe0
> +
> +#define CRAT_CACHE_RESERVED_LENGTH 8
> +
> +struct crat_subtype_cache {
> +	uint8_t		type;
> +	uint8_t		length;
> +	uint16_t	reserved;
> +	uint32_t	flags;
> +	uint32_t	processor_id_low;
> +	uint8_t		sibling_map[CRAT_SIBLINGMAP_SIZE];
> +	uint32_t	cache_size;
> +	uint8_t		cache_level;
> +	uint8_t		lines_per_tag;
> +	uint16_t	cache_line_size;
> +	uint8_t		associativity;
> +	uint8_t		cache_properties;
> +	uint16_t	cache_latency;
> +	uint8_t		reserved2[CRAT_CACHE_RESERVED_LENGTH];
> +};
> +
> +/*
> + * HSA TLB Affinity structure and definitions
> + */
> +#define CRAT_TLB_FLAGS_ENABLED	0x00000001
> +#define CRAT_TLB_FLAGS_DATA_TLB	0x00000002
> +#define CRAT_TLB_FLAGS_INST_TLB	0x00000004
> +#define CRAT_TLB_FLAGS_CPU_TLB	0x00000008
> +#define CRAT_TLB_FLAGS_SIMD_TLB	0x00000010
> +#define CRAT_TLB_FLAGS_RESERVED	0xffffffe0
> +
> +#define CRAT_TLB_RESERVED_LENGTH 4
> +
> +struct crat_subtype_tlb {
> +	uint8_t		type;
> +	uint8_t		length;
> +	uint16_t	reserved;
> +	uint32_t	flags;
> +	uint32_t	processor_id_low;
> +	uint8_t		sibling_map[CRAT_SIBLINGMAP_SIZE];
> +	uint32_t	tlb_level;
> +	uint8_t		data_tlb_associativity_2mb;
> +	uint8_t		data_tlb_size_2mb;
> +	uint8_t		instruction_tlb_associativity_2mb;
> +	uint8_t		instruction_tlb_size_2mb;
> +	uint8_t		data_tlb_associativity_4k;
> +	uint8_t		data_tlb_size_4k;
> +	uint8_t		instruction_tlb_associativity_4k;
> +	uint8_t		instruction_tlb_size_4k;
> +	uint8_t		data_tlb_associativity_1gb;
> +	uint8_t		data_tlb_size_1gb;
> +	uint8_t		instruction_tlb_associativity_1gb;
> +	uint8_t		instruction_tlb_size_1gb;
> +	uint8_t		reserved2[CRAT_TLB_RESERVED_LENGTH];
> +};
> +
> +/*
> + * HSA CCompute/APU Affinity structure and definitions
> + */
> +#define CRAT_CCOMPUTE_FLAGS_ENABLED	0x00000001
> +#define CRAT_CCOMPUTE_FLAGS_RESERVED	0xfffffffe
> +
> +#define CRAT_CCOMPUTE_RESERVED_LENGTH 16
> +
> +struct crat_subtype_ccompute {
> +	uint8_t		type;
> +	uint8_t		length;
> +	uint16_t	reserved;
> +	uint32_t	flags;
> +	uint32_t	processor_id_low;
> +	uint8_t		sibling_map[CRAT_SIBLINGMAP_SIZE];
> +	uint32_t	apu_size;
> +	uint8_t		reserved2[CRAT_CCOMPUTE_RESERVED_LENGTH];
> +};
> +
> +/*
> + * HSA IO Link Affinity structure and definitions
> + */
> +#define CRAT_IOLINK_FLAGS_ENABLED	0x00000001
> +#define CRAT_IOLINK_FLAGS_COHERENCY	0x00000002
> +#define CRAT_IOLINK_FLAGS_RESERVED	0xfffffffc
> +
> +/*
> + * IO interface types
> + */
> +#define CRAT_IOLINK_TYPE_UNDEFINED	0
> +#define CRAT_IOLINK_TYPE_HYPERTRANSPORT	1
> +#define CRAT_IOLINK_TYPE_PCIEXPRESS	2
> +#define CRAT_IOLINK_TYPE_OTHER		3
> +#define CRAT_IOLINK_TYPE_MAX		255
> +
> +#define CRAT_IOLINK_RESERVED_LENGTH 24
> +
> +struct crat_subtype_iolink {
> +	uint8_t		type;
> +	uint8_t		length;
> +	uint16_t	reserved;
> +	uint32_t	flags;
> +	uint32_t	proximity_domain_from;
> +	uint32_t	proximity_domain_to;
> +	uint8_t		io_interface_type;
> +	uint8_t		version_major;
> +	uint16_t	version_minor;
> +	uint32_t	minimum_latency;
> +	uint32_t	maximum_latency;
> +	uint32_t	minimum_bandwidth_mbs;
> +	uint32_t	maximum_bandwidth_mbs;
> +	uint32_t	recommended_transfer_size;
> +	uint8_t		reserved2[CRAT_IOLINK_RESERVED_LENGTH];
> +};
> +
> +/*
> + * HSA generic sub-type header
> + */
> +
> +#define CRAT_SUBTYPE_FLAGS_ENABLED 0x00000001
> +
> +struct crat_subtype_generic {
> +	uint8_t		type;
> +	uint8_t		length;
> +	uint16_t	reserved;
> +	uint32_t	flags;
> +};
> +
> +/*
> + * Component Locality Distance Information Table (CDIT)
> + */
> +#define CDIT_OEMID_LENGTH	6
> +#define CDIT_OEMTABLEID_LENGTH	8
> +
> +struct cdit_header {
> +	uint32_t	signature;
> +	uint32_t	length;
> +	uint8_t		revision;
> +	uint8_t		checksum;
> +	uint8_t		oem_id[CDIT_OEMID_LENGTH];
> +	uint8_t		oem_table_id[CDIT_OEMTABLEID_LENGTH];
> +	uint32_t	oem_revision;
> +	uint32_t	creator_id;
> +	uint32_t	creator_revision;
> +	uint32_t	total_entries;
> +	uint16_t	num_domains;
> +	uint8_t		entry[1];
> +};
> +
> +#pragma pack()
> +
> +#endif /* KFD_CRAT_H_INCLUDED */
> diff --git a/drivers/gpu/hsa/radeon/kfd_device.c b/drivers/gpu/hsa/radeon/kfd_device.c
> new file mode 100644
> index 0000000..d122920
> --- /dev/null
> +++ b/drivers/gpu/hsa/radeon/kfd_device.c
> @@ -0,0 +1,162 @@
> +/*
> + * Copyright 2014 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + */
> +
> +#include <linux/amd-iommu.h>
> +#include <linux/bsearch.h>
> +#include <linux/pci.h>
> +#include <linux/slab.h>
> +#include "kfd_priv.h"
> +#include "kfd_scheduler.h"
> +
> +static const struct kfd_device_info bonaire_device_info = {
> +	.max_pasid_bits = 16,
> +};
> +
> +struct kfd_deviceid {
> +	unsigned short did;
> +	const struct kfd_device_info *device_info;
> +};
> +
> +/* Please keep this sorted by increasing device id. */
> +static const struct kfd_deviceid supported_devices[] = {
> +	{ 0x1305, &bonaire_device_info },	/* Kaveri */
> +	{ 0x1307, &bonaire_device_info },	/* Kaveri */
> +	{ 0x130F, &bonaire_device_info },	/* Kaveri */
> +	{ 0x665C, &bonaire_device_info },	/* Bonaire */
> +};
> +
> +static const struct kfd_device_info *
> +lookup_device_info(unsigned short did)
> +{
> +	size_t i;
> +
> +	for (i = 0; i < ARRAY_SIZE(supported_devices); i++) {
> +		if (supported_devices[i].did == did) {
> +			BUG_ON(supported_devices[i].device_info == NULL);
> +			return supported_devices[i].device_info;
> +		}
> +	}
> +
> +	return NULL;
> +}
> +
> +struct kfd_dev *kgd2kfd_probe(struct kgd_dev *kgd, struct pci_dev *pdev)
> +{
> +	struct kfd_dev *kfd;
> +
> +	const struct kfd_device_info *device_info = lookup_device_info(pdev->device);
> +
> +	if (!device_info)
> +		return NULL;
> +
> +	kfd = kzalloc(sizeof(*kfd), GFP_KERNEL);
> +	kfd->kgd = kgd;
> +	kfd->device_info = device_info;
> +	kfd->pdev = pdev;
> +
> +	return kfd;
> +}
> +
> +static bool
> +device_iommu_pasid_init(struct kfd_dev *kfd)
> +{
> +	const u32 required_iommu_flags = AMD_IOMMU_DEVICE_FLAG_ATS_SUP | AMD_IOMMU_DEVICE_FLAG_PRI_SUP
> +					| AMD_IOMMU_DEVICE_FLAG_PASID_SUP;
> +
> +	struct amd_iommu_device_info iommu_info;
> +	pasid_t pasid_limit;
> +	int err;
> +
> +	err = amd_iommu_device_info(kfd->pdev, &iommu_info);
> +	if (err < 0)
> +		return false;
> +
> +	if ((iommu_info.flags & required_iommu_flags) != required_iommu_flags)
> +		return false;
> +
> +	pasid_limit = min_t(pasid_t, (pasid_t)1 << kfd->device_info->max_pasid_bits, iommu_info.max_pasids);
> +	pasid_limit = min_t(pasid_t, pasid_limit, kfd->doorbell_process_limit);
> +
> +	err = amd_iommu_init_device(kfd->pdev, pasid_limit);
> +	if (err < 0)
> +		return false;
> +
> +	if (!radeon_kfd_set_pasid_limit(pasid_limit)) {
> +		amd_iommu_free_device(kfd->pdev);
> +		return false;
> +	}
> +
> +	return true;
> +}
> +
> +static void iommu_pasid_shutdown_callback(struct pci_dev *pdev, int pasid)
> +{
> +	struct kfd_dev *dev = radeon_kfd_device_by_pci_dev(pdev);
> +
> +	if (dev)
> +		radeon_kfd_unbind_process_from_device(dev, pasid);
> +}
> +
> +bool kgd2kfd_device_init(struct kfd_dev *kfd,
> +			 const struct kgd2kfd_shared_resources *gpu_resources)
> +{
> +	kfd->shared_resources = *gpu_resources;
> +
> +	kfd->regs = gpu_resources->mmio_registers;
> +
> +	if (!device_iommu_pasid_init(kfd))
> +		return false;
> +
> +	if (kfd_topology_add_device(kfd) != 0) {
> +		amd_iommu_free_device(kfd->pdev);
> +		return false;
> +	}
> +
> +	amd_iommu_set_invalidate_ctx_cb(kfd->pdev, iommu_pasid_shutdown_callback);
> +
> +	if (kfd->device_info->scheduler_class->create(kfd, &kfd->scheduler)) {
> +		amd_iommu_free_device(kfd->pdev);
> +		return false;
> +	}
> +
> +	kfd->device_info->scheduler_class->start(kfd->scheduler);
> +
> +	kfd->init_complete = true;
> +
> +	return true;
> +}
> +
> +void kgd2kfd_device_exit(struct kfd_dev *kfd)
> +{
> +	int err = kfd_topology_remove_device(kfd);
> +
> +	BUG_ON(err != 0);
> +
> +	if (kfd->init_complete) {
> +		kfd->device_info->scheduler_class->stop(kfd->scheduler);
> +		kfd->device_info->scheduler_class->destroy(kfd->scheduler);
> +
> +		amd_iommu_free_device(kfd->pdev);
> +	}
> +
> +	kfree(kfd);
> +}
> diff --git a/drivers/gpu/hsa/radeon/kfd_module.c b/drivers/gpu/hsa/radeon/kfd_module.c
> new file mode 100644
> index 0000000..6978bc0
> --- /dev/null
> +++ b/drivers/gpu/hsa/radeon/kfd_module.c
> @@ -0,0 +1,117 @@
> +/*
> + * Copyright 2014 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + */
> +
> +#include <linux/module.h>
> +#include <linux/sched.h>
> +#include <linux/notifier.h>
> +
> +#include "kfd_priv.h"
> +
> +#define DRIVER_AUTHOR		"Andrew Lewycky, Oded Gabbay, Evgeny Pinchuk, others."
> +
> +#define DRIVER_NAME		"kfd"
> +#define DRIVER_DESC		"AMD HSA Kernel Fusion Driver"
> +#define DRIVER_DATE		"20140127"
> +
> +const struct kfd2kgd_calls *kfd2kgd;
> +static const struct kgd2kfd_calls kgd2kfd = {
> +	.exit		= kgd2kfd_exit,
> +	.probe		= kgd2kfd_probe,
> +	.device_init	= kgd2kfd_device_init,
> +	.device_exit	= kgd2kfd_device_exit,
> +};
> +
> +bool kgd2kfd_init(unsigned interface_version,
> +		  const struct kfd2kgd_calls *f2g,
> +		  const struct kgd2kfd_calls **g2f)
> +{
> +	/* Only one interface version is supported, no kfd/kgd version skew allowed. */
> +	if (interface_version != KFD_INTERFACE_VERSION)
> +		return false;
> +
> +	kfd2kgd = f2g;
> +	*g2f = &kgd2kfd;
> +
> +	return true;
> +}
> +EXPORT_SYMBOL(kgd2kfd_init);
> +
> +void kgd2kfd_exit(void)
> +{
> +}
> +
> +extern int kfd_process_exit(struct notifier_block *nb,
> +				unsigned long action, void *data);
> +
> +static struct notifier_block kfd_mmput_nb = {
> +	.notifier_call		= kfd_process_exit,
> +	.priority		= 3,
> +};
> +
> +static int __init kfd_module_init(void)
> +{
> +	int err;
> +
> +	err = radeon_kfd_pasid_init();
> +	if (err < 0)
> +		goto err_pasid;
> +
> +	err = radeon_kfd_chardev_init();
> +	if (err < 0)
> +		goto err_ioctl;
> +
> +	err = mmput_register_notifier(&kfd_mmput_nb);
> +	if (err)
> +		goto err_mmu_notifier;
> +
> +	err = kfd_topology_init();
> +	if (err < 0)
> +		goto err_topology;
> +
> +	pr_info("[hsa] Initialized kfd module");
> +
> +	return 0;
> +err_topology:
> +	mmput_unregister_notifier(&kfd_mmput_nb);
> +err_mmu_notifier:
> +	radeon_kfd_chardev_exit();
> +err_ioctl:
> +	radeon_kfd_pasid_exit();
> +err_pasid:
> +	return err;
> +}
> +
> +static void __exit kfd_module_exit(void)
> +{
> +	kfd_topology_shutdown();
> +	mmput_unregister_notifier(&kfd_mmput_nb);
> +	radeon_kfd_chardev_exit();
> +	radeon_kfd_pasid_exit();
> +	pr_info("[hsa] Removed kfd module");
> +}
> +
> +module_init(kfd_module_init);
> +module_exit(kfd_module_exit);
> +
> +MODULE_AUTHOR(DRIVER_AUTHOR);
> +MODULE_DESCRIPTION(DRIVER_DESC);
> +MODULE_LICENSE("GPL");

If it is GPL, then the comment at the top of every file must reflect that
rather than use some specially worded license.

> diff --git a/drivers/gpu/hsa/radeon/kfd_pasid.c b/drivers/gpu/hsa/radeon/kfd_pasid.c
> new file mode 100644
> index 0000000..d78bd00
> --- /dev/null
> +++ b/drivers/gpu/hsa/radeon/kfd_pasid.c
> @@ -0,0 +1,92 @@
> +/*
> + * Copyright 2014 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + */
> +
> +#include <linux/slab.h>
> +#include <linux/types.h>
> +#include "kfd_priv.h"
> +
> +#define INITIAL_PASID_LIMIT (1<<20)
> +
> +static unsigned long *pasid_bitmap;
> +static pasid_t pasid_limit;
> +static DEFINE_MUTEX(pasid_mutex);
> +
> +int radeon_kfd_pasid_init(void)
> +{
> +	pasid_limit = INITIAL_PASID_LIMIT;
> +
> +	pasid_bitmap = kzalloc(DIV_ROUND_UP(INITIAL_PASID_LIMIT, BITS_PER_BYTE), GFP_KERNEL);
> +	if (!pasid_bitmap)
> +		return -ENOMEM;
> +
> +	set_bit(0, pasid_bitmap); /* PASID 0 is reserved. */
> +
> +	return 0;
> +}
> +
> +void radeon_kfd_pasid_exit(void)
> +{
> +	kfree(pasid_bitmap);
> +}
> +
> +bool radeon_kfd_set_pasid_limit(pasid_t new_limit)
> +{
> +	if (new_limit < pasid_limit) {
> +		bool ok;
> +
> +		mutex_lock(&pasid_mutex);
> +
> +		/* ensure that no pasids >= new_limit are in-use */
> +		ok = (find_next_bit(pasid_bitmap, pasid_limit, new_limit) == pasid_limit);
> +		if (ok)
> +			pasid_limit = new_limit;
> +
> +		mutex_unlock(&pasid_mutex);
> +
> +		return ok;
> +	}
> +
> +	return true;
> +}
> +
> +pasid_t radeon_kfd_pasid_alloc(void)
> +{
> +	pasid_t found;
> +
> +	mutex_lock(&pasid_mutex);
> +
> +	found = find_first_zero_bit(pasid_bitmap, pasid_limit);
> +	if (found == pasid_limit)
> +		found = 0;
> +	else
> +		set_bit(found, pasid_bitmap);
> +
> +	mutex_unlock(&pasid_mutex);
> +
> +	return found;
> +}
> +
> +void radeon_kfd_pasid_free(pasid_t pasid)
> +{
> +	BUG_ON(pasid == 0 || pasid >= pasid_limit);
> +	clear_bit(pasid, pasid_bitmap);
> +}
> diff --git a/drivers/gpu/hsa/radeon/kfd_priv.h b/drivers/gpu/hsa/radeon/kfd_priv.h
> new file mode 100644
> index 0000000..1d1dbcf
> --- /dev/null
> +++ b/drivers/gpu/hsa/radeon/kfd_priv.h
> @@ -0,0 +1,232 @@
> +/*
> + * Copyright 2014 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + */
> +
> +#ifndef KFD_PRIV_H_INCLUDED
> +#define KFD_PRIV_H_INCLUDED
> +
> +#include <linux/hashtable.h>
> +#include <linux/mmu_notifier.h>
> +#include <linux/mutex.h>
> +#include <linux/radeon_kfd.h>
> +#include <linux/types.h>
> +
> +struct kfd_scheduler_class;
> +
> +#define MAX_KFD_DEVICES 16	/* Global limit - only MAX_KFD_DEVICES will be supported by KFD. */
> +
> +/*
> + * Per-process limit. Each process can only
> + * create MAX_PROCESS_QUEUES across all devices
> + */
> +#define MAX_PROCESS_QUEUES 1024
> +
> +#define MAX_DOORBELL_INDEX MAX_PROCESS_QUEUES
> +#define KFD_SYSFS_FILE_MODE 0444
> +
> +/* We multiplex different sorts of mmap-able memory onto /dev/kfd.
> +** We figure out what type of memory the caller wanted by comparing the mmap page offset to known ranges. */
> +#define KFD_MMAP_DOORBELL_START	(((1ULL << 32)*1) >> PAGE_SHIFT)
> +#define KFD_MMAP_DOORBELL_END	(((1ULL << 32)*2) >> PAGE_SHIFT)
> +
> +/* GPU ID hash width in bits */
> +#define KFD_GPU_ID_HASH_WIDTH 16
> +
> +/* Macro for allocating structures */
> +#define kfd_alloc_struct(ptr_to_struct)	((typeof(ptr_to_struct)) kzalloc(sizeof(*ptr_to_struct), GFP_KERNEL))
> +
> +/* Large enough to hold the maximum usable pasid + 1.
> +** It must also be able to store the number of doorbells reported by a KFD device. */
> +typedef unsigned int pasid_t;

Same comment as before on unsigned int.

> +
> +/* Type that represents a HW doorbell slot. */
> +typedef u32 doorbell_t;
> +
> +struct kfd_device_info {
> +	const struct kfd_scheduler_class *scheduler_class;
> +	unsigned int max_pasid_bits;
> +};
> +
> +struct kfd_dev {
> +	struct kgd_dev *kgd;
> +
> +	const struct kfd_device_info *device_info;
> +	struct pci_dev *pdev;
> +
> +	void __iomem *regs;
> +
> +	bool init_complete;
> +
> +	unsigned int id;		/* topology stub index */
> +
> +	phys_addr_t doorbell_base;	/* Start of actual doorbells used by
> +					 * KFD. It is aligned for mapping
> +					 * into user mode
> +					 */
> +	size_t doorbell_id_offset;	/* Doorbell offset (from KFD doorbell
> +					 * to HW doorbell, GFX reserved some
> +					 * at the start)
> +					 */
> +	size_t doorbell_process_limit;	/* Number of processes we have doorbell space for. */
> +
> +	struct kgd2kfd_shared_resources shared_resources;
> +
> +	struct kfd_scheduler *scheduler;
> +};
> +
> +/* KGD2KFD callbacks */
> +void kgd2kfd_exit(void);
> +struct kfd_dev *kgd2kfd_probe(struct kgd_dev *kgd, struct pci_dev *pdev);
> +bool kgd2kfd_device_init(struct kfd_dev *kfd,
> +			 const struct kgd2kfd_shared_resources *gpu_resources);
> +void kgd2kfd_device_exit(struct kfd_dev *kfd);
> +
> +extern const struct kfd2kgd_calls *kfd2kgd;
> +
> +
> +/* KFD2KGD callback wrappers */
> +void radeon_kfd_lock_srbm_index(struct kfd_dev *kfd);
> +void radeon_kfd_unlock_srbm_index(struct kfd_dev *kfd);
> +
> +enum kfd_mempool {
> +	KFD_MEMPOOL_SYSTEM_CACHEABLE = 1,
> +	KFD_MEMPOOL_SYSTEM_WRITECOMBINE = 2,
> +	KFD_MEMPOOL_FRAMEBUFFER = 3,
> +};
> +
> +struct kfd_mem_obj_s; /* Dummy struct just to make kfd_mem_obj* a unique pointer type. */
> +typedef struct kfd_mem_obj_s *kfd_mem_obj;
> +
> +int radeon_kfd_vidmem_alloc(struct kfd_dev *kfd, size_t size, size_t alignment,
> +				enum kfd_mempool pool, kfd_mem_obj *mem_obj);
> +void radeon_kfd_vidmem_free(struct kfd_dev *kfd, kfd_mem_obj mem_obj);
> +int radeon_kfd_vidmem_gpumap(struct kfd_dev *kfd, kfd_mem_obj mem_obj, uint64_t *vmid0_address);
> +void radeon_kfd_vidmem_ungpumap(struct kfd_dev *kfd, kfd_mem_obj mem_obj);
> +int radeon_kfd_vidmem_kmap(struct kfd_dev *kfd, kfd_mem_obj mem_obj, void **ptr);
> +void radeon_kfd_vidmem_unkmap(struct kfd_dev *kfd, kfd_mem_obj mem_obj);
> +
> +/* Character device interface */
> +int radeon_kfd_chardev_init(void);
> +void radeon_kfd_chardev_exit(void);
> +struct device *radeon_kfd_chardev(void);
> +
> +/* Scheduler */
> +struct kfd_scheduler;
> +struct kfd_scheduler_process;
> +struct kfd_scheduler_queue {
> +	uint64_t dummy;
> +};
> +
> +struct kfd_queue {
> +	struct kfd_dev *dev;
> +
> +	/* scheduler_queue must be last. It is variable sized (dev->device_info->scheduler_class->queue_size) */
> +	struct kfd_scheduler_queue scheduler_queue;
> +};
> +
> +/* Data that is per-process-per device. */
> +struct kfd_process_device {
> +	/* List of all per-device data for a process. Starts from kfd_process.per_device_data. */
> +	struct list_head per_device_list;
> +
> +	/* The device that owns this data. */
> +	struct kfd_dev *dev;
> +
> +	/* The user-mode address of the doorbell mapping for this device. */
> +	doorbell_t __user *doorbell_mapping;
> +
> +	/* The number of queues created by this process for this device. */
> +	uint32_t queue_count;
> +
> +	/* Scheduler process data for this device. */
> +	struct kfd_scheduler_process *scheduler_process;
> +
> +	/* Is this process/pasid bound to this device? (amd_iommu_bind_pasid) */
> +	bool bound;
> +};
> +
> +/* Process data */
> +struct kfd_process {
> +	struct list_head processes_list;
> +
> +	struct mm_struct *mm;
> +
> +	struct mutex mutex;
> +
> +	/* In any process, the thread that started main() is the lead thread and outlives the rest.
> +	 * It is here because amd_iommu_bind_pasid wants a task_struct. */
> +	struct task_struct *lead_thread;
> +
> +	pasid_t pasid;
> +
> +	/* List of kfd_process_device structures, one for each device the process is using. */
> +	struct list_head per_device_data;
> +
> +	/* The process's queues. */
> +	size_t queue_array_size;
> +	struct kfd_queue **queues;	/* Size is queue_array_size, up to MAX_PROCESS_QUEUES. */
> +	unsigned long allocated_queue_bitmap[DIV_ROUND_UP(MAX_PROCESS_QUEUES, BITS_PER_LONG)];
> +};
> +
> +struct kfd_process *radeon_kfd_create_process(const struct task_struct *);
> +struct kfd_process *radeon_kfd_get_process(const struct task_struct *);
> +
> +struct kfd_process_device *radeon_kfd_bind_process_to_device(struct kfd_dev *dev, struct kfd_process *p);
> +void radeon_kfd_unbind_process_from_device(struct kfd_dev *dev, pasid_t pasid);
> +struct kfd_process_device *radeon_kfd_get_process_device_data(struct kfd_dev *dev, struct kfd_process *p);
> +
> +bool radeon_kfd_allocate_queue_id(struct kfd_process *p, unsigned int *queue_id);
> +void radeon_kfd_install_queue(struct kfd_process *p, unsigned int queue_id, struct kfd_queue *queue);
> +void radeon_kfd_remove_queue(struct kfd_process *p, unsigned int queue_id);
> +struct kfd_queue *radeon_kfd_get_queue(struct kfd_process *p, unsigned int queue_id);
> +
> +
> +/* PASIDs */
> +int radeon_kfd_pasid_init(void);
> +void radeon_kfd_pasid_exit(void);
> +bool radeon_kfd_set_pasid_limit(pasid_t new_limit);
> +pasid_t radeon_kfd_pasid_alloc(void);
> +void radeon_kfd_pasid_free(pasid_t pasid);
> +
> +/* Doorbells */
> +void radeon_kfd_doorbell_init(struct kfd_dev *kfd);
> +int radeon_kfd_doorbell_mmap(struct kfd_process *process, struct vm_area_struct *vma);
> +doorbell_t __user *radeon_kfd_get_doorbell(struct file *devkfd, struct kfd_process *process, struct kfd_dev *dev,
> +					   unsigned int doorbell_index);
> +unsigned int radeon_kfd_queue_id_to_doorbell(struct kfd_dev *kfd, struct kfd_process *process, unsigned int queue_id);
> +
> +extern struct device *kfd_device;
> +
> +/* Topology */
> +int kfd_topology_init(void);
> +void kfd_topology_shutdown(void);
> +int kfd_topology_add_device(struct kfd_dev *gpu);
> +int kfd_topology_remove_device(struct kfd_dev *gpu);
> +struct kfd_dev *radeon_kfd_device_by_id(uint32_t gpu_id);
> +struct kfd_dev *radeon_kfd_device_by_pci_dev(const struct pci_dev *pdev);
> +
> +/* MMIO registers */
> +#define WRITE_REG(dev, reg, value) radeon_kfd_write_reg((dev), (reg), (value))
> +#define READ_REG(dev, reg) radeon_kfd_read_reg((dev), (reg))
> +void radeon_kfd_write_reg(struct kfd_dev *dev, uint32_t reg, uint32_t value);
> +uint32_t radeon_kfd_read_reg(struct kfd_dev *dev, uint32_t reg);
> +
> +#endif
> diff --git a/drivers/gpu/hsa/radeon/kfd_process.c b/drivers/gpu/hsa/radeon/kfd_process.c
> new file mode 100644
> index 0000000..145ee38
> --- /dev/null
> +++ b/drivers/gpu/hsa/radeon/kfd_process.c
> @@ -0,0 +1,400 @@
> +/*
> + * Copyright 2014 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + */
> +
> +#include <linux/mutex.h>
> +#include <linux/log2.h>
> +#include <linux/sched.h>
> +#include <linux/slab.h>
> +#include <linux/amd-iommu.h>
> +#include <linux/notifier.h>
> +struct mm_struct;
> +
> +#include "kfd_priv.h"
> +#include "kfd_scheduler.h"
> +
> +/* Initial size for the array of queues.
> + * The allocated size is doubled each time it is exceeded up to MAX_PROCESS_QUEUES. */
> +#define INITIAL_QUEUE_ARRAY_SIZE 16
> +
> +/* List of struct kfd_process */
> +static struct list_head kfd_processes_list = LIST_HEAD_INIT(kfd_processes_list);
> +
> +static DEFINE_MUTEX(kfd_processes_mutex);
> +
> +static struct kfd_process *create_process(const struct task_struct *thread);
> +
> +struct kfd_process*
> +radeon_kfd_create_process(const struct task_struct *thread)
> +{
> +	struct kfd_process *process;
> +
> +	if (thread->mm == NULL)
> +		return ERR_PTR(-EINVAL);
> +
> +	/* Only the pthreads threading model is supported. */
> +	if (thread->group_leader->mm != thread->mm)
> +		return ERR_PTR(-EINVAL);
> +
> +	/*
> +	 * take kfd processes mutex before starting of process creation
> +	 * so there won't be a case where two threads of the same process
> +	 * create two kfd_process structures
> +	 */
> +	mutex_lock(&kfd_processes_mutex);

Given that this is here to protect mm->kfd_process, I would rather you
use some mm lock, so that if non-kfd code ever needs to check this
variable in a sensible way, it can protect itself with that same mm
lock.

But again, I believe that mm_struct should not grow a new kfd field, but
rather a generic iommu pasid field that generic iommu code can then use
to forward things to kfd.
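
To make the idea concrete, here is a rough userspace sketch, not kernel
code. Everything in it (struct mm_model, the pasid and pasid_priv fields,
the helper names) is hypothetical and only illustrates the shape: one lock
that lives with the mm, a generic pasid slot, and a get-or-create helper
so two threads of the same process cannot create two per-process objects.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for mm_struct; not a kernel structure. */
struct mm_model {
	pthread_mutex_t lock;	/* plays the role of "some mm lock" */
	unsigned int pasid;	/* generic PASID field, 0 means none assigned */
	void *pasid_priv;	/* owner's per-process data, e.g. a kfd_process */
};

/* Hypothetical constructor type supplied by the PASID user (kfd here). */
typedef void *(*pasid_priv_ctor)(unsigned int pasid);

void mm_model_init(struct mm_model *mm)
{
	pthread_mutex_init(&mm->lock, NULL);
	mm->pasid = 0;
	mm->pasid_priv = NULL;
}

/*
 * Return the per-process data, creating it at most once. The check and
 * the assignment both happen under the mm-level lock.
 */
void *mm_model_get_or_create(struct mm_model *mm, unsigned int pasid,
			     pasid_priv_ctor create)
{
	void *priv;

	pthread_mutex_lock(&mm->lock);
	if (!mm->pasid_priv) {
		priv = create(pasid);
		if (priv) {
			mm->pasid = pasid;
			mm->pasid_priv = priv;
		}
	}
	priv = mm->pasid_priv;
	pthread_mutex_unlock(&mm->lock);

	return priv;
}

/* Any other code can inspect the field safely with the same lock. */
void *mm_model_peek(struct mm_model *mm)
{
	void *priv;

	pthread_mutex_lock(&mm->lock);
	priv = mm->pasid_priv;
	pthread_mutex_unlock(&mm->lock);

	return priv;
}

/* Demo constructor standing in for kfd's create_process(). */
static int demo_create_calls;
static char demo_process;

void *demo_create(unsigned int pasid)
{
	(void)pasid;
	demo_create_calls++;
	return &demo_process;
}
```

The point is only that the lock and the field live together, so code
outside kfd does not have to know about a kfd-private mutex.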

> +
> +	/* A prior open of /dev/kfd could have already created the process. */
> +	process = thread->mm->kfd_process;
> +	if (process)
> +		pr_debug("kfd: process already found\n");
> +
> +	if (!process)
> +		process = create_process(thread);
> +
> +	mutex_unlock(&kfd_processes_mutex);
> +
> +	return process;
> +}
> +
> +struct kfd_process*
> +radeon_kfd_get_process(const struct task_struct *thread)
> +{
> +	struct kfd_process *process;
> +
> +	if (thread->mm == NULL)
> +		return ERR_PTR(-EINVAL);
> +
> +	/* Only the pthreads threading model is supported. */
> +	if (thread->group_leader->mm != thread->mm)
> +		return ERR_PTR(-EINVAL);
> +
> +	process = thread->mm->kfd_process;
> +
> +	return process;
> +}
> +
> +/* Assumes that the kfd_process mutex is held.
> + * (Or that it doesn't need to be held because the process is exiting.)
> + *
> + * dev_filter can be set to only destroy queues for one device.
> + * Otherwise all queues for the process are destroyed.
> + */
> +static void
> +destroy_queues(struct kfd_process *p, struct kfd_dev *dev_filter)
> +{
> +	unsigned long queue_id;
> +
> +	for_each_set_bit(queue_id, p->allocated_queue_bitmap, MAX_PROCESS_QUEUES) {
> +
> +		struct kfd_queue *queue = radeon_kfd_get_queue(p, queue_id);
> +		struct kfd_dev *dev;
> +
> +		BUG_ON(queue == NULL);
> +
> +		dev = queue->dev;
> +
> +		if (!dev_filter || dev == dev_filter) {
> +			struct kfd_process_device *pdd = radeon_kfd_get_process_device_data(dev, p);
> +
> +			BUG_ON(pdd == NULL); /* A queue exists so pdd must. */
> +
> +			radeon_kfd_remove_queue(p, queue_id);
> +			dev->device_info->scheduler_class->destroy_queue(dev->scheduler, &queue->scheduler_queue);
> +
> +			kfree(queue);
> +
> +			BUG_ON(pdd->queue_count == 0);
> +			BUG_ON(pdd->scheduler_process == NULL);
> +
> +			if (--pdd->queue_count == 0) {
> +				dev->device_info->scheduler_class->deregister_process(dev->scheduler,
> +							pdd->scheduler_process);
> +				pdd->scheduler_process = NULL;
> +			}
> +		}
> +	}
> +}
> +
> +static void free_process(struct kfd_process *p)
> +{
> +	struct kfd_process_device *pdd, *temp;
> +
> +	BUG_ON(p == NULL);
> +
> +	destroy_queues(p, NULL);
> +
> +	/* doorbell mappings: automatic */
> +
> +	list_for_each_entry_safe(pdd, temp, &p->per_device_data, per_device_list) {
> +		amd_iommu_unbind_pasid(pdd->dev->pdev, p->pasid);
> +		list_del(&pdd->per_device_list);
> +		kfree(pdd);
> +	}
> +
> +	radeon_kfd_pasid_free(p->pasid);
> +
> +	mutex_destroy(&p->mutex);
> +
> +	kfree(p->queues);
> +
> +	list_del(&p->processes_list);
> +
> +	kfree(p);
> +}
> +
> +int kfd_process_exit(struct notifier_block *nb,
> +			unsigned long action, void *data)
> +{
> +	struct mm_struct *mm = data;
> +	struct kfd_process *p;
> +
> +	mutex_lock(&kfd_processes_mutex);
> +
> +	p = mm->kfd_process;
> +	if (p) {
> +		free_process(p);
> +		mm->kfd_process = NULL;
> +	}
> +
> +	mutex_unlock(&kfd_processes_mutex);
> +
> +	return 0;
> +}
> +
> +static struct kfd_process *create_process(const struct task_struct *thread)
> +{
> +	struct kfd_process *process;
> +	int err = -ENOMEM;
> +
> +	process = kzalloc(sizeof(*process), GFP_KERNEL);
> +
> +	if (!process)
> +		goto err_alloc;
> +
> +	process->queues = kmalloc_array(INITIAL_QUEUE_ARRAY_SIZE, sizeof(process->queues[0]), GFP_KERNEL);
> +	if (!process->queues)
> +		goto err_alloc;
> +
> +	process->pasid = radeon_kfd_pasid_alloc();
> +	if (process->pasid == 0)
> +		goto err_alloc;
> +
> +	mutex_init(&process->mutex);
> +
> +	process->mm = thread->mm;
> +	thread->mm->kfd_process = process;
> +	list_add_tail(&process->processes_list, &kfd_processes_list);
> +
> +	process->lead_thread = thread->group_leader;
> +
> +	process->queue_array_size = INITIAL_QUEUE_ARRAY_SIZE;
> +
> +	INIT_LIST_HEAD(&process->per_device_data);
> +
> +	return process;
> +
> +err_alloc:
> +	kfree(process->queues);
> +	kfree(process);
> +	return ERR_PTR(err);
> +}
> +
> +struct kfd_process_device *
> +radeon_kfd_get_process_device_data(struct kfd_dev *dev, struct kfd_process *p)
> +{
> +	struct kfd_process_device *pdd;
> +
> +	list_for_each_entry(pdd, &p->per_device_data, per_device_list)
> +		if (pdd->dev == dev)
> +			return pdd;
> +
> +	pdd = kzalloc(sizeof(*pdd), GFP_KERNEL);
> +	if (pdd != NULL) {
> +		pdd->dev = dev;
> +		list_add(&pdd->per_device_list, &p->per_device_data);
> +	}
> +
> +	return pdd;
> +}
> +
> +/* Direct the IOMMU to bind the process (specifically the pasid->mm) to the device.
> + * Unbinding occurs when the process dies or the device is removed.
> + *
> + * Assumes that the process lock is held.
> + */
> +struct kfd_process_device *radeon_kfd_bind_process_to_device(struct kfd_dev *dev, struct kfd_process *p)
> +{
> +	struct kfd_process_device *pdd = radeon_kfd_get_process_device_data(dev, p);
> +	int err;
> +
> +	if (pdd == NULL)
> +		return ERR_PTR(-ENOMEM);
> +
> +	if (pdd->bound)
> +		return pdd;
> +
> +	err = amd_iommu_bind_pasid(dev->pdev, p->pasid, p->lead_thread);

Are we to assume that this will never work on IOMMUs that support
PASID/ATS but are not from AMD? If it were an APU-specific function I would
understand, but it seems that the IOMMU API needs to grow. I am pretty sure
Intel will have an ATS/PASID IOMMU.
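
As a rough illustration of what "grow" could mean, here is a userspace
sketch of a vendor-neutral bind/unbind entry point that dispatches through
per-IOMMU callbacks. None of these names (pasid_iommu_ops, pasid_dev,
pasid_bind, pasid_unbind) exist in the kernel; they are made up for the
example, and the demo backend merely records the bound PASID.

```c
#include <errno.h>
#include <stddef.h>

struct pasid_dev;

/* Hypothetical per-vendor callbacks; an AMD or Intel backend would fill these. */
struct pasid_iommu_ops {
	const char *name;
	int (*bind)(struct pasid_dev *dev, unsigned int pasid);
	void (*unbind)(struct pasid_dev *dev, unsigned int pasid);
};

/* Hypothetical device handle as seen by the generic layer. */
struct pasid_dev {
	const struct pasid_iommu_ops *ops;	/* NULL if no PASID-capable IOMMU */
	unsigned int bound_pasid;		/* 0 means nothing is bound */
};

/* Generic entry point: the driver never names a vendor. */
int pasid_bind(struct pasid_dev *dev, unsigned int pasid)
{
	if (!dev->ops || !dev->ops->bind)
		return -ENODEV;
	if (pasid == 0)		/* PASID 0 is treated as reserved */
		return -EINVAL;

	return dev->ops->bind(dev, pasid);
}

void pasid_unbind(struct pasid_dev *dev, unsigned int pasid)
{
	if (dev->ops && dev->ops->unbind)
		dev->ops->unbind(dev, pasid);
}

/* Demo backend: allows one bound PASID per device. */
static int demo_bind(struct pasid_dev *dev, unsigned int pasid)
{
	if (dev->bound_pasid)
		return -EBUSY;
	dev->bound_pasid = pasid;
	return 0;
}

static void demo_unbind(struct pasid_dev *dev, unsigned int pasid)
{
	if (dev->bound_pasid == pasid)
		dev->bound_pasid = 0;
}

const struct pasid_iommu_ops demo_iommu_ops = {
	.name = "demo",
	.bind = demo_bind,
	.unbind = demo_unbind,
};
```

With something of this shape, kfd would call the generic helper and the
amd_iommu_* calls would sit behind the ops table.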

> +	if (err < 0)
> +		return ERR_PTR(err);
> +
> +	pdd->bound = true;
> +
> +	return pdd;
> +}
> +
> +void radeon_kfd_unbind_process_from_device(struct kfd_dev *dev, pasid_t pasid)
> +{
> +	struct kfd_process *p;
> +	struct kfd_process_device *pdd;
> +
> +	BUG_ON(dev == NULL);
> +
> +	mutex_lock(&kfd_processes_mutex);
> +
> +	list_for_each_entry(p, &kfd_processes_list, processes_list)
> +		if (p->pasid == pasid)
> +			break;
> +
> +	mutex_unlock(&kfd_processes_mutex);
> +
> +	BUG_ON(p->pasid != pasid);
> +
> +	pdd = radeon_kfd_get_process_device_data(dev, p);
> +
> +	BUG_ON(pdd == NULL);
> +
> +	mutex_lock(&p->mutex);
> +
> +	destroy_queues(p, dev);
> +
> +	/* All queues just got destroyed so this should be gone. */
> +	BUG_ON(pdd->scheduler_process != NULL);
> +
> +	/*
> +	 * Just mark pdd as unbound, because we still need it to call
> +	 * amd_iommu_unbind_pasid() in when the process exits.
> +	 * We don't call amd_iommu_unbind_pasid() here
> +	 * because the IOMMU called us.
> +	 */
> +	pdd->bound = false;
> +
> +	mutex_unlock(&p->mutex);
> +}
> +
> +/* Ensure that the process's queue array is large enough to hold the queue at queue_id.
> + * Assumes that the process lock is held. */
> +static bool ensure_queue_array_size(struct kfd_process *p, unsigned int queue_id)
> +{
> +	size_t desired_size;
> +	struct kfd_queue **new_queues;
> +
> +	compiletime_assert(INITIAL_QUEUE_ARRAY_SIZE > 0, "INITIAL_QUEUE_ARRAY_SIZE must not be 0");
> +	compiletime_assert(INITIAL_QUEUE_ARRAY_SIZE <= MAX_PROCESS_QUEUES,
> +			   "INITIAL_QUEUE_ARRAY_SIZE must be less than MAX_PROCESS_QUEUES");
> +	/* Ensure that doubling the current size won't ever overflow. */
> +	compiletime_assert(MAX_PROCESS_QUEUES < SIZE_MAX / 2, "MAX_PROCESS_QUEUES must be less than SIZE_MAX/2");
> +
> +	/*
> +	 * These & queue_id < MAX_PROCESS_QUEUES guarantee that
> +	 * the desired_size calculation will end up <= MAX_PROCESS_QUEUES
> +	 */
> +	compiletime_assert(is_power_of_2(INITIAL_QUEUE_ARRAY_SIZE), "INITIAL_QUEUE_ARRAY_SIZE must be power of 2.");
> +	compiletime_assert(MAX_PROCESS_QUEUES % INITIAL_QUEUE_ARRAY_SIZE == 0,
> +			   "MAX_PROCESS_QUEUES must be multiple of INITIAL_QUEUE_ARRAY_SIZE.");
> +	compiletime_assert(is_power_of_2(MAX_PROCESS_QUEUES / INITIAL_QUEUE_ARRAY_SIZE),
> +			   "MAX_PROCESS_QUEUES must be a power-of-2 multiple of INITIAL_QUEUE_ARRAY_SIZE.");
> +
> +	if (queue_id < p->queue_array_size)
> +		return true;
> +
> +	if (queue_id >= MAX_PROCESS_QUEUES)
> +		return false;
> +
> +	desired_size = p->queue_array_size;
> +	while (desired_size <= queue_id)
> +		desired_size *= 2;
> +
> +	BUG_ON(desired_size < queue_id || desired_size > MAX_PROCESS_QUEUES);
> +	BUG_ON(desired_size % INITIAL_QUEUE_ARRAY_SIZE != 0 || !is_power_of_2(desired_size / INITIAL_QUEUE_ARRAY_SIZE));
> +
> +	new_queues = kmalloc_array(desired_size, sizeof(p->queues[0]), GFP_KERNEL);
> +	if (!new_queues)
> +		return false;
> +
> +	memcpy(new_queues, p->queues, p->queue_array_size * sizeof(p->queues[0]));
> +
> +	kfree(p->queues);
> +	p->queues = new_queues;
> +	p->queue_array_size = desired_size;
> +
> +	return true;
> +}
> +
> +/* Assumes that the process lock is held. */
> +bool radeon_kfd_allocate_queue_id(struct kfd_process *p, unsigned int *queue_id)
> +{
> +	unsigned int qid = find_first_zero_bit(p->allocated_queue_bitmap, MAX_PROCESS_QUEUES);
> +
> +	if (qid >= MAX_PROCESS_QUEUES)
> +		return false;
> +
> +	if (!ensure_queue_array_size(p, qid))
> +		return false;
> +
> +	__set_bit(qid, p->allocated_queue_bitmap);
> +
> +	p->queues[qid] = NULL;
> +	*queue_id = qid;
> +
> +	return true;
> +}
> +
> +/* Install a queue into a previously-allocated queue id.
> + *  Assumes that the process lock is held. */
> +void radeon_kfd_install_queue(struct kfd_process *p, unsigned int queue_id, struct kfd_queue *queue)
> +{
> +	BUG_ON(queue_id >= p->queue_array_size); /* Have to call allocate_queue_id before install_queue. */
> +	BUG_ON(queue == NULL);
> +
> +	p->queues[queue_id] = queue;
> +}
> +
> +/* Remove a queue from the open queue list and deallocate the queue id.
> + * This can be called whether or not a queue was installed.
> + * Assumes that the process lock is held. */
> +void radeon_kfd_remove_queue(struct kfd_process *p, unsigned int queue_id)
> +{
> +	BUG_ON(!test_bit(queue_id, p->allocated_queue_bitmap));
> +	BUG_ON(queue_id >= p->queue_array_size);
> +
> +	__clear_bit(queue_id, p->allocated_queue_bitmap);
> +}
> +
> +/* Assumes that the process lock is held. */
> +struct kfd_queue *radeon_kfd_get_queue(struct kfd_process *p, unsigned int queue_id)
> +{
> +	/* test_bit because the contents of unallocated queue slots are undefined.
> +	 * Otherwise ensure_queue_array_size would have to clear new entries and
> +	 * remove_queue would have to NULL removed queues. */
> +	return (queue_id < p->queue_array_size &&
> +		test_bit(queue_id, p->allocated_queue_bitmap)) ?
> +			p->queues[queue_id] : NULL;
> +}
> diff --git a/drivers/gpu/hsa/radeon/kfd_scheduler.h b/drivers/gpu/hsa/radeon/kfd_scheduler.h
> new file mode 100644
> index 0000000..48a032f
> --- /dev/null
> +++ b/drivers/gpu/hsa/radeon/kfd_scheduler.h
> @@ -0,0 +1,62 @@
> +/*
> + * Copyright 2014 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + */
> +
> +#ifndef KFD_SCHEDULER_H_INCLUDED
> +#define KFD_SCHEDULER_H_INCLUDED
> +
> +#include <linux/types.h>
> +struct kfd_process;
> +
> +/* Opaque types for scheduler private data. */
> +struct kfd_scheduler;
> +struct kfd_scheduler_process;
> +struct kfd_scheduler_queue;
> +
> +struct kfd_scheduler_class {
> +	const char *name;
> +
> +	int (*create)(struct kfd_dev *, struct kfd_scheduler **);
> +	void (*destroy)(struct kfd_scheduler *);
> +
> +	void (*start)(struct kfd_scheduler *);
> +	void (*stop)(struct kfd_scheduler *);
> +
> +	int (*register_process)(struct kfd_scheduler *, struct kfd_process *, struct kfd_scheduler_process **);
> +	void (*deregister_process)(struct kfd_scheduler *, struct kfd_scheduler_process *);
> +
> +	size_t queue_size;
> +
> +	int (*create_queue)(struct kfd_scheduler *scheduler,
> +			    struct kfd_scheduler_process *process,
> +			    struct kfd_scheduler_queue *queue,
> +			    void __user *ring_address,
> +			    uint64_t ring_size,
> +			    void __user *rptr_address,
> +			    void __user *wptr_address,
> +			    unsigned int doorbell);
> +
> +	void (*destroy_queue)(struct kfd_scheduler *, struct kfd_scheduler_queue *);
> +};
> +
> +extern const struct kfd_scheduler_class radeon_kfd_cik_static_scheduler_class;
> +
> +#endif
> diff --git a/drivers/gpu/hsa/radeon/kfd_topology.c b/drivers/gpu/hsa/radeon/kfd_topology.c
> new file mode 100644
> index 0000000..6acac25
> --- /dev/null
> +++ b/drivers/gpu/hsa/radeon/kfd_topology.c
> @@ -0,0 +1,1201 @@
> +/*
> + * Copyright 2014 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + */
> +
> +#include <linux/types.h>
> +#include <linux/kernel.h>
> +#include <linux/pci.h>
> +#include <linux/errno.h>
> +#include <linux/acpi.h>
> +#include <linux/hash.h>
> +
> +#include "kfd_priv.h"
> +#include "kfd_crat.h"
> +#include "kfd_topology.h"
> +
> +static struct list_head topology_device_list;
> +static int topology_crat_parsed;
> +static struct kfd_system_properties sys_props;
> +
> +static DECLARE_RWSEM(topology_lock);
> +
> +
> +static uint8_t checksum_image(const void *buf, size_t len)
> +{
> +	const uint8_t *p = buf;
> +	uint8_t sum = 0;
> +
> +	if (!buf)
> +		return 0;
> +
> +	while (len-- > 0)
> +		sum += *p++;
> +
> +	return sum;
> +}
> +
> +struct kfd_dev *radeon_kfd_device_by_id(uint32_t gpu_id)
> +{
> +	struct kfd_topology_device *top_dev;
> +	struct kfd_dev *device = NULL;
> +
> +	down_read(&topology_lock);
> +
> +	list_for_each_entry(top_dev, &topology_device_list, list)
> +		if (top_dev->gpu_id == gpu_id) {
> +			device = top_dev->gpu;
> +			break;
> +		}
> +
> +	up_read(&topology_lock);
> +
> +	return device;
> +}
> +
> +struct kfd_dev *radeon_kfd_device_by_pci_dev(const struct pci_dev *pdev)
> +{
> +	struct kfd_topology_device *top_dev;
> +	struct kfd_dev *device = NULL;
> +
> +	down_read(&topology_lock);
> +
> +	list_for_each_entry(top_dev, &topology_device_list, list)
> +		if (top_dev->gpu->pdev == pdev) {
> +			device = top_dev->gpu;
> +			break;
> +		}
> +
> +	up_read(&topology_lock);
> +
> +	return device;
> +}
> +
> +static int kfd_topology_get_crat_acpi(void *crat_image, size_t *size)
> +{
> +	struct acpi_table_header *crat_table;
> +	acpi_status status;
> +
> +	if (!size)
> +		return -EINVAL;
> +
> +	/*
> +	 * Fetch the CRAT table from ACPI
> +	 */
> +	status = acpi_get_table(CRAT_SIGNATURE, 0, &crat_table);
> +	if (status == AE_NOT_FOUND) {
> +		pr_warn("CRAT table not found\n");
> +		return -ENODATA;
> +	} else if (ACPI_FAILURE(status)) {
> +		const char *err = acpi_format_exception(status);
> +
> +		pr_err("CRAT table error: %s\n", err);
> +		return -EINVAL;
> +	}
> +
> +	/*
> +	 * A valid ACPI table sums to zero over all of its bytes, checksum
> +	 * field included
> +	 */
> +	if (checksum_image(crat_table, crat_table->length) != 0) {
> +		pr_err("Bad checksum for the CRAT table\n");
> +		return -EINVAL;
> +	}
> +
> +
> +	if (*size >= crat_table->length && crat_image != 0)
> +		memcpy(crat_image, crat_table, crat_table->length);
> +
> +	*size = crat_table->length;
> +
> +	return 0;
> +}
> +
> +static void kfd_populated_cu_info_cpu(struct kfd_topology_device *dev,
> +		struct crat_subtype_computeunit *cu)
> +{
> +	BUG_ON(!dev);
> +	BUG_ON(!cu);
> +
> +	dev->node_props.cpu_cores_count = cu->num_cpu_cores;
> +	dev->node_props.cpu_core_id_base = cu->processor_id_low;
> +	if (cu->hsa_capability & CRAT_CU_FLAGS_IOMMU_PRESENT)
> +		dev->node_props.capability |= HSA_CAP_ATS_PRESENT;
> +
> +	pr_info("CU CPU: cores=%d id_base=%d\n", cu->num_cpu_cores,
> +			cu->processor_id_low);
> +}
> +
> +static void kfd_populated_cu_info_gpu(struct kfd_topology_device *dev,
> +		struct crat_subtype_computeunit *cu)
> +{
> +	BUG_ON(!dev);
> +	BUG_ON(!cu);
> +
> +	dev->node_props.simd_id_base = cu->processor_id_low;
> +	dev->node_props.simd_count = cu->num_simd_cores;
> +	dev->node_props.lds_size_in_kb = cu->lds_size_in_kb;
> +	dev->node_props.max_waves_per_simd = cu->max_waves_simd;
> +	dev->node_props.wave_front_size = cu->wave_front_size;
> +	dev->node_props.mem_banks_count = cu->num_banks;
> +	dev->node_props.array_count = cu->num_arrays;
> +	dev->node_props.cu_per_simd_array = cu->num_cu_per_array;
> +	dev->node_props.simd_per_cu = cu->num_simd_per_cu;
> +	dev->node_props.max_slots_scratch_cu = cu->max_slots_scatch_cu;
> +	if (cu->hsa_capability & CRAT_CU_FLAGS_HOT_PLUGGABLE)
> +		dev->node_props.capability |= HSA_CAP_HOT_PLUGGABLE;
> +	pr_info("CU GPU: simds=%d id_base=%d\n", cu->num_simd_cores,
> +				cu->processor_id_low);
> +}
> +
> +/* kfd_parse_subtype_cu is called with topology_lock already held */
> +static int kfd_parse_subtype_cu(struct crat_subtype_computeunit *cu)
> +{
> +	struct kfd_topology_device *dev;
> +	int i = 0;
> +
> +	BUG_ON(!cu);
> +
> +	pr_info("Found CU entry in CRAT table with proximity_domain=%d caps=%x\n",
> +			cu->proximity_domain, cu->hsa_capability);
> +	list_for_each_entry(dev, &topology_device_list, list) {
> +		if (cu->proximity_domain == i) {
> +			if (cu->flags & CRAT_CU_FLAGS_CPU_PRESENT)
> +				kfd_populated_cu_info_cpu(dev, cu);
> +
> +			if (cu->flags & CRAT_CU_FLAGS_GPU_PRESENT)
> +				kfd_populated_cu_info_gpu(dev, cu);
> +			break;
> +		}
> +		i++;
> +	}
> +
> +	return 0;
> +}
> +
> +/* kfd_parse_subtype_mem is called with topology_lock already held */
> +static int kfd_parse_subtype_mem(struct crat_subtype_memory *mem)
> +{
> +	struct kfd_mem_properties *props;
> +	struct kfd_topology_device *dev;
> +	int i = 0;
> +
> +	BUG_ON(!mem);
> +
> +	pr_info("Found memory entry in CRAT table with proximity_domain=%d\n",
> +			mem->promixity_domain);
> +	list_for_each_entry(dev, &topology_device_list, list) {
> +		if (mem->promixity_domain == i) {
> +			props = kfd_alloc_struct(props);
> +			if (props == 0)
> +				return -ENOMEM;
> +
> +			if (dev->node_props.cpu_cores_count == 0)
> +				props->heap_type = HSA_MEM_HEAP_TYPE_FB_PRIVATE;
> +			else
> +				props->heap_type = HSA_MEM_HEAP_TYPE_SYSTEM;
> +
> +			if (mem->flags & CRAT_MEM_FLAGS_HOT_PLUGGABLE)
> +				props->flags |= HSA_MEM_FLAGS_HOT_PLUGGABLE;
> +			if (mem->flags & CRAT_MEM_FLAGS_NON_VOLATILE)
> +				props->flags |= HSA_MEM_FLAGS_NON_VOLATILE;
> +
> +			props->size_in_bytes = ((uint64_t)mem->length_high << 32) +
> +						mem->length_low;
> +			props->width = mem->width;
> +
> +			dev->mem_bank_count++;
> +			list_add_tail(&props->list, &dev->mem_props);
> +
> +			break;
> +		}
> +		i++;
> +	}
> +
> +	return 0;
> +}
> +
> +/* kfd_parse_subtype_cache is called with topology_lock already held */
> +static int kfd_parse_subtype_cache(struct crat_subtype_cache *cache)
> +{
> +	struct kfd_cache_properties *props;
> +	struct kfd_topology_device *dev;
> +	uint32_t id;
> +
> +	BUG_ON(!cache);
> +
> +	id = cache->processor_id_low;
> +
> +	pr_info("Found cache entry in CRAT table with processor_id=%d\n", id);
> +	list_for_each_entry(dev, &topology_device_list, list)
> +		if (id == dev->node_props.cpu_core_id_base ||
> +		    id == dev->node_props.simd_id_base) {
> +			props = kfd_alloc_struct(props);
> +			if (props == 0)
> +				return -ENOMEM;
> +
> +			props->processor_id_low = id;
> +			props->cache_level = cache->cache_level;
> +			props->cache_size = cache->cache_size;
> +			props->cacheline_size = cache->cache_line_size;
> +			props->cachelines_per_tag = cache->lines_per_tag;
> +			props->cache_assoc = cache->associativity;
> +			props->cache_latency = cache->cache_latency;
> +
> +			if (cache->flags & CRAT_CACHE_FLAGS_DATA_CACHE)
> +				props->cache_type |= HSA_CACHE_TYPE_DATA;
> +			if (cache->flags & CRAT_CACHE_FLAGS_INST_CACHE)
> +				props->cache_type |= HSA_CACHE_TYPE_INSTRUCTION;
> +			if (cache->flags & CRAT_CACHE_FLAGS_CPU_CACHE)
> +				props->cache_type |= HSA_CACHE_TYPE_CPU;
> +			if (cache->flags & CRAT_CACHE_FLAGS_SIMD_CACHE)
> +				props->cache_type |= HSA_CACHE_TYPE_HSACU;
> +
> +			dev->cache_count++;
> +			dev->node_props.caches_count++;
> +			list_add_tail(&props->list, &dev->cache_props);
> +
> +			break;
> +		}
> +
> +	return 0;
> +}
> +
> +/* kfd_parse_subtype_iolink is called with topology_lock already held */
> +static int kfd_parse_subtype_iolink(struct crat_subtype_iolink *iolink)
> +{
> +	struct kfd_iolink_properties *props;
> +	struct kfd_topology_device *dev;
> +	uint32_t i = 0;
> +	uint32_t id_from;
> +	uint32_t id_to;
> +
> +	BUG_ON(!iolink);
> +
> +	id_from = iolink->proximity_domain_from;
> +	id_to = iolink->proximity_domain_to;
> +
> +	pr_info("Found IO link entry in CRAT table with id_from=%d\n", id_from);
> +	list_for_each_entry(dev, &topology_device_list, list) {
> +		if (id_from == i) {
> +			props = kfd_alloc_struct(props);
> +			if (props == 0)
> +				return -ENOMEM;
> +
> +			props->node_from = id_from;
> +			props->node_to = id_to;
> +			props->ver_maj = iolink->version_major;
> +			props->ver_min = iolink->version_minor;
> +
> +			/*
> +			 * weight factor (derived from CDIR), currently always 1
> +			 */
> +			props->weight = 1;
> +
> +			props->min_latency = iolink->minimum_latency;
> +			props->max_latency = iolink->maximum_latency;
> +			props->min_bandwidth = iolink->minimum_bandwidth_mbs;
> +			props->max_bandwidth = iolink->maximum_bandwidth_mbs;
> +			props->rec_transfer_size =
> +					iolink->recommended_transfer_size;
> +
> +			dev->io_link_count++;
> +			dev->node_props.io_links_count++;
> +			list_add_tail(&props->list, &dev->io_link_props);
> +
> +			break;
> +		}
> +		i++;
> +	}
> +
> +	return 0;
> +}
> +
> +static int kfd_parse_subtype(struct crat_subtype_generic *sub_type_hdr)
> +{
> +	struct crat_subtype_computeunit *cu;
> +	struct crat_subtype_memory *mem;
> +	struct crat_subtype_cache *cache;
> +	struct crat_subtype_iolink *iolink;
> +	int ret = 0;
> +
> +	BUG_ON(!sub_type_hdr);
> +
> +	switch (sub_type_hdr->type) {
> +	case CRAT_SUBTYPE_COMPUTEUNIT_AFFINITY:
> +		cu = (struct crat_subtype_computeunit *)sub_type_hdr;
> +		ret = kfd_parse_subtype_cu(cu);
> +		break;
> +	case CRAT_SUBTYPE_MEMORY_AFFINITY:
> +		mem = (struct crat_subtype_memory *)sub_type_hdr;
> +		ret = kfd_parse_subtype_mem(mem);
> +		break;
> +	case CRAT_SUBTYPE_CACHE_AFFINITY:
> +		cache = (struct crat_subtype_cache *)sub_type_hdr;
> +		ret = kfd_parse_subtype_cache(cache);
> +		break;
> +	case CRAT_SUBTYPE_TLB_AFFINITY:
> +		/*
> +		 * For now, nothing to do here
> +		 */
> +		pr_info("Found TLB entry in CRAT table (not processing)\n");
> +		break;
> +	case CRAT_SUBTYPE_CCOMPUTE_AFFINITY:
> +		/*
> +		 * For now, nothing to do here
> +		 */
> +		pr_info("Found CCOMPUTE entry in CRAT table (not processing)\n");
> +		break;
> +	case CRAT_SUBTYPE_IOLINK_AFFINITY:
> +		iolink = (struct crat_subtype_iolink *)sub_type_hdr;
> +		ret = kfd_parse_subtype_iolink(iolink);
> +		break;
> +	default:
> +		pr_warn("Unknown subtype (%d) in CRAT\n",
> +				sub_type_hdr->type);
> +	}
> +
> +	return ret;
> +}
> +
> +static void kfd_release_topology_device(struct kfd_topology_device *dev)
> +{
> +	struct kfd_mem_properties *mem;
> +	struct kfd_cache_properties *cache;
> +	struct kfd_iolink_properties *iolink;
> +
> +	BUG_ON(!dev);
> +
> +	list_del(&dev->list);
> +
> +	while (dev->mem_props.next != &dev->mem_props) {
> +		mem = container_of(dev->mem_props.next,
> +				struct kfd_mem_properties, list);
> +		list_del(&mem->list);
> +		kfree(mem);
> +	}
> +
> +	while (dev->cache_props.next != &dev->cache_props) {
> +		cache = container_of(dev->cache_props.next,
> +				struct kfd_cache_properties, list);
> +		list_del(&cache->list);
> +		kfree(cache);
> +	}
> +
> +	while (dev->io_link_props.next != &dev->io_link_props) {
> +		iolink = container_of(dev->io_link_props.next,
> +				struct kfd_iolink_properties, list);
> +		list_del(&iolink->list);
> +		kfree(iolink);
> +	}
> +
> +	kfree(dev);
> +
> +	sys_props.num_devices--;
> +}
> +
> +static void kfd_release_live_view(void)
> +{
> +	struct kfd_topology_device *dev;
> +
> +	while (topology_device_list.next != &topology_device_list) {
> +		dev = container_of(topology_device_list.next,
> +				 struct kfd_topology_device, list);
> +		kfd_release_topology_device(dev);
> +	}
> +
> +	memset(&sys_props, 0, sizeof(sys_props));
> +}
> +
> +static struct kfd_topology_device *kfd_create_topology_device(void)
> +{
> +	struct kfd_topology_device *dev;
> +
> +	dev = kfd_alloc_struct(dev);
> +	if (dev == 0) {
> +		pr_err("No memory to allocate a topology device\n");
> +		return 0;
> +	}
> +
> +	INIT_LIST_HEAD(&dev->mem_props);
> +	INIT_LIST_HEAD(&dev->cache_props);
> +	INIT_LIST_HEAD(&dev->io_link_props);
> +
> +	list_add_tail(&dev->list, &topology_device_list);
> +	sys_props.num_devices++;
> +
> +	return dev;
> +}
> +
> +static int kfd_parse_crat_table(void *crat_image)
> +{
> +	struct kfd_topology_device *top_dev;
> +	struct crat_subtype_generic *sub_type_hdr;
> +	uint16_t node_id;
> +	int ret;
> +	struct crat_header *crat_table = (struct crat_header *)crat_image;
> +	uint16_t num_nodes;
> +	uint32_t image_len;
> +
> +	if (!crat_image)
> +		return -EINVAL;
> +
> +	num_nodes = crat_table->num_domains;
> +	image_len = crat_table->length;
> +
> +	pr_info("Parsing CRAT table with %d nodes\n", num_nodes);
> +
> +	for (node_id = 0; node_id < num_nodes; node_id++) {
> +		top_dev = kfd_create_topology_device();
> +		if (!top_dev) {
> +			kfd_release_live_view();
> +			return -ENOMEM;
> +		}
> +	}
> +
> +	sys_props.platform_id = *((uint64_t *)crat_table->oem_id);
> +	sys_props.platform_oem = *((uint64_t *)crat_table->oem_table_id);
> +	sys_props.platform_rev = crat_table->revision;
> +
> +	sub_type_hdr = (struct crat_subtype_generic *)(crat_table+1);
> +	while ((char *)sub_type_hdr + sizeof(struct crat_subtype_generic) <
> +			((char *)crat_image) + image_len) {
> +		if (sub_type_hdr->flags & CRAT_SUBTYPE_FLAGS_ENABLED) {
> +			ret = kfd_parse_subtype(sub_type_hdr);
> +			if (ret != 0) {
> +				kfd_release_live_view();
> +				return ret;
> +			}
> +		}
> +
> +		sub_type_hdr = (typeof(sub_type_hdr))((char *)sub_type_hdr +
> +				sub_type_hdr->length);
> +	}
> +
> +	sys_props.generation_count++;
> +	topology_crat_parsed = 1;
> +
> +	return 0;
> +}
> +
> +
> +#define sysfs_show_gen_prop(buffer, fmt, ...) \
> +		snprintf(buffer, PAGE_SIZE, "%s"fmt, buffer, __VA_ARGS__)
> +#define sysfs_show_32bit_prop(buffer, name, value) \
> +		sysfs_show_gen_prop(buffer, "%s %u\n", name, value)
> +#define sysfs_show_64bit_prop(buffer, name, value) \
> +		sysfs_show_gen_prop(buffer, "%s %llu\n", name, value)
> +#define sysfs_show_32bit_val(buffer, value) \
> +		sysfs_show_gen_prop(buffer, "%u\n", value)
> +#define sysfs_show_str_val(buffer, value) \
> +		sysfs_show_gen_prop(buffer, "%s\n", value)
> +
> +static ssize_t sysprops_show(struct kobject *kobj, struct attribute *attr,
> +		char *buffer)
> +{
> +	ssize_t ret;
> +
> +	/* Making sure that the buffer is an empty string */
> +	buffer[0] = 0;
> +
> +	if (attr == &sys_props.attr_genid) {
> +		ret = sysfs_show_32bit_val(buffer, sys_props.generation_count);
> +	} else if (attr == &sys_props.attr_props) {
> +		sysfs_show_64bit_prop(buffer, "platform_oem",
> +				sys_props.platform_oem);
> +		sysfs_show_64bit_prop(buffer, "platform_id",
> +				sys_props.platform_id);
> +		ret = sysfs_show_64bit_prop(buffer, "platform_rev",
> +				sys_props.platform_rev);
> +	} else {
> +		ret = -EINVAL;
> +	}
> +
> +	return ret;
> +}
> +
> +static const struct sysfs_ops sysprops_ops = {
> +	.show = sysprops_show,
> +};
> +
> +static struct kobj_type sysprops_type = {
> +	.sysfs_ops = &sysprops_ops,
> +};
> +
> +static ssize_t iolink_show(struct kobject *kobj, struct attribute *attr,
> +		char *buffer)
> +{
> +	ssize_t ret;
> +	struct kfd_iolink_properties *iolink;
> +
> +	/* Making sure that the buffer is an empty string */
> +	buffer[0] = 0;
> +
> +	iolink = container_of(attr, struct kfd_iolink_properties, attr);
> +	sysfs_show_32bit_prop(buffer, "type", iolink->iolink_type);
> +	sysfs_show_32bit_prop(buffer, "version_major", iolink->ver_maj);
> +	sysfs_show_32bit_prop(buffer, "version_minor", iolink->ver_min);
> +	sysfs_show_32bit_prop(buffer, "node_from", iolink->node_from);
> +	sysfs_show_32bit_prop(buffer, "node_to", iolink->node_to);
> +	sysfs_show_32bit_prop(buffer, "weight", iolink->weight);
> +	sysfs_show_32bit_prop(buffer, "min_latency", iolink->min_latency);
> +	sysfs_show_32bit_prop(buffer, "max_latency", iolink->max_latency);
> +	sysfs_show_32bit_prop(buffer, "min_bandwidth", iolink->min_bandwidth);
> +	sysfs_show_32bit_prop(buffer, "max_bandwidth", iolink->max_bandwidth);
> +	sysfs_show_32bit_prop(buffer, "recommended_transfer_size",
> +			iolink->rec_transfer_size);
> +	ret = sysfs_show_32bit_prop(buffer, "flags", iolink->flags);
> +
> +	return ret;
> +}
> +
> +static const struct sysfs_ops iolink_ops = {
> +	.show = iolink_show,
> +};
> +
> +static struct kobj_type iolink_type = {
> +	.sysfs_ops = &iolink_ops,
> +};
> +
> +static ssize_t mem_show(struct kobject *kobj, struct attribute *attr,
> +		char *buffer)
> +{
> +	ssize_t ret;
> +	struct kfd_mem_properties *mem;
> +
> +	/* Making sure that the buffer is an empty string */
> +	buffer[0] = 0;
> +
> +	mem = container_of(attr, struct kfd_mem_properties, attr);
> +	sysfs_show_32bit_prop(buffer, "heap_type", mem->heap_type);
> +	sysfs_show_64bit_prop(buffer, "size_in_bytes", mem->size_in_bytes);
> +	sysfs_show_32bit_prop(buffer, "flags", mem->flags);
> +	sysfs_show_32bit_prop(buffer, "width", mem->width);
> +	ret = sysfs_show_32bit_prop(buffer, "mem_clk_max", mem->mem_clk_max);
> +
> +	return ret;
> +}
> +
> +static const struct sysfs_ops mem_ops = {
> +	.show = mem_show,
> +};
> +
> +static struct kobj_type mem_type = {
> +	.sysfs_ops = &mem_ops,
> +};
> +
> +static ssize_t kfd_cache_show(struct kobject *kobj, struct attribute *attr,
> +		char *buffer)
> +{
> +	ssize_t ret;
> +	uint32_t i;
> +	struct kfd_cache_properties *cache;
> +
> +	/* Making sure that the buffer is an empty string */
> +	buffer[0] = 0;
> +
> +	cache = container_of(attr, struct kfd_cache_properties, attr);
> +	sysfs_show_32bit_prop(buffer, "processor_id_low",
> +			cache->processor_id_low);
> +	sysfs_show_32bit_prop(buffer, "level", cache->cache_level);
> +	sysfs_show_32bit_prop(buffer, "size", cache->cache_size);
> +	sysfs_show_32bit_prop(buffer, "cache_line_size", cache->cacheline_size);
> +	sysfs_show_32bit_prop(buffer, "cache_lines_per_tag",
> +			cache->cachelines_per_tag);
> +	sysfs_show_32bit_prop(buffer, "association", cache->cache_assoc);
> +	sysfs_show_32bit_prop(buffer, "latency", cache->cache_latency);
> +	sysfs_show_32bit_prop(buffer, "type", cache->cache_type);
> +	snprintf(buffer, PAGE_SIZE, "%ssibling_map ", buffer);
> +	for (i = 0; i < KFD_TOPOLOGY_CPU_SIBLINGS; i++)
> +		ret = snprintf(buffer, PAGE_SIZE, "%s%d%s",
> +				buffer, cache->sibling_map[i],
> +				(i == KFD_TOPOLOGY_CPU_SIBLINGS-1) ?
> +						"\n" : ",");
> +
> +	return ret;
> +}
> +
> +static const struct sysfs_ops cache_ops = {
> +	.show = kfd_cache_show,
> +};
> +
> +static struct kobj_type cache_type = {
> +	.sysfs_ops = &cache_ops,
> +};
> +
> +static ssize_t node_show(struct kobject *kobj, struct attribute *attr,
> +		char *buffer)
> +{
> +	ssize_t ret;
> +	struct kfd_topology_device *dev;
> +	char public_name[KFD_TOPOLOGY_PUBLIC_NAME_SIZE];
> +	uint32_t i;
> +
> +	/* Making sure that the buffer is an empty string */
> +	buffer[0] = 0;
> +
> +	if (strcmp(attr->name, "gpu_id") == 0) {
> +		dev = container_of(attr, struct kfd_topology_device,
> +				attr_gpuid);
> +		ret = sysfs_show_32bit_val(buffer, dev->gpu_id);
> +	} else if (strcmp(attr->name, "name") == 0) {
> +		dev = container_of(attr, struct kfd_topology_device,
> +				attr_name);
> +		for (i = 0; i < KFD_TOPOLOGY_PUBLIC_NAME_SIZE; i++) {
> +			public_name[i] =
> +					(char)dev->node_props.marketing_name[i];
> +			if (dev->node_props.marketing_name[i] == 0)
> +				break;
> +		}
> +		public_name[KFD_TOPOLOGY_PUBLIC_NAME_SIZE-1] = 0x0;
> +		ret = sysfs_show_str_val(buffer, public_name);
> +	} else {
> +		dev = container_of(attr, struct kfd_topology_device,
> +				attr_props);
> +		sysfs_show_32bit_prop(buffer, "cpu_cores_count",
> +				dev->node_props.cpu_cores_count);
> +		sysfs_show_32bit_prop(buffer, "simd_count",
> +				dev->node_props.simd_count);
> +		sysfs_show_32bit_prop(buffer, "mem_banks_count",
> +				dev->node_props.mem_banks_count);
> +		sysfs_show_32bit_prop(buffer, "caches_count",
> +				dev->node_props.caches_count);
> +		sysfs_show_32bit_prop(buffer, "io_links_count",
> +				dev->node_props.io_links_count);
> +		sysfs_show_32bit_prop(buffer, "cpu_core_id_base",
> +				dev->node_props.cpu_core_id_base);
> +		sysfs_show_32bit_prop(buffer, "simd_id_base",
> +				dev->node_props.simd_id_base);
> +		sysfs_show_32bit_prop(buffer, "capability",
> +				dev->node_props.capability);
> +		sysfs_show_32bit_prop(buffer, "max_waves_per_simd",
> +				dev->node_props.max_waves_per_simd);
> +		sysfs_show_32bit_prop(buffer, "lds_size_in_kb",
> +				dev->node_props.lds_size_in_kb);
> +		sysfs_show_32bit_prop(buffer, "gds_size_in_kb",
> +				dev->node_props.gds_size_in_kb);
> +		sysfs_show_32bit_prop(buffer, "wave_front_size",
> +				dev->node_props.wave_front_size);
> +		sysfs_show_32bit_prop(buffer, "array_count",
> +				dev->node_props.array_count);
> +		sysfs_show_32bit_prop(buffer, "simd_arrays_per_engine",
> +				dev->node_props.simd_arrays_per_engine);
> +		sysfs_show_32bit_prop(buffer, "cu_per_simd_array",
> +				dev->node_props.cu_per_simd_array);
> +		sysfs_show_32bit_prop(buffer, "simd_per_cu",
> +				dev->node_props.simd_per_cu);
> +		sysfs_show_32bit_prop(buffer, "max_slots_scratch_cu",
> +				dev->node_props.max_slots_scratch_cu);
> +		sysfs_show_32bit_prop(buffer, "engine_id",
> +				dev->node_props.engine_id);
> +		sysfs_show_32bit_prop(buffer, "vendor_id",
> +				dev->node_props.vendor_id);
> +		sysfs_show_32bit_prop(buffer, "device_id",
> +				dev->node_props.device_id);
> +		sysfs_show_32bit_prop(buffer, "location_id",
> +				dev->node_props.location_id);
> +		sysfs_show_32bit_prop(buffer, "max_engine_clk_fcompute",
> +				dev->node_props.max_engine_clk_fcompute);
> +		ret = sysfs_show_32bit_prop(buffer, "max_engine_clk_ccompute",
> +				dev->node_props.max_engine_clk_ccompute);
> +	}
> +
> +	return ret;
> +}
> +
> +static const struct sysfs_ops node_ops = {
> +	.show = node_show,
> +};
> +
> +static struct kobj_type node_type = {
> +	.sysfs_ops = &node_ops,
> +};
> +
> +static void kfd_remove_sysfs_file(struct kobject *kobj, struct attribute *attr)
> +{
> +	sysfs_remove_file(kobj, attr);
> +	kobject_del(kobj);
> +	kobject_put(kobj);
> +}
> +
> +static void kfd_remove_sysfs_node_entry(struct kfd_topology_device *dev)
> +{
> +	struct kfd_iolink_properties *iolink;
> +	struct kfd_cache_properties *cache;
> +	struct kfd_mem_properties *mem;
> +
> +	BUG_ON(!dev);
> +
> +	if (dev->kobj_iolink) {
> +		list_for_each_entry(iolink, &dev->io_link_props, list)
> +			if (iolink->kobj) {
> +				kfd_remove_sysfs_file(iolink->kobj, &iolink->attr);
> +				iolink->kobj = 0;
> +			}
> +		kobject_del(dev->kobj_iolink);
> +		kobject_put(dev->kobj_iolink);
> +		dev->kobj_iolink = 0;
> +	}
> +
> +	if (dev->kobj_cache) {
> +		list_for_each_entry(cache, &dev->cache_props, list)
> +			if (cache->kobj) {
> +				kfd_remove_sysfs_file(cache->kobj, &cache->attr);
> +				cache->kobj = 0;
> +			}
> +		kobject_del(dev->kobj_cache);
> +		kobject_put(dev->kobj_cache);
> +		dev->kobj_cache = 0;
> +	}
> +
> +	if (dev->kobj_mem) {
> +		list_for_each_entry(mem, &dev->mem_props, list)
> +			if (mem->kobj) {
> +				kfd_remove_sysfs_file(mem->kobj, &mem->attr);
> +				mem->kobj = 0;
> +			}
> +		kobject_del(dev->kobj_mem);
> +		kobject_put(dev->kobj_mem);
> +		dev->kobj_mem = 0;
> +	}
> +
> +	if (dev->kobj_node) {
> +		sysfs_remove_file(dev->kobj_node, &dev->attr_gpuid);
> +		sysfs_remove_file(dev->kobj_node, &dev->attr_name);
> +		sysfs_remove_file(dev->kobj_node, &dev->attr_props);
> +		kobject_del(dev->kobj_node);
> +		kobject_put(dev->kobj_node);
> +		dev->kobj_node = 0;
> +	}
> +}
> +
> +static int kfd_build_sysfs_node_entry(struct kfd_topology_device *dev,
> +		uint32_t id)
> +{
> +	struct kfd_iolink_properties *iolink;
> +	struct kfd_cache_properties *cache;
> +	struct kfd_mem_properties *mem;
> +	int ret;
> +	uint32_t i;
> +
> +	BUG_ON(!dev);
> +
> +	/*
> +	 * Creating the sysfs folders
> +	 */
> +	BUG_ON(dev->kobj_node);
> +	dev->kobj_node = kfd_alloc_struct(dev->kobj_node);
> +	if (!dev->kobj_node)
> +		return -ENOMEM;
> +
> +	ret = kobject_init_and_add(dev->kobj_node, &node_type,
> +			sys_props.kobj_nodes, "%d", id);
> +	if (ret < 0)
> +		return ret;
> +
> +	dev->kobj_mem = kobject_create_and_add("mem_banks", dev->kobj_node);
> +	if (!dev->kobj_mem)
> +		return -ENOMEM;
> +
> +	dev->kobj_cache = kobject_create_and_add("caches", dev->kobj_node);
> +	if (!dev->kobj_cache)
> +		return -ENOMEM;
> +
> +	dev->kobj_iolink = kobject_create_and_add("io_links", dev->kobj_node);
> +	if (!dev->kobj_iolink)
> +		return -ENOMEM;
> +
> +	/*
> +	 * Creating sysfs files for node properties
> +	 */
> +	dev->attr_gpuid.name = "gpu_id";
> +	dev->attr_gpuid.mode = KFD_SYSFS_FILE_MODE;
> +	sysfs_attr_init(&dev->attr_gpuid);
> +	dev->attr_name.name = "name";
> +	dev->attr_name.mode = KFD_SYSFS_FILE_MODE;
> +	sysfs_attr_init(&dev->attr_name);
> +	dev->attr_props.name = "properties";
> +	dev->attr_props.mode = KFD_SYSFS_FILE_MODE;
> +	sysfs_attr_init(&dev->attr_props);
> +	ret = sysfs_create_file(dev->kobj_node, &dev->attr_gpuid);
> +	if (ret < 0)
> +		return ret;
> +	ret = sysfs_create_file(dev->kobj_node, &dev->attr_name);
> +	if (ret < 0)
> +		return ret;
> +	ret = sysfs_create_file(dev->kobj_node, &dev->attr_props);
> +	if (ret < 0)
> +		return ret;
> +
> +	i = 0;
> +	list_for_each_entry(mem, &dev->mem_props, list) {
> +		mem->kobj = kzalloc(sizeof(struct kobject), GFP_KERNEL);
> +		if (!mem->kobj)
> +			return -ENOMEM;
> +		ret = kobject_init_and_add(mem->kobj, &mem_type,
> +				dev->kobj_mem, "%d", i);
> +		if (ret < 0)
> +			return ret;
> +
> +		mem->attr.name = "properties";
> +		mem->attr.mode = KFD_SYSFS_FILE_MODE;
> +		sysfs_attr_init(&mem->attr);
> +		ret = sysfs_create_file(mem->kobj, &mem->attr);
> +		if (ret < 0)
> +			return ret;
> +		i++;
> +	}
> +
> +	i = 0;
> +	list_for_each_entry(cache, &dev->cache_props, list) {
> +		cache->kobj = kzalloc(sizeof(struct kobject), GFP_KERNEL);
> +		if (!cache->kobj)
> +			return -ENOMEM;
> +		ret = kobject_init_and_add(cache->kobj, &cache_type,
> +				dev->kobj_cache, "%d", i);
> +		if (ret < 0)
> +			return ret;
> +
> +		cache->attr.name = "properties";
> +		cache->attr.mode = KFD_SYSFS_FILE_MODE;
> +		sysfs_attr_init(&cache->attr);
> +		ret = sysfs_create_file(cache->kobj, &cache->attr);
> +		if (ret < 0)
> +			return ret;
> +		i++;
> +	}
> +
> +	i = 0;
> +	list_for_each_entry(iolink, &dev->io_link_props, list) {
> +		iolink->kobj = kzalloc(sizeof(struct kobject), GFP_KERNEL);
> +		if (!iolink->kobj)
> +			return -ENOMEM;
> +		ret = kobject_init_and_add(iolink->kobj, &iolink_type,
> +				dev->kobj_iolink, "%d", i);
> +		if (ret < 0)
> +			return ret;
> +
> +		iolink->attr.name = "properties";
> +		iolink->attr.mode = KFD_SYSFS_FILE_MODE;
> +		sysfs_attr_init(&iolink->attr);
> +		ret = sysfs_create_file(iolink->kobj, &iolink->attr);
> +		if (ret < 0)
> +			return ret;
> +		i++;
> +	}
> +
> +	return 0;
> +}
> +
> +static int kfd_build_sysfs_node_tree(void)
> +{
> +	struct kfd_topology_device *dev;
> +	int ret;
> +	uint32_t i = 0;
> +
> +	list_for_each_entry(dev, &topology_device_list, list) {
> +		ret = kfd_build_sysfs_node_entry(dev, i);
> +		if (ret < 0)
> +			return ret;
> +		i++;
> +	}
> +
> +	return 0;
> +}
> +
> +static void kfd_remove_sysfs_node_tree(void)
> +{
> +	struct kfd_topology_device *dev;
> +
> +	list_for_each_entry(dev, &topology_device_list, list)
> +		kfd_remove_sysfs_node_entry(dev);
> +}
> +
> +static int kfd_topology_update_sysfs(void)
> +{
> +	int ret;
> +
> +	pr_info("Creating topology SYSFS entries\n");
> +	if (sys_props.kobj_topology == 0) {
> +		sys_props.kobj_topology = kfd_alloc_struct(sys_props.kobj_topology);
> +		if (!sys_props.kobj_topology)
> +			return -ENOMEM;
> +
> +		ret = kobject_init_and_add(sys_props.kobj_topology,
> +				&sysprops_type,  &kfd_device->kobj,
> +				"topology");
> +		if (ret < 0)
> +			return ret;
> +
> +		sys_props.kobj_nodes = kobject_create_and_add("nodes",
> +				sys_props.kobj_topology);
> +		if (!sys_props.kobj_nodes)
> +			return -ENOMEM;
> +
> +		sys_props.attr_genid.name = "generation_id";
> +		sys_props.attr_genid.mode = KFD_SYSFS_FILE_MODE;
> +		sysfs_attr_init(&sys_props.attr_genid);
> +		ret = sysfs_create_file(sys_props.kobj_topology,
> +				&sys_props.attr_genid);
> +		if (ret < 0)
> +			return ret;
> +
> +		sys_props.attr_props.name = "system_properties";
> +		sys_props.attr_props.mode = KFD_SYSFS_FILE_MODE;
> +		sysfs_attr_init(&sys_props.attr_props);
> +		ret = sysfs_create_file(sys_props.kobj_topology,
> +				&sys_props.attr_props);
> +		if (ret < 0)
> +			return ret;
> +	}
> +
> +	kfd_remove_sysfs_node_tree();
> +
> +	return kfd_build_sysfs_node_tree();
> +}
> +
> +static void kfd_topology_release_sysfs(void)
> +{
> +	kfd_remove_sysfs_node_tree();
> +	if (sys_props.kobj_topology) {
> +		sysfs_remove_file(sys_props.kobj_topology,
> +				&sys_props.attr_genid);
> +		sysfs_remove_file(sys_props.kobj_topology,
> +				&sys_props.attr_props);
> +		if (sys_props.kobj_nodes) {
> +			kobject_del(sys_props.kobj_nodes);
> +			kobject_put(sys_props.kobj_nodes);
> +			sys_props.kobj_nodes = 0;
> +		}
> +		kobject_del(sys_props.kobj_topology);
> +		kobject_put(sys_props.kobj_topology);
> +		sys_props.kobj_topology = 0;
> +	}
> +}
> +
> +int kfd_topology_init(void)
> +{
> +	void *crat_image = 0;
> +	size_t image_size = 0;
> +	int ret;
> +
> +	/*
> +	 * Initialize the head for the topology device list
> +	 */
> +	INIT_LIST_HEAD(&topology_device_list);
> +	init_rwsem(&topology_lock);
> +	topology_crat_parsed = 0;
> +
> +	memset(&sys_props, 0, sizeof(sys_props));
> +
> +	/*
> +	 * Get the CRAT image from the ACPI
> +	 */
> +	ret = kfd_topology_get_crat_acpi(crat_image, &image_size);
> +	if (ret == 0 && image_size > 0) {
> +		pr_info("Found CRAT image with size=%zd\n", image_size);
> +		crat_image = kmalloc(image_size, GFP_KERNEL);
> +		if (!crat_image) {
> +			ret = -ENOMEM;
> +			pr_err("No memory for allocating CRAT image\n");
> +			goto err;
> +		}
> +		ret = kfd_topology_get_crat_acpi(crat_image, &image_size);
> +
> +		if (ret == 0) {
> +			down_write(&topology_lock);
> +			ret = kfd_parse_crat_table(crat_image);
> +			if (ret == 0)
> +				ret = kfd_topology_update_sysfs();
> +			up_write(&topology_lock);
> +		} else {
> +			pr_err("Couldn't get CRAT table size from ACPI\n");
> +		}
> +		kfree(crat_image);
> +	} else if (ret == -ENODATA) {
> +		ret = 0;
> +	} else {
> +		pr_err("Couldn't get CRAT table size from ACPI\n");
> +	}
> +
> +err:
> +	pr_info("Finished initializing topology ret=%d\n", ret);
> +	return ret;
> +}
> +
> +void kfd_topology_shutdown(void)
> +{
> +	kfd_topology_release_sysfs();
> +	kfd_release_live_view();
> +}
> +
> +static void kfd_debug_print_topology(void)
> +{
> +	struct kfd_topology_device *dev;
> +	uint32_t i = 0;
> +
> +	pr_info("DEBUG PRINT OF TOPOLOGY:");
> +	list_for_each_entry(dev, &topology_device_list, list) {
> +		pr_info("Node: %d\n", i);
> +		pr_info("\tGPU assigned: %s\n", (dev->gpu ? "yes" : "no"));
> +		pr_info("\tCPU count: %d\n", dev->node_props.cpu_cores_count);
> +		pr_info("\tSIMD count: %d", dev->node_props.simd_count);
> +		i++;
> +	}
> +}
> +
> +static uint32_t kfd_generate_gpu_id(struct kfd_dev *gpu)
> +{
> +	uint32_t hashout;
> +	uint32_t buf[7];
> +	int i;
> +
> +	if (!gpu)
> +		return 0;
> +
> +	buf[0] = gpu->pdev->devfn;
> +	buf[1] = gpu->pdev->subsystem_vendor;
> +	buf[2] = gpu->pdev->subsystem_device;
> +	buf[3] = gpu->pdev->device;
> +	buf[4] = gpu->pdev->bus->number;
> +	buf[5] = (uint32_t)(kfd2kgd->get_vmem_size(gpu->kgd) & 0xffffffff);
> +	buf[6] = (uint32_t)(kfd2kgd->get_vmem_size(gpu->kgd) >> 32);
> +
> +	for (i = 0, hashout = 0; i < 7; i++)
> +		hashout ^= hash_32(buf[i], KFD_GPU_ID_HASH_WIDTH);
> +
> +	return hashout;
> +}
> +
> +static struct kfd_topology_device *kfd_assign_gpu(struct kfd_dev *gpu)
> +{
> +	struct kfd_topology_device *dev;
> +	struct kfd_topology_device *out_dev = 0;
> +
> +	BUG_ON(!gpu);
> +
> +	list_for_each_entry(dev, &topology_device_list, list)
> +		if (dev->gpu == 0 && dev->node_props.simd_count > 0) {
> +			dev->gpu = gpu;
> +			out_dev = dev;
> +			break;
> +		}
> +
> +	return out_dev;
> +}
> +
> +static void kfd_notify_gpu_change(uint32_t gpu_id, int arrival)
> +{
> +	/*
> +	 * TODO: Generate an event for thunk about the arrival/removal
> +	 * of the GPU
> +	 */
> +}
> +
> +int kfd_topology_add_device(struct kfd_dev *gpu)
> +{
> +	uint32_t gpu_id;
> +	struct kfd_topology_device *dev;
> +	int res;
> +
> +	BUG_ON(!gpu);
> +
> +	gpu_id = kfd_generate_gpu_id(gpu);
> +
> +	pr_info("Adding new GPU (ID: 0x%x) to topology\n", gpu_id);
> +
> +	down_write(&topology_lock);
> +	/*
> +	 * Try to assign the GPU to existing topology device (generated from
> +	 * CRAT table
> +	 */
> +	dev = kfd_assign_gpu(gpu);
> +	if (!dev) {
> +		pr_info("GPU was not found in the current topology. Extending.\n");
> +		kfd_debug_print_topology();
> +		dev = kfd_create_topology_device();
> +		if (!dev) {
> +			res = -ENOMEM;
> +			goto err;
> +		}
> +		dev->gpu = gpu;
> +
> +		/*
> +		 * TODO: Make a call to retrieve topology information from the
> +		 * GPU vBIOS
> +		 */
> +
> +		/*
> +		 * Update the SYSFS tree, since we added another topology device
> +		 */
> +		if (kfd_topology_update_sysfs() < 0)
> +			kfd_topology_release_sysfs();
> +
> +	}
> +
> +	dev->gpu_id = gpu_id;
> +	gpu->id = gpu_id;
> +	dev->node_props.vendor_id = gpu->pdev->vendor;
> +	dev->node_props.device_id = gpu->pdev->device;
> +	dev->node_props.location_id = (gpu->pdev->bus->number << 24) +
> +			(gpu->pdev->devfn & 0xffffff);
> +	/*
> +	 * TODO: Retrieve max engine clock values from KGD
> +	 */
> +
> +	res = 0;
> +
> +err:
> +	up_write(&topology_lock);
> +
> +	if (res == 0)
> +		kfd_notify_gpu_change(gpu_id, 1);
> +
> +	return res;
> +}
> +
> +int kfd_topology_remove_device(struct kfd_dev *gpu)
> +{
> +	struct kfd_topology_device *dev;
> +	uint32_t gpu_id;
> +	int res = -ENODEV;
> +
> +	BUG_ON(!gpu);
> +
> +	down_write(&topology_lock);
> +
> +	list_for_each_entry(dev, &topology_device_list, list)
> +		if (dev->gpu == gpu) {
> +			gpu_id = dev->gpu_id;
> +			kfd_remove_sysfs_node_entry(dev);
> +			kfd_release_topology_device(dev);
> +			res = 0;
> +			if (kfd_topology_update_sysfs() < 0)
> +				kfd_topology_release_sysfs();
> +			break;
> +		}
> +
> +	up_write(&topology_lock);
> +
> +	if (res == 0)
> +		kfd_notify_gpu_change(gpu_id, 0);
> +
> +	return res;
> +}

I am not convinced that sysfs is the right place to expose this.
I need to think about that a bit.

> diff --git a/drivers/gpu/hsa/radeon/kfd_topology.h b/drivers/gpu/hsa/radeon/kfd_topology.h
> new file mode 100644
> index 0000000..989624b
> --- /dev/null
> +++ b/drivers/gpu/hsa/radeon/kfd_topology.h
> @@ -0,0 +1,168 @@
> +/*
> + * Copyright 2014 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + */
> +
> +#ifndef __KFD_TOPOLOGY_H__
> +#define __KFD_TOPOLOGY_H__
> +
> +#include <linux/types.h>
> +#include <linux/list.h>
> +#include "kfd_priv.h"
> +
> +#define KFD_TOPOLOGY_PUBLIC_NAME_SIZE 128
> +
> +#define HSA_CAP_HOT_PLUGGABLE			0x00000001
> +#define HSA_CAP_ATS_PRESENT			0x00000002
> +#define HSA_CAP_SHARED_WITH_GRAPHICS		0x00000004
> +#define HSA_CAP_QUEUE_SIZE_POW2			0x00000008
> +#define HSA_CAP_QUEUE_SIZE_32BIT		0x00000010
> +#define HSA_CAP_QUEUE_IDLE_EVENT		0x00000020
> +#define HSA_CAP_VA_LIMIT			0x00000040
> +#define HSA_CAP_WATCH_POINTS_SUPPORTED		0x00000080
> +#define HSA_CAP_WATCH_POINTS_TOTALBITS_MASK	0x00000f00
> +#define HSA_CAP_WATCH_POINTS_TOTALBITS_SHIFT	8
> +#define HSA_CAP_RESERVED			0xfffff000
> +
> +struct kfd_node_properties {
> +	uint32_t cpu_cores_count;
> +	uint32_t simd_count;
> +	uint32_t mem_banks_count;
> +	uint32_t caches_count;
> +	uint32_t io_links_count;
> +	uint32_t cpu_core_id_base;
> +	uint32_t simd_id_base;
> +	uint32_t capability;
> +	uint32_t max_waves_per_simd;
> +	uint32_t lds_size_in_kb;
> +	uint32_t gds_size_in_kb;
> +	uint32_t wave_front_size;
> +	uint32_t array_count;
> +	uint32_t simd_arrays_per_engine;
> +	uint32_t cu_per_simd_array;
> +	uint32_t simd_per_cu;
> +	uint32_t max_slots_scratch_cu;
> +	uint32_t engine_id;
> +	uint32_t vendor_id;
> +	uint32_t device_id;
> +	uint32_t location_id;
> +	uint32_t max_engine_clk_fcompute;
> +	uint32_t max_engine_clk_ccompute;
> +	uint16_t marketing_name[KFD_TOPOLOGY_PUBLIC_NAME_SIZE];
> +};
> +
> +#define HSA_MEM_HEAP_TYPE_SYSTEM	0
> +#define HSA_MEM_HEAP_TYPE_FB_PUBLIC	1
> +#define HSA_MEM_HEAP_TYPE_FB_PRIVATE	2
> +#define HSA_MEM_HEAP_TYPE_GPU_GDS	3
> +#define HSA_MEM_HEAP_TYPE_GPU_LDS	4
> +#define HSA_MEM_HEAP_TYPE_GPU_SCRATCH	5
> +
> +#define HSA_MEM_FLAGS_HOT_PLUGGABLE	0x00000001
> +#define HSA_MEM_FLAGS_NON_VOLATILE	0x00000002
> +#define HSA_MEM_FLAGS_RESERVED		0xfffffffc
> +
> +struct kfd_mem_properties {
> +	struct list_head	list;
> +	uint32_t		heap_type;
> +	uint64_t		size_in_bytes;
> +	uint32_t		flags;
> +	uint32_t		width;
> +	uint32_t		mem_clk_max;
> +	struct kobject		*kobj;
> +	struct attribute	attr;
> +};
> +
> +#define KFD_TOPOLOGY_CPU_SIBLINGS 256
> +
> +#define HSA_CACHE_TYPE_DATA		0x00000001
> +#define HSA_CACHE_TYPE_INSTRUCTION	0x00000002
> +#define HSA_CACHE_TYPE_CPU		0x00000004
> +#define HSA_CACHE_TYPE_HSACU		0x00000008
> +#define HSA_CACHE_TYPE_RESERVED		0xfffffff0
> +
> +struct kfd_cache_properties {
> +	struct list_head	list;
> +	uint32_t		processor_id_low;
> +	uint32_t		cache_level;
> +	uint32_t		cache_size;
> +	uint32_t		cacheline_size;
> +	uint32_t		cachelines_per_tag;
> +	uint32_t		cache_assoc;
> +	uint32_t		cache_latency;
> +	uint32_t		cache_type;
> +	uint8_t			sibling_map[KFD_TOPOLOGY_CPU_SIBLINGS];
> +	struct kobject		*kobj;
> +	struct attribute	attr;
> +};
> +
> +struct kfd_iolink_properties {
> +	struct list_head	list;
> +	uint32_t		iolink_type;
> +	uint32_t		ver_maj;
> +	uint32_t		ver_min;
> +	uint32_t		node_from;
> +	uint32_t		node_to;
> +	uint32_t		weight;
> +	uint32_t		min_latency;
> +	uint32_t		max_latency;
> +	uint32_t		min_bandwidth;
> +	uint32_t		max_bandwidth;
> +	uint32_t		rec_transfer_size;
> +	uint32_t		flags;
> +	struct kobject		*kobj;
> +	struct attribute	attr;
> +};
> +
> +struct kfd_topology_device {
> +	struct list_head		list;
> +	uint32_t			gpu_id;
> +	struct kfd_node_properties	node_props;
> +	uint32_t			mem_bank_count;
> +	struct list_head		mem_props;
> +	uint32_t			cache_count;
> +	struct list_head		cache_props;
> +	uint32_t			io_link_count;
> +	struct list_head		io_link_props;
> +	struct kfd_dev			*gpu;
> +	struct kobject			*kobj_node;
> +	struct kobject			*kobj_mem;
> +	struct kobject			*kobj_cache;
> +	struct kobject			*kobj_iolink;
> +	struct attribute		attr_gpuid;
> +	struct attribute		attr_name;
> +	struct attribute		attr_props;
> +};
> +
> +struct kfd_system_properties {
> +	uint32_t		num_devices;     /* Number of H-NUMA nodes */
> +	uint32_t		generation_count;
> +	uint64_t		platform_oem;
> +	uint64_t		platform_id;
> +	uint64_t		platform_rev;
> +	struct kobject		*kobj_topology;
> +	struct kobject		*kobj_nodes;
> +	struct attribute	attr_genid;
> +	struct attribute	attr_props;
> +};
> +
> +
> +
> +#endif /* __KFD_TOPOLOGY_H__ */
> -- 
> 1.9.1
> 

^ permalink raw reply	[flat|nested] 116+ messages in thread

* RE: [PATCH 02/83] drm/radeon: reduce number of free VMIDs and pipes in KV
  2014-07-11 16:22       ` Alex Deucher
@ 2014-07-11 17:07         ` Bridgman, John
  -1 siblings, 0 replies; 116+ messages in thread
From: Bridgman, John @ 2014-07-11 17:07 UTC (permalink / raw)
  To: Alex Deucher, Koenig, Christian
  Cc: Oded Gabbay, Lewycky, Andrew, LKML, Maling list - DRI developers,
	Deucher, Alexander




>-----Original Message-----
>From: dri-devel [mailto:dri-devel-bounces@lists.freedesktop.org] On Behalf
>Of Alex Deucher
>Sent: Friday, July 11, 2014 12:23 PM
>To: Koenig, Christian
>Cc: Oded Gabbay; Lewycky, Andrew; LKML; Maling list - DRI developers;
>Deucher, Alexander
>Subject: Re: [PATCH 02/83] drm/radeon: reduce number of free VMIDs and
>pipes in KV
>
>On Fri, Jul 11, 2014 at 12:18 PM, Christian König <christian.koenig@amd.com>
>wrote:
>> Am 11.07.2014 18:05, schrieb Jerome Glisse:
>>
>>> On Fri, Jul 11, 2014 at 12:50:02AM +0300, Oded Gabbay wrote:
>>>>
>>>> To support HSA on KV, we need to limit the number of vmids and pipes
>>>> that are available for radeon's use with KV.
>>>>
>>>> This patch reserves VMIDs 8-15 for KFD (so radeon can only use VMIDs
>>>> 0-7) and also makes radeon think that KV has only a single MEC with
>>>> a single pipe in it
>>>>
>>>> Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
>>>
>>> Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
>>
>>
>> At least for the VMIDs, on-demand allocation should be trivial to
>> implement, so I would rather prefer this instead of a fixed assignment.
>
>IIRC, the way the CP hw scheduler works you have to give it a range of vmids
>and it assigns them dynamically as queues are mapped so effectively they
>are potentially in use once the CP scheduler is set up.
>
>Alex

Right. The SET_RESOURCES packet (kfd_pm4_headers.h, added in patch 49) allocates a range of HW queues, VMIDs and GDS to the HW scheduler, then the scheduler uses the allocated VMIDs to support a potentially larger number of user processes by dynamically mapping PASIDs to VMIDs and memory queue descriptors (MQDs) to HW queues.
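
To make the split concrete, here is a minimal userspace sketch of the VMID partition described above (radeon keeps VMIDs 0-7, KFD gets 8-15). The constant and function names are hypothetical, and expressing the KFD range as a bitmask is an assumption for illustration only; the thread does not say how SET_RESOURCES encodes the range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constants mirroring the split described in the patch. */
#define DEMO_TOTAL_VMIDS   16u
#define DEMO_RADEON_VMIDS   8u	/* VMIDs 0-7 stay with radeon (nvm = 8) */

/*
 * One bit per VMID handed to the HW scheduler: bits 8..15 set.
 * Assumption: the range is passed as a mask; the real packet layout
 * is not shown in this thread.
 */
static uint16_t demo_kfd_vmid_mask(void)
{
	uint32_t all = (1u << DEMO_TOTAL_VMIDS) - 1u;		/* 0xffff */
	uint32_t radeon = (1u << DEMO_RADEON_VMIDS) - 1u;	/* 0x00ff */

	return (uint16_t)(all & ~radeon);			/* 0xff00 */
}

/* True if the given VMID falls in the range reserved for KFD. */
static bool demo_vmid_owned_by_kfd(unsigned int vmid)
{
	assert(vmid < DEMO_TOTAL_VMIDS);
	return (demo_kfd_vmid_mask() >> vmid) & 1u;
}
```

Because the scheduler may map any VMID in its range as soon as queues are mapped, the whole range has to be treated as in use from radeon's point of view, which is why a fixed split is used here rather than on-demand allocation.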

BTW Oded, I think we have some duplicated defines at the end of kfd_pm4_headers.h. If they really are duplicates, it would be great to remove them before the pull request.

Thanks,
JB

>
>
>>
>> Christian.
>>
>>
>>>
>>>> ---
>>>>   drivers/gpu/drm/radeon/cik.c | 48
>>>> ++++++++++++++++++++++----------------------
>>>>   1 file changed, 24 insertions(+), 24 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/radeon/cik.c
>>>> b/drivers/gpu/drm/radeon/cik.c index 4bfc2c0..e0c8052 100644
>>>> --- a/drivers/gpu/drm/radeon/cik.c
>>>> +++ b/drivers/gpu/drm/radeon/cik.c
>>>> @@ -4662,12 +4662,11 @@ static int cik_mec_init(struct radeon_device
>>>> *rdev)
>>>>         /*
>>>>          * KV:    2 MEC, 4 Pipes/MEC, 8 Queues/Pipe - 64 Queues total
>>>>          * CI/KB: 1 MEC, 4 Pipes/MEC, 8 Queues/Pipe - 32 Queues
>>>> total
>>>> +        * Nonetheless, we assign only 1 pipe because all other
>>>> + pipes
>>>> will
>>>> +        * be handled by KFD
>>>>          */
>>>> -       if (rdev->family == CHIP_KAVERI)
>>>> -               rdev->mec.num_mec = 2;
>>>> -       else
>>>> -               rdev->mec.num_mec = 1;
>>>> -       rdev->mec.num_pipe = 4;
>>>> +       rdev->mec.num_mec = 1;
>>>> +       rdev->mec.num_pipe = 1;
>>>>         rdev->mec.num_queue = rdev->mec.num_mec * rdev-
>>mec.num_pipe * 8;
>>>>         if (rdev->mec.hpd_eop_obj == NULL) { @@ -4809,28 +4808,24 @@
>>>> static int cik_cp_compute_resume(struct radeon_device *rdev)
>>>>         /* init the pipes */
>>>>         mutex_lock(&rdev->srbm_mutex);
>>>> -       for (i = 0; i < (rdev->mec.num_pipe * rdev->mec.num_mec); i++) {
>>>> -               int me = (i < 4) ? 1 : 2;
>>>> -               int pipe = (i < 4) ? i : (i - 4);
>>>>   -             eop_gpu_addr = rdev->mec.hpd_eop_gpu_addr + (i *
>>>> MEC_HPD_SIZE * 2);
>>>> +       eop_gpu_addr = rdev->mec.hpd_eop_gpu_addr;
>>>>   -             cik_srbm_select(rdev, me, pipe, 0, 0);
>>>> +       cik_srbm_select(rdev, 0, 0, 0, 0);
>>>>   -             /* write the EOP addr */
>>>> -               WREG32(CP_HPD_EOP_BASE_ADDR, eop_gpu_addr >> 8);
>>>> -               WREG32(CP_HPD_EOP_BASE_ADDR_HI,
>>>> upper_32_bits(eop_gpu_addr) >> 8);
>>>> +       /* write the EOP addr */
>>>> +       WREG32(CP_HPD_EOP_BASE_ADDR, eop_gpu_addr >> 8);
>>>> +       WREG32(CP_HPD_EOP_BASE_ADDR_HI,
>upper_32_bits(eop_gpu_addr)
>>>> + >>
>>>> 8);
>>>>   -             /* set the VMID assigned */
>>>> -               WREG32(CP_HPD_EOP_VMID, 0);
>>>> +       /* set the VMID assigned */
>>>> +       WREG32(CP_HPD_EOP_VMID, 0);
>>>> +
>>>> +       /* set the EOP size, register value is 2^(EOP_SIZE+1) dwords */
>>>> +       tmp = RREG32(CP_HPD_EOP_CONTROL);
>>>> +       tmp &= ~EOP_SIZE_MASK;
>>>> +       tmp |= order_base_2(MEC_HPD_SIZE / 8);
>>>> +       WREG32(CP_HPD_EOP_CONTROL, tmp);
>>>>   -             /* set the EOP size, register value is 2^(EOP_SIZE+1)
>>>> dwords */
>>>> -               tmp = RREG32(CP_HPD_EOP_CONTROL);
>>>> -               tmp &= ~EOP_SIZE_MASK;
>>>> -               tmp |= order_base_2(MEC_HPD_SIZE / 8);
>>>> -               WREG32(CP_HPD_EOP_CONTROL, tmp);
>>>> -       }
>>>> -       cik_srbm_select(rdev, 0, 0, 0, 0);
>>>>         mutex_unlock(&rdev->srbm_mutex);
>>>>         /* init the queues.  Just two for now. */ @@ -5876,8
>>>> +5871,13 @@ int cik_ib_parse(struct radeon_device *rdev, struct
>>>> radeon_ib *ib)
>>>>    */
>>>>   int cik_vm_init(struct radeon_device *rdev)
>>>>   {
>>>> -       /* number of VMs */
>>>> -       rdev->vm_manager.nvm = 16;
>>>> +       /*
>>>> +        * number of VMs
>>>> +        * VMID 0 is reserved for Graphics
>>>> +        * radeon compute will use VMIDs 1-7
>>>> +        * KFD will use VMIDs 8-15
>>>> +        */
>>>> +       rdev->vm_manager.nvm = 8;
>>>>         /* base offset of vram pages */
>>>>         if (rdev->flags & RADEON_IS_IGP) {
>>>>                 u64 tmp = RREG32(MC_VM_FB_OFFSET);
>>>> --
>>>> 1.9.1
>>>>
>>
>> _______________________________________________
>> dri-devel mailing list
>> dri-devel@lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/dri-devel
>_______________________________________________
>dri-devel mailing list
>dri-devel@lists.freedesktop.org
>http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 116+ messages in thread

* RE: [PATCH 02/83] drm/radeon: reduce number of free VMIDs and pipes in KV
@ 2014-07-11 17:07         ` Bridgman, John
  0 siblings, 0 replies; 116+ messages in thread
From: Bridgman, John @ 2014-07-11 17:07 UTC (permalink / raw)
  To: Alex Deucher, Koenig, Christian
  Cc: Oded Gabbay, Lewycky, Andrew, LKML, Maling list - DRI developers,
	Deucher, Alexander



>-----Original Message-----
>From: dri-devel [mailto:dri-devel-bounces@lists.freedesktop.org] On Behalf
>Of Alex Deucher
>Sent: Friday, July 11, 2014 12:23 PM
>To: Koenig, Christian
>Cc: Oded Gabbay; Lewycky, Andrew; LKML; Maling list - DRI developers;
>Deucher, Alexander
>Subject: Re: [PATCH 02/83] drm/radeon: reduce number of free VMIDs and
>pipes in KV
>
>On Fri, Jul 11, 2014 at 12:18 PM, Christian König <christian.koenig@amd.com>
>wrote:
>> Am 11.07.2014 18:05, schrieb Jerome Glisse:
>>
>>> On Fri, Jul 11, 2014 at 12:50:02AM +0300, Oded Gabbay wrote:
>>>>
>>>> To support HSA on KV, we need to limit the number of vmids and pipes
>>>> that are available for radeon's use with KV.
>>>>
>>>> This patch reserves VMIDs 8-15 for KFD (so radeon can only use VMIDs
>>>> 0-7) and also makes radeon think that KV has only a single MEC with
>>>> a single pipe in it
>>>>
>>>> Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
>>>
>>> Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
>>
>>
>> At least for the VMIDs, on-demand allocation should be trivial to
>> implement, so I would rather prefer this instead of a fixed assignment.
>
>IIRC, the way the CP hw scheduler works you have to give it a range of vmids
>and it assigns them dynamically as queues are mapped so effectively they
>are potentially in use once the CP scheduler is set up.
>
>Alex

Right. The SET_RESOURCES packet (kfd_pm4_headers.h, added in patch 49) allocates a range of HW queues, VMIDs and GDS to the HW scheduler, then the scheduler uses the allocated VMIDs to support a potentially larger number of user processes by dynamically mapping PASIDs to VMIDs and memory queue descriptors (MQDs) to HW queues.

BTW Oded, I think we have some duplicated defines at the end of kfd_pm4_headers.h. If they really are duplicates, it would be great to remove them before the pull request.

Thanks,
JB

>
>
>>
>> Christian.
>>
>>
>>>
>>>> ---
>>>>   drivers/gpu/drm/radeon/cik.c | 48
>>>> ++++++++++++++++++++++----------------------
>>>>   1 file changed, 24 insertions(+), 24 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/radeon/cik.c
>>>> b/drivers/gpu/drm/radeon/cik.c index 4bfc2c0..e0c8052 100644
>>>> --- a/drivers/gpu/drm/radeon/cik.c
>>>> +++ b/drivers/gpu/drm/radeon/cik.c
>>>> @@ -4662,12 +4662,11 @@ static int cik_mec_init(struct radeon_device
>>>> *rdev)
>>>>         /*
>>>>          * KV:    2 MEC, 4 Pipes/MEC, 8 Queues/Pipe - 64 Queues total
>>>>          * CI/KB: 1 MEC, 4 Pipes/MEC, 8 Queues/Pipe - 32 Queues
>>>> total
>>>> +        * Nonetheless, we assign only 1 pipe because all other
>>>> + pipes
>>>> will
>>>> +        * be handled by KFD
>>>>          */
>>>> -       if (rdev->family == CHIP_KAVERI)
>>>> -               rdev->mec.num_mec = 2;
>>>> -       else
>>>> -               rdev->mec.num_mec = 1;
>>>> -       rdev->mec.num_pipe = 4;
>>>> +       rdev->mec.num_mec = 1;
>>>> +       rdev->mec.num_pipe = 1;
>>>>         rdev->mec.num_queue = rdev->mec.num_mec * rdev-
>>mec.num_pipe * 8;
>>>>         if (rdev->mec.hpd_eop_obj == NULL) { @@ -4809,28 +4808,24 @@
>>>> static int cik_cp_compute_resume(struct radeon_device *rdev)
>>>>         /* init the pipes */
>>>>         mutex_lock(&rdev->srbm_mutex);
>>>> -       for (i = 0; i < (rdev->mec.num_pipe * rdev->mec.num_mec); i++) {
>>>> -               int me = (i < 4) ? 1 : 2;
>>>> -               int pipe = (i < 4) ? i : (i - 4);
>>>>   -             eop_gpu_addr = rdev->mec.hpd_eop_gpu_addr + (i *
>>>> MEC_HPD_SIZE * 2);
>>>> +       eop_gpu_addr = rdev->mec.hpd_eop_gpu_addr;
>>>>   -             cik_srbm_select(rdev, me, pipe, 0, 0);
>>>> +       cik_srbm_select(rdev, 0, 0, 0, 0);
>>>>   -             /* write the EOP addr */
>>>> -               WREG32(CP_HPD_EOP_BASE_ADDR, eop_gpu_addr >> 8);
>>>> -               WREG32(CP_HPD_EOP_BASE_ADDR_HI,
>>>> upper_32_bits(eop_gpu_addr) >> 8);
>>>> +       /* write the EOP addr */
>>>> +       WREG32(CP_HPD_EOP_BASE_ADDR, eop_gpu_addr >> 8);
>>>> +       WREG32(CP_HPD_EOP_BASE_ADDR_HI,
>upper_32_bits(eop_gpu_addr)
>>>> + >>
>>>> 8);
>>>>   -             /* set the VMID assigned */
>>>> -               WREG32(CP_HPD_EOP_VMID, 0);
>>>> +       /* set the VMID assigned */
>>>> +       WREG32(CP_HPD_EOP_VMID, 0);
>>>> +
>>>> +       /* set the EOP size, register value is 2^(EOP_SIZE+1) dwords */
>>>> +       tmp = RREG32(CP_HPD_EOP_CONTROL);
>>>> +       tmp &= ~EOP_SIZE_MASK;
>>>> +       tmp |= order_base_2(MEC_HPD_SIZE / 8);
>>>> +       WREG32(CP_HPD_EOP_CONTROL, tmp);
>>>>   -             /* set the EOP size, register value is 2^(EOP_SIZE+1)
>>>> dwords */
>>>> -               tmp = RREG32(CP_HPD_EOP_CONTROL);
>>>> -               tmp &= ~EOP_SIZE_MASK;
>>>> -               tmp |= order_base_2(MEC_HPD_SIZE / 8);
>>>> -               WREG32(CP_HPD_EOP_CONTROL, tmp);
>>>> -       }
>>>> -       cik_srbm_select(rdev, 0, 0, 0, 0);
>>>>         mutex_unlock(&rdev->srbm_mutex);
>>>>         /* init the queues.  Just two for now. */ @@ -5876,8
>>>> +5871,13 @@ int cik_ib_parse(struct radeon_device *rdev, struct
>>>> radeon_ib *ib)
>>>>    */
>>>>   int cik_vm_init(struct radeon_device *rdev)
>>>>   {
>>>> -       /* number of VMs */
>>>> -       rdev->vm_manager.nvm = 16;
>>>> +       /*
>>>> +        * number of VMs
>>>> +        * VMID 0 is reserved for Graphics
>>>> +        * radeon compute will use VMIDs 1-7
>>>> +        * KFD will use VMIDs 8-15
>>>> +        */
>>>> +       rdev->vm_manager.nvm = 8;
>>>>         /* base offset of vram pages */
>>>>         if (rdev->flags & RADEON_IS_IGP) {
>>>>                 u64 tmp = RREG32(MC_VM_FB_OFFSET);
>>>> --
>>>> 1.9.1
>>>>
>>
>> _______________________________________________
>> dri-devel mailing list
>> dri-devel@lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/dri-devel
>_______________________________________________
>dri-devel mailing list
>dri-devel@lists.freedesktop.org
>http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 09/83] hsa/radeon: Add code base of hsa driver for AMD's GPUs
  2014-07-11 17:04     ` Jerome Glisse
@ 2014-07-11 17:28       ` Joe Perches
  -1 siblings, 0 replies; 116+ messages in thread
From: Joe Perches @ 2014-07-11 17:28 UTC (permalink / raw)
  To: Jerome Glisse
  Cc: Oded Gabbay, David Airlie, Alex Deucher, linux-kernel, dri-devel,
	John Bridgman, Andrew Lewycky, Joerg Roedel, Oded Gabbay,
	Greg Kroah-Hartman, Rafael J. Wysocki, Kishon Vijay Abraham I,
	Sandeep Nair, Kenneth Heitke, Srinivas Pandruvada,
	Santosh Shilimkar, Andreas Noever, Lucas Stach, Philipp Zabel

On Fri, 2014-07-11 at 13:04 -0400, Jerome Glisse wrote:
> On Fri, Jul 11, 2014 at 12:50:09AM +0300, Oded Gabbay wrote:
[]
> > +static long kfd_ioctl(struct file *, unsigned int, unsigned long);
> 
> Nitpick, avoid unsigned int just use unsigned.

I suggest unsigned int is much more common (and better)
than just unsigned.

$ git grep -P '\bunsigned\s+(?!long|int|short|char)' -- "*.[ch]" | wc -l
20778

$ git grep -P "\bunsigned\s+int\b" -- "*.[ch]" | wc -l
98068

> > +static int kfd_open(struct inode *, struct file *);

It's also generally better to use types and names to
improve how a human reads and understands the code.
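
As a small illustration of that point, the sketch below declares the same function twice, once with types only and once with named parameters. The demo_* names and the parameter names (filep, cmd, arg) are hypothetical stand-ins, not taken from the patch, and the bodies exist only so the example compiles on its own.

```c
struct file;	/* opaque stand-in for the kernel's struct file */

/* Types only: the reader has to guess which argument is the command. */
long demo_ioctl_unnamed(struct file *, unsigned int, unsigned long);

/* Types and names: the role of each argument is visible at a glance. */
long demo_ioctl(struct file *filep, unsigned int cmd, unsigned long arg);

/* Trivial bodies, present only to keep the sketch self-contained. */
long demo_ioctl_unnamed(struct file *filep, unsigned int cmd,
			unsigned long arg)
{
	(void)filep;
	return (long)cmd + (long)arg;
}

long demo_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
{
	return demo_ioctl_unnamed(filep, cmd, arg);
}
```

Both declarations are equivalent to the compiler; the difference is only in what the prototype tells a human reader.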



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 09/83] hsa/radeon: Add code base of hsa driver for AMD's GPUs
@ 2014-07-11 17:28       ` Joe Perches
  0 siblings, 0 replies; 116+ messages in thread
From: Joe Perches @ 2014-07-11 17:28 UTC (permalink / raw)
  To: Jerome Glisse
  Cc: Oded Gabbay, Andrew Lewycky, Greg Kroah-Hartman,
	Rafael J. Wysocki, linux-kernel, dri-devel,
	Kishon Vijay Abraham I, Andreas Noever, Kenneth Heitke,
	Sandeep Nair, Santosh Shilimkar, Srinivas Pandruvada,
	Alex Deucher

On Fri, 2014-07-11 at 13:04 -0400, Jerome Glisse wrote:
> On Fri, Jul 11, 2014 at 12:50:09AM +0300, Oded Gabbay wrote:
[]
> > +static long kfd_ioctl(struct file *, unsigned int, unsigned long);
> 
> Nitpick, avoid unsigned int just use unsigned.

I suggest unsigned int is much more common (and better)
than just unsigned.

$ git grep -P '\bunsigned\s+(?!long|int|short|char)' -- "*.[ch]" | wc -l
20778

$ git grep -P "\bunsigned\s+int\b" -- "*.[ch]" | wc -l
98068

> > +static int kfd_open(struct inode *, struct file *);

It's also generally better to use types and names tno
improve how a human reads and understands the code.

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 09/83] hsa/radeon: Add code base of hsa driver for AMD's GPUs
  2014-07-11 17:04     ` Jerome Glisse
@ 2014-07-11 17:40       ` Daniel Vetter
  -1 siblings, 0 replies; 116+ messages in thread
From: Daniel Vetter @ 2014-07-11 17:40 UTC (permalink / raw)
  To: Jerome Glisse
  Cc: Oded Gabbay, Sandeep Nair, Andrew Lewycky, Greg Kroah-Hartman,
	Rafael J. Wysocki, Linux Kernel Mailing List, dri-devel,
	Kishon Vijay Abraham I, Andreas Noever, Kenneth Heitke,
	Santosh Shilimkar, Srinivas Pandruvada, Alex Deucher

On Fri, Jul 11, 2014 at 7:04 PM, Jerome Glisse <j.glisse@gmail.com> wrote:
> Are we to assume that for eternity this will not work on iommu that do support
> PASID/ATS but are not from AMD ? If it was an APU specific function i would
> understand but it seems that the IOMMU API needs to grow. I am pretty sure
> Intel will have an ATS/PASID IOMMU.

Also this isn't just for gpus - I hear noises that it e.g. could also
be used to virtualize a single ethernet NIC to different guest OS
directly. Adding ats/pasid support to the linux iommu interfaces
sounds like the right approach to me.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 09/83] hsa/radeon: Add code base of hsa driver for AMD's GPUs
@ 2014-07-11 17:40       ` Daniel Vetter
  0 siblings, 0 replies; 116+ messages in thread
From: Daniel Vetter @ 2014-07-11 17:40 UTC (permalink / raw)
  To: Jerome Glisse
  Cc: Oded Gabbay, Andrew Lewycky, Greg Kroah-Hartman,
	Rafael J. Wysocki, Sandeep Nair, dri-devel,
	Linux Kernel Mailing List, Alex Deucher, Kenneth Heitke,
	Santosh Shilimkar, Srinivas Pandruvada, Andreas Noever,
	Kishon Vijay Abraham I

On Fri, Jul 11, 2014 at 7:04 PM, Jerome Glisse <j.glisse@gmail.com> wrote:
> Are we to assume that for eternity this will not work on iommu that do support
> PASID/ATS but are not from AMD ? If it was an APU specific function i would
> understand but it seems that the IOMMU API needs to grow. I am pretty sure
> Intel will have an ATS/PASID IOMMU.

Also this isn't just for gpus - I hear noises that it e.g. could also
be used to virtualize a single ethernet NIC to different guest OS
directly. Adding ats/pasid support to the linux iommu interfaces
sounds like the right approach to me.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 116+ messages in thread

* RE: [PATCH 07/83] drm/radeon: Add kfd-->kgd interface of locking srbm_gfx_cntl register
  2014-07-11 16:34     ` Jerome Glisse
@ 2014-07-11 17:48       ` Bridgman, John
  -1 siblings, 0 replies; 116+ messages in thread
From: Bridgman, John @ 2014-07-11 17:48 UTC (permalink / raw)
  To: Jerome Glisse, Oded Gabbay
  Cc: David Airlie, Deucher, Alexander, linux-kernel, dri-devel,
	Lewycky, Andrew, Joerg Roedel, Gabbay, Oded, Koenig, Christian

Checking... we shouldn't need to call the lock from kfd any more. We should be able to do any required locking in radeon kgd code.
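
A rough userspace sketch of that direction, under stated assumptions: rather than exporting lock/unlock callbacks, the kgd side exports one complete operation and takes its own mutex internally. struct demo_kgd, demo_kgd_write_pipe_reg() and the fake register array are hypothetical, and a pthread mutex stands in for the kernel mutex; this is not the interface that was eventually merged.

```c
#include <pthread.h>
#include <stdint.h>

#define DEMO_NUM_PIPES 4u

/* Hypothetical stand-in for the kgd device state. */
struct demo_kgd {
	pthread_mutex_t srbm_mutex;	/* stays private to the kgd side */
	uint32_t selected_pipe;		/* models the SRBM selection */
	uint32_t reg_value[DEMO_NUM_PIPES];	/* one fake register per pipe */
};

/*
 * One self-contained operation: select the pipe, write the register,
 * restore the default selection, all under the internal lock. The
 * caller never sees the mutex.
 */
static int demo_kgd_write_pipe_reg(struct demo_kgd *kgd, uint32_t pipe,
				   uint32_t value)
{
	if (pipe >= DEMO_NUM_PIPES)
		return -1;

	pthread_mutex_lock(&kgd->srbm_mutex);
	kgd->selected_pipe = pipe;
	kgd->reg_value[pipe] = value;
	kgd->selected_pipe = 0;
	pthread_mutex_unlock(&kgd->srbm_mutex);

	return 0;
}
```

With this shape the lock and unlock always pair up inside a single function, so another driver cannot leave the mutex held or take it in an unexpected order.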

>-----Original Message-----
>From: Jerome Glisse [mailto:j.glisse@gmail.com]
>Sent: Friday, July 11, 2014 12:35 PM
>To: Oded Gabbay
>Cc: David Airlie; Deucher, Alexander; linux-kernel@vger.kernel.org; dri-
>devel@lists.freedesktop.org; Bridgman, John; Lewycky, Andrew; Joerg
>Roedel; Gabbay, Oded; Koenig, Christian
>Subject: Re: [PATCH 07/83] drm/radeon: Add kfd-->kgd interface of locking
>srbm_gfx_cntl register
>
>On Fri, Jul 11, 2014 at 12:50:07AM +0300, Oded Gabbay wrote:
>> This patch adds a new interface to kfd2kgd_calls structure, which
>> allows the kfd to lock and unlock the srbm_gfx_cntl register
>
>Why does kfd need to lock this register if kfd cannot access any of those
>registers? This sounds broken to me; exposing a driver-internal mutex to
>another driver is not something I am a fan of.
>
>Cheers,
>Jérôme
>
>>
>> Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
>> ---
>>  drivers/gpu/drm/radeon/radeon_kfd.c | 20 ++++++++++++++++++++
>>  include/linux/radeon_kfd.h          |  4 ++++
>>  2 files changed, 24 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/radeon/radeon_kfd.c
>> b/drivers/gpu/drm/radeon/radeon_kfd.c
>> index 66ee36b..594020e 100644
>> --- a/drivers/gpu/drm/radeon/radeon_kfd.c
>> +++ b/drivers/gpu/drm/radeon/radeon_kfd.c
>> @@ -43,6 +43,10 @@ static void unkmap_mem(struct kgd_dev *kgd, struct
>> kgd_mem *mem);
>>
>>  static uint64_t get_vmem_size(struct kgd_dev *kgd);
>>
>> +static void lock_srbm_gfx_cntl(struct kgd_dev *kgd); static void
>> +unlock_srbm_gfx_cntl(struct kgd_dev *kgd);
>> +
>> +
>>  static const struct kfd2kgd_calls kfd2kgd = {
>>  	.allocate_mem = allocate_mem,
>>  	.free_mem = free_mem,
>> @@ -51,6 +55,8 @@ static const struct kfd2kgd_calls kfd2kgd = {
>>  	.kmap_mem = kmap_mem,
>>  	.unkmap_mem = unkmap_mem,
>>  	.get_vmem_size = get_vmem_size,
>> +	.lock_srbm_gfx_cntl = lock_srbm_gfx_cntl,
>> +	.unlock_srbm_gfx_cntl = unlock_srbm_gfx_cntl,
>>  };
>>
>>  static const struct kgd2kfd_calls *kgd2kfd; @@ -233,3 +239,17 @@
>> static uint64_t get_vmem_size(struct kgd_dev *kgd)
>>
>>  	return rdev->mc.real_vram_size;
>>  }
>> +
>> +static void lock_srbm_gfx_cntl(struct kgd_dev *kgd)
>> +{
>> +	struct radeon_device *rdev = (struct radeon_device *)kgd;
>> +
>> +	mutex_lock(&rdev->srbm_mutex);
>> +}
>> +
>> +static void unlock_srbm_gfx_cntl(struct kgd_dev *kgd)
>> +{
>> +	struct radeon_device *rdev = (struct radeon_device *)kgd;
>> +
>> +	mutex_unlock(&rdev->srbm_mutex);
>> +}
>> diff --git a/include/linux/radeon_kfd.h b/include/linux/radeon_kfd.h
>> index c7997d4..40b691c 100644
>> --- a/include/linux/radeon_kfd.h
>> +++ b/include/linux/radeon_kfd.h
>> @@ -81,6 +81,10 @@ struct kfd2kgd_calls {
>>  	void (*unkmap_mem)(struct kgd_dev *kgd, struct kgd_mem *mem);
>>
>>  	uint64_t (*get_vmem_size)(struct kgd_dev *kgd);
>> +
>> +	/* SRBM_GFX_CNTL mutex */
>> +	void (*lock_srbm_gfx_cntl)(struct kgd_dev *kgd);
>> +	void (*unlock_srbm_gfx_cntl)(struct kgd_dev *kgd);
>>  };
>>
>>  bool kgd2kfd_init(unsigned interface_version,
>> --
>> 1.9.1
>>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 02/83] drm/radeon: reduce number of free VMIDs and pipes in KV
  2014-07-11 17:07         ` Bridgman, John
  (?)
@ 2014-07-11 17:59         ` Ilyes Gouta
  2014-07-11 22:54             ` Bridgman, John
  -1 siblings, 1 reply; 116+ messages in thread
From: Ilyes Gouta @ 2014-07-11 17:59 UTC (permalink / raw)
  To: Bridgman, John
  Cc: Oded Gabbay, Lewycky, Andrew, LKML, Mailing list - DRI developers,
	Deucher, Alexander, Koenig, Christian


Hi,

Just a side question (for information),

On Fri, Jul 11, 2014 at 6:07 PM, Bridgman, John <John.Bridgman@amd.com>
wrote:

>
> Right. The SET_RESOURCES packet (kfd_pm4_headers.h, added in patch 49)
> allocates a range of HW queues, VMIDs and GDS to the HW scheduler, then the
> scheduler uses the allocated VMIDs to support a potentially larger number
> of user processes by dynamically mapping PASIDs to VMIDs and memory queue
> descriptors (MQDs) to HW queues.
>
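
If I read the description above correctly, the dynamic PASID-to-VMID
mapping could be pictured roughly as below. This is only a minimal
sketch under my own assumptions: the table layout, the names vmid_bind()
and vmid_unbind(), and the NO_PASID marker are hypothetical and are not
taken from the patches; the only fact borrowed from the series is that
VMIDs 8-15 are reserved for the scheduler.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: VMIDs 8-15 belong to the compute scheduler (per patch 02/83). */
#define FIRST_SCHED_VMID 8
#define NUM_SCHED_VMIDS  8
#define NO_PASID         0	/* hypothetical "slot is free" marker */

struct vmid_table {
	/* pasid[i] is the process currently owning VMID FIRST_SCHED_VMID + i */
	uint16_t pasid[NUM_SCHED_VMIDS];
};

/*
 * Return the VMID already bound to this PASID, or bind the first free
 * VMID to it. Returns -1 when every scheduler VMID is in use.
 */
static int vmid_bind(struct vmid_table *t, uint16_t pasid)
{
	int i, free_slot = -1;

	assert(pasid != NO_PASID);
	for (i = 0; i < NUM_SCHED_VMIDS; i++) {
		if (t->pasid[i] == pasid)
			return FIRST_SCHED_VMID + i;
		if (t->pasid[i] == NO_PASID && free_slot < 0)
			free_slot = i;
	}
	if (free_slot < 0)
		return -1;	/* caller has to wait for an unbind */
	t->pasid[free_slot] = pasid;
	return FIRST_SCHED_VMID + free_slot;
}

/* Drop the binding so that another process can reuse the VMID. */
static void vmid_unbind(struct vmid_table *t, int vmid)
{
	assert(vmid >= FIRST_SCHED_VMID &&
	       vmid < FIRST_SCHED_VMID + NUM_SCHED_VMIDS);
	t->pasid[vmid - FIRST_SCHED_VMID] = NO_PASID;
}
```

With eight VMIDs, a ninth process only gets one after some other
process has been unbound, which is how a small VMID pool could serve a
larger number of user processes.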

Is there any documentation or specification online describing these
mechanisms?

Thanks,

^ permalink raw reply	[flat|nested] 116+ messages in thread

* RE: [PATCH 09/83] hsa/radeon: Add code base of hsa driver for AMD's GPUs
  2014-07-11 17:04     ` Jerome Glisse
@ 2014-07-11 18:02       ` Bridgman, John
  -1 siblings, 0 replies; 116+ messages in thread
From: Bridgman, John @ 2014-07-11 18:02 UTC (permalink / raw)
  To: Jerome Glisse, Oded Gabbay
  Cc: David Airlie, Deucher, Alexander, linux-kernel, dri-devel,
	Lewycky, Andrew, Joerg Roedel, Gabbay, Oded, Greg Kroah-Hartman,
	Rafael J. Wysocki, Kishon Vijay Abraham I, Sandeep Nair,
	Kenneth Heitke, Srinivas Pandruvada, Santosh Shilimkar,
	Andreas Noever, Lucas Stach, Philipp Zabel



>-----Original Message-----
>From: Jerome Glisse [mailto:j.glisse@gmail.com]
>Sent: Friday, July 11, 2014 1:04 PM
>To: Oded Gabbay
>Cc: David Airlie; Deucher, Alexander; linux-kernel@vger.kernel.org; dri-
>devel@lists.freedesktop.org; Bridgman, John; Lewycky, Andrew; Joerg
>Roedel; Gabbay, Oded; Greg Kroah-Hartman; Rafael J. Wysocki; Kishon Vijay
>Abraham I; Sandeep Nair; Kenneth Heitke; Srinivas Pandruvada; Santosh
>Shilimkar; Andreas Noever; Lucas Stach; Philipp Zabel
>Subject: Re: [PATCH 09/83] hsa/radeon: Add code base of hsa driver for
>AMD's GPUs
>
>On Fri, Jul 11, 2014 at 12:50:09AM +0300, Oded Gabbay wrote:
>> This patch adds the code base of the hsa driver for
>> AMD's GPUs.
>>
>> This driver is called kfd.
>>
>> This initial version supports the first HSA chip, Kaveri.
>>
>> This driver is located in a new directory structure under drivers/gpu.
>>
>> Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
>
>There is too coding style issues. While we have been lax on the enforcing the
>scripts/checkpatch.pl rules i think there is a limit to that. I am not strict
>on the 80chars per line but others things needs fixing so we stay inline.
>
>Also i am a bit worried about the license, given top comment in each of the
>files i am not sure this is GPL2 compatible. I would need to ask lawyer to
>review that.
>

Hi Jerome,

Which line in the license are you concerned about? In theory we're using the same license as the initial code pushes for radeon, and I just did a side-by-side comparison with the license header on cik.c in the radeon tree and confirmed that the two licenses are identical.

The cik.c header has an additional "Authors:" line which the kfd files do not, but AFAIK that is not part of the license text proper.

JB

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 09/83] hsa/radeon: Add code base of hsa driver for AMD's GPUs
  2014-07-11 18:02       ` Bridgman, John
@ 2014-07-11 18:10         ` Jerome Glisse
  -1 siblings, 0 replies; 116+ messages in thread
From: Jerome Glisse @ 2014-07-11 18:10 UTC (permalink / raw)
  To: Bridgman, John
  Cc: Oded Gabbay, David Airlie, Deucher, Alexander, linux-kernel,
	dri-devel, Lewycky, Andrew, Joerg Roedel, Gabbay, Oded,
	Greg Kroah-Hartman, Rafael J. Wysocki, Kishon Vijay Abraham I,
	Sandeep Nair, Kenneth Heitke, Srinivas Pandruvada,
	Santosh Shilimkar, Andreas Noever, Lucas Stach, Philipp Zabel

On Fri, Jul 11, 2014 at 06:02:39PM +0000, Bridgman, John wrote:
> >From: Jerome Glisse [mailto:j.glisse@gmail.com]
> >Sent: Friday, July 11, 2014 1:04 PM
> >To: Oded Gabbay
> >Cc: David Airlie; Deucher, Alexander; linux-kernel@vger.kernel.org; dri-
> >devel@lists.freedesktop.org; Bridgman, John; Lewycky, Andrew; Joerg
> >Roedel; Gabbay, Oded; Greg Kroah-Hartman; Rafael J. Wysocki; Kishon Vijay
> >Abraham I; Sandeep Nair; Kenneth Heitke; Srinivas Pandruvada; Santosh
> >Shilimkar; Andreas Noever; Lucas Stach; Philipp Zabel
> >Subject: Re: [PATCH 09/83] hsa/radeon: Add code base of hsa driver for
> >AMD's GPUs
> >
> >On Fri, Jul 11, 2014 at 12:50:09AM +0300, Oded Gabbay wrote:
> >> This patch adds the code base of the hsa driver for
> >> AMD's GPUs.
> >>
> >> This driver is called kfd.
> >>
> >> This initial version supports the first HSA chip, Kaveri.
> >>
> >> This driver is located in a new directory structure under drivers/gpu.
> >>
> >> Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
> >
> >There is too coding style issues. While we have been lax on the enforcing the
> >scripts/checkpatch.pl rules i think there is a limit to that. I am not strict
> >on the 80chars per line but others things needs fixing so we stay inline.
> >
> >Also i am a bit worried about the license, given top comment in each of the
> >files i am not sure this is GPL2 compatible. I would need to ask lawyer to
> >review that.
> >
> 
> Hi Jerome,
> 
> Which line in the license are you concerned about ? In theory we're using the same license as the initial code pushes for radeon, and I just did a side-by side compare with the license header on cik.c in the radeon tree and confirmed that the two licenses are identical. 
> 
> The cik.c header has an additional "Authors:" line which the kfd files do not, but AFAIK that is not part of the license text proper.
> 

You cannot claim GPL if you want to use this license. radeon is a weird
beast for historical reasons: we wanted to share code with BSD, thus it
is dual licensed, and this is reflected with:
MODULE_LICENSE("GPL and additional rights");

inside radeon_drv.c

So if you want to have MODULE_LICENSE("GPL") then you should have a header
that uses the GPL license wording and no wording from a BSD-like license.
Otherwise change the MODULE_LICENSE, and it would also be good to say
"dual licensed" at the top of each file (or at least next to each license)
so that it is clear this is a BSD & GPL license.
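
For illustration, this is roughly how the kernel decides whether a
MODULE_LICENSE string counts as GPL-compatible. It is a userspace
sketch written from memory of license_is_gpl_compatible() in
include/linux/license.h, so please check the exact list against your
tree; the assert() is my addition and is not in the kernel helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Userspace mirror of the kernel's license string check. The accepted
 * strings are reproduced from memory and should be verified against
 * include/linux/license.h.
 */
static bool license_is_gpl_compatible(const char *license)
{
	assert(license != NULL);
	return strcmp(license, "GPL") == 0 ||
	       strcmp(license, "GPL v2") == 0 ||
	       strcmp(license, "GPL and additional rights") == 0 ||
	       strcmp(license, "Dual BSD/GPL") == 0 ||
	       strcmp(license, "Dual MIT/GPL") == 0 ||
	       strcmp(license, "Dual MPL/GPL") == 0;
}
```

The comparison is an exact string match, so the string has to be one
of these verbatim. With a permissive header like the one in these
files, one of the "additional rights" or "Dual" variants would
describe the code more accurately than plain "GPL".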

Cheers,
Jérôme

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 11/83] hsa/radeon: Add scheduler code
  2014-07-10 21:50 ` [PATCH 11/83] hsa/radeon: Add scheduler code Oded Gabbay
@ 2014-07-11 18:25     ` Jerome Glisse
  0 siblings, 0 replies; 116+ messages in thread
From: Jerome Glisse @ 2014-07-11 18:25 UTC (permalink / raw)
  To: Oded Gabbay
  Cc: David Airlie, Alex Deucher, linux-kernel, dri-devel,
	John Bridgman, Andrew Lewycky, Joerg Roedel, Oded Gabbay

On Fri, Jul 11, 2014 at 12:50:11AM +0300, Oded Gabbay wrote:
> This patch adds the code base of the scheduler, which handles queue
> creation, deletion and scheduling on the CP of the GPU.
> 
> Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>

I would rather see all this squashed; this gives the feeling that the
driver can access registers, which is later removed. I know juggling
with patch squashing can be daunting, but it really makes reviewing hard
here because I have to jump back and forth to see if the thing I am
looking at really matters in the final version.

Cheers,
Jérôme

> ---
>  drivers/gpu/hsa/radeon/Makefile               |   3 +-
>  drivers/gpu/hsa/radeon/cik_regs.h             | 213 +++++++
>  drivers/gpu/hsa/radeon/kfd_device.c           |   1 +
>  drivers/gpu/hsa/radeon/kfd_registers.c        |  50 ++
>  drivers/gpu/hsa/radeon/kfd_sched_cik_static.c | 800 ++++++++++++++++++++++++++
>  drivers/gpu/hsa/radeon/kfd_vidmem.c           |  61 ++
>  6 files changed, 1127 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/gpu/hsa/radeon/cik_regs.h
>  create mode 100644 drivers/gpu/hsa/radeon/kfd_registers.c
>  create mode 100644 drivers/gpu/hsa/radeon/kfd_sched_cik_static.c
>  create mode 100644 drivers/gpu/hsa/radeon/kfd_vidmem.c
> 
> diff --git a/drivers/gpu/hsa/radeon/Makefile b/drivers/gpu/hsa/radeon/Makefile
> index 989518a..28da10c 100644
> --- a/drivers/gpu/hsa/radeon/Makefile
> +++ b/drivers/gpu/hsa/radeon/Makefile
> @@ -4,6 +4,7 @@
>  
>  radeon_kfd-y	:= kfd_module.o kfd_device.o kfd_chardev.o \
>  		kfd_pasid.o kfd_topology.o kfd_process.o \
> -		kfd_doorbell.o
> +		kfd_doorbell.o kfd_sched_cik_static.o kfd_registers.o \
> +		kfd_vidmem.o
>  
>  obj-$(CONFIG_HSA_RADEON)	+= radeon_kfd.o
> diff --git a/drivers/gpu/hsa/radeon/cik_regs.h b/drivers/gpu/hsa/radeon/cik_regs.h
> new file mode 100644
> index 0000000..d0cdc57
> --- /dev/null
> +++ b/drivers/gpu/hsa/radeon/cik_regs.h
> @@ -0,0 +1,213 @@
> +/*
> + * Copyright 2014 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + */
> +
> +#ifndef CIK_REGS_H
> +#define CIK_REGS_H
> +
> +#define BIF_DOORBELL_CNTL				0x530Cu
> +
> +#define	SRBM_GFX_CNTL					0xE44
> +#define	PIPEID(x)					((x) << 0)
> +#define	MEID(x)						((x) << 2)
> +#define	VMID(x)						((x) << 4)
> +#define	QUEUEID(x)					((x) << 8)
> +
> +#define	SQ_CONFIG					0x8C00
> +
> +#define	SH_MEM_BASES					0x8C28
> +/* if PTR32, these are the bases for scratch and lds */
> +#define	PRIVATE_BASE(x)					((x) << 0) /* scratch */
> +#define	SHARED_BASE(x)					((x) << 16) /* LDS */
> +#define	SH_MEM_APE1_BASE				0x8C2C
> +/* if PTR32, this is the base location of GPUVM */
> +#define	SH_MEM_APE1_LIMIT				0x8C30
> +/* if PTR32, this is the upper limit of GPUVM */
> +#define	SH_MEM_CONFIG					0x8C34
> +#define	PTR32						(1 << 0)
> +#define	ALIGNMENT_MODE(x)				((x) << 2)
> +#define	SH_MEM_ALIGNMENT_MODE_DWORD			0
> +#define	SH_MEM_ALIGNMENT_MODE_DWORD_STRICT		1
> +#define	SH_MEM_ALIGNMENT_MODE_STRICT			2
> +#define	SH_MEM_ALIGNMENT_MODE_UNALIGNED			3
> +#define	DEFAULT_MTYPE(x)				((x) << 4)
> +#define	APE1_MTYPE(x)					((x) << 7)
> +
> +/* valid for both DEFAULT_MTYPE and APE1_MTYPE */
> +#define	MTYPE_NONCACHED					3
> +
> +
> +#define SH_STATIC_MEM_CONFIG				0x9604u
> +
> +#define	TC_CFG_L1_LOAD_POLICY0				0xAC68
> +#define	TC_CFG_L1_LOAD_POLICY1				0xAC6C
> +#define	TC_CFG_L1_STORE_POLICY				0xAC70
> +#define	TC_CFG_L2_LOAD_POLICY0				0xAC74
> +#define	TC_CFG_L2_LOAD_POLICY1				0xAC78
> +#define	TC_CFG_L2_STORE_POLICY0				0xAC7C
> +#define	TC_CFG_L2_STORE_POLICY1				0xAC80
> +#define	TC_CFG_L2_ATOMIC_POLICY				0xAC84
> +#define	TC_CFG_L1_VOLATILE				0xAC88
> +#define	TC_CFG_L2_VOLATILE				0xAC8C
> +
> +#define CP_PQ_WPTR_POLL_CNTL				0xC20C
> +#define	WPTR_POLL_EN					(1 << 31)
> +
> +#define CP_ME1_PIPE0_INT_CNTL				0xC214
> +#define CP_ME1_PIPE1_INT_CNTL				0xC218
> +#define CP_ME1_PIPE2_INT_CNTL				0xC21C
> +#define CP_ME1_PIPE3_INT_CNTL				0xC220
> +#define CP_ME2_PIPE0_INT_CNTL				0xC224
> +#define CP_ME2_PIPE1_INT_CNTL				0xC228
> +#define CP_ME2_PIPE2_INT_CNTL				0xC22C
> +#define CP_ME2_PIPE3_INT_CNTL				0xC230
> +#define DEQUEUE_REQUEST_INT_ENABLE			(1 << 13)
> +#define WRM_POLL_TIMEOUT_INT_ENABLE			(1 << 17)
> +#define PRIV_REG_INT_ENABLE				(1 << 23)
> +#define TIME_STAMP_INT_ENABLE				(1 << 26)
> +#define GENERIC2_INT_ENABLE				(1 << 29)
> +#define GENERIC1_INT_ENABLE				(1 << 30)
> +#define GENERIC0_INT_ENABLE				(1 << 31)
> +#define CP_ME1_PIPE0_INT_STATUS				0xC214
> +#define CP_ME1_PIPE1_INT_STATUS				0xC218
> +#define CP_ME1_PIPE2_INT_STATUS				0xC21C
> +#define CP_ME1_PIPE3_INT_STATUS				0xC220
> +#define CP_ME2_PIPE0_INT_STATUS				0xC224
> +#define CP_ME2_PIPE1_INT_STATUS				0xC228
> +#define CP_ME2_PIPE2_INT_STATUS				0xC22C
> +#define CP_ME2_PIPE3_INT_STATUS				0xC230
> +#define DEQUEUE_REQUEST_INT_STATUS			(1 << 13)
> +#define WRM_POLL_TIMEOUT_INT_STATUS			(1 << 17)
> +#define PRIV_REG_INT_STATUS				(1 << 23)
> +#define TIME_STAMP_INT_STATUS				(1 << 26)
> +#define GENERIC2_INT_STATUS				(1 << 29)
> +#define GENERIC1_INT_STATUS				(1 << 30)
> +#define GENERIC0_INT_STATUS				(1 << 31)
> +
> +#define CP_HPD_EOP_BASE_ADDR				0xC904
> +#define CP_HPD_EOP_BASE_ADDR_HI				0xC908
> +#define CP_HPD_EOP_VMID					0xC90C
> +#define CP_HPD_EOP_CONTROL				0xC910
> +#define	EOP_SIZE(x)					((x) << 0)
> +#define	EOP_SIZE_MASK					(0x3f << 0)
> +#define CP_MQD_BASE_ADDR				0xC914
> +#define CP_MQD_BASE_ADDR_HI				0xC918
> +#define CP_HQD_ACTIVE					0xC91C
> +#define CP_HQD_VMID					0xC920
> +
> +#define CP_HQD_PERSISTENT_STATE				0xC924u
> +#define	DEFAULT_CP_HQD_PERSISTENT_STATE			(0x33U << 8)
> +
> +#define CP_HQD_PIPE_PRIORITY				0xC928u
> +#define CP_HQD_QUEUE_PRIORITY				0xC92Cu
> +#define CP_HQD_QUANTUM					0xC930u
> +#define	QUANTUM_EN					1U
> +#define	QUANTUM_SCALE_1MS				(1U << 4)
> +#define	QUANTUM_DURATION(x)				((x) << 8)
> +
> +#define CP_HQD_PQ_BASE					0xC934
> +#define CP_HQD_PQ_BASE_HI				0xC938
> +#define CP_HQD_PQ_RPTR					0xC93C
> +#define CP_HQD_PQ_RPTR_REPORT_ADDR			0xC940
> +#define CP_HQD_PQ_RPTR_REPORT_ADDR_HI			0xC944
> +#define CP_HQD_PQ_WPTR_POLL_ADDR			0xC948
> +#define CP_HQD_PQ_WPTR_POLL_ADDR_HI			0xC94C
> +#define CP_HQD_PQ_DOORBELL_CONTROL			0xC950
> +#define	DOORBELL_OFFSET(x)				((x) << 2)
> +#define	DOORBELL_OFFSET_MASK				(0x1fffff << 2)
> +#define	DOORBELL_SOURCE					(1 << 28)
> +#define	DOORBELL_SCHD_HIT				(1 << 29)
> +#define	DOORBELL_EN					(1 << 30)
> +#define	DOORBELL_HIT					(1 << 31)
> +#define CP_HQD_PQ_WPTR					0xC954
> +#define CP_HQD_PQ_CONTROL				0xC958
> +#define	QUEUE_SIZE(x)					((x) << 0)
> +#define	QUEUE_SIZE_MASK					(0x3f << 0)
> +#define	RPTR_BLOCK_SIZE(x)				((x) << 8)
> +#define	RPTR_BLOCK_SIZE_MASK				(0x3f << 8)
> +#define	MIN_AVAIL_SIZE(x)				((x) << 20)
> +#define	PQ_ATC_EN					(1 << 23)
> +#define	PQ_VOLATILE					(1 << 26)
> +#define	NO_UPDATE_RPTR					(1 << 27)
> +#define	UNORD_DISPATCH					(1 << 28)
> +#define	ROQ_PQ_IB_FLIP					(1 << 29)
> +#define	PRIV_STATE					(1 << 30)
> +#define	KMD_QUEUE					(1 << 31)
> +
> +#define	DEFAULT_RPTR_BLOCK_SIZE				RPTR_BLOCK_SIZE(5)
> +#define	DEFAULT_MIN_AVAIL_SIZE				MIN_AVAIL_SIZE(3)
> +
> +#define CP_HQD_IB_BASE_ADDR				0xC95Cu
> +#define CP_HQD_IB_BASE_ADDR_HI				0xC960u
> +#define CP_HQD_IB_RPTR					0xC964u
> +#define CP_HQD_IB_CONTROL				0xC968u
> +#define	IB_ATC_EN					(1U << 23)
> +#define	DEFAULT_MIN_IB_AVAIL_SIZE			(3U << 20)
> +
> +#define CP_HQD_DEQUEUE_REQUEST				0xC974
> +#define	DEQUEUE_REQUEST_DRAIN				1
> +
> +#define CP_HQD_SEMA_CMD					0xC97Cu
> +#define CP_HQD_MSG_TYPE					0xC980u
> +#define CP_HQD_ATOMIC0_PREOP_LO				0xC984u
> +#define CP_HQD_ATOMIC0_PREOP_HI				0xC988u
> +#define CP_HQD_ATOMIC1_PREOP_LO				0xC98Cu
> +#define CP_HQD_ATOMIC1_PREOP_HI				0xC990u
> +#define CP_HQD_HQ_SCHEDULER0				0xC994u
> +#define CP_HQD_HQ_SCHEDULER1				0xC998u
> +
> +
> +#define CP_MQD_CONTROL					0xC99C
> +#define	MQD_VMID(x)					((x) << 0)
> +#define	MQD_VMID_MASK					(0xf << 0)
> +#define	MQD_CONTROL_PRIV_STATE_EN			(1U << 8)
> +
> +#define GRBM_GFX_INDEX					0x30800
> +#define	INSTANCE_INDEX(x)				((x) << 0)
> +#define	SH_INDEX(x)					((x) << 8)
> +#define	SE_INDEX(x)					((x) << 16)
> +#define	SH_BROADCAST_WRITES				(1 << 29)
> +#define	INSTANCE_BROADCAST_WRITES			(1 << 30)
> +#define	SE_BROADCAST_WRITES				(1 << 31)
> +
> +#define SQC_CACHES					0x30d20
> +#define SQC_POLICY					0x8C38u
> +#define SQC_VOLATILE					0x8C3Cu
> +
> +#define CP_PERFMON_CNTL					0x36020
> +
> +#define ATC_VMID0_PASID_MAPPING				0x339Cu
> +#define	ATC_VMID_PASID_MAPPING_UPDATE_STATUS		0x3398u
> +#define	ATC_VMID_PASID_MAPPING_VALID			(1U << 31)
> +
> +#define ATC_VM_APERTURE0_CNTL				0x3310u
> +#define	ATS_ACCESS_MODE_NEVER				0
> +#define	ATS_ACCESS_MODE_ALWAYS				1
> +
> +#define ATC_VM_APERTURE0_CNTL2				0x3318u
> +#define ATC_VM_APERTURE0_HIGH_ADDR			0x3308u
> +#define ATC_VM_APERTURE0_LOW_ADDR			0x3300u
> +#define ATC_VM_APERTURE1_CNTL				0x3314u
> +#define ATC_VM_APERTURE1_CNTL2				0x331Cu
> +#define ATC_VM_APERTURE1_HIGH_ADDR			0x330Cu
> +#define ATC_VM_APERTURE1_LOW_ADDR			0x3304u
> +
> +#endif
> diff --git a/drivers/gpu/hsa/radeon/kfd_device.c b/drivers/gpu/hsa/radeon/kfd_device.c
> index 4e9fe6c..465c822 100644
> --- a/drivers/gpu/hsa/radeon/kfd_device.c
> +++ b/drivers/gpu/hsa/radeon/kfd_device.c
> @@ -28,6 +28,7 @@
>  #include "kfd_scheduler.h"
>  
>  static const struct kfd_device_info bonaire_device_info = {
> +	.scheduler_class = &radeon_kfd_cik_static_scheduler_class,
>  	.max_pasid_bits = 16,
>  };
>  
> diff --git a/drivers/gpu/hsa/radeon/kfd_registers.c b/drivers/gpu/hsa/radeon/kfd_registers.c
> new file mode 100644
> index 0000000..223debd
> --- /dev/null
> +++ b/drivers/gpu/hsa/radeon/kfd_registers.c
> @@ -0,0 +1,50 @@
> +/*
> + * Copyright 2014 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + */
> +
> +#include <linux/io.h>
> +#include "kfd_priv.h"
> +
> +/* In KFD, "reg" is the byte offset of the register. */
> +static void __iomem *reg_address(struct kfd_dev *dev, uint32_t reg)
> +{
> +	return dev->regs + reg;
> +}
> +
> +void radeon_kfd_write_reg(struct kfd_dev *dev, uint32_t reg, uint32_t value)
> +{
> +	writel(value, reg_address(dev, reg));
> +}
> +
> +uint32_t radeon_kfd_read_reg(struct kfd_dev *dev, uint32_t reg)
> +{
> +	return readl(reg_address(dev, reg));
> +}
> +
> +void radeon_kfd_lock_srbm_index(struct kfd_dev *dev)
> +{
> +	kfd2kgd->lock_srbm_gfx_cntl(dev->kgd);
> +}
> +
> +void radeon_kfd_unlock_srbm_index(struct kfd_dev *dev)
> +{
> +	kfd2kgd->unlock_srbm_gfx_cntl(dev->kgd);
> +}
> diff --git a/drivers/gpu/hsa/radeon/kfd_sched_cik_static.c b/drivers/gpu/hsa/radeon/kfd_sched_cik_static.c
> new file mode 100644
> index 0000000..b986ff9
> --- /dev/null
> +++ b/drivers/gpu/hsa/radeon/kfd_sched_cik_static.c
> @@ -0,0 +1,800 @@
> +/*
> + * Copyright 2014 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + */
> +
> +#include <linux/log2.h>
> +#include <linux/mutex.h>
> +#include <linux/slab.h>
> +#include <linux/types.h>
> +#include <linux/uaccess.h>
> +#include "kfd_priv.h"
> +#include "kfd_scheduler.h"
> +#include "cik_regs.h"
> +
> +/* CIK CP hardware is arranged with 8 queues per pipe and 4 pipes per MEC (microengine for compute).
> + * The first MEC is ME 1 with the GFX ME as ME 0.
> + * We split the CP with the KGD, they take the first N pipes and we take the rest.
> + */
> +#define CIK_QUEUES_PER_PIPE 8
> +#define CIK_PIPES_PER_MEC 4
> +
> +#define CIK_MAX_PIPES (2 * CIK_PIPES_PER_MEC)
> +
> +#define CIK_NUM_VMID 16
> +
> +#define CIK_HPD_SIZE_LOG2 11
> +#define CIK_HPD_SIZE (1U << CIK_HPD_SIZE_LOG2)
> +#define CIK_HPD_ALIGNMENT 256
> +#define CIK_MQD_ALIGNMENT 4
> +
> +#pragma pack(push, 4)
> +
> +struct cik_hqd_registers {
> +	u32 cp_mqd_base_addr;
> +	u32 cp_mqd_base_addr_hi;
> +	u32 cp_hqd_active;
> +	u32 cp_hqd_vmid;
> +	u32 cp_hqd_persistent_state;
> +	u32 cp_hqd_pipe_priority;
> +	u32 cp_hqd_queue_priority;
> +	u32 cp_hqd_quantum;
> +	u32 cp_hqd_pq_base;
> +	u32 cp_hqd_pq_base_hi;
> +	u32 cp_hqd_pq_rptr;
> +	u32 cp_hqd_pq_rptr_report_addr;
> +	u32 cp_hqd_pq_rptr_report_addr_hi;
> +	u32 cp_hqd_pq_wptr_poll_addr;
> +	u32 cp_hqd_pq_wptr_poll_addr_hi;
> +	u32 cp_hqd_pq_doorbell_control;
> +	u32 cp_hqd_pq_wptr;
> +	u32 cp_hqd_pq_control;
> +	u32 cp_hqd_ib_base_addr;
> +	u32 cp_hqd_ib_base_addr_hi;
> +	u32 cp_hqd_ib_rptr;
> +	u32 cp_hqd_ib_control;
> +	u32 cp_hqd_iq_timer;
> +	u32 cp_hqd_iq_rptr;
> +	u32 cp_hqd_dequeue_request;
> +	u32 cp_hqd_dma_offload;
> +	u32 cp_hqd_sema_cmd;
> +	u32 cp_hqd_msg_type;
> +	u32 cp_hqd_atomic0_preop_lo;
> +	u32 cp_hqd_atomic0_preop_hi;
> +	u32 cp_hqd_atomic1_preop_lo;
> +	u32 cp_hqd_atomic1_preop_hi;
> +	u32 cp_hqd_hq_scheduler0;
> +	u32 cp_hqd_hq_scheduler1;
> +	u32 cp_mqd_control;
> +};
> +
> +struct cik_mqd {
> +	u32 header;
> +	u32 dispatch_initiator;
> +	u32 dimensions[3];
> +	u32 start_idx[3];
> +	u32 num_threads[3];
> +	u32 pipeline_stat_enable;
> +	u32 perf_counter_enable;
> +	u32 pgm[2];
> +	u32 tba[2];
> +	u32 tma[2];
> +	u32 pgm_rsrc[2];
> +	u32 vmid;
> +	u32 resource_limits;
> +	u32 static_thread_mgmt01[2];
> +	u32 tmp_ring_size;
> +	u32 static_thread_mgmt23[2];
> +	u32 restart[3];
> +	u32 thread_trace_enable;
> +	u32 reserved1;
> +	u32 user_data[16];
> +	u32 vgtcs_invoke_count[2];
> +	struct cik_hqd_registers queue_state;
> +	u32 dequeue_cntr;
> +	u32 interrupt_queue[64];
> +};
> +
> +struct cik_mqd_padded {
> +	struct cik_mqd mqd;
> +	u8 padding[1024 - sizeof(struct cik_mqd)]; /* Pad MQD out to 1KB. (HW requires 4-byte alignment.) */
> +};
> +
> +#pragma pack(pop)
> +
> +struct cik_static_private {
> +	struct kfd_dev *dev;
> +
> +	struct mutex mutex;
> +
> +	unsigned int first_pipe;
> +	unsigned int num_pipes;
> +
> +	unsigned long free_vmid_mask; /* unsigned long to make set/clear_bit happy */
> +
> +	/* Everything below here is offset by first_pipe. E.g. bit 0 in
> +	 * free_queues is queue 0 in pipe first_pipe
> +	 */
> +
> +	 /* Queue q on pipe p is at bit QUEUES_PER_PIPE * p + q. */
> +	unsigned long free_queues[DIV_ROUND_UP(CIK_MAX_PIPES * CIK_QUEUES_PER_PIPE, BITS_PER_LONG)];
> +
> +	kfd_mem_obj hpd_mem;	/* Single allocation for HPDs for all KFD pipes. */
> +	kfd_mem_obj mqd_mem;	/* Single allocation for all MQDs for all KFD
> +				 * pipes. This is actually struct cik_mqd_padded. */
> +	uint64_t hpd_addr;	/* GPU address for hpd_mem. */
> +	uint64_t mqd_addr;	/* GPU address for mqd_mem. */
> +	 /*
> +	  * Pointer for mqd_mem.
> +	  * We keep this mapped because multiple processes may need to access it
> +	  * in parallel and this is simpler than controlling concurrent kmaps
> +	  */
> +	struct cik_mqd_padded *mqds;
> +};
> +
> +struct cik_static_process {
> +	unsigned int vmid;
> +	pasid_t pasid;
> +};
> +
> +struct cik_static_queue {
> +	unsigned int queue; /* + first_pipe * QUEUES_PER_PIPE */
> +
> +	uint64_t mqd_addr;
> +	struct cik_mqd *mqd;
> +
> +	void __user *pq_addr;
> +	void __user *rptr_address;
> +	doorbell_t __user *wptr_address;
> +	uint32_t doorbell_index;
> +
> +	uint32_t queue_size_encoded; /* CP_HQD_PQ_CONTROL.QUEUE_SIZE takes the queue size as log2(size) - 3. */
> +};
> +
> +static uint32_t lower_32(uint64_t x)
> +{
> +	return (uint32_t)x;
> +}
> +
> +static uint32_t upper_32(uint64_t x)
> +{
> +	return (uint32_t)(x >> 32);
> +}
> +
> +/* SRBM_GFX_CNTL provides the MEC/pipe/queue and vmid for many registers that are instanced.
> + * In particular, CP_HQD_* and CP_MQD_* are instanced for each queue. CP_HPD_* are instanced for each pipe.
> + * SH_MEM_* are instanced per-VMID.
> + *
> + * We provide queue_select, pipe_select and vmid_select helpers that should be used before accessing
> + * registers from those groups. Note that these overwrite each other, e.g. after vmid_select the current
> + * selected MEC/pipe/queue is undefined.
> + *
> + * SRBM_GFX_CNTL and the registers it indexes are shared with KGD. You must be holding the srbm_gfx_cntl
> + * lock via lock_srbm_index before setting SRBM_GFX_CNTL or accessing any of the instanced registers.
> + */
> +static uint32_t make_srbm_gfx_cntl_mpqv(unsigned int me, unsigned int pipe, unsigned int queue, unsigned int vmid)
> +{
> +	return QUEUEID(queue) | VMID(vmid) | MEID(me) | PIPEID(pipe);
> +}
> +
> +static void pipe_select(struct cik_static_private *priv, unsigned int pipe)
> +{
> +	unsigned int pipe_in_mec = (pipe + priv->first_pipe) % CIK_PIPES_PER_MEC;
> +	unsigned int mec = (pipe + priv->first_pipe) / CIK_PIPES_PER_MEC;
> +
> +	WRITE_REG(priv->dev, SRBM_GFX_CNTL, make_srbm_gfx_cntl_mpqv(mec+1, pipe_in_mec, 0, 0));
> +}
> +
> +static void queue_select(struct cik_static_private *priv, unsigned int queue)
> +{
> +	unsigned int queue_in_pipe = queue % CIK_QUEUES_PER_PIPE;
> +	unsigned int pipe = queue / CIK_QUEUES_PER_PIPE + priv->first_pipe;
> +	unsigned int pipe_in_mec = pipe % CIK_PIPES_PER_MEC;
> +	unsigned int mec = pipe / CIK_PIPES_PER_MEC;
> +
> +#if 0
> +	dev_err(radeon_kfd_chardev(), "queue select %d = %u/%u/%u = 0x%08x\n", queue, mec+1, pipe_in_mec, queue_in_pipe,
> +		make_srbm_gfx_cntl_mpqv(mec+1, pipe_in_mec, queue_in_pipe, 0));
> +#endif
> +
> +	WRITE_REG(priv->dev, SRBM_GFX_CNTL, make_srbm_gfx_cntl_mpqv(mec+1, pipe_in_mec, queue_in_pipe, 0));
> +}
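
For anyone following the index arithmetic in pipe_select() and queue_select(), here is a minimal standalone sketch of how a KFD-relative queue index is split into ME, pipe and queue, and then packed into an SRBM_GFX_CNTL value. The shift amounts are taken from the PIPEID/MEID/VMID/QUEUEID macros in cik_regs.h in this patch; the names srbm_target, decode_queue and pack_srbm_gfx_cntl are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

#define QUEUES_PER_PIPE 8u
#define PIPES_PER_MEC   4u

/* Hypothetical helper type: where a KFD-relative queue index lands in hardware. */
struct srbm_target {
	unsigned int me;     /* 1-based: ME 0 is GFX, the first MEC is ME 1 */
	unsigned int pipe;   /* pipe within the MEC */
	unsigned int queue;  /* queue within the pipe */
};

/* Same arithmetic as queue_select(): offset by first_pipe, then split. */
static struct srbm_target decode_queue(unsigned int queue, unsigned int first_pipe)
{
	unsigned int pipe = queue / QUEUES_PER_PIPE + first_pipe;
	struct srbm_target t = {
		.me = pipe / PIPES_PER_MEC + 1,
		.pipe = pipe % PIPES_PER_MEC,
		.queue = queue % QUEUES_PER_PIPE,
	};
	return t;
}

/* Field positions follow PIPEID (bit 0), MEID (bit 2), VMID (bit 4), QUEUEID (bit 8). */
static uint32_t pack_srbm_gfx_cntl(struct srbm_target t, unsigned int vmid)
{
	return ((uint32_t)t.pipe << 0) | ((uint32_t)t.me << 2) |
	       ((uint32_t)vmid << 4) | ((uint32_t)t.queue << 8);
}
```

With first_pipe = 1, KFD queue 29 resolves to ME 2, pipe 0, queue 5, which packs to 0x508 when the VMID field is 0.
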
> +
> +static void vmid_select(struct cik_static_private *priv, unsigned int vmid)
> +{
> +	WRITE_REG(priv->dev, SRBM_GFX_CNTL, make_srbm_gfx_cntl_mpqv(0, 0, 0, vmid));
> +}
> +
> +static void lock_srbm_index(struct cik_static_private *priv)
> +{
> +	radeon_kfd_lock_srbm_index(priv->dev);
> +}
> +
> +static void unlock_srbm_index(struct cik_static_private *priv)
> +{
> +	WRITE_REG(priv->dev, SRBM_GFX_CNTL, 0);	/* Be nice to KGD, reset indexed CP registers to the GFX pipe. */
> +	radeon_kfd_unlock_srbm_index(priv->dev);
> +}
> +
> +/* One-time setup for all compute pipes. They need to be programmed with the address & size of the HPD EOP buffer. */
> +static void init_pipes(struct cik_static_private *priv)
> +{
> +	unsigned int i;
> +
> +	lock_srbm_index(priv);
> +
> +	for (i = 0; i < priv->num_pipes; i++) {
> +		uint64_t pipe_hpd_addr = priv->hpd_addr + i * CIK_HPD_SIZE;
> +
> +		pipe_select(priv, i);
> +
> +		WRITE_REG(priv->dev, CP_HPD_EOP_BASE_ADDR, lower_32(pipe_hpd_addr >> 8));
> +		WRITE_REG(priv->dev, CP_HPD_EOP_BASE_ADDR_HI, upper_32(pipe_hpd_addr >> 8));
> +		WRITE_REG(priv->dev, CP_HPD_EOP_VMID, 0);
> +		WRITE_REG(priv->dev, CP_HPD_EOP_CONTROL, CIK_HPD_SIZE_LOG2 - 1);
> +	}
> +
> +	unlock_srbm_index(priv);
> +}
> +
> +/* Program the VMID -> PASID mapping for one VMID.
> + * PASID 0 is special: it means to associate no PASID with that VMID.
> + * This function waits for the VMID/PASID mapping to complete.
> + */
> +static void set_vmid_pasid_mapping(struct cik_static_private *priv, unsigned int vmid, pasid_t pasid)
> +{
> +	/* We have to assume that there is no outstanding mapping.
> +	 * The ATC_VMID_PASID_MAPPING_UPDATE_STATUS bit could be 0 because a mapping
> +	 * is in progress or because a mapping finished and the SW cleared it.
> +	 * So the protocol is to always wait & clear.
> +	 */
> +
> +	uint32_t pasid_mapping = (pasid == 0) ? 0 : (uint32_t)pasid | ATC_VMID_PASID_MAPPING_VALID;
> +
> +	WRITE_REG(priv->dev, ATC_VMID0_PASID_MAPPING + vmid*sizeof(uint32_t), pasid_mapping);
> +
> +	while (!(READ_REG(priv->dev, ATC_VMID_PASID_MAPPING_UPDATE_STATUS) & (1U << vmid)))
> +		cpu_relax();
> +	WRITE_REG(priv->dev, ATC_VMID_PASID_MAPPING_UPDATE_STATUS, 1U << vmid);
> +}
> +
> +static uint32_t compute_sh_mem_bases_64bit(unsigned int top_address_nybble)
> +{
> +	/* In 64-bit mode, we can only control the top 3 bits of the LDS, scratch and GPUVM apertures.
> +	 * The hardware fills in the remaining 61 bits according to the following pattern:
> +	 * LDS:		X0000000'00000000 - X0000001'00000000 (4GB)
> +	 * Scratch:	X0000001'00000000 - X0000002'00000000 (4GB)
> +	 * GPUVM:	Y0010000'00000000 - Y0020000'00000000 (1TB)
> +	 *
> +	 * (where X/Y is the configurable nybble with the low-bit 0)
> +	 *
> +	 * LDS and scratch will have the same top nybble programmed in the top 3 bits of SH_MEM_BASES.PRIVATE_BASE.
> +	 * GPUVM can have a different top nybble programmed in the top 3 bits of SH_MEM_BASES.SHARED_BASE.
> +	 * We don't bother to support different top nybbles for LDS/Scratch and GPUVM.
> +	 */
> +
> +	BUG_ON((top_address_nybble & 1) || top_address_nybble > 0xE);
> +
> +	return PRIVATE_BASE(top_address_nybble << 12) | SHARED_BASE(top_address_nybble << 12);
> +}
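
As a worked example of the aperture comment above, the sketch below computes the SH_MEM_BASES value and the aperture base addresses for a given top nybble. It assumes the field layout from cik_regs.h in this patch (PRIVATE_BASE at bit 0, SHARED_BASE at bit 16) and the address pattern stated in the comment; the function names are hypothetical, and nothing here was checked against hardware.

```c
#include <stdint.h>

/* Register value: the nybble becomes the top 4 bits of each 16-bit field. */
static uint32_t sh_mem_bases_64bit(unsigned int nybble)
{
	uint32_t field = (uint32_t)nybble << 12;

	return field | (field << 16);   /* PRIVATE_BASE | SHARED_BASE */
}

/* Aperture bases following the pattern in the comment (assumed, not measured). */
static uint64_t lds_base(unsigned int nybble)
{
	return (uint64_t)nybble << 60;
}

static uint64_t scratch_base(unsigned int nybble)
{
	return lds_base(nybble) + (1ULL << 32);
}

static uint64_t gpuvm_base(unsigned int nybble)
{
	return lds_base(nybble) + (1ULL << 48);
}
```

For the nybble 6 used by init_ats(), this gives a register value of 0x60006000 and bases 0x60000000'00000000 (LDS), 0x60000001'00000000 (scratch) and 0x60010000'00000000 (GPUVM), matching the addresses listed in init_ats().
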
> +
> +/* Initial programming for all ATS registers.
> + * - enable ATS for all compute VMIDs
> + * - clear the VMID/PASID mapping for all compute VMIDS
> + * - program the shader core flat address settings:
> + * -- 64-bit mode
> + * -- unaligned access allowed
> + * -- noncached (this is the only CPU-coherent mode in CIK)
> + * -- APE 1 disabled
> + */
> +static void init_ats(struct cik_static_private *priv)
> +{
> +	unsigned int i;
> +
> +	/* Enable self-ringing doorbell recognition and direct the BIF to send
> +	 * untranslated writes to the IOMMU before comparing to the aperture. */
> +	WRITE_REG(priv->dev, BIF_DOORBELL_CNTL, 0);
> +
> +	WRITE_REG(priv->dev, ATC_VM_APERTURE0_CNTL, ATS_ACCESS_MODE_ALWAYS);
> +	WRITE_REG(priv->dev, ATC_VM_APERTURE0_CNTL2, priv->free_vmid_mask);
> +	WRITE_REG(priv->dev, ATC_VM_APERTURE0_LOW_ADDR, 0);
> +	WRITE_REG(priv->dev, ATC_VM_APERTURE0_HIGH_ADDR, 0);
> +
> +	WRITE_REG(priv->dev, ATC_VM_APERTURE1_CNTL, 0);
> +	WRITE_REG(priv->dev, ATC_VM_APERTURE1_CNTL2, 0);
> +	WRITE_REG(priv->dev, ATC_VM_APERTURE1_LOW_ADDR, 0);
> +	WRITE_REG(priv->dev, ATC_VM_APERTURE1_HIGH_ADDR, 0);
> +
> +	lock_srbm_index(priv);
> +
> +	for (i = 0; i < CIK_NUM_VMID; i++) {
> +		if (priv->free_vmid_mask & (1U << i)) {
> +			uint32_t sh_mem_config;
> +
> +			set_vmid_pasid_mapping(priv, i, 0);
> +
> +			vmid_select(priv, i);
> +
> +			sh_mem_config = ALIGNMENT_MODE(SH_MEM_ALIGNMENT_MODE_UNALIGNED);
> +			sh_mem_config |= DEFAULT_MTYPE(MTYPE_NONCACHED);
> +
> +			WRITE_REG(priv->dev, SH_MEM_CONFIG, sh_mem_config);
> +
> +			/* Configure apertures:
> +			 * LDS:		0x60000000'00000000 - 0x60000001'00000000 (4GB)
> +			 * Scratch:	0x60000001'00000000 - 0x60000002'00000000 (4GB)
> +			 * GPUVM:	0x60010000'00000000 - 0x60020000'00000000 (1TB)
> +			 */
> +			WRITE_REG(priv->dev, SH_MEM_BASES, compute_sh_mem_bases_64bit(6));
> +
> +			/* Scratch aperture is not supported for now. */
> +			WRITE_REG(priv->dev, SH_STATIC_MEM_CONFIG, 0);
> +
> +			/* APE1 disabled for now. */
> +			WRITE_REG(priv->dev, SH_MEM_APE1_BASE, 1);
> +			WRITE_REG(priv->dev, SH_MEM_APE1_LIMIT, 0);
> +		}
> +	}
> +
> +	unlock_srbm_index(priv);
> +}
> +
> +static void exit_ats(struct cik_static_private *priv)
> +{
> +	unsigned int i;
> +
> +	for (i = 0; i < CIK_NUM_VMID; i++)
> +		if (priv->free_vmid_mask & (1U << i))
> +			set_vmid_pasid_mapping(priv, i, 0);
> +
> +	WRITE_REG(priv->dev, ATC_VM_APERTURE0_CNTL, ATS_ACCESS_MODE_NEVER);
> +	WRITE_REG(priv->dev, ATC_VM_APERTURE0_CNTL2, 0);
> +}
> +
> +static struct cik_static_private *kfd_scheduler_to_private(struct kfd_scheduler *scheduler)
> +{
> +	return (struct cik_static_private *)scheduler;
> +}
> +
> +static struct cik_static_process *kfd_process_to_private(struct kfd_scheduler_process *process)
> +{
> +	return (struct cik_static_process *)process;
> +}
> +
> +static struct cik_static_queue *kfd_queue_to_private(struct kfd_scheduler_queue *queue)
> +{
> +	return (struct cik_static_queue *)queue;
> +}
> +
> +static int cik_static_create(struct kfd_dev *dev, struct kfd_scheduler **scheduler)
> +{
> +	struct cik_static_private *priv;
> +	unsigned int i;
> +	int err;
> +	void *hpdptr;
> +
> +	priv = kmalloc(sizeof(*priv), GFP_KERNEL);
> +	if (priv == NULL)
> +		return -ENOMEM;
> +
> +	mutex_init(&priv->mutex);
> +
> +	priv->dev = dev;
> +
> +	priv->first_pipe = dev->shared_resources.first_compute_pipe;
> +	priv->num_pipes = dev->shared_resources.compute_pipe_count;
> +
> +	for (i = 0; i < priv->num_pipes * CIK_QUEUES_PER_PIPE; i++)
> +		__set_bit(i, priv->free_queues);
> +
> +	priv->free_vmid_mask = dev->shared_resources.compute_vmid_bitmap;
> +
> +	/*
> +	 * Allocate memory for the HPDs. This is hardware-owned per-pipe data.
> +	 * The driver never accesses this memory after zeroing it. It doesn't even have
> +	 * to be saved/restored on suspend/resume because it contains no data when there
> +	 * are no active queues.
> +	 */
> +	err = radeon_kfd_vidmem_alloc(dev,
> +				      CIK_HPD_SIZE * priv->num_pipes * 2,
> +				      PAGE_SIZE,
> +				      KFD_MEMPOOL_SYSTEM_WRITECOMBINE,
> +				      &priv->hpd_mem);
> +	if (err)
> +		goto err_hpd_alloc;
> +
> +	err = radeon_kfd_vidmem_kmap(dev, priv->hpd_mem, &hpdptr);
> +	if (err)
> +		goto err_hpd_kmap;
> +	memset(hpdptr, 0, CIK_HPD_SIZE * priv->num_pipes);
> +	radeon_kfd_vidmem_unkmap(dev, priv->hpd_mem);
> +
> +	/*
> +	 * Allocate memory for all the MQDs.
> +	 * These are per-queue data that is hardware owned but with driver init.
> +	 * The driver has to copy this data into HQD registers when a
> +	 * pipe is (re)activated.
> +	 */
> +	err = radeon_kfd_vidmem_alloc(dev,
> +				      sizeof(struct cik_mqd_padded) * priv->num_pipes * CIK_QUEUES_PER_PIPE,
> +				      PAGE_SIZE,
> +				      KFD_MEMPOOL_SYSTEM_CACHEABLE,
> +				      &priv->mqd_mem);
> +	if (err)
> +		goto err_mqd_alloc;
> +	err = radeon_kfd_vidmem_kmap(dev, priv->mqd_mem, (void **)&priv->mqds);
> +	if (err)
> +		goto err_mqd_kmap;
> +
> +	*scheduler = (struct kfd_scheduler *)priv;
> +
> +	return 0;
> +
> +err_mqd_kmap:
> +	radeon_kfd_vidmem_free(dev, priv->mqd_mem);
> +err_mqd_alloc:
> +err_hpd_kmap:
> +	radeon_kfd_vidmem_free(dev, priv->hpd_mem);
> +err_hpd_alloc:
> +	mutex_destroy(&priv->mutex);
> +	kfree(priv);
> +	return err;
> +}
> +
> +static void cik_static_destroy(struct kfd_scheduler *scheduler)
> +{
> +	struct cik_static_private *priv = kfd_scheduler_to_private(scheduler);
> +
> +	radeon_kfd_vidmem_unkmap(priv->dev, priv->mqd_mem);
> +	radeon_kfd_vidmem_free(priv->dev, priv->mqd_mem);
> +	radeon_kfd_vidmem_free(priv->dev, priv->hpd_mem);
> +
> +	mutex_destroy(&priv->mutex);
> +
> +	kfree(priv);
> +}
> +
> +static void cik_static_start(struct kfd_scheduler *scheduler)
> +{
> +	struct cik_static_private *priv = kfd_scheduler_to_private(scheduler);
> +
> +	radeon_kfd_vidmem_gpumap(priv->dev, priv->hpd_mem, &priv->hpd_addr);
> +	radeon_kfd_vidmem_gpumap(priv->dev, priv->mqd_mem, &priv->mqd_addr);
> +
> +	init_pipes(priv);
> +	init_ats(priv);
> +}
> +
> +static void cik_static_stop(struct kfd_scheduler *scheduler)
> +{
> +	struct cik_static_private *priv = kfd_scheduler_to_private(scheduler);
> +
> +	exit_ats(priv);
> +
> +	radeon_kfd_vidmem_ungpumap(priv->dev, priv->hpd_mem);
> +	radeon_kfd_vidmem_ungpumap(priv->dev, priv->mqd_mem);
> +}
> +
> +static bool allocate_vmid(struct cik_static_private *priv, unsigned int *vmid)
> +{
> +	bool ok = false;
> +
> +	mutex_lock(&priv->mutex);
> +
> +	if (priv->free_vmid_mask != 0) {
> +		unsigned int v = __ffs64(priv->free_vmid_mask);
> +
> +		clear_bit(v, &priv->free_vmid_mask);
> +		*vmid = v;
> +
> +		ok = true;
> +	}
> +
> +	mutex_unlock(&priv->mutex);
> +
> +	return ok;
> +}
> +
> +static void release_vmid(struct cik_static_private *priv, unsigned int vmid)
> +{
> +	/* It's okay to race against allocate_vmid because this only adds bits to free_vmid_mask.
> +	 * And set_bit/clear_bit are atomic wrt each other. */
> +	set_bit(vmid, &priv->free_vmid_mask);
> +}
> +
> +static void setup_vmid_for_process(struct cik_static_private *priv, struct cik_static_process *p)
> +{
> +	set_vmid_pasid_mapping(priv, p->vmid, p->pasid);
> +
> +	/*
> +	 * SH_MEM_CONFIG and others need to be programmed differently
> +	 * for 32/64-bit processes. And maybe other reasons.
> +	 */
> +}
> +
> +static int
> +cik_static_register_process(struct kfd_scheduler *scheduler, struct kfd_process *process,
> +			    struct kfd_scheduler_process **scheduler_process)
> +{
> +	struct cik_static_private *priv = kfd_scheduler_to_private(scheduler);
> +
> +	struct cik_static_process *hwp;
> +
> +	hwp = kmalloc(sizeof(*hwp), GFP_KERNEL);
> +	if (hwp == NULL)
> +		return -ENOMEM;
> +
> +	if (!allocate_vmid(priv, &hwp->vmid)) {
> +		kfree(hwp);
> +		return -ENOMEM;
> +	}
> +
> +	hwp->pasid = process->pasid;
> +
> +	setup_vmid_for_process(priv, hwp);
> +
> +	*scheduler_process = (struct kfd_scheduler_process *)hwp;
> +
> +	return 0;
> +}
> +
> +static void cik_static_deregister_process(struct kfd_scheduler *scheduler,
> +				struct kfd_scheduler_process *scheduler_process)
> +{
> +	struct cik_static_private *priv = kfd_scheduler_to_private(scheduler);
> +	struct cik_static_process *pp = kfd_process_to_private(scheduler_process);
> +
> +	release_vmid(priv, pp->vmid);
> +	kfree(pp);
> +}
> +
> +static bool allocate_hqd(struct cik_static_private *priv, unsigned int *queue)
> +{
> +	bool ok = false;
> +	unsigned int q;
> +
> +	mutex_lock(&priv->mutex);
> +
> +	q = find_first_bit(priv->free_queues, priv->num_pipes * CIK_QUEUES_PER_PIPE);
> +
> +	if (q != priv->num_pipes * CIK_QUEUES_PER_PIPE) {
> +		clear_bit(q, priv->free_queues);
> +		*queue = q;
> +
> +		ok = true;
> +	}
> +
> +	mutex_unlock(&priv->mutex);
> +
> +	return ok;
> +}
> +
> +static void release_hqd(struct cik_static_private *priv, unsigned int queue)
> +{
> +	/* It's okay to race against allocate_hqd because this only adds bits to free_queues.
> +	 * And set_bit/clear_bit are atomic wrt each other. */
> +	set_bit(queue, priv->free_queues);
> +}
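
The free-queue bookkeeping in allocate_hqd()/release_hqd() is a first-fit bitmap allocator. A minimal userspace sketch of the same idea follows, assuming a single 64-bit word and a made-up queue count. It is single-threaded and does not use atomic operations, unlike the driver, which relies on the mutex plus set_bit/clear_bit; alloc_queue, free_queue and NUM_QUEUES are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_QUEUES 16u   /* illustrative only, e.g. 2 pipes * 8 queues */

/* Bit i set means queue i is free. Returns false when nothing is free. */
static bool alloc_queue(uint64_t *free_mask, unsigned int *queue)
{
	for (unsigned int i = 0; i < NUM_QUEUES; i++) {
		if (*free_mask & (1ULL << i)) {
			*free_mask &= ~(1ULL << i);   /* claim the lowest free queue */
			*queue = i;
			return true;
		}
	}
	return false;
}

/* Releasing only ever adds a bit back to the mask. */
static void free_queue(uint64_t *free_mask, unsigned int queue)
{
	*free_mask |= 1ULL << queue;
}
```

Because the search always starts at bit 0, a released queue with a low index is handed out again before any higher-numbered free queue.
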
> +
> +static void init_mqd(const struct cik_static_queue *queue, const struct cik_static_process *process)
> +{
> +	struct cik_mqd *mqd = queue->mqd;
> +
> +	memset(mqd, 0, sizeof(*mqd));
> +
> +	mqd->header = 0xC0310800;
> +	mqd->pipeline_stat_enable = 1;
> +	mqd->static_thread_mgmt01[0] = 0xffffffff;
> +	mqd->static_thread_mgmt01[1] = 0xffffffff;
> +	mqd->static_thread_mgmt23[0] = 0xffffffff;
> +	mqd->static_thread_mgmt23[1] = 0xffffffff;
> +
> +	mqd->queue_state.cp_mqd_base_addr = lower_32(queue->mqd_addr);
> +	mqd->queue_state.cp_mqd_base_addr_hi = upper_32(queue->mqd_addr);
> +	mqd->queue_state.cp_mqd_control = MQD_CONTROL_PRIV_STATE_EN;
> +
> +	mqd->queue_state.cp_hqd_pq_base = lower_32((uintptr_t)queue->pq_addr >> 8);
> +	mqd->queue_state.cp_hqd_pq_base_hi = upper_32((uintptr_t)queue->pq_addr >> 8);
> +	mqd->queue_state.cp_hqd_pq_control = QUEUE_SIZE(queue->queue_size_encoded) | DEFAULT_RPTR_BLOCK_SIZE
> +					    | DEFAULT_MIN_AVAIL_SIZE | PQ_ATC_EN;
> +	mqd->queue_state.cp_hqd_pq_rptr_report_addr = lower_32((uintptr_t)queue->rptr_address);
> +	mqd->queue_state.cp_hqd_pq_rptr_report_addr_hi = upper_32((uintptr_t)queue->rptr_address);
> +	mqd->queue_state.cp_hqd_pq_doorbell_control = DOORBELL_OFFSET(queue->doorbell_index) | DOORBELL_EN;
> +	mqd->queue_state.cp_hqd_vmid = process->vmid;
> +	mqd->queue_state.cp_hqd_active = 1;
> +
> +	mqd->queue_state.cp_hqd_persistent_state = DEFAULT_CP_HQD_PERSISTENT_STATE;
> +
> +	/* The values for these 3 are from WinKFD. */
> +	mqd->queue_state.cp_hqd_quantum = QUANTUM_EN | QUANTUM_SCALE_1MS | QUANTUM_DURATION(10);
> +	mqd->queue_state.cp_hqd_pipe_priority = 1;
> +	mqd->queue_state.cp_hqd_queue_priority = 15;
> +
> +	mqd->queue_state.cp_hqd_ib_control = IB_ATC_EN | DEFAULT_MIN_IB_AVAIL_SIZE;
> +}
> +
> +/* Write the HQD registers and activate the queue.
> + * Requires that SRBM_GFX_CNTL has already been programmed for the queue.
> + */
> +static void load_hqd(struct cik_static_private *priv, struct cik_static_queue *queue)
> +{
> +	struct kfd_dev *dev = priv->dev;
> +	const struct cik_hqd_registers *qs = &queue->mqd->queue_state;
> +
> +	WRITE_REG(dev, CP_MQD_BASE_ADDR, qs->cp_mqd_base_addr);
> +	WRITE_REG(dev, CP_MQD_BASE_ADDR_HI, qs->cp_mqd_base_addr_hi);
> +	WRITE_REG(dev, CP_MQD_CONTROL, qs->cp_mqd_control);
> +
> +	WRITE_REG(dev, CP_HQD_PQ_BASE, qs->cp_hqd_pq_base);
> +	WRITE_REG(dev, CP_HQD_PQ_BASE_HI, qs->cp_hqd_pq_base_hi);
> +	WRITE_REG(dev, CP_HQD_PQ_CONTROL, qs->cp_hqd_pq_control);
> +	/* DOORBELL_CONTROL before WPTR because WPTR writes are dropped if DOORBELL_HIT is set. */
> +	WRITE_REG(dev, CP_HQD_PQ_DOORBELL_CONTROL, qs->cp_hqd_pq_doorbell_control);
> +	WRITE_REG(dev, CP_HQD_PQ_WPTR, qs->cp_hqd_pq_wptr);
> +	WRITE_REG(dev, CP_HQD_PQ_RPTR, qs->cp_hqd_pq_rptr);
> +	WRITE_REG(dev, CP_HQD_PQ_RPTR_REPORT_ADDR, qs->cp_hqd_pq_rptr_report_addr);
> +	WRITE_REG(dev, CP_HQD_PQ_RPTR_REPORT_ADDR_HI, qs->cp_hqd_pq_rptr_report_addr_hi);
> +
> +	WRITE_REG(dev, CP_HQD_VMID, qs->cp_hqd_vmid);
> +	WRITE_REG(dev, CP_HQD_PERSISTENT_STATE, qs->cp_hqd_persistent_state);
> +	WRITE_REG(dev, CP_HQD_QUANTUM, qs->cp_hqd_quantum);
> +	WRITE_REG(dev, CP_HQD_PIPE_PRIORITY, qs->cp_hqd_pipe_priority);
> +	WRITE_REG(dev, CP_HQD_QUEUE_PRIORITY, qs->cp_hqd_queue_priority);
> +
> +	WRITE_REG(dev, CP_HQD_IB_CONTROL, qs->cp_hqd_ib_control);
> +	WRITE_REG(dev, CP_HQD_IB_BASE_ADDR, qs->cp_hqd_ib_base_addr);
> +	WRITE_REG(dev, CP_HQD_IB_BASE_ADDR_HI, qs->cp_hqd_ib_base_addr_hi);
> +	WRITE_REG(dev, CP_HQD_IB_RPTR, qs->cp_hqd_ib_rptr);
> +	WRITE_REG(dev, CP_HQD_SEMA_CMD, qs->cp_hqd_sema_cmd);
> +	WRITE_REG(dev, CP_HQD_MSG_TYPE, qs->cp_hqd_msg_type);
> +	WRITE_REG(dev, CP_HQD_ATOMIC0_PREOP_LO, qs->cp_hqd_atomic0_preop_lo);
> +	WRITE_REG(dev, CP_HQD_ATOMIC0_PREOP_HI, qs->cp_hqd_atomic0_preop_hi);
> +	WRITE_REG(dev, CP_HQD_ATOMIC1_PREOP_LO, qs->cp_hqd_atomic1_preop_lo);
> +	WRITE_REG(dev, CP_HQD_ATOMIC1_PREOP_HI, qs->cp_hqd_atomic1_preop_hi);
> +	WRITE_REG(dev, CP_HQD_HQ_SCHEDULER0, qs->cp_hqd_hq_scheduler0);
> +	WRITE_REG(dev, CP_HQD_HQ_SCHEDULER1, qs->cp_hqd_hq_scheduler1);
> +
> +	WRITE_REG(dev, CP_HQD_ACTIVE, 1);
> +}
> +
> +static void activate_queue(struct cik_static_private *priv, struct cik_static_queue *queue)
> +{
> +	bool wptr_shadow_valid;
> +	doorbell_t wptr_shadow;
> +
> +	/* Avoid sleeping while holding the SRBM lock. */
> +	wptr_shadow_valid = !get_user(wptr_shadow, queue->wptr_address);
> +
> +	lock_srbm_index(priv);
> +	queue_select(priv, queue->queue);
> +
> +	load_hqd(priv, queue);
> +
> +	/* Doorbell and wptr are special because there is a race when reactivating a queue.
> +	 * Since doorbell writes to deactivated queues are ignored by hardware, the application
> +	 * shadows the doorbell into memory at queue->wptr_address.
> +	 *
> +	 * We want the queue to automatically resume processing as if it were always active,
> +	 * so we want to copy from queue->wptr_address into the wptr/doorbell.
> +	 *
> +	 * The race is that the app could write a new wptr into the doorbell before we
> +	 * write the shadowed wptr, resulting in an old wptr written later.
> +	 *
> +	 * The hardware solves this by ignoring CP_HQD_WPTR writes after a doorbell write.
> +	 * So the KFD can activate the doorbell then write the shadow wptr to CP_HQD_WPTR
> +	 * knowing it will be ignored if the user has written a more-recent doorbell.
> +	 */
> +	if (wptr_shadow_valid)
> +		WRITE_REG(priv->dev, CP_HQD_PQ_WPTR, wptr_shadow);
> +
> +	unlock_srbm_index(priv);
> +}
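
The ordering rule described in the activate_queue() comment can be illustrated with a toy model. This is only a sketch of the behaviour as the comment states it (doorbell writes to a disabled doorbell are ignored, and register writes to the wptr are dropped once a doorbell write has landed); the struct and function names are invented for the example and do not correspond to real register semantics beyond that.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model of one hardware queue; field names are made up for this sketch. */
struct toy_hqd {
	uint32_t wptr;
	bool doorbell_enabled;
	bool doorbell_hit;   /* set once a doorbell write has landed */
};

/* Application rings the doorbell: ignored while the doorbell is disabled. */
static void toy_ring_doorbell(struct toy_hqd *q, uint32_t wptr)
{
	if (!q->doorbell_enabled)
		return;
	q->wptr = wptr;
	q->doorbell_hit = true;
}

/* Driver writes the wptr register: dropped if a doorbell write already landed. */
static void toy_write_wptr_reg(struct toy_hqd *q, uint32_t wptr)
{
	if (q->doorbell_hit)
		return;
	q->wptr = wptr;
}
```

In this model, enabling the doorbell first and writing the shadowed wptr second is safe in both interleavings: if the application rings the doorbell in between, the stale shadow value is discarded; if it does not, the shadow value is what the queue resumes with.
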
> +
> +static void drain_hqd(struct cik_static_private *priv)
> +{
> +	WRITE_REG(priv->dev, CP_HQD_DEQUEUE_REQUEST, DEQUEUE_REQUEST_DRAIN);
> +}
> +
> +static void wait_hqd_inactive(struct cik_static_private *priv)
> +{
> +	while (READ_REG(priv->dev, CP_HQD_ACTIVE) != 0)
> +		cpu_relax();
> +}
> +
> +static void deactivate_queue(struct cik_static_private *priv, struct cik_static_queue *queue)
> +{
> +	lock_srbm_index(priv);
> +	queue_select(priv, queue->queue);
> +
> +	drain_hqd(priv);
> +	wait_hqd_inactive(priv);
> +
> +	unlock_srbm_index(priv);
> +}
> +
> +#define BIT_MASK_64(high, low) (((1ULL << (high)) - 1) & ~((1ULL << (low)) - 1))
> +#define RING_ADDRESS_BAD_BIT_MASK (~BIT_MASK_64(48, 8))
> +#define RWPTR_ADDRESS_BAD_BIT_MASK (~BIT_MASK_64(48, 2))
> +
> +#define MAX_QUEUE_SIZE (1ULL << 32)
> +#define MIN_QUEUE_SIZE (1ULL << 10)
> +
> +static int
> +cik_static_create_queue(struct kfd_scheduler *scheduler,
> +			struct kfd_scheduler_process *process,
> +			struct kfd_scheduler_queue *queue,
> +			void __user *ring_address,
> +			uint64_t ring_size,
> +			void __user *rptr_address,
> +			void __user *wptr_address,
> +			unsigned int doorbell)
> +{
> +	struct cik_static_private *priv = kfd_scheduler_to_private(scheduler);
> +	struct cik_static_process *hwp = kfd_process_to_private(process);
> +	struct cik_static_queue *hwq = kfd_queue_to_private(queue);
> +
> +	if ((uint64_t)ring_address & RING_ADDRESS_BAD_BIT_MASK
> +	    || (uint64_t)rptr_address & RWPTR_ADDRESS_BAD_BIT_MASK
> +	    || (uint64_t)wptr_address & RWPTR_ADDRESS_BAD_BIT_MASK)
> +		return -EINVAL;
> +
> +	if (ring_size > MAX_QUEUE_SIZE || ring_size < MIN_QUEUE_SIZE || !is_power_of_2(ring_size))
> +		return -EINVAL;
> +
> +	if (!allocate_hqd(priv, &hwq->queue))
> +		return -ENOMEM;
> +
> +	hwq->mqd_addr = priv->mqd_addr + sizeof(struct cik_mqd_padded) * hwq->queue;
> +	hwq->mqd = &priv->mqds[hwq->queue].mqd;
> +	hwq->pq_addr = ring_address;
> +	hwq->rptr_address = rptr_address;
> +	hwq->wptr_address = wptr_address;
> +	hwq->doorbell_index = doorbell;
> +	hwq->queue_size_encoded = ilog2(ring_size) - 3;
> +
> +	init_mqd(hwq, hwp);
> +	activate_queue(priv, hwq);
> +
> +	return 0;
> +}
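
The argument checks in cik_static_create_queue() can be exercised on their own. The sketch below reuses the BIT_MASK_64 definition from the patch and re-implements the checks and the log2(size) - 3 encoding in plain C; ring_address_ok, ring_size_ok and encode_queue_size are hypothetical names, and the loop stands in for the kernel's ilog2().

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits [low, high) set, as in the patch's BIT_MASK_64. */
#define BIT_MASK_64(high, low) (((1ULL << (high)) - 1) & ~((1ULL << (low)) - 1))

/* Ring base must be 256-byte aligned and fit in 48 bits. */
static bool ring_address_ok(uint64_t addr)
{
	return (addr & ~BIT_MASK_64(48, 8)) == 0;
}

/* Ring size must be a power of two between 1 KB and 4 GB inclusive. */
static bool ring_size_ok(uint64_t size)
{
	bool pow2 = size != 0 && (size & (size - 1)) == 0;

	return pow2 && size >= (1ULL << 10) && size <= (1ULL << 32);
}

/* log2(size) - 3; only meaningful when ring_size_ok(size) holds. */
static uint32_t encode_queue_size(uint64_t size)
{
	uint32_t bits = 0;

	while (size > 1) {
		size >>= 1;
		bits++;
	}
	return bits - 3;
}
```

Under these definitions the smallest accepted ring (1 KB) encodes to 7 and the largest (4 GB) encodes to 29.
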
> +
> +static void
> +cik_static_destroy_queue(struct kfd_scheduler *scheduler, struct kfd_scheduler_queue *queue)
> +{
> +	struct cik_static_private *priv = kfd_scheduler_to_private(scheduler);
> +	struct cik_static_queue *hwq = kfd_queue_to_private(queue);
> +
> +	deactivate_queue(priv, hwq);
> +
> +	release_hqd(priv, hwq->queue);
> +}
> +
> +const struct kfd_scheduler_class radeon_kfd_cik_static_scheduler_class = {
> +	.name = "CIK static scheduler",
> +	.create = cik_static_create,
> +	.destroy = cik_static_destroy,
> +	.start = cik_static_start,
> +	.stop = cik_static_stop,
> +	.register_process = cik_static_register_process,
> +	.deregister_process = cik_static_deregister_process,
> +	.queue_size = sizeof(struct cik_static_queue),
> +	.create_queue = cik_static_create_queue,
> +	.destroy_queue = cik_static_destroy_queue,
> +};
> diff --git a/drivers/gpu/hsa/radeon/kfd_vidmem.c b/drivers/gpu/hsa/radeon/kfd_vidmem.c
> new file mode 100644
> index 0000000..c8d3770
> --- /dev/null
> +++ b/drivers/gpu/hsa/radeon/kfd_vidmem.c
> @@ -0,0 +1,61 @@
> +/*
> + * Copyright 2014 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + */
> +
> +#include "kfd_priv.h"
> +
> +int radeon_kfd_vidmem_alloc(struct kfd_dev *kfd, size_t size, size_t alignment,
> +				enum kfd_mempool pool, kfd_mem_obj *mem_obj)
> +{
> +	return kfd2kgd->allocate_mem(kfd->kgd,
> +					size,
> +					alignment,
> +					(enum kgd_memory_pool)pool,
> +					(struct kgd_mem **)mem_obj);
> +}
> +
> +void radeon_kfd_vidmem_free(struct kfd_dev *kfd, kfd_mem_obj mem_obj)
> +{
> +	kfd2kgd->free_mem(kfd->kgd, (struct kgd_mem *)mem_obj);
> +}
> +
> +int radeon_kfd_vidmem_gpumap(struct kfd_dev *kfd, kfd_mem_obj mem_obj,
> +				uint64_t *vmid0_address)
> +{
> +	return kfd2kgd->gpumap_mem(kfd->kgd,
> +					(struct kgd_mem *)mem_obj,
> +					vmid0_address);
> +}
> +
> +void radeon_kfd_vidmem_ungpumap(struct kfd_dev *kfd, kfd_mem_obj mem_obj)
> +{
> +	kfd2kgd->ungpumap_mem(kfd->kgd, (struct kgd_mem *)mem_obj);
> +}
> +
> +int radeon_kfd_vidmem_kmap(struct kfd_dev *kfd, kfd_mem_obj mem_obj, void **ptr)
> +{
> +	return kfd2kgd->kmap_mem(kfd->kgd, (struct kgd_mem *)mem_obj, ptr);
> +}
> +
> +void radeon_kfd_vidmem_unkmap(struct kfd_dev *kfd, kfd_mem_obj mem_obj)
> +{
> +	kfd2kgd->unkmap_mem(kfd->kgd, (struct kgd_mem *)mem_obj);
> +}
> -- 
> 1.9.1
> 


* Re: [PATCH 11/83] hsa/radeon: Add scheduler code
@ 2014-07-11 18:25     ` Jerome Glisse
  0 siblings, 0 replies; 116+ messages in thread
From: Jerome Glisse @ 2014-07-11 18:25 UTC (permalink / raw)
  To: Oded Gabbay; +Cc: Andrew Lewycky, linux-kernel, dri-devel, Alex Deucher

On Fri, Jul 11, 2014 at 12:50:11AM +0300, Oded Gabbay wrote:
> This patch adds the code base of the scheduler, which handles queue
> creation, deletion and scheduling on the CP of the GPU.
> 
> Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>

I would rather see all of this squashed. As posted, it gives the
impression that the driver can access registers, and that access is
removed later. I know juggling patches while squashing can be daunting,
but it really makes reviewing hard here, because I have to jump back and
forth to see whether the thing I am looking at really matters in the
final version.

Cheers,
Jérôme

> ---
>  drivers/gpu/hsa/radeon/Makefile               |   3 +-
>  drivers/gpu/hsa/radeon/cik_regs.h             | 213 +++++++
>  drivers/gpu/hsa/radeon/kfd_device.c           |   1 +
>  drivers/gpu/hsa/radeon/kfd_registers.c        |  50 ++
>  drivers/gpu/hsa/radeon/kfd_sched_cik_static.c | 800 ++++++++++++++++++++++++++
>  drivers/gpu/hsa/radeon/kfd_vidmem.c           |  61 ++
>  6 files changed, 1127 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/gpu/hsa/radeon/cik_regs.h
>  create mode 100644 drivers/gpu/hsa/radeon/kfd_registers.c
>  create mode 100644 drivers/gpu/hsa/radeon/kfd_sched_cik_static.c
>  create mode 100644 drivers/gpu/hsa/radeon/kfd_vidmem.c
> 
> diff --git a/drivers/gpu/hsa/radeon/Makefile b/drivers/gpu/hsa/radeon/Makefile
> index 989518a..28da10c 100644
> --- a/drivers/gpu/hsa/radeon/Makefile
> +++ b/drivers/gpu/hsa/radeon/Makefile
> @@ -4,6 +4,7 @@
>  
>  radeon_kfd-y	:= kfd_module.o kfd_device.o kfd_chardev.o \
>  		kfd_pasid.o kfd_topology.o kfd_process.o \
> -		kfd_doorbell.o
> +		kfd_doorbell.o kfd_sched_cik_static.o kfd_registers.o \
> +		kfd_vidmem.o
>  
>  obj-$(CONFIG_HSA_RADEON)	+= radeon_kfd.o
> diff --git a/drivers/gpu/hsa/radeon/cik_regs.h b/drivers/gpu/hsa/radeon/cik_regs.h
> new file mode 100644
> index 0000000..d0cdc57
> --- /dev/null
> +++ b/drivers/gpu/hsa/radeon/cik_regs.h
> @@ -0,0 +1,213 @@
> +/*
> + * Copyright 2014 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + */
> +
> +#ifndef CIK_REGS_H
> +#define CIK_REGS_H
> +
> +#define BIF_DOORBELL_CNTL				0x530Cu
> +
> +#define	SRBM_GFX_CNTL					0xE44
> +#define	PIPEID(x)					((x) << 0)
> +#define	MEID(x)						((x) << 2)
> +#define	VMID(x)						((x) << 4)
> +#define	QUEUEID(x)					((x) << 8)
> +
> +#define	SQ_CONFIG					0x8C00
> +
> +#define	SH_MEM_BASES					0x8C28
> +/* if PTR32, these are the bases for scratch and lds */
> +#define	PRIVATE_BASE(x)					((x) << 0) /* scratch */
> +#define	SHARED_BASE(x)					((x) << 16) /* LDS */
> +#define	SH_MEM_APE1_BASE				0x8C2C
> +/* if PTR32, this is the base location of GPUVM */
> +#define	SH_MEM_APE1_LIMIT				0x8C30
> +/* if PTR32, this is the upper limit of GPUVM */
> +#define	SH_MEM_CONFIG					0x8C34
> +#define	PTR32						(1 << 0)
> +#define	ALIGNMENT_MODE(x)				((x) << 2)
> +#define	SH_MEM_ALIGNMENT_MODE_DWORD			0
> +#define	SH_MEM_ALIGNMENT_MODE_DWORD_STRICT		1
> +#define	SH_MEM_ALIGNMENT_MODE_STRICT			2
> +#define	SH_MEM_ALIGNMENT_MODE_UNALIGNED			3
> +#define	DEFAULT_MTYPE(x)				((x) << 4)
> +#define	APE1_MTYPE(x)					((x) << 7)
> +
> +/* valid for both DEFAULT_MTYPE and APE1_MTYPE */
> +#define	MTYPE_NONCACHED					3
> +
> +
> +#define SH_STATIC_MEM_CONFIG				0x9604u
> +
> +#define	TC_CFG_L1_LOAD_POLICY0				0xAC68
> +#define	TC_CFG_L1_LOAD_POLICY1				0xAC6C
> +#define	TC_CFG_L1_STORE_POLICY				0xAC70
> +#define	TC_CFG_L2_LOAD_POLICY0				0xAC74
> +#define	TC_CFG_L2_LOAD_POLICY1				0xAC78
> +#define	TC_CFG_L2_STORE_POLICY0				0xAC7C
> +#define	TC_CFG_L2_STORE_POLICY1				0xAC80
> +#define	TC_CFG_L2_ATOMIC_POLICY				0xAC84
> +#define	TC_CFG_L1_VOLATILE				0xAC88
> +#define	TC_CFG_L2_VOLATILE				0xAC8C
> +
> +#define CP_PQ_WPTR_POLL_CNTL				0xC20C
> +#define	WPTR_POLL_EN					(1 << 31)
> +
> +#define CP_ME1_PIPE0_INT_CNTL				0xC214
> +#define CP_ME1_PIPE1_INT_CNTL				0xC218
> +#define CP_ME1_PIPE2_INT_CNTL				0xC21C
> +#define CP_ME1_PIPE3_INT_CNTL				0xC220
> +#define CP_ME2_PIPE0_INT_CNTL				0xC224
> +#define CP_ME2_PIPE1_INT_CNTL				0xC228
> +#define CP_ME2_PIPE2_INT_CNTL				0xC22C
> +#define CP_ME2_PIPE3_INT_CNTL				0xC230
> +#define DEQUEUE_REQUEST_INT_ENABLE			(1 << 13)
> +#define WRM_POLL_TIMEOUT_INT_ENABLE			(1 << 17)
> +#define PRIV_REG_INT_ENABLE				(1 << 23)
> +#define TIME_STAMP_INT_ENABLE				(1 << 26)
> +#define GENERIC2_INT_ENABLE				(1 << 29)
> +#define GENERIC1_INT_ENABLE				(1 << 30)
> +#define GENERIC0_INT_ENABLE				(1 << 31)
> +#define CP_ME1_PIPE0_INT_STATUS				0xC214
> +#define CP_ME1_PIPE1_INT_STATUS				0xC218
> +#define CP_ME1_PIPE2_INT_STATUS				0xC21C
> +#define CP_ME1_PIPE3_INT_STATUS				0xC220
> +#define CP_ME2_PIPE0_INT_STATUS				0xC224
> +#define CP_ME2_PIPE1_INT_STATUS				0xC228
> +#define CP_ME2_PIPE2_INT_STATUS				0xC22C
> +#define CP_ME2_PIPE3_INT_STATUS				0xC230
> +#define DEQUEUE_REQUEST_INT_STATUS			(1 << 13)
> +#define WRM_POLL_TIMEOUT_INT_STATUS			(1 << 17)
> +#define PRIV_REG_INT_STATUS				(1 << 23)
> +#define TIME_STAMP_INT_STATUS				(1 << 26)
> +#define GENERIC2_INT_STATUS				(1 << 29)
> +#define GENERIC1_INT_STATUS				(1 << 30)
> +#define GENERIC0_INT_STATUS				(1 << 31)
> +
> +#define CP_HPD_EOP_BASE_ADDR				0xC904
> +#define CP_HPD_EOP_BASE_ADDR_HI				0xC908
> +#define CP_HPD_EOP_VMID					0xC90C
> +#define CP_HPD_EOP_CONTROL				0xC910
> +#define	EOP_SIZE(x)					((x) << 0)
> +#define	EOP_SIZE_MASK					(0x3f << 0)
> +#define CP_MQD_BASE_ADDR				0xC914
> +#define CP_MQD_BASE_ADDR_HI				0xC918
> +#define CP_HQD_ACTIVE					0xC91C
> +#define CP_HQD_VMID					0xC920
> +
> +#define CP_HQD_PERSISTENT_STATE				0xC924u
> +#define	DEFAULT_CP_HQD_PERSISTENT_STATE			(0x33U << 8)
> +
> +#define CP_HQD_PIPE_PRIORITY				0xC928u
> +#define CP_HQD_QUEUE_PRIORITY				0xC92Cu
> +#define CP_HQD_QUANTUM					0xC930u
> +#define	QUANTUM_EN					1U
> +#define	QUANTUM_SCALE_1MS				(1U << 4)
> +#define	QUANTUM_DURATION(x)				((x) << 8)
> +
> +#define CP_HQD_PQ_BASE					0xC934
> +#define CP_HQD_PQ_BASE_HI				0xC938
> +#define CP_HQD_PQ_RPTR					0xC93C
> +#define CP_HQD_PQ_RPTR_REPORT_ADDR			0xC940
> +#define CP_HQD_PQ_RPTR_REPORT_ADDR_HI			0xC944
> +#define CP_HQD_PQ_WPTR_POLL_ADDR			0xC948
> +#define CP_HQD_PQ_WPTR_POLL_ADDR_HI			0xC94C
> +#define CP_HQD_PQ_DOORBELL_CONTROL			0xC950
> +#define	DOORBELL_OFFSET(x)				((x) << 2)
> +#define	DOORBELL_OFFSET_MASK				(0x1fffff << 2)
> +#define	DOORBELL_SOURCE					(1 << 28)
> +#define	DOORBELL_SCHD_HIT				(1 << 29)
> +#define	DOORBELL_EN					(1 << 30)
> +#define	DOORBELL_HIT					(1 << 31)
> +#define CP_HQD_PQ_WPTR					0xC954
> +#define CP_HQD_PQ_CONTROL				0xC958
> +#define	QUEUE_SIZE(x)					((x) << 0)
> +#define	QUEUE_SIZE_MASK					(0x3f << 0)
> +#define	RPTR_BLOCK_SIZE(x)				((x) << 8)
> +#define	RPTR_BLOCK_SIZE_MASK				(0x3f << 8)
> +#define	MIN_AVAIL_SIZE(x)				((x) << 20)
> +#define	PQ_ATC_EN					(1 << 23)
> +#define	PQ_VOLATILE					(1 << 26)
> +#define	NO_UPDATE_RPTR					(1 << 27)
> +#define	UNORD_DISPATCH					(1 << 28)
> +#define	ROQ_PQ_IB_FLIP					(1 << 29)
> +#define	PRIV_STATE					(1 << 30)
> +#define	KMD_QUEUE					(1 << 31)
> +
> +#define	DEFAULT_RPTR_BLOCK_SIZE				RPTR_BLOCK_SIZE(5)
> +#define	DEFAULT_MIN_AVAIL_SIZE				MIN_AVAIL_SIZE(3)
> +
> +#define CP_HQD_IB_BASE_ADDR				0xC95Cu
> +#define CP_HQD_IB_BASE_ADDR_HI				0xC960u
> +#define CP_HQD_IB_RPTR					0xC964u
> +#define CP_HQD_IB_CONTROL				0xC968u
> +#define	IB_ATC_EN					(1U << 23)
> +#define	DEFAULT_MIN_IB_AVAIL_SIZE			(3U << 20)
> +
> +#define CP_HQD_DEQUEUE_REQUEST				0xC974
> +#define	DEQUEUE_REQUEST_DRAIN				1
> +
> +#define CP_HQD_SEMA_CMD					0xC97Cu
> +#define CP_HQD_MSG_TYPE					0xC980u
> +#define CP_HQD_ATOMIC0_PREOP_LO				0xC984u
> +#define CP_HQD_ATOMIC0_PREOP_HI				0xC988u
> +#define CP_HQD_ATOMIC1_PREOP_LO				0xC98Cu
> +#define CP_HQD_ATOMIC1_PREOP_HI				0xC990u
> +#define CP_HQD_HQ_SCHEDULER0				0xC994u
> +#define CP_HQD_HQ_SCHEDULER1				0xC998u
> +
> +
> +#define CP_MQD_CONTROL					0xC99C
> +#define	MQD_VMID(x)					((x) << 0)
> +#define	MQD_VMID_MASK					(0xf << 0)
> +#define	MQD_CONTROL_PRIV_STATE_EN			(1U << 8)
> +
> +#define GRBM_GFX_INDEX					0x30800
> +#define	INSTANCE_INDEX(x)				((x) << 0)
> +#define	SH_INDEX(x)					((x) << 8)
> +#define	SE_INDEX(x)					((x) << 16)
> +#define	SH_BROADCAST_WRITES				(1 << 29)
> +#define	INSTANCE_BROADCAST_WRITES			(1 << 30)
> +#define	SE_BROADCAST_WRITES				(1 << 31)
> +
> +#define SQC_CACHES					0x30d20
> +#define SQC_POLICY					0x8C38u
> +#define SQC_VOLATILE					0x8C3Cu
> +
> +#define CP_PERFMON_CNTL					0x36020
> +
> +#define ATC_VMID0_PASID_MAPPING				0x339Cu
> +#define	ATC_VMID_PASID_MAPPING_UPDATE_STATUS		0x3398u
> +#define	ATC_VMID_PASID_MAPPING_VALID			(1U << 31)
> +
> +#define ATC_VM_APERTURE0_CNTL				0x3310u
> +#define	ATS_ACCESS_MODE_NEVER				0
> +#define	ATS_ACCESS_MODE_ALWAYS				1
> +
> +#define ATC_VM_APERTURE0_CNTL2				0x3318u
> +#define ATC_VM_APERTURE0_HIGH_ADDR			0x3308u
> +#define ATC_VM_APERTURE0_LOW_ADDR			0x3300u
> +#define ATC_VM_APERTURE1_CNTL				0x3314u
> +#define ATC_VM_APERTURE1_CNTL2				0x331Cu
> +#define ATC_VM_APERTURE1_HIGH_ADDR			0x330Cu
> +#define ATC_VM_APERTURE1_LOW_ADDR			0x3304u
> +
> +#endif
> diff --git a/drivers/gpu/hsa/radeon/kfd_device.c b/drivers/gpu/hsa/radeon/kfd_device.c
> index 4e9fe6c..465c822 100644
> --- a/drivers/gpu/hsa/radeon/kfd_device.c
> +++ b/drivers/gpu/hsa/radeon/kfd_device.c
> @@ -28,6 +28,7 @@
>  #include "kfd_scheduler.h"
>  
>  static const struct kfd_device_info bonaire_device_info = {
> +	.scheduler_class = &radeon_kfd_cik_static_scheduler_class,
>  	.max_pasid_bits = 16,
>  };
>  
> diff --git a/drivers/gpu/hsa/radeon/kfd_registers.c b/drivers/gpu/hsa/radeon/kfd_registers.c
> new file mode 100644
> index 0000000..223debd
> --- /dev/null
> +++ b/drivers/gpu/hsa/radeon/kfd_registers.c
> @@ -0,0 +1,50 @@
> +/*
> + * Copyright 2014 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + */
> +
> +#include <linux/io.h>
> +#include "kfd_priv.h"
> +
> +/* In KFD, "reg" is the byte offset of the register. */
> +static void __iomem *reg_address(struct kfd_dev *dev, uint32_t reg)
> +{
> +	return dev->regs + reg;
> +}
> +
> +void radeon_kfd_write_reg(struct kfd_dev *dev, uint32_t reg, uint32_t value)
> +{
> +	writel(value, reg_address(dev, reg));
> +}
> +
> +uint32_t radeon_kfd_read_reg(struct kfd_dev *dev, uint32_t reg)
> +{
> +	return readl(reg_address(dev, reg));
> +}
> +
> +void radeon_kfd_lock_srbm_index(struct kfd_dev *dev)
> +{
> +	kfd2kgd->lock_srbm_gfx_cntl(dev->kgd);
> +}
> +
> +void radeon_kfd_unlock_srbm_index(struct kfd_dev *dev)
> +{
> +	kfd2kgd->unlock_srbm_gfx_cntl(dev->kgd);
> +}
> diff --git a/drivers/gpu/hsa/radeon/kfd_sched_cik_static.c b/drivers/gpu/hsa/radeon/kfd_sched_cik_static.c
> new file mode 100644
> index 0000000..b986ff9
> --- /dev/null
> +++ b/drivers/gpu/hsa/radeon/kfd_sched_cik_static.c
> @@ -0,0 +1,800 @@
> +/*
> + * Copyright 2014 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + */
> +
> +#include <linux/log2.h>
> +#include <linux/mutex.h>
> +#include <linux/slab.h>
> +#include <linux/types.h>
> +#include <linux/uaccess.h>
> +#include "kfd_priv.h"
> +#include "kfd_scheduler.h"
> +#include "cik_regs.h"
> +
> +/* CIK CP hardware is arranged with 8 queues per pipe and 4 pipes per MEC (microengine for compute).
> + * The first MEC is ME 1; the GFX ME is ME 0.
> + * We split the CP with the KGD: they take the first N pipes and we take the rest.
> + */
> +#define CIK_QUEUES_PER_PIPE 8
> +#define CIK_PIPES_PER_MEC 4
> +
> +#define CIK_MAX_PIPES (2 * CIK_PIPES_PER_MEC)
> +
> +#define CIK_NUM_VMID 16
> +
> +#define CIK_HPD_SIZE_LOG2 11
> +#define CIK_HPD_SIZE (1U << CIK_HPD_SIZE_LOG2)
> +#define CIK_HPD_ALIGNMENT 256
> +#define CIK_MQD_ALIGNMENT 4
> +
> +#pragma pack(push, 4)
> +
> +struct cik_hqd_registers {
> +	u32 cp_mqd_base_addr;
> +	u32 cp_mqd_base_addr_hi;
> +	u32 cp_hqd_active;
> +	u32 cp_hqd_vmid;
> +	u32 cp_hqd_persistent_state;
> +	u32 cp_hqd_pipe_priority;
> +	u32 cp_hqd_queue_priority;
> +	u32 cp_hqd_quantum;
> +	u32 cp_hqd_pq_base;
> +	u32 cp_hqd_pq_base_hi;
> +	u32 cp_hqd_pq_rptr;
> +	u32 cp_hqd_pq_rptr_report_addr;
> +	u32 cp_hqd_pq_rptr_report_addr_hi;
> +	u32 cp_hqd_pq_wptr_poll_addr;
> +	u32 cp_hqd_pq_wptr_poll_addr_hi;
> +	u32 cp_hqd_pq_doorbell_control;
> +	u32 cp_hqd_pq_wptr;
> +	u32 cp_hqd_pq_control;
> +	u32 cp_hqd_ib_base_addr;
> +	u32 cp_hqd_ib_base_addr_hi;
> +	u32 cp_hqd_ib_rptr;
> +	u32 cp_hqd_ib_control;
> +	u32 cp_hqd_iq_timer;
> +	u32 cp_hqd_iq_rptr;
> +	u32 cp_hqd_dequeue_request;
> +	u32 cp_hqd_dma_offload;
> +	u32 cp_hqd_sema_cmd;
> +	u32 cp_hqd_msg_type;
> +	u32 cp_hqd_atomic0_preop_lo;
> +	u32 cp_hqd_atomic0_preop_hi;
> +	u32 cp_hqd_atomic1_preop_lo;
> +	u32 cp_hqd_atomic1_preop_hi;
> +	u32 cp_hqd_hq_scheduler0;
> +	u32 cp_hqd_hq_scheduler1;
> +	u32 cp_mqd_control;
> +};
> +
> +struct cik_mqd {
> +	u32 header;
> +	u32 dispatch_initiator;
> +	u32 dimensions[3];
> +	u32 start_idx[3];
> +	u32 num_threads[3];
> +	u32 pipeline_stat_enable;
> +	u32 perf_counter_enable;
> +	u32 pgm[2];
> +	u32 tba[2];
> +	u32 tma[2];
> +	u32 pgm_rsrc[2];
> +	u32 vmid;
> +	u32 resource_limits;
> +	u32 static_thread_mgmt01[2];
> +	u32 tmp_ring_size;
> +	u32 static_thread_mgmt23[2];
> +	u32 restart[3];
> +	u32 thread_trace_enable;
> +	u32 reserved1;
> +	u32 user_data[16];
> +	u32 vgtcs_invoke_count[2];
> +	struct cik_hqd_registers queue_state;
> +	u32 dequeue_cntr;
> +	u32 interrupt_queue[64];
> +};
> +
> +struct cik_mqd_padded {
> +	struct cik_mqd mqd;
> +	u8 padding[1024 - sizeof(struct cik_mqd)]; /* Pad MQD out to 1KB. (HW requires 4-byte alignment.) */
> +};
> +
> +#pragma pack(pop)
> +
> +struct cik_static_private {
> +	struct kfd_dev *dev;
> +
> +	struct mutex mutex;
> +
> +	unsigned int first_pipe;
> +	unsigned int num_pipes;
> +
> +	unsigned long free_vmid_mask; /* unsigned long to make set/clear_bit happy */
> +
> +	/* Everything below here is offset by first_pipe. E.g. bit 0 in
> +	 * free_queues is queue 0 in pipe first_pipe
> +	 */
> +
> +	/* Queue q on pipe p is at bit CIK_QUEUES_PER_PIPE * p + q. */
> +	unsigned long free_queues[DIV_ROUND_UP(CIK_MAX_PIPES * CIK_QUEUES_PER_PIPE, BITS_PER_LONG)];
> +
> +	kfd_mem_obj hpd_mem;	/* Single allocation for HPDs for all KFD pipes. */
> +	kfd_mem_obj mqd_mem;	/* Single allocation for all MQDs for all KFD
> +				 * pipes. This is actually struct cik_mqd_padded. */
> +	uint64_t hpd_addr;	/* GPU address for hpd_mem. */
> +	uint64_t mqd_addr;	/* GPU address for mqd_mem. */
> +	 /*
> +	  * Pointer for mqd_mem.
> +	  * We keep this mapped because multiple processes may need to access it
> +	  * in parallel and this is simpler than controlling concurrent kmaps
> +	  */
> +	struct cik_mqd_padded *mqds;
> +};
> +
> +struct cik_static_process {
> +	unsigned int vmid;
> +	pasid_t pasid;
> +};
> +
> +struct cik_static_queue {
> +	unsigned int queue; /* + first_pipe * QUEUES_PER_PIPE */
> +
> +	uint64_t mqd_addr;
> +	struct cik_mqd *mqd;
> +
> +	void __user *pq_addr;
> +	void __user *rptr_address;
> +	doorbell_t __user *wptr_address;
> +	uint32_t doorbell_index;
> +
> +	uint32_t queue_size_encoded; /* CP_HQD_PQ_CONTROL.QUEUE_SIZE takes the queue size as log2(size) - 3. */
> +};
> +
> +static uint32_t lower_32(uint64_t x)
> +{
> +	return (uint32_t)x;
> +}
> +
> +static uint32_t upper_32(uint64_t x)
> +{
> +	return (uint32_t)(x >> 32);
> +}
> +
> +/* SRBM_GFX_CNTL provides the MEC/pipe/queue and vmid for many registers that are instanced.
> + * In particular, CP_HQD_* and CP_MQD_* are instanced for each queue. CP_HPD_* are instanced for each pipe.
> + * SH_MEM_* are instanced per-VMID.
> + *
> + * We provide queue_select, pipe_select and vmid_select helpers that should be used before accessing
> + * registers from those groups. Note that these overwrite each other, e.g. after vmid_select the current
> + * selected MEC/pipe/queue is undefined.
> + *
> + * SRBM_GFX_CNTL and the registers it indexes are shared with KGD. You must be holding the srbm_gfx_cntl
> + * lock via lock_srbm_index before setting SRBM_GFX_CNTL or accessing any of the instanced registers.
> + */
> +static uint32_t make_srbm_gfx_cntl_mpqv(unsigned int me, unsigned int pipe, unsigned int queue, unsigned int vmid)
> +{
> +	return QUEUEID(queue) | VMID(vmid) | MEID(me) | PIPEID(pipe);
> +}
> +
> +static void pipe_select(struct cik_static_private *priv, unsigned int pipe)
> +{
> +	unsigned int pipe_in_mec = (pipe + priv->first_pipe) % CIK_PIPES_PER_MEC;
> +	unsigned int mec = (pipe + priv->first_pipe) / CIK_PIPES_PER_MEC;
> +
> +	WRITE_REG(priv->dev, SRBM_GFX_CNTL, make_srbm_gfx_cntl_mpqv(mec+1, pipe_in_mec, 0, 0));
> +}
> +
> +static void queue_select(struct cik_static_private *priv, unsigned int queue)
> +{
> +	unsigned int queue_in_pipe = queue % CIK_QUEUES_PER_PIPE;
> +	unsigned int pipe = queue / CIK_QUEUES_PER_PIPE + priv->first_pipe;
> +	unsigned int pipe_in_mec = pipe % CIK_PIPES_PER_MEC;
> +	unsigned int mec = pipe / CIK_PIPES_PER_MEC;
> +
> +#if 0
> +	dev_err(radeon_kfd_chardev(), "queue select %d = %u/%u/%u = 0x%08x\n", queue, mec+1, pipe_in_mec, queue_in_pipe,
> +		make_srbm_gfx_cntl_mpqv(mec+1, pipe_in_mec, queue_in_pipe, 0));
> +#endif
> +
> +	WRITE_REG(priv->dev, SRBM_GFX_CNTL, make_srbm_gfx_cntl_mpqv(mec+1, pipe_in_mec, queue_in_pipe, 0));
> +}
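
For anyone following the index arithmetic in pipe_select/queue_select, here is a minimal user-space sketch of the same decomposition. It only mirrors the math quoted above; queue_to_srbm() and srbm_gfx_cntl() are hypothetical names, and first_pipe == 1 in the example is an assumption (KGD keeping a single pipe), not something this patch fixes.

```c
#include <assert.h>
#include <stdint.h>

#define QUEUES_PER_PIPE 8u
#define PIPES_PER_MEC   4u

/* Field shifts copied from the PIPEID/MEID/VMID/QUEUEID macros in cik_regs.h. */
static uint32_t srbm_gfx_cntl(unsigned me, unsigned pipe, unsigned queue, unsigned vmid)
{
	return (queue << 8) | (vmid << 4) | (me << 2) | (pipe << 0);
}

/* Map a KFD-relative queue index to the SRBM_GFX_CNTL value that selects it. */
static uint32_t queue_to_srbm(unsigned first_pipe, unsigned queue)
{
	unsigned queue_in_pipe = queue % QUEUES_PER_PIPE;
	unsigned pipe = queue / QUEUES_PER_PIPE + first_pipe;
	unsigned pipe_in_mec = pipe % PIPES_PER_MEC;
	unsigned mec = pipe / PIPES_PER_MEC;

	/* Two MECs at most, so the absolute pipe index must stay below 8. */
	assert(pipe < 2 * PIPES_PER_MEC);

	/* ME 0 is GFX, so the first compute MEC is ME 1. */
	return srbm_gfx_cntl(mec + 1, pipe_in_mec, queue_in_pipe, 0);
}
```

With first_pipe == 1, KFD queue 29 lands on ME 2, pipe 0, queue 5, and KFD queue 0 lands on ME 1, pipe 1, queue 0.
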
> +
> +static void vmid_select(struct cik_static_private *priv, unsigned int vmid)
> +{
> +	WRITE_REG(priv->dev, SRBM_GFX_CNTL, make_srbm_gfx_cntl_mpqv(0, 0, 0, vmid));
> +}
> +
> +static void lock_srbm_index(struct cik_static_private *priv)
> +{
> +	radeon_kfd_lock_srbm_index(priv->dev);
> +}
> +
> +static void unlock_srbm_index(struct cik_static_private *priv)
> +{
> +	WRITE_REG(priv->dev, SRBM_GFX_CNTL, 0);	/* Be nice to KGD, reset indexed CP registers to the GFX pipe. */
> +	radeon_kfd_unlock_srbm_index(priv->dev);
> +}
> +
> +/* One-time setup for all compute pipes. They need to be programmed with the address & size of the HPD EOP buffer. */
> +static void init_pipes(struct cik_static_private *priv)
> +{
> +	unsigned int i;
> +
> +	lock_srbm_index(priv);
> +
> +	for (i = 0; i < priv->num_pipes; i++) {
> +		uint64_t pipe_hpd_addr = priv->hpd_addr + i * CIK_HPD_SIZE;
> +
> +		pipe_select(priv, i);
> +
> +		WRITE_REG(priv->dev, CP_HPD_EOP_BASE_ADDR, lower_32(pipe_hpd_addr >> 8));
> +		WRITE_REG(priv->dev, CP_HPD_EOP_BASE_ADDR_HI, upper_32(pipe_hpd_addr >> 8));
> +		WRITE_REG(priv->dev, CP_HPD_EOP_VMID, 0);
> +		WRITE_REG(priv->dev, CP_HPD_EOP_CONTROL, CIK_HPD_SIZE_LOG2 - 1);
> +	}
> +
> +	unlock_srbm_index(priv);
> +}
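
To make the register encoding in init_pipes easier to check, a small sketch of the per-pipe values follows. The struct hpd_regs and hpd_regs_for_pipe() are hypothetical helpers for illustration; the only facts taken from the patch are the 2KB stride (CIK_HPD_SIZE), the address being written shifted right by 8, and the control value CIK_HPD_SIZE_LOG2 - 1.

```c
#include <assert.h>
#include <stdint.h>

#define HPD_SIZE_LOG2 11u
#define HPD_SIZE      (1u << HPD_SIZE_LOG2)

/* Values that would be written to CP_HPD_EOP_BASE_ADDR{,_HI} and CP_HPD_EOP_CONTROL. */
struct hpd_regs {
	uint32_t base_lo;
	uint32_t base_hi;
	uint32_t control;
};

static struct hpd_regs hpd_regs_for_pipe(uint64_t hpd_base, unsigned pipe)
{
	uint64_t addr = hpd_base + (uint64_t)pipe * HPD_SIZE;
	struct hpd_regs r;

	/* The base registers hold addr >> 8, so the low 8 bits must be zero. */
	assert((addr & 0xff) == 0);

	r.base_lo = (uint32_t)(addr >> 8);
	r.base_hi = (uint32_t)((addr >> 8) >> 32);
	r.control = HPD_SIZE_LOG2 - 1;
	return r;
}
```
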
> +
> +/* Program the VMID -> PASID mapping for one VMID.
> + * PASID 0 is special: it means to associate no PASID with that VMID.
> + * This function waits for the VMID/PASID mapping to complete.
> + */
> +static void set_vmid_pasid_mapping(struct cik_static_private *priv, unsigned int vmid, pasid_t pasid)
> +{
> +	/* We have to assume that there is no outstanding mapping.
> +	 * The ATC_VMID_PASID_MAPPING_UPDATE_STATUS bit could be 0 because a mapping
> +	 * is in progress or because a mapping finished and the SW cleared it.
> +	 * So the protocol is to always wait & clear.
> +	 */
> +
> +	uint32_t pasid_mapping = (pasid == 0) ? 0 : (uint32_t)pasid | ATC_VMID_PASID_MAPPING_VALID;
> +
> +	WRITE_REG(priv->dev, ATC_VMID0_PASID_MAPPING + vmid*sizeof(uint32_t), pasid_mapping);
> +
> +	while (!(READ_REG(priv->dev, ATC_VMID_PASID_MAPPING_UPDATE_STATUS) & (1U << vmid)))
> +		cpu_relax();
> +	WRITE_REG(priv->dev, ATC_VMID_PASID_MAPPING_UPDATE_STATUS, 1U << vmid);
> +}
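
A sketch of just the value and offset computation from set_vmid_pasid_mapping, without the hardware wait/clear loop. The function names are hypothetical, and treating the PASID as a 16-bit quantity is an assumption based on max_pasid_bits = 16 in kfd_device.c; pasid_t itself is not defined in this hunk.

```c
#include <assert.h>
#include <stdint.h>

#define VMID0_PASID_MAPPING 0x339Cu      /* ATC_VMID0_PASID_MAPPING */
#define MAPPING_VALID       (1u << 31)   /* ATC_VMID_PASID_MAPPING_VALID */

/* Byte offset of the mapping register for a given VMID (one dword per VMID). */
static uint32_t pasid_mapping_reg(unsigned vmid)
{
	assert(vmid < 16);
	return VMID0_PASID_MAPPING + vmid * (uint32_t)sizeof(uint32_t);
}

/* PASID 0 means "no PASID": the register is cleared, VALID bit included. */
static uint32_t pasid_mapping_value(uint16_t pasid)
{
	return pasid == 0 ? 0 : (uint32_t)pasid | MAPPING_VALID;
}
```
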
> +
> +static uint32_t compute_sh_mem_bases_64bit(unsigned int top_address_nybble)
> +{
> +	/* In 64-bit mode, we can only control the top 3 bits of the LDS, scratch and GPUVM apertures.
> +	 * The hardware fills in the remaining 59 bits according to the following pattern:
> +	 * LDS:		X0000000'00000000 - X0000001'00000000 (4GB)
> +	 * Scratch:	X0000001'00000000 - X0000002'00000000 (4GB)
> +	 * GPUVM:	Y0010000'00000000 - Y0020000'00000000 (1TB)
> +	 *
> +	 * (where X/Y is the configurable nybble with the low-bit 0)
> +	 *
> +	 * LDS and scratch will have the same top nybble programmed in the top 3 bits of SH_MEM_BASES.PRIVATE_BASE.
> +	 * GPUVM can have a different top nybble programmed in the top 3 bits of SH_MEM_BASES.SHARED_BASE.
> +	 * We don't bother to support different top nybbles for LDS/Scratch and GPUVM.
> +	 */
> +
> +	BUG_ON((top_address_nybble & 1) || top_address_nybble > 0xE);
> +
> +	return PRIVATE_BASE(top_address_nybble << 12) | SHARED_BASE(top_address_nybble << 12);
> +}
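
As a worked example of the comment above, this sketch computes the SH_MEM_BASES value and the aperture start addresses for a given top nybble. The aperture helpers (lds_base, scratch_base, gpuvm_base) are hypothetical and derived only from the address pattern in the quoted comment, not from hardware documentation.

```c
#include <assert.h>
#include <stdint.h>

/* Same encoding as compute_sh_mem_bases_64bit(): PRIVATE_BASE at bit 0, SHARED_BASE at bit 16. */
static uint32_t sh_mem_bases_64bit(unsigned nybble)
{
	uint32_t field;

	/* Mirrors the BUG_ON: the nybble must be even and at most 0xE. */
	assert((nybble & 1) == 0 && nybble <= 0xE);

	field = (uint32_t)nybble << 12;
	return (field << 0) | (field << 16);
}

/* Aperture starts following the X0000000'00000000 pattern in the comment. */
static uint64_t lds_base(unsigned nybble)
{
	return (uint64_t)nybble << 60;
}

static uint64_t scratch_base(unsigned nybble)
{
	return lds_base(nybble) + (1ULL << 32);
}

static uint64_t gpuvm_base(unsigned nybble)
{
	return lds_base(nybble) | (1ULL << 48);
}
```

For nybble 6 this gives SH_MEM_BASES = 0x60006000, which matches the aperture table in init_ats.
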
> +
> +/* Initial programming for all ATS registers.
> + * - enable ATS for all compute VMIDs
> + * - clear the VMID/PASID mapping for all compute VMIDs
> + * - program the shader core flat address settings:
> + * -- 64-bit mode
> + * -- unaligned access allowed
> + * -- noncached (this is the only CPU-coherent mode in CIK)
> + * -- APE 1 disabled
> + */
> +static void init_ats(struct cik_static_private *priv)
> +{
> +	unsigned int i;
> +
> +	/* Enable self-ringing doorbell recognition and direct the BIF to send
> +	 * untranslated writes to the IOMMU before comparing to the aperture. */
> +	WRITE_REG(priv->dev, BIF_DOORBELL_CNTL, 0);
> +
> +	WRITE_REG(priv->dev, ATC_VM_APERTURE0_CNTL, ATS_ACCESS_MODE_ALWAYS);
> +	WRITE_REG(priv->dev, ATC_VM_APERTURE0_CNTL2, priv->free_vmid_mask);
> +	WRITE_REG(priv->dev, ATC_VM_APERTURE0_LOW_ADDR, 0);
> +	WRITE_REG(priv->dev, ATC_VM_APERTURE0_HIGH_ADDR, 0);
> +
> +	WRITE_REG(priv->dev, ATC_VM_APERTURE1_CNTL, 0);
> +	WRITE_REG(priv->dev, ATC_VM_APERTURE1_CNTL2, 0);
> +	WRITE_REG(priv->dev, ATC_VM_APERTURE1_LOW_ADDR, 0);
> +	WRITE_REG(priv->dev, ATC_VM_APERTURE1_HIGH_ADDR, 0);
> +
> +	lock_srbm_index(priv);
> +
> +	for (i = 0; i < CIK_NUM_VMID; i++) {
> +		if (priv->free_vmid_mask & (1U << i)) {
> +			uint32_t sh_mem_config;
> +
> +			set_vmid_pasid_mapping(priv, i, 0);
> +
> +			vmid_select(priv, i);
> +
> +			sh_mem_config = ALIGNMENT_MODE(SH_MEM_ALIGNMENT_MODE_UNALIGNED);
> +			sh_mem_config |= DEFAULT_MTYPE(MTYPE_NONCACHED);
> +
> +			WRITE_REG(priv->dev, SH_MEM_CONFIG, sh_mem_config);
> +
> +			/* Configure apertures:
> +			 * LDS:		0x60000000'00000000 - 0x60000001'00000000 (4GB)
> +			 * Scratch:	0x60000001'00000000 - 0x60000002'00000000 (4GB)
> +			 * GPUVM:	0x60010000'00000000 - 0x60020000'00000000 (1TB)
> +			 */
> +			WRITE_REG(priv->dev, SH_MEM_BASES, compute_sh_mem_bases_64bit(6));
> +
> +			/* Scratch aperture is not supported for now. */
> +			WRITE_REG(priv->dev, SH_STATIC_MEM_CONFIG, 0);
> +
> +			/* APE1 disabled for now. */
> +			WRITE_REG(priv->dev, SH_MEM_APE1_BASE, 1);
> +			WRITE_REG(priv->dev, SH_MEM_APE1_LIMIT, 0);
> +		}
> +	}
> +
> +	unlock_srbm_index(priv);
> +}
> +
> +static void exit_ats(struct cik_static_private *priv)
> +{
> +	unsigned int i;
> +
> +	for (i = 0; i < CIK_NUM_VMID; i++)
> +		if (priv->free_vmid_mask & (1U << i))
> +			set_vmid_pasid_mapping(priv, i, 0);
> +
> +	WRITE_REG(priv->dev, ATC_VM_APERTURE0_CNTL, ATS_ACCESS_MODE_NEVER);
> +	WRITE_REG(priv->dev, ATC_VM_APERTURE0_CNTL2, 0);
> +}
> +
> +static struct cik_static_private *kfd_scheduler_to_private(struct kfd_scheduler *scheduler)
> +{
> +	return (struct cik_static_private *)scheduler;
> +}
> +
> +static struct cik_static_process *kfd_process_to_private(struct kfd_scheduler_process *process)
> +{
> +	return (struct cik_static_process *)process;
> +}
> +
> +static struct cik_static_queue *kfd_queue_to_private(struct kfd_scheduler_queue *queue)
> +{
> +	return (struct cik_static_queue *)queue;
> +}
> +
> +static int cik_static_create(struct kfd_dev *dev, struct kfd_scheduler **scheduler)
> +{
> +	struct cik_static_private *priv;
> +	unsigned int i;
> +	int err;
> +	void *hpdptr;
> +
> +	priv = kmalloc(sizeof(*priv), GFP_KERNEL);
> +	if (priv == NULL)
> +		return -ENOMEM;
> +
> +	mutex_init(&priv->mutex);
> +
> +	priv->dev = dev;
> +
> +	priv->first_pipe = dev->shared_resources.first_compute_pipe;
> +	priv->num_pipes = dev->shared_resources.compute_pipe_count;
> +
> +	for (i = 0; i < priv->num_pipes * CIK_QUEUES_PER_PIPE; i++)
> +		__set_bit(i, priv->free_queues);
> +
> +	priv->free_vmid_mask = dev->shared_resources.compute_vmid_bitmap;
> +
> +	/*
> +	 * Allocate memory for the HPDs. This is hardware-owned per-pipe data.
> +	 * The driver never accesses this memory after zeroing it. It doesn't even have
> +	 * to be saved/restored on suspend/resume because it contains no data when there
> +	 * are no active queues.
> +	 */
> +	err = radeon_kfd_vidmem_alloc(dev,
> +				      CIK_HPD_SIZE * priv->num_pipes * 2,
> +				      PAGE_SIZE,
> +				      KFD_MEMPOOL_SYSTEM_WRITECOMBINE,
> +				      &priv->hpd_mem);
> +	if (err)
> +		goto err_hpd_alloc;
> +
> +	err = radeon_kfd_vidmem_kmap(dev, priv->hpd_mem, &hpdptr);
> +	if (err)
> +		goto err_hpd_kmap;
> +	memset(hpdptr, 0, CIK_HPD_SIZE * priv->num_pipes);
> +	radeon_kfd_vidmem_unkmap(dev, priv->hpd_mem);
> +
> +	/*
> +	 * Allocate memory for all the MQDs.
> +	 * This is per-queue data that is hardware-owned but initialized by the driver.
> +	 * The driver has to copy this data into HQD registers when a
> +	 * pipe is (re)activated.
> +	 */
> +	err = radeon_kfd_vidmem_alloc(dev,
> +				      sizeof(struct cik_mqd_padded) * priv->num_pipes * CIK_QUEUES_PER_PIPE,
> +				      PAGE_SIZE,
> +				      KFD_MEMPOOL_SYSTEM_CACHEABLE,
> +				      &priv->mqd_mem);
> +	if (err)
> +		goto err_mqd_alloc;
> +	err = radeon_kfd_vidmem_kmap(dev, priv->mqd_mem, (void **)&priv->mqds);
> +	if (err)
> +		goto err_mqd_kmap;
> +
> +	*scheduler = (struct kfd_scheduler *)priv;
> +
> +	return 0;
> +
> +err_mqd_kmap:
> +	radeon_kfd_vidmem_free(dev, priv->mqd_mem);
> +err_mqd_alloc:
> +err_hpd_kmap:
> +	radeon_kfd_vidmem_free(dev, priv->hpd_mem);
> +err_hpd_alloc:
> +	mutex_destroy(&priv->mutex);
> +	kfree(priv);
> +	return err;
> +}
> +
> +static void cik_static_destroy(struct kfd_scheduler *scheduler)
> +{
> +	struct cik_static_private *priv = kfd_scheduler_to_private(scheduler);
> +
> +	radeon_kfd_vidmem_unkmap(priv->dev, priv->mqd_mem);
> +	radeon_kfd_vidmem_free(priv->dev, priv->mqd_mem);
> +	radeon_kfd_vidmem_free(priv->dev, priv->hpd_mem);
> +
> +	mutex_destroy(&priv->mutex);
> +
> +	kfree(priv);
> +}
> +
> +static void cik_static_start(struct kfd_scheduler *scheduler)
> +{
> +	struct cik_static_private *priv = kfd_scheduler_to_private(scheduler);
> +
> +	radeon_kfd_vidmem_gpumap(priv->dev, priv->hpd_mem, &priv->hpd_addr);
> +	radeon_kfd_vidmem_gpumap(priv->dev, priv->mqd_mem, &priv->mqd_addr);
> +
> +	init_pipes(priv);
> +	init_ats(priv);
> +}
> +
> +static void cik_static_stop(struct kfd_scheduler *scheduler)
> +{
> +	struct cik_static_private *priv = kfd_scheduler_to_private(scheduler);
> +
> +	exit_ats(priv);
> +
> +	radeon_kfd_vidmem_ungpumap(priv->dev, priv->hpd_mem);
> +	radeon_kfd_vidmem_ungpumap(priv->dev, priv->mqd_mem);
> +}
> +
> +static bool allocate_vmid(struct cik_static_private *priv, unsigned int *vmid)
> +{
> +	bool ok = false;
> +
> +	mutex_lock(&priv->mutex);
> +
> +	if (priv->free_vmid_mask != 0) {
> +		unsigned int v = __ffs64(priv->free_vmid_mask);
> +
> +		clear_bit(v, &priv->free_vmid_mask);
> +		*vmid = v;
> +
> +		ok = true;
> +	}
> +
> +	mutex_unlock(&priv->mutex);
> +
> +	return ok;
> +}
> +
> +static void release_vmid(struct cik_static_private *priv, unsigned int vmid)
> +{
> +	/* It's okay to race against allocate_vmid because this only adds bits to free_vmid_mask.
> +	 * And set_bit/clear_bit are atomic wrt each other. */
> +	set_bit(vmid, &priv->free_vmid_mask);
> +}
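
The allocate/release pair is a plain free-bitmask allocator. A single-threaded user-space sketch of the idea follows; alloc_id() and release_id() are hypothetical names, and the mutex and atomic set_bit/clear_bit from the patch are deliberately left out, so this shows the bookkeeping only, not the locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Take the lowest set bit out of *free_mask; returns false if nothing is free. */
static bool alloc_id(uint32_t *free_mask, unsigned *id)
{
	unsigned i;

	for (i = 0; i < 32; i++) {
		if (*free_mask & (1u << i)) {
			*free_mask &= ~(1u << i);
			*id = i;
			return true;
		}
	}
	return false;
}

/* Return an ID to the pool by setting its bit again. */
static void release_id(uint32_t *free_mask, unsigned id)
{
	assert(id < 32);
	*free_mask |= 1u << id;
}
```

Starting from a mask of 0xFF00 (VMIDs 8-15, as reserved for KFD earlier in this series), the first allocation hands out 8 and leaves 0xFE00.
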
> +
> +static void setup_vmid_for_process(struct cik_static_private *priv, struct cik_static_process *p)
> +{
> +	set_vmid_pasid_mapping(priv, p->vmid, p->pasid);
> +
> +	/*
> +	 * SH_MEM_CONFIG and others need to be programmed differently
> +	 * for 32/64-bit processes. And maybe other reasons.
> +	 */
> +}
> +
> +static int
> +cik_static_register_process(struct kfd_scheduler *scheduler, struct kfd_process *process,
> +			    struct kfd_scheduler_process **scheduler_process)
> +{
> +	struct cik_static_private *priv = kfd_scheduler_to_private(scheduler);
> +
> +	struct cik_static_process *hwp;
> +
> +	hwp = kmalloc(sizeof(*hwp), GFP_KERNEL);
> +	if (hwp == NULL)
> +		return -ENOMEM;
> +
> +	if (!allocate_vmid(priv, &hwp->vmid)) {
> +		kfree(hwp);
> +		return -ENOMEM;
> +	}
> +
> +	hwp->pasid = process->pasid;
> +
> +	setup_vmid_for_process(priv, hwp);
> +
> +	*scheduler_process = (struct kfd_scheduler_process *)hwp;
> +
> +	return 0;
> +}
> +
> +static void cik_static_deregister_process(struct kfd_scheduler *scheduler,
> +				struct kfd_scheduler_process *scheduler_process)
> +{
> +	struct cik_static_private *priv = kfd_scheduler_to_private(scheduler);
> +	struct cik_static_process *pp = kfd_process_to_private(scheduler_process);
> +
> +	release_vmid(priv, pp->vmid);
> +	kfree(pp);
> +}
> +
> +static bool allocate_hqd(struct cik_static_private *priv, unsigned int *queue)
> +{
> +	bool ok = false;
> +	unsigned int q;
> +
> +	mutex_lock(&priv->mutex);
> +
> +	q = find_first_bit(priv->free_queues, priv->num_pipes * CIK_QUEUES_PER_PIPE);
> +
> +	if (q != priv->num_pipes * CIK_QUEUES_PER_PIPE) {
> +		clear_bit(q, priv->free_queues);
> +		*queue = q;
> +
> +		ok = true;
> +	}
> +
> +	mutex_unlock(&priv->mutex);
> +
> +	return ok;
> +}
> +
> +static void release_hqd(struct cik_static_private *priv, unsigned int queue)
> +{
> +	/* It's okay to race against allocate_hqd because this only adds bits to free_queues.
> +	 * And set_bit/clear_bit are atomic wrt each other. */
> +	set_bit(queue, priv->free_queues);
> +}
> +
> +static void init_mqd(const struct cik_static_queue *queue, const struct cik_static_process *process)
> +{
> +	struct cik_mqd *mqd = queue->mqd;
> +
> +	memset(mqd, 0, sizeof(*mqd));
> +
> +	mqd->header = 0xC0310800;
> +	mqd->pipeline_stat_enable = 1;
> +	mqd->static_thread_mgmt01[0] = 0xffffffff;
> +	mqd->static_thread_mgmt01[1] = 0xffffffff;
> +	mqd->static_thread_mgmt23[0] = 0xffffffff;
> +	mqd->static_thread_mgmt23[1] = 0xffffffff;
> +
> +	mqd->queue_state.cp_mqd_base_addr = lower_32(queue->mqd_addr);
> +	mqd->queue_state.cp_mqd_base_addr_hi = upper_32(queue->mqd_addr);
> +	mqd->queue_state.cp_mqd_control = MQD_CONTROL_PRIV_STATE_EN;
> +
> +	mqd->queue_state.cp_hqd_pq_base = lower_32((uintptr_t)queue->pq_addr >> 8);
> +	mqd->queue_state.cp_hqd_pq_base_hi = upper_32((uintptr_t)queue->pq_addr >> 8);
> +	mqd->queue_state.cp_hqd_pq_control = QUEUE_SIZE(queue->queue_size_encoded) | DEFAULT_RPTR_BLOCK_SIZE
> +					    | DEFAULT_MIN_AVAIL_SIZE | PQ_ATC_EN;
> +	mqd->queue_state.cp_hqd_pq_rptr_report_addr = lower_32((uintptr_t)queue->rptr_address);
> +	mqd->queue_state.cp_hqd_pq_rptr_report_addr_hi = upper_32((uintptr_t)queue->rptr_address);
> +	mqd->queue_state.cp_hqd_pq_doorbell_control = DOORBELL_OFFSET(queue->doorbell_index) | DOORBELL_EN;
> +	mqd->queue_state.cp_hqd_vmid = process->vmid;
> +	mqd->queue_state.cp_hqd_active = 1;
> +
> +	mqd->queue_state.cp_hqd_persistent_state = DEFAULT_CP_HQD_PERSISTENT_STATE;
> +
> +	/* The values for these 3 are from WinKFD. */
> +	mqd->queue_state.cp_hqd_quantum = QUANTUM_EN | QUANTUM_SCALE_1MS | QUANTUM_DURATION(10);
> +	mqd->queue_state.cp_hqd_pipe_priority = 1;
> +	mqd->queue_state.cp_hqd_queue_priority = 15;
> +
> +	mqd->queue_state.cp_hqd_ib_control = IB_ATC_EN | DEFAULT_MIN_IB_AVAIL_SIZE;
> +}
> +
> +/* Write the HQD registers and activate the queue.
> + * Requires that SRBM_GFX_CNTL has already been programmed for the queue.
> + */
> +static void load_hqd(struct cik_static_private *priv, struct cik_static_queue *queue)
> +{
> +	struct kfd_dev *dev = priv->dev;
> +	const struct cik_hqd_registers *qs = &queue->mqd->queue_state;
> +
> +	WRITE_REG(dev, CP_MQD_BASE_ADDR, qs->cp_mqd_base_addr);
> +	WRITE_REG(dev, CP_MQD_BASE_ADDR_HI, qs->cp_mqd_base_addr_hi);
> +	WRITE_REG(dev, CP_MQD_CONTROL, qs->cp_mqd_control);
> +
> +	WRITE_REG(dev, CP_HQD_PQ_BASE, qs->cp_hqd_pq_base);
> +	WRITE_REG(dev, CP_HQD_PQ_BASE_HI, qs->cp_hqd_pq_base_hi);
> +	WRITE_REG(dev, CP_HQD_PQ_CONTROL, qs->cp_hqd_pq_control);
> +	/* DOORBELL_CONTROL before WPTR because WPTR writes are dropped if DOORBELL_HIT is set. */
> +	WRITE_REG(dev, CP_HQD_PQ_DOORBELL_CONTROL, qs->cp_hqd_pq_doorbell_control);
> +	WRITE_REG(dev, CP_HQD_PQ_WPTR, qs->cp_hqd_pq_wptr);
> +	WRITE_REG(dev, CP_HQD_PQ_RPTR, qs->cp_hqd_pq_rptr);
> +	WRITE_REG(dev, CP_HQD_PQ_RPTR_REPORT_ADDR, qs->cp_hqd_pq_rptr_report_addr);
> +	WRITE_REG(dev, CP_HQD_PQ_RPTR_REPORT_ADDR_HI, qs->cp_hqd_pq_rptr_report_addr_hi);
> +
> +	WRITE_REG(dev, CP_HQD_VMID, qs->cp_hqd_vmid);
> +	WRITE_REG(dev, CP_HQD_PERSISTENT_STATE, qs->cp_hqd_persistent_state);
> +	WRITE_REG(dev, CP_HQD_QUANTUM, qs->cp_hqd_quantum);
> +	WRITE_REG(dev, CP_HQD_PIPE_PRIORITY, qs->cp_hqd_pipe_priority);
> +	WRITE_REG(dev, CP_HQD_QUEUE_PRIORITY, qs->cp_hqd_queue_priority);
> +
> +	WRITE_REG(dev, CP_HQD_IB_CONTROL, qs->cp_hqd_ib_control);
> +	WRITE_REG(dev, CP_HQD_IB_BASE_ADDR, qs->cp_hqd_ib_base_addr);
> +	WRITE_REG(dev, CP_HQD_IB_BASE_ADDR_HI, qs->cp_hqd_ib_base_addr_hi);
> +	WRITE_REG(dev, CP_HQD_IB_RPTR, qs->cp_hqd_ib_rptr);
> +	WRITE_REG(dev, CP_HQD_SEMA_CMD, qs->cp_hqd_sema_cmd);
> +	WRITE_REG(dev, CP_HQD_MSG_TYPE, qs->cp_hqd_msg_type);
> +	WRITE_REG(dev, CP_HQD_ATOMIC0_PREOP_LO, qs->cp_hqd_atomic0_preop_lo);
> +	WRITE_REG(dev, CP_HQD_ATOMIC0_PREOP_HI, qs->cp_hqd_atomic0_preop_hi);
> +	WRITE_REG(dev, CP_HQD_ATOMIC1_PREOP_LO, qs->cp_hqd_atomic1_preop_lo);
> +	WRITE_REG(dev, CP_HQD_ATOMIC1_PREOP_HI, qs->cp_hqd_atomic1_preop_hi);
> +	WRITE_REG(dev, CP_HQD_HQ_SCHEDULER0, qs->cp_hqd_hq_scheduler0);
> +	WRITE_REG(dev, CP_HQD_HQ_SCHEDULER1, qs->cp_hqd_hq_scheduler1);
> +
> +	WRITE_REG(dev, CP_HQD_ACTIVE, 1);
> +}
> +
> +static void activate_queue(struct cik_static_private *priv, struct cik_static_queue *queue)
> +{
> +	bool wptr_shadow_valid;
> +	doorbell_t wptr_shadow;
> +
> +	/* Avoid sleeping while holding the SRBM lock. */
> +	wptr_shadow_valid = !get_user(wptr_shadow, queue->wptr_address);
> +
> +	lock_srbm_index(priv);
> +	queue_select(priv, queue->queue);
> +
> +	load_hqd(priv, queue);
> +
> +	/* Doorbell and wptr are special because there is a race when reactivating a queue.
> +	 * Since doorbell writes to deactivated queues are ignored by hardware, the application
> +	 * shadows the doorbell into memory at queue->wptr_address.
> +	 *
> +	 * We want the queue to automatically resume processing as if it were always active,
> +	 * so we want to copy from queue->wptr_address into the wptr/doorbell.
> +	 *
> +	 * The race is that the app could write a new wptr into the doorbell before we
> +	 * write the shadowed wptr, resulting in an old wptr written later.
> +	 *
> +	 * The hardware solves this by ignoring CP_HQD_PQ_WPTR writes after a doorbell write.
> +	 * So the KFD can activate the doorbell then write the shadow wptr to CP_HQD_PQ_WPTR
> +	 * knowing it will be ignored if the user has written a more-recent doorbell.
> +	 */
> +	if (wptr_shadow_valid)
> +		WRITE_REG(priv->dev, CP_HQD_PQ_WPTR, wptr_shadow);
> +
> +	unlock_srbm_index(priv);
> +}
> +
> +static void drain_hqd(struct cik_static_private *priv)
> +{
> +	WRITE_REG(priv->dev, CP_HQD_DEQUEUE_REQUEST, DEQUEUE_REQUEST_DRAIN);
> +}
> +
> +static void wait_hqd_inactive(struct cik_static_private *priv)
> +{
> +	while (READ_REG(priv->dev, CP_HQD_ACTIVE) != 0)
> +		cpu_relax();
> +}
> +
> +static void deactivate_queue(struct cik_static_private *priv, struct cik_static_queue *queue)
> +{
> +	lock_srbm_index(priv);
> +	queue_select(priv, queue->queue);
> +
> +	drain_hqd(priv);
> +	wait_hqd_inactive(priv);
> +
> +	unlock_srbm_index(priv);
> +}
> +
> +#define BIT_MASK_64(high, low) (((1ULL << (high)) - 1) & ~((1ULL << (low)) - 1))
> +#define RING_ADDRESS_BAD_BIT_MASK (~BIT_MASK_64(48, 8))
> +#define RWPTR_ADDRESS_BAD_BIT_MASK (~BIT_MASK_64(48, 2))
> +
> +#define MAX_QUEUE_SIZE (1ULL << 32)
> +#define MIN_QUEUE_SIZE (1ULL << 10)
> +
> +static int
> +cik_static_create_queue(struct kfd_scheduler *scheduler,
> +			struct kfd_scheduler_process *process,
> +			struct kfd_scheduler_queue *queue,
> +			void __user *ring_address,
> +			uint64_t ring_size,
> +			void __user *rptr_address,
> +			void __user *wptr_address,
> +			unsigned int doorbell)
> +{
> +	struct cik_static_private *priv = kfd_scheduler_to_private(scheduler);
> +	struct cik_static_process *hwp = kfd_process_to_private(process);
> +	struct cik_static_queue *hwq = kfd_queue_to_private(queue);
> +
> +	if ((uint64_t)ring_address & RING_ADDRESS_BAD_BIT_MASK
> +	    || (uint64_t)rptr_address & RWPTR_ADDRESS_BAD_BIT_MASK
> +	    || (uint64_t)wptr_address & RWPTR_ADDRESS_BAD_BIT_MASK)
> +		return -EINVAL;
> +
> +	if (ring_size > MAX_QUEUE_SIZE || ring_size < MIN_QUEUE_SIZE || !is_power_of_2(ring_size))
> +		return -EINVAL;
> +
> +	if (!allocate_hqd(priv, &hwq->queue))
> +		return -ENOMEM;
> +
> +	hwq->mqd_addr = priv->mqd_addr + sizeof(struct cik_mqd_padded) * hwq->queue;
> +	hwq->mqd = &priv->mqds[hwq->queue].mqd;
> +	hwq->pq_addr = ring_address;
> +	hwq->rptr_address = rptr_address;
> +	hwq->wptr_address = wptr_address;
> +	hwq->doorbell_index = doorbell;
> +	hwq->queue_size_encoded = ilog2(ring_size) - 3;
> +
> +	init_mqd(hwq, hwp);
> +	activate_queue(priv, hwq);
> +
> +	return 0;
> +}
> +
> +static void
> +cik_static_destroy_queue(struct kfd_scheduler *scheduler, struct kfd_scheduler_queue *queue)
> +{
> +	struct cik_static_private *priv = kfd_scheduler_to_private(scheduler);
> +	struct cik_static_queue *hwq = kfd_queue_to_private(queue);
> +
> +	deactivate_queue(priv, hwq);
> +
> +	release_hqd(priv, hwq->queue);
> +}
> +
> +const struct kfd_scheduler_class radeon_kfd_cik_static_scheduler_class = {
> +	.name = "CIK static scheduler",
> +	.create = cik_static_create,
> +	.destroy = cik_static_destroy,
> +	.start = cik_static_start,
> +	.stop = cik_static_stop,
> +	.register_process = cik_static_register_process,
> +	.deregister_process = cik_static_deregister_process,
> +	.queue_size = sizeof(struct cik_static_queue),
> +	.create_queue = cik_static_create_queue,
> +	.destroy_queue = cik_static_destroy_queue,
> +};
> diff --git a/drivers/gpu/hsa/radeon/kfd_vidmem.c b/drivers/gpu/hsa/radeon/kfd_vidmem.c
> new file mode 100644
> index 0000000..c8d3770
> --- /dev/null
> +++ b/drivers/gpu/hsa/radeon/kfd_vidmem.c
> @@ -0,0 +1,61 @@
> +/*
> + * Copyright 2014 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + */
> +
> +#include "kfd_priv.h"
> +
> +int radeon_kfd_vidmem_alloc(struct kfd_dev *kfd, size_t size, size_t alignment,
> +				enum kfd_mempool pool, kfd_mem_obj *mem_obj)
> +{
> +	return kfd2kgd->allocate_mem(kfd->kgd,
> +					size,
> +					alignment,
> +					(enum kgd_memory_pool)pool,
> +					(struct kgd_mem **)mem_obj);
> +}
> +
> +void radeon_kfd_vidmem_free(struct kfd_dev *kfd, kfd_mem_obj mem_obj)
> +{
> +	kfd2kgd->free_mem(kfd->kgd, (struct kgd_mem *)mem_obj);
> +}
> +
> +int radeon_kfd_vidmem_gpumap(struct kfd_dev *kfd, kfd_mem_obj mem_obj,
> +				uint64_t *vmid0_address)
> +{
> +	return kfd2kgd->gpumap_mem(kfd->kgd,
> +					(struct kgd_mem *)mem_obj,
> +					vmid0_address);
> +}
> +
> +void radeon_kfd_vidmem_ungpumap(struct kfd_dev *kfd, kfd_mem_obj mem_obj)
> +{
> +	kfd2kgd->ungpumap_mem(kfd->kgd, (struct kgd_mem *)mem_obj);
> +}
> +
> +int radeon_kfd_vidmem_kmap(struct kfd_dev *kfd, kfd_mem_obj mem_obj, void **ptr)
> +{
> +	return kfd2kgd->kmap_mem(kfd->kgd, (struct kgd_mem *)mem_obj, ptr);
> +}
> +
> +void radeon_kfd_vidmem_unkmap(struct kfd_dev *kfd, kfd_mem_obj mem_obj)
> +{
> +	kfd2kgd->unkmap_mem(kfd->kgd, (struct kgd_mem *)mem_obj);
> +}
> -- 
> 1.9.1
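
The doorbell/wptr ordering described in the activate_queue() comment can be
illustrated with a small user-space model. This is only a sketch of the rule
stated in that comment (a register write to the wptr is dropped once a doorbell
write has landed); struct hqd_model and every function name below are
hypothetical and do not exist in the patch or in the hardware documentation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of one hardware queue slot; all names are illustrative only. */
struct hqd_model {
	bool doorbell_enabled;	/* doorbell writes are accepted */
	bool doorbell_hit;	/* a doorbell write has landed since enable */
	uint32_t wptr;		/* current write pointer */
};

/* Driver side: enable the doorbell first, as load_hqd() does. */
static void hqd_enable_doorbell(struct hqd_model *q)
{
	assert(q != NULL);
	q->doorbell_enabled = true;
	q->doorbell_hit = false;
}

/* Application side: a doorbell write is ignored while the queue is inactive. */
static void hqd_doorbell_write(struct hqd_model *q, uint32_t wptr)
{
	if (!q->doorbell_enabled)
		return;
	q->wptr = wptr;
	q->doorbell_hit = true;
}

/* Driver side: a register write to the wptr is dropped after a doorbell hit. */
static void hqd_reg_write_wptr(struct hqd_model *q, uint32_t wptr)
{
	if (q->doorbell_hit)
		return;
	q->wptr = wptr;
}

/* Replay the reactivation sequence and return the wptr the queue ends up with. */
static uint32_t activate_then_restore(bool app_rings_first, uint32_t shadow,
				      uint32_t fresh)
{
	struct hqd_model q = {0};

	hqd_enable_doorbell(&q);
	if (app_rings_first)
		hqd_doorbell_write(&q, fresh);	/* app wins the race */
	hqd_reg_write_wptr(&q, shadow);		/* driver restores the shadow */
	return q.wptr;
}
```

In this model the stale shadow value never overwrites a newer doorbell value,
whichever side runs first, which is the property the comment relies on.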
> 
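For reference, the argument checks in cik_static_create_queue() can be tried
out in user space. The macros below are copied from the patch; is_pow2(),
ilog2_u64() and encode_queue_size() are hypothetical stand-ins for the kernel's
is_power_of_2()/ilog2() and for the checks in the function body. Reading the
"- 3" as "log2 of the size in dwords, minus one" is an assumption, not
something the patch states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Copied from the patch: bits [low, high) set. */
#define BIT_MASK_64(high, low) (((1ULL << (high)) - 1) & ~((1ULL << (low)) - 1))
#define RING_ADDRESS_BAD_BIT_MASK (~BIT_MASK_64(48, 8))	/* 256-byte aligned, below 2^48 */
#define RWPTR_ADDRESS_BAD_BIT_MASK (~BIT_MASK_64(48, 2))	/* 4-byte aligned, below 2^48 */

#define MAX_QUEUE_SIZE (1ULL << 32)
#define MIN_QUEUE_SIZE (1ULL << 10)

/* Stand-in for the kernel's is_power_of_2(). */
static bool is_pow2(uint64_t v)
{
	return v != 0 && (v & (v - 1)) == 0;
}

/* Stand-in for the kernel's ilog2(); v must be non-zero. */
static unsigned int ilog2_u64(uint64_t v)
{
	unsigned int r = 0;

	assert(v != 0);
	while (v >>= 1)
		r++;
	return r;
}

/* Returns the encoded queue size, or -1 where the patch returns -EINVAL. */
static int encode_queue_size(uint64_t ring, uint64_t ring_size,
			     uint64_t rptr, uint64_t wptr)
{
	if ((ring & RING_ADDRESS_BAD_BIT_MASK) ||
	    (rptr & RWPTR_ADDRESS_BAD_BIT_MASK) ||
	    (wptr & RWPTR_ADDRESS_BAD_BIT_MASK))
		return -1;

	if (ring_size > MAX_QUEUE_SIZE || ring_size < MIN_QUEUE_SIZE ||
	    !is_pow2(ring_size))
		return -1;

	return (int)ilog2_u64(ring_size) - 3;
}
```

A 4 KiB ring at a 256-byte-aligned address encodes to 9; a ring address with
any of bits 0-7 set, or an rptr/wptr address that is not 4-byte aligned, is
rejected.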

^ permalink raw reply	[flat|nested] 116+ messages in thread

* RE: [PATCH 09/83] hsa/radeon: Add code base of hsa driver for AMD's GPUs
  2014-07-11 18:10         ` Jerome Glisse
@ 2014-07-11 18:46           ` Bridgman, John
  -1 siblings, 0 replies; 116+ messages in thread
From: Bridgman, John @ 2014-07-11 18:46 UTC (permalink / raw)
  To: Jerome Glisse
  Cc: Oded Gabbay, David Airlie, Deucher, Alexander, linux-kernel,
	dri-devel, Lewycky, Andrew, Joerg Roedel, Gabbay, Oded,
	Greg Kroah-Hartman, Rafael J. Wysocki, Kishon Vijay Abraham I,
	Sandeep Nair, Kenneth Heitke, Srinivas Pandruvada,
	Santosh Shilimkar, Andreas Noever, Lucas Stach, Philipp Zabel



>-----Original Message-----
>From: Jerome Glisse [mailto:j.glisse@gmail.com]
>Sent: Friday, July 11, 2014 2:11 PM
>To: Bridgman, John
>Cc: Oded Gabbay; David Airlie; Deucher, Alexander; linux-
>kernel@vger.kernel.org; dri-devel@lists.freedesktop.org; Lewycky, Andrew;
>Joerg Roedel; Gabbay, Oded; Greg Kroah-Hartman; Rafael J. Wysocki; Kishon
>Vijay Abraham I; Sandeep Nair; Kenneth Heitke; Srinivas Pandruvada;
>Santosh Shilimkar; Andreas Noever; Lucas Stach; Philipp Zabel
>Subject: Re: [PATCH 09/83] hsa/radeon: Add code base of hsa driver for
>AMD's GPUs
>
>On Fri, Jul 11, 2014 at 06:02:39PM +0000, Bridgman, John wrote:
>> >From: Jerome Glisse [mailto:j.glisse@gmail.com]
>> >Sent: Friday, July 11, 2014 1:04 PM
>> >To: Oded Gabbay
>> >Cc: David Airlie; Deucher, Alexander; linux-kernel@vger.kernel.org;
>> >dri- devel@lists.freedesktop.org; Bridgman, John; Lewycky, Andrew;
>> >Joerg Roedel; Gabbay, Oded; Greg Kroah-Hartman; Rafael J. Wysocki;
>> >Kishon Vijay Abraham I; Sandeep Nair; Kenneth Heitke; Srinivas
>> >Pandruvada; Santosh Shilimkar; Andreas Noever; Lucas Stach; Philipp
>> >Zabel
>> >Subject: Re: [PATCH 09/83] hsa/radeon: Add code base of hsa driver
>> >for AMD's GPUs
>> >
>> >On Fri, Jul 11, 2014 at 12:50:09AM +0300, Oded Gabbay wrote:
>> >> This patch adds the code base of the hsa driver for AMD's GPUs.
>> >>
>> >> This driver is called kfd.
>> >>
>> >> This initial version supports the first HSA chip, Kaveri.
>> >>
>> >> This driver is located in a new directory structure under drivers/gpu.
>> >>
>> >> Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
>> >
>> >There are too many coding style issues. While we have been lax on
>> >enforcing the scripts/checkpatch.pl rules, I think there is a limit to
>> >that. I am not strict on the 80 chars per line, but other things need
>> >fixing so we stay in line.
>> >
>> >Also i am a bit worried about the license, given top comment in each
>> >of the files i am not sure this is GPL2 compatible. I would need to
>> >ask lawyer to review that.
>> >
>>
>> Hi Jerome,
>>
>> Which line in the license are you concerned about? In theory we're using
>> the same license as the initial code pushes for radeon, and I just did a
>> side-by-side compare with the license header on cik.c in the radeon tree
>> and confirmed that the two licenses are identical.
>>
>> The cik.c header has an additional "Authors:" line which the kfd files do
>> not, but AFAIK that is not part of the license text proper.
>>
>
>You cannot claim GPL if you want to use this license. radeon is a weird beast
>for historical reasons: we wanted to share code with BSD, thus it is dual
>licensed, and this is reflected with:
>MODULE_LICENSE("GPL and additional rights");
>
>inside radeon_drv.c
>
>So if you want to have MODULE_LICENSE("GPL"), then you should have a header
>that uses the GPL license wording and no wording from a BSD-like license.
>Otherwise change the MODULE_LICENSE, and it would also be good to say
>"dual licensed" at the top of each file (or at least next to each license)
>so that it is clear this is a BSD & GPL license.

Got it. Missed that we had a different MODULE_LICENSE.

Since the goal is license compatibility with radeon, so that we can update the interface and move code between the drivers in future, my preference would be to update MODULE_LICENSE in the kfd code to "GPL and additional rights". Do you think that would be OK?
>
>Cheers,
>Jérôme

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 12/83] hsa/radeon: Add kfd mmap handler
  2014-07-10 21:50 ` [PATCH 12/83] hsa/radeon: Add kfd mmap handler Oded Gabbay
@ 2014-07-11 18:47     ` Jerome Glisse
  0 siblings, 0 replies; 116+ messages in thread
From: Jerome Glisse @ 2014-07-11 18:47 UTC (permalink / raw)
  To: Oded Gabbay
  Cc: David Airlie, Alex Deucher, linux-kernel, dri-devel,
	John Bridgman, Andrew Lewycky, Joerg Roedel, Oded Gabbay

On Fri, Jul 11, 2014 at 12:50:12AM +0300, Oded Gabbay wrote:
> This patch adds the kfd mmap handler that maps the physical address
> of a doorbell page to a user-space virtual address. That virtual address
> belongs to the process that uses the doorbell page.
> 
> This mmap handler is called only from within the kernel and not to be
> called from user-mode mmap of /dev/kfd.

I think you need to make the max doorbell count a function of the page size.
You definitely want to forbid any access to another process's doorbells, and
you can only map pages with PAGE_SIZE granularity, hence you need to
scale the max number of doorbells with the page size rather than
assume the page size is 4k on x86. Someone might build a kernel with a
different page size, and using this driver on such a kernel would open
several security issues.
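
A rough user-space sketch of the arithmetic this implies: round the per-process
doorbell allocation up to whole pages, then derive the doorbell limit from that
allocation. The helper names, the minimum of 1024 doorbells and the 4-byte
doorbell size used in the examples are assumptions for illustration, not values
taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Round bytes up to a multiple of page_size (page_size must be a power of 2). */
static size_t round_up_to_page(size_t bytes, size_t page_size)
{
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
	return (bytes + page_size - 1) & ~(page_size - 1);
}

/* Bytes reserved per process: always whole pages, so no page is shared. */
static size_t doorbell_allocation_bytes(size_t min_doorbells,
					size_t doorbell_size,
					size_t page_size)
{
	return round_up_to_page(min_doorbells * doorbell_size, page_size);
}

/* Largest doorbell count that fits in the per-process allocation. */
static size_t max_doorbells_per_process(size_t min_doorbells,
					size_t doorbell_size,
					size_t page_size)
{
	return doorbell_allocation_bytes(min_doorbells, doorbell_size,
					 page_size) / doorbell_size;
}
```

With 4-byte doorbells and a minimum of 1024 per process, a 4k page gives a
4096-byte allocation and 1024 doorbells, while a 64k page gives a 65536-byte
allocation and 16384 doorbells, so each process still owns whole pages.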

Cheers,
Jérôme

> 
> Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
> ---
>  drivers/gpu/hsa/radeon/kfd_chardev.c  | 20 +++++++++
>  drivers/gpu/hsa/radeon/kfd_doorbell.c | 85 +++++++++++++++++++++++++++++++++++
>  2 files changed, 105 insertions(+)
> 
> diff --git a/drivers/gpu/hsa/radeon/kfd_chardev.c b/drivers/gpu/hsa/radeon/kfd_chardev.c
> index 7a56a8f..0b5bc74 100644
> --- a/drivers/gpu/hsa/radeon/kfd_chardev.c
> +++ b/drivers/gpu/hsa/radeon/kfd_chardev.c
> @@ -39,6 +39,7 @@ static const struct file_operations kfd_fops = {
>  	.owner = THIS_MODULE,
>  	.unlocked_ioctl = kfd_ioctl,
>  	.open = kfd_open,
> +	.mmap = kfd_mmap,
>  };
>  
>  static int kfd_char_dev_major = -1;
> @@ -131,3 +132,22 @@ kfd_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
>  
>  	return err;
>  }
> +
> +static int
> +kfd_mmap(struct file *filp, struct vm_area_struct *vma)
> +{
> +	unsigned long pgoff = vma->vm_pgoff;
> +	struct kfd_process *process;
> +
> +	process = radeon_kfd_get_process(current);
> +	if (IS_ERR(process))
> +		return PTR_ERR(process);
> +
> +	if (pgoff < KFD_MMAP_DOORBELL_START)
> +		return -EINVAL;
> +
> +	if (pgoff < KFD_MMAP_DOORBELL_END)
> +		return radeon_kfd_doorbell_mmap(process, vma);
> +
> +	return -EINVAL;
> +}
> diff --git a/drivers/gpu/hsa/radeon/kfd_doorbell.c b/drivers/gpu/hsa/radeon/kfd_doorbell.c
> index 79a9d4b..e1d8506 100644
> --- a/drivers/gpu/hsa/radeon/kfd_doorbell.c
> +++ b/drivers/gpu/hsa/radeon/kfd_doorbell.c
> @@ -70,3 +70,88 @@ void radeon_kfd_doorbell_init(struct kfd_dev *kfd)
>  	kfd->doorbell_process_limit = doorbell_process_limit;
>  }
>  
> +/* This is the /dev/kfd mmap (for doorbell) implementation. We intend that this is only called through map_doorbells,
> +** not through user-mode mmap of /dev/kfd. */
> +int radeon_kfd_doorbell_mmap(struct kfd_process *process, struct vm_area_struct *vma)
> +{
> +	unsigned int device_index;
> +	struct kfd_dev *dev;
> +	phys_addr_t start;
> +
> +	BUG_ON(vma->vm_pgoff < KFD_MMAP_DOORBELL_START || vma->vm_pgoff >= KFD_MMAP_DOORBELL_END);
> +
> +	/* For simplicity we only allow mapping of the entire doorbell allocation of a single device & process. */
> +	if (vma->vm_end - vma->vm_start != doorbell_process_allocation())
> +		return -EINVAL;
> +
> +	/* device_index must be GPU ID!! */
> +	device_index = vma->vm_pgoff - KFD_MMAP_DOORBELL_START;
> +
> +	dev = radeon_kfd_device_by_id(device_index);
> +	if (dev == NULL)
> +		return -EINVAL;
> +
> +	vma->vm_flags |= VM_IO | VM_DONTCOPY | VM_DONTEXPAND | VM_NORESERVE | VM_DONTDUMP | VM_PFNMAP;
> +	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
> +
> +	start = dev->doorbell_base + process->pasid * doorbell_process_allocation();
> +
> +	pr_debug("kfd: mapping doorbell page in radeon_kfd_doorbell_mmap\n"
> +		 "     target user address == 0x%016llX\n"
> +		 "     physical address    == 0x%016llX\n"
> +		 "     vm_flags            == 0x%08lX\n"
> +		 "     size                == 0x%08lX\n",
> +		 (long long unsigned int) vma->vm_start, start, vma->vm_flags,
> +		 doorbell_process_allocation());
> +
> +	return io_remap_pfn_range(vma,
> +				vma->vm_start,
> +				start >> PAGE_SHIFT,
> +				doorbell_process_allocation(),
> +				vma->vm_page_prot);
> +}
> +
> +/* Map the doorbells for a single process & device. This will indirectly call radeon_kfd_doorbell_mmap.
> +** This assumes that the process mutex is being held. */
> +static int
> +map_doorbells(struct file *devkfd, struct kfd_process *process, struct kfd_dev *dev)
> +{
> +	struct kfd_process_device *pdd = radeon_kfd_get_process_device_data(dev, process);
> +
> +	if (pdd == NULL)
> +		return -ENOMEM;
> +
> +	if (pdd->doorbell_mapping == NULL) {
> +		unsigned long offset = (KFD_MMAP_DOORBELL_START + dev->id) << PAGE_SHIFT;
> +		doorbell_t __user *doorbell_mapping;
> +
> +		doorbell_mapping = (doorbell_t __user *)vm_mmap(devkfd, 0, doorbell_process_allocation(), PROT_WRITE,
> +								MAP_SHARED, offset);
> +		if (IS_ERR(doorbell_mapping))
> +			return PTR_ERR(doorbell_mapping);
> +
> +		pdd->doorbell_mapping = doorbell_mapping;
> +	}
> +
> +	return 0;
> +}
> +
> +/* Get the user-mode address of a doorbell. Assumes that the process mutex is being held. */
> +doorbell_t __user *radeon_kfd_get_doorbell(struct file *devkfd, struct kfd_process *process, struct kfd_dev *dev,
> +					   unsigned int doorbell_index)
> +{
> +	struct kfd_process_device *pdd;
> +	int err;
> +
> +	BUG_ON(doorbell_index > MAX_DOORBELL_INDEX);
> +
> +	err = map_doorbells(devkfd, process, dev);
> +	if (err)
> +		return ERR_PTR(err);
> +
> +	pdd = radeon_kfd_get_process_device_data(dev, process);
> +	BUG_ON(pdd == NULL); /* map_doorbells would have failed otherwise */
> +
> +	return &pdd->doorbell_mapping[doorbell_index];
> +}
> +
> -- 
> 1.9.1
> 

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 09/83] hsa/radeon: Add code base of hsa driver for AMD's GPUs
  2014-07-11 18:46           ` Bridgman, John
@ 2014-07-11 18:51             ` Jerome Glisse
  -1 siblings, 0 replies; 116+ messages in thread
From: Jerome Glisse @ 2014-07-11 18:51 UTC (permalink / raw)
  To: Bridgman, John
  Cc: Oded Gabbay, David Airlie, Deucher, Alexander, linux-kernel,
	dri-devel, Lewycky, Andrew, Joerg Roedel, Gabbay, Oded,
	Greg Kroah-Hartman, Rafael J. Wysocki, Kishon Vijay Abraham I,
	Sandeep Nair, Kenneth Heitke, Srinivas Pandruvada,
	Santosh Shilimkar, Andreas Noever, Lucas Stach, Philipp Zabel

On Fri, Jul 11, 2014 at 06:46:30PM +0000, Bridgman, John wrote:
> >From: Jerome Glisse [mailto:j.glisse@gmail.com]
> >Sent: Friday, July 11, 2014 2:11 PM
> >To: Bridgman, John
> >Cc: Oded Gabbay; David Airlie; Deucher, Alexander; linux-
> >kernel@vger.kernel.org; dri-devel@lists.freedesktop.org; Lewycky, Andrew;
> >Joerg Roedel; Gabbay, Oded; Greg Kroah-Hartman; Rafael J. Wysocki; Kishon
> >Vijay Abraham I; Sandeep Nair; Kenneth Heitke; Srinivas Pandruvada;
> >Santosh Shilimkar; Andreas Noever; Lucas Stach; Philipp Zabel
> >Subject: Re: [PATCH 09/83] hsa/radeon: Add code base of hsa driver for
> >AMD's GPUs
> >
> >On Fri, Jul 11, 2014 at 06:02:39PM +0000, Bridgman, John wrote:
> >> >From: Jerome Glisse [mailto:j.glisse@gmail.com]
> >> >Sent: Friday, July 11, 2014 1:04 PM
> >> >To: Oded Gabbay
> >> >Cc: David Airlie; Deucher, Alexander; linux-kernel@vger.kernel.org;
> >> >dri- devel@lists.freedesktop.org; Bridgman, John; Lewycky, Andrew;
> >> >Joerg Roedel; Gabbay, Oded; Greg Kroah-Hartman; Rafael J. Wysocki;
> >> >Kishon Vijay Abraham I; Sandeep Nair; Kenneth Heitke; Srinivas
> >> >Pandruvada; Santosh Shilimkar; Andreas Noever; Lucas Stach; Philipp
> >> >Zabel
> >> >Subject: Re: [PATCH 09/83] hsa/radeon: Add code base of hsa driver
> >> >for AMD's GPUs
> >> >
> >> >On Fri, Jul 11, 2014 at 12:50:09AM +0300, Oded Gabbay wrote:
> >> >> This patch adds the code base of the hsa driver for AMD's GPUs.
> >> >>
> >> >> This driver is called kfd.
> >> >>
> >> >> This initial version supports the first HSA chip, Kaveri.
> >> >>
> >> >> This driver is located in a new directory structure under drivers/gpu.
> >> >>
> >> >> Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
> >> >
> >> >There are too many coding style issues. While we have been lax on
> >> >enforcing the scripts/checkpatch.pl rules, I think there is a limit to
> >> >that. I am not strict on the 80 chars per line, but other things need
> >> >fixing so we stay in line.
> >> >
> >> >Also i am a bit worried about the license, given top comment in each
> >> >of the files i am not sure this is GPL2 compatible. I would need to
> >> >ask lawyer to review that.
> >> >
> >>
> >> Hi Jerome,
> >>
> >> Which line in the license are you concerned about? In theory we're using
> >> the same license as the initial code pushes for radeon, and I just did a
> >> side-by-side compare with the license header on cik.c in the radeon tree
> >> and confirmed that the two licenses are identical.
> >>
> >> The cik.c header has an additional "Authors:" line which the kfd files do
> >> not, but AFAIK that is not part of the license text proper.
> >>
> >
> >You cannot claim GPL if you want to use this license. radeon is a weird beast
> >for historical reasons: we wanted to share code with BSD, so it is dual
> >licensed, and this is reflected with:
> >MODULE_LICENSE("GPL and additional rights");
> >
> >inside radeon_drv.c
> >
> >So if you want to have MODULE_LICENSE(GPL) then you should have a header
> >that uses the GPL license wording and no wording from a BSD-like license.
> >Otherwise change the MODULE_LICENSE, and it would also be good to say
> >dual licensed at the top of each file (or at least next to each license) so
> >that it is clear this is a BSD & GPL license.
> 
> Got it. Missed that we had a different MODULE_LICENSE.
> 
> Since the goal is license compatibility with radeon, so that we can update the interface and move code between the drivers in future, my preference would be to update MODULE_LICENSE in the kfd code to "GPL and additional rights". Do you think that would be OK?

I am not a lawyer, and nothing I said should be considered legal
advice (on the contrary ;)). I think you need to be clearer with each
license and say clearly GPLv2 or BSD, i.e. dual licensed, but the dual license
is a beast you would definitely want to talk to a lawyer about.
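
On the purely technical side, and again not as legal advice: as far as I
remember, the module loader only compares the MODULE_LICENSE string against
a short fixed list, see license_is_gpl_compatible() in
include/linux/license.h. A userspace sketch of that check, written from
memory (the example_ name is mine, and the list should be double checked
against the tree you build from):

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Userspace stand-in for the kernel's license string check.
 * Returns true if the MODULE_LICENSE string is one of the
 * strings treated as GPL compatible.
 */
bool example_license_is_gpl_compatible(const char *license)
{
	static const char *const accepted[] = {
		"GPL",
		"GPL v2",
		"GPL and additional rights",
		"Dual BSD/GPL",
		"Dual MIT/GPL",
		"Dual MPL/GPL",
	};
	size_t i;

	for (i = 0; i < sizeof(accepted) / sizeof(accepted[0]); i++)
		if (strcmp(license, accepted[i]) == 0)
			return true;
	return false;
}
```

So the string alone only tells the loader how to treat the module; it does
not settle what the header at the top of each file has to say.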

Cheers,
Jérôme

^ permalink raw reply	[flat|nested] 116+ messages in thread

* RE: [PATCH 09/83] hsa/radeon: Add code base of hsa driver for AMD's GPUs
  2014-07-11 18:51             ` Jerome Glisse
@ 2014-07-11 18:56               ` Bridgman, John
  -1 siblings, 0 replies; 116+ messages in thread
From: Bridgman, John @ 2014-07-11 18:56 UTC (permalink / raw)
  To: Jerome Glisse
  Cc: Oded Gabbay, David Airlie, Deucher, Alexander, linux-kernel,
	dri-devel, Lewycky, Andrew, Joerg Roedel, Gabbay, Oded,
	Greg Kroah-Hartman, Rafael J. Wysocki, Kishon Vijay Abraham I,
	Sandeep Nair, Kenneth Heitke, Srinivas Pandruvada,
	Santosh Shilimkar, Andreas Noever, Lucas Stach, Philipp Zabel



>-----Original Message-----
>From: Jerome Glisse [mailto:j.glisse@gmail.com]
>Sent: Friday, July 11, 2014 2:52 PM
>To: Bridgman, John
>Cc: Oded Gabbay; David Airlie; Deucher, Alexander; linux-
>kernel@vger.kernel.org; dri-devel@lists.freedesktop.org; Lewycky, Andrew;
>Joerg Roedel; Gabbay, Oded; Greg Kroah-Hartman; Rafael J. Wysocki; Kishon
>Vijay Abraham I; Sandeep Nair; Kenneth Heitke; Srinivas Pandruvada;
>Santosh Shilimkar; Andreas Noever; Lucas Stach; Philipp Zabel
>Subject: Re: [PATCH 09/83] hsa/radeon: Add code base of hsa driver for
>AMD's GPUs
>
>On Fri, Jul 11, 2014 at 06:46:30PM +0000, Bridgman, John wrote:
>> >From: Jerome Glisse [mailto:j.glisse@gmail.com]
>> >Sent: Friday, July 11, 2014 2:11 PM
>> >To: Bridgman, John
>> >Cc: Oded Gabbay; David Airlie; Deucher, Alexander; linux-
>> >kernel@vger.kernel.org; dri-devel@lists.freedesktop.org; Lewycky,
>> >Andrew; Joerg Roedel; Gabbay, Oded; Greg Kroah-Hartman; Rafael J.
>> >Wysocki; Kishon Vijay Abraham I; Sandeep Nair; Kenneth Heitke;
>> >Srinivas Pandruvada; Santosh Shilimkar; Andreas Noever; Lucas Stach;
>> >Philipp Zabel
>> >Subject: Re: [PATCH 09/83] hsa/radeon: Add code base of hsa driver
>> >for AMD's GPUs
>> >
>> >On Fri, Jul 11, 2014 at 06:02:39PM +0000, Bridgman, John wrote:
>> >> >From: Jerome Glisse [mailto:j.glisse@gmail.com]
>> >> >Sent: Friday, July 11, 2014 1:04 PM
>> >> >To: Oded Gabbay
>> >> >Cc: David Airlie; Deucher, Alexander;
>> >> >linux-kernel@vger.kernel.org;
>> >> >dri- devel@lists.freedesktop.org; Bridgman, John; Lewycky, Andrew;
>> >> >Joerg Roedel; Gabbay, Oded; Greg Kroah-Hartman; Rafael J. Wysocki;
>> >> >Kishon Vijay Abraham I; Sandeep Nair; Kenneth Heitke; Srinivas
>> >> >Pandruvada; Santosh Shilimkar; Andreas Noever; Lucas Stach;
>> >> >Philipp Zabel
>> >> >Subject: Re: [PATCH 09/83] hsa/radeon: Add code base of hsa driver
>> >> >for AMD's GPUs
>> >> >
>> >> >On Fri, Jul 11, 2014 at 12:50:09AM +0300, Oded Gabbay wrote:
>> >> >> This patch adds the code base of the hsa driver for AMD's GPUs.
>> >> >>
>> >> >> This driver is called kfd.
>> >> >>
>> >> >> This initial version supports the first HSA chip, Kaveri.
>> >> >>
>> >> >> This driver is located in a new directory structure under drivers/gpu.
>> >> >>
>> >> >> Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
>> >> >
>> >> >There are too many coding style issues. While we have been lax on
>> >> >enforcing the scripts/checkpatch.pl rules, I think there is a limit
>> >> >to that. I am not strict on 80 chars per line, but other things
>> >> >need fixing so we stay in line.
>> >> >
>> >> >Also, I am a bit worried about the license: given the top comment in
>> >> >each of the files, I am not sure this is GPL2 compatible. I would
>> >> >need to ask a lawyer to review that.
>> >> >
>> >>
>> >> Hi Jerome,
>> >>
>> >> Which line in the license are you concerned about? In theory we're
>> >> using the same license as the initial code pushes for radeon, and I just
>> >> did a side-by-side compare with the license header on cik.c in the
>> >> radeon tree and confirmed that the two licenses are identical.
>> >>
>> >> The cik.c header has an additional "Authors:" line which the kfd
>> >> files do not, but AFAIK that is not part of the license text proper.
>> >>
>> >
>> >You cannot claim GPL if you want to use this license. radeon is a
>> >weird beast for historical reasons: we wanted to share code with BSD,
>> >so it is dual licensed, and this is reflected with:
>> >MODULE_LICENSE("GPL and additional rights");
>> >
>> >inside radeon_drv.c
>> >
>> >So if you want to have MODULE_LICENSE(GPL) then you should have a
>> >header that uses the GPL license wording and no wording from a BSD-like
>> >license.
>> >Otherwise change the MODULE_LICENSE, and it would also be good to say
>> >dual licensed at the top of each file (or at least next to each license) so
>> >that it is clear this is a BSD & GPL license.
>>
>> Got it. Missed that we had a different MODULE_LICENSE.
>>
>> Since the goal is license compatibility with radeon, so that we can update the
>> interface and move code between the drivers in future, my
>> preference would be to update MODULE_LICENSE in the kfd code to "GPL and
>> additional rights". Do you think that would be OK?
>
>I am not a lawyer, and nothing I said should be considered legal advice
>(on the contrary ;)). I think you need to be clearer with each license and
>say clearly GPLv2 or BSD, i.e. dual licensed, but the dual license is a beast you
>would definitely want to talk to a lawyer about.

Yeah, dual licensing seems horrid in its implications for developers, so we've always tried to avoid it. GPL hurts us for porting to other OSes, so the X11 / "GPL and additional rights" combo seemed like the ideal solution and we made it somewhat of a corporate standard. Hope that doesn't come back to haunt us.

Meditate on this I will. Thanks!

>
>Cheers,
>Jérôme

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 13/83] hsa/radeon: Add 2 new IOCTL to kfd, CREATE_QUEUE and DESTROY_QUEUE
  2014-07-10 21:50 ` [PATCH 13/83] hsa/radeon: Add 2 new IOCTL to kfd, CREATE_QUEUE and DESTROY_QUEUE Oded Gabbay
@ 2014-07-11 19:19     ` Jerome Glisse
  2014-07-11 21:01     ` Jerome Glisse
  2014-07-11 21:42     ` Dave Airlie
  2 siblings, 0 replies; 116+ messages in thread
From: Jerome Glisse @ 2014-07-11 19:19 UTC (permalink / raw)
  To: Oded Gabbay
  Cc: David Airlie, Alex Deucher, linux-kernel, dri-devel,
	John Bridgman, Andrew Lewycky, Joerg Roedel, Oded Gabbay,
	Alexey Skidanov, Ben Goz, Evgeny Pinchuk, linux-api

On Fri, Jul 11, 2014 at 12:50:13AM +0300, Oded Gabbay wrote:
> This patch adds 2 new IOCTL to kfd driver.
> 
> The first IOCTL is KFD_IOC_CREATE_QUEUE that is used by the user-mode
> application to create a compute queue on the GPU.
> 
> The second IOCTL is KFD_IOC_DESTROY_QUEUE that is used by the
> user-mode application to destroy an existing compute queue on the GPU.
> 
> Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>

Coding style needs fixing. What is the percent argument (queue_percentage)?
What is it used for?

You need to check the range validity of arguments provided by userspace. The
rule is: never trust userspace. Especially for things like queue_size, which
is used without ever being checked, allowing userspace to send 0, which leads
to a broken queue size.
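
As a rough sketch of the kind of check I mean for a size argument such as
ring_size (plain userspace C so it compiles on its own; the helper names,
the size limits and the 256 byte alignment are made up for illustration and
are not taken from the patch or from the hardware documentation):

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limits, chosen only for illustration. */
#define EXAMPLE_MIN_RING_SIZE 1024u      /* bytes */
#define EXAMPLE_MAX_RING_SIZE (1u << 30) /* bytes */

static bool example_is_power_of_two(uint32_t v)
{
	return v != 0 && (v & (v - 1)) == 0;
}

/*
 * Returns 0 if the ring described by userspace looks sane,
 * -EINVAL otherwise. Nothing is dereferenced here; this only
 * rejects values that can never describe a usable ring.
 */
int example_validate_ring(uint64_t ring_base, uint32_t ring_size)
{
	/* Rejects 0 as well as anything outside the assumed range. */
	if (ring_size < EXAMPLE_MIN_RING_SIZE ||
	    ring_size > EXAMPLE_MAX_RING_SIZE)
		return -EINVAL;
	if (!example_is_power_of_two(ring_size))
		return -EINVAL;
	/* Assumed alignment requirement: 256 bytes. */
	if (ring_base == 0 || (ring_base & 0xff) != 0)
		return -EINVAL;
	/* ring_base + ring_size must not wrap around. */
	if (ring_base > UINT64_MAX - ring_size)
		return -EINVAL;
	return 0;
}
```

Something along these lines would run right after copy_from_user() and
before any allocation, so a bad argument fails early with -EINVAL.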

Also, out of curiosity, what kind of event happens if userspace munmaps its
ring buffer before unregistering a queue?

> ---
>  drivers/gpu/hsa/radeon/kfd_chardev.c  | 155 ++++++++++++++++++++++++++++++++++
>  drivers/gpu/hsa/radeon/kfd_doorbell.c |  11 +++
>  include/uapi/linux/kfd_ioctl.h        |  69 +++++++++++++++

Again, it would be better to create an hsa directory for kfd_ioctl.h.

>  3 files changed, 235 insertions(+)
>  create mode 100644 include/uapi/linux/kfd_ioctl.h
> 
> diff --git a/drivers/gpu/hsa/radeon/kfd_chardev.c b/drivers/gpu/hsa/radeon/kfd_chardev.c
> index 0b5bc74..4e7d5d0 100644
> --- a/drivers/gpu/hsa/radeon/kfd_chardev.c
> +++ b/drivers/gpu/hsa/radeon/kfd_chardev.c
> @@ -27,11 +27,13 @@
>  #include <linux/sched.h>
>  #include <linux/slab.h>
>  #include <linux/uaccess.h>
> +#include <uapi/linux/kfd_ioctl.h>
>  #include "kfd_priv.h"
>  #include "kfd_scheduler.h"
>  
>  static long kfd_ioctl(struct file *, unsigned int, unsigned long);
>  static int kfd_open(struct inode *, struct file *);
> +static int kfd_mmap(struct file *, struct vm_area_struct *);
>  
>  static const char kfd_dev_name[] = "kfd";
>  
> @@ -108,17 +110,170 @@ kfd_open(struct inode *inode, struct file *filep)
>  	return 0;
>  }
>  
> +static long
> +kfd_ioctl_create_queue(struct file *filep, struct kfd_process *p, void __user *arg)
> +{
> +	struct kfd_ioctl_create_queue_args args;
> +	struct kfd_dev *dev;
> +	int err = 0;
> +	unsigned int queue_id;
> +	struct kfd_queue *queue;
> +	struct kfd_process_device *pdd;
> +
> +	if (copy_from_user(&args, arg, sizeof(args)))
> +		return -EFAULT;
> +
> +	dev = radeon_kfd_device_by_id(args.gpu_id);
> +	if (dev == NULL)
> +		return -EINVAL;
> +
> +	queue = kzalloc(
> +		offsetof(struct kfd_queue, scheduler_queue) + dev->device_info->scheduler_class->queue_size,
> +		GFP_KERNEL);
> +
> +	if (!queue)
> +		return -ENOMEM;
> +
> +	queue->dev = dev;
> +
> +	mutex_lock(&p->mutex);
> +
> +	pdd = radeon_kfd_bind_process_to_device(dev, p);
> +	if (IS_ERR(pdd) < 0) {
> +		err = PTR_ERR(pdd);
> +		goto err_bind_pasid;
> +	}
> +
> +	pr_debug("kfd: creating queue number %d for PASID %d on GPU 0x%x\n",
> +			pdd->queue_count,
> +			p->pasid,
> +			dev->id);
> +
> +	if (pdd->queue_count++ == 0) {
> +		err = dev->device_info->scheduler_class->register_process(dev->scheduler, p, &pdd->scheduler_process);
> +		if (err < 0)
> +			goto err_register_process;
> +	}
> +
> +	if (!radeon_kfd_allocate_queue_id(p, &queue_id))
> +		goto err_allocate_queue_id;
> +
> +	err = dev->device_info->scheduler_class->create_queue(dev->scheduler, pdd->scheduler_process,
> +							      &queue->scheduler_queue,
> +							      (void __user *)args.ring_base_address,
> +							      args.ring_size,
> +							      (void __user *)args.read_pointer_address,
> +							      (void __user *)args.write_pointer_address,
> +							      radeon_kfd_queue_id_to_doorbell(dev, p, queue_id));
> +	if (err)
> +		goto err_create_queue;
> +
> +	radeon_kfd_install_queue(p, queue_id, queue);
> +
> +	args.queue_id = queue_id;
> +	args.doorbell_address = (uint64_t)(uintptr_t)radeon_kfd_get_doorbell(filep, p, dev, queue_id);
> +
> +	if (copy_to_user(arg, &args, sizeof(args))) {
> +		err = -EFAULT;
> +		goto err_copy_args_out;
> +	}
> +
> +	mutex_unlock(&p->mutex);
> +
> +	pr_debug("kfd: queue id %d was created successfully.\n"
> +		 "     ring buffer address == 0x%016llX\n"
> +		 "     read ptr address    == 0x%016llX\n"
> +		 "     write ptr address   == 0x%016llX\n"
> +		 "     doorbell address    == 0x%016llX\n",
> +			args.queue_id,
> +			args.ring_base_address,
> +			args.read_pointer_address,
> +			args.write_pointer_address,
> +			args.doorbell_address);
> +
> +	return 0;
> +
> +err_copy_args_out:
> +	dev->device_info->scheduler_class->destroy_queue(dev->scheduler, &queue->scheduler_queue);
> +err_create_queue:
> +	radeon_kfd_remove_queue(p, queue_id);
> +err_allocate_queue_id:
> +	if (--pdd->queue_count == 0) {
> +		dev->device_info->scheduler_class->deregister_process(dev->scheduler, pdd->scheduler_process);
> +		pdd->scheduler_process = NULL;
> +	}
> +err_register_process:
> +err_bind_pasid:
> +	kfree(queue);
> +	mutex_unlock(&p->mutex);
> +	return err;
> +}
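
Two things in the error handling of this function look wrong to me. IS_ERR()
evaluates to 0 or 1, so "IS_ERR(pdd) < 0" can never be true and a failed
bind is not caught; it should just be "if (IS_ERR(pdd))". Also, when
radeon_kfd_allocate_queue_id() fails, err has not been set to an error code
at the goto, so the ioctl does not report the failure. A small userspace
sketch of the first problem, with stand-ins for the kernel helpers (all
example_ names are mine, and the stand-ins assume a pointer fits in a long,
as it does on Linux):

```c
#include <errno.h>

#define EXAMPLE_MAX_ERRNO 4095

/* Stand-in for ERR_PTR(): encode a negative errno in a pointer. */
void *example_err_ptr(long error)
{
	return (void *)error;
}

/* Stand-in for PTR_ERR(): recover the errno from the pointer. */
long example_ptr_err(const void *ptr)
{
	return (long)ptr;
}

/* Stand-in for IS_ERR(): 1 for an encoded errno, 0 otherwise. */
long example_is_err(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-EXAMPLE_MAX_ERRNO;
}

/* The form used in the patch: the branch is never taken. */
int example_check_as_posted(const void *ptr)
{
	if (example_is_err(ptr) < 0)
		return (int)example_ptr_err(ptr);
	return 0;
}

/* The usual form: test the result for truth. */
int example_check_fixed(const void *ptr)
{
	if (example_is_err(ptr))
		return (int)example_ptr_err(ptr);
	return 0;
}
```

With an encoded -ENOMEM, the first variant reports 0 and the second reports
-ENOMEM.
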
> +
> +static int
> +kfd_ioctl_destroy_queue(struct file *filp, struct kfd_process *p, void __user *arg)
> +{
> +	struct kfd_ioctl_destroy_queue_args args;
> +	struct kfd_queue *queue;
> +	struct kfd_dev *dev;
> +	struct kfd_process_device *pdd;
> +
> +	if (copy_from_user(&args, arg, sizeof(args)))
> +		return -EFAULT;
> +
> +	mutex_lock(&p->mutex);
> +
> +	queue = radeon_kfd_get_queue(p, args.queue_id);
> +	if (!queue) {
> +		mutex_unlock(&p->mutex);
> +		return -EINVAL;
> +	}
> +
> +	dev = queue->dev;
> +
> +	pr_debug("kfd: destroying queue id %d for PASID %d\n",
> +			args.queue_id,
> +			p->pasid);
> +
> +	radeon_kfd_remove_queue(p, args.queue_id);
> +	dev->device_info->scheduler_class->destroy_queue(dev->scheduler, &queue->scheduler_queue);
> +
> +	kfree(queue);
> +
> +	pdd = radeon_kfd_get_process_device_data(dev, p);
> +	BUG_ON(pdd == NULL); /* Because a queue exists. */
> +
> +	if (--pdd->queue_count == 0) {
> +		dev->device_info->scheduler_class->deregister_process(dev->scheduler, pdd->scheduler_process);
> +		pdd->scheduler_process = NULL;
> +	}
> +
> +	mutex_unlock(&p->mutex);
> +	return 0;
> +}
>  
>  static long
>  kfd_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
>  {
> +	struct kfd_process *process;
>  	long err = -EINVAL;
>  
>  	dev_info(kfd_device,
>  		 "ioctl cmd 0x%x (#%d), arg 0x%lx\n",
>  		 cmd, _IOC_NR(cmd), arg);
>  
> +	process = radeon_kfd_get_process(current);
> +	if (IS_ERR(process))
> +		return PTR_ERR(process);
> +
>  	switch (cmd) {
> +	case KFD_IOC_CREATE_QUEUE:
> +		err = kfd_ioctl_create_queue(filep, process, (void __user *)arg);
> +		break;
> +
> +	case KFD_IOC_DESTROY_QUEUE:
> +		err = kfd_ioctl_destroy_queue(filep, process, (void __user *)arg);
> +		break;
> +
>  	default:
>  		dev_err(kfd_device,
>  			"unknown ioctl cmd 0x%x, arg 0x%lx)\n",
> diff --git a/drivers/gpu/hsa/radeon/kfd_doorbell.c b/drivers/gpu/hsa/radeon/kfd_doorbell.c
> index e1d8506..3de8a02 100644
> --- a/drivers/gpu/hsa/radeon/kfd_doorbell.c
> +++ b/drivers/gpu/hsa/radeon/kfd_doorbell.c
> @@ -155,3 +155,14 @@ doorbell_t __user *radeon_kfd_get_doorbell(struct file *devkfd, struct kfd_proce
>  	return &pdd->doorbell_mapping[doorbell_index];
>  }
>  
> +/*
> + * queue_ids are in the range [0,MAX_PROCESS_QUEUES) and are mapped 1:1
> + * to doorbells with the process's doorbell page
> + */
> +unsigned int radeon_kfd_queue_id_to_doorbell(struct kfd_dev *kfd, struct kfd_process *process, unsigned int queue_id)
> +{
> +	/* doorbell_id_offset accounts for doorbells taken by KGD.
> +	 * pasid * doorbell_process_allocation/sizeof(doorbell_t) adjusts to the process's doorbells */
> +	return kfd->doorbell_id_offset + process->pasid * (doorbell_process_allocation()/sizeof(doorbell_t)) + queue_id;
> +}
> +
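
To make sure I read the mapping right, here is the same arithmetic as a
standalone sketch. The 32-bit doorbell type and the 4096 bytes of doorbell
space per process are assumptions of mine for the example, not values taken
from this patch:

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t example_doorbell_t;     /* assumption: 32-bit doorbells */

/* Assumption: bytes of doorbell space reserved per process. */
#define EXAMPLE_PROCESS_ALLOCATION 4096u

/*
 * Same formula as radeon_kfd_queue_id_to_doorbell() above:
 * skip the doorbells kept by KGD, then skip one block of
 * doorbells per PASID, then index by queue id.
 */
unsigned int example_queue_id_to_doorbell(unsigned int doorbell_id_offset,
					  unsigned int pasid,
					  unsigned int queue_id)
{
	unsigned int per_process =
		EXAMPLE_PROCESS_ALLOCATION / sizeof(example_doorbell_t);

	return doorbell_id_offset + pasid * per_process + queue_id;
}
```

With these numbers each process owns 1024 doorbells, so an offset of 16,
PASID 2 and queue 5 gives 16 + 2 * 1024 + 5 = 2069. Note that nothing here
checks queue_id against the per-process count.
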
> diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
> new file mode 100644
> index 0000000..dcc5fe0
> --- /dev/null
> +++ b/include/uapi/linux/kfd_ioctl.h
> @@ -0,0 +1,69 @@
> +/*
> + * Copyright 2014 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + */
> +
> +#ifndef KFD_IOCTL_H_INCLUDED
> +#define KFD_IOCTL_H_INCLUDED
> +
> +#include <linux/types.h>
> +#include <linux/ioctl.h>
> +
> +#define KFD_IOCTL_CURRENT_VERSION 1
> +
> +/* The 64-bit ABI is the authoritative version. */
> +#pragma pack(push, 8)
> +
> +struct kfd_ioctl_get_version_args {
> +	uint32_t min_supported_version;	/* from KFD */
> +	uint32_t max_supported_version;	/* from KFD */
> +};
> +
> +/* For kfd_ioctl_create_queue_args.queue_type. */
> +#define KFD_IOC_QUEUE_TYPE_COMPUTE   0
> +#define KFD_IOC_QUEUE_TYPE_SDMA      1
> +
> +struct kfd_ioctl_create_queue_args {
> +	uint64_t ring_base_address;	/* to KFD */
> +	uint32_t ring_size;		/* to KFD */
> +	uint32_t gpu_id;		/* to KFD */
> +	uint32_t queue_type;		/* to KFD */
> +	uint32_t queue_percentage;	/* to KFD */
> +	uint32_t queue_priority;	/* to KFD */
> +	uint64_t write_pointer_address;	/* to KFD */
> +	uint64_t read_pointer_address;	/* to KFD */
> +
> +	uint64_t doorbell_address;	/* from KFD */
> +	uint32_t queue_id;		/* from KFD */
> +};
> +
> +struct kfd_ioctl_destroy_queue_args {
> +	uint32_t queue_id;		/* to KFD */
> +};
> +
> +#define KFD_IOC_MAGIC 'K'
> +
> +#define KFD_IOC_GET_VERSION	_IOR(KFD_IOC_MAGIC, 1, struct kfd_ioctl_get_version_args)
> +#define KFD_IOC_CREATE_QUEUE	_IOWR(KFD_IOC_MAGIC, 2, struct kfd_ioctl_create_queue_args)
> +#define KFD_IOC_DESTROY_QUEUE	_IOWR(KFD_IOC_MAGIC, 3, struct kfd_ioctl_destroy_queue_args)
> +
> +#pragma pack(pop)
> +
> +#endif
> -- 
> 1.9.1
> 
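
On the "64-bit ABI is the authoritative version" comment: a quick way to see
what layout that implies is to compile the structure in userspace and look
at the offsets. The sketch below copies the field list from the patch under
a name of my own. On x86-64 I would expect 4 bytes of padding after
queue_priority and 4 bytes of tail padding:

```c
#include <stddef.h>
#include <stdint.h>

#pragma pack(push, 8)
/* Same fields, same order as kfd_ioctl_create_queue_args above. */
struct example_create_queue_args {
	uint64_t ring_base_address;
	uint32_t ring_size;
	uint32_t gpu_id;
	uint32_t queue_type;
	uint32_t queue_percentage;
	uint32_t queue_priority;
	uint64_t write_pointer_address;
	uint64_t read_pointer_address;
	uint64_t doorbell_address;
	uint32_t queue_id;
};
#pragma pack(pop)

/* Offset of the first field that follows the implicit padding. */
size_t example_write_pointer_offset(void)
{
	return offsetof(struct example_create_queue_args,
			write_pointer_address);
}

/* Total size including any tail padding. */
size_t example_args_size(void)
{
	return sizeof(struct example_create_queue_args);
}
```

As far as I know, #pragma pack(8) only caps alignment and does not raise it,
so on an ABI where uint64_t is 4-byte aligned inside structures (32-bit x86,
for example) the offsets can come out differently. Explicit padding fields
would make the layout independent of the compiler.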

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 13/83] hsa/radeon: Add 2 new IOCTL to kfd, CREATE_QUEUE and DESTROY_QUEUE
@ 2014-07-11 19:19     ` Jerome Glisse
  0 siblings, 0 replies; 116+ messages in thread
From: Jerome Glisse @ 2014-07-11 19:19 UTC (permalink / raw)
  To: Oded Gabbay
  Cc: Andrew Lewycky, Ben Goz, linux-kernel, dri-devel, Evgeny Pinchuk,
	Alexey Skidanov, linux-api, Alex Deucher

On Fri, Jul 11, 2014 at 12:50:13AM +0300, Oded Gabbay wrote:
> This patch adds 2 new IOCTL to kfd driver.
> 
> The first IOCTL is KFD_IOC_CREATE_QUEUE that is used by the user-mode
> application to create a compute queue on the GPU.
> 
> The second IOCTL is KFD_IOC_DESTROY_QUEUE that is used by the
> user-mode application to destroy an existing compute queue on the GPU.
> 
> Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>

Coding style needs fixing. What is the percent argument (queue_percentage)?
What is it used for?

You need to check the range validity of arguments provided by userspace. The
rule is: never trust userspace. Especially for things like queue_size, which
is used without ever being checked, allowing userspace to send 0, which leads
to a broken queue size.

Also, out of curiosity, what kind of event happens if userspace munmaps its
ring buffer before unregistering a queue?

> ---
>  drivers/gpu/hsa/radeon/kfd_chardev.c  | 155 ++++++++++++++++++++++++++++++++++
>  drivers/gpu/hsa/radeon/kfd_doorbell.c |  11 +++
>  include/uapi/linux/kfd_ioctl.h        |  69 +++++++++++++++

Again, it would be better to create an hsa directory for kfd_ioctl.h.

>  3 files changed, 235 insertions(+)
>  create mode 100644 include/uapi/linux/kfd_ioctl.h
> 
> diff --git a/drivers/gpu/hsa/radeon/kfd_chardev.c b/drivers/gpu/hsa/radeon/kfd_chardev.c
> index 0b5bc74..4e7d5d0 100644
> --- a/drivers/gpu/hsa/radeon/kfd_chardev.c
> +++ b/drivers/gpu/hsa/radeon/kfd_chardev.c
> @@ -27,11 +27,13 @@
>  #include <linux/sched.h>
>  #include <linux/slab.h>
>  #include <linux/uaccess.h>
> +#include <uapi/linux/kfd_ioctl.h>
>  #include "kfd_priv.h"
>  #include "kfd_scheduler.h"
>  
>  static long kfd_ioctl(struct file *, unsigned int, unsigned long);
>  static int kfd_open(struct inode *, struct file *);
> +static int kfd_mmap(struct file *, struct vm_area_struct *);
>  
>  static const char kfd_dev_name[] = "kfd";
>  
> @@ -108,17 +110,170 @@ kfd_open(struct inode *inode, struct file *filep)
>  	return 0;
>  }
>  
> +static long
> +kfd_ioctl_create_queue(struct file *filep, struct kfd_process *p, void __user *arg)
> +{
> +	struct kfd_ioctl_create_queue_args args;
> +	struct kfd_dev *dev;
> +	int err = 0;
> +	unsigned int queue_id;
> +	struct kfd_queue *queue;
> +	struct kfd_process_device *pdd;
> +
> +	if (copy_from_user(&args, arg, sizeof(args)))
> +		return -EFAULT;
> +
> +	dev = radeon_kfd_device_by_id(args.gpu_id);
> +	if (dev == NULL)
> +		return -EINVAL;
> +
> +	queue = kzalloc(
> +		offsetof(struct kfd_queue, scheduler_queue) + dev->device_info->scheduler_class->queue_size,
> +		GFP_KERNEL);
> +
> +	if (!queue)
> +		return -ENOMEM;
> +
> +	queue->dev = dev;
> +
> +	mutex_lock(&p->mutex);
> +
> +	pdd = radeon_kfd_bind_process_to_device(dev, p);
> +	if (IS_ERR(pdd) < 0) {
> +		err = PTR_ERR(pdd);
> +		goto err_bind_pasid;
> +	}
> +
> +	pr_debug("kfd: creating queue number %d for PASID %d on GPU 0x%x\n",
> +			pdd->queue_count,
> +			p->pasid,
> +			dev->id);
> +
> +	if (pdd->queue_count++ == 0) {
> +		err = dev->device_info->scheduler_class->register_process(dev->scheduler, p, &pdd->scheduler_process);
> +		if (err < 0)
> +			goto err_register_process;
> +	}
> +
> +	if (!radeon_kfd_allocate_queue_id(p, &queue_id))
> +		goto err_allocate_queue_id;
> +
> +	err = dev->device_info->scheduler_class->create_queue(dev->scheduler, pdd->scheduler_process,
> +							      &queue->scheduler_queue,
> +							      (void __user *)args.ring_base_address,
> +							      args.ring_size,
> +							      (void __user *)args.read_pointer_address,
> +							      (void __user *)args.write_pointer_address,
> +							      radeon_kfd_queue_id_to_doorbell(dev, p, queue_id));
> +	if (err)
> +		goto err_create_queue;
> +
> +	radeon_kfd_install_queue(p, queue_id, queue);
> +
> +	args.queue_id = queue_id;
> +	args.doorbell_address = (uint64_t)(uintptr_t)radeon_kfd_get_doorbell(filep, p, dev, queue_id);
> +
> +	if (copy_to_user(arg, &args, sizeof(args))) {
> +		err = -EFAULT;
> +		goto err_copy_args_out;
> +	}
> +
> +	mutex_unlock(&p->mutex);
> +
> +	pr_debug("kfd: queue id %d was created successfully.\n"
> +		 "     ring buffer address == 0x%016llX\n"
> +		 "     read ptr address    == 0x%016llX\n"
> +		 "     write ptr address   == 0x%016llX\n"
> +		 "     doorbell address    == 0x%016llX\n",
> +			args.queue_id,
> +			args.ring_base_address,
> +			args.read_pointer_address,
> +			args.write_pointer_address,
> +			args.doorbell_address);
> +
> +	return 0;
> +
> +err_copy_args_out:
> +	dev->device_info->scheduler_class->destroy_queue(dev->scheduler, &queue->scheduler_queue);
> +err_create_queue:
> +	radeon_kfd_remove_queue(p, queue_id);
> +err_allocate_queue_id:
> +	if (--pdd->queue_count == 0) {
> +		dev->device_info->scheduler_class->deregister_process(dev->scheduler, pdd->scheduler_process);
> +		pdd->scheduler_process = NULL;
> +	}
> +err_register_process:
> +err_bind_pasid:
> +	kfree(queue);
> +	mutex_unlock(&p->mutex);
> +	return err;
> +}
> +
> +static int
> +kfd_ioctl_destroy_queue(struct file *filp, struct kfd_process *p, void __user *arg)
> +{
> +	struct kfd_ioctl_destroy_queue_args args;
> +	struct kfd_queue *queue;
> +	struct kfd_dev *dev;
> +	struct kfd_process_device *pdd;
> +
> +	if (copy_from_user(&args, arg, sizeof(args)))
> +		return -EFAULT;
> +
> +	mutex_lock(&p->mutex);
> +
> +	queue = radeon_kfd_get_queue(p, args.queue_id);
> +	if (!queue) {
> +		mutex_unlock(&p->mutex);
> +		return -EINVAL;
> +	}
> +
> +	dev = queue->dev;
> +
> +	pr_debug("kfd: destroying queue id %d for PASID %d\n",
> +			args.queue_id,
> +			p->pasid);
> +
> +	radeon_kfd_remove_queue(p, args.queue_id);
> +	dev->device_info->scheduler_class->destroy_queue(dev->scheduler, &queue->scheduler_queue);
> +
> +	kfree(queue);
> +
> +	pdd = radeon_kfd_get_process_device_data(dev, p);
> +	BUG_ON(pdd == NULL); /* Because a queue exists. */
> +
> +	if (--pdd->queue_count == 0) {
> +		dev->device_info->scheduler_class->deregister_process(dev->scheduler, pdd->scheduler_process);
> +		pdd->scheduler_process = NULL;
> +	}
> +
> +	mutex_unlock(&p->mutex);
> +	return 0;
> +}
>  
>  static long
>  kfd_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
>  {
> +	struct kfd_process *process;
>  	long err = -EINVAL;
>  
>  	dev_info(kfd_device,
>  		 "ioctl cmd 0x%x (#%d), arg 0x%lx\n",
>  		 cmd, _IOC_NR(cmd), arg);
>  
> +	process = radeon_kfd_get_process(current);
> +	if (IS_ERR(process))
> +		return PTR_ERR(process);
> +
>  	switch (cmd) {
> +	case KFD_IOC_CREATE_QUEUE:
> +		err = kfd_ioctl_create_queue(filep, process, (void __user *)arg);
> +		break;
> +
> +	case KFD_IOC_DESTROY_QUEUE:
> +		err = kfd_ioctl_destroy_queue(filep, process, (void __user *)arg);
> +		break;
> +
>  	default:
>  		dev_err(kfd_device,
>  			"unknown ioctl cmd 0x%x, arg 0x%lx)\n",
> diff --git a/drivers/gpu/hsa/radeon/kfd_doorbell.c b/drivers/gpu/hsa/radeon/kfd_doorbell.c
> index e1d8506..3de8a02 100644
> --- a/drivers/gpu/hsa/radeon/kfd_doorbell.c
> +++ b/drivers/gpu/hsa/radeon/kfd_doorbell.c
> @@ -155,3 +155,14 @@ doorbell_t __user *radeon_kfd_get_doorbell(struct file *devkfd, struct kfd_proce
>  	return &pdd->doorbell_mapping[doorbell_index];
>  }
>  
> +/*
> + * queue_ids are in the range [0,MAX_PROCESS_QUEUES) and are mapped 1:1
> + * to doorbells within the process's doorbell page
> + */
> +unsigned int radeon_kfd_queue_id_to_doorbell(struct kfd_dev *kfd, struct kfd_process *process, unsigned int queue_id)
> +{
> +	/* doorbell_id_offset accounts for doorbells taken by KGD.
> +	 * pasid * doorbell_process_allocation/sizeof(doorbell_t) adjusts to the process's doorbells */
> +	return kfd->doorbell_id_offset + process->pasid * (doorbell_process_allocation()/sizeof(doorbell_t)) + queue_id;
> +}
> +
> diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
> new file mode 100644
> index 0000000..dcc5fe0
> --- /dev/null
> +++ b/include/uapi/linux/kfd_ioctl.h
> @@ -0,0 +1,69 @@
> +/*
> + * Copyright 2014 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + */
> +
> +#ifndef KFD_IOCTL_H_INCLUDED
> +#define KFD_IOCTL_H_INCLUDED
> +
> +#include <linux/types.h>
> +#include <linux/ioctl.h>
> +
> +#define KFD_IOCTL_CURRENT_VERSION 1
> +
> +/* The 64-bit ABI is the authoritative version. */
> +#pragma pack(push, 8)
> +
> +struct kfd_ioctl_get_version_args {
> +	uint32_t min_supported_version;	/* from KFD */
> +	uint32_t max_supported_version;	/* from KFD */
> +};
> +
> +/* For kfd_ioctl_create_queue_args.queue_type. */
> +#define KFD_IOC_QUEUE_TYPE_COMPUTE   0
> +#define KFD_IOC_QUEUE_TYPE_SDMA      1
> +
> +struct kfd_ioctl_create_queue_args {
> +	uint64_t ring_base_address;	/* to KFD */
> +	uint32_t ring_size;		/* to KFD */
> +	uint32_t gpu_id;		/* to KFD */
> +	uint32_t queue_type;		/* to KFD */
> +	uint32_t queue_percentage;	/* to KFD */
> +	uint32_t queue_priority;	/* to KFD */
> +	uint64_t write_pointer_address;	/* to KFD */
> +	uint64_t read_pointer_address;	/* to KFD */
> +
> +	uint64_t doorbell_address;	/* from KFD */
> +	uint32_t queue_id;		/* from KFD */
> +};
> +
> +struct kfd_ioctl_destroy_queue_args {
> +	uint32_t queue_id;		/* to KFD */
> +};
> +
> +#define KFD_IOC_MAGIC 'K'
> +
> +#define KFD_IOC_GET_VERSION	_IOR(KFD_IOC_MAGIC, 1, struct kfd_ioctl_get_version_args)
> +#define KFD_IOC_CREATE_QUEUE	_IOWR(KFD_IOC_MAGIC, 2, struct kfd_ioctl_create_queue_args)
> +#define KFD_IOC_DESTROY_QUEUE	_IOWR(KFD_IOC_MAGIC, 3, struct kfd_ioctl_destroy_queue_args)
> +
> +#pragma pack(pop)
> +
> +#endif
> -- 
> 1.9.1
> 
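
A minimal userspace sketch of the index arithmetic in radeon_kfd_queue_id_to_doorbell() above, for anyone checking the mapping by hand. The per-process allocation size is an assumption (the value returned by doorbell_process_allocation() is not shown in this patch), and every sketch_ name is hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Matches "typedef u32 doorbell_t" in kfd_priv.h. */
typedef uint32_t doorbell_t;

/*
 * ASSUMPTION: stand-in for doorbell_process_allocation(). The real value is
 * not visible in this patch; 4096 bytes per process is only a guess used to
 * make the arithmetic concrete.
 */
#define SKETCH_PROCESS_ALLOCATION 4096u

/*
 * Hypothetical mirror of radeon_kfd_queue_id_to_doorbell():
 *   doorbell_id_offset skips the doorbells taken by KGD,
 *   pasid * per_process selects the process's block of doorbells,
 *   queue_id selects the doorbell inside that block (1:1 mapping).
 */
unsigned int sketch_queue_id_to_doorbell(unsigned int doorbell_id_offset,
					 unsigned int pasid,
					 unsigned int queue_id)
{
	unsigned int per_process =
		SKETCH_PROCESS_ALLOCATION / (unsigned int)sizeof(doorbell_t);

	return doorbell_id_offset + pasid * per_process + queue_id;
}
```

With the assumed 4096-byte allocation each PASID owns 1024 consecutive doorbell indices, so doorbell_id_offset 16, PASID 2, queue 5 lands on index 2069.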

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 09/83] hsa/radeon: Add code base of hsa driver for AMD's GPUs
  2014-07-11 18:56               ` Bridgman, John
@ 2014-07-11 19:22                 ` Jerome Glisse
  -1 siblings, 0 replies; 116+ messages in thread
From: Jerome Glisse @ 2014-07-11 19:22 UTC (permalink / raw)
  To: Bridgman, John
  Cc: Oded Gabbay, David Airlie, Deucher, Alexander, linux-kernel,
	dri-devel, Lewycky, Andrew, Joerg Roedel, Gabbay, Oded,
	Greg Kroah-Hartman, Rafael J. Wysocki, Kishon Vijay Abraham I,
	Sandeep Nair, Kenneth Heitke, Srinivas Pandruvada,
	Santosh Shilimkar, Andreas Noever, Lucas Stach, Philipp Zabel

On Fri, Jul 11, 2014 at 06:56:12PM +0000, Bridgman, John wrote:
> >From: Jerome Glisse [mailto:j.glisse@gmail.com]
> >Sent: Friday, July 11, 2014 2:52 PM
> >To: Bridgman, John
> >Cc: Oded Gabbay; David Airlie; Deucher, Alexander; linux-
> >kernel@vger.kernel.org; dri-devel@lists.freedesktop.org; Lewycky, Andrew;
> >Joerg Roedel; Gabbay, Oded; Greg Kroah-Hartman; Rafael J. Wysocki; Kishon
> >Vijay Abraham I; Sandeep Nair; Kenneth Heitke; Srinivas Pandruvada;
> >Santosh Shilimkar; Andreas Noever; Lucas Stach; Philipp Zabel
> >Subject: Re: [PATCH 09/83] hsa/radeon: Add code base of hsa driver for
> >AMD's GPUs
> >
> >On Fri, Jul 11, 2014 at 06:46:30PM +0000, Bridgman, John wrote:
> >> >From: Jerome Glisse [mailto:j.glisse@gmail.com]
> >> >Sent: Friday, July 11, 2014 2:11 PM
> >> >To: Bridgman, John
> >> >Cc: Oded Gabbay; David Airlie; Deucher, Alexander; linux-
> >> >kernel@vger.kernel.org; dri-devel@lists.freedesktop.org; Lewycky,
> >> >Andrew; Joerg Roedel; Gabbay, Oded; Greg Kroah-Hartman; Rafael J.
> >> >Wysocki; Kishon Vijay Abraham I; Sandeep Nair; Kenneth Heitke;
> >> >Srinivas Pandruvada; Santosh Shilimkar; Andreas Noever; Lucas Stach;
> >> >Philipp Zabel
> >> >Subject: Re: [PATCH 09/83] hsa/radeon: Add code base of hsa driver
> >> >for AMD's GPUs
> >> >
> >> >On Fri, Jul 11, 2014 at 06:02:39PM +0000, Bridgman, John wrote:
> >> >> >From: Jerome Glisse [mailto:j.glisse@gmail.com]
> >> >> >Sent: Friday, July 11, 2014 1:04 PM
> >> >> >To: Oded Gabbay
> >> >> >Cc: David Airlie; Deucher, Alexander;
> >> >> >linux-kernel@vger.kernel.org;
> >> >> >dri- devel@lists.freedesktop.org; Bridgman, John; Lewycky, Andrew;
> >> >> >Joerg Roedel; Gabbay, Oded; Greg Kroah-Hartman; Rafael J. Wysocki;
> >> >> >Kishon Vijay Abraham I; Sandeep Nair; Kenneth Heitke; Srinivas
> >> >> >Pandruvada; Santosh Shilimkar; Andreas Noever; Lucas Stach;
> >> >> >Philipp Zabel
> >> >> >Subject: Re: [PATCH 09/83] hsa/radeon: Add code base of hsa driver
> >> >> >for AMD's GPUs
> >> >> >
> >> >> >On Fri, Jul 11, 2014 at 12:50:09AM +0300, Oded Gabbay wrote:
> >> >> >> This patch adds the code base of the hsa driver for AMD's GPUs.
> >> >> >>
> >> >> >> This driver is called kfd.
> >> >> >>
> >> >> >> This initial version supports the first HSA chip, Kaveri.
> >> >> >>
> >> >> >> This driver is located in a new directory structure under drivers/gpu.
> >> >> >>
> >> >> >> Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
> >> >> >
> >> >> >There are too many coding style issues. While we have been lax on
> >> >> >enforcing the scripts/checkpatch.pl rules, I think there is a limit
> >> >> >to that. I am not strict on the 80 chars per line, but other things
> >> >> >need fixing so we stay in line.
> >> >> >
> >> >> >Also i am a bit worried about the license, given top comment in
> >> >> >each of the files i am not sure this is GPL2 compatible. I would
> >> >> >need to ask lawyer to review that.
> >> >> >
> >> >>
> >> >> Hi Jerome,
> >> >>
> >> >> Which line in the license are you concerned about? In theory we're
> >> >> using the same license as the initial code pushes for radeon, and I
> >> >> just did a side-by-side compare with the license header on cik.c in
> >> >> the radeon tree and confirmed that the two licenses are identical.
> >> >>
> >> >> The cik.c header has an additional "Authors:" line which the kfd
> >> >> files do not, but AFAIK that is not part of the license text proper.
> >> >>
> >> >
> >> >You cannot claim GPL if you want to use this license. radeon is a
> >> >weird beast for historical reasons: we wanted to share code with BSD,
> >> >thus it is dual licensed, and this is reflected with:
> >> >MODULE_LICENSE("GPL and additional rights");
> >> >
> >> >inside radeon_drv.c
> >> >
> >> >So if you want to have MODULE_LICENSE("GPL") then you should have a
> >> >header that uses the GPL license wording and no wording from a
> >> >BSD-like license. Otherwise change the MODULE_LICENSE, and it would
> >> >also be good to say dual licensed at the top of each file (or at least
> >> >next to each license) so that it is clear this is a BSD & GPL license.
> >>
> >> Got it. Missed that we had a different MODULE_LICENSE.
> >>
> >> Since the goal is license compatibility with radeon so we can update
> >> the interface and move code between the drivers in future, I guess my
> >> preference would be to update MODULE_LICENSE in the kfd code to "GPL
> >> and additional rights". Do you think that would be OK?
> >
> >I am not a lawyer and nothing that I said should be considered legal advice
> >(on the contrary ;)). I think you need to be clearer with each license and
> >clearly say GPLv2 or BSD, i.e. dual licensed, but dual licensing is a beast
> >you would definitely want to talk to a lawyer about.
> 
> Yeah, dual license seems horrid in its implications for developers so we've always tried to avoid it. GPL hurts us for porting to other OSes so the X11 / "GPL with additional rights" combo seemed like the ideal solution and we made it somewhat of a corporate standard. Hope that doesn't come back to haunt us. 
> 
> Meditate on this I will. Thanks !

Just to be explicit, my point is that if you claim GPL in MODULE_LICENSE
then this is GPL-licensed code; if you claim GPL with additional rights
then this is dual-licensed code. This is how I read and interpret "with
additional rights". In any case the radeon code is considered
dual licensed, i.e. GPL+BSD (at least this is how I consider that code).

Cheers,
Jérôme

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 09/83] hsa/radeon: Add code base of hsa driver for AMD's GPUs
  2014-07-11 19:22                 ` Jerome Glisse
@ 2014-07-11 19:38                   ` Joe Perches
  -1 siblings, 0 replies; 116+ messages in thread
From: Joe Perches @ 2014-07-11 19:38 UTC (permalink / raw)
  To: Jerome Glisse
  Cc: Bridgman, John, Oded Gabbay, David Airlie, Deucher, Alexander,
	linux-kernel, dri-devel, Lewycky, Andrew, Joerg Roedel, Gabbay,
	Oded, Greg Kroah-Hartman, Rafael J. Wysocki,
	Kishon Vijay Abraham I, Sandeep Nair, Kenneth Heitke,
	Srinivas Pandruvada, Santosh Shilimkar, Andreas Noever,
	Lucas Stach, Philipp Zabel

On Fri, 2014-07-11 at 15:22 -0400, Jerome Glisse wrote:
> Just to be explicit, my point is that if you claim GPL in MODULE_LICENSE
> then this is GPL-licensed code; if you claim GPL with additional rights
> then this is dual-licensed code. This is how I read and interpret "with
> additional rights". In any case the radeon code is considered
> dual licensed, i.e. GPL+BSD (at least this is how I consider that code).

This is pretty common:

MODULE_LICENSE("Dual BSD/GPL");

There are a couple hundred of them.



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 15/83] hsa/radeon: Add interrupt handling module
  2014-07-10 21:50 ` [PATCH 15/83] hsa/radeon: Add interrupt handling module Oded Gabbay
@ 2014-07-11 19:57     ` Jerome Glisse
  0 siblings, 0 replies; 116+ messages in thread
From: Jerome Glisse @ 2014-07-11 19:57 UTC (permalink / raw)
  To: Oded Gabbay
  Cc: David Airlie, Alex Deucher, linux-kernel, dri-devel,
	John Bridgman, Andrew Lewycky, Joerg Roedel, Oded Gabbay

On Fri, Jul 11, 2014 at 12:50:15AM +0300, Oded Gabbay wrote:
> This patch adds the interrupt handling module, in kfd_interrupt.c,
> and its related members in different data structures to the KFD
> driver.
> 
> The KFD interrupt module maintains an internal interrupt ring per kfd
> device. The internal interrupt ring contains interrupts that need further
> handling. The extra handling is deferred to a later time through a workqueue.
> 
> There's no acknowledgment for the interrupts we use. The hardware simply
> queues a new interrupt each time without waiting.
> 
> The fixed-size internal queue means that it's possible for us to lose
> interrupts because we have no back-pressure to the hardware.
> 
> Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
> ---
>  drivers/gpu/hsa/radeon/Makefile        |   2 +-
>  drivers/gpu/hsa/radeon/kfd_device.c    |   1 +
>  drivers/gpu/hsa/radeon/kfd_interrupt.c | 179 +++++++++++++++++++++++++++++++++
>  drivers/gpu/hsa/radeon/kfd_priv.h      |  18 ++++
>  drivers/gpu/hsa/radeon/kfd_scheduler.h |   3 +
>  5 files changed, 202 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/gpu/hsa/radeon/kfd_interrupt.c
> 
> diff --git a/drivers/gpu/hsa/radeon/Makefile b/drivers/gpu/hsa/radeon/Makefile
> index 28da10c..5422e6a 100644
> --- a/drivers/gpu/hsa/radeon/Makefile
> +++ b/drivers/gpu/hsa/radeon/Makefile
> @@ -5,6 +5,6 @@
>  radeon_kfd-y	:= kfd_module.o kfd_device.o kfd_chardev.o \
>  		kfd_pasid.o kfd_topology.o kfd_process.o \
>  		kfd_doorbell.o kfd_sched_cik_static.o kfd_registers.o \
> -		kfd_vidmem.o
> +		kfd_vidmem.o kfd_interrupt.o
>  
>  obj-$(CONFIG_HSA_RADEON)	+= radeon_kfd.o
> diff --git a/drivers/gpu/hsa/radeon/kfd_device.c b/drivers/gpu/hsa/radeon/kfd_device.c
> index 465c822..b2d2861 100644
> --- a/drivers/gpu/hsa/radeon/kfd_device.c
> +++ b/drivers/gpu/hsa/radeon/kfd_device.c
> @@ -30,6 +30,7 @@
>  static const struct kfd_device_info bonaire_device_info = {
>  	.scheduler_class = &radeon_kfd_cik_static_scheduler_class,
>  	.max_pasid_bits = 16,
> +	.ih_ring_entry_size = 4 * sizeof(uint32_t)
>  };
>  
>  struct kfd_deviceid {
> diff --git a/drivers/gpu/hsa/radeon/kfd_interrupt.c b/drivers/gpu/hsa/radeon/kfd_interrupt.c
> new file mode 100644
> index 0000000..2179780
> --- /dev/null
> +++ b/drivers/gpu/hsa/radeon/kfd_interrupt.c
> @@ -0,0 +1,179 @@
> +/*
> + * Copyright 2014 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + */
> +
> +/*
> + * KFD Interrupts.
> + *
> + * AMD GPUs deliver interrupts by pushing an interrupt description onto the
> + * interrupt ring and then sending an interrupt. KGD receives the interrupt
> + * in ISR and sends us a pointer to each new entry on the interrupt ring.
> + *
> + * We generally can't process interrupt-signaled events from ISR, so we call
> + * out to each interrupt client module (currently only the scheduler) to ask if
> + * each interrupt is interesting. If they return true, then it requires further
> + * processing so we copy it to an internal interrupt ring and call each
> + * interrupt client again from a work-queue.
> + *
> + * There's no acknowledgment for the interrupts we use. The hardware simply
> + * queues a new interrupt each time without waiting.
> + *
> + * The fixed-size internal queue means that it's possible for us to lose
> + * interrupts because we have no back-pressure to the hardware.
> + */
> +
> +#include <linux/slab.h>
> +#include <linux/device.h>
> +#include "kfd_priv.h"
> +#include "kfd_scheduler.h"
> +
> +#define KFD_INTERRUPT_RING_SIZE 256
> +
> +static void interrupt_wq(struct work_struct *);
> +
> +int
> +radeon_kfd_interrupt_init(struct kfd_dev *kfd)
> +{
> +	void *interrupt_ring = kmalloc_array(KFD_INTERRUPT_RING_SIZE,
> +					kfd->device_info->ih_ring_entry_size,
> +					GFP_KERNEL);
> +	if (!interrupt_ring)
> +		return -ENOMEM;
> +
> +	kfd->interrupt_ring = interrupt_ring;
> +	kfd->interrupt_ring_size =
> +		KFD_INTERRUPT_RING_SIZE * kfd->device_info->ih_ring_entry_size;
> +	atomic_set(&kfd->interrupt_ring_wptr, 0);
> +	atomic_set(&kfd->interrupt_ring_rptr, 0);
> +
> +	spin_lock_init(&kfd->interrupt_lock);
> +
> +	INIT_WORK(&kfd->interrupt_work, interrupt_wq);
> +
> +	kfd->interrupts_active = true;
> +
> +	/*
> +	 * After this function returns, the interrupt will be enabled. This
> +	 * barrier ensures that the interrupt running on a different processor
> +	 * sees all the above writes.
> +	 */
> +	smp_wmb();
> +
> +	return 0;
> +}
> +
> +void
> +radeon_kfd_interrupt_exit(struct kfd_dev *kfd)
> +{
> +	/*
> +	 * Stop the interrupt handler from writing to the ring and scheduling
> +	 * workqueue items. The spinlock ensures that any interrupt running
> +	 * after we have unlocked sees interrupts_active = false.
> +	 */
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&kfd->interrupt_lock, flags);
> +	kfd->interrupts_active = false;
> +	spin_unlock_irqrestore(&kfd->interrupt_lock, flags);
> +
> +	/*
> +	 * flush_scheduled_work() ensures that there are no outstanding work-queue
> +	 * items that will access interrupt_ring. New work items can't be
> +	 * created because we stopped interrupt handling above.
> +	 */
> +	flush_scheduled_work();
> +
> +	kfree(kfd->interrupt_ring);
> +}
> +
> +/*
> + * This assumes that it can't be called concurrently with itself
> + * but only with dequeue_ih_ring_entry.
> + */
> +static bool
> +enqueue_ih_ring_entry(struct kfd_dev *kfd, const void *ih_ring_entry)
> +{
> +	unsigned int rptr = atomic_read(&kfd->interrupt_ring_rptr);
> +	unsigned int wptr = atomic_read(&kfd->interrupt_ring_wptr);
> +
> +	if ((rptr - wptr) % kfd->interrupt_ring_size == kfd->device_info->ih_ring_entry_size) {
> +		/* This is very bad, the system is likely to hang. */
> +		dev_err_ratelimited(radeon_kfd_chardev(),
> +			"Interrupt ring overflow, dropping interrupt.\n");

Why is it that bad? What are those interrupts used for? I would assume that
in the worst case some queues do not see their jobs progressing, but isn't
there a way for them to manually pull the information after some timeout?

Because AFAICT there is a way to trigger interrupts from a shader, and I assume
those can reach this HSA code, and thus rogue userspace can IRQ-bomb HSA.
Hence I would like to understand what could go wrong.

Cheers,
Jérôme

> +		return false;
> +	}
> +
> +	memcpy(kfd->interrupt_ring + wptr, ih_ring_entry, kfd->device_info->ih_ring_entry_size);
> +	wptr = (wptr + kfd->device_info->ih_ring_entry_size) % kfd->interrupt_ring_size;
> +	smp_wmb(); /* Ensure memcpy'd data is visible before wptr update. */
> +	atomic_set(&kfd->interrupt_ring_wptr, wptr);
> +
> +	return true;
> +}
> +
> +/*
> + * This assumes that it can't be called concurrently with itself
> + * but only with enqueue_ih_ring_entry.
> + */
> +static bool
> +dequeue_ih_ring_entry(struct kfd_dev *kfd, void *ih_ring_entry)
> +{
> +	/*
> +	 * Assume that work queues have an implicit barrier, i.e. anything that
> +	 * happened in the ISR before it queued work is visible.
> +	 */
> +
> +	unsigned int wptr = atomic_read(&kfd->interrupt_ring_wptr);
> +	unsigned int rptr = atomic_read(&kfd->interrupt_ring_rptr);
> +
> +	if (rptr == wptr)
> +		return false;
> +
> +	memcpy(ih_ring_entry, kfd->interrupt_ring + rptr, kfd->device_info->ih_ring_entry_size);
> +	rptr = (rptr + kfd->device_info->ih_ring_entry_size) % kfd->interrupt_ring_size;
> +	smp_mb(); /* Ensure the rptr write update is not visible until memcpy has finished reading. */
> +	atomic_set(&kfd->interrupt_ring_rptr, rptr);
> +
> +	return true;
> +}
> +
> +static void interrupt_wq(struct work_struct *work)
> +{
> +	struct kfd_dev *dev = container_of(work, struct kfd_dev, interrupt_work);
> +
> +	uint32_t ih_ring_entry[DIV_ROUND_UP(dev->device_info->ih_ring_entry_size, sizeof(uint32_t))];
> +
> +	while (dequeue_ih_ring_entry(dev, ih_ring_entry))
> +		dev->device_info->scheduler_class->interrupt_wq(dev->scheduler, ih_ring_entry);
> +}
> +
> +/* This is called directly from KGD at ISR. */
> +void kgd2kfd_interrupt(struct kfd_dev *kfd, const void *ih_ring_entry)
> +{
> +	spin_lock(&kfd->interrupt_lock);
> +
> +	if (kfd->interrupts_active
> +	    && kfd->device_info->scheduler_class->interrupt_isr(kfd->scheduler, ih_ring_entry)
> +	    && enqueue_ih_ring_entry(kfd, ih_ring_entry))
> +		schedule_work(&kfd->interrupt_work);
> +
> +	spin_unlock(&kfd->interrupt_lock);
> +}
> diff --git a/drivers/gpu/hsa/radeon/kfd_priv.h b/drivers/gpu/hsa/radeon/kfd_priv.h
> index 1d1dbcf..5b6611f 100644
> --- a/drivers/gpu/hsa/radeon/kfd_priv.h
> +++ b/drivers/gpu/hsa/radeon/kfd_priv.h
> @@ -28,6 +28,9 @@
>  #include <linux/mutex.h>
>  #include <linux/radeon_kfd.h>
>  #include <linux/types.h>
> +#include <linux/atomic.h>
> +#include <linux/workqueue.h>
> +#include <linux/spinlock.h>
>  
>  struct kfd_scheduler_class;
>  
> @@ -63,6 +66,7 @@ typedef u32 doorbell_t;
>  struct kfd_device_info {
>  	const struct kfd_scheduler_class *scheduler_class;
>  	unsigned int max_pasid_bits;
> +	size_t ih_ring_entry_size;
>  };
>  
>  struct kfd_dev {
> @@ -90,6 +94,15 @@ struct kfd_dev {
>  	struct kgd2kfd_shared_resources shared_resources;
>  
>  	struct kfd_scheduler *scheduler;
> +
> +	/* Interrupts of interest to KFD are copied from the HW ring into a SW ring. */
> +	bool interrupts_active;
> +	void *interrupt_ring;
> +	size_t interrupt_ring_size;
> +	atomic_t interrupt_ring_rptr;
> +	atomic_t interrupt_ring_wptr;
> +	struct work_struct interrupt_work;
> +	spinlock_t interrupt_lock;
>  };
>  
>  /* KGD2KFD callbacks */
> @@ -229,4 +242,9 @@ struct kfd_dev *radeon_kfd_device_by_pci_dev(const struct pci_dev *pdev);
>  void radeon_kfd_write_reg(struct kfd_dev *dev, uint32_t reg, uint32_t value);
>  uint32_t radeon_kfd_read_reg(struct kfd_dev *dev, uint32_t reg);
>  
> +/* Interrupts */
> +int radeon_kfd_interrupt_init(struct kfd_dev *dev);
> +void radeon_kfd_interrupt_exit(struct kfd_dev *dev);
> +void kgd2kfd_interrupt(struct kfd_dev *dev, const void *ih_ring_entry);
> +
>  #endif
> diff --git a/drivers/gpu/hsa/radeon/kfd_scheduler.h b/drivers/gpu/hsa/radeon/kfd_scheduler.h
> index 48a032f..e5a93c4 100644
> --- a/drivers/gpu/hsa/radeon/kfd_scheduler.h
> +++ b/drivers/gpu/hsa/radeon/kfd_scheduler.h
> @@ -55,6 +55,9 @@ struct kfd_scheduler_class {
>  			    unsigned int doorbell);
>  
>  	void (*destroy_queue)(struct kfd_scheduler *, struct kfd_scheduler_queue *);
> +
> +	bool (*interrupt_isr)(struct kfd_scheduler *, const void *ih_ring_entry);
> +	void (*interrupt_wq)(struct kfd_scheduler *, const void *ih_ring_entry);
>  };
>  
>  extern const struct kfd_scheduler_class radeon_kfd_cik_static_scheduler_class;
> -- 
> 1.9.1
> 


* Re: [PATCH 15/83] hsa/radeon: Add interrupt handling module
@ 2014-07-11 19:57     ` Jerome Glisse
  0 siblings, 0 replies; 116+ messages in thread
From: Jerome Glisse @ 2014-07-11 19:57 UTC (permalink / raw)
  To: Oded Gabbay; +Cc: Andrew Lewycky, linux-kernel, dri-devel, Alex Deucher

On Fri, Jul 11, 2014 at 12:50:15AM +0300, Oded Gabbay wrote:
> This patch adds the interrupt handling module, in kfd_interrupt.c,
> and its related members in different data structures to the KFD
> driver.
> 
> The KFD interrupt module maintains an internal interrupt ring per kfd
> device. The internal interrupt ring contains interrupts that need further
> handling. The extra handling is deferred to a later time through a workqueue.
> 
> There's no acknowledgment for the interrupts we use. The hardware simply queues a new interrupt each time without waiting.
> 
> The fixed-size internal queue means that it's possible for us to lose interrupts because we have no back-pressure to the hardware.
> 
> Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
> ---
>  drivers/gpu/hsa/radeon/Makefile        |   2 +-
>  drivers/gpu/hsa/radeon/kfd_device.c    |   1 +
>  drivers/gpu/hsa/radeon/kfd_interrupt.c | 179 +++++++++++++++++++++++++++++++++
>  drivers/gpu/hsa/radeon/kfd_priv.h      |  18 ++++
>  drivers/gpu/hsa/radeon/kfd_scheduler.h |   3 +
>  5 files changed, 202 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/gpu/hsa/radeon/kfd_interrupt.c
> 
> diff --git a/drivers/gpu/hsa/radeon/Makefile b/drivers/gpu/hsa/radeon/Makefile
> index 28da10c..5422e6a 100644
> --- a/drivers/gpu/hsa/radeon/Makefile
> +++ b/drivers/gpu/hsa/radeon/Makefile
> @@ -5,6 +5,6 @@
>  radeon_kfd-y	:= kfd_module.o kfd_device.o kfd_chardev.o \
>  		kfd_pasid.o kfd_topology.o kfd_process.o \
>  		kfd_doorbell.o kfd_sched_cik_static.o kfd_registers.o \
> -		kfd_vidmem.o
> +		kfd_vidmem.o kfd_interrupt.o
>  
>  obj-$(CONFIG_HSA_RADEON)	+= radeon_kfd.o
> diff --git a/drivers/gpu/hsa/radeon/kfd_device.c b/drivers/gpu/hsa/radeon/kfd_device.c
> index 465c822..b2d2861 100644
> --- a/drivers/gpu/hsa/radeon/kfd_device.c
> +++ b/drivers/gpu/hsa/radeon/kfd_device.c
> @@ -30,6 +30,7 @@
>  static const struct kfd_device_info bonaire_device_info = {
>  	.scheduler_class = &radeon_kfd_cik_static_scheduler_class,
>  	.max_pasid_bits = 16,
> +	.ih_ring_entry_size = 4 * sizeof(uint32_t)
>  };
>  
>  struct kfd_deviceid {
> diff --git a/drivers/gpu/hsa/radeon/kfd_interrupt.c b/drivers/gpu/hsa/radeon/kfd_interrupt.c
> new file mode 100644
> index 0000000..2179780
> --- /dev/null
> +++ b/drivers/gpu/hsa/radeon/kfd_interrupt.c
> @@ -0,0 +1,179 @@
> +/*
> + * Copyright 2014 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + */
> +
> +/*
> + * KFD Interrupts.
> + *
> + * AMD GPUs deliver interrupts by pushing an interrupt description onto the
> + * interrupt ring and then sending an interrupt. KGD receives the interrupt
> + * in ISR and sends us a pointer to each new entry on the interrupt ring.
> + *
> + * We generally can't process interrupt-signaled events from ISR, so we call
> + * out to each interrupt client module (currently only the scheduler) to ask if
> + * each interrupt is interesting. If they return true, then it requires further
> + * processing so we copy it to an internal interrupt ring and call each
> + * interrupt client again from a work-queue.
> + *
> + * There's no acknowledgment for the interrupts we use. The hardware simply
> + * queues a new interrupt each time without waiting.
> + *
> + * The fixed-size internal queue means that it's possible for us to lose
> + * interrupts because we have no back-pressure to the hardware.
> + */
> +
> +#include <linux/slab.h>
> +#include <linux/device.h>
> +#include "kfd_priv.h"
> +#include "kfd_scheduler.h"
> +
> +#define KFD_INTERRUPT_RING_SIZE 256
> +
> +static void interrupt_wq(struct work_struct *);
> +
> +int
> +radeon_kfd_interrupt_init(struct kfd_dev *kfd)
> +{
> +	void *interrupt_ring = kmalloc_array(KFD_INTERRUPT_RING_SIZE,
> +					kfd->device_info->ih_ring_entry_size,
> +					GFP_KERNEL);
> +	if (!interrupt_ring)
> +		return -ENOMEM;
> +
> +	kfd->interrupt_ring = interrupt_ring;
> +	kfd->interrupt_ring_size =
> +		KFD_INTERRUPT_RING_SIZE * kfd->device_info->ih_ring_entry_size;
> +	atomic_set(&kfd->interrupt_ring_wptr, 0);
> +	atomic_set(&kfd->interrupt_ring_rptr, 0);
> +
> +	spin_lock_init(&kfd->interrupt_lock);
> +
> +	INIT_WORK(&kfd->interrupt_work, interrupt_wq);
> +
> +	kfd->interrupts_active = true;
> +
> +	/*
> +	 * After this function returns, the interrupt will be enabled. This
> +	 * barrier ensures that the interrupt running on a different processor
> +	 * sees all the above writes.
> +	 */
> +	smp_wmb();
> +
> +	return 0;
> +}
> +
> +void
> +radeon_kfd_interrupt_exit(struct kfd_dev *kfd)
> +{
> +	/*
> +	 * Stop the interrupt handler from writing to the ring and scheduling
> +	 * workqueue items. The spinlock ensures that any interrupt running
> +	 * after we have unlocked sees interrupts_active = false.
> +	 */
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&kfd->interrupt_lock, flags);
> +	kfd->interrupts_active = false;
> +	spin_unlock_irqrestore(&kfd->interrupt_lock, flags);
> +
> +	/*
> +	 * Flush_scheduled_work ensures that there are no outstanding work-queue
> +	 * items that will access interrupt_ring. New work items can't be
> +	 * created because we stopped interrupt handling above.
> +	 */
> +	flush_scheduled_work();
> +
> +	kfree(kfd->interrupt_ring);
> +}
> +
> +/*
> + * This assumes that it can't be called concurrently with itself
> + * but only with dequeue_ih_ring_entry.
> + */
> +static bool
> +enqueue_ih_ring_entry(struct kfd_dev *kfd, const void *ih_ring_entry)
> +{
> +	unsigned int rptr = atomic_read(&kfd->interrupt_ring_rptr);
> +	unsigned int wptr = atomic_read(&kfd->interrupt_ring_wptr);
> +
> +	if ((rptr - wptr) % kfd->interrupt_ring_size == kfd->device_info->ih_ring_entry_size) {
> +		/* This is very bad, the system is likely to hang. */
> +		dev_err_ratelimited(radeon_kfd_chardev(),
> +			"Interrupt ring overflow, dropping interrupt.\n");

Why is it that bad? What are those interrupts used for? I would assume that in
the worst case some queues do not see their jobs progressing, but isn't there a
way for them to manually pull the information after some timeout?

Because afaict there is a way to trigger interrupts from a shader, and I assume
those can reach this hsa code, so rogue userspace can irq bomb hsa.
Hence I would like to understand what could go wrong.
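
To make the overflow case concrete, here is a minimal user-space sketch of the
ring logic as I read it from the patch. This is an assumption-laden model, not
the kernel code: struct sw_ring, the sw_ring_* helpers and the dropped counter
are hypothetical names for illustration, and the spinlock, atomics and memory
barriers from the patch are left out on purpose.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ENTRY_SIZE   16u   /* 4 * sizeof(uint32_t), as in bonaire_device_info */
#define RING_ENTRIES 256u  /* KFD_INTERRUPT_RING_SIZE in the patch */
#define RING_BYTES   (RING_ENTRIES * ENTRY_SIZE)

/* Hypothetical user-space model of the software ring, not the kernel struct. */
struct sw_ring {
	uint8_t data[RING_BYTES];
	unsigned int rptr;     /* byte offset of the next entry to read */
	unsigned int wptr;     /* byte offset of the next entry to write */
	unsigned int dropped;  /* entries lost to overflow (illustration only) */
};

static void sw_ring_init(struct sw_ring *r)
{
	/*
	 * The unsigned (rptr - wptr) % size test below is only a valid
	 * distance when size is a power of two (it must divide 2^32).
	 */
	assert((RING_BYTES & (RING_BYTES - 1)) == 0);
	memset(r, 0, sizeof(*r));
}

/* Mirrors the full test in enqueue_ih_ring_entry(): one slot stays empty. */
static bool sw_ring_enqueue(struct sw_ring *r, const void *entry)
{
	if ((r->rptr - r->wptr) % RING_BYTES == ENTRY_SIZE) {
		r->dropped++;  /* nothing pushes back on the producer */
		return false;
	}

	memcpy(r->data + r->wptr, entry, ENTRY_SIZE);
	r->wptr = (r->wptr + ENTRY_SIZE) % RING_BYTES;
	return true;
}

/* Mirrors dequeue_ih_ring_entry(): empty when both pointers are equal. */
static bool sw_ring_dequeue(struct sw_ring *r, void *entry)
{
	if (r->rptr == r->wptr)
		return false;

	memcpy(entry, r->data + r->rptr, ENTRY_SIZE);
	r->rptr = (r->rptr + ENTRY_SIZE) % RING_BYTES;
	return true;
}
```

With these sizes the ring holds at most 255 entries, since one slot is kept
empty to tell full from empty. A producer that outruns the work-queue by more
than that starts losing entries silently, which is the situation I am asking
about.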

Cheers,
Jérôme

> +		return false;
> +	}
> +
> +	memcpy(kfd->interrupt_ring + wptr, ih_ring_entry, kfd->device_info->ih_ring_entry_size);
> +	wptr = (wptr + kfd->device_info->ih_ring_entry_size) % kfd->interrupt_ring_size;
> +	smp_wmb(); /* Ensure memcpy'd data is visible before wptr update. */
> +	atomic_set(&kfd->interrupt_ring_wptr, wptr);
> +
> +	return true;
> +}
> +
> +/*
> + * This assumes that it can't be called concurrently with itself
> + * but only with enqueue_ih_ring_entry.
> + */
> +static bool
> +dequeue_ih_ring_entry(struct kfd_dev *kfd, void *ih_ring_entry)
> +{
> +	/*
> +	 * Assume that wait queues have an implicit barrier, i.e. anything that
> +	 * happened in the ISR before it queued work is visible.
> +	 */
> +
> +	unsigned int wptr = atomic_read(&kfd->interrupt_ring_wptr);
> +	unsigned int rptr = atomic_read(&kfd->interrupt_ring_rptr);
> +
> +	if (rptr == wptr)
> +		return false;
> +
> +	memcpy(ih_ring_entry, kfd->interrupt_ring + rptr, kfd->device_info->ih_ring_entry_size);
> +	rptr = (rptr + kfd->device_info->ih_ring_entry_size) % kfd->interrupt_ring_size;
> +	smp_mb(); /* Ensure the rptr write update is not visible until memcpy has finished reading. */
> +	atomic_set(&kfd->interrupt_ring_rptr, rptr);
> +
> +	return true;
> +}
> +
> +static void interrupt_wq(struct work_struct *work)
> +{
> +	struct kfd_dev *dev = container_of(work, struct kfd_dev, interrupt_work);
> +
> +	uint32_t ih_ring_entry[DIV_ROUND_UP(dev->device_info->ih_ring_entry_size, sizeof(uint32_t))];
> +
> +	while (dequeue_ih_ring_entry(dev, ih_ring_entry))
> +		dev->device_info->scheduler_class->interrupt_wq(dev->scheduler, ih_ring_entry);
> +}
> +
> +/* This is called directly from KGD at ISR. */
> +void kgd2kfd_interrupt(struct kfd_dev *kfd, const void *ih_ring_entry)
> +{
> +	spin_lock(&kfd->interrupt_lock);
> +
> +	if (kfd->interrupts_active
> +	    && kfd->device_info->scheduler_class->interrupt_isr(kfd->scheduler, ih_ring_entry)
> +	    && enqueue_ih_ring_entry(kfd, ih_ring_entry))
> +		schedule_work(&kfd->interrupt_work);
> +
> +	spin_unlock(&kfd->interrupt_lock);
> +}
> diff --git a/drivers/gpu/hsa/radeon/kfd_priv.h b/drivers/gpu/hsa/radeon/kfd_priv.h
> index 1d1dbcf..5b6611f 100644
> --- a/drivers/gpu/hsa/radeon/kfd_priv.h
> +++ b/drivers/gpu/hsa/radeon/kfd_priv.h
> @@ -28,6 +28,9 @@
>  #include <linux/mutex.h>
>  #include <linux/radeon_kfd.h>
>  #include <linux/types.h>
> +#include <linux/atomic.h>
> +#include <linux/workqueue.h>
> +#include <linux/spinlock.h>
>  
>  struct kfd_scheduler_class;
>  
> @@ -63,6 +66,7 @@ typedef u32 doorbell_t;
>  struct kfd_device_info {
>  	const struct kfd_scheduler_class *scheduler_class;
>  	unsigned int max_pasid_bits;
> +	size_t ih_ring_entry_size;
>  };
>  
>  struct kfd_dev {
> @@ -90,6 +94,15 @@ struct kfd_dev {
>  	struct kgd2kfd_shared_resources shared_resources;
>  
>  	struct kfd_scheduler *scheduler;
> +
> +	/* Interrupts of interest to KFD are copied from the HW ring into a SW ring. */
> +	bool interrupts_active;
> +	void *interrupt_ring;
> +	size_t interrupt_ring_size;
> +	atomic_t interrupt_ring_rptr;
> +	atomic_t interrupt_ring_wptr;
> +	struct work_struct interrupt_work;
> +	spinlock_t interrupt_lock;
>  };
>  
>  /* KGD2KFD callbacks */
> @@ -229,4 +242,9 @@ struct kfd_dev *radeon_kfd_device_by_pci_dev(const struct pci_dev *pdev);
>  void radeon_kfd_write_reg(struct kfd_dev *dev, uint32_t reg, uint32_t value);
>  uint32_t radeon_kfd_read_reg(struct kfd_dev *dev, uint32_t reg);
>  
> +/* Interrupts */
> +int radeon_kfd_interrupt_init(struct kfd_dev *dev);
> +void radeon_kfd_interrupt_exit(struct kfd_dev *dev);
> +void kgd2kfd_interrupt(struct kfd_dev *dev, const void *ih_ring_entry);
> +
>  #endif
> diff --git a/drivers/gpu/hsa/radeon/kfd_scheduler.h b/drivers/gpu/hsa/radeon/kfd_scheduler.h
> index 48a032f..e5a93c4 100644
> --- a/drivers/gpu/hsa/radeon/kfd_scheduler.h
> +++ b/drivers/gpu/hsa/radeon/kfd_scheduler.h
> @@ -55,6 +55,9 @@ struct kfd_scheduler_class {
>  			    unsigned int doorbell);
>  
>  	void (*destroy_queue)(struct kfd_scheduler *, struct kfd_scheduler_queue *);
> +
> +	bool (*interrupt_isr)(struct kfd_scheduler *, const void *ih_ring_entry);
> +	void (*interrupt_wq)(struct kfd_scheduler *, const void *ih_ring_entry);
>  };
>  
>  extern const struct kfd_scheduler_class radeon_kfd_cik_static_scheduler_class;
> -- 
> 1.9.1
> 


* Re: [PATCH 13/83] hsa/radeon: Add 2 new IOCTL to kfd, CREATE_QUEUE and DESTROY_QUEUE
  2014-07-10 21:50 ` [PATCH 13/83] hsa/radeon: Add 2 new IOCTL to kfd, CREATE_QUEUE and DESTROY_QUEUE Oded Gabbay
@ 2014-07-11 21:01     ` Jerome Glisse
  2014-07-11 21:01     ` Jerome Glisse
  2014-07-11 21:42     ` Dave Airlie
  2 siblings, 0 replies; 116+ messages in thread
From: Jerome Glisse @ 2014-07-11 21:01 UTC (permalink / raw)
  To: Oded Gabbay
  Cc: David Airlie, Alex Deucher, linux-kernel, dri-devel,
	John Bridgman, Andrew Lewycky, Joerg Roedel, Oded Gabbay,
	Alexey Skidanov, Ben Goz, Evgeny Pinchuk, linux-api

On Fri, Jul 11, 2014 at 12:50:13AM +0300, Oded Gabbay wrote:
> This patch adds 2 new IOCTL to kfd driver.
> 
> The first IOCTL is KFD_IOC_CREATE_QUEUE that is used by the user-mode
> application to create a compute queue on the GPU.
> 
> The second IOCTL is KFD_IOC_DESTROY_QUEUE that is used by the
> user-mode application to destroy an existing compute queue on the GPU.
> 
> Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
> ---
>  drivers/gpu/hsa/radeon/kfd_chardev.c  | 155 ++++++++++++++++++++++++++++++++++
>  drivers/gpu/hsa/radeon/kfd_doorbell.c |  11 +++
>  include/uapi/linux/kfd_ioctl.h        |  69 +++++++++++++++
>  3 files changed, 235 insertions(+)
>  create mode 100644 include/uapi/linux/kfd_ioctl.h
> 
> diff --git a/drivers/gpu/hsa/radeon/kfd_chardev.c b/drivers/gpu/hsa/radeon/kfd_chardev.c
> index 0b5bc74..4e7d5d0 100644
> --- a/drivers/gpu/hsa/radeon/kfd_chardev.c
> +++ b/drivers/gpu/hsa/radeon/kfd_chardev.c
> @@ -27,11 +27,13 @@
>  #include <linux/sched.h>
>  #include <linux/slab.h>
>  #include <linux/uaccess.h>
> +#include <uapi/linux/kfd_ioctl.h>
>  #include "kfd_priv.h"
>  #include "kfd_scheduler.h"
>  
>  static long kfd_ioctl(struct file *, unsigned int, unsigned long);
>  static int kfd_open(struct inode *, struct file *);
> +static int kfd_mmap(struct file *, struct vm_area_struct *);
>  
>  static const char kfd_dev_name[] = "kfd";
>  
> @@ -108,17 +110,170 @@ kfd_open(struct inode *inode, struct file *filep)
>  	return 0;
>  }
>  
> +static long
> +kfd_ioctl_create_queue(struct file *filep, struct kfd_process *p, void __user *arg)
> +{
> +	struct kfd_ioctl_create_queue_args args;
> +	struct kfd_dev *dev;
> +	int err = 0;
> +	unsigned int queue_id;
> +	struct kfd_queue *queue;
> +	struct kfd_process_device *pdd;
> +
> +	if (copy_from_user(&args, arg, sizeof(args)))
> +		return -EFAULT;
> +
> +	dev = radeon_kfd_device_by_id(args.gpu_id);
> +	if (dev == NULL)
> +		return -EINVAL;
> +
> +	queue = kzalloc(
> +		offsetof(struct kfd_queue, scheduler_queue) + dev->device_info->scheduler_class->queue_size,
> +		GFP_KERNEL);
> +
> +	if (!queue)
> +		return -ENOMEM;
> +
> +	queue->dev = dev;
> +
> +	mutex_lock(&p->mutex);
> +
> +	pdd = radeon_kfd_bind_process_to_device(dev, p);
> +	if (IS_ERR(pdd) < 0) {
> +		err = PTR_ERR(pdd);
> +		goto err_bind_pasid;
> +	}
> +
> +	pr_debug("kfd: creating queue number %d for PASID %d on GPU 0x%x\n",
> +			pdd->queue_count,
> +			p->pasid,
> +			dev->id);
> +
> +	if (pdd->queue_count++ == 0) {
> +		err = dev->device_info->scheduler_class->register_process(dev->scheduler, p, &pdd->scheduler_process);
> +		if (err < 0)
> +			goto err_register_process;
> +	}
> +
> +	if (!radeon_kfd_allocate_queue_id(p, &queue_id))
> +		goto err_allocate_queue_id;
> +
> +	err = dev->device_info->scheduler_class->create_queue(dev->scheduler, pdd->scheduler_process,
> +							      &queue->scheduler_queue,
> +							      (void __user *)args.ring_base_address,
> +							      args.ring_size,
> +							      (void __user *)args.read_pointer_address,
> +							      (void __user *)args.write_pointer_address,
> +							      radeon_kfd_queue_id_to_doorbell(dev, p, queue_id));
> +	if (err)
> +		goto err_create_queue;
> +
> +	radeon_kfd_install_queue(p, queue_id, queue);
> +
> +	args.queue_id = queue_id;
> +	args.doorbell_address = (uint64_t)(uintptr_t)radeon_kfd_get_doorbell(filep, p, dev, queue_id);
> +
> +	if (copy_to_user(arg, &args, sizeof(args))) {
> +		err = -EFAULT;
> +		goto err_copy_args_out;
> +	}
> +
> +	mutex_unlock(&p->mutex);
> +
> +	pr_debug("kfd: queue id %d was created successfully.\n"
> +		 "     ring buffer address == 0x%016llX\n"
> +		 "     read ptr address    == 0x%016llX\n"
> +		 "     write ptr address   == 0x%016llX\n"
> +		 "     doorbell address    == 0x%016llX\n",
> +			args.queue_id,
> +			args.ring_base_address,
> +			args.read_pointer_address,
> +			args.write_pointer_address,
> +			args.doorbell_address);
> +
> +	return 0;
> +
> +err_copy_args_out:
> +	dev->device_info->scheduler_class->destroy_queue(dev->scheduler, &queue->scheduler_queue);
> +err_create_queue:
> +	radeon_kfd_remove_queue(p, queue_id);
> +err_allocate_queue_id:
> +	if (--pdd->queue_count == 0) {
> +		dev->device_info->scheduler_class->deregister_process(dev->scheduler, pdd->scheduler_process);
> +		pdd->scheduler_process = NULL;
> +	}
> +err_register_process:
> +err_bind_pasid:
> +	kfree(queue);
> +	mutex_unlock(&p->mutex);
> +	return err;
> +}
> +
> +static int
> +kfd_ioctl_destroy_queue(struct file *filp, struct kfd_process *p, void __user *arg)
> +{
> +	struct kfd_ioctl_destroy_queue_args args;
> +	struct kfd_queue *queue;
> +	struct kfd_dev *dev;
> +	struct kfd_process_device *pdd;
> +
> +	if (copy_from_user(&args, arg, sizeof(args)))
> +		return -EFAULT;
> +
> +	mutex_lock(&p->mutex);
> +
> +	queue = radeon_kfd_get_queue(p, args.queue_id);
> +	if (!queue) {
> +		mutex_unlock(&p->mutex);
> +		return -EINVAL;
> +	}
> +
> +	dev = queue->dev;
> +
> +	pr_debug("kfd: destroying queue id %d for PASID %d\n",
> +			args.queue_id,
> +			p->pasid);
> +
> +	radeon_kfd_remove_queue(p, args.queue_id);
> +	dev->device_info->scheduler_class->destroy_queue(dev->scheduler, &queue->scheduler_queue);
> +
> +	kfree(queue);
> +
> +	pdd = radeon_kfd_get_process_device_data(dev, p);
> +	BUG_ON(pdd == NULL); /* Because a queue exists. */
> +
> +	if (--pdd->queue_count == 0) {
> +		dev->device_info->scheduler_class->deregister_process(dev->scheduler, pdd->scheduler_process);
> +		pdd->scheduler_process = NULL;
> +	}
> +
> +	mutex_unlock(&p->mutex);
> +	return 0;
> +}
>  
>  static long
>  kfd_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
>  {
> +	struct kfd_process *process;
>  	long err = -EINVAL;
>  
>  	dev_info(kfd_device,
>  		 "ioctl cmd 0x%x (#%d), arg 0x%lx\n",
>  		 cmd, _IOC_NR(cmd), arg);
>  
> +	process = radeon_kfd_get_process(current);
> +	if (IS_ERR(process))
> +		return PTR_ERR(process);
> +
>  	switch (cmd) {
> +	case KFD_IOC_CREATE_QUEUE:
> +		err = kfd_ioctl_create_queue(filep, process, (void __user *)arg);
> +		break;
> +
> +	case KFD_IOC_DESTROY_QUEUE:
> +		err = kfd_ioctl_destroy_queue(filep, process, (void __user *)arg);
> +		break;
> +
>  	default:
>  		dev_err(kfd_device,
>  			"unknown ioctl cmd 0x%x, arg 0x%lx)\n",
> diff --git a/drivers/gpu/hsa/radeon/kfd_doorbell.c b/drivers/gpu/hsa/radeon/kfd_doorbell.c
> index e1d8506..3de8a02 100644
> --- a/drivers/gpu/hsa/radeon/kfd_doorbell.c
> +++ b/drivers/gpu/hsa/radeon/kfd_doorbell.c
> @@ -155,3 +155,14 @@ doorbell_t __user *radeon_kfd_get_doorbell(struct file *devkfd, struct kfd_proce
>  	return &pdd->doorbell_mapping[doorbell_index];
>  }
>  
> +/*
> + * queue_ids are in the range [0,MAX_PROCESS_QUEUES) and are mapped 1:1
> + * to doorbells with the process's doorbell page
> + */
> +unsigned int radeon_kfd_queue_id_to_doorbell(struct kfd_dev *kfd, struct kfd_process *process, unsigned int queue_id)
> +{
> +	/* doorbell_id_offset accounts for doorbells taken by KGD.
> +	 * pasid * doorbell_process_allocation/sizeof(doorbell_t) adjusts to the process's doorbells */
> +	return kfd->doorbell_id_offset + process->pasid * (doorbell_process_allocation()/sizeof(doorbell_t)) + queue_id;
> +}
> +
> diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
> new file mode 100644
> index 0000000..dcc5fe0
> --- /dev/null
> +++ b/include/uapi/linux/kfd_ioctl.h
> @@ -0,0 +1,69 @@
> +/*
> + * Copyright 2014 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + */
> +
> +#ifndef KFD_IOCTL_H_INCLUDED
> +#define KFD_IOCTL_H_INCLUDED
> +
> +#include <linux/types.h>
> +#include <linux/ioctl.h>
> +
> +#define KFD_IOCTL_CURRENT_VERSION 1
> +
> +/* The 64-bit ABI is the authoritative version. */
> +#pragma pack(push, 8)
> +
> +struct kfd_ioctl_get_version_args {
> +	uint32_t min_supported_version;	/* from KFD */
> +	uint32_t max_supported_version;	/* from KFD */
> +};
> +
> +/* For kfd_ioctl_create_queue_args.queue_type. */
> +#define KFD_IOC_QUEUE_TYPE_COMPUTE   0
> +#define KFD_IOC_QUEUE_TYPE_SDMA      1
> +
> +struct kfd_ioctl_create_queue_args {
> +	uint64_t ring_base_address;	/* to KFD */
> +	uint32_t ring_size;		/* to KFD */
> +	uint32_t gpu_id;		/* to KFD */
> +	uint32_t queue_type;		/* to KFD */
> +	uint32_t queue_percentage;	/* to KFD */
> +	uint32_t queue_priority;	/* to KFD */

Is this priority global across all processes or local to the process?
Local is fine, but global is not. If you want some global priority, the
best option is probably to use a value provided by cgroup.

> +	uint64_t write_pointer_address;	/* to KFD */
> +	uint64_t read_pointer_address;	/* to KFD */
> +
> +	uint64_t doorbell_address;	/* from KFD */
> +	uint32_t queue_id;		/* from KFD */
> +};
> +
> +struct kfd_ioctl_destroy_queue_args {
> +	uint32_t queue_id;		/* to KFD */
> +};
> +
> +#define KFD_IOC_MAGIC 'K'
> +
> +#define KFD_IOC_GET_VERSION	_IOR(KFD_IOC_MAGIC, 1, struct kfd_ioctl_get_version_args)
> +#define KFD_IOC_CREATE_QUEUE	_IOWR(KFD_IOC_MAGIC, 2, struct kfd_ioctl_create_queue_args)
> +#define KFD_IOC_DESTROY_QUEUE	_IOWR(KFD_IOC_MAGIC, 3, struct kfd_ioctl_destroy_queue_args)
> +
> +#pragma pack(pop)
> +
> +#endif
> -- 
> 1.9.1
> 


* Re: [PATCH 13/83] hsa/radeon: Add 2 new IOCTL to kfd, CREATE_QUEUE and DESTROY_QUEUE
@ 2014-07-11 21:01     ` Jerome Glisse
  0 siblings, 0 replies; 116+ messages in thread
From: Jerome Glisse @ 2014-07-11 21:01 UTC (permalink / raw)
  To: Oded Gabbay
  Cc: Andrew Lewycky, Ben Goz, linux-kernel, dri-devel, Evgeny Pinchuk,
	Alexey Skidanov, linux-api, Alex Deucher

On Fri, Jul 11, 2014 at 12:50:13AM +0300, Oded Gabbay wrote:
> This patch adds 2 new IOCTL to kfd driver.
> 
> The first IOCTL is KFD_IOC_CREATE_QUEUE that is used by the user-mode
> application to create a compute queue on the GPU.
> 
> The second IOCTL is KFD_IOC_DESTROY_QUEUE that is used by the
> user-mode application to destroy an existing compute queue on the GPU.
> 
> Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
> ---
>  drivers/gpu/hsa/radeon/kfd_chardev.c  | 155 ++++++++++++++++++++++++++++++++++
>  drivers/gpu/hsa/radeon/kfd_doorbell.c |  11 +++
>  include/uapi/linux/kfd_ioctl.h        |  69 +++++++++++++++
>  3 files changed, 235 insertions(+)
>  create mode 100644 include/uapi/linux/kfd_ioctl.h
> 
> diff --git a/drivers/gpu/hsa/radeon/kfd_chardev.c b/drivers/gpu/hsa/radeon/kfd_chardev.c
> index 0b5bc74..4e7d5d0 100644
> --- a/drivers/gpu/hsa/radeon/kfd_chardev.c
> +++ b/drivers/gpu/hsa/radeon/kfd_chardev.c
> @@ -27,11 +27,13 @@
>  #include <linux/sched.h>
>  #include <linux/slab.h>
>  #include <linux/uaccess.h>
> +#include <uapi/linux/kfd_ioctl.h>
>  #include "kfd_priv.h"
>  #include "kfd_scheduler.h"
>  
>  static long kfd_ioctl(struct file *, unsigned int, unsigned long);
>  static int kfd_open(struct inode *, struct file *);
> +static int kfd_mmap(struct file *, struct vm_area_struct *);
>  
>  static const char kfd_dev_name[] = "kfd";
>  
> @@ -108,17 +110,170 @@ kfd_open(struct inode *inode, struct file *filep)
>  	return 0;
>  }
>  
> +static long
> +kfd_ioctl_create_queue(struct file *filep, struct kfd_process *p, void __user *arg)
> +{
> +	struct kfd_ioctl_create_queue_args args;
> +	struct kfd_dev *dev;
> +	int err = 0;
> +	unsigned int queue_id;
> +	struct kfd_queue *queue;
> +	struct kfd_process_device *pdd;
> +
> +	if (copy_from_user(&args, arg, sizeof(args)))
> +		return -EFAULT;
> +
> +	dev = radeon_kfd_device_by_id(args.gpu_id);
> +	if (dev == NULL)
> +		return -EINVAL;
> +
> +	queue = kzalloc(
> +		offsetof(struct kfd_queue, scheduler_queue) + dev->device_info->scheduler_class->queue_size,
> +		GFP_KERNEL);
> +
> +	if (!queue)
> +		return -ENOMEM;
> +
> +	queue->dev = dev;
> +
> +	mutex_lock(&p->mutex);
> +
> +	pdd = radeon_kfd_bind_process_to_device(dev, p);
> +	if (IS_ERR(pdd) < 0) {
> +		err = PTR_ERR(pdd);
> +		goto err_bind_pasid;
> +	}
> +
> +	pr_debug("kfd: creating queue number %d for PASID %d on GPU 0x%x\n",
> +			pdd->queue_count,
> +			p->pasid,
> +			dev->id);
> +
> +	if (pdd->queue_count++ == 0) {
> +		err = dev->device_info->scheduler_class->register_process(dev->scheduler, p, &pdd->scheduler_process);
> +		if (err < 0)
> +			goto err_register_process;
> +	}
> +
> +	if (!radeon_kfd_allocate_queue_id(p, &queue_id))
> +		goto err_allocate_queue_id;
> +
> +	err = dev->device_info->scheduler_class->create_queue(dev->scheduler, pdd->scheduler_process,
> +							      &queue->scheduler_queue,
> +							      (void __user *)args.ring_base_address,
> +							      args.ring_size,
> +							      (void __user *)args.read_pointer_address,
> +							      (void __user *)args.write_pointer_address,
> +							      radeon_kfd_queue_id_to_doorbell(dev, p, queue_id));
> +	if (err)
> +		goto err_create_queue;
> +
> +	radeon_kfd_install_queue(p, queue_id, queue);
> +
> +	args.queue_id = queue_id;
> +	args.doorbell_address = (uint64_t)(uintptr_t)radeon_kfd_get_doorbell(filep, p, dev, queue_id);
> +
> +	if (copy_to_user(arg, &args, sizeof(args))) {
> +		err = -EFAULT;
> +		goto err_copy_args_out;
> +	}
> +
> +	mutex_unlock(&p->mutex);
> +
> +	pr_debug("kfd: queue id %d was created successfully.\n"
> +		 "     ring buffer address == 0x%016llX\n"
> +		 "     read ptr address    == 0x%016llX\n"
> +		 "     write ptr address   == 0x%016llX\n"
> +		 "     doorbell address    == 0x%016llX\n",
> +			args.queue_id,
> +			args.ring_base_address,
> +			args.read_pointer_address,
> +			args.write_pointer_address,
> +			args.doorbell_address);
> +
> +	return 0;
> +
> +err_copy_args_out:
> +	dev->device_info->scheduler_class->destroy_queue(dev->scheduler, &queue->scheduler_queue);
> +err_create_queue:
> +	radeon_kfd_remove_queue(p, queue_id);
> +err_allocate_queue_id:
> +	if (--pdd->queue_count == 0) {
> +		dev->device_info->scheduler_class->deregister_process(dev->scheduler, pdd->scheduler_process);
> +		pdd->scheduler_process = NULL;
> +	}
> +err_register_process:
> +err_bind_pasid:
> +	kfree(queue);
> +	mutex_unlock(&p->mutex);
> +	return err;
> +}
> +
> +static int
> +kfd_ioctl_destroy_queue(struct file *filp, struct kfd_process *p, void __user *arg)
> +{
> +	struct kfd_ioctl_destroy_queue_args args;
> +	struct kfd_queue *queue;
> +	struct kfd_dev *dev;
> +	struct kfd_process_device *pdd;
> +
> +	if (copy_from_user(&args, arg, sizeof(args)))
> +		return -EFAULT;
> +
> +	mutex_lock(&p->mutex);
> +
> +	queue = radeon_kfd_get_queue(p, args.queue_id);
> +	if (!queue) {
> +		mutex_unlock(&p->mutex);
> +		return -EINVAL;
> +	}
> +
> +	dev = queue->dev;
> +
> +	pr_debug("kfd: destroying queue id %d for PASID %d\n",
> +			args.queue_id,
> +			p->pasid);
> +
> +	radeon_kfd_remove_queue(p, args.queue_id);
> +	dev->device_info->scheduler_class->destroy_queue(dev->scheduler, &queue->scheduler_queue);
> +
> +	kfree(queue);
> +
> +	pdd = radeon_kfd_get_process_device_data(dev, p);
> +	BUG_ON(pdd == NULL); /* Because a queue exists. */
> +
> +	if (--pdd->queue_count == 0) {
> +		dev->device_info->scheduler_class->deregister_process(dev->scheduler, pdd->scheduler_process);
> +		pdd->scheduler_process = NULL;
> +	}
> +
> +	mutex_unlock(&p->mutex);
> +	return 0;
> +}
>  
>  static long
>  kfd_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
>  {
> +	struct kfd_process *process;
>  	long err = -EINVAL;
>  
>  	dev_info(kfd_device,
>  		 "ioctl cmd 0x%x (#%d), arg 0x%lx\n",
>  		 cmd, _IOC_NR(cmd), arg);
>  
> +	process = radeon_kfd_get_process(current);
> +	if (IS_ERR(process))
> +		return PTR_ERR(process);
> +
>  	switch (cmd) {
> +	case KFD_IOC_CREATE_QUEUE:
> +		err = kfd_ioctl_create_queue(filep, process, (void __user *)arg);
> +		break;
> +
> +	case KFD_IOC_DESTROY_QUEUE:
> +		err = kfd_ioctl_destroy_queue(filep, process, (void __user *)arg);
> +		break;
> +
>  	default:
>  		dev_err(kfd_device,
>  			"unknown ioctl cmd 0x%x, arg 0x%lx)\n",
> diff --git a/drivers/gpu/hsa/radeon/kfd_doorbell.c b/drivers/gpu/hsa/radeon/kfd_doorbell.c
> index e1d8506..3de8a02 100644
> --- a/drivers/gpu/hsa/radeon/kfd_doorbell.c
> +++ b/drivers/gpu/hsa/radeon/kfd_doorbell.c
> @@ -155,3 +155,14 @@ doorbell_t __user *radeon_kfd_get_doorbell(struct file *devkfd, struct kfd_proce
>  	return &pdd->doorbell_mapping[doorbell_index];
>  }
>  
> +/*
> + * queue_ids are in the range [0,MAX_PROCESS_QUEUES) and are mapped 1:1
> + * to doorbells with the process's doorbell page
> + */
> +unsigned int radeon_kfd_queue_id_to_doorbell(struct kfd_dev *kfd, struct kfd_process *process, unsigned int queue_id)
> +{
> +	/* doorbell_id_offset accounts for doorbells taken by KGD.
> +	 * pasid * doorbell_process_allocation/sizeof(doorbell_t) adjusts to the process's doorbells */
> +	return kfd->doorbell_id_offset + process->pasid * (doorbell_process_allocation()/sizeof(doorbell_t)) + queue_id;
> +}
> +
> diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
> new file mode 100644
> index 0000000..dcc5fe0
> --- /dev/null
> +++ b/include/uapi/linux/kfd_ioctl.h
> @@ -0,0 +1,69 @@
> +/*
> + * Copyright 2014 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + */
> +
> +#ifndef KFD_IOCTL_H_INCLUDED
> +#define KFD_IOCTL_H_INCLUDED
> +
> +#include <linux/types.h>
> +#include <linux/ioctl.h>
> +
> +#define KFD_IOCTL_CURRENT_VERSION 1
> +
> +/* The 64-bit ABI is the authoritative version. */
> +#pragma pack(push, 8)
> +
> +struct kfd_ioctl_get_version_args {
> +	uint32_t min_supported_version;	/* from KFD */
> +	uint32_t max_supported_version;	/* from KFD */
> +};
> +
> +/* For kfd_ioctl_create_queue_args.queue_type. */
> +#define KFD_IOC_QUEUE_TYPE_COMPUTE   0
> +#define KFD_IOC_QUEUE_TYPE_SDMA      1
> +
> +struct kfd_ioctl_create_queue_args {
> +	uint64_t ring_base_address;	/* to KFD */
> +	uint32_t ring_size;		/* to KFD */
> +	uint32_t gpu_id;		/* to KFD */
> +	uint32_t queue_type;		/* to KFD */
> +	uint32_t queue_percentage;	/* to KFD */
> +	uint32_t queue_priority;	/* to KFD */

Is this priority global across all processes or local to the process?
Local is fine, but global is not. If you want some global priority, it is
probably best to use a value provided by cgroup.

> +	uint64_t write_pointer_address;	/* to KFD */
> +	uint64_t read_pointer_address;	/* to KFD */
> +
> +	uint64_t doorbell_address;	/* from KFD */
> +	uint32_t queue_id;		/* from KFD */
> +};
> +
> +struct kfd_ioctl_destroy_queue_args {
> +	uint32_t queue_id;		/* to KFD */
> +};
> +
> +#define KFD_IOC_MAGIC 'K'
> +
> +#define KFD_IOC_GET_VERSION	_IOR(KFD_IOC_MAGIC, 1, struct kfd_ioctl_get_version_args)
> +#define KFD_IOC_CREATE_QUEUE	_IOWR(KFD_IOC_MAGIC, 2, struct kfd_ioctl_create_queue_args)
> +#define KFD_IOC_DESTROY_QUEUE	_IOWR(KFD_IOC_MAGIC, 3, struct kfd_ioctl_destroy_queue_args)
> +
> +#pragma pack(pop)
> +
> +#endif
> -- 
> 1.9.1
> 

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 13/83] hsa/radeon: Add 2 new IOCTL to kfd, CREATE_QUEUE and DESTROY_QUEUE
  2014-07-10 21:50 ` [PATCH 13/83] hsa/radeon: Add 2 new IOCTL to kfd, CREATE_QUEUE and DESTROY_QUEUE Oded Gabbay
@ 2014-07-11 21:42     ` Dave Airlie
  2014-07-11 21:01     ` Jerome Glisse
  2014-07-11 21:42     ` Dave Airlie
  2 siblings, 0 replies; 116+ messages in thread
From: Dave Airlie @ 2014-07-11 21:42 UTC (permalink / raw)
  To: Oded Gabbay
  Cc: David Airlie, Alex Deucher, Jerome Glisse, LKML, dri-devel,
	John Bridgman, Andrew Lewycky, Joerg Roedel, Oded Gabbay,
	Alexey Skidanov, Ben Goz, Evgeny Pinchuk, linux-api

> +/* The 64-bit ABI is the authoritative version. */
> +#pragma pack(push, 8)
> +

Don't do this, pad and align things explicitly in structs.

> +struct kfd_ioctl_create_queue_args {
> +       uint64_t ring_base_address;     /* to KFD */
> +       uint32_t ring_size;             /* to KFD */
> +       uint32_t gpu_id;                /* to KFD */
> +       uint32_t queue_type;            /* to KFD */
> +       uint32_t queue_percentage;      /* to KFD */
> +       uint32_t queue_priority;        /* to KFD */
> +       uint64_t write_pointer_address; /* to KFD */
> +       uint64_t read_pointer_address;  /* to KFD */
> +
> +       uint64_t doorbell_address;      /* from KFD */
> +       uint32_t queue_id;              /* from KFD */
> +};
> +

Maybe put all the uint64_t members at the start, or add explicit padding.

Dave.
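
For illustration, the layout advice above could look roughly like the sketch below. It is not the layout from the patch or from any later revision: the struct names are hypothetical, and it uses userspace <stdint.h> types so that it compiles on its own (a real UAPI header would use __u64/__u32). The 64-bit members come first and any pad is a named member, so size and offsets no longer depend on #pragma pack or on the compiler's alignment rules.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical relayout of the create-queue arguments: 64-bit members
 * first, 32-bit members after. Not the struct from the patch.
 */
struct example_create_queue_args {
	uint64_t ring_base_address;	/* to KFD */
	uint64_t write_pointer_address;	/* to KFD */
	uint64_t read_pointer_address;	/* to KFD */
	uint64_t doorbell_address;	/* from KFD */

	uint32_t ring_size;		/* to KFD */
	uint32_t gpu_id;		/* to KFD */
	uint32_t queue_type;		/* to KFD */
	uint32_t queue_percentage;	/* to KFD */
	uint32_t queue_priority;	/* to KFD */
	uint32_t queue_id;		/* from KFD */
};

/* Hypothetical destroy arguments with an explicit, named pad. */
struct example_destroy_queue_args {
	uint32_t queue_id;		/* to KFD */
	uint32_t pad;			/* explicit padding, expected to be zero */
};

/* Compile-time checks that the layout is what the comments claim. */
_Static_assert(sizeof(struct example_create_queue_args) == 56,
	       "create args: 4 x 8 bytes + 6 x 4 bytes, no hidden padding");
_Static_assert(offsetof(struct example_create_queue_args, ring_size) == 32,
	       "32-bit members start right after the 64-bit ones");
_Static_assert(sizeof(struct example_destroy_queue_args) == 8,
	       "destroy args: 4 bytes + 4 bytes of named padding");
```

With six 32-bit members after four 64-bit ones, the create struct happens to need no pad at all; the destroy struct shows the named-pad case.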

^ permalink raw reply	[flat|nested] 116+ messages in thread

* RE: [PATCH 02/83] drm/radeon: reduce number of free VMIDs and pipes in KV
  2014-07-11 17:59         ` Ilyes Gouta
@ 2014-07-11 22:54             ` Bridgman, John
  0 siblings, 0 replies; 116+ messages in thread
From: Bridgman, John @ 2014-07-11 22:54 UTC (permalink / raw)
  To: Ilyes Gouta
  Cc: Alex Deucher, Koenig, Christian, Oded Gabbay, Deucher, Alexander,
	Lewycky, Andrew, LKML, Maling list - DRI developers

>From: Ilyes Gouta [mailto:ilyes.gouta@gmail.com] 
>Sent: Friday, July 11, 2014 2:00 PM
>To: Bridgman, John
>Cc: Alex Deucher; Koenig, Christian; Oded Gabbay; Deucher, Alexander; Lewycky, Andrew; LKML; Maling list - DRI developers
>Subject: Re: [PATCH 02/83] drm/radeon: reduce number of free VMIDs and pipes in KV
>
>Hi,
>
>Just a side question (for information),
>
>On Fri, Jul 11, 2014 at 6:07 PM, Bridgman, John <John.Bridgman@amd.com> wrote:
>
>Right. The SET_RESOURCES packet (kfd_pm4_headers.h, added in patch 49) allocates a range of HW queues, VMIDs and GDS to the HW scheduler, then the scheduler uses the allocated VMIDs to support a potentially larger number of user processes by dynamically mapping PASIDs to VMIDs and memory queue descriptors (MQDs) to HW queues.
>
>Are there any documentation/specifications online describing these mechanisms?

Nothing yet, but we should write some docco for this similar to what was written for the gfx blocks. I'll add that to the list, thanks.

^ permalink raw reply	[flat|nested] 116+ messages in thread

* RE: [PATCH 07/83] drm/radeon: Add kfd-->kgd interface of locking srbm_gfx_cntl register
  2014-07-11 17:48       ` Bridgman, John
@ 2014-07-12  0:36         ` Bridgman, John
  -1 siblings, 0 replies; 116+ messages in thread
From: Bridgman, John @ 2014-07-12  0:36 UTC (permalink / raw)
  To: Jerome Glisse, Oded Gabbay
  Cc: David Airlie, Deucher, Alexander, linux-kernel, dri-devel,
	Lewycky, Andrew, Joerg Roedel, Gabbay, Oded, Koenig, Christian

Confirmed. The locking functions are removed from the interface in commit 82:

[PATCH 82/83] drm/radeon: Remove lock functions from kfd2kgd interface

There is an elegant symmetry there, but yeah, we need to find a way to make this less awkward to review without screwing up all the work you've done so far. It's not obvious how to do that, though. I looked at squashing into a smaller number of big commits earlier on, but unless we completely rip the code out and recreate it from scratch I don't see anything better than:

- a few foundation commits
- a big code dump that covers everything up to ~patch 54 (with 71 squashed in)
- remaining commits squashed a bit to combine fixes with initial code

Is that what you had in mind when you said ~10 big commits? Our feeling was that the need to skip over the original scheduler would make it more like "one really big commit and 10-20 smaller ones", and I think we all felt that the "big code dump" required to skip over the original scheduler would be a non-starter.

I guess there is another option, and maybe that's what you had in mind: breaking the "big code dump" into smaller commits would be possible if we were willing to not have working code until we got to the equivalent of ~patch 54 (+71), when all the new scheduler bits were in. Maybe that would still be an improvement?

Thanks,
JB
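
As a rough illustration of the direction discussed here (kfd never touches the lock; the graphics driver does its own locking), the interface can export a whole operation instead of a lock/unlock pair. Everything in the sketch is hypothetical: the example_* names, the banked-register model, and the pthread mutex used as a userspace stand-in for rdev->srbm_mutex. It is not the code from patch 82.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define EXAMPLE_NUM_BANKS 4

/* Hypothetical stand-in for the graphics driver's device structure. */
struct example_gfx_dev {
	pthread_mutex_t srbm_mutex;		/* plays the role of rdev->srbm_mutex */
	uint32_t bank_select;			/* models a bank-select register */
	uint32_t banked_reg[EXAMPLE_NUM_BANKS];	/* one register copy per bank */
};

/* Hypothetical interface: one complete operation, no lock/unlock pair. */
struct example_kfd2kgd_calls {
	int (*write_banked_reg)(struct example_gfx_dev *dev,
				uint32_t bank, uint32_t value);
};

/* Select the bank, write the register, restore bank 0, all under the lock. */
static int example_write_banked_reg(struct example_gfx_dev *dev,
				    uint32_t bank, uint32_t value)
{
	if (bank >= EXAMPLE_NUM_BANKS)
		return -1;

	pthread_mutex_lock(&dev->srbm_mutex);
	dev->bank_select = bank;
	dev->banked_reg[dev->bank_select] = value;
	dev->bank_select = 0;
	pthread_mutex_unlock(&dev->srbm_mutex);
	return 0;
}

static const struct example_kfd2kgd_calls example_kfd2kgd = {
	.write_banked_reg = example_write_banked_reg,
};
```

The caller only sees write_banked_reg(); the mutex stays private to the driver that owns it, which is the property the review comment asked for.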

>-----Original Message-----
>From: Bridgman, John
>Sent: Friday, July 11, 2014 1:48 PM
>To: 'Jerome Glisse'; Oded Gabbay
>Cc: David Airlie; Deucher, Alexander; linux-kernel@vger.kernel.org;
>dri-devel@lists.freedesktop.org; Lewycky, Andrew; Joerg Roedel; Gabbay, Oded;
>Koenig, Christian
>Subject: RE: [PATCH 07/83] drm/radeon: Add kfd-->kgd interface of locking
>srbm_gfx_cntl register
>
>Checking... we shouldn't need to call the lock from kfd any more. We should
>be able to do any required locking in radeon kgd code.
>
>>-----Original Message-----
>>From: Jerome Glisse [mailto:j.glisse@gmail.com]
>>Sent: Friday, July 11, 2014 12:35 PM
>>To: Oded Gabbay
>>Cc: David Airlie; Deucher, Alexander; linux-kernel@vger.kernel.org;
>>dri-devel@lists.freedesktop.org; Bridgman, John; Lewycky, Andrew;
>>Joerg Roedel; Gabbay, Oded; Koenig, Christian
>>Subject: Re: [PATCH 07/83] drm/radeon: Add kfd-->kgd interface of
>>locking srbm_gfx_cntl register
>>
>>On Fri, Jul 11, 2014 at 12:50:07AM +0300, Oded Gabbay wrote:
>>> This patch adds a new interface to kfd2kgd_calls structure, which
>>> allows the kfd to lock and unlock the srbm_gfx_cntl register
>>
>>Why does kfd need to lock this register if kfd cannot access any of
>>those registers? This sounds broken to me; exposing a driver-internal
>>mutex to another driver is not something I am a fan of.
>>
>>Cheers,
>>Jérôme
>>
>>>
>>> Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
>>> ---
>>>  drivers/gpu/drm/radeon/radeon_kfd.c | 20 ++++++++++++++++++++
>>>  include/linux/radeon_kfd.h          |  4 ++++
>>>  2 files changed, 24 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/radeon/radeon_kfd.c b/drivers/gpu/drm/radeon/radeon_kfd.c
>>> index 66ee36b..594020e 100644
>>> --- a/drivers/gpu/drm/radeon/radeon_kfd.c
>>> +++ b/drivers/gpu/drm/radeon/radeon_kfd.c
>>> @@ -43,6 +43,10 @@ static void unkmap_mem(struct kgd_dev *kgd, struct kgd_mem *mem);
>>>
>>>  static uint64_t get_vmem_size(struct kgd_dev *kgd);
>>>
>>> +static void lock_srbm_gfx_cntl(struct kgd_dev *kgd);
>>> +static void unlock_srbm_gfx_cntl(struct kgd_dev *kgd);
>>> +
>>> +
>>>  static const struct kfd2kgd_calls kfd2kgd = {
>>>  	.allocate_mem = allocate_mem,
>>>  	.free_mem = free_mem,
>>> @@ -51,6 +55,8 @@ static const struct kfd2kgd_calls kfd2kgd = {
>>>  	.kmap_mem = kmap_mem,
>>>  	.unkmap_mem = unkmap_mem,
>>>  	.get_vmem_size = get_vmem_size,
>>> +	.lock_srbm_gfx_cntl = lock_srbm_gfx_cntl,
>>> +	.unlock_srbm_gfx_cntl = unlock_srbm_gfx_cntl,
>>>  };
>>>
>>>  static const struct kgd2kfd_calls *kgd2kfd;
>>> @@ -233,3 +239,17 @@ static uint64_t get_vmem_size(struct kgd_dev *kgd)
>>>
>>>  	return rdev->mc.real_vram_size;
>>>  }
>>> +
>>> +static void lock_srbm_gfx_cntl(struct kgd_dev *kgd)
>>> +{
>>> +	struct radeon_device *rdev = (struct radeon_device *)kgd;
>>> +
>>> +	mutex_lock(&rdev->srbm_mutex);
>>> +}
>>> +
>>> +static void unlock_srbm_gfx_cntl(struct kgd_dev *kgd)
>>> +{
>>> +	struct radeon_device *rdev = (struct radeon_device *)kgd;
>>> +
>>> +	mutex_unlock(&rdev->srbm_mutex);
>>> +}
>>> diff --git a/include/linux/radeon_kfd.h b/include/linux/radeon_kfd.h
>>> index c7997d4..40b691c 100644
>>> --- a/include/linux/radeon_kfd.h
>>> +++ b/include/linux/radeon_kfd.h
>>> @@ -81,6 +81,10 @@ struct kfd2kgd_calls {
>>>  	void (*unkmap_mem)(struct kgd_dev *kgd, struct kgd_mem *mem);
>>>
>>>  	uint64_t (*get_vmem_size)(struct kgd_dev *kgd);
>>> +
>>> +	/* SRBM_GFX_CNTL mutex */
>>> +	void (*lock_srbm_gfx_cntl)(struct kgd_dev *kgd);
>>> +	void (*unlock_srbm_gfx_cntl)(struct kgd_dev *kgd);
>>>  };
>>>
>>>  bool kgd2kfd_init(unsigned interface_version,
>>> --
>>> 1.9.1
>>>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* RE: [PATCH 07/83] drm/radeon: Add kfd-->kgd interface of locking srbm_gfx_cntl register
  2014-07-11 17:48       ` Bridgman, John
@ 2014-07-12  0:37         ` Bridgman, John
  -1 siblings, 0 replies; 116+ messages in thread
From: Bridgman, John @ 2014-07-12  0:37 UTC (permalink / raw)
  To: Jerome Glisse, Oded Gabbay
  Cc: David Airlie, Deucher, Alexander, linux-kernel, dri-devel,
	Lewycky, Andrew, Joerg Roedel, Gabbay, Oded, Koenig, Christian

>-----Original Message-----
>From: Bridgman, John
>Sent: Friday, July 11, 2014 1:48 PM
>To: 'Jerome Glisse'; Oded Gabbay
>Cc: David Airlie; Deucher, Alexander; linux-kernel@vger.kernel.org;
>dri-devel@lists.freedesktop.org; Lewycky, Andrew; Joerg Roedel; Gabbay, Oded;
>Koenig, Christian
>Subject: RE: [PATCH 07/83] drm/radeon: Add kfd-->kgd interface of locking
>srbm_gfx_cntl register
>
>Checking... we shouldn't need to call the lock from kfd any more. We should
>be able to do any required locking in radeon kgd code.

Confirmed. The locking functions are removed from the interface in commit 82:

[PATCH 82/83] drm/radeon: Remove lock functions from kfd2kgd interface

There is an elegant symmetry there, but yeah, we need to find a way to make this less awkward to review without screwing up all the work you've done so far. It's not obvious how to do that, though. I looked at squashing into a smaller number of big commits earlier on, but unless we completely rip the code out and recreate it from scratch I don't see anything better than:

- a few foundation commits
- a big code dump that covers everything up to ~patch 54 (with 71 squashed in)
- remaining commits squashed a bit to combine fixes with initial code

Is that what you had in mind when you said ~10 big commits? Our feeling was that the need to skip over the original scheduler would make it more like "one really big commit and 10-20 smaller ones", and I think we all felt that the "big code dump" required to skip over the original scheduler would be a non-starter.

I guess there is another option, and maybe that's what you had in mind: breaking the "big code dump" into smaller commits would be possible if we were willing to not have working code until we got to the equivalent of ~patch 54 (+71), when all the new scheduler bits were in. Maybe that would still be an improvement?

Thanks,
JB

>
>>-----Original Message-----
>>From: Jerome Glisse [mailto:j.glisse@gmail.com]
>>Sent: Friday, July 11, 2014 12:35 PM
>>To: Oded Gabbay
>>Cc: David Airlie; Deucher, Alexander; linux-kernel@vger.kernel.org;
>>dri-devel@lists.freedesktop.org; Bridgman, John; Lewycky, Andrew;
>>Joerg Roedel; Gabbay, Oded; Koenig, Christian
>>Subject: Re: [PATCH 07/83] drm/radeon: Add kfd-->kgd interface of
>>locking srbm_gfx_cntl register
>>
>>On Fri, Jul 11, 2014 at 12:50:07AM +0300, Oded Gabbay wrote:
>>> This patch adds a new interface to kfd2kgd_calls structure, which
>>> allows the kfd to lock and unlock the srbm_gfx_cntl register
>>
>>Why does kfd need to lock this register if kfd cannot access any of
>>those registers? This sounds broken to me; exposing a driver-internal
>>mutex to another driver is not something I am a fan of.
>>
>>Cheers,
>>Jérôme
>>
>>>
>>> Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
>>> ---
>>>  drivers/gpu/drm/radeon/radeon_kfd.c | 20 ++++++++++++++++++++
>>>  include/linux/radeon_kfd.h          |  4 ++++
>>>  2 files changed, 24 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/radeon/radeon_kfd.c b/drivers/gpu/drm/radeon/radeon_kfd.c
>>> index 66ee36b..594020e 100644
>>> --- a/drivers/gpu/drm/radeon/radeon_kfd.c
>>> +++ b/drivers/gpu/drm/radeon/radeon_kfd.c
>>> @@ -43,6 +43,10 @@ static void unkmap_mem(struct kgd_dev *kgd, struct kgd_mem *mem);
>>>
>>>  static uint64_t get_vmem_size(struct kgd_dev *kgd);
>>>
>>> +static void lock_srbm_gfx_cntl(struct kgd_dev *kgd);
>>> +static void unlock_srbm_gfx_cntl(struct kgd_dev *kgd);
>>> +
>>> +
>>>  static const struct kfd2kgd_calls kfd2kgd = {
>>>  	.allocate_mem = allocate_mem,
>>>  	.free_mem = free_mem,
>>> @@ -51,6 +55,8 @@ static const struct kfd2kgd_calls kfd2kgd = {
>>>  	.kmap_mem = kmap_mem,
>>>  	.unkmap_mem = unkmap_mem,
>>>  	.get_vmem_size = get_vmem_size,
>>> +	.lock_srbm_gfx_cntl = lock_srbm_gfx_cntl,
>>> +	.unlock_srbm_gfx_cntl = unlock_srbm_gfx_cntl,
>>>  };
>>>
>>>  static const struct kgd2kfd_calls *kgd2kfd;
>>> @@ -233,3 +239,17 @@ static uint64_t get_vmem_size(struct kgd_dev *kgd)
>>>
>>>  	return rdev->mc.real_vram_size;
>>>  }
>>> +
>>> +static void lock_srbm_gfx_cntl(struct kgd_dev *kgd)
>>> +{
>>> +	struct radeon_device *rdev = (struct radeon_device *)kgd;
>>> +
>>> +	mutex_lock(&rdev->srbm_mutex);
>>> +}
>>> +
>>> +static void unlock_srbm_gfx_cntl(struct kgd_dev *kgd)
>>> +{
>>> +	struct radeon_device *rdev = (struct radeon_device *)kgd;
>>> +
>>> +	mutex_unlock(&rdev->srbm_mutex);
>>> +}
>>> diff --git a/include/linux/radeon_kfd.h b/include/linux/radeon_kfd.h
>>> index c7997d4..40b691c 100644
>>> --- a/include/linux/radeon_kfd.h
>>> +++ b/include/linux/radeon_kfd.h
>>> @@ -81,6 +81,10 @@ struct kfd2kgd_calls {
>>>  	void (*unkmap_mem)(struct kgd_dev *kgd, struct kgd_mem *mem);
>>>
>>>  	uint64_t (*get_vmem_size)(struct kgd_dev *kgd);
>>> +
>>> +	/* SRBM_GFX_CNTL mutex */
>>> +	void (*lock_srbm_gfx_cntl)(struct kgd_dev *kgd);
>>> +	void (*unlock_srbm_gfx_cntl)(struct kgd_dev *kgd);
>>>  };
>>>
>>>  bool kgd2kfd_init(unsigned interface_version,
>>> --
>>> 1.9.1
>>>

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 02/83] drm/radeon: reduce number of free VMIDs and pipes in KV
  2014-07-11 16:22       ` Alex Deucher
@ 2014-07-12  9:00         ` Christian König
  -1 siblings, 0 replies; 116+ messages in thread
From: Christian König @ 2014-07-12  9:00 UTC (permalink / raw)
  To: Alex Deucher
  Cc: Jerome Glisse, Oded Gabbay, Andrew Lewycky, LKML,
	Maling list - DRI developers, Alex Deucher

Am 11.07.2014 18:22, schrieb Alex Deucher:
> On Fri, Jul 11, 2014 at 12:18 PM, Christian König
> <christian.koenig@amd.com> wrote:
>> Am 11.07.2014 18:05, schrieb Jerome Glisse:
>>
>>> On Fri, Jul 11, 2014 at 12:50:02AM +0300, Oded Gabbay wrote:
>>>> To support HSA on KV, we need to limit the number of vmids and pipes
>>>> that are available for radeon's use with KV.
>>>>
>>>> This patch reserves VMIDs 8-15 for KFD (so radeon can only use VMIDs
>>>> 0-7) and also makes radeon think that KV has only a single MEC with a
>>>> single
>>>> pipe in it
>>>>
>>>> Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
>>> Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
>>
>> At least for the VMIDs on demand allocation should be trivial to implement,
>> so I would rather prefer this instead of a fixed assignment.
> IIRC, the way the CP hw scheduler works you have to give it a range of
> vmids and it assigns them dynamically as queues are mapped so
> effectively they are potentially in use once the CP scheduler is set
> up.

That's not what I meant. Changing it completely on the fly is nice to 
have, but we should at least make it configurable as a module parameter.

And even if we hardcode it we should use a define for it somewhere 
instead of hardcoding 8 VMIDs on the KGD side and 8 VMIDs on KFD side 
without any relation to each other.
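
As a minimal sketch of the shared-define idea, written as standalone C
rather than kernel code: the macro names, `struct vmid_split` and
`cik_split_vmids()` are hypothetical and do not exist in radeon or kfd.
The assumption is that one value (for example from a module parameter)
decides how many VMIDs radeon keeps, and the KFD range is derived from it
instead of being hardcoded a second time. The lower bound of 2 is also an
assumption, so that radeon always keeps VMID 0 plus one more.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical constants; in a real tree these would live in a header
 * shared by radeon (KGD) and kfd so both sides agree on the split. */
#define CIK_NUM_VMIDS            16 /* VMIDs 0-15, matching the old nvm = 16 */
#define RADEON_DEFAULT_NUM_VMIDS 8  /* radeon keeps VMIDs 0-7 by default */

static_assert(RADEON_DEFAULT_NUM_VMIDS <= CIK_NUM_VMIDS,
	      "radeon cannot own more VMIDs than exist");

struct vmid_split {
	unsigned int radeon_num; /* radeon owns VMIDs [0, radeon_num) */
	unsigned int kfd_first;  /* first VMID handed to kfd */
	unsigned int kfd_num;    /* number of VMIDs handed to kfd */
};

/*
 * Derive both sides of the split from a single value. Returns false if
 * the request is below the assumed minimum of 2 or above the total.
 */
static bool cik_split_vmids(unsigned int radeon_num, struct vmid_split *out)
{
	if (radeon_num < 2 || radeon_num > CIK_NUM_VMIDS)
		return false;

	out->radeon_num = radeon_num;
	out->kfd_first = radeon_num;
	out->kfd_num = CIK_NUM_VMIDS - radeon_num;
	return true;
}
```

With `radeon_num = 8` this reproduces the split in the patch: radeon keeps
VMIDs 0-7 and KFD gets 8-15.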

Christian.

> Alex
>
>
>> Christian.
>>
>>
>>>> ---
>>>>    drivers/gpu/drm/radeon/cik.c | 48
>>>> ++++++++++++++++++++++----------------------
>>>>    1 file changed, 24 insertions(+), 24 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/radeon/cik.c b/drivers/gpu/drm/radeon/cik.c
>>>> index 4bfc2c0..e0c8052 100644
>>>> --- a/drivers/gpu/drm/radeon/cik.c
>>>> +++ b/drivers/gpu/drm/radeon/cik.c
>>>> @@ -4662,12 +4662,11 @@ static int cik_mec_init(struct radeon_device
>>>> *rdev)
>>>>          /*
>>>>           * KV:    2 MEC, 4 Pipes/MEC, 8 Queues/Pipe - 64 Queues total
>>>>           * CI/KB: 1 MEC, 4 Pipes/MEC, 8 Queues/Pipe - 32 Queues total
>>>> +        * Nonetheless, we assign only 1 pipe because all other pipes
>>>> will
>>>> +        * be handled by KFD
>>>>           */
>>>> -       if (rdev->family == CHIP_KAVERI)
>>>> -               rdev->mec.num_mec = 2;
>>>> -       else
>>>> -               rdev->mec.num_mec = 1;
>>>> -       rdev->mec.num_pipe = 4;
>>>> +       rdev->mec.num_mec = 1;
>>>> +       rdev->mec.num_pipe = 1;
>>>>          rdev->mec.num_queue = rdev->mec.num_mec * rdev->mec.num_pipe * 8;
>>>>          if (rdev->mec.hpd_eop_obj == NULL) {
>>>> @@ -4809,28 +4808,24 @@ static int cik_cp_compute_resume(struct
>>>> radeon_device *rdev)
>>>>          /* init the pipes */
>>>>          mutex_lock(&rdev->srbm_mutex);
>>>> -       for (i = 0; i < (rdev->mec.num_pipe * rdev->mec.num_mec); i++) {
>>>> -               int me = (i < 4) ? 1 : 2;
>>>> -               int pipe = (i < 4) ? i : (i - 4);
>>>>    -             eop_gpu_addr = rdev->mec.hpd_eop_gpu_addr + (i *
>>>> MEC_HPD_SIZE * 2);
>>>> +       eop_gpu_addr = rdev->mec.hpd_eop_gpu_addr;
>>>>    -             cik_srbm_select(rdev, me, pipe, 0, 0);
>>>> +       cik_srbm_select(rdev, 0, 0, 0, 0);
>>>>    -             /* write the EOP addr */
>>>> -               WREG32(CP_HPD_EOP_BASE_ADDR, eop_gpu_addr >> 8);
>>>> -               WREG32(CP_HPD_EOP_BASE_ADDR_HI,
>>>> upper_32_bits(eop_gpu_addr) >> 8);
>>>> +       /* write the EOP addr */
>>>> +       WREG32(CP_HPD_EOP_BASE_ADDR, eop_gpu_addr >> 8);
>>>> +       WREG32(CP_HPD_EOP_BASE_ADDR_HI, upper_32_bits(eop_gpu_addr) >>
>>>> 8);
>>>>    -             /* set the VMID assigned */
>>>> -               WREG32(CP_HPD_EOP_VMID, 0);
>>>> +       /* set the VMID assigned */
>>>> +       WREG32(CP_HPD_EOP_VMID, 0);
>>>> +
>>>> +       /* set the EOP size, register value is 2^(EOP_SIZE+1) dwords */
>>>> +       tmp = RREG32(CP_HPD_EOP_CONTROL);
>>>> +       tmp &= ~EOP_SIZE_MASK;
>>>> +       tmp |= order_base_2(MEC_HPD_SIZE / 8);
>>>> +       WREG32(CP_HPD_EOP_CONTROL, tmp);
>>>>    -             /* set the EOP size, register value is 2^(EOP_SIZE+1)
>>>> dwords */
>>>> -               tmp = RREG32(CP_HPD_EOP_CONTROL);
>>>> -               tmp &= ~EOP_SIZE_MASK;
>>>> -               tmp |= order_base_2(MEC_HPD_SIZE / 8);
>>>> -               WREG32(CP_HPD_EOP_CONTROL, tmp);
>>>> -       }
>>>> -       cik_srbm_select(rdev, 0, 0, 0, 0);
>>>>          mutex_unlock(&rdev->srbm_mutex);
>>>>          /* init the queues.  Just two for now. */
>>>> @@ -5876,8 +5871,13 @@ int cik_ib_parse(struct radeon_device *rdev,
>>>> struct radeon_ib *ib)
>>>>     */
>>>>    int cik_vm_init(struct radeon_device *rdev)
>>>>    {
>>>> -       /* number of VMs */
>>>> -       rdev->vm_manager.nvm = 16;
>>>> +       /*
>>>> +        * number of VMs
>>>> +        * VMID 0 is reserved for Graphics
>>>> +        * radeon compute will use VMIDs 1-7
>>>> +        * KFD will use VMIDs 8-15
>>>> +        */
>>>> +       rdev->vm_manager.nvm = 8;
>>>>          /* base offset of vram pages */
>>>>          if (rdev->flags & RADEON_IS_IGP) {
>>>>                  u64 tmp = RREG32(MC_VM_FB_OFFSET);
>>>> --
>>>> 1.9.1
>>>>
>> _______________________________________________
>> dri-devel mailing list
>> dri-devel@lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/dri-devel


^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 02/83] drm/radeon: reduce number of free VMIDs and pipes in KV
@ 2014-07-12  9:00         ` Christian König
  0 siblings, 0 replies; 116+ messages in thread
From: Christian König @ 2014-07-12  9:00 UTC (permalink / raw)
  To: Alex Deucher
  Cc: Oded Gabbay, Andrew Lewycky, LKML, Maling list - DRI developers,
	Alex Deucher

Am 11.07.2014 18:22, schrieb Alex Deucher:
> On Fri, Jul 11, 2014 at 12:18 PM, Christian König
> <christian.koenig@amd.com> wrote:
>> Am 11.07.2014 18:05, schrieb Jerome Glisse:
>>
>>> On Fri, Jul 11, 2014 at 12:50:02AM +0300, Oded Gabbay wrote:
>>>> To support HSA on KV, we need to limit the number of vmids and pipes
>>>> that are available for radeon's use with KV.
>>>>
>>>> This patch reserves VMIDs 8-15 for KFD (so radeon can only use VMIDs
>>>> 0-7) and also makes radeon think that KV has only a single MEC with a
>>>> single
>>>> pipe in it
>>>>
>>>> Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
>>> Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
>>
>> At least for the VMIDs on demand allocation should be trivial to implement,
>> so I would rather prefer this instead of a fixed assignment.
> IIRC, the way the CP hw scheduler works you have to give it a range of
> vmids and it assigns them dynamically as queues are mapped so
> effectively they are potentially in use once the CP scheduler is set
> up.

That's not what I meant. Changing it completely on the fly is nice to 
have, but we should at least make it configurable as a module parameter.

And even if we hardcode it we should use a define for it somewhere 
instead of hardcoding 8 VMIDs on the KGD side and 8 VMIDs on KFD side 
without any relation to each other.

Christian.

> Alex
>
>
>> Christian.
>>
>>
>>>> ---
>>>>    drivers/gpu/drm/radeon/cik.c | 48
>>>> ++++++++++++++++++++++----------------------
>>>>    1 file changed, 24 insertions(+), 24 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/radeon/cik.c b/drivers/gpu/drm/radeon/cik.c
>>>> index 4bfc2c0..e0c8052 100644
>>>> --- a/drivers/gpu/drm/radeon/cik.c
>>>> +++ b/drivers/gpu/drm/radeon/cik.c
>>>> @@ -4662,12 +4662,11 @@ static int cik_mec_init(struct radeon_device
>>>> *rdev)
>>>>          /*
>>>>           * KV:    2 MEC, 4 Pipes/MEC, 8 Queues/Pipe - 64 Queues total
>>>>           * CI/KB: 1 MEC, 4 Pipes/MEC, 8 Queues/Pipe - 32 Queues total
>>>> +        * Nonetheless, we assign only 1 pipe because all other pipes
>>>> will
>>>> +        * be handled by KFD
>>>>           */
>>>> -       if (rdev->family == CHIP_KAVERI)
>>>> -               rdev->mec.num_mec = 2;
>>>> -       else
>>>> -               rdev->mec.num_mec = 1;
>>>> -       rdev->mec.num_pipe = 4;
>>>> +       rdev->mec.num_mec = 1;
>>>> +       rdev->mec.num_pipe = 1;
>>>>          rdev->mec.num_queue = rdev->mec.num_mec * rdev->mec.num_pipe * 8;
>>>>          if (rdev->mec.hpd_eop_obj == NULL) {
>>>> @@ -4809,28 +4808,24 @@ static int cik_cp_compute_resume(struct
>>>> radeon_device *rdev)
>>>>          /* init the pipes */
>>>>          mutex_lock(&rdev->srbm_mutex);
>>>> -       for (i = 0; i < (rdev->mec.num_pipe * rdev->mec.num_mec); i++) {
>>>> -               int me = (i < 4) ? 1 : 2;
>>>> -               int pipe = (i < 4) ? i : (i - 4);
>>>>    -             eop_gpu_addr = rdev->mec.hpd_eop_gpu_addr + (i *
>>>> MEC_HPD_SIZE * 2);
>>>> +       eop_gpu_addr = rdev->mec.hpd_eop_gpu_addr;
>>>>    -             cik_srbm_select(rdev, me, pipe, 0, 0);
>>>> +       cik_srbm_select(rdev, 0, 0, 0, 0);
>>>>    -             /* write the EOP addr */
>>>> -               WREG32(CP_HPD_EOP_BASE_ADDR, eop_gpu_addr >> 8);
>>>> -               WREG32(CP_HPD_EOP_BASE_ADDR_HI,
>>>> upper_32_bits(eop_gpu_addr) >> 8);
>>>> +       /* write the EOP addr */
>>>> +       WREG32(CP_HPD_EOP_BASE_ADDR, eop_gpu_addr >> 8);
>>>> +       WREG32(CP_HPD_EOP_BASE_ADDR_HI, upper_32_bits(eop_gpu_addr) >>
>>>> 8);
>>>>    -             /* set the VMID assigned */
>>>> -               WREG32(CP_HPD_EOP_VMID, 0);
>>>> +       /* set the VMID assigned */
>>>> +       WREG32(CP_HPD_EOP_VMID, 0);
>>>> +
>>>> +       /* set the EOP size, register value is 2^(EOP_SIZE+1) dwords */
>>>> +       tmp = RREG32(CP_HPD_EOP_CONTROL);
>>>> +       tmp &= ~EOP_SIZE_MASK;
>>>> +       tmp |= order_base_2(MEC_HPD_SIZE / 8);
>>>> +       WREG32(CP_HPD_EOP_CONTROL, tmp);
>>>>    -             /* set the EOP size, register value is 2^(EOP_SIZE+1)
>>>> dwords */
>>>> -               tmp = RREG32(CP_HPD_EOP_CONTROL);
>>>> -               tmp &= ~EOP_SIZE_MASK;
>>>> -               tmp |= order_base_2(MEC_HPD_SIZE / 8);
>>>> -               WREG32(CP_HPD_EOP_CONTROL, tmp);
>>>> -       }
>>>> -       cik_srbm_select(rdev, 0, 0, 0, 0);
>>>>          mutex_unlock(&rdev->srbm_mutex);
>>>>          /* init the queues.  Just two for now. */
>>>> @@ -5876,8 +5871,13 @@ int cik_ib_parse(struct radeon_device *rdev,
>>>> struct radeon_ib *ib)
>>>>     */
>>>>    int cik_vm_init(struct radeon_device *rdev)
>>>>    {
>>>> -       /* number of VMs */
>>>> -       rdev->vm_manager.nvm = 16;
>>>> +       /*
>>>> +        * number of VMs
>>>> +        * VMID 0 is reserved for Graphics
>>>> +        * radeon compute will use VMIDs 1-7
>>>> +        * KFD will use VMIDs 8-15
>>>> +        */
>>>> +       rdev->vm_manager.nvm = 8;
>>>>          /* base offset of vram pages */
>>>>          if (rdev->flags & RADEON_IS_IGP) {
>>>>                  u64 tmp = RREG32(MC_VM_FB_OFFSET);
>>>> --
>>>> 1.9.1
>>>>
>> _______________________________________________
>> dri-devel mailing list
>> dri-devel@lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 02/83] drm/radeon: reduce number of free VMIDs and pipes in KV
  2014-07-12  9:00         ` Christian König
@ 2014-07-14  7:31           ` Michel Dänzer
  -1 siblings, 0 replies; 116+ messages in thread
From: Michel Dänzer @ 2014-07-14  7:31 UTC (permalink / raw)
  To: Christian König
  Cc: Alex Deucher, Oded Gabbay, Andrew Lewycky, LKML,
	Maling list - DRI developers

On 12.07.2014 18:00, Christian König wrote:
> Am 11.07.2014 18:22, schrieb Alex Deucher:
>> On Fri, Jul 11, 2014 at 12:18 PM, Christian König
>> <christian.koenig@amd.com> wrote:
>>> Am 11.07.2014 18:05, schrieb Jerome Glisse:
>>>
>>>> On Fri, Jul 11, 2014 at 12:50:02AM +0300, Oded Gabbay wrote:
>>>>> To support HSA on KV, we need to limit the number of vmids and pipes
>>>>> that are available for radeon's use with KV.
>>>>>
>>>>> This patch reserves VMIDs 8-15 for KFD (so radeon can only use VMIDs
>>>>> 0-7) and also makes radeon think that KV has only a single MEC with a
>>>>> single
>>>>> pipe in it
>>>>>
>>>>> Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
>>>> Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
>>>
>>> At least for the VMIDs on demand allocation should be trivial to
>>> implement,
>>> so I would rather prefer this instead of a fixed assignment.
>> IIRC, the way the CP hw scheduler works you have to give it a range of
>> vmids and it assigns them dynamically as queues are mapped so
>> effectively they are potentially in use once the CP scheduler is set
>> up.
> 
> That's not what I meant. Changing it completely on the fly is nice to
> have, but we should at least make it configurable as a module parameter.
> 
> And even if we hardcode it we should use a define for it somewhere
> instead of hardcoding 8 VMIDs on the KGD side and 8 VMIDs on KFD side
> without any relation to each other.

Seconded, and there should be more explanation and rationale for the way
things are set up in the code or at least in the commit log.


-- 
Earthling Michel Dänzer            |                  http://www.amd.com
Libre software enthusiast          |                Mesa and X developer

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 02/83] drm/radeon: reduce number of free VMIDs and pipes in KV
@ 2014-07-14  7:31           ` Michel Dänzer
  0 siblings, 0 replies; 116+ messages in thread
From: Michel Dänzer @ 2014-07-14  7:31 UTC (permalink / raw)
  To: Christian König
  Cc: Oded Gabbay, Andrew Lewycky, LKML, Maling list - DRI developers

On 12.07.2014 18:00, Christian König wrote:
> Am 11.07.2014 18:22, schrieb Alex Deucher:
>> On Fri, Jul 11, 2014 at 12:18 PM, Christian König
>> <christian.koenig@amd.com> wrote:
>>> Am 11.07.2014 18:05, schrieb Jerome Glisse:
>>>
>>>> On Fri, Jul 11, 2014 at 12:50:02AM +0300, Oded Gabbay wrote:
>>>>> To support HSA on KV, we need to limit the number of vmids and pipes
>>>>> that are available for radeon's use with KV.
>>>>>
>>>>> This patch reserves VMIDs 8-15 for KFD (so radeon can only use VMIDs
>>>>> 0-7) and also makes radeon think that KV has only a single MEC with a
>>>>> single
>>>>> pipe in it
>>>>>
>>>>> Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
>>>> Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
>>>
>>> At least for the VMIDs on demand allocation should be trivial to
>>> implement,
>>> so I would rather prefer this instead of a fixed assignment.
>> IIRC, the way the CP hw scheduler works you have to give it a range of
>> vmids and it assigns them dynamically as queues are mapped so
>> effectively they are potentially in use once the CP scheduler is set
>> up.
> 
> That's not what I meant. Changing it completely on the fly is nice to
> have, but we should at least make it configurable as a module parameter.
> 
> And even if we hardcode it we should use a define for it somewhere
> instead of hardcoding 8 VMIDs on the KGD side and 8 VMIDs on KFD side
> without any relation to each other.

Seconded, and there should be more explanation and rationale for the way
things are set up in the code or at least in the commit log.


-- 
Earthling Michel Dänzer            |                  http://www.amd.com
Libre software enthusiast          |                Mesa and X developer

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 13/83] hsa/radeon: Add 2 new IOCTL to kfd, CREATE_QUEUE and DESTROY_QUEUE
  2014-07-11 21:42     ` Dave Airlie
@ 2014-07-14  7:33       ` Gabbay, Oded
  -1 siblings, 0 replies; 116+ messages in thread
From: Gabbay, Oded @ 2014-07-14  7:33 UTC (permalink / raw)
  To: airlied
  Cc: linux-kernel, j.glisse, Bridgman, John, Deucher, Alexander,
	Lewycky, Andrew, linux-api, joro, Pinchuk, Evgeny, dri-devel,
	Skidanov, Alexey, airlied, oded.gabbay, Goz, Ben


On Sat, 2014-07-12 at 07:42 +1000, Dave Airlie wrote:
> >  +/* The 64-bit ABI is the authoritative version. */
> >  +#pragma pack(push, 8)
> >  +
>  
> Don't do this, pad and align things explicitly in structs.
>  
> >  +struct kfd_ioctl_create_queue_args {
> >  +       uint64_t ring_base_address;     /* to KFD */
> >  +       uint32_t ring_size;             /* to KFD */
> >  +       uint32_t gpu_id;                /* to KFD */
> >  +       uint32_t queue_type;            /* to KFD */
> >  +       uint32_t queue_percentage;      /* to KFD */
> >  +       uint32_t queue_priority;        /* to KFD */
> >  +       uint64_t write_pointer_address; /* to KFD */
> >  +       uint64_t read_pointer_address;  /* to KFD */
> >  +
> >  +       uint64_t doorbell_address;      /* from KFD */
> >  +       uint32_t queue_id;              /* from KFD */
> >  +};
> >  +
>  
> maybe put all the uint64_t at the start, or add explicit padding.
>  
> Dave.
Thanks, will be fixed.
        Oded
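
For illustration only, one way to follow the suggestion above, as a
standalone sketch and not the actual v2 layout. In the layout quoted
above, `write_pointer_address` follows one 64-bit and five 32-bit fields,
so it lands at offset 28 on ABIs that align 64-bit members to 4 bytes
(32-bit x86, for example) and at offset 32 where they are aligned to 8.
Moving the 64-bit fields to the front avoids that without
`#pragma pack`; the field order below is an assumption. With six 32-bit
fields the total is already a multiple of 8, so no explicit pad member is
needed in this particular arrangement.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reordering: all 64-bit members first, 32-bit members last. */
struct kfd_ioctl_create_queue_args {
	uint64_t ring_base_address;     /* to KFD */
	uint64_t write_pointer_address; /* to KFD */
	uint64_t read_pointer_address;  /* to KFD */
	uint64_t doorbell_address;      /* from KFD */

	uint32_t ring_size;             /* to KFD */
	uint32_t gpu_id;                /* to KFD */
	uint32_t queue_type;            /* to KFD */
	uint32_t queue_percentage;      /* to KFD */
	uint32_t queue_priority;        /* to KFD */
	uint32_t queue_id;              /* from KFD */
};

/* Compile-time checks that the layout has no hidden holes. */
static_assert(offsetof(struct kfd_ioctl_create_queue_args, ring_size) == 32,
	      "32-bit fields must start right after the four 64-bit fields");
static_assert(sizeof(struct kfd_ioctl_create_queue_args) == 56,
	      "struct must be 56 bytes with no implicit padding");
```

If a field were added later and the 32-bit count became odd, an explicit
`uint32_t pad;` at the end would keep the size a multiple of 8.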

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 13/83] hsa/radeon: Add 2 new IOCTL to kfd, CREATE_QUEUE and DESTROY_QUEUE
@ 2014-07-14  7:33       ` Gabbay, Oded
  0 siblings, 0 replies; 116+ messages in thread
From: Gabbay, Oded @ 2014-07-14  7:33 UTC (permalink / raw)
  To: airlied
  Cc: oded.gabbay, Lewycky, Andrew, linux-api, linux-kernel, dri-devel,
	Pinchuk, Evgeny, Deucher, Alexander, Skidanov, Alexey

On Sat, 2014-07-12 at 07:42 +1000, Dave Airlie wrote:
> >  +/* The 64-bit ABI is the authoritative version. */
> >  +#pragma pack(push, 8)
> >  +
>  
> Don't do this, pad and align things explicitly in structs.
>  
> >  +struct kfd_ioctl_create_queue_args {
> >  +       uint64_t ring_base_address;     /* to KFD */
> >  +       uint32_t ring_size;             /* to KFD */
> >  +       uint32_t gpu_id;                /* to KFD */
> >  +       uint32_t queue_type;            /* to KFD */
> >  +       uint32_t queue_percentage;      /* to KFD */
> >  +       uint32_t queue_priority;        /* to KFD */
> >  +       uint64_t write_pointer_address; /* to KFD */
> >  +       uint64_t read_pointer_address;  /* to KFD */
> >  +
> >  +       uint64_t doorbell_address;      /* from KFD */
> >  +       uint32_t queue_id;              /* from KFD */
> >  +};
> >  +
>  
> maybe put all the uint64_t at the start, or add explicit padding.
>  
> Dave.
Thanks, will be fixed.
        Oded

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 02/83] drm/radeon: reduce number of free VMIDs and pipes in KV
  2014-07-10 21:50 [PATCH 02/83] drm/radeon: reduce number of free VMIDs and pipes in KV Oded Gabbay
                   ` (25 preceding siblings ...)
  2014-07-11 16:05   ` Jerome Glisse
@ 2014-07-14  7:38 ` Michel Dänzer
  2014-07-14  7:58   ` Christian König
  26 siblings, 1 reply; 116+ messages in thread
From: Michel Dänzer @ 2014-07-14  7:38 UTC (permalink / raw)
  To: Oded Gabbay, David Airlie, Alex Deucher, Jerome Glisse
  Cc: Andrew Lewycky, linux-kernel, dri-devel, Christian König

On 11.07.2014 06:50, Oded Gabbay wrote:
> @@ -5876,8 +5871,13 @@ int cik_ib_parse(struct radeon_device *rdev, struct radeon_ib *ib)
>   */
>  int cik_vm_init(struct radeon_device *rdev)
>  {
> -	/* number of VMs */
> -	rdev->vm_manager.nvm = 16;
> +	/*
> +	 * number of VMs
> +	 * VMID 0 is reserved for Graphics
> +	 * radeon compute will use VMIDs 1-7
> +	 * KFD will use VMIDs 8-15
> +	 */
> +	rdev->vm_manager.nvm = 8;

This comment is inaccurate: Graphics can use VMIDs 1-7 as well.


-- 
Earthling Michel Dänzer            |                  http://www.amd.com
Libre software enthusiast          |                Mesa and X developer

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 02/83] drm/radeon: reduce number of free VMIDs and pipes in KV
  2014-07-14  7:38 ` Michel Dänzer
@ 2014-07-14  7:58   ` Christian König
  2014-07-17 11:47       ` Oded Gabbay
  0 siblings, 1 reply; 116+ messages in thread
From: Christian König @ 2014-07-14  7:58 UTC (permalink / raw)
  To: Michel Dänzer, Oded Gabbay, David Airlie, Alex Deucher,
	Jerome Glisse
  Cc: Andrew Lewycky, linux-kernel, dri-devel, Christian König

Am 14.07.2014 09:38, schrieb Michel Dänzer:
> On 11.07.2014 06:50, Oded Gabbay wrote:
>> @@ -5876,8 +5871,13 @@ int cik_ib_parse(struct radeon_device *rdev, struct radeon_ib *ib)
>>    */
>>   int cik_vm_init(struct radeon_device *rdev)
>>   {
>> -	/* number of VMs */
>> -	rdev->vm_manager.nvm = 16;
>> +	/*
>> +	 * number of VMs
>> +	 * VMID 0 is reserved for Graphics
>> +	 * radeon compute will use VMIDs 1-7
>> +	 * KFD will use VMIDs 8-15
>> +	 */
>> +	rdev->vm_manager.nvm = 8;
> This comment is inaccurate: Graphics can use VMIDs 1-7 as well.

Actually, VMID 0 is reserved for system use, and graphics operations only
use VMIDs 1-7.
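
Put as code, the assignment described here, combined with the KFD range
from the patch comment, would look roughly like the sketch below. This is
standalone C for illustration; the enum and the function name are
hypothetical and not part of radeon.

```c
/* Who owns a given VMID under the proposed split (hypothetical names). */
enum vmid_owner {
	VMID_OWNER_SYSTEM,  /* VMID 0: reserved for system use */
	VMID_OWNER_RADEON,  /* VMIDs 1-7: radeon graphics and compute */
	VMID_OWNER_KFD,     /* VMIDs 8-15: reserved for KFD by the patch */
	VMID_OWNER_INVALID  /* anything above 15 */
};

static enum vmid_owner vmid_to_owner(unsigned int vmid)
{
	if (vmid == 0)
		return VMID_OWNER_SYSTEM;
	if (vmid <= 7)
		return VMID_OWNER_RADEON;
	if (vmid <= 15)
		return VMID_OWNER_KFD;
	return VMID_OWNER_INVALID;
}
```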

Christian.

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 02/83] drm/radeon: reduce number of free VMIDs and pipes in KV
  2014-07-14  7:58   ` Christian König
@ 2014-07-17 11:47       ` Oded Gabbay
  0 siblings, 0 replies; 116+ messages in thread
From: Oded Gabbay @ 2014-07-17 11:47 UTC (permalink / raw)
  To: Christian König, Michel Dänzer, David Airlie,
	Alex Deucher, Jerome Glisse
  Cc: Andrew Lewycky, linux-kernel, dri-devel

On 14/07/14 10:58, Christian König wrote:
> Am 14.07.2014 09:38, schrieb Michel Dänzer:
>> On 11.07.2014 06:50, Oded Gabbay wrote:
>>> @@ -5876,8 +5871,13 @@ int cik_ib_parse(struct radeon_device *rdev, struct
>>> radeon_ib *ib)
>>>    */
>>>   int cik_vm_init(struct radeon_device *rdev)
>>>   {
>>> -    /* number of VMs */
>>> -    rdev->vm_manager.nvm = 16;
>>> +    /*
>>> +     * number of VMs
>>> +     * VMID 0 is reserved for Graphics
>>> +     * radeon compute will use VMIDs 1-7
>>> +     * KFD will use VMIDs 8-15
>>> +     */
>>> +    rdev->vm_manager.nvm = 8;
>> This comment is inaccurate: Graphics can use VMIDs 1-7 as well.
>
> Actually, VMID 0 is reserved for system use, and graphics operations only use
> VMIDs 1-7.
>
> Christian.
Will be fixed in v2 of the patchset

	Oded
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 02/83] drm/radeon: reduce number of free VMIDs and pipes in KV
@ 2014-07-17 11:47       ` Oded Gabbay
  0 siblings, 0 replies; 116+ messages in thread
From: Oded Gabbay @ 2014-07-17 11:47 UTC (permalink / raw)
  To: Christian König, Michel Dänzer, David Airlie,
	Alex Deucher, Jerome Glisse
  Cc: Andrew Lewycky, linux-kernel, dri-devel

On 14/07/14 10:58, Christian König wrote:
> Am 14.07.2014 09:38, schrieb Michel Dänzer:
>> On 11.07.2014 06:50, Oded Gabbay wrote:
>>> @@ -5876,8 +5871,13 @@ int cik_ib_parse(struct radeon_device *rdev, struct
>>> radeon_ib *ib)
>>>    */
>>>   int cik_vm_init(struct radeon_device *rdev)
>>>   {
>>> -    /* number of VMs */
>>> -    rdev->vm_manager.nvm = 16;
>>> +    /*
>>> +     * number of VMs
>>> +     * VMID 0 is reserved for Graphics
>>> +     * radeon compute will use VMIDs 1-7
>>> +     * KFD will use VMIDs 8-15
>>> +     */
>>> +    rdev->vm_manager.nvm = 8;
>> This comment is inaccurate: Graphics can use VMIDs 1-7 as well.
>
> Actually, VMID 0 is reserved for system use, and graphics operations only use
> VMIDs 1-7.
>
> Christian.
Will be fixed in v2 of the patchset

	Oded
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 09/83] hsa/radeon: Add code base of hsa driver for AMD's GPUs
  2014-07-11 19:22                 ` Jerome Glisse
@ 2014-07-17 11:51                   ` Oded Gabbay
  -1 siblings, 0 replies; 116+ messages in thread
From: Oded Gabbay @ 2014-07-17 11:51 UTC (permalink / raw)
  To: Jerome Glisse, Bridgman, John
  Cc: David Airlie, Deucher, Alexander, linux-kernel, dri-devel,
	Lewycky, Andrew, Joerg Roedel, Greg Kroah-Hartman,
	Rafael J. Wysocki, Kishon Vijay Abraham I, Sandeep Nair,
	Kenneth Heitke, Srinivas Pandruvada, Santosh Shilimkar,
	Andreas Noever, Lucas Stach, Philipp Zabel

On 11/07/14 22:22, Jerome Glisse wrote:
> On Fri, Jul 11, 2014 at 06:56:12PM +0000, Bridgman, John wrote:
>>> From: Jerome Glisse [mailto:j.glisse@gmail.com]
>>> Sent: Friday, July 11, 2014 2:52 PM
>>> To: Bridgman, John
>>> Cc: Oded Gabbay; David Airlie; Deucher, Alexander; linux-
>>> kernel@vger.kernel.org; dri-devel@lists.freedesktop.org; Lewycky, Andrew;
>>> Joerg Roedel; Gabbay, Oded; Greg Kroah-Hartman; Rafael J. Wysocki; Kishon
>>> Vijay Abraham I; Sandeep Nair; Kenneth Heitke; Srinivas Pandruvada;
>>> Santosh Shilimkar; Andreas Noever; Lucas Stach; Philipp Zabel
>>> Subject: Re: [PATCH 09/83] hsa/radeon: Add code base of hsa driver for
>>> AMD's GPUs
>>>
>>> On Fri, Jul 11, 2014 at 06:46:30PM +0000, Bridgman, John wrote:
>>>>> From: Jerome Glisse [mailto:j.glisse@gmail.com]
>>>>> Sent: Friday, July 11, 2014 2:11 PM
>>>>> To: Bridgman, John
>>>>> Cc: Oded Gabbay; David Airlie; Deucher, Alexander; linux-
>>>>> kernel@vger.kernel.org; dri-devel@lists.freedesktop.org; Lewycky,
>>>>> Andrew; Joerg Roedel; Gabbay, Oded; Greg Kroah-Hartman; Rafael J.
>>>>> Wysocki; Kishon Vijay Abraham I; Sandeep Nair; Kenneth Heitke;
>>>>> Srinivas Pandruvada; Santosh Shilimkar; Andreas Noever; Lucas Stach;
>>>>> Philipp Zabel
>>>>> Subject: Re: [PATCH 09/83] hsa/radeon: Add code base of hsa driver
>>>>> for AMD's GPUs
>>>>>
>>>>> On Fri, Jul 11, 2014 at 06:02:39PM +0000, Bridgman, John wrote:
>>>>>>> From: Jerome Glisse [mailto:j.glisse@gmail.com]
>>>>>>> Sent: Friday, July 11, 2014 1:04 PM
>>>>>>> To: Oded Gabbay
>>>>>>> Cc: David Airlie; Deucher, Alexander;
>>>>>>> linux-kernel@vger.kernel.org;
>>>>>>> dri- devel@lists.freedesktop.org; Bridgman, John; Lewycky, Andrew;
>>>>>>> Joerg Roedel; Gabbay, Oded; Greg Kroah-Hartman; Rafael J. Wysocki;
>>>>>>> Kishon Vijay Abraham I; Sandeep Nair; Kenneth Heitke; Srinivas
>>>>>>> Pandruvada; Santosh Shilimkar; Andreas Noever; Lucas Stach;
>>>>>>> Philipp Zabel
>>>>>>> Subject: Re: [PATCH 09/83] hsa/radeon: Add code base of hsa driver
>>>>>>> for AMD's GPUs
>>>>>>>
>>>>>>> On Fri, Jul 11, 2014 at 12:50:09AM +0300, Oded Gabbay wrote:
>>>>>>>> This patch adds the code base of the hsa driver for AMD's GPUs.
>>>>>>>>
>>>>>>>> This driver is called kfd.
>>>>>>>>
>>>>>>>> This initial version supports the first HSA chip, Kaveri.
>>>>>>>>
>>>>>>>> This driver is located in a new directory structure under drivers/gpu.
>>>>>>>>
>>>>>>>> Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
>>>>>>>
>>>>>>> There are too many coding style issues. While we have been lax on
>>>>>>> enforcing the scripts/checkpatch.pl rules, I think there is a limit
>>>>>>> to that. I am not strict on the 80 chars per line, but other things
>>>>>>> need fixing so we stay in line.
>>>>>>>
>>>>>>> Also i am a bit worried about the license, given top comment in
>>>>>>> each of the files i am not sure this is GPL2 compatible. I would
>>>>>>> need to ask lawyer to review that.
>>>>>>>
>>>>>>
>>>>>> Hi Jerome,
>>>>>>
>>>>>> Which line in the license are you concerned about? In theory we're
>>>>>> using the same license as the initial code pushes for radeon, and I
>>>>>> just did a side-by-side compare with the license header on cik.c in
>>>>>> the radeon tree and confirmed that the two licenses are identical.
>>>>>>
>>>>>> The cik.c header has an additional "Authors:" line which the kfd
>>>>>> files do not, but AFAIK that is not part of the license text proper.
>>>>>>
>>>>>
>>>>> You cannot claim GPL if you want to use this license. radeon is a
>>>>> weird beast for historical reasons: we wanted to share code with BSD,
>>>>> thus it is dual licensed, and this is reflected with:
>>>>> MODULE_LICENSE("GPL and additional rights");
>>>>>
>>>>> inside radeon_drv.c
>>>>>
>>>>> So if you want to have MODULE_LICENSE(GPL) then you should have a
>>>>> header that uses the GPL license wording and no wording from a
>>>>> BSD-like license.
>>>>> Otherwise change the MODULE_LICENSE, and it would also be good to say
>>>>> dual licensed at the top of each file (or at least next to each
>>>>> license) so that it is clear this is a BSD & GPL license.
>>>>
>>>> Got it. Missed that we had a different MODULE_LICENSE.
>>>>
>>>> Since the goal is license compatibility with radeon, so we can update
>>>> the interface and move code between the drivers in future, I guess my
>>>> preference would be to update MODULE_LICENSE in the kfd code to "GPL
>>>> and additional rights". Do you think that would be OK?
>>>
>>> I am not a lawyer and nothing that I said should be considered legal advice
>>> (on the contrary ;)). I think you need to be clearer with each license and
>>> clearly say GPLv2 or BSD, i.e. dual licensed, but the dual license is a beast
>>> you would definitely want to talk to a lawyer about.
>>
>> Yeah, dual license seems horrid in its implications for developers so we've always tried to avoid it. GPL hurts us for porting to other OSes so the X11 / "GPL with additional rights" combo seemed like the ideal solution and we made it somewhat of a corporate standard. Hope that doesn't come back to haunt us.
>>
>> Meditate on this I will. Thanks !
>
> Just to be explicit, my point is that if you claim GPL in MODULE_LICENSE
> then this is GPL licensed code; if you claim GPL with additional rights
> then this is dual licensed code. This is how I read and interpret the
> "with additional rights" wording. In any case the radeon code is considered
> dual licensed, i.e. GPL+BSD (at least this is how I consider that code).
>
> Cheers,
> Jérôme
>
Changed it to "GPL and additional rights" in v2 of the patchset

	Oded

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 09/83] hsa/radeon: Add code base of hsa driver for AMD's GPUs
@ 2014-07-17 11:51                   ` Oded Gabbay
  0 siblings, 0 replies; 116+ messages in thread
From: Oded Gabbay @ 2014-07-17 11:51 UTC (permalink / raw)
  To: Jerome Glisse, Bridgman, John
  Cc: Lewycky, Andrew, Greg Kroah-Hartman, Rafael J. Wysocki,
	linux-kernel, dri-devel, Kishon Vijay Abraham I, Andreas Noever,
	Kenneth Heitke, Santosh Shilimkar, Sandeep Nair,
	Srinivas Pandruvada, Deucher, Alexander

On 11/07/14 22:22, Jerome Glisse wrote:
> On Fri, Jul 11, 2014 at 06:56:12PM +0000, Bridgman, John wrote:
>>> From: Jerome Glisse [mailto:j.glisse@gmail.com]
>>> Sent: Friday, July 11, 2014 2:52 PM
>>> To: Bridgman, John
>>> Cc: Oded Gabbay; David Airlie; Deucher, Alexander; linux-
>>> kernel@vger.kernel.org; dri-devel@lists.freedesktop.org; Lewycky, Andrew;
>>> Joerg Roedel; Gabbay, Oded; Greg Kroah-Hartman; Rafael J. Wysocki; Kishon
>>> Vijay Abraham I; Sandeep Nair; Kenneth Heitke; Srinivas Pandruvada;
>>> Santosh Shilimkar; Andreas Noever; Lucas Stach; Philipp Zabel
>>> Subject: Re: [PATCH 09/83] hsa/radeon: Add code base of hsa driver for
>>> AMD's GPUs
>>>
>>> On Fri, Jul 11, 2014 at 06:46:30PM +0000, Bridgman, John wrote:
>>>>> From: Jerome Glisse [mailto:j.glisse@gmail.com]
>>>>> Sent: Friday, July 11, 2014 2:11 PM
>>>>> To: Bridgman, John
>>>>> Cc: Oded Gabbay; David Airlie; Deucher, Alexander; linux-
>>>>> kernel@vger.kernel.org; dri-devel@lists.freedesktop.org; Lewycky,
>>>>> Andrew; Joerg Roedel; Gabbay, Oded; Greg Kroah-Hartman; Rafael J.
>>>>> Wysocki; Kishon Vijay Abraham I; Sandeep Nair; Kenneth Heitke;
>>>>> Srinivas Pandruvada; Santosh Shilimkar; Andreas Noever; Lucas Stach;
>>>>> Philipp Zabel
>>>>> Subject: Re: [PATCH 09/83] hsa/radeon: Add code base of hsa driver
>>>>> for AMD's GPUs
>>>>>
>>>>> On Fri, Jul 11, 2014 at 06:02:39PM +0000, Bridgman, John wrote:
>>>>>>> From: Jerome Glisse [mailto:j.glisse@gmail.com]
>>>>>>> Sent: Friday, July 11, 2014 1:04 PM
>>>>>>> To: Oded Gabbay
>>>>>>> Cc: David Airlie; Deucher, Alexander;
>>>>>>> linux-kernel@vger.kernel.org;
>>>>>>> dri- devel@lists.freedesktop.org; Bridgman, John; Lewycky, Andrew;
>>>>>>> Joerg Roedel; Gabbay, Oded; Greg Kroah-Hartman; Rafael J. Wysocki;
>>>>>>> Kishon Vijay Abraham I; Sandeep Nair; Kenneth Heitke; Srinivas
>>>>>>> Pandruvada; Santosh Shilimkar; Andreas Noever; Lucas Stach;
>>>>>>> Philipp Zabel
>>>>>>> Subject: Re: [PATCH 09/83] hsa/radeon: Add code base of hsa driver
>>>>>>> for AMD's GPUs
>>>>>>>
>>>>>>> On Fri, Jul 11, 2014 at 12:50:09AM +0300, Oded Gabbay wrote:
>>>>>>>> This patch adds the code base of the hsa driver for AMD's GPUs.
>>>>>>>>
>>>>>>>> This driver is called kfd.
>>>>>>>>
>>>>>>>> This initial version supports the first HSA chip, Kaveri.
>>>>>>>>
>>>>>>>> This driver is located in a new directory structure under drivers/gpu.
>>>>>>>>
>>>>>>>> Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
>>>>>>>
>>>>>>> There are too many coding style issues. While we have been lax on
>>>>>>> enforcing the scripts/checkpatch.pl rules, I think there is a limit
>>>>>>> to that. I am not strict on the 80 chars per line, but other things
>>>>>>> need fixing so we stay in line.
>>>>>>>
>>>>>>> Also I am a bit worried about the license; given the top comment in
>>>>>>> each of the files, I am not sure this is GPL2 compatible. I would
>>>>>>> need to ask a lawyer to review that.
>>>>>>>
>>>>>>
>>>>>> Hi Jerome,
>>>>>>
>>>>>> Which line in the license are you concerned about? In theory we're
>>>>>> using the same license as the initial code pushes for radeon, and I
>>>>>> just did a side-by-side compare with the license header on cik.c in
>>>>>> the radeon tree and confirmed that the two licenses are identical.
>>>>>>
>>>>>> The cik.c header has an additional "Authors:" line which the kfd
>>>>>> files do not, but AFAIK that is not part of the license text proper.
>>>>>>
>>>>>
>>>>> You cannot claim GPL if you want to use this license. radeon is a
>>>>> weird beast for historical reasons: we wanted to share code with BSD,
>>>>> so it is dual licensed, and this is reflected with:
>>>>> MODULE_LICENSE("GPL and additional rights");
>>>>>
>>>>> inside radeon_drv.c
>>>>>
>>>>> So if you want to have MODULE_LICENSE("GPL") then you should have a
>>>>> header that uses the GPL license wording and no wording from a
>>>>> BSD-like license. Otherwise change the MODULE_LICENSE, and it would
>>>>> also be good to say dual licensed at the top of each file (or at
>>>>> least next to each license) so that it is clear this is BSD & GPL
>>>>> licensed.
>>>>
>>>> Got it. Missed that we had a different MODULE_LICENSE.
>>>>
>>>> Since the goal is license compatibility with radeon so we can update
>>>> the interface and move code between the drivers in future, I guess my
>>>> preference would be to update MODULE_LICENSE in the kfd code to "GPL
>>>> and additional rights". Do you think that would be OK?
>>>
>>> I am not a lawyer and nothing that I said should be considered legal
>>> advice (on the contrary ;)). I think you need to be clearer with each
>>> license and clearly say GPLv2 or BSD, i.e. dual licensed, but the dual
>>> license is a beast you would definitely want to talk to a lawyer about.
>>
>> Yeah, dual license seems horrid in its implications for developers so
>> we've always tried to avoid it. GPL hurts us for porting to other OSes
>> so the X11 / "GPL with additional rights" combo seemed like the ideal
>> solution and we made it somewhat of a corporate standard. Hope that
>> doesn't come back to haunt us.
>>
>> Meditate on this I will. Thanks !
>
> Just to be explicit, my point is that if you claim GPL in MODULE_LICENSE
> then this is GPL licensed code; if you claim GPL with additional rights
> then this is dual licensed code. This is how I read and interpret the
> "with additional rights" wording. In any case the radeon code is
> considered dual licensed, i.e. GPL+BSD (at least this is how I consider
> that code).
>
> Cheers,
> Jérôme
>
Changed it to "GPL and additional rights" in v2 of the patchset

	Oded

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 09/83] hsa/radeon: Add code base of hsa driver for AMD's GPUs
  2014-07-11 17:28       ` Joe Perches
@ 2014-07-17 11:51         ` Oded Gabbay
  -1 siblings, 0 replies; 116+ messages in thread
From: Oded Gabbay @ 2014-07-17 11:51 UTC (permalink / raw)
  To: Joe Perches, Jerome Glisse
  Cc: David Airlie, Alex Deucher, linux-kernel, dri-devel,
	John Bridgman, Andrew Lewycky, Joerg Roedel, Greg Kroah-Hartman,
	Rafael J. Wysocki, Kishon Vijay Abraham I, Sandeep Nair,
	Kenneth Heitke, Srinivas Pandruvada, Santosh Shilimkar,
	Andreas Noever, Lucas Stach, Philipp Zabel

On 11/07/14 20:28, Joe Perches wrote:
> On Fri, 2014-07-11 at 13:04 -0400, Jerome Glisse wrote:
>> On Fri, Jul 11, 2014 at 12:50:09AM +0300, Oded Gabbay wrote:
> []
>>> +static long kfd_ioctl(struct file *, unsigned int, unsigned long);
>>
>> Nitpick, avoid unsigned int just use unsigned.
>
> I suggest unsigned int is much more common (and better)
> than just unsigned.
>
> $ git grep -P '\bunsigned\s+(?!long|int|short|char)' -- "*.[ch]" | wc -l
> 20778
>
> $ git grep -P "\bunsigned\s+int\b" -- "*.[ch]" | wc -l
> 98068
>
So I left it as unsigned int in v2 of the patchset.

>>> +static int kfd_open(struct inode *, struct file *);
>
> It's also generally better to use types and names to
> improve how a human reads and understands the code.
>
>
Fixed in v2 of the patchset.

	Oded
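
For readers following the two review points above, here is a minimal userspace sketch, not the v2 code itself. The parameter names (filep, cmd, arg, inode) are assumptions chosen for illustration, the struct declarations are opaque stand-ins for the kernel types, and the function bodies are stubs. It also shows that "unsigned" and "unsigned int" name the same type, so the choice between them is purely stylistic.

```c
#include <stddef.h>

/* Opaque stand-ins so the sketch compiles in userspace; in the kernel
 * these types come from <linux/fs.h>. */
struct inode;
struct file;

/* Prototypes with named parameters. The names are assumptions. */
static long kfd_ioctl(struct file *filep, unsigned int cmd, unsigned long arg);
static int kfd_open(struct inode *inode, struct file *filep);

/* Evaluates to 1 when the expression has type unsigned int (C11). */
#define IS_UNSIGNED_INT(x) _Generic((x), unsigned int: 1, default: 0)

static long kfd_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
{
	(void)filep;
	(void)arg;
	return (long)cmd;	/* stub: echo the command number back */
}

static int kfd_open(struct inode *inode, struct file *filep)
{
	(void)inode;
	(void)filep;
	return 0;		/* stub: always succeed */
}
```

With the names in place, a reader can tell from the prototype alone which argument is the command and which is the user-supplied value.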


^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 04/83] drm/radeon: Add radeon <--> kfd interface
  2014-07-11 16:24       ` Jerome Glisse
  (?)
@ 2014-07-17 11:55       ` Oded Gabbay
  -1 siblings, 0 replies; 116+ messages in thread
From: Oded Gabbay @ 2014-07-17 11:55 UTC (permalink / raw)
  To: Jerome Glisse, Joe Perches
  Cc: David Airlie, Alex Deucher, linux-kernel, dri-devel,
	John Bridgman, Andrew Lewycky, Joerg Roedel,
	Christian König

On 11/07/14 19:24, Jerome Glisse wrote:
> On Thu, Jul 10, 2014 at 03:38:33PM -0700, Joe Perches wrote:
>> On Fri, 2014-07-11 at 00:50 +0300, Oded Gabbay wrote:
>>> This patch adds the interface between the radeon driver and the kfd
>>> driver. The interface implementation is contained in
>>> radeon_kfd.c and radeon_kfd.h.
>> []
>>>   include/linux/radeon_kfd.h          | 67 ++++++++++++++++++++++++++
>>
>> Is there a good reason to put this file in include/linux?
>>
>
> Agreed, we do not want to clutter include/linux/ with driver-specific
> includes. I think it's one of the rules, even though there are some hw
> headers already in there.
>
> I would rather see either a new dir include/hsa or inside include/drm.
>
> Cheers,
> Jérôme
>

Moved to drm/radeon in v2 of the patchset
	Oded

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 08/83] drm/radeon: Add calls to initialize and finalize kfd from radeon
  2014-07-11 16:36     ` Jerome Glisse
@ 2014-07-17 11:57       ` Oded Gabbay
  -1 siblings, 0 replies; 116+ messages in thread
From: Oded Gabbay @ 2014-07-17 11:57 UTC (permalink / raw)
  To: Jerome Glisse, Oded Gabbay
  Cc: David Airlie, Alex Deucher, linux-kernel, dri-devel,
	John Bridgman, Andrew Lewycky, Joerg Roedel,
	Christian König

On 11/07/14 19:36, Jerome Glisse wrote:
> On Fri, Jul 11, 2014 at 12:50:08AM +0300, Oded Gabbay wrote:
>> The KFD driver should be loaded when the radeon driver is loaded and
>> should be finalized when the radeon driver is removed.
>>
>> This patch adds a function call to initialize kfd from radeon_init
>> and a function call to finalize kfd from radeon_exit.
>>
>> If the KFD driver is not present in the system, the initialize call
>> fails and the radeon driver continues normally.
>>
>> This patch also adds calls to probe, initialize and finalize a kfd device
>> per radeon device using the kgd-->kfd interface.
>>
>> Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
>
> It might be nice to allow building radeon without HSA, so I think a
> CONFIG_HSA option should be added, with the other things depending on it.
> Otherwise this one is:
>
> Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
>
We do allow it :)
There is no problem building radeon without the kfd. In that case, when radeon 
finds out that kfd is not available, it simply moves on with its initialization 
procedure.
	Oded
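
The fallback described above can be sketched roughly as follows. This is a userspace illustration under stated assumptions, not the radeon code: the kfd_init_fn hook, struct radeon_sketch and radeon_init_sketch() are hypothetical names, and a NULL hook stands in for "kfd is not available".

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical hook: NULL models "kfd module not present". */
typedef bool (*kfd_init_fn)(void);

struct radeon_sketch {
	bool kfd_available;	/* set only if kfd came up */
	bool initialized;	/* radeon itself finished init */
};

/* Stand-in for a kfd init call that succeeds. */
static bool kfd_init_ok(void)
{
	return true;
}

/* Try to bring up kfd; carry on with radeon init either way. */
static int radeon_init_sketch(struct radeon_sketch *r, kfd_init_fn kfd_init)
{
	r->kfd_available = kfd_init != NULL && kfd_init();

	/* ... the rest of radeon initialization would go here ... */
	r->initialized = true;
	return 0;
}
```

The point of the pattern is that a missing or failing kfd only clears kfd_available; it never turns into an error return for radeon.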
>
>> ---
>>   drivers/gpu/drm/radeon/radeon_drv.c | 6 ++++++
>>   drivers/gpu/drm/radeon/radeon_kms.c | 9 +++++++++
>>   2 files changed, 15 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/radeon/radeon_drv.c b/drivers/gpu/drm/radeon/radeon_drv.c
>> index cb14213..88a45a0 100644
>> --- a/drivers/gpu/drm/radeon/radeon_drv.c
>> +++ b/drivers/gpu/drm/radeon/radeon_drv.c
>> @@ -151,6 +151,9 @@ static inline void radeon_register_atpx_handler(void) {}
>>   static inline void radeon_unregister_atpx_handler(void) {}
>>   #endif
>>
>> +extern bool radeon_kfd_init(void);
>> +extern void radeon_kfd_fini(void);
>> +
>>   int radeon_no_wb;
>>   int radeon_modeset = -1;
>>   int radeon_dynclks = -1;
>> @@ -630,12 +633,15 @@ static int __init radeon_init(void)
>>   #endif
>>   	}
>>
>> +	radeon_kfd_init();
>> +
>>   	/* let modprobe override vga console setting */
>>   	return drm_pci_init(driver, pdriver);
>>   }
>>
>>   static void __exit radeon_exit(void)
>>   {
>> +	radeon_kfd_fini();
>>   	drm_pci_exit(driver, pdriver);
>>   	radeon_unregister_atpx_handler();
>>   }
>> diff --git a/drivers/gpu/drm/radeon/radeon_kms.c b/drivers/gpu/drm/radeon/radeon_kms.c
>> index 35d9318..0748284 100644
>> --- a/drivers/gpu/drm/radeon/radeon_kms.c
>> +++ b/drivers/gpu/drm/radeon/radeon_kms.c
>> @@ -34,6 +34,10 @@
>>   #include <linux/slab.h>
>>   #include <linux/pm_runtime.h>
>>
>> +extern void radeon_kfd_device_probe(struct radeon_device *rdev);
>> +extern void radeon_kfd_device_init(struct radeon_device *rdev);
>> +extern void radeon_kfd_device_fini(struct radeon_device *rdev);
>> +
>>   #if defined(CONFIG_VGA_SWITCHEROO)
>>   bool radeon_has_atpx(void);
>>   #else
>> @@ -63,6 +67,8 @@ int radeon_driver_unload_kms(struct drm_device *dev)
>>
>>   	pm_runtime_get_sync(dev->dev);
>>
>> +	radeon_kfd_device_fini(rdev);
>> +
>>   	radeon_acpi_fini(rdev);
>>   	
>>   	radeon_modeset_fini(rdev);
>> @@ -142,6 +148,9 @@ int radeon_driver_load_kms(struct drm_device *dev, unsigned long flags)
>>   				"Error during ACPI methods call\n");
>>   	}
>>
>> +	radeon_kfd_device_probe(rdev);
>> +	radeon_kfd_device_init(rdev);
>> +
>>   	if (radeon_is_px(dev)) {
>>   		pm_runtime_use_autosuspend(dev->dev);
>>   		pm_runtime_set_autosuspend_delay(dev->dev, 5000);
>> --
>> 1.9.1
>>


^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 11/83] hsa/radeon: Add scheduler code
  2014-07-11 18:25     ` Jerome Glisse
@ 2014-07-17 11:57       ` Oded Gabbay
  -1 siblings, 0 replies; 116+ messages in thread
From: Oded Gabbay @ 2014-07-17 11:57 UTC (permalink / raw)
  To: Jerome Glisse, Oded Gabbay
  Cc: David Airlie, Alex Deucher, linux-kernel, dri-devel,
	John Bridgman, Andrew Lewycky, Joerg Roedel

On 11/07/14 21:25, Jerome Glisse wrote:
> On Fri, Jul 11, 2014 at 12:50:11AM +0300, Oded Gabbay wrote:
>> This patch adds the code base of the scheduler, which handles queue
>> creation, deletion and scheduling on the CP of the GPU.
>>
>> Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
>
> I would rather see all this squashed; this gives the feeling that the
> driver can access registers, which is later removed. I know juggling
> patch squashing can be daunting, but it really makes reviewing hard
> here because I have to jump back and forth to see if the thing I am
> looking at really matters in the final version.
>
> Cheers,
> Jérôme
Squashed and restructured in v2 of the patchset.
	Oded
>
>> ---
>>   drivers/gpu/hsa/radeon/Makefile               |   3 +-
>>   drivers/gpu/hsa/radeon/cik_regs.h             | 213 +++++++
>>   drivers/gpu/hsa/radeon/kfd_device.c           |   1 +
>>   drivers/gpu/hsa/radeon/kfd_registers.c        |  50 ++
>>   drivers/gpu/hsa/radeon/kfd_sched_cik_static.c | 800 ++++++++++++++++++++++++++
>>   drivers/gpu/hsa/radeon/kfd_vidmem.c           |  61 ++
>>   6 files changed, 1127 insertions(+), 1 deletion(-)
>>   create mode 100644 drivers/gpu/hsa/radeon/cik_regs.h
>>   create mode 100644 drivers/gpu/hsa/radeon/kfd_registers.c
>>   create mode 100644 drivers/gpu/hsa/radeon/kfd_sched_cik_static.c
>>   create mode 100644 drivers/gpu/hsa/radeon/kfd_vidmem.c
>>
>> diff --git a/drivers/gpu/hsa/radeon/Makefile b/drivers/gpu/hsa/radeon/Makefile
>> index 989518a..28da10c 100644
>> --- a/drivers/gpu/hsa/radeon/Makefile
>> +++ b/drivers/gpu/hsa/radeon/Makefile
>> @@ -4,6 +4,7 @@
>>
>>   radeon_kfd-y	:= kfd_module.o kfd_device.o kfd_chardev.o \
>>   		kfd_pasid.o kfd_topology.o kfd_process.o \
>> -		kfd_doorbell.o
>> +		kfd_doorbell.o kfd_sched_cik_static.o kfd_registers.o \
>> +		kfd_vidmem.o
>>
>>   obj-$(CONFIG_HSA_RADEON)	+= radeon_kfd.o
>> diff --git a/drivers/gpu/hsa/radeon/cik_regs.h b/drivers/gpu/hsa/radeon/cik_regs.h
>> new file mode 100644
>> index 0000000..d0cdc57
>> --- /dev/null
>> +++ b/drivers/gpu/hsa/radeon/cik_regs.h
>> @@ -0,0 +1,213 @@
>> +/*
>> + * Copyright 2014 Advanced Micro Devices, Inc.
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a
>> + * copy of this software and associated documentation files (the "Software"),
>> + * to deal in the Software without restriction, including without limitation
>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>> + * and/or sell copies of the Software, and to permit persons to whom the
>> + * Software is furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
>> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
>> + * OTHER DEALINGS IN THE SOFTWARE.
>> + */
>> +
>> +#ifndef CIK_REGS_H
>> +#define CIK_REGS_H
>> +
>> +#define BIF_DOORBELL_CNTL				0x530Cu
>> +
>> +#define	SRBM_GFX_CNTL					0xE44
>> +#define	PIPEID(x)					((x) << 0)
>> +#define	MEID(x)						((x) << 2)
>> +#define	VMID(x)						((x) << 4)
>> +#define	QUEUEID(x)					((x) << 8)
>> +
>> +#define	SQ_CONFIG					0x8C00
>> +
>> +#define	SH_MEM_BASES					0x8C28
>> +/* if PTR32, these are the bases for scratch and lds */
>> +#define	PRIVATE_BASE(x)					((x) << 0) /* scratch */
>> +#define	SHARED_BASE(x)					((x) << 16) /* LDS */
>> +#define	SH_MEM_APE1_BASE				0x8C2C
>> +/* if PTR32, this is the base location of GPUVM */
>> +#define	SH_MEM_APE1_LIMIT				0x8C30
>> +/* if PTR32, this is the upper limit of GPUVM */
>> +#define	SH_MEM_CONFIG					0x8C34
>> +#define	PTR32						(1 << 0)
>> +#define	ALIGNMENT_MODE(x)				((x) << 2)
>> +#define	SH_MEM_ALIGNMENT_MODE_DWORD			0
>> +#define	SH_MEM_ALIGNMENT_MODE_DWORD_STRICT		1
>> +#define	SH_MEM_ALIGNMENT_MODE_STRICT			2
>> +#define	SH_MEM_ALIGNMENT_MODE_UNALIGNED			3
>> +#define	DEFAULT_MTYPE(x)				((x) << 4)
>> +#define	APE1_MTYPE(x)					((x) << 7)
>> +
>> +/* valid for both DEFAULT_MTYPE and APE1_MTYPE */
>> +#define	MTYPE_NONCACHED					3
>> +
>> +
>> +#define SH_STATIC_MEM_CONFIG				0x9604u
>> +
>> +#define	TC_CFG_L1_LOAD_POLICY0				0xAC68
>> +#define	TC_CFG_L1_LOAD_POLICY1				0xAC6C
>> +#define	TC_CFG_L1_STORE_POLICY				0xAC70
>> +#define	TC_CFG_L2_LOAD_POLICY0				0xAC74
>> +#define	TC_CFG_L2_LOAD_POLICY1				0xAC78
>> +#define	TC_CFG_L2_STORE_POLICY0				0xAC7C
>> +#define	TC_CFG_L2_STORE_POLICY1				0xAC80
>> +#define	TC_CFG_L2_ATOMIC_POLICY				0xAC84
>> +#define	TC_CFG_L1_VOLATILE				0xAC88
>> +#define	TC_CFG_L2_VOLATILE				0xAC8C
>> +
>> +#define CP_PQ_WPTR_POLL_CNTL				0xC20C
>> +#define	WPTR_POLL_EN					(1 << 31)
>> +
>> +#define CP_ME1_PIPE0_INT_CNTL				0xC214
>> +#define CP_ME1_PIPE1_INT_CNTL				0xC218
>> +#define CP_ME1_PIPE2_INT_CNTL				0xC21C
>> +#define CP_ME1_PIPE3_INT_CNTL				0xC220
>> +#define CP_ME2_PIPE0_INT_CNTL				0xC224
>> +#define CP_ME2_PIPE1_INT_CNTL				0xC228
>> +#define CP_ME2_PIPE2_INT_CNTL				0xC22C
>> +#define CP_ME2_PIPE3_INT_CNTL				0xC230
>> +#define DEQUEUE_REQUEST_INT_ENABLE			(1 << 13)
>> +#define WRM_POLL_TIMEOUT_INT_ENABLE			(1 << 17)
>> +#define PRIV_REG_INT_ENABLE				(1 << 23)
>> +#define TIME_STAMP_INT_ENABLE				(1 << 26)
>> +#define GENERIC2_INT_ENABLE				(1 << 29)
>> +#define GENERIC1_INT_ENABLE				(1 << 30)
>> +#define GENERIC0_INT_ENABLE				(1 << 31)
>> +#define CP_ME1_PIPE0_INT_STATUS				0xC214
>> +#define CP_ME1_PIPE1_INT_STATUS				0xC218
>> +#define CP_ME1_PIPE2_INT_STATUS				0xC21C
>> +#define CP_ME1_PIPE3_INT_STATUS				0xC220
>> +#define CP_ME2_PIPE0_INT_STATUS				0xC224
>> +#define CP_ME2_PIPE1_INT_STATUS				0xC228
>> +#define CP_ME2_PIPE2_INT_STATUS				0xC22C
>> +#define CP_ME2_PIPE3_INT_STATUS				0xC230
>> +#define DEQUEUE_REQUEST_INT_STATUS			(1 << 13)
>> +#define WRM_POLL_TIMEOUT_INT_STATUS			(1 << 17)
>> +#define PRIV_REG_INT_STATUS				(1 << 23)
>> +#define TIME_STAMP_INT_STATUS				(1 << 26)
>> +#define GENERIC2_INT_STATUS				(1 << 29)
>> +#define GENERIC1_INT_STATUS				(1 << 30)
>> +#define GENERIC0_INT_STATUS				(1 << 31)
>> +
>> +#define CP_HPD_EOP_BASE_ADDR				0xC904
>> +#define CP_HPD_EOP_BASE_ADDR_HI				0xC908
>> +#define CP_HPD_EOP_VMID					0xC90C
>> +#define CP_HPD_EOP_CONTROL				0xC910
>> +#define	EOP_SIZE(x)					((x) << 0)
>> +#define	EOP_SIZE_MASK					(0x3f << 0)
>> +#define CP_MQD_BASE_ADDR				0xC914
>> +#define CP_MQD_BASE_ADDR_HI				0xC918
>> +#define CP_HQD_ACTIVE					0xC91C
>> +#define CP_HQD_VMID					0xC920
>> +
>> +#define CP_HQD_PERSISTENT_STATE				0xC924u
>> +#define	DEFAULT_CP_HQD_PERSISTENT_STATE			(0x33U << 8)
>> +
>> +#define CP_HQD_PIPE_PRIORITY				0xC928u
>> +#define CP_HQD_QUEUE_PRIORITY				0xC92Cu
>> +#define CP_HQD_QUANTUM					0xC930u
>> +#define	QUANTUM_EN					1U
>> +#define	QUANTUM_SCALE_1MS				(1U << 4)
>> +#define	QUANTUM_DURATION(x)				((x) << 8)
>> +
>> +#define CP_HQD_PQ_BASE					0xC934
>> +#define CP_HQD_PQ_BASE_HI				0xC938
>> +#define CP_HQD_PQ_RPTR					0xC93C
>> +#define CP_HQD_PQ_RPTR_REPORT_ADDR			0xC940
>> +#define CP_HQD_PQ_RPTR_REPORT_ADDR_HI			0xC944
>> +#define CP_HQD_PQ_WPTR_POLL_ADDR			0xC948
>> +#define CP_HQD_PQ_WPTR_POLL_ADDR_HI			0xC94C
>> +#define CP_HQD_PQ_DOORBELL_CONTROL			0xC950
>> +#define	DOORBELL_OFFSET(x)				((x) << 2)
>> +#define	DOORBELL_OFFSET_MASK				(0x1fffff << 2)
>> +#define	DOORBELL_SOURCE					(1 << 28)
>> +#define	DOORBELL_SCHD_HIT				(1 << 29)
>> +#define	DOORBELL_EN					(1 << 30)
>> +#define	DOORBELL_HIT					(1 << 31)
>> +#define CP_HQD_PQ_WPTR					0xC954
>> +#define CP_HQD_PQ_CONTROL				0xC958
>> +#define	QUEUE_SIZE(x)					((x) << 0)
>> +#define	QUEUE_SIZE_MASK					(0x3f << 0)
>> +#define	RPTR_BLOCK_SIZE(x)				((x) << 8)
>> +#define	RPTR_BLOCK_SIZE_MASK				(0x3f << 8)
>> +#define	MIN_AVAIL_SIZE(x)				((x) << 20)
>> +#define	PQ_ATC_EN					(1 << 23)
>> +#define	PQ_VOLATILE					(1 << 26)
>> +#define	NO_UPDATE_RPTR					(1 << 27)
>> +#define	UNORD_DISPATCH					(1 << 28)
>> +#define	ROQ_PQ_IB_FLIP					(1 << 29)
>> +#define	PRIV_STATE					(1 << 30)
>> +#define	KMD_QUEUE					(1 << 31)
>> +
>> +#define	DEFAULT_RPTR_BLOCK_SIZE				RPTR_BLOCK_SIZE(5)
>> +#define	DEFAULT_MIN_AVAIL_SIZE				MIN_AVAIL_SIZE(3)
>> +
>> +#define CP_HQD_IB_BASE_ADDR				0xC95Cu
>> +#define CP_HQD_IB_BASE_ADDR_HI				0xC960u
>> +#define CP_HQD_IB_RPTR					0xC964u
>> +#define CP_HQD_IB_CONTROL				0xC968u
>> +#define	IB_ATC_EN					(1U << 23)
>> +#define	DEFAULT_MIN_IB_AVAIL_SIZE			(3U << 20)
>> +
>> +#define CP_HQD_DEQUEUE_REQUEST				0xC974
>> +#define	DEQUEUE_REQUEST_DRAIN				1
>> +
>> +#define CP_HQD_SEMA_CMD					0xC97Cu
>> +#define CP_HQD_MSG_TYPE					0xC980u
>> +#define CP_HQD_ATOMIC0_PREOP_LO				0xC984u
>> +#define CP_HQD_ATOMIC0_PREOP_HI				0xC988u
>> +#define CP_HQD_ATOMIC1_PREOP_LO				0xC98Cu
>> +#define CP_HQD_ATOMIC1_PREOP_HI				0xC990u
>> +#define CP_HQD_HQ_SCHEDULER0				0xC994u
>> +#define CP_HQD_HQ_SCHEDULER1				0xC998u
>> +
>> +
>> +#define CP_MQD_CONTROL					0xC99C
>> +#define	MQD_VMID(x)					((x) << 0)
>> +#define	MQD_VMID_MASK					(0xf << 0)
>> +#define	MQD_CONTROL_PRIV_STATE_EN			(1U << 8)
>> +
>> +#define GRBM_GFX_INDEX					0x30800
>> +#define	INSTANCE_INDEX(x)				((x) << 0)
>> +#define	SH_INDEX(x)					((x) << 8)
>> +#define	SE_INDEX(x)					((x) << 16)
>> +#define	SH_BROADCAST_WRITES				(1 << 29)
>> +#define	INSTANCE_BROADCAST_WRITES			(1 << 30)
>> +#define	SE_BROADCAST_WRITES				(1 << 31)
>> +
>> +#define SQC_CACHES					0x30d20
>> +#define SQC_POLICY					0x8C38u
>> +#define SQC_VOLATILE					0x8C3Cu
>> +
>> +#define CP_PERFMON_CNTL					0x36020
>> +
>> +#define ATC_VMID0_PASID_MAPPING				0x339Cu
>> +#define	ATC_VMID_PASID_MAPPING_UPDATE_STATUS		0x3398u
>> +#define	ATC_VMID_PASID_MAPPING_VALID			(1U << 31)
>> +
>> +#define ATC_VM_APERTURE0_CNTL				0x3310u
>> +#define	ATS_ACCESS_MODE_NEVER				0
>> +#define	ATS_ACCESS_MODE_ALWAYS				1
>> +
>> +#define ATC_VM_APERTURE0_CNTL2				0x3318u
>> +#define ATC_VM_APERTURE0_HIGH_ADDR			0x3308u
>> +#define ATC_VM_APERTURE0_LOW_ADDR			0x3300u
>> +#define ATC_VM_APERTURE1_CNTL				0x3314u
>> +#define ATC_VM_APERTURE1_CNTL2				0x331Cu
>> +#define ATC_VM_APERTURE1_HIGH_ADDR			0x330Cu
>> +#define ATC_VM_APERTURE1_LOW_ADDR			0x3304u
>> +
>> +#endif
>> diff --git a/drivers/gpu/hsa/radeon/kfd_device.c b/drivers/gpu/hsa/radeon/kfd_device.c
>> index 4e9fe6c..465c822 100644
>> --- a/drivers/gpu/hsa/radeon/kfd_device.c
>> +++ b/drivers/gpu/hsa/radeon/kfd_device.c
>> @@ -28,6 +28,7 @@
>>   #include "kfd_scheduler.h"
>>
>>   static const struct kfd_device_info bonaire_device_info = {
>> +	.scheduler_class = &radeon_kfd_cik_static_scheduler_class,
>>   	.max_pasid_bits = 16,
>>   };
>>
>> diff --git a/drivers/gpu/hsa/radeon/kfd_registers.c b/drivers/gpu/hsa/radeon/kfd_registers.c
>> new file mode 100644
>> index 0000000..223debd
>> --- /dev/null
>> +++ b/drivers/gpu/hsa/radeon/kfd_registers.c
>> @@ -0,0 +1,50 @@
>> +/*
>> + * Copyright 2014 Advanced Micro Devices, Inc.
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a
>> + * copy of this software and associated documentation files (the "Software"),
>> + * to deal in the Software without restriction, including without limitation
>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>> + * and/or sell copies of the Software, and to permit persons to whom the
>> + * Software is furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
>> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
>> + * OTHER DEALINGS IN THE SOFTWARE.
>> + */
>> +
>> +#include <linux/io.h>
>> +#include "kfd_priv.h"
>> +
>> +/* In KFD, "reg" is the byte offset of the register. */
>> +static void __iomem *reg_address(struct kfd_dev *dev, uint32_t reg)
>> +{
>> +	return dev->regs + reg;
>> +}
>> +
>> +void radeon_kfd_write_reg(struct kfd_dev *dev, uint32_t reg, uint32_t value)
>> +{
>> +	writel(value, reg_address(dev, reg));
>> +}
>> +
>> +uint32_t radeon_kfd_read_reg(struct kfd_dev *dev, uint32_t reg)
>> +{
>> +	return readl(reg_address(dev, reg));
>> +}
>> +
>> +void radeon_kfd_lock_srbm_index(struct kfd_dev *dev)
>> +{
>> +	kfd2kgd->lock_srbm_gfx_cntl(dev->kgd);
>> +}
>> +
>> +void radeon_kfd_unlock_srbm_index(struct kfd_dev *dev)
>> +{
>> +	kfd2kgd->unlock_srbm_gfx_cntl(dev->kgd);
>> +}
>> diff --git a/drivers/gpu/hsa/radeon/kfd_sched_cik_static.c b/drivers/gpu/hsa/radeon/kfd_sched_cik_static.c
>> new file mode 100644
>> index 0000000..b986ff9
>> --- /dev/null
>> +++ b/drivers/gpu/hsa/radeon/kfd_sched_cik_static.c
>> @@ -0,0 +1,800 @@
>> +/*
>> + * Copyright 2014 Advanced Micro Devices, Inc.
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a
>> + * copy of this software and associated documentation files (the "Software"),
>> + * to deal in the Software without restriction, including without limitation
>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>> + * and/or sell copies of the Software, and to permit persons to whom the
>> + * Software is furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
>> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
>> + * OTHER DEALINGS IN THE SOFTWARE.
>> + */
>> +
>> +#include <linux/log2.h>
>> +#include <linux/mutex.h>
>> +#include <linux/slab.h>
>> +#include <linux/types.h>
>> +#include <linux/uaccess.h>
>> +#include "kfd_priv.h"
>> +#include "kfd_scheduler.h"
>> +#include "cik_regs.h"
>> +
>> +/* CIK CP hardware is arranged with 8 queues per pipe and 4 pipes per MEC (microengine for compute).
>> + * The first MEC is ME 1 with the GFX ME as ME 0.
>> + * We split the CP with the KGD, they take the first N pipes and we take the rest.
>> + */
>> +#define CIK_QUEUES_PER_PIPE 8
>> +#define CIK_PIPES_PER_MEC 4
>> +
>> +#define CIK_MAX_PIPES (2 * CIK_PIPES_PER_MEC)
>> +
>> +#define CIK_NUM_VMID 16
>> +
>> +#define CIK_HPD_SIZE_LOG2 11
>> +#define CIK_HPD_SIZE (1U << CIK_HPD_SIZE_LOG2)
>> +#define CIK_HPD_ALIGNMENT 256
>> +#define CIK_MQD_ALIGNMENT 4
>> +
>> +#pragma pack(push, 4)
>> +
>> +struct cik_hqd_registers {
>> +	u32 cp_mqd_base_addr;
>> +	u32 cp_mqd_base_addr_hi;
>> +	u32 cp_hqd_active;
>> +	u32 cp_hqd_vmid;
>> +	u32 cp_hqd_persistent_state;
>> +	u32 cp_hqd_pipe_priority;
>> +	u32 cp_hqd_queue_priority;
>> +	u32 cp_hqd_quantum;
>> +	u32 cp_hqd_pq_base;
>> +	u32 cp_hqd_pq_base_hi;
>> +	u32 cp_hqd_pq_rptr;
>> +	u32 cp_hqd_pq_rptr_report_addr;
>> +	u32 cp_hqd_pq_rptr_report_addr_hi;
>> +	u32 cp_hqd_pq_wptr_poll_addr;
>> +	u32 cp_hqd_pq_wptr_poll_addr_hi;
>> +	u32 cp_hqd_pq_doorbell_control;
>> +	u32 cp_hqd_pq_wptr;
>> +	u32 cp_hqd_pq_control;
>> +	u32 cp_hqd_ib_base_addr;
>> +	u32 cp_hqd_ib_base_addr_hi;
>> +	u32 cp_hqd_ib_rptr;
>> +	u32 cp_hqd_ib_control;
>> +	u32 cp_hqd_iq_timer;
>> +	u32 cp_hqd_iq_rptr;
>> +	u32 cp_hqd_dequeue_request;
>> +	u32 cp_hqd_dma_offload;
>> +	u32 cp_hqd_sema_cmd;
>> +	u32 cp_hqd_msg_type;
>> +	u32 cp_hqd_atomic0_preop_lo;
>> +	u32 cp_hqd_atomic0_preop_hi;
>> +	u32 cp_hqd_atomic1_preop_lo;
>> +	u32 cp_hqd_atomic1_preop_hi;
>> +	u32 cp_hqd_hq_scheduler0;
>> +	u32 cp_hqd_hq_scheduler1;
>> +	u32 cp_mqd_control;
>> +};
>> +
>> +struct cik_mqd {
>> +	u32 header;
>> +	u32 dispatch_initiator;
>> +	u32 dimensions[3];
>> +	u32 start_idx[3];
>> +	u32 num_threads[3];
>> +	u32 pipeline_stat_enable;
>> +	u32 perf_counter_enable;
>> +	u32 pgm[2];
>> +	u32 tba[2];
>> +	u32 tma[2];
>> +	u32 pgm_rsrc[2];
>> +	u32 vmid;
>> +	u32 resource_limits;
>> +	u32 static_thread_mgmt01[2];
>> +	u32 tmp_ring_size;
>> +	u32 static_thread_mgmt23[2];
>> +	u32 restart[3];
>> +	u32 thread_trace_enable;
>> +	u32 reserved1;
>> +	u32 user_data[16];
>> +	u32 vgtcs_invoke_count[2];
>> +	struct cik_hqd_registers queue_state;
>> +	u32 dequeue_cntr;
>> +	u32 interrupt_queue[64];
>> +};
>> +
>> +struct cik_mqd_padded {
>> +	struct cik_mqd mqd;
>> +	u8 padding[1024 - sizeof(struct cik_mqd)]; /* Pad MQD out to 1KB. (HW requires 4-byte alignment.) */
>> +};
>> +
>> +#pragma pack(pop)
>> +
>> +struct cik_static_private {
>> +	struct kfd_dev *dev;
>> +
>> +	struct mutex mutex;
>> +
>> +	unsigned int first_pipe;
>> +	unsigned int num_pipes;
>> +
>> +	unsigned long free_vmid_mask; /* unsigned long to make set/clear_bit happy */
>> +
>> +	/* Everything below here is offset by first_pipe. E.g. bit 0 in
>> +	 * free_queues is queue 0 in pipe first_pipe
>> +	 */
>> +
>> +	 /* Queue q on pipe p is at bit QUEUES_PER_PIPE * p + q. */
>> +	unsigned long free_queues[DIV_ROUND_UP(CIK_MAX_PIPES * CIK_QUEUES_PER_PIPE, BITS_PER_LONG)];
>> +
>> +	kfd_mem_obj hpd_mem;	/* Single allocation for HPDs for all KFD pipes. */
>> +	kfd_mem_obj mqd_mem;	/* Single allocation for all MQDs for all KFD
>> +				 * pipes. This is actually struct cik_mqd_padded. */
>> +	uint64_t hpd_addr;	/* GPU address for hpd_mem. */
>> +	uint64_t mqd_addr;	/* GPU address for mqd_mem. */
>> +	 /*
>> +	  * Pointer for mqd_mem.
>> +	  * We keep this mapped because multiple processes may need to access it
>> +	  * in parallel and this is simpler than controlling concurrent kmaps
>> +	  */
>> +	struct cik_mqd_padded *mqds;
>> +};
>> +
>> +struct cik_static_process {
>> +	unsigned int vmid;
>> +	pasid_t pasid;
>> +};
>> +
>> +struct cik_static_queue {
>> +	unsigned int queue; /* + first_pipe * QUEUES_PER_PIPE */
>> +
>> +	uint64_t mqd_addr;
>> +	struct cik_mqd *mqd;
>> +
>> +	void __user *pq_addr;
>> +	void __user *rptr_address;
>> +	doorbell_t __user *wptr_address;
>> +	uint32_t doorbell_index;
>> +
>> +	uint32_t queue_size_encoded; /* CP_HQD_PQ_CONTROL.QUEUE_SIZE takes the queue size as log2(size) - 3. */
>> +};
>> +
>> +static uint32_t lower_32(uint64_t x)
>> +{
>> +	return (uint32_t)x;
>> +}
>> +
>> +static uint32_t upper_32(uint64_t x)
>> +{
>> +	return (uint32_t)(x >> 32);
>> +}
>> +
>> +/* SRBM_GFX_CNTL provides the MEC/pipe/queue and vmid for many registers that are instanced.
>> + * In particular, CP_HQD_* and CP_MQD_* are instanced for each queue. CP_HPD_* are instanced for each pipe.
>> + * SH_MEM_* are instanced per-VMID.
>> + *
>> + * We provide queue_select, pipe_select and vmid_select helpers that should be used before accessing
>> + * registers from those groups. Note that these overwrite each other, e.g. after vmid_select the current
>> + * selected MEC/pipe/queue is undefined.
>> + *
>> + * SRBM_GFX_CNTL and the registers it indexes are shared with KGD. You must be holding the srbm_gfx_cntl
>> + * lock via lock_srbm_index before setting SRBM_GFX_CNTL or accessing any of the instanced registers.
>> + */
>> +static uint32_t make_srbm_gfx_cntl_mpqv(unsigned int me, unsigned int pipe, unsigned int queue, unsigned int vmid)
>> +{
>> +	return QUEUEID(queue) | VMID(vmid) | MEID(me) | PIPEID(pipe);
>> +}
>> +
>> +static void pipe_select(struct cik_static_private *priv, unsigned int pipe)
>> +{
>> +	unsigned int pipe_in_mec = (pipe + priv->first_pipe) % CIK_PIPES_PER_MEC;
>> +	unsigned int mec = (pipe + priv->first_pipe) / CIK_PIPES_PER_MEC;
>> +
>> +	WRITE_REG(priv->dev, SRBM_GFX_CNTL, make_srbm_gfx_cntl_mpqv(mec+1, pipe_in_mec, 0, 0));
>> +}
>> +
>> +static void queue_select(struct cik_static_private *priv, unsigned int queue)
>> +{
>> +	unsigned int queue_in_pipe = queue % CIK_QUEUES_PER_PIPE;
>> +	unsigned int pipe = queue / CIK_QUEUES_PER_PIPE + priv->first_pipe;
>> +	unsigned int pipe_in_mec = pipe % CIK_PIPES_PER_MEC;
>> +	unsigned int mec = pipe / CIK_PIPES_PER_MEC;
>> +
>> +#if 0
>> +	dev_err(radeon_kfd_chardev(), "queue select %d = %u/%u/%u = 0x%08x\n", queue, mec+1, pipe_in_mec, queue_in_pipe,
>> +		make_srbm_gfx_cntl_mpqv(mec+1, pipe_in_mec, queue_in_pipe, 0));
>> +#endif
>> +
>> +	WRITE_REG(priv->dev, SRBM_GFX_CNTL, make_srbm_gfx_cntl_mpqv(mec+1, pipe_in_mec, queue_in_pipe, 0));
>> +}
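
For readers following the index arithmetic in queue_select(), here is a minimal userspace sketch, not the driver code. It assumes the field shifts from the cik_regs.h hunk of this patch and the 8-queues-per-pipe, 4-pipes-per-MEC layout. The helper name srbm_value_for_queue() is hypothetical; it returns the value that would be written to SRBM_GFX_CNTL instead of touching hardware.

```c
#include <assert.h>
#include <stdint.h>

#define QUEUES_PER_PIPE 8u
#define PIPES_PER_MEC   4u

/* Field shifts as defined in the cik_regs.h hunk of this patch. */
#define PIPEID(x)  ((uint32_t)(x) << 0)
#define MEID(x)    ((uint32_t)(x) << 2)
#define VMID(x)    ((uint32_t)(x) << 4)
#define QUEUEID(x) ((uint32_t)(x) << 8)

/* Hypothetical helper: same arithmetic as queue_select(), but the
 * SRBM_GFX_CNTL value is returned rather than written to a register.
 * 'queue' is relative to first_pipe, as in struct cik_static_queue. */
static uint32_t srbm_value_for_queue(unsigned int queue, unsigned int first_pipe)
{
	unsigned int queue_in_pipe = queue % QUEUES_PER_PIPE;
	unsigned int pipe = queue / QUEUES_PER_PIPE + first_pipe;
	unsigned int pipe_in_mec = pipe % PIPES_PER_MEC;
	unsigned int me = pipe / PIPES_PER_MEC + 1; /* first MEC is ME 1, GFX is ME 0 */

	assert(pipe < 2 * PIPES_PER_MEC); /* at most 2 MECs on CIK */

	return QUEUEID(queue_in_pipe) | VMID(0) | MEID(me) | PIPEID(pipe_in_mec);
}
```

With first_pipe = 1, KFD queue 26 is queue 2 of absolute pipe 4, i.e. pipe 0 of the second MEC (ME 2).
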
>> +
>> +static void vmid_select(struct cik_static_private *priv, unsigned int vmid)
>> +{
>> +	WRITE_REG(priv->dev, SRBM_GFX_CNTL, make_srbm_gfx_cntl_mpqv(0, 0, 0, vmid));
>> +}
>> +
>> +static void lock_srbm_index(struct cik_static_private *priv)
>> +{
>> +	radeon_kfd_lock_srbm_index(priv->dev);
>> +}
>> +
>> +static void unlock_srbm_index(struct cik_static_private *priv)
>> +{
>> +	WRITE_REG(priv->dev, SRBM_GFX_CNTL, 0);	/* Be nice to KGD, reset indexed CP registers to the GFX pipe. */
>> +	radeon_kfd_unlock_srbm_index(priv->dev);
>> +}
>> +
>> +/* One-time setup for all compute pipes. They need to be programmed with the address & size of the HPD EOP buffer. */
>> +static void init_pipes(struct cik_static_private *priv)
>> +{
>> +	unsigned int i;
>> +
>> +	lock_srbm_index(priv);
>> +
>> +	for (i = 0; i < priv->num_pipes; i++) {
>> +		uint64_t pipe_hpd_addr = priv->hpd_addr + i * CIK_HPD_SIZE;
>> +
>> +		pipe_select(priv, i);
>> +
>> +		WRITE_REG(priv->dev, CP_HPD_EOP_BASE_ADDR, lower_32(pipe_hpd_addr >> 8));
>> +		WRITE_REG(priv->dev, CP_HPD_EOP_BASE_ADDR_HI, upper_32(pipe_hpd_addr >> 8));
>> +		WRITE_REG(priv->dev, CP_HPD_EOP_VMID, 0);
>> +		WRITE_REG(priv->dev, CP_HPD_EOP_CONTROL, CIK_HPD_SIZE_LOG2 - 1);
>> +	}
>> +
>> +	unlock_srbm_index(priv);
>> +}
>> +
>> +/* Program the VMID -> PASID mapping for one VMID.
>> + * PASID 0 is special: it means to associate no PASID with that VMID.
>> + * This function waits for the VMID/PASID mapping to complete.
>> + */
>> +static void set_vmid_pasid_mapping(struct cik_static_private *priv, unsigned int vmid, pasid_t pasid)
>> +{
>> +	/* We have to assume that there is no outstanding mapping.
>> +	 * The ATC_VMID_PASID_MAPPING_UPDATE_STATUS bit could be 0 because a mapping
>> +	 * is in progress or because a mapping finished and the SW cleared it.
>> +	 * So the protocol is to always wait & clear.
>> +	 */
>> +
>> +	uint32_t pasid_mapping = (pasid == 0) ? 0 : (uint32_t)pasid | ATC_VMID_PASID_MAPPING_VALID;
>> +
>> +	WRITE_REG(priv->dev, ATC_VMID0_PASID_MAPPING + vmid*sizeof(uint32_t), pasid_mapping);
>> +
>> +	while (!(READ_REG(priv->dev, ATC_VMID_PASID_MAPPING_UPDATE_STATUS) & (1U << vmid)))
>> +		cpu_relax();
>> +	WRITE_REG(priv->dev, ATC_VMID_PASID_MAPPING_UPDATE_STATUS, 1U << vmid);
>> +}
>> +
>> +static uint32_t compute_sh_mem_bases_64bit(unsigned int top_address_nybble)
>> +{
>> +	/* In 64-bit mode, we can only control the top 3 bits of the LDS, scratch and GPUVM apertures.
>> +	 * The hardware fills in the remaining 59 bits according to the following pattern:
>> +	 * LDS:		X0000000'00000000 - X0000001'00000000 (4GB)
>> +	 * Scratch:	X0000001'00000000 - X0000002'00000000 (4GB)
>> +	 * GPUVM:	Y0010000'00000000 - Y0020000'00000000 (1TB)
>> +	 *
>> +	 * (where X/Y is the configurable nybble with the low-bit 0)
>> +	 *
>> +	 * LDS and scratch will have the same top nybble programmed in the top 3 bits of SH_MEM_BASES.PRIVATE_BASE.
>> +	 * GPUVM can have a different top nybble programmed in the top 3 bits of SH_MEM_BASES.SHARED_BASE.
>> +	 * We don't bother to support different top nybbles for LDS/Scratch and GPUVM.
>> +	 */
>> +
>> +	BUG_ON((top_address_nybble & 1) || top_address_nybble > 0xE);
>> +
>> +	return PRIVATE_BASE(top_address_nybble << 12) | SHARED_BASE(top_address_nybble << 12);
>> +}
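
As a quick check of the packing above, a standalone sketch under the assumption that PRIVATE_BASE and SHARED_BASE shift by 0 and 16 as in the cik_regs.h hunk of this patch. The function name sh_mem_bases_64bit() is made up for illustration, and assert() stands in for BUG_ON().

```c
#include <assert.h>
#include <stdint.h>

/* Shifts as defined in the cik_regs.h hunk of this patch. */
#define PRIVATE_BASE(x) ((uint32_t)(x) << 0)
#define SHARED_BASE(x)  ((uint32_t)(x) << 16)

/* Hypothetical userspace copy of compute_sh_mem_bases_64bit():
 * the nybble lands in the top 4 bits of each 16-bit base field. */
static uint32_t sh_mem_bases_64bit(unsigned int top_nybble)
{
	/* Same precondition as the BUG_ON: even nybble, at most 0xE. */
	assert(!(top_nybble & 1) && top_nybble <= 0xE);

	return PRIVATE_BASE(top_nybble << 12) | SHARED_BASE(top_nybble << 12);
}
```

For the nybble 6 that init_ats() passes, both 16-bit fields hold 0x6000.
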
>> +
>> +/* Initial programming for all ATS registers.
>> + * - enable ATS for all compute VMIDs
>> + * - clear the VMID/PASID mapping for all compute VMIDS
>> + * - program the shader core flat address settings:
>> + * -- 64-bit mode
>> + * -- unaligned access allowed
>> + * -- noncached (this is the only CPU-coherent mode in CIK)
>> + * -- APE 1 disabled
>> + */
>> +static void init_ats(struct cik_static_private *priv)
>> +{
>> +	unsigned int i;
>> +
>> +	/* Enable self-ringing doorbell recognition and direct the BIF to send
>> +	 * untranslated writes to the IOMMU before comparing to the aperture.*/
>> +	WRITE_REG(priv->dev, BIF_DOORBELL_CNTL, 0);
>> +
>> +	WRITE_REG(priv->dev, ATC_VM_APERTURE0_CNTL, ATS_ACCESS_MODE_ALWAYS);
>> +	WRITE_REG(priv->dev, ATC_VM_APERTURE0_CNTL2, priv->free_vmid_mask);
>> +	WRITE_REG(priv->dev, ATC_VM_APERTURE0_LOW_ADDR, 0);
>> +	WRITE_REG(priv->dev, ATC_VM_APERTURE0_HIGH_ADDR, 0);
>> +
>> +	WRITE_REG(priv->dev, ATC_VM_APERTURE1_CNTL, 0);
>> +	WRITE_REG(priv->dev, ATC_VM_APERTURE1_CNTL2, 0);
>> +	WRITE_REG(priv->dev, ATC_VM_APERTURE1_LOW_ADDR, 0);
>> +	WRITE_REG(priv->dev, ATC_VM_APERTURE1_HIGH_ADDR, 0);
>> +
>> +	lock_srbm_index(priv);
>> +
>> +	for (i = 0; i < CIK_NUM_VMID; i++) {
>> +		if (priv->free_vmid_mask & (1U << i)) {
>> +			uint32_t sh_mem_config;
>> +
>> +			set_vmid_pasid_mapping(priv, i, 0);
>> +
>> +			vmid_select(priv, i);
>> +
>> +			sh_mem_config = ALIGNMENT_MODE(SH_MEM_ALIGNMENT_MODE_UNALIGNED);
>> +			sh_mem_config |= DEFAULT_MTYPE(MTYPE_NONCACHED);
>> +
>> +			WRITE_REG(priv->dev, SH_MEM_CONFIG, sh_mem_config);
>> +
>> +			/* Configure apertures:
>> +			 * LDS:		0x60000000'00000000 - 0x60000001'00000000 (4GB)
>> +			 * Scratch:	0x60000001'00000000 - 0x60000002'00000000 (4GB)
>> +			 * GPUVM:	0x60010000'00000000 - 0x60020000'00000000 (1TB)
>> +			 */
>> +			WRITE_REG(priv->dev, SH_MEM_BASES, compute_sh_mem_bases_64bit(6));
>> +
>> +			/* Scratch aperture is not supported for now. */
>> +			WRITE_REG(priv->dev, SH_STATIC_MEM_CONFIG, 0);
>> +
>> +			/* APE1 disabled for now. */
>> +			WRITE_REG(priv->dev, SH_MEM_APE1_BASE, 1);
>> +			WRITE_REG(priv->dev, SH_MEM_APE1_LIMIT, 0);
>> +		}
>> +	}
>> +
>> +	unlock_srbm_index(priv);
>> +}
>> +
>> +static void exit_ats(struct cik_static_private *priv)
>> +{
>> +	unsigned int i;
>> +
>> +	for (i = 0; i < CIK_NUM_VMID; i++)
>> +		if (priv->free_vmid_mask & (1U << i))
>> +			set_vmid_pasid_mapping(priv, i, 0);
>> +
>> +	WRITE_REG(priv->dev, ATC_VM_APERTURE0_CNTL, ATS_ACCESS_MODE_NEVER);
>> +	WRITE_REG(priv->dev, ATC_VM_APERTURE0_CNTL2, 0);
>> +}
>> +
>> +static struct cik_static_private *kfd_scheduler_to_private(struct kfd_scheduler *scheduler)
>> +{
>> +	return (struct cik_static_private *)scheduler;
>> +}
>> +
>> +static struct cik_static_process *kfd_process_to_private(struct kfd_scheduler_process *process)
>> +{
>> +	return (struct cik_static_process *)process;
>> +}
>> +
>> +static struct cik_static_queue *kfd_queue_to_private(struct kfd_scheduler_queue *queue)
>> +{
>> +	return (struct cik_static_queue *)queue;
>> +}
>> +
>> +static int cik_static_create(struct kfd_dev *dev, struct kfd_scheduler **scheduler)
>> +{
>> +	struct cik_static_private *priv;
>> +	unsigned int i;
>> +	int err;
>> +	void *hpdptr;
>> +
>> +	priv = kmalloc(sizeof(*priv), GFP_KERNEL);
>> +	if (priv == NULL)
>> +		return -ENOMEM;
>> +
>> +	mutex_init(&priv->mutex);
>> +
>> +	priv->dev = dev;
>> +
>> +	priv->first_pipe = dev->shared_resources.first_compute_pipe;
>> +	priv->num_pipes = dev->shared_resources.compute_pipe_count;
>> +
>> +	for (i = 0; i < priv->num_pipes * CIK_QUEUES_PER_PIPE; i++)
>> +		__set_bit(i, priv->free_queues);
>> +
>> +	priv->free_vmid_mask = dev->shared_resources.compute_vmid_bitmap;
>> +
>> +	/*
>> +	 * Allocate memory for the HPDs. This is hardware-owned per-pipe data.
>> +	 * The driver never accesses this memory after zeroing it. It doesn't even have
>> +	 * to be saved/restored on suspend/resume because it contains no data when there
>> +	 * are no active queues.
>> +	 */
>> +	err = radeon_kfd_vidmem_alloc(dev,
>> +				      CIK_HPD_SIZE * priv->num_pipes * 2,
>> +				      PAGE_SIZE,
>> +				      KFD_MEMPOOL_SYSTEM_WRITECOMBINE,
>> +				      &priv->hpd_mem);
>> +	if (err)
>> +		goto err_hpd_alloc;
>> +
>> +	err = radeon_kfd_vidmem_kmap(dev, priv->hpd_mem, &hpdptr);
>> +	if (err)
>> +		goto err_hpd_kmap;
>> +	memset(hpdptr, 0, CIK_HPD_SIZE * priv->num_pipes);
>> +	radeon_kfd_vidmem_unkmap(dev, priv->hpd_mem);
>> +
>> +	/*
>> +	 * Allocate memory for all the MQDs.
>> +	 * These are per-queue data that is hardware owned but with driver init.
>> +	 * The driver has to copy this data into HQD registers when a
>> +	 * pipe is (re)activated.
>> +	 */
>> +	err = radeon_kfd_vidmem_alloc(dev,
>> +				      sizeof(struct cik_mqd_padded) * priv->num_pipes * CIK_QUEUES_PER_PIPE,
>> +				      PAGE_SIZE,
>> +				      KFD_MEMPOOL_SYSTEM_CACHEABLE,
>> +				      &priv->mqd_mem);
>> +	if (err)
>> +		goto err_mqd_alloc;
>> +	err = radeon_kfd_vidmem_kmap(dev, priv->mqd_mem, (void **)&priv->mqds);
>> +	if (err)
>> +		goto err_mqd_kmap;
>> +
>> +	*scheduler = (struct kfd_scheduler *)priv;
>> +
>> +	return 0;
>> +
>> +err_mqd_kmap:
>> +	radeon_kfd_vidmem_free(dev, priv->mqd_mem);
>> +err_mqd_alloc:
>> +err_hpd_kmap:
>> +	radeon_kfd_vidmem_free(dev, priv->hpd_mem);
>> +err_hpd_alloc:
>> +	mutex_destroy(&priv->mutex);
>> +	kfree(priv);
>> +	return err;
>> +}
>> +
>> +static void cik_static_destroy(struct kfd_scheduler *scheduler)
>> +{
>> +	struct cik_static_private *priv = kfd_scheduler_to_private(scheduler);
>> +
>> +	radeon_kfd_vidmem_unkmap(priv->dev, priv->mqd_mem);
>> +	radeon_kfd_vidmem_free(priv->dev, priv->mqd_mem);
>> +	radeon_kfd_vidmem_free(priv->dev, priv->hpd_mem);
>> +
>> +	mutex_destroy(&priv->mutex);
>> +
>> +	kfree(priv);
>> +}
>> +
>> +static void cik_static_start(struct kfd_scheduler *scheduler)
>> +{
>> +	struct cik_static_private *priv = kfd_scheduler_to_private(scheduler);
>> +
>> +	radeon_kfd_vidmem_gpumap(priv->dev, priv->hpd_mem, &priv->hpd_addr);
>> +	radeon_kfd_vidmem_gpumap(priv->dev, priv->mqd_mem, &priv->mqd_addr);
>> +
>> +	init_pipes(priv);
>> +	init_ats(priv);
>> +}
>> +
>> +static void cik_static_stop(struct kfd_scheduler *scheduler)
>> +{
>> +	struct cik_static_private *priv = kfd_scheduler_to_private(scheduler);
>> +
>> +	exit_ats(priv);
>> +
>> +	radeon_kfd_vidmem_ungpumap(priv->dev, priv->hpd_mem);
>> +	radeon_kfd_vidmem_ungpumap(priv->dev, priv->mqd_mem);
>> +}
>> +
>> +static bool allocate_vmid(struct cik_static_private *priv, unsigned int *vmid)
>> +{
>> +	bool ok = false;
>> +
>> +	mutex_lock(&priv->mutex);
>> +
>> +	if (priv->free_vmid_mask != 0) {
>> +		unsigned int v = __ffs64(priv->free_vmid_mask);
>> +
>> +		clear_bit(v, &priv->free_vmid_mask);
>> +		*vmid = v;
>> +
>> +		ok = true;
>> +	}
>> +
>> +	mutex_unlock(&priv->mutex);
>> +
>> +	return ok;
>> +}
>> +
>> +static void release_vmid(struct cik_static_private *priv, unsigned int vmid)
>> +{
>> +	/* It's okay to race against allocate_vmid because this only adds bits to free_vmid_mask.
>> +	 * And set_bit/clear_bit are atomic wrt each other. */
>> +	set_bit(vmid, &priv->free_vmid_mask);
>> +}
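
The allocate/release pair above amounts to a lowest-free-bit allocator over a mask. A single-threaded sketch of that idea follows, assuming a 32-bit mask and plain (non-atomic) bit operations. The names alloc_lowest_bit() and release_bit() are hypothetical, and the mutex and atomic set_bit/clear_bit of the real code are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: claim the lowest set bit in *free_mask.
 * Returns false when no bit is free. Not thread-safe, unlike the
 * driver, which takes priv->mutex around the search. */
static bool alloc_lowest_bit(uint32_t *free_mask, unsigned int *out)
{
	unsigned int i;

	for (i = 0; i < 32; i++) {
		if (*free_mask & (1u << i)) {
			*free_mask &= ~(1u << i); /* mark as in use */
			*out = i;
			return true;
		}
	}
	return false;
}

/* Hypothetical helper: hand a bit back to the free pool. */
static void release_bit(uint32_t *free_mask, unsigned int bit)
{
	assert(bit < 32);
	*free_mask |= 1u << bit;
}
```

Starting from a mask of 0xFF00 (VMIDs 8-15, the range this series reserves for KFD), the first allocation yields VMID 8.
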
>> +
>> +static void setup_vmid_for_process(struct cik_static_private *priv, struct cik_static_process *p)
>> +{
>> +	set_vmid_pasid_mapping(priv, p->vmid, p->pasid);
>> +
>> +	/*
>> +	 * SH_MEM_CONFIG and others need to be programmed differently
>> +	 * for 32/64-bit processes. And maybe other reasons.
>> +	 */
>> +}
>> +
>> +static int
>> +cik_static_register_process(struct kfd_scheduler *scheduler, struct kfd_process *process,
>> +			    struct kfd_scheduler_process **scheduler_process)
>> +{
>> +	struct cik_static_private *priv = kfd_scheduler_to_private(scheduler);
>> +
>> +	struct cik_static_process *hwp;
>> +
>> +	hwp = kmalloc(sizeof(*hwp), GFP_KERNEL);
>> +	if (hwp == NULL)
>> +		return -ENOMEM;
>> +
>> +	if (!allocate_vmid(priv, &hwp->vmid)) {
>> +		kfree(hwp);
>> +		return -ENOMEM;
>> +	}
>> +
>> +	hwp->pasid = process->pasid;
>> +
>> +	setup_vmid_for_process(priv, hwp);
>> +
>> +	*scheduler_process = (struct kfd_scheduler_process *)hwp;
>> +
>> +	return 0;
>> +}
>> +
>> +static void cik_static_deregister_process(struct kfd_scheduler *scheduler,
>> +				struct kfd_scheduler_process *scheduler_process)
>> +{
>> +	struct cik_static_private *priv = kfd_scheduler_to_private(scheduler);
>> +	struct cik_static_process *pp = kfd_process_to_private(scheduler_process);
>> +
>> +	release_vmid(priv, pp->vmid);
>> +	kfree(pp);
>> +}
>> +
>> +static bool allocate_hqd(struct cik_static_private *priv, unsigned int *queue)
>> +{
>> +	bool ok = false;
>> +	unsigned int q;
>> +
>> +	mutex_lock(&priv->mutex);
>> +
>> +	q = find_first_bit(priv->free_queues, priv->num_pipes * CIK_QUEUES_PER_PIPE);
>> +
>> +	if (q != priv->num_pipes * CIK_QUEUES_PER_PIPE) {
>> +		clear_bit(q, priv->free_queues);
>> +		*queue = q;
>> +
>> +		ok = true;
>> +	}
>> +
>> +	mutex_unlock(&priv->mutex);
>> +
>> +	return ok;
>> +}
>> +
>> +static void release_hqd(struct cik_static_private *priv, unsigned int queue)
>> +{
>> +	/* It's okay to race against allocate_hqd because this only adds bits to free_queues.
>> +	 * And set_bit/clear_bit are atomic wrt each other. */
>> +	set_bit(queue, priv->free_queues);
>> +}
>> +
>> +static void init_mqd(const struct cik_static_queue *queue, const struct cik_static_process *process)
>> +{
>> +	struct cik_mqd *mqd = queue->mqd;
>> +
>> +	memset(mqd, 0, sizeof(*mqd));
>> +
>> +	mqd->header = 0xC0310800;
>> +	mqd->pipeline_stat_enable = 1;
>> +	mqd->static_thread_mgmt01[0] = 0xffffffff;
>> +	mqd->static_thread_mgmt01[1] = 0xffffffff;
>> +	mqd->static_thread_mgmt23[0] = 0xffffffff;
>> +	mqd->static_thread_mgmt23[1] = 0xffffffff;
>> +
>> +	mqd->queue_state.cp_mqd_base_addr = lower_32(queue->mqd_addr);
>> +	mqd->queue_state.cp_mqd_base_addr_hi = upper_32(queue->mqd_addr);
>> +	mqd->queue_state.cp_mqd_control = MQD_CONTROL_PRIV_STATE_EN;
>> +
>> +	mqd->queue_state.cp_hqd_pq_base = lower_32((uintptr_t)queue->pq_addr >> 8);
>> +	mqd->queue_state.cp_hqd_pq_base_hi = upper_32((uintptr_t)queue->pq_addr >> 8);
>> +	mqd->queue_state.cp_hqd_pq_control = QUEUE_SIZE(queue->queue_size_encoded) | DEFAULT_RPTR_BLOCK_SIZE
>> +					    | DEFAULT_MIN_AVAIL_SIZE | PQ_ATC_EN;
>> +	mqd->queue_state.cp_hqd_pq_rptr_report_addr = lower_32((uintptr_t)queue->rptr_address);
>> +	mqd->queue_state.cp_hqd_pq_rptr_report_addr_hi = upper_32((uintptr_t)queue->rptr_address);
>> +	mqd->queue_state.cp_hqd_pq_doorbell_control = DOORBELL_OFFSET(queue->doorbell_index) | DOORBELL_EN;
>> +	mqd->queue_state.cp_hqd_vmid = process->vmid;
>> +	mqd->queue_state.cp_hqd_active = 1;
>> +
>> +	mqd->queue_state.cp_hqd_persistent_state = DEFAULT_CP_HQD_PERSISTENT_STATE;
>> +
>> +	/* The values for these 3 are from WinKFD. */
>> +	mqd->queue_state.cp_hqd_quantum = QUANTUM_EN | QUANTUM_SCALE_1MS | QUANTUM_DURATION(10);
>> +	mqd->queue_state.cp_hqd_pipe_priority = 1;
>> +	mqd->queue_state.cp_hqd_queue_priority = 15;
>> +
>> +	mqd->queue_state.cp_hqd_ib_control = IB_ATC_EN | DEFAULT_MIN_IB_AVAIL_SIZE;
>> +}
>> +
>> +/* Write the HQD registers and activate the queue.
>> + * Requires that SRBM_GFX_CNTL has already been programmed for the queue.
>> + */
>> +static void load_hqd(struct cik_static_private *priv, struct cik_static_queue *queue)
>> +{
>> +	struct kfd_dev *dev = priv->dev;
>> +	const struct cik_hqd_registers *qs = &queue->mqd->queue_state;
>> +
>> +	WRITE_REG(dev, CP_MQD_BASE_ADDR, qs->cp_mqd_base_addr);
>> +	WRITE_REG(dev, CP_MQD_BASE_ADDR_HI, qs->cp_mqd_base_addr_hi);
>> +	WRITE_REG(dev, CP_MQD_CONTROL, qs->cp_mqd_control);
>> +
>> +	WRITE_REG(dev, CP_HQD_PQ_BASE, qs->cp_hqd_pq_base);
>> +	WRITE_REG(dev, CP_HQD_PQ_BASE_HI, qs->cp_hqd_pq_base_hi);
>> +	WRITE_REG(dev, CP_HQD_PQ_CONTROL, qs->cp_hqd_pq_control);
>> +	/* DOORBELL_CONTROL before WPTR because WPTR writes are dropped if DOORBELL_HIT is set. */
>> +	WRITE_REG(dev, CP_HQD_PQ_DOORBELL_CONTROL, qs->cp_hqd_pq_doorbell_control);
>> +	WRITE_REG(dev, CP_HQD_PQ_WPTR, qs->cp_hqd_pq_wptr);
>> +	WRITE_REG(dev, CP_HQD_PQ_RPTR, qs->cp_hqd_pq_rptr);
>> +	WRITE_REG(dev, CP_HQD_PQ_RPTR_REPORT_ADDR, qs->cp_hqd_pq_rptr_report_addr);
>> +	WRITE_REG(dev, CP_HQD_PQ_RPTR_REPORT_ADDR_HI, qs->cp_hqd_pq_rptr_report_addr_hi);
>> +
>> +	WRITE_REG(dev, CP_HQD_VMID, qs->cp_hqd_vmid);
>> +	WRITE_REG(dev, CP_HQD_PERSISTENT_STATE, qs->cp_hqd_persistent_state);
>> +	WRITE_REG(dev, CP_HQD_QUANTUM, qs->cp_hqd_quantum);
>> +	WRITE_REG(dev, CP_HQD_PIPE_PRIORITY, qs->cp_hqd_pipe_priority);
>> +	WRITE_REG(dev, CP_HQD_QUEUE_PRIORITY, qs->cp_hqd_queue_priority);
>> +
>> +	WRITE_REG(dev, CP_HQD_IB_CONTROL, qs->cp_hqd_ib_control);
>> +	WRITE_REG(dev, CP_HQD_IB_BASE_ADDR, qs->cp_hqd_ib_base_addr);
>> +	WRITE_REG(dev, CP_HQD_IB_BASE_ADDR_HI, qs->cp_hqd_ib_base_addr_hi);
>> +	WRITE_REG(dev, CP_HQD_IB_RPTR, qs->cp_hqd_ib_rptr);
>> +	WRITE_REG(dev, CP_HQD_SEMA_CMD, qs->cp_hqd_sema_cmd);
>> +	WRITE_REG(dev, CP_HQD_MSG_TYPE, qs->cp_hqd_msg_type);
>> +	WRITE_REG(dev, CP_HQD_ATOMIC0_PREOP_LO, qs->cp_hqd_atomic0_preop_lo);
>> +	WRITE_REG(dev, CP_HQD_ATOMIC0_PREOP_HI, qs->cp_hqd_atomic0_preop_hi);
>> +	WRITE_REG(dev, CP_HQD_ATOMIC1_PREOP_LO, qs->cp_hqd_atomic1_preop_lo);
>> +	WRITE_REG(dev, CP_HQD_ATOMIC1_PREOP_HI, qs->cp_hqd_atomic1_preop_hi);
>> +	WRITE_REG(dev, CP_HQD_HQ_SCHEDULER0, qs->cp_hqd_hq_scheduler0);
>> +	WRITE_REG(dev, CP_HQD_HQ_SCHEDULER1, qs->cp_hqd_hq_scheduler1);
>> +
>> +	WRITE_REG(dev, CP_HQD_ACTIVE, 1);
>> +}
>> +
>> +static void activate_queue(struct cik_static_private *priv, struct cik_static_queue *queue)
>> +{
>> +	bool wptr_shadow_valid;
>> +	doorbell_t wptr_shadow;
>> +
>> +	/* Avoid sleeping while holding the SRBM lock. */
>> +	wptr_shadow_valid = !get_user(wptr_shadow, queue->wptr_address);
>> +
>> +	lock_srbm_index(priv);
>> +	queue_select(priv, queue->queue);
>> +
>> +	load_hqd(priv, queue);
>> +
>> +	/* Doorbell and wptr are special because there is a race when reactivating a queue.
>> +	 * Since doorbell writes to deactivated queues are ignored by hardware, the application
>> +	 * shadows the doorbell into memory at queue->wptr_address.
>> +	 *
>> +	 * We want the queue to automatically resume processing as if it were always active,
>> +	 * so we want to copy from queue->wptr_address into the wptr/doorbell.
>> +	 *
>> +	 * The race is that the app could write a new wptr into the doorbell before we
>> +	 * write the shadowed wptr, resulting in an old wptr written later.
>> +	 *
>> +	 * The hardware solves this by ignoring CP_HQD_PQ_WPTR writes after a doorbell write.
>> +	 * So the KFD can activate the doorbell then write the shadow wptr to CP_HQD_PQ_WPTR
>> +	 * knowing it will be ignored if the user has written a more-recent doorbell.
>> +	 */
>> +	if (wptr_shadow_valid)
>> +		WRITE_REG(priv->dev, CP_HQD_PQ_WPTR, wptr_shadow);
>> +
>> +	unlock_srbm_index(priv);
>> +}
>> +
>> +static void drain_hqd(struct cik_static_private *priv)
>> +{
>> +	WRITE_REG(priv->dev, CP_HQD_DEQUEUE_REQUEST, DEQUEUE_REQUEST_DRAIN);
>> +}
>> +
>> +static void wait_hqd_inactive(struct cik_static_private *priv)
>> +{
>> +	while (READ_REG(priv->dev, CP_HQD_ACTIVE) != 0)
>> +		cpu_relax();
>> +}
>> +
>> +static void deactivate_queue(struct cik_static_private *priv, struct cik_static_queue *queue)
>> +{
>> +	lock_srbm_index(priv);
>> +	queue_select(priv, queue->queue);
>> +
>> +	drain_hqd(priv);
>> +	wait_hqd_inactive(priv);
>> +
>> +	unlock_srbm_index(priv);
>> +}
>> +
>> +#define BIT_MASK_64(high, low) (((1ULL << (high)) - 1) & ~((1ULL << (low)) - 1))
>> +#define RING_ADDRESS_BAD_BIT_MASK (~BIT_MASK_64(48, 8))
>> +#define RWPTR_ADDRESS_BAD_BIT_MASK (~BIT_MASK_64(48, 2))
>> +
>> +#define MAX_QUEUE_SIZE (1ULL << 32)
>> +#define MIN_QUEUE_SIZE (1ULL << 10)
>> +
>> +static int
>> +cik_static_create_queue(struct kfd_scheduler *scheduler,
>> +			struct kfd_scheduler_process *process,
>> +			struct kfd_scheduler_queue *queue,
>> +			void __user *ring_address,
>> +			uint64_t ring_size,
>> +			void __user *rptr_address,
>> +			void __user *wptr_address,
>> +			unsigned int doorbell)
>> +{
>> +	struct cik_static_private *priv = kfd_scheduler_to_private(scheduler);
>> +	struct cik_static_process *hwp = kfd_process_to_private(process);
>> +	struct cik_static_queue *hwq = kfd_queue_to_private(queue);
>> +
>> +	if ((uint64_t)ring_address & RING_ADDRESS_BAD_BIT_MASK
>> +	    || (uint64_t)rptr_address & RWPTR_ADDRESS_BAD_BIT_MASK
>> +	    || (uint64_t)wptr_address & RWPTR_ADDRESS_BAD_BIT_MASK)
>> +		return -EINVAL;
>> +
>> +	if (ring_size > MAX_QUEUE_SIZE || ring_size < MIN_QUEUE_SIZE || !is_power_of_2(ring_size))
>> +		return -EINVAL;
>> +
>> +	if (!allocate_hqd(priv, &hwq->queue))
>> +		return -ENOMEM;
>> +
>> +	hwq->mqd_addr = priv->mqd_addr + sizeof(struct cik_mqd_padded) * hwq->queue;
>> +	hwq->mqd = &priv->mqds[hwq->queue].mqd;
>> +	hwq->pq_addr = ring_address;
>> +	hwq->rptr_address = rptr_address;
>> +	hwq->wptr_address = wptr_address;
>> +	hwq->doorbell_index = doorbell;
>> +	hwq->queue_size_encoded = ilog2(ring_size) - 3;
>> +
>> +	init_mqd(hwq, hwp);
>> +	activate_queue(priv, hwq);
>> +
>> +	return 0;
>> +}
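
To make the parameter checks and the size encoding above concrete, here is a userspace sketch. It reuses the BIT_MASK_64 and size-limit macros from the patch, replaces ilog2() and is_power_of_2() with portable C, and introduces two hypothetical helper names, ring_params_ok() and encode_queue_size().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Macros as in the patch. */
#define BIT_MASK_64(high, low) (((1ULL << (high)) - 1) & ~((1ULL << (low)) - 1))
#define RING_ADDRESS_BAD_BIT_MASK (~BIT_MASK_64(48, 8))
#define MAX_QUEUE_SIZE (1ULL << 32)
#define MIN_QUEUE_SIZE (1ULL << 10)

/* Hypothetical helper: the ring must be 256-byte aligned, fit in
 * 48 bits, and have a power-of-two size within the allowed range. */
static bool ring_params_ok(uint64_t ring_address, uint64_t ring_size)
{
	if (ring_address & RING_ADDRESS_BAD_BIT_MASK)
		return false;
	if (ring_size < MIN_QUEUE_SIZE || ring_size > MAX_QUEUE_SIZE)
		return false;
	return (ring_size & (ring_size - 1)) == 0; /* power of two */
}

/* Hypothetical helper: log2(size) - 3, the value the patch stores in
 * queue_size_encoded. Only meaningful for sizes accepted above. */
static uint32_t encode_queue_size(uint64_t ring_size)
{
	uint32_t bits = 0;

	assert(ring_size >= MIN_QUEUE_SIZE);
	while (ring_size >>= 1)
		bits++;
	return bits - 3;
}
```

A 4 KiB ring therefore encodes as 12 - 3 = 9, and the 1 KiB minimum as 7.
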
>> +
>> +static void
>> +cik_static_destroy_queue(struct kfd_scheduler *scheduler, struct kfd_scheduler_queue *queue)
>> +{
>> +	struct cik_static_private *priv = kfd_scheduler_to_private(scheduler);
>> +	struct cik_static_queue *hwq = kfd_queue_to_private(queue);
>> +
>> +	deactivate_queue(priv, hwq);
>> +
>> +	release_hqd(priv, hwq->queue);
>> +}
>> +
>> +const struct kfd_scheduler_class radeon_kfd_cik_static_scheduler_class = {
>> +	.name = "CIK static scheduler",
>> +	.create = cik_static_create,
>> +	.destroy = cik_static_destroy,
>> +	.start = cik_static_start,
>> +	.stop = cik_static_stop,
>> +	.register_process = cik_static_register_process,
>> +	.deregister_process = cik_static_deregister_process,
>> +	.queue_size = sizeof(struct cik_static_queue),
>> +	.create_queue = cik_static_create_queue,
>> +	.destroy_queue = cik_static_destroy_queue,
>> +};
>> diff --git a/drivers/gpu/hsa/radeon/kfd_vidmem.c b/drivers/gpu/hsa/radeon/kfd_vidmem.c
>> new file mode 100644
>> index 0000000..c8d3770
>> --- /dev/null
>> +++ b/drivers/gpu/hsa/radeon/kfd_vidmem.c
>> @@ -0,0 +1,61 @@
>> +/*
>> + * Copyright 2014 Advanced Micro Devices, Inc.
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a
>> + * copy of this software and associated documentation files (the "Software"),
>> + * to deal in the Software without restriction, including without limitation
>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>> + * and/or sell copies of the Software, and to permit persons to whom the
>> + * Software is furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
>> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
>> + * OTHER DEALINGS IN THE SOFTWARE.
>> + */
>> +
>> +#include "kfd_priv.h"
>> +
>> +int radeon_kfd_vidmem_alloc(struct kfd_dev *kfd, size_t size, size_t alignment,
>> +				enum kfd_mempool pool, kfd_mem_obj *mem_obj)
>> +{
>> +	return kfd2kgd->allocate_mem(kfd->kgd,
>> +					size,
>> +					alignment,
>> +					(enum kgd_memory_pool)pool,
>> +					(struct kgd_mem **)mem_obj);
>> +}
>> +
>> +void radeon_kfd_vidmem_free(struct kfd_dev *kfd, kfd_mem_obj mem_obj)
>> +{
>> +	kfd2kgd->free_mem(kfd->kgd, (struct kgd_mem *)mem_obj);
>> +}
>> +
>> +int radeon_kfd_vidmem_gpumap(struct kfd_dev *kfd, kfd_mem_obj mem_obj,
>> +				uint64_t *vmid0_address)
>> +{
>> +	return kfd2kgd->gpumap_mem(kfd->kgd,
>> +					(struct kgd_mem *)mem_obj,
>> +					vmid0_address);
>> +}
>> +
>> +void radeon_kfd_vidmem_ungpumap(struct kfd_dev *kfd, kfd_mem_obj mem_obj)
>> +{
>> +	kfd2kgd->ungpumap_mem(kfd->kgd, (struct kgd_mem *)mem_obj);
>> +}
>> +
>> +int radeon_kfd_vidmem_kmap(struct kfd_dev *kfd, kfd_mem_obj mem_obj, void **ptr)
>> +{
>> +	return kfd2kgd->kmap_mem(kfd->kgd, (struct kgd_mem *)mem_obj, ptr);
>> +}
>> +
>> +void radeon_kfd_vidmem_unkmap(struct kfd_dev *kfd, kfd_mem_obj mem_obj)
>> +{
>> +	kfd2kgd->unkmap_mem(kfd->kgd, (struct kgd_mem *)mem_obj);
>> +}
>> --
>> 1.9.1
>>



* Re: [PATCH 11/83] hsa/radeon: Add scheduler code
@ 2014-07-17 11:57       ` Oded Gabbay
  0 siblings, 0 replies; 116+ messages in thread
From: Oded Gabbay @ 2014-07-17 11:57 UTC (permalink / raw)
  To: Jerome Glisse, Oded Gabbay
  Cc: Andrew Lewycky, linux-kernel, dri-devel, Alex Deucher

On 11/07/14 21:25, Jerome Glisse wrote:
> On Fri, Jul 11, 2014 at 12:50:11AM +0300, Oded Gabbay wrote:
>> This patch adds the code base of the scheduler, which handles queue
>> creation, deletion and scheduling on the CP of the GPU.
>>
>> Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
>
> I would rather see all this squashed; this gives the feeling that the driver
> can access registers, which is later removed. I know juggling with
> patch squashing can be daunting, but it really makes reviewing hard
> here because I have to jump back and forth to see if the thing I am looking
> at really matters in the final version.
>
> Cheers,
> Jérôme
Squashed and restructured in v2 of the patchset.
	Oded
>
>> ---
>>   drivers/gpu/hsa/radeon/Makefile               |   3 +-
>>   drivers/gpu/hsa/radeon/cik_regs.h             | 213 +++++++
>>   drivers/gpu/hsa/radeon/kfd_device.c           |   1 +
>>   drivers/gpu/hsa/radeon/kfd_registers.c        |  50 ++
>>   drivers/gpu/hsa/radeon/kfd_sched_cik_static.c | 800 ++++++++++++++++++++++++++
>>   drivers/gpu/hsa/radeon/kfd_vidmem.c           |  61 ++
>>   6 files changed, 1127 insertions(+), 1 deletion(-)
>>   create mode 100644 drivers/gpu/hsa/radeon/cik_regs.h
>>   create mode 100644 drivers/gpu/hsa/radeon/kfd_registers.c
>>   create mode 100644 drivers/gpu/hsa/radeon/kfd_sched_cik_static.c
>>   create mode 100644 drivers/gpu/hsa/radeon/kfd_vidmem.c
>>
>> diff --git a/drivers/gpu/hsa/radeon/Makefile b/drivers/gpu/hsa/radeon/Makefile
>> index 989518a..28da10c 100644
>> --- a/drivers/gpu/hsa/radeon/Makefile
>> +++ b/drivers/gpu/hsa/radeon/Makefile
>> @@ -4,6 +4,7 @@
>>
>>   radeon_kfd-y	:= kfd_module.o kfd_device.o kfd_chardev.o \
>>   		kfd_pasid.o kfd_topology.o kfd_process.o \
>> -		kfd_doorbell.o
>> +		kfd_doorbell.o kfd_sched_cik_static.o kfd_registers.o \
>> +		kfd_vidmem.o
>>
>>   obj-$(CONFIG_HSA_RADEON)	+= radeon_kfd.o
>> diff --git a/drivers/gpu/hsa/radeon/cik_regs.h b/drivers/gpu/hsa/radeon/cik_regs.h
>> new file mode 100644
>> index 0000000..d0cdc57
>> --- /dev/null
>> +++ b/drivers/gpu/hsa/radeon/cik_regs.h
>> @@ -0,0 +1,213 @@
>> +/*
>> + * Copyright 2014 Advanced Micro Devices, Inc.
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a
>> + * copy of this software and associated documentation files (the "Software"),
>> + * to deal in the Software without restriction, including without limitation
>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>> + * and/or sell copies of the Software, and to permit persons to whom the
>> + * Software is furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
>> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
>> + * OTHER DEALINGS IN THE SOFTWARE.
>> + */
>> +
>> +#ifndef CIK_REGS_H
>> +#define CIK_REGS_H
>> +
>> +#define BIF_DOORBELL_CNTL				0x530Cu
>> +
>> +#define	SRBM_GFX_CNTL					0xE44
>> +#define	PIPEID(x)					((x) << 0)
>> +#define	MEID(x)						((x) << 2)
>> +#define	VMID(x)						((x) << 4)
>> +#define	QUEUEID(x)					((x) << 8)
>> +
>> +#define	SQ_CONFIG					0x8C00
>> +
>> +#define	SH_MEM_BASES					0x8C28
>> +/* if PTR32, these are the bases for scratch and lds */
>> +#define	PRIVATE_BASE(x)					((x) << 0) /* scratch */
>> +#define	SHARED_BASE(x)					((x) << 16) /* LDS */
>> +#define	SH_MEM_APE1_BASE				0x8C2C
>> +/* if PTR32, this is the base location of GPUVM */
>> +#define	SH_MEM_APE1_LIMIT				0x8C30
>> +/* if PTR32, this is the upper limit of GPUVM */
>> +#define	SH_MEM_CONFIG					0x8C34
>> +#define	PTR32						(1 << 0)
>> +#define	ALIGNMENT_MODE(x)				((x) << 2)
>> +#define	SH_MEM_ALIGNMENT_MODE_DWORD			0
>> +#define	SH_MEM_ALIGNMENT_MODE_DWORD_STRICT		1
>> +#define	SH_MEM_ALIGNMENT_MODE_STRICT			2
>> +#define	SH_MEM_ALIGNMENT_MODE_UNALIGNED			3
>> +#define	DEFAULT_MTYPE(x)				((x) << 4)
>> +#define	APE1_MTYPE(x)					((x) << 7)
>> +
>> +/* valid for both DEFAULT_MTYPE and APE1_MTYPE */
>> +#define	MTYPE_NONCACHED					3
>> +
>> +
>> +#define SH_STATIC_MEM_CONFIG				0x9604u
>> +
>> +#define	TC_CFG_L1_LOAD_POLICY0				0xAC68
>> +#define	TC_CFG_L1_LOAD_POLICY1				0xAC6C
>> +#define	TC_CFG_L1_STORE_POLICY				0xAC70
>> +#define	TC_CFG_L2_LOAD_POLICY0				0xAC74
>> +#define	TC_CFG_L2_LOAD_POLICY1				0xAC78
>> +#define	TC_CFG_L2_STORE_POLICY0				0xAC7C
>> +#define	TC_CFG_L2_STORE_POLICY1				0xAC80
>> +#define	TC_CFG_L2_ATOMIC_POLICY				0xAC84
>> +#define	TC_CFG_L1_VOLATILE				0xAC88
>> +#define	TC_CFG_L2_VOLATILE				0xAC8C
>> +
>> +#define CP_PQ_WPTR_POLL_CNTL				0xC20C
>> +#define	WPTR_POLL_EN					(1 << 31)
>> +
>> +#define CP_ME1_PIPE0_INT_CNTL				0xC214
>> +#define CP_ME1_PIPE1_INT_CNTL				0xC218
>> +#define CP_ME1_PIPE2_INT_CNTL				0xC21C
>> +#define CP_ME1_PIPE3_INT_CNTL				0xC220
>> +#define CP_ME2_PIPE0_INT_CNTL				0xC224
>> +#define CP_ME2_PIPE1_INT_CNTL				0xC228
>> +#define CP_ME2_PIPE2_INT_CNTL				0xC22C
>> +#define CP_ME2_PIPE3_INT_CNTL				0xC230
>> +#define DEQUEUE_REQUEST_INT_ENABLE			(1 << 13)
>> +#define WRM_POLL_TIMEOUT_INT_ENABLE			(1 << 17)
>> +#define PRIV_REG_INT_ENABLE				(1 << 23)
>> +#define TIME_STAMP_INT_ENABLE				(1 << 26)
>> +#define GENERIC2_INT_ENABLE				(1 << 29)
>> +#define GENERIC1_INT_ENABLE				(1 << 30)
>> +#define GENERIC0_INT_ENABLE				(1 << 31)
>> +#define CP_ME1_PIPE0_INT_STATUS				0xC214
>> +#define CP_ME1_PIPE1_INT_STATUS				0xC218
>> +#define CP_ME1_PIPE2_INT_STATUS				0xC21C
>> +#define CP_ME1_PIPE3_INT_STATUS				0xC220
>> +#define CP_ME2_PIPE0_INT_STATUS				0xC224
>> +#define CP_ME2_PIPE1_INT_STATUS				0xC228
>> +#define CP_ME2_PIPE2_INT_STATUS				0xC22C
>> +#define CP_ME2_PIPE3_INT_STATUS				0xC230
>> +#define DEQUEUE_REQUEST_INT_STATUS			(1 << 13)
>> +#define WRM_POLL_TIMEOUT_INT_STATUS			(1 << 17)
>> +#define PRIV_REG_INT_STATUS				(1 << 23)
>> +#define TIME_STAMP_INT_STATUS				(1 << 26)
>> +#define GENERIC2_INT_STATUS				(1 << 29)
>> +#define GENERIC1_INT_STATUS				(1 << 30)
>> +#define GENERIC0_INT_STATUS				(1 << 31)
>> +
>> +#define CP_HPD_EOP_BASE_ADDR				0xC904
>> +#define CP_HPD_EOP_BASE_ADDR_HI				0xC908
>> +#define CP_HPD_EOP_VMID					0xC90C
>> +#define CP_HPD_EOP_CONTROL				0xC910
>> +#define	EOP_SIZE(x)					((x) << 0)
>> +#define	EOP_SIZE_MASK					(0x3f << 0)
>> +#define CP_MQD_BASE_ADDR				0xC914
>> +#define CP_MQD_BASE_ADDR_HI				0xC918
>> +#define CP_HQD_ACTIVE					0xC91C
>> +#define CP_HQD_VMID					0xC920
>> +
>> +#define CP_HQD_PERSISTENT_STATE				0xC924u
>> +#define	DEFAULT_CP_HQD_PERSISTENT_STATE			(0x33U << 8)
>> +
>> +#define CP_HQD_PIPE_PRIORITY				0xC928u
>> +#define CP_HQD_QUEUE_PRIORITY				0xC92Cu
>> +#define CP_HQD_QUANTUM					0xC930u
>> +#define	QUANTUM_EN					1U
>> +#define	QUANTUM_SCALE_1MS				(1U << 4)
>> +#define	QUANTUM_DURATION(x)				((x) << 8)
>> +
>> +#define CP_HQD_PQ_BASE					0xC934
>> +#define CP_HQD_PQ_BASE_HI				0xC938
>> +#define CP_HQD_PQ_RPTR					0xC93C
>> +#define CP_HQD_PQ_RPTR_REPORT_ADDR			0xC940
>> +#define CP_HQD_PQ_RPTR_REPORT_ADDR_HI			0xC944
>> +#define CP_HQD_PQ_WPTR_POLL_ADDR			0xC948
>> +#define CP_HQD_PQ_WPTR_POLL_ADDR_HI			0xC94C
>> +#define CP_HQD_PQ_DOORBELL_CONTROL			0xC950
>> +#define	DOORBELL_OFFSET(x)				((x) << 2)
>> +#define	DOORBELL_OFFSET_MASK				(0x1fffff << 2)
>> +#define	DOORBELL_SOURCE					(1 << 28)
>> +#define	DOORBELL_SCHD_HIT				(1 << 29)
>> +#define	DOORBELL_EN					(1 << 30)
>> +#define	DOORBELL_HIT					(1 << 31)
>> +#define CP_HQD_PQ_WPTR					0xC954
>> +#define CP_HQD_PQ_CONTROL				0xC958
>> +#define	QUEUE_SIZE(x)					((x) << 0)
>> +#define	QUEUE_SIZE_MASK					(0x3f << 0)
>> +#define	RPTR_BLOCK_SIZE(x)				((x) << 8)
>> +#define	RPTR_BLOCK_SIZE_MASK				(0x3f << 8)
>> +#define	MIN_AVAIL_SIZE(x)				((x) << 20)
>> +#define	PQ_ATC_EN					(1 << 23)
>> +#define	PQ_VOLATILE					(1 << 26)
>> +#define	NO_UPDATE_RPTR					(1 << 27)
>> +#define	UNORD_DISPATCH					(1 << 28)
>> +#define	ROQ_PQ_IB_FLIP					(1 << 29)
>> +#define	PRIV_STATE					(1 << 30)
>> +#define	KMD_QUEUE					(1 << 31)
>> +
>> +#define	DEFAULT_RPTR_BLOCK_SIZE				RPTR_BLOCK_SIZE(5)
>> +#define	DEFAULT_MIN_AVAIL_SIZE				MIN_AVAIL_SIZE(3)
>> +
>> +#define CP_HQD_IB_BASE_ADDR				0xC95Cu
>> +#define CP_HQD_IB_BASE_ADDR_HI				0xC960u
>> +#define CP_HQD_IB_RPTR					0xC964u
>> +#define CP_HQD_IB_CONTROL				0xC968u
>> +#define	IB_ATC_EN					(1U << 23)
>> +#define	DEFAULT_MIN_IB_AVAIL_SIZE			(3U << 20)
>> +
>> +#define CP_HQD_DEQUEUE_REQUEST				0xC974
>> +#define	DEQUEUE_REQUEST_DRAIN				1
>> +
>> +#define CP_HQD_SEMA_CMD					0xC97Cu
>> +#define CP_HQD_MSG_TYPE					0xC980u
>> +#define CP_HQD_ATOMIC0_PREOP_LO				0xC984u
>> +#define CP_HQD_ATOMIC0_PREOP_HI				0xC988u
>> +#define CP_HQD_ATOMIC1_PREOP_LO				0xC98Cu
>> +#define CP_HQD_ATOMIC1_PREOP_HI				0xC990u
>> +#define CP_HQD_HQ_SCHEDULER0				0xC994u
>> +#define CP_HQD_HQ_SCHEDULER1				0xC998u
>> +
>> +
>> +#define CP_MQD_CONTROL					0xC99C
>> +#define	MQD_VMID(x)					((x) << 0)
>> +#define	MQD_VMID_MASK					(0xf << 0)
>> +#define	MQD_CONTROL_PRIV_STATE_EN			(1U << 8)
>> +
>> +#define GRBM_GFX_INDEX					0x30800
>> +#define	INSTANCE_INDEX(x)				((x) << 0)
>> +#define	SH_INDEX(x)					((x) << 8)
>> +#define	SE_INDEX(x)					((x) << 16)
>> +#define	SH_BROADCAST_WRITES				(1 << 29)
>> +#define	INSTANCE_BROADCAST_WRITES			(1 << 30)
>> +#define	SE_BROADCAST_WRITES				(1 << 31)
>> +
>> +#define SQC_CACHES					0x30d20
>> +#define SQC_POLICY					0x8C38u
>> +#define SQC_VOLATILE					0x8C3Cu
>> +
>> +#define CP_PERFMON_CNTL					0x36020
>> +
>> +#define ATC_VMID0_PASID_MAPPING				0x339Cu
>> +#define	ATC_VMID_PASID_MAPPING_UPDATE_STATUS		0x3398u
>> +#define	ATC_VMID_PASID_MAPPING_VALID			(1U << 31)
>> +
>> +#define ATC_VM_APERTURE0_CNTL				0x3310u
>> +#define	ATS_ACCESS_MODE_NEVER				0
>> +#define	ATS_ACCESS_MODE_ALWAYS				1
>> +
>> +#define ATC_VM_APERTURE0_CNTL2				0x3318u
>> +#define ATC_VM_APERTURE0_HIGH_ADDR			0x3308u
>> +#define ATC_VM_APERTURE0_LOW_ADDR			0x3300u
>> +#define ATC_VM_APERTURE1_CNTL				0x3314u
>> +#define ATC_VM_APERTURE1_CNTL2				0x331Cu
>> +#define ATC_VM_APERTURE1_HIGH_ADDR			0x330Cu
>> +#define ATC_VM_APERTURE1_LOW_ADDR			0x3304u
>> +
>> +#endif
>> diff --git a/drivers/gpu/hsa/radeon/kfd_device.c b/drivers/gpu/hsa/radeon/kfd_device.c
>> index 4e9fe6c..465c822 100644
>> --- a/drivers/gpu/hsa/radeon/kfd_device.c
>> +++ b/drivers/gpu/hsa/radeon/kfd_device.c
>> @@ -28,6 +28,7 @@
>>   #include "kfd_scheduler.h"
>>
>>   static const struct kfd_device_info bonaire_device_info = {
>> +	.scheduler_class = &radeon_kfd_cik_static_scheduler_class,
>>   	.max_pasid_bits = 16,
>>   };
>>
>> diff --git a/drivers/gpu/hsa/radeon/kfd_registers.c b/drivers/gpu/hsa/radeon/kfd_registers.c
>> new file mode 100644
>> index 0000000..223debd
>> --- /dev/null
>> +++ b/drivers/gpu/hsa/radeon/kfd_registers.c
>> @@ -0,0 +1,50 @@
>> +/*
>> + * Copyright 2014 Advanced Micro Devices, Inc.
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a
>> + * copy of this software and associated documentation files (the "Software"),
>> + * to deal in the Software without restriction, including without limitation
>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>> + * and/or sell copies of the Software, and to permit persons to whom the
>> + * Software is furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
>> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
>> + * OTHER DEALINGS IN THE SOFTWARE.
>> + */
>> +
>> +#include <linux/io.h>
>> +#include "kfd_priv.h"
>> +
>> +/* In KFD, "reg" is the byte offset of the register. */
>> +static void __iomem *reg_address(struct kfd_dev *dev, uint32_t reg)
>> +{
>> +	return dev->regs + reg;
>> +}
>> +
>> +void radeon_kfd_write_reg(struct kfd_dev *dev, uint32_t reg, uint32_t value)
>> +{
>> +	writel(value, reg_address(dev, reg));
>> +}
>> +
>> +uint32_t radeon_kfd_read_reg(struct kfd_dev *dev, uint32_t reg)
>> +{
>> +	return readl(reg_address(dev, reg));
>> +}
>> +
>> +void radeon_kfd_lock_srbm_index(struct kfd_dev *dev)
>> +{
>> +	kfd2kgd->lock_srbm_gfx_cntl(dev->kgd);
>> +}
>> +
>> +void radeon_kfd_unlock_srbm_index(struct kfd_dev *dev)
>> +{
>> +	kfd2kgd->unlock_srbm_gfx_cntl(dev->kgd);
>> +}
>> diff --git a/drivers/gpu/hsa/radeon/kfd_sched_cik_static.c b/drivers/gpu/hsa/radeon/kfd_sched_cik_static.c
>> new file mode 100644
>> index 0000000..b986ff9
>> --- /dev/null
>> +++ b/drivers/gpu/hsa/radeon/kfd_sched_cik_static.c
>> @@ -0,0 +1,800 @@
>> +/*
>> + * Copyright 2014 Advanced Micro Devices, Inc.
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a
>> + * copy of this software and associated documentation files (the "Software"),
>> + * to deal in the Software without restriction, including without limitation
>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>> + * and/or sell copies of the Software, and to permit persons to whom the
>> + * Software is furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
>> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
>> + * OTHER DEALINGS IN THE SOFTWARE.
>> + */
>> +
>> +#include <linux/log2.h>
>> +#include <linux/mutex.h>
>> +#include <linux/slab.h>
>> +#include <linux/types.h>
>> +#include <linux/uaccess.h>
>> +#include "kfd_priv.h"
>> +#include "kfd_scheduler.h"
>> +#include "cik_regs.h"
>> +
>> +/* CIK CP hardware is arranged with 8 queues per pipe and 4 pipes per MEC (microengine for compute).
>> + * The first MEC is ME 1; the GFX ME is ME 0.
>> + * We split the CP with the KGD: it takes the first N pipes and we take the rest.
>> + */
>> +#define CIK_QUEUES_PER_PIPE 8
>> +#define CIK_PIPES_PER_MEC 4
>> +
>> +#define CIK_MAX_PIPES (2 * CIK_PIPES_PER_MEC)
>> +
>> +#define CIK_NUM_VMID 16
>> +
>> +#define CIK_HPD_SIZE_LOG2 11
>> +#define CIK_HPD_SIZE (1U << CIK_HPD_SIZE_LOG2)
>> +#define CIK_HPD_ALIGNMENT 256
>> +#define CIK_MQD_ALIGNMENT 4
>> +
>> +#pragma pack(push, 4)
>> +
>> +struct cik_hqd_registers {
>> +	u32 cp_mqd_base_addr;
>> +	u32 cp_mqd_base_addr_hi;
>> +	u32 cp_hqd_active;
>> +	u32 cp_hqd_vmid;
>> +	u32 cp_hqd_persistent_state;
>> +	u32 cp_hqd_pipe_priority;
>> +	u32 cp_hqd_queue_priority;
>> +	u32 cp_hqd_quantum;
>> +	u32 cp_hqd_pq_base;
>> +	u32 cp_hqd_pq_base_hi;
>> +	u32 cp_hqd_pq_rptr;
>> +	u32 cp_hqd_pq_rptr_report_addr;
>> +	u32 cp_hqd_pq_rptr_report_addr_hi;
>> +	u32 cp_hqd_pq_wptr_poll_addr;
>> +	u32 cp_hqd_pq_wptr_poll_addr_hi;
>> +	u32 cp_hqd_pq_doorbell_control;
>> +	u32 cp_hqd_pq_wptr;
>> +	u32 cp_hqd_pq_control;
>> +	u32 cp_hqd_ib_base_addr;
>> +	u32 cp_hqd_ib_base_addr_hi;
>> +	u32 cp_hqd_ib_rptr;
>> +	u32 cp_hqd_ib_control;
>> +	u32 cp_hqd_iq_timer;
>> +	u32 cp_hqd_iq_rptr;
>> +	u32 cp_hqd_dequeue_request;
>> +	u32 cp_hqd_dma_offload;
>> +	u32 cp_hqd_sema_cmd;
>> +	u32 cp_hqd_msg_type;
>> +	u32 cp_hqd_atomic0_preop_lo;
>> +	u32 cp_hqd_atomic0_preop_hi;
>> +	u32 cp_hqd_atomic1_preop_lo;
>> +	u32 cp_hqd_atomic1_preop_hi;
>> +	u32 cp_hqd_hq_scheduler0;
>> +	u32 cp_hqd_hq_scheduler1;
>> +	u32 cp_mqd_control;
>> +};
>> +
>> +struct cik_mqd {
>> +	u32 header;
>> +	u32 dispatch_initiator;
>> +	u32 dimensions[3];
>> +	u32 start_idx[3];
>> +	u32 num_threads[3];
>> +	u32 pipeline_stat_enable;
>> +	u32 perf_counter_enable;
>> +	u32 pgm[2];
>> +	u32 tba[2];
>> +	u32 tma[2];
>> +	u32 pgm_rsrc[2];
>> +	u32 vmid;
>> +	u32 resource_limits;
>> +	u32 static_thread_mgmt01[2];
>> +	u32 tmp_ring_size;
>> +	u32 static_thread_mgmt23[2];
>> +	u32 restart[3];
>> +	u32 thread_trace_enable;
>> +	u32 reserved1;
>> +	u32 user_data[16];
>> +	u32 vgtcs_invoke_count[2];
>> +	struct cik_hqd_registers queue_state;
>> +	u32 dequeue_cntr;
>> +	u32 interrupt_queue[64];
>> +};
>> +
>> +struct cik_mqd_padded {
>> +	struct cik_mqd mqd;
>> +	u8 padding[1024 - sizeof(struct cik_mqd)]; /* Pad MQD out to 1KB. (HW requires 4-byte alignment.) */
>> +};
>> +
>> +#pragma pack(pop)
>> +
>> +struct cik_static_private {
>> +	struct kfd_dev *dev;
>> +
>> +	struct mutex mutex;
>> +
>> +	unsigned int first_pipe;
>> +	unsigned int num_pipes;
>> +
>> +	unsigned long free_vmid_mask; /* unsigned long to make set/clear_bit happy */
>> +
>> +	/* Everything below here is offset by first_pipe. E.g. bit 0 in
>> +	 * free_queues is queue 0 in pipe first_pipe
>> +	 */
>> +
>> +	 /* Queue q on pipe p is at bit QUEUES_PER_PIPE * p + q. */
>> +	unsigned long free_queues[DIV_ROUND_UP(CIK_MAX_PIPES * CIK_QUEUES_PER_PIPE, BITS_PER_LONG)];
>> +
>> +	kfd_mem_obj hpd_mem;	/* Single allocation for HPDs for all KFD pipes. */
>> +	kfd_mem_obj mqd_mem;	/* Single allocation for all MQDs for all KFD
>> +				 * pipes. This is actually struct cik_mqd_padded. */
>> +	uint64_t hpd_addr;	/* GPU address for hpd_mem. */
>> +	uint64_t mqd_addr;	/* GPU address for mqd_mem. */
>> +	 /*
>> +	  * Pointer for mqd_mem.
>> +	  * We keep this mapped because multiple processes may need to access it
>> +	  * in parallel and this is simpler than controlling concurrent kmaps
>> +	  */
>> +	struct cik_mqd_padded *mqds;
>> +};
>> +
>> +struct cik_static_process {
>> +	unsigned int vmid;
>> +	pasid_t pasid;
>> +};
>> +
>> +struct cik_static_queue {
>> +	unsigned int queue; /* + first_pipe * QUEUES_PER_PIPE */
>> +
>> +	uint64_t mqd_addr;
>> +	struct cik_mqd *mqd;
>> +
>> +	void __user *pq_addr;
>> +	void __user *rptr_address;
>> +	doorbell_t __user *wptr_address;
>> +	uint32_t doorbell_index;
>> +
>> +	uint32_t queue_size_encoded; /* CP_HQD_PQ_CONTROL.QUEUE_SIZE takes the queue size as log2(size) - 3. */
>> +};
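The queue_size_encoded comment above can be illustrated with a small userspace sketch. It assumes the size is a power of two and at least 8; the unit of "size" is not stated in the quoted patch, and encode_queue_size() is a hypothetical name, not a function from the patch.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not part of the patch): encode a queue size the way
 * the queue_size_encoded comment describes, i.e. log2(size) - 3.
 * Returns -1 if size is not a power of two or is smaller than 8.
 */
static int encode_queue_size(uint32_t size)
{
	int bits = 0;

	if (size < 8 || (size & (size - 1)) != 0)
		return -1;

	/* Find log2(size) by counting shifts; size is a power of two here. */
	while ((UINT32_C(1) << bits) < size)
		bits++;

	return bits - 3;
}
```

Under these assumptions a size of 4096 encodes to 9 and a size of 8 encodes to 0, both of which fit in the 6-bit QUEUE_SIZE field.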
>> +
>> +static uint32_t lower_32(uint64_t x)
>> +{
>> +	return (uint32_t)x;
>> +}
>> +
>> +static uint32_t upper_32(uint64_t x)
>> +{
>> +	return (uint32_t)(x >> 32);
>> +}
>> +
>> +/* SRBM_GFX_CNTL provides the MEC/pipe/queue and vmid for many registers that are instanced.
>> + * In particular, CP_HQD_* and CP_MQD_* are instanced for each queue. CP_HPD_* are instanced for each pipe.
>> + * SH_MEM_* are instanced per-VMID.
>> + *
>> + * We provide queue_select, pipe_select and vmid_select helpers that should be used before accessing
>> + * registers from those groups. Note that these overwrite each other, e.g. after vmid_select the current
>> + * selected MEC/pipe/queue is undefined.
>> + *
>> + * SRBM_GFX_CNTL and the registers it indexes are shared with KGD. You must be holding the srbm_gfx_cntl
>> + * lock via lock_srbm_index before setting SRBM_GFX_CNTL or accessing any of the instanced registers.
>> + */
>> +static uint32_t make_srbm_gfx_cntl_mpqv(unsigned int me, unsigned int pipe, unsigned int queue, unsigned int vmid)
>> +{
>> +	return QUEUEID(queue) | VMID(vmid) | MEID(me) | PIPEID(pipe);
>> +}
>> +
>> +static void pipe_select(struct cik_static_private *priv, unsigned int pipe)
>> +{
>> +	unsigned int pipe_in_mec = (pipe + priv->first_pipe) % CIK_PIPES_PER_MEC;
>> +	unsigned int mec = (pipe + priv->first_pipe) / CIK_PIPES_PER_MEC;
>> +
>> +	WRITE_REG(priv->dev, SRBM_GFX_CNTL, make_srbm_gfx_cntl_mpqv(mec+1, pipe_in_mec, 0, 0));
>> +}
>> +
>> +static void queue_select(struct cik_static_private *priv, unsigned int queue)
>> +{
>> +	unsigned int queue_in_pipe = queue % CIK_QUEUES_PER_PIPE;
>> +	unsigned int pipe = queue / CIK_QUEUES_PER_PIPE + priv->first_pipe;
>> +	unsigned int pipe_in_mec = pipe % CIK_PIPES_PER_MEC;
>> +	unsigned int mec = pipe / CIK_PIPES_PER_MEC;
>> +
>> +#if 0
>> +	dev_err(radeon_kfd_chardev(), "queue select %d = %u/%u/%u = 0x%08x\n", queue, mec+1, pipe_in_mec, queue_in_pipe,
>> +		make_srbm_gfx_cntl_mpqv(mec+1, pipe_in_mec, queue_in_pipe, 0));
>> +#endif
>> +
>> +	WRITE_REG(priv->dev, SRBM_GFX_CNTL, make_srbm_gfx_cntl_mpqv(mec+1, pipe_in_mec, queue_in_pipe, 0));
>> +}
>> +
>> +static void vmid_select(struct cik_static_private *priv, unsigned int vmid)
>> +{
>> +	WRITE_REG(priv->dev, SRBM_GFX_CNTL, make_srbm_gfx_cntl_mpqv(0, 0, 0, vmid));
>> +}
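To make the index arithmetic in queue_select() easier to follow, here is a userspace sketch that computes the SRBM_GFX_CNTL value without touching hardware. The field shifts and constants are copied from the quoted patch; srbm_value_for_queue() is a hypothetical name, and first_pipe = 1 in the example values is only an assumption (radeon keeping one pipe for itself).

```c
#include <stdint.h>

/* Field shifts as defined in cik_regs.h in the quoted patch. */
#define PIPEID(x)  ((uint32_t)(x) << 0)
#define MEID(x)    ((uint32_t)(x) << 2)
#define VMID(x)    ((uint32_t)(x) << 4)
#define QUEUEID(x) ((uint32_t)(x) << 8)

#define CIK_QUEUES_PER_PIPE 8
#define CIK_PIPES_PER_MEC   4

/*
 * Hypothetical helper: same decomposition as queue_select(), but it returns
 * the register value instead of writing it.
 * "queue" is relative to first_pipe, as in the patch.
 */
static uint32_t srbm_value_for_queue(unsigned int first_pipe, unsigned int queue)
{
	unsigned int queue_in_pipe = queue % CIK_QUEUES_PER_PIPE;
	unsigned int pipe = queue / CIK_QUEUES_PER_PIPE + first_pipe;
	unsigned int pipe_in_mec = pipe % CIK_PIPES_PER_MEC;
	unsigned int mec = pipe / CIK_PIPES_PER_MEC;

	/* ME 0 is the GFX ME, so the first compute MEC is ME 1. */
	return QUEUEID(queue_in_pipe) | VMID(0) | MEID(mec + 1) | PIPEID(pipe_in_mec);
}
```

With first_pipe = 1, KFD queue 0 maps to ME 1 / pipe 1 / queue 0 (value 0x5), and KFD queue 25 maps to ME 2 / pipe 0 / queue 1 (value 0x108).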
>> +
>> +static void lock_srbm_index(struct cik_static_private *priv)
>> +{
>> +	radeon_kfd_lock_srbm_index(priv->dev);
>> +}
>> +
>> +static void unlock_srbm_index(struct cik_static_private *priv)
>> +{
>> +	WRITE_REG(priv->dev, SRBM_GFX_CNTL, 0);	/* Be nice to KGD, reset indexed CP registers to the GFX pipe. */
>> +	radeon_kfd_unlock_srbm_index(priv->dev);
>> +}
>> +
>> +/* One-time setup for all compute pipes. They need to be programmed with the address & size of the HPD EOP buffer. */
>> +static void init_pipes(struct cik_static_private *priv)
>> +{
>> +	unsigned int i;
>> +
>> +	lock_srbm_index(priv);
>> +
>> +	for (i = 0; i < priv->num_pipes; i++) {
>> +		uint64_t pipe_hpd_addr = priv->hpd_addr + i * CIK_HPD_SIZE;
>> +
>> +		pipe_select(priv, i);
>> +
>> +		WRITE_REG(priv->dev, CP_HPD_EOP_BASE_ADDR, lower_32(pipe_hpd_addr >> 8));
>> +		WRITE_REG(priv->dev, CP_HPD_EOP_BASE_ADDR_HI, upper_32(pipe_hpd_addr >> 8));
>> +		WRITE_REG(priv->dev, CP_HPD_EOP_VMID, 0);
>> +		WRITE_REG(priv->dev, CP_HPD_EOP_CONTROL, CIK_HPD_SIZE_LOG2 - 1);
>> +	}
>> +
>> +	unlock_srbm_index(priv);
>> +}
>> +
>> +/* Program the VMID -> PASID mapping for one VMID.
>> + * PASID 0 is special: it means to associate no PASID with that VMID.
>> + * This function waits for the VMID/PASID mapping to complete.
>> + */
>> +static void set_vmid_pasid_mapping(struct cik_static_private *priv, unsigned int vmid, pasid_t pasid)
>> +{
>> +	/* We have to assume that there is no outstanding mapping.
>> +	 * The ATC_VMID_PASID_MAPPING_UPDATE_STATUS bit could be 0 because a mapping
>> +	 * is in progress or because a mapping finished and the SW cleared it.
>> +	 * So the protocol is to always wait & clear.
>> +	 */
>> +
>> +	uint32_t pasid_mapping = (pasid == 0) ? 0 : (uint32_t)pasid | ATC_VMID_PASID_MAPPING_VALID;
>> +
>> +	WRITE_REG(priv->dev, ATC_VMID0_PASID_MAPPING + vmid*sizeof(uint32_t), pasid_mapping);
>> +
>> +	while (!(READ_REG(priv->dev, ATC_VMID_PASID_MAPPING_UPDATE_STATUS) & (1U << vmid)))
>> +		cpu_relax();
>> +	WRITE_REG(priv->dev, ATC_VMID_PASID_MAPPING_UPDATE_STATUS, 1U << vmid);
>> +}
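The register offset and value written by set_vmid_pasid_mapping() can be checked in isolation with a sketch like the following. The two register constants come from cik_regs.h in the quoted patch; the helper names are hypothetical, and pasid_t is assumed to fit in a uint32_t because its definition is not part of this hunk.

```c
#include <stdint.h>

/* Constants as defined in cik_regs.h in the quoted patch. */
#define ATC_VMID0_PASID_MAPPING      0x339Cu
#define ATC_VMID_PASID_MAPPING_VALID (1U << 31)

/* Hypothetical helper: byte offset of the mapping register for one VMID. */
static uint32_t pasid_mapping_reg(unsigned int vmid)
{
	return ATC_VMID0_PASID_MAPPING + vmid * (uint32_t)sizeof(uint32_t);
}

/*
 * Hypothetical helper: value to write. PASID 0 means "no PASID", so the
 * VALID bit is left clear in that case.
 */
static uint32_t pasid_mapping_value(uint32_t pasid)
{
	return pasid == 0 ? 0 : (pasid | ATC_VMID_PASID_MAPPING_VALID);
}
```

For VMID 8 this gives offset 0x33BC, and PASID 5 is written as 0x80000005.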
>> +
>> +static uint32_t compute_sh_mem_bases_64bit(unsigned int top_address_nybble)
>> +{
>> +	/* In 64-bit mode, we can only control the top 3 bits of the LDS, scratch and GPUVM apertures.
>> +	 * The hardware fills in the remaining 61 bits according to the following pattern:
>> +	 * LDS:		X0000000'00000000 - X0000001'00000000 (4GB)
>> +	 * Scratch:	X0000001'00000000 - X0000002'00000000 (4GB)
>> +	 * GPUVM:	Y0010000'00000000 - Y0020000'00000000 (1TB)
>> +	 *
>> +	 * (where X/Y is the configurable nybble with the low-bit 0)
>> +	 *
>> +	 * LDS and scratch will have the same top nybble programmed in the top 3 bits of SH_MEM_BASES.PRIVATE_BASE.
>> +	 * GPUVM can have a different top nybble programmed in the top 3 bits of SH_MEM_BASES.SHARED_BASE.
>> +	 * We don't bother to support different top nybbles for LDS/Scratch and GPUVM.
>> +	 */
>> +
>> +	BUG_ON((top_address_nybble & 1) || top_address_nybble > 0xE);
>> +
>> +	return PRIVATE_BASE(top_address_nybble << 12) | SHARED_BASE(top_address_nybble << 12);
>> +}
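As a worked instance of compute_sh_mem_bases_64bit(), the sketch below reproduces the same bit packing in userspace. PRIVATE_BASE and SHARED_BASE are copied from the quoted patch; sh_mem_bases_64bit() is a hypothetical name, and assert() stands in for BUG_ON() only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Field shifts as defined in cik_regs.h in the quoted patch. */
#define PRIVATE_BASE(x) ((uint32_t)(x) << 0)
#define SHARED_BASE(x)  ((uint32_t)(x) << 16)

/*
 * Hypothetical userspace version: the nybble must be even and at most 0xE,
 * and it lands in the top four bits of each 16-bit base field.
 */
static uint32_t sh_mem_bases_64bit(unsigned int top_address_nybble)
{
	assert(!(top_address_nybble & 1) && top_address_nybble <= 0xE);

	return PRIVATE_BASE(top_address_nybble << 12) |
	       SHARED_BASE(top_address_nybble << 12);
}
```

For the nybble 6 used by init_ats(), both 16-bit fields become 0x6000, so the register value is 0x60006000.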
>> +
>> +/* Initial programming for all ATS registers.
>> + * - enable ATS for all compute VMIDs
>> + * - clear the VMID/PASID mapping for all compute VMIDs
>> + * - program the shader core flat address settings:
>> + * -- 64-bit mode
>> + * -- unaligned access allowed
>> + * -- noncached (this is the only CPU-coherent mode in CIK)
>> + * -- APE 1 disabled
>> + */
>> +static void init_ats(struct cik_static_private *priv)
>> +{
>> +	unsigned int i;
>> +
>> +	/* Enable self-ringing doorbell recognition and direct the BIF to send
>> +	 * untranslated writes to the IOMMU before comparing to the aperture.*/
>> +	WRITE_REG(priv->dev, BIF_DOORBELL_CNTL, 0);
>> +
>> +	WRITE_REG(priv->dev, ATC_VM_APERTURE0_CNTL, ATS_ACCESS_MODE_ALWAYS);
>> +	WRITE_REG(priv->dev, ATC_VM_APERTURE0_CNTL2, priv->free_vmid_mask);
>> +	WRITE_REG(priv->dev, ATC_VM_APERTURE0_LOW_ADDR, 0);
>> +	WRITE_REG(priv->dev, ATC_VM_APERTURE0_HIGH_ADDR, 0);
>> +
>> +	WRITE_REG(priv->dev, ATC_VM_APERTURE1_CNTL, 0);
>> +	WRITE_REG(priv->dev, ATC_VM_APERTURE1_CNTL2, 0);
>> +	WRITE_REG(priv->dev, ATC_VM_APERTURE1_LOW_ADDR, 0);
>> +	WRITE_REG(priv->dev, ATC_VM_APERTURE1_HIGH_ADDR, 0);
>> +
>> +	lock_srbm_index(priv);
>> +
>> +	for (i = 0; i < CIK_NUM_VMID; i++) {
>> +		if (priv->free_vmid_mask & (1U << i)) {
>> +			uint32_t sh_mem_config;
>> +
>> +			set_vmid_pasid_mapping(priv, i, 0);
>> +
>> +			vmid_select(priv, i);
>> +
>> +			sh_mem_config = ALIGNMENT_MODE(SH_MEM_ALIGNMENT_MODE_UNALIGNED);
>> +			sh_mem_config |= DEFAULT_MTYPE(MTYPE_NONCACHED);
>> +
>> +			WRITE_REG(priv->dev, SH_MEM_CONFIG, sh_mem_config);
>> +
>> +			/* Configure apertures:
>> +			 * LDS:		0x60000000'00000000 - 0x60000001'00000000 (4GB)
>> +			 * Scratch:	0x60000001'00000000 - 0x60000002'00000000 (4GB)
>> +			 * GPUVM:	0x60010000'00000000 - 0x60020000'00000000 (1TB)
>> +			 */
>> +			WRITE_REG(priv->dev, SH_MEM_BASES, compute_sh_mem_bases_64bit(6));
>> +
>> +			/* Scratch aperture is not supported for now. */
>> +			WRITE_REG(priv->dev, SH_STATIC_MEM_CONFIG, 0);
>> +
>> +			/* APE1 disabled for now. */
>> +			WRITE_REG(priv->dev, SH_MEM_APE1_BASE, 1);
>> +			WRITE_REG(priv->dev, SH_MEM_APE1_LIMIT, 0);
>> +		}
>> +	}
>> +
>> +	unlock_srbm_index(priv);
>> +}
>> +
>> +static void exit_ats(struct cik_static_private *priv)
>> +{
>> +	unsigned int i;
>> +
>> +	for (i = 0; i < CIK_NUM_VMID; i++)
>> +		if (priv->free_vmid_mask & (1U << i))
>> +			set_vmid_pasid_mapping(priv, i, 0);
>> +
>> +	WRITE_REG(priv->dev, ATC_VM_APERTURE0_CNTL, ATS_ACCESS_MODE_NEVER);
>> +	WRITE_REG(priv->dev, ATC_VM_APERTURE0_CNTL2, 0);
>> +}
>> +
>> +static struct cik_static_private *kfd_scheduler_to_private(struct kfd_scheduler *scheduler)
>> +{
>> +	return (struct cik_static_private *)scheduler;
>> +}
>> +
>> +static struct cik_static_process *kfd_process_to_private(struct kfd_scheduler_process *process)
>> +{
>> +	return (struct cik_static_process *)process;
>> +}
>> +
>> +static struct cik_static_queue *kfd_queue_to_private(struct kfd_scheduler_queue *queue)
>> +{
>> +	return (struct cik_static_queue *)queue;
>> +}
>> +
>> +static int cik_static_create(struct kfd_dev *dev, struct kfd_scheduler **scheduler)
>> +{
>> +	struct cik_static_private *priv;
>> +	unsigned int i;
>> +	int err;
>> +	void *hpdptr;
>> +
>> +	priv = kmalloc(sizeof(*priv), GFP_KERNEL);
>> +	if (priv == NULL)
>> +		return -ENOMEM;
>> +
>> +	mutex_init(&priv->mutex);
>> +
>> +	priv->dev = dev;
>> +
>> +	priv->first_pipe = dev->shared_resources.first_compute_pipe;
>> +	priv->num_pipes = dev->shared_resources.compute_pipe_count;
>> +
>> +	for (i = 0; i < priv->num_pipes * CIK_QUEUES_PER_PIPE; i++)
>> +		__set_bit(i, priv->free_queues);
>> +
>> +	priv->free_vmid_mask = dev->shared_resources.compute_vmid_bitmap;
>> +
>> +	/*
>> +	 * Allocate memory for the HPDs. This is hardware-owned per-pipe data.
>> +	 * The driver never accesses this memory after zeroing it. It doesn't even have
>> +	 * to be saved/restored on suspend/resume because it contains no data when there
>> +	 * are no active queues.
>> +	 */
>> +	err = radeon_kfd_vidmem_alloc(dev,
>> +				      CIK_HPD_SIZE * priv->num_pipes * 2,
>> +				      PAGE_SIZE,
>> +				      KFD_MEMPOOL_SYSTEM_WRITECOMBINE,
>> +				      &priv->hpd_mem);
>> +	if (err)
>> +		goto err_hpd_alloc;
>> +
>> +	err = radeon_kfd_vidmem_kmap(dev, priv->hpd_mem, &hpdptr);
>> +	if (err)
>> +		goto err_hpd_kmap;
>> +	memset(hpdptr, 0, CIK_HPD_SIZE * priv->num_pipes);
>> +	radeon_kfd_vidmem_unkmap(dev, priv->hpd_mem);
>> +
>> +	/*
>> +	 * Allocate memory for all the MQDs.
>> +	 * These are per-queue data that is hardware owned but with driver init.
>> +	 * The driver has to copy this data into HQD registers when a
>> +	 * pipe is (re)activated.
>> +	 */
>> +	err = radeon_kfd_vidmem_alloc(dev,
>> +				      sizeof(struct cik_mqd_padded) * priv->num_pipes * CIK_QUEUES_PER_PIPE,
>> +				      PAGE_SIZE,
>> +				      KFD_MEMPOOL_SYSTEM_CACHEABLE,
>> +				      &priv->mqd_mem);
>> +	if (err)
>> +		goto err_mqd_alloc;
>> +	err = radeon_kfd_vidmem_kmap(dev, priv->mqd_mem, (void **)&priv->mqds);
>> +	if (err)
>> +		goto err_mqd_kmap;
>> +
>> +	*scheduler = (struct kfd_scheduler *)priv;
>> +
>> +	return 0;
>> +
>> +err_mqd_kmap:
>> +	radeon_kfd_vidmem_free(dev, priv->mqd_mem);
>> +err_mqd_alloc:
>> +err_hpd_kmap:
>> +	radeon_kfd_vidmem_free(dev, priv->hpd_mem);
>> +err_hpd_alloc:
>> +	mutex_destroy(&priv->mutex);
>> +	kfree(priv);
>> +	return err;
>> +}
>> +
>> +static void cik_static_destroy(struct kfd_scheduler *scheduler)
>> +{
>> +	struct cik_static_private *priv = kfd_scheduler_to_private(scheduler);
>> +
>> +	radeon_kfd_vidmem_unkmap(priv->dev, priv->mqd_mem);
>> +	radeon_kfd_vidmem_free(priv->dev, priv->mqd_mem);
>> +	radeon_kfd_vidmem_free(priv->dev, priv->hpd_mem);
>> +
>> +	mutex_destroy(&priv->mutex);
>> +
>> +	kfree(priv);
>> +}
>> +
>> +static void cik_static_start(struct kfd_scheduler *scheduler)
>> +{
>> +	struct cik_static_private *priv = kfd_scheduler_to_private(scheduler);
>> +
>> +	radeon_kfd_vidmem_gpumap(priv->dev, priv->hpd_mem, &priv->hpd_addr);
>> +	radeon_kfd_vidmem_gpumap(priv->dev, priv->mqd_mem, &priv->mqd_addr);
>> +
>> +	init_pipes(priv);
>> +	init_ats(priv);
>> +}
>> +
>> +static void cik_static_stop(struct kfd_scheduler *scheduler)
>> +{
>> +	struct cik_static_private *priv = kfd_scheduler_to_private(scheduler);
>> +
>> +	exit_ats(priv);
>> +
>> +	radeon_kfd_vidmem_ungpumap(priv->dev, priv->hpd_mem);
>> +	radeon_kfd_vidmem_ungpumap(priv->dev, priv->mqd_mem);
>> +}
>> +
>> +static bool allocate_vmid(struct cik_static_private *priv, unsigned int *vmid)
>> +{
>> +	bool ok = false;
>> +
>> +	mutex_lock(&priv->mutex);
>> +
>> +	if (priv->free_vmid_mask != 0) {
>> +		unsigned int v = __ffs64(priv->free_vmid_mask);
>> +
>> +		clear_bit(v, &priv->free_vmid_mask);
>> +		*vmid = v;
>> +
>> +		ok = true;
>> +	}
>> +
>> +	mutex_unlock(&priv->mutex);
>> +
>> +	return ok;
>> +}
>> +
>> +static void release_vmid(struct cik_static_private *priv, unsigned int vmid)
>> +{
>> +	/* It's okay to race against allocate_vmid because this only adds bits to free_vmid_mask.
>> +	 * And set_bit/clear_bit are atomic wrt each other. */
>> +	set_bit(vmid, &priv->free_vmid_mask);
>> +}
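The allocate_vmid()/release_vmid() pair is a lowest-free-bit allocator over a mask. A single-threaded userspace sketch of that idea follows; it deliberately omits the mutex and the atomic set_bit/clear_bit used in the patch, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical single-threaded sketch: take the lowest set bit from
 * *free_mask. Returns false if no bit is free.
 */
static bool alloc_lowest_bit(uint32_t *free_mask, unsigned int *id)
{
	unsigned int i;

	for (i = 0; i < 32; i++) {
		if (*free_mask & (UINT32_C(1) << i)) {
			*free_mask &= ~(UINT32_C(1) << i);
			*id = i;
			return true;
		}
	}

	return false;
}

/* Hypothetical counterpart: hand an ID back to the pool. */
static void release_bit(uint32_t *free_mask, unsigned int id)
{
	*free_mask |= UINT32_C(1) << id;
}
```

Starting from a mask of 0xFF00 (VMIDs 8-15, matching the split described in patch 02/83), the first two allocations return 8 and 9, and releasing 8 makes it the next ID handed out.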
>> +
>> +static void setup_vmid_for_process(struct cik_static_private *priv, struct cik_static_process *p)
>> +{
>> +	set_vmid_pasid_mapping(priv, p->vmid, p->pasid);
>> +
>> +	/*
>> +	 * SH_MEM_CONFIG and others need to be programmed differently
>> +	 * for 32/64-bit processes. And maybe other reasons.
>> +	 */
>> +}
>> +
>> +static int
>> +cik_static_register_process(struct kfd_scheduler *scheduler, struct kfd_process *process,
>> +			    struct kfd_scheduler_process **scheduler_process)
>> +{
>> +	struct cik_static_private *priv = kfd_scheduler_to_private(scheduler);
>> +
>> +	struct cik_static_process *hwp;
>> +
>> +	hwp = kmalloc(sizeof(*hwp), GFP_KERNEL);
>> +	if (hwp == NULL)
>> +		return -ENOMEM;
>> +
>> +	if (!allocate_vmid(priv, &hwp->vmid)) {
>> +		kfree(hwp);
>> +		return -ENOMEM;
>> +	}
>> +
>> +	hwp->pasid = process->pasid;
>> +
>> +	setup_vmid_for_process(priv, hwp);
>> +
>> +	*scheduler_process = (struct kfd_scheduler_process *)hwp;
>> +
>> +	return 0;
>> +}
>> +
>> +static void cik_static_deregister_process(struct kfd_scheduler *scheduler,
>> +				struct kfd_scheduler_process *scheduler_process)
>> +{
>> +	struct cik_static_private *priv = kfd_scheduler_to_private(scheduler);
>> +	struct cik_static_process *pp = kfd_process_to_private(scheduler_process);
>> +
>> +	release_vmid(priv, pp->vmid);
>> +	kfree(pp);
>> +}
>> +
>> +static bool allocate_hqd(struct cik_static_private *priv, unsigned int *queue)
>> +{
>> +	bool ok = false;
>> +	unsigned int q;
>> +
>> +	mutex_lock(&priv->mutex);
>> +
>> +	q = find_first_bit(priv->free_queues, priv->num_pipes * CIK_QUEUES_PER_PIPE);
>> +
>> +	if (q != priv->num_pipes * CIK_QUEUES_PER_PIPE) {
>> +		clear_bit(q, priv->free_queues);
>> +		*queue = q;
>> +
>> +		ok = true;
>> +	}
>> +
>> +	mutex_unlock(&priv->mutex);
>> +
>> +	return ok;
>> +}
>> +
>> +static void release_hqd(struct cik_static_private *priv, unsigned int queue)
>> +{
>> +	/* It's okay to race against allocate_hqd because this only adds bits to free_queues.
>> +	 * And set_bit/clear_bit are atomic wrt each other. */
>> +	set_bit(queue, priv->free_queues);
>> +}
>> +
>> +static void init_mqd(const struct cik_static_queue *queue, const struct cik_static_process *process)
>> +{
>> +	struct cik_mqd *mqd = queue->mqd;
>> +
>> +	memset(mqd, 0, sizeof(*mqd));
>> +
>> +	mqd->header = 0xC0310800;
>> +	mqd->pipeline_stat_enable = 1;
>> +	mqd->static_thread_mgmt01[0] = 0xffffffff;
>> +	mqd->static_thread_mgmt01[1] = 0xffffffff;
>> +	mqd->static_thread_mgmt23[0] = 0xffffffff;
>> +	mqd->static_thread_mgmt23[1] = 0xffffffff;
>> +
>> +	mqd->queue_state.cp_mqd_base_addr = lower_32(queue->mqd_addr);
>> +	mqd->queue_state.cp_mqd_base_addr_hi = upper_32(queue->mqd_addr);
>> +	mqd->queue_state.cp_mqd_control = MQD_CONTROL_PRIV_STATE_EN;
>> +
>> +	mqd->queue_state.cp_hqd_pq_base = lower_32((uintptr_t)queue->pq_addr >> 8);
>> +	mqd->queue_state.cp_hqd_pq_base_hi = upper_32((uintptr_t)queue->pq_addr >> 8);
>> +	mqd->queue_state.cp_hqd_pq_control = QUEUE_SIZE(queue->queue_size_encoded) | DEFAULT_RPTR_BLOCK_SIZE
>> +					    | DEFAULT_MIN_AVAIL_SIZE | PQ_ATC_EN;
>> +	mqd->queue_state.cp_hqd_pq_rptr_report_addr = lower_32((uintptr_t)queue->rptr_address);
>> +	mqd->queue_state.cp_hqd_pq_rptr_report_addr_hi = upper_32((uintptr_t)queue->rptr_address);
>> +	mqd->queue_state.cp_hqd_pq_doorbell_control = DOORBELL_OFFSET(queue->doorbell_index) | DOORBELL_EN;
>> +	mqd->queue_state.cp_hqd_vmid = process->vmid;
>> +	mqd->queue_state.cp_hqd_active = 1;
>> +
>> +	mqd->queue_state.cp_hqd_persistent_state = DEFAULT_CP_HQD_PERSISTENT_STATE;
>> +
>> +	/* The values for these 3 are from WinKFD. */
>> +	mqd->queue_state.cp_hqd_quantum = QUANTUM_EN | QUANTUM_SCALE_1MS | QUANTUM_DURATION(10);
>> +	mqd->queue_state.cp_hqd_pipe_priority = 1;
>> +	mqd->queue_state.cp_hqd_queue_priority = 15;
>> +
>> +	mqd->queue_state.cp_hqd_ib_control = IB_ATC_EN | DEFAULT_MIN_IB_AVAIL_SIZE;
>> +}
>> +
>> +/* Write the HQD registers and activate the queue.
>> + * Requires that SRBM_GFX_CNTL has already been programmed for the queue.
>> + */
>> +static void load_hqd(struct cik_static_private *priv, struct cik_static_queue *queue)
>> +{
>> +	struct kfd_dev *dev = priv->dev;
>> +	const struct cik_hqd_registers *qs = &queue->mqd->queue_state;
>> +
>> +	WRITE_REG(dev, CP_MQD_BASE_ADDR, qs->cp_mqd_base_addr);
>> +	WRITE_REG(dev, CP_MQD_BASE_ADDR_HI, qs->cp_mqd_base_addr_hi);
>> +	WRITE_REG(dev, CP_MQD_CONTROL, qs->cp_mqd_control);
>> +
>> +	WRITE_REG(dev, CP_HQD_PQ_BASE, qs->cp_hqd_pq_base);
>> +	WRITE_REG(dev, CP_HQD_PQ_BASE_HI, qs->cp_hqd_pq_base_hi);
>> +	WRITE_REG(dev, CP_HQD_PQ_CONTROL, qs->cp_hqd_pq_control);
>> +	/* DOORBELL_CONTROL before WPTR because WPTR writes are dropped if DOORBELL_HIT is set. */
>> +	WRITE_REG(dev, CP_HQD_PQ_DOORBELL_CONTROL, qs->cp_hqd_pq_doorbell_control);
>> +	WRITE_REG(dev, CP_HQD_PQ_WPTR, qs->cp_hqd_pq_wptr);
>> +	WRITE_REG(dev, CP_HQD_PQ_RPTR, qs->cp_hqd_pq_rptr);
>> +	WRITE_REG(dev, CP_HQD_PQ_RPTR_REPORT_ADDR, qs->cp_hqd_pq_rptr_report_addr);
>> +	WRITE_REG(dev, CP_HQD_PQ_RPTR_REPORT_ADDR_HI, qs->cp_hqd_pq_rptr_report_addr_hi);
>> +
>> +	WRITE_REG(dev, CP_HQD_VMID, qs->cp_hqd_vmid);
>> +	WRITE_REG(dev, CP_HQD_PERSISTENT_STATE, qs->cp_hqd_persistent_state);
>> +	WRITE_REG(dev, CP_HQD_QUANTUM, qs->cp_hqd_quantum);
>> +	WRITE_REG(dev, CP_HQD_PIPE_PRIORITY, qs->cp_hqd_pipe_priority);
>> +	WRITE_REG(dev, CP_HQD_QUEUE_PRIORITY, qs->cp_hqd_queue_priority);
>> +
>> +	WRITE_REG(dev, CP_HQD_IB_CONTROL, qs->cp_hqd_ib_control);
>> +	WRITE_REG(dev, CP_HQD_IB_BASE_ADDR, qs->cp_hqd_ib_base_addr);
>> +	WRITE_REG(dev, CP_HQD_IB_BASE_ADDR_HI, qs->cp_hqd_ib_base_addr_hi);
>> +	WRITE_REG(dev, CP_HQD_IB_RPTR, qs->cp_hqd_ib_rptr);
>> +	WRITE_REG(dev, CP_HQD_SEMA_CMD, qs->cp_hqd_sema_cmd);
>> +	WRITE_REG(dev, CP_HQD_MSG_TYPE, qs->cp_hqd_msg_type);
>> +	WRITE_REG(dev, CP_HQD_ATOMIC0_PREOP_LO, qs->cp_hqd_atomic0_preop_lo);
>> +	WRITE_REG(dev, CP_HQD_ATOMIC0_PREOP_HI, qs->cp_hqd_atomic0_preop_hi);
>> +	WRITE_REG(dev, CP_HQD_ATOMIC1_PREOP_LO, qs->cp_hqd_atomic1_preop_lo);
>> +	WRITE_REG(dev, CP_HQD_ATOMIC1_PREOP_HI, qs->cp_hqd_atomic1_preop_hi);
>> +	WRITE_REG(dev, CP_HQD_HQ_SCHEDULER0, qs->cp_hqd_hq_scheduler0);
>> +	WRITE_REG(dev, CP_HQD_HQ_SCHEDULER1, qs->cp_hqd_hq_scheduler1);
>> +
>> +	WRITE_REG(dev, CP_HQD_ACTIVE, 1);
>> +}
>> +
>> +static void activate_queue(struct cik_static_private *priv, struct cik_static_queue *queue)
>> +{
>> +	bool wptr_shadow_valid;
>> +	doorbell_t wptr_shadow;
>> +
>> +	/* Avoid sleeping while holding the SRBM lock. */
>> +	wptr_shadow_valid = !get_user(wptr_shadow, queue->wptr_address);
>> +
>> +	lock_srbm_index(priv);
>> +	queue_select(priv, queue->queue);
>> +
>> +	load_hqd(priv, queue);
>> +
>> +	/* Doorbell and wptr are special because there is a race when reactivating a queue.
>> +	 * Since doorbell writes to deactivated queues are ignored by hardware, the application
>> +	 * shadows the doorbell into memory at queue->wptr_address.
>> +	 *
>> +	 * We want the queue to automatically resume processing as if it were always active,
>> +	 * so we want to copy from queue->wptr_address into the wptr/doorbell.
>> +	 *
>> +	 * The race is that the app could write a new wptr into the doorbell before we
>> +	 * write the shadowed wptr, resulting in an old wptr written later.
>> +	 *
>> +	 * The hardware solves this by ignoring CP_HQD_WPTR writes after a doorbell write.
>> +	 * So the KFD can activate the doorbell then write the shadow wptr to CP_HQD_WPTR
>> +	 * knowing it will be ignored if the user has written a more-recent doorbell.
>> +	 */
>> +	if (wptr_shadow_valid)
>> +		WRITE_REG(priv->dev, CP_HQD_PQ_WPTR, wptr_shadow);
>> +
>> +	unlock_srbm_index(priv);
>> +}
>> +
>> +static void drain_hqd(struct cik_static_private *priv)
>> +{
>> +	WRITE_REG(priv->dev, CP_HQD_DEQUEUE_REQUEST, DEQUEUE_REQUEST_DRAIN);
>> +}
>> +
>> +static void wait_hqd_inactive(struct cik_static_private *priv)
>> +{
>> +	while (READ_REG(priv->dev, CP_HQD_ACTIVE) != 0)
>> +		cpu_relax();
>> +}
>> +
>> +static void deactivate_queue(struct cik_static_private *priv, struct cik_static_queue *queue)
>> +{
>> +	lock_srbm_index(priv);
>> +	queue_select(priv, queue->queue);
>> +
>> +	drain_hqd(priv);
>> +	wait_hqd_inactive(priv);
>> +
>> +	unlock_srbm_index(priv);
>> +}
>> +
>> +#define BIT_MASK_64(high, low) (((1ULL << (high)) - 1) & ~((1ULL << (low)) - 1))
>> +#define RING_ADDRESS_BAD_BIT_MASK (~BIT_MASK_64(48, 8))
>> +#define RWPTR_ADDRESS_BAD_BIT_MASK (~BIT_MASK_64(48, 2))
>> +
>> +#define MAX_QUEUE_SIZE (1ULL << 32)
>> +#define MIN_QUEUE_SIZE (1ULL << 10)
>> +
>> +static int
>> +cik_static_create_queue(struct kfd_scheduler *scheduler,
>> +			struct kfd_scheduler_process *process,
>> +			struct kfd_scheduler_queue *queue,
>> +			void __user *ring_address,
>> +			uint64_t ring_size,
>> +			void __user *rptr_address,
>> +			void __user *wptr_address,
>> +			unsigned int doorbell)
>> +{
>> +	struct cik_static_private *priv = kfd_scheduler_to_private(scheduler);
>> +	struct cik_static_process *hwp = kfd_process_to_private(process);
>> +	struct cik_static_queue *hwq = kfd_queue_to_private(queue);
>> +
>> +	if ((uint64_t)ring_address & RING_ADDRESS_BAD_BIT_MASK
>> +	    || (uint64_t)rptr_address & RWPTR_ADDRESS_BAD_BIT_MASK
>> +	    || (uint64_t)wptr_address & RWPTR_ADDRESS_BAD_BIT_MASK)
>> +		return -EINVAL;
>> +
>> +	if (ring_size > MAX_QUEUE_SIZE || ring_size < MIN_QUEUE_SIZE || !is_power_of_2(ring_size))
>> +		return -EINVAL;
>> +
>> +	if (!allocate_hqd(priv, &hwq->queue))
>> +		return -ENOMEM;
>> +
>> +	hwq->mqd_addr = priv->mqd_addr + sizeof(struct cik_mqd_padded) * hwq->queue;
>> +	hwq->mqd = &priv->mqds[hwq->queue].mqd;
>> +	hwq->pq_addr = ring_address;
>> +	hwq->rptr_address = rptr_address;
>> +	hwq->wptr_address = wptr_address;
>> +	hwq->doorbell_index = doorbell;
>> +	hwq->queue_size_encoded = ilog2(ring_size) - 3;
>> +
>> +	init_mqd(hwq, hwp);
>> +	activate_queue(priv, hwq);
>> +
>> +	return 0;
>> +}
>> +
>> +static void
>> +cik_static_destroy_queue(struct kfd_scheduler *scheduler, struct kfd_scheduler_queue *queue)
>> +{
>> +	struct cik_static_private *priv = kfd_scheduler_to_private(scheduler);
>> +	struct cik_static_queue *hwq = kfd_queue_to_private(queue);
>> +
>> +	deactivate_queue(priv, hwq);
>> +
>> +	release_hqd(priv, hwq->queue);
>> +}
>> +
>> +const struct kfd_scheduler_class radeon_kfd_cik_static_scheduler_class = {
>> +	.name = "CIK static scheduler",
>> +	.create = cik_static_create,
>> +	.destroy = cik_static_destroy,
>> +	.start = cik_static_start,
>> +	.stop = cik_static_stop,
>> +	.register_process = cik_static_register_process,
>> +	.deregister_process = cik_static_deregister_process,
>> +	.queue_size = sizeof(struct cik_static_queue),
>> +	.create_queue = cik_static_create_queue,
>> +	.destroy_queue = cik_static_destroy_queue,
>> +};
>> diff --git a/drivers/gpu/hsa/radeon/kfd_vidmem.c b/drivers/gpu/hsa/radeon/kfd_vidmem.c
>> new file mode 100644
>> index 0000000..c8d3770
>> --- /dev/null
>> +++ b/drivers/gpu/hsa/radeon/kfd_vidmem.c
>> @@ -0,0 +1,61 @@
>> +/*
>> + * Copyright 2014 Advanced Micro Devices, Inc.
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a
>> + * copy of this software and associated documentation files (the "Software"),
>> + * to deal in the Software without restriction, including without limitation
>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>> + * and/or sell copies of the Software, and to permit persons to whom the
>> + * Software is furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
>> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
>> + * OTHER DEALINGS IN THE SOFTWARE.
>> + */
>> +
>> +#include "kfd_priv.h"
>> +
>> +int radeon_kfd_vidmem_alloc(struct kfd_dev *kfd, size_t size, size_t alignment,
>> +				enum kfd_mempool pool, kfd_mem_obj *mem_obj)
>> +{
>> +	return kfd2kgd->allocate_mem(kfd->kgd,
>> +					size,
>> +					alignment,
>> +					(enum kgd_memory_pool)pool,
>> +					(struct kgd_mem **)mem_obj);
>> +}
>> +
>> +void radeon_kfd_vidmem_free(struct kfd_dev *kfd, kfd_mem_obj mem_obj)
>> +{
>> +	kfd2kgd->free_mem(kfd->kgd, (struct kgd_mem *)mem_obj);
>> +}
>> +
>> +int radeon_kfd_vidmem_gpumap(struct kfd_dev *kfd, kfd_mem_obj mem_obj,
>> +				uint64_t *vmid0_address)
>> +{
>> +	return kfd2kgd->gpumap_mem(kfd->kgd,
>> +					(struct kgd_mem *)mem_obj,
>> +					vmid0_address);
>> +}
>> +
>> +void radeon_kfd_vidmem_ungpumap(struct kfd_dev *kfd, kfd_mem_obj mem_obj)
>> +{
>> +	kfd2kgd->ungpumap_mem(kfd->kgd, (struct kgd_mem *)mem_obj);
>> +}
>> +
>> +int radeon_kfd_vidmem_kmap(struct kfd_dev *kfd, kfd_mem_obj mem_obj, void **ptr)
>> +{
>> +	return kfd2kgd->kmap_mem(kfd->kgd, (struct kgd_mem *)mem_obj, ptr);
>> +}
>> +
>> +void radeon_kfd_vidmem_unkmap(struct kfd_dev *kfd, kfd_mem_obj mem_obj)
>> +{
>> +	kfd2kgd->unkmap_mem(kfd->kgd, (struct kgd_mem *)mem_obj);
>> +}
>> --
>> 1.9.1
>>
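
For reference, the user-supplied queue parameter checks in
cik_static_create_queue() can be tried out in user space. The sketch
below copies the mask and size macros from the quoted patch; the helper
names (is_pow2, ilog2_u64, ring_params_ok, encode_queue_size) are
hypothetical stand-ins for the kernel's is_power_of_2() and ilog2(),
so this illustrates the arithmetic only and is not the driver code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Copied from the quoted patch: bits [low, high) set. */
#define BIT_MASK_64(high, low) (((1ULL << (high)) - 1) & ~((1ULL << (low)) - 1))
#define RING_ADDRESS_BAD_BIT_MASK  (~BIT_MASK_64(48, 8))
#define RWPTR_ADDRESS_BAD_BIT_MASK (~BIT_MASK_64(48, 2))

#define MAX_QUEUE_SIZE (1ULL << 32)
#define MIN_QUEUE_SIZE (1ULL << 10)

/* Hypothetical user-space stand-in for the kernel's is_power_of_2(). */
static bool is_pow2(uint64_t v)
{
	return v != 0 && (v & (v - 1)) == 0;
}

/* Hypothetical user-space stand-in for the kernel's ilog2(). */
static unsigned int ilog2_u64(uint64_t v)
{
	unsigned int r = 0;

	while (v >>= 1)
		r++;
	return r;
}

/*
 * Mirrors the two -EINVAL checks in the patch: the ring base must be
 * 256-byte aligned, the rptr/wptr addresses 4-byte aligned, all three
 * must fit in 48 bits, and the ring size must be a power of two in
 * [MIN_QUEUE_SIZE, MAX_QUEUE_SIZE].
 */
static bool ring_params_ok(uint64_t ring_address, uint64_t ring_size,
			   uint64_t rptr_address, uint64_t wptr_address)
{
	if ((ring_address & RING_ADDRESS_BAD_BIT_MASK) ||
	    (rptr_address & RWPTR_ADDRESS_BAD_BIT_MASK) ||
	    (wptr_address & RWPTR_ADDRESS_BAD_BIT_MASK))
		return false;

	if (ring_size > MAX_QUEUE_SIZE || ring_size < MIN_QUEUE_SIZE ||
	    !is_pow2(ring_size))
		return false;

	return true;
}

/* Same expression as the patch: queue_size_encoded = ilog2(ring_size) - 3. */
static unsigned int encode_queue_size(uint64_t ring_size)
{
	return ilog2_u64(ring_size) - 3;
}
```

With these definitions a 4096-byte ring at 0x100000 passes the checks
and is encoded as 9, while a ring base of 0x100080 is rejected because
bit 7 falls inside the bad-bit mask.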

^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 08/83] drm/radeon: Add calls to initialize and finalize kfd from radeon
  2014-07-17 11:57       ` Oded Gabbay
@ 2014-07-17 12:29         ` Christian König
  -1 siblings, 0 replies; 116+ messages in thread
From: Christian König @ 2014-07-17 12:29 UTC (permalink / raw)
  To: Oded Gabbay, Jerome Glisse, Oded Gabbay
  Cc: David Airlie, Alex Deucher, linux-kernel, dri-devel,
	John Bridgman, Andrew Lewycky, Joerg Roedel

Am 17.07.2014 13:57, schrieb Oded Gabbay:
> On 11/07/14 19:36, Jerome Glisse wrote:
>> On Fri, Jul 11, 2014 at 12:50:08AM +0300, Oded Gabbay wrote:
>>> The KFD driver should be loaded when the radeon driver is loaded and
>>> should be finalized when the radeon driver is removed.
>>>
>>> This patch adds a function call to initialize kfd from radeon_init
>>> and a function call to finalize kfd from radeon_exit.
>>>
>>> If the KFD driver is not present in the system, the initialize call
>>> fails and the radeon driver continues normally.
>>>
>>> This patch also adds calls to probe, initialize and finalize a kfd 
>>> device
>>> per radeon device using the kgd-->kfd interface.
>>>
>>> Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
>>
>> It might be nice to allow to build radeon without HSA so i think an
>> CONFIG_HSA should be added and have other thing depends on it.
>> Otherwise this one is.
>>
>> Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
>>
> We do allow it :)
> There is no problem building radeon without the kfd. In that case, 
> when radeon finds out that kfd is not available, it simply moves on 
> with its initialization procedure.

At least off hand I don't see how this should work. Radeon directly 
calls radeon_kfd_(probe|init|fini) and so has a direct dependency on it.

Christian.

>
>     Oded
>>
>>> ---
>>>   drivers/gpu/drm/radeon/radeon_drv.c | 6 ++++++
>>>   drivers/gpu/drm/radeon/radeon_kms.c | 9 +++++++++
>>>   2 files changed, 15 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/radeon/radeon_drv.c 
>>> b/drivers/gpu/drm/radeon/radeon_drv.c
>>> index cb14213..88a45a0 100644
>>> --- a/drivers/gpu/drm/radeon/radeon_drv.c
>>> +++ b/drivers/gpu/drm/radeon/radeon_drv.c
>>> @@ -151,6 +151,9 @@ static inline void 
>>> radeon_register_atpx_handler(void) {}
>>>   static inline void radeon_unregister_atpx_handler(void) {}
>>>   #endif
>>>
>>> +extern bool radeon_kfd_init(void);
>>> +extern void radeon_kfd_fini(void);
>>> +
>>>   int radeon_no_wb;
>>>   int radeon_modeset = -1;
>>>   int radeon_dynclks = -1;
>>> @@ -630,12 +633,15 @@ static int __init radeon_init(void)
>>>   #endif
>>>       }
>>>
>>> +    radeon_kfd_init();
>>> +
>>>       /* let modprobe override vga console setting */
>>>       return drm_pci_init(driver, pdriver);
>>>   }
>>>
>>>   static void __exit radeon_exit(void)
>>>   {
>>> +    radeon_kfd_fini();
>>>       drm_pci_exit(driver, pdriver);
>>>       radeon_unregister_atpx_handler();
>>>   }
>>> diff --git a/drivers/gpu/drm/radeon/radeon_kms.c 
>>> b/drivers/gpu/drm/radeon/radeon_kms.c
>>> index 35d9318..0748284 100644
>>> --- a/drivers/gpu/drm/radeon/radeon_kms.c
>>> +++ b/drivers/gpu/drm/radeon/radeon_kms.c
>>> @@ -34,6 +34,10 @@
>>>   #include <linux/slab.h>
>>>   #include <linux/pm_runtime.h>
>>>
>>> +extern void radeon_kfd_device_probe(struct radeon_device *rdev);
>>> +extern void radeon_kfd_device_init(struct radeon_device *rdev);
>>> +extern void radeon_kfd_device_fini(struct radeon_device *rdev);
>>> +
>>>   #if defined(CONFIG_VGA_SWITCHEROO)
>>>   bool radeon_has_atpx(void);
>>>   #else
>>> @@ -63,6 +67,8 @@ int radeon_driver_unload_kms(struct drm_device *dev)
>>>
>>>       pm_runtime_get_sync(dev->dev);
>>>
>>> +    radeon_kfd_device_fini(rdev);
>>> +
>>>       radeon_acpi_fini(rdev);
>>>
>>>       radeon_modeset_fini(rdev);
>>> @@ -142,6 +148,9 @@ int radeon_driver_load_kms(struct drm_device 
>>> *dev, unsigned long flags)
>>>                   "Error during ACPI methods call\n");
>>>       }
>>>
>>> +    radeon_kfd_device_probe(rdev);
>>> +    radeon_kfd_device_init(rdev);
>>> +
>>>       if (radeon_is_px(dev)) {
>>>           pm_runtime_use_autosuspend(dev->dev);
>>>           pm_runtime_set_autosuspend_delay(dev->dev, 5000);
>>> -- 
>>> 1.9.1
>>>
>


^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 08/83] drm/radeon: Add calls to initialize and finalize kfd from radeon
  2014-07-17 12:29         ` Christian König
@ 2014-07-17 12:30           ` Oded Gabbay
  -1 siblings, 0 replies; 116+ messages in thread
From: Oded Gabbay @ 2014-07-17 12:30 UTC (permalink / raw)
  To: Christian König, Jerome Glisse
  Cc: David Airlie, Alex Deucher, linux-kernel, dri-devel,
	John Bridgman, Andrew Lewycky, Joerg Roedel

On 17/07/14 15:29, Christian König wrote:
> Am 17.07.2014 13:57, schrieb Oded Gabbay:
>> On 11/07/14 19:36, Jerome Glisse wrote:
>>> On Fri, Jul 11, 2014 at 12:50:08AM +0300, Oded Gabbay wrote:
>>>> The KFD driver should be loaded when the radeon driver is loaded and
>>>> should be finalized when the radeon driver is removed.
>>>>
>>>> This patch adds a function call to initialize kfd from radeon_init
>>>> and a function call to finalize kfd from radeon_exit.
>>>>
>>>> If the KFD driver is not present in the system, the initialize call
>>>> fails and the radeon driver continues normally.
>>>>
>>>> This patch also adds calls to probe, initialize and finalize a kfd device
>>>> per radeon device using the kgd-->kfd interface.
>>>>
>>>> Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
>>>
>>> It might be nice to allow to build radeon without HSA so i think an
>>> CONFIG_HSA should be added and have other thing depends on it.
>>> Otherwise this one is.
>>>
>>> Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
>>>
>> We do allow it :)
>> There is no problem building radeon without the kfd. In that case, when radeon
>> finds out that kfd is not available, it simply moves on with its
>> initialization procedure.
>
> At least off hand I don't see how this should work. Radeon directly calls
> radeon_kfd_(probe|init|fini) and so has a direct dependency on it.
>
> Christian.
But radeon_kfd.c is now a permanent part of the radeon driver. I talked with 
Alex about it and we both agreed on that. So radeon_kfd_* functions are *always* 
there when you build radeon.
	Oded
>
>>
>>     Oded
>>>
>>>> ---
>>>>   drivers/gpu/drm/radeon/radeon_drv.c | 6 ++++++
>>>>   drivers/gpu/drm/radeon/radeon_kms.c | 9 +++++++++
>>>>   2 files changed, 15 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/radeon/radeon_drv.c
>>>> b/drivers/gpu/drm/radeon/radeon_drv.c
>>>> index cb14213..88a45a0 100644
>>>> --- a/drivers/gpu/drm/radeon/radeon_drv.c
>>>> +++ b/drivers/gpu/drm/radeon/radeon_drv.c
>>>> @@ -151,6 +151,9 @@ static inline void radeon_register_atpx_handler(void) {}
>>>>   static inline void radeon_unregister_atpx_handler(void) {}
>>>>   #endif
>>>>
>>>> +extern bool radeon_kfd_init(void);
>>>> +extern void radeon_kfd_fini(void);
>>>> +
>>>>   int radeon_no_wb;
>>>>   int radeon_modeset = -1;
>>>>   int radeon_dynclks = -1;
>>>> @@ -630,12 +633,15 @@ static int __init radeon_init(void)
>>>>   #endif
>>>>       }
>>>>
>>>> +    radeon_kfd_init();
>>>> +
>>>>       /* let modprobe override vga console setting */
>>>>       return drm_pci_init(driver, pdriver);
>>>>   }
>>>>
>>>>   static void __exit radeon_exit(void)
>>>>   {
>>>> +    radeon_kfd_fini();
>>>>       drm_pci_exit(driver, pdriver);
>>>>       radeon_unregister_atpx_handler();
>>>>   }
>>>> diff --git a/drivers/gpu/drm/radeon/radeon_kms.c
>>>> b/drivers/gpu/drm/radeon/radeon_kms.c
>>>> index 35d9318..0748284 100644
>>>> --- a/drivers/gpu/drm/radeon/radeon_kms.c
>>>> +++ b/drivers/gpu/drm/radeon/radeon_kms.c
>>>> @@ -34,6 +34,10 @@
>>>>   #include <linux/slab.h>
>>>>   #include <linux/pm_runtime.h>
>>>>
>>>> +extern void radeon_kfd_device_probe(struct radeon_device *rdev);
>>>> +extern void radeon_kfd_device_init(struct radeon_device *rdev);
>>>> +extern void radeon_kfd_device_fini(struct radeon_device *rdev);
>>>> +
>>>>   #if defined(CONFIG_VGA_SWITCHEROO)
>>>>   bool radeon_has_atpx(void);
>>>>   #else
>>>> @@ -63,6 +67,8 @@ int radeon_driver_unload_kms(struct drm_device *dev)
>>>>
>>>>       pm_runtime_get_sync(dev->dev);
>>>>
>>>> +    radeon_kfd_device_fini(rdev);
>>>> +
>>>>       radeon_acpi_fini(rdev);
>>>>
>>>>       radeon_modeset_fini(rdev);
>>>> @@ -142,6 +148,9 @@ int radeon_driver_load_kms(struct drm_device *dev,
>>>> unsigned long flags)
>>>>                   "Error during ACPI methods call\n");
>>>>       }
>>>>
>>>> +    radeon_kfd_device_probe(rdev);
>>>> +    radeon_kfd_device_init(rdev);
>>>> +
>>>>       if (radeon_is_px(dev)) {
>>>>           pm_runtime_use_autosuspend(dev->dev);
>>>>           pm_runtime_set_autosuspend_delay(dev->dev, 5000);
>>>> --
>>>> 1.9.1
>>>>
>>
>


^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 08/83] drm/radeon: Add calls to initialize and finalize kfd from radeon
  2014-07-17 12:30           ` Oded Gabbay
  (?)
@ 2014-07-17 12:45           ` Christian König
  2014-07-17 13:31               ` Daniel Vetter
  -1 siblings, 1 reply; 116+ messages in thread
From: Christian König @ 2014-07-17 12:45 UTC (permalink / raw)
  To: Oded Gabbay, Jerome Glisse
  Cc: Andrew Lewycky, linux-kernel, dri-devel, Alex Deucher


Am 17.07.2014 14:30, schrieb Oded Gabbay:
> On 17/07/14 15:29, Christian König wrote:
>> Am 17.07.2014 13:57, schrieb Oded Gabbay:
>>> On 11/07/14 19:36, Jerome Glisse wrote:
>>>> On Fri, Jul 11, 2014 at 12:50:08AM +0300, Oded Gabbay wrote:
>>>>> The KFD driver should be loaded when the radeon driver is loaded and
>>>>> should be finalized when the radeon driver is removed.
>>>>>
>>>>> This patch adds a function call to initialize kfd from radeon_init
>>>>> and a function call to finalize kfd from radeon_exit.
>>>>>
>>>>> If the KFD driver is not present in the system, the initialize call
>>>>> fails and the radeon driver continues normally.
>>>>>
>>>>> This patch also adds calls to probe, initialize and finalize a kfd 
>>>>> device
>>>>> per radeon device using the kgd-->kfd interface.
>>>>>
>>>>> Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
>>>>
>>>> It might be nice to allow to build radeon without HSA so i think an
>>>> CONFIG_HSA should be added and have other thing depends on it.
>>>> Otherwise this one is.
>>>>
>>>> Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
>>>>
>>> We do allow it :)
>>> There is no problem building radeon without the kfd. In that case, 
>>> when radeon
>>> finds out that kfd is not available, it simply moves on with its
>>> initialization procedure.
>>
>> At least off hand I don't see how this should work. Radeon directly 
>> calls
>> radeon_kfd_(probe|init|fini) and so has a direct dependency on it.
>>
>> Christian.
> But radeon_kfd.c is now a permanent part of the radeon driver. I 
> talked with Alex about it and we both agreed on that. So radeon_kfd_* 
> functions are *always* there when you build radeon.

Ah, I see. So radeon_kfd_init then tries to load the other module
through symbol_request(). Long story short, that's a bad idea for a
couple of reasons.

First of all, it only works when you build everything as modules, and
second, by doing so the radeon<->kfd interface must be handled as an
internal stable interface.

Only very few drivers/subsystems use symbol_request(); to see how to
use it correctly, please take a look at (for example)
sound/pci/hda/hda_codec.c.

Essentially you need to handle all the different combinations of module
vs. builtin like this:
> #if IS_MODULE(CONFIG_SND_HDA_GENERIC)
>                 patch = load_parser(codec, snd_hda_parse_generic_codec);
> #elif IS_BUILTIN(CONFIG_SND_HDA_GENERIC)
>                 patch = snd_hda_parse_generic_codec;
> #endif

I strongly suggest just making the radeon module depend directly on the
KFD module through a CONFIG_RADEON_KFD option.
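
As a rough illustration of that direct-dependency approach, here is a
user-space sketch only. CONFIG_RADEON_KFD is the option name proposed
above; the stub bodies, the stand-in struct radeon_device and
radeon_load_sketch() are hypothetical, and in-kernel code would more
likely test the option with IS_ENABLED(). With the option off the calls
compile down to no-op stubs, so no symbol_request() is needed:

```c
#include <stdbool.h>

/* Stand-in for the real struct; only here so the sketch compiles alone. */
struct radeon_device {
	int id;
};

#ifdef CONFIG_RADEON_KFD
/* Option enabled: the real functions are linked in from radeon_kfd.c. */
bool radeon_kfd_init(void);
void radeon_kfd_fini(void);
void radeon_kfd_device_probe(struct radeon_device *rdev);
void radeon_kfd_device_init(struct radeon_device *rdev);
void radeon_kfd_device_fini(struct radeon_device *rdev);
#else
/* Option disabled: hypothetical no-op stubs, callers need no #ifdefs. */
static inline bool radeon_kfd_init(void) { return false; }
static inline void radeon_kfd_fini(void) {}
static inline void radeon_kfd_device_probe(struct radeon_device *rdev) { (void)rdev; }
static inline void radeon_kfd_device_init(struct radeon_device *rdev) { (void)rdev; }
static inline void radeon_kfd_device_fini(struct radeon_device *rdev) { (void)rdev; }
#endif

/*
 * Hypothetical caller mirroring the call order in the quoted patch.
 * Returns 1 if KFD support came up, 0 if radeon continues without it.
 */
static int radeon_load_sketch(struct radeon_device *rdev)
{
	bool have_kfd = radeon_kfd_init();

	radeon_kfd_device_probe(rdev);
	radeon_kfd_device_init(rdev);

	return have_kfd ? 1 : 0;
}
```

Built without CONFIG_RADEON_KFD defined, radeon_load_sketch() returns 0
and the driver path carries on, which matches the "radeon continues
normally" behaviour described in the commit message.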

Regards,
Christian.

>     Oded
>>
>>>
>>>     Oded
>>>>
>>>>> ---
>>>>>   drivers/gpu/drm/radeon/radeon_drv.c | 6 ++++++
>>>>>   drivers/gpu/drm/radeon/radeon_kms.c | 9 +++++++++
>>>>>   2 files changed, 15 insertions(+)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/radeon/radeon_drv.c
>>>>> b/drivers/gpu/drm/radeon/radeon_drv.c
>>>>> index cb14213..88a45a0 100644
>>>>> --- a/drivers/gpu/drm/radeon/radeon_drv.c
>>>>> +++ b/drivers/gpu/drm/radeon/radeon_drv.c
>>>>> @@ -151,6 +151,9 @@ static inline void 
>>>>> radeon_register_atpx_handler(void) {}
>>>>>   static inline void radeon_unregister_atpx_handler(void) {}
>>>>>   #endif
>>>>>
>>>>> +extern bool radeon_kfd_init(void);
>>>>> +extern void radeon_kfd_fini(void);
>>>>> +
>>>>>   int radeon_no_wb;
>>>>>   int radeon_modeset = -1;
>>>>>   int radeon_dynclks = -1;
>>>>> @@ -630,12 +633,15 @@ static int __init radeon_init(void)
>>>>>   #endif
>>>>>       }
>>>>>
>>>>> +    radeon_kfd_init();
>>>>> +
>>>>>       /* let modprobe override vga console setting */
>>>>>       return drm_pci_init(driver, pdriver);
>>>>>   }
>>>>>
>>>>>   static void __exit radeon_exit(void)
>>>>>   {
>>>>> +    radeon_kfd_fini();
>>>>>       drm_pci_exit(driver, pdriver);
>>>>>       radeon_unregister_atpx_handler();
>>>>>   }
>>>>> diff --git a/drivers/gpu/drm/radeon/radeon_kms.c
>>>>> b/drivers/gpu/drm/radeon/radeon_kms.c
>>>>> index 35d9318..0748284 100644
>>>>> --- a/drivers/gpu/drm/radeon/radeon_kms.c
>>>>> +++ b/drivers/gpu/drm/radeon/radeon_kms.c
>>>>> @@ -34,6 +34,10 @@
>>>>>   #include <linux/slab.h>
>>>>>   #include <linux/pm_runtime.h>
>>>>>
>>>>> +extern void radeon_kfd_device_probe(struct radeon_device *rdev);
>>>>> +extern void radeon_kfd_device_init(struct radeon_device *rdev);
>>>>> +extern void radeon_kfd_device_fini(struct radeon_device *rdev);
>>>>> +
>>>>>   #if defined(CONFIG_VGA_SWITCHEROO)
>>>>>   bool radeon_has_atpx(void);
>>>>>   #else
>>>>> @@ -63,6 +67,8 @@ int radeon_driver_unload_kms(struct drm_device 
>>>>> *dev)
>>>>>
>>>>>       pm_runtime_get_sync(dev->dev);
>>>>>
>>>>> +    radeon_kfd_device_fini(rdev);
>>>>> +
>>>>>       radeon_acpi_fini(rdev);
>>>>>
>>>>>       radeon_modeset_fini(rdev);
>>>>> @@ -142,6 +148,9 @@ int radeon_driver_load_kms(struct drm_device 
>>>>> *dev,
>>>>> unsigned long flags)
>>>>>                   "Error during ACPI methods call\n");
>>>>>       }
>>>>>
>>>>> +    radeon_kfd_device_probe(rdev);
>>>>> +    radeon_kfd_device_init(rdev);
>>>>> +
>>>>>       if (radeon_is_px(dev)) {
>>>>>           pm_runtime_use_autosuspend(dev->dev);
>>>>>           pm_runtime_set_autosuspend_delay(dev->dev, 5000);
>>>>> -- 
>>>>> 1.9.1
>>>>>
>>>
>>
>



^ permalink raw reply	[flat|nested] 116+ messages in thread

* Re: [PATCH 08/83] drm/radeon: Add calls to initialize and finalize kfd from radeon
  2014-07-17 12:45           ` Christian König
@ 2014-07-17 13:31               ` Daniel Vetter
  0 siblings, 0 replies; 116+ messages in thread
From: Daniel Vetter @ 2014-07-17 13:31 UTC (permalink / raw)
  To: Christian König
  Cc: Oded Gabbay, Jerome Glisse, Andrew Lewycky, linux-kernel,
	dri-devel, Alex Deucher

On Thu, Jul 17, 2014 at 02:45:09PM +0200, Christian König wrote:
> Am 17.07.2014 14:30, schrieb Oded Gabbay:
> >On 17/07/14 15:29, Christian König wrote:
> >>Am 17.07.2014 13:57, schrieb Oded Gabbay:
> >>>On 11/07/14 19:36, Jerome Glisse wrote:
> >>>>On Fri, Jul 11, 2014 at 12:50:08AM +0300, Oded Gabbay wrote:
> >>>>>The KFD driver should be loaded when the radeon driver is loaded and
> >>>>>should be finalized when the radeon driver is removed.
> >>>>>
> >>>>>This patch adds a function call to initialize kfd from radeon_init
> >>>>>and a function call to finalize kfd from radeon_exit.
> >>>>>
> >>>>>If the KFD driver is not present in the system, the initialize call
> >>>>>fails and the radeon driver continues normally.
> >>>>>
> >>>>>This patch also adds calls to probe, initialize and finalize a kfd
> >>>>>device
> >>>>>per radeon device using the kgd-->kfd interface.
> >>>>>
> >>>>>Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
> >>>>
> >>>>It might be nice to allow to build radeon without HSA so i think an
> >>>>CONFIG_HSA should be added and have other thing depends on it.
> >>>>Otherwise this one is.
> >>>>
> >>>>Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
> >>>>
> >>>We do allow it :)
> >>>There is no problem building radeon without the kfd. In that case,
> >>>when radeon
> >>>finds out that kfd is not available, it simply moves on with its
> >>>initialization procedure.
> >>
> >>At least off hand I don't see how this should work. Radeon directly
> >>calls
> >>radeon_kfd_(probe|init|fini) and so has a direct dependency on it.
> >>
> >>Christian.
> >But radeon_kfd.c is now a permanent part of the radeon driver. I talked
> >with Alex about it and we both agreed on that. So radeon_kfd_* functions
> >are *always* there when you build radeon.
> 
> Ah, I see. So radeon_kfd_init then tries to load the other module through
> symbol_request(). Long story short that's a bad idea for a couple of
> reasons.
> 
> First of all it only works when you build everything as module and second by
> doing so the radeon<->kfd interface must be handled as internal stable
> interface.
> 
> Only a very few drivers/subsystem do use symbol_request() and to see how to
> use it correctly please take a look at (for example)
> sound/pci/hda/hda_codec.c.

We do this in i915 to coordinate a bunch of things with the snd_hda
driver. And it's a major pain. Imo the proper way to do this is for one
driver to expose a platform device with a bunch of specific interfaces and
for the other driver to register as a platform driver against that device.

Then all the usual Linux hotplug infrastructure will make sure that this
all works and there's a clear runtime dependency. For i915 that's what I've
requested the audio guys to look into, and also what I'll require for
other such sub-driver stuff (e.g. we have a non-intel video codec on vlv
gfx). Well for audio it will be a bit fancier since we also want some
standardized stuff to allow userspace to see the association between the
gfx output and the audio side. Atm you can only guess if you have more
than one screen connected.

This approach gives you full flexibility and you can e.g. blacklist the
subdriver for debugging, without a kernel recompile.
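
A toy userspace model of that device/driver matching, only to show the ordering property. It is not the kernel's platform bus API: toy_device, toy_driver, the register functions, the "radeon-kfd" name and kfd_probe() are all hypothetical, and matching is reduced to a plain name comparison.

```c
#include <stddef.h>
#include <string.h>

#define MAX_ENTRIES 4

struct toy_device {
	const char *name;
	int bound;                              /* 1 once a driver has probed it */
};

struct toy_driver {
	const char *name;
	int (*probe)(struct toy_device *dev);   /* returns 0 on success */
};

static struct toy_device *devices[MAX_ENTRIES];
static struct toy_driver *drivers[MAX_ENTRIES];
static size_t n_devices, n_drivers;

/* Bind when the names match and probe succeeds. */
static void try_bind(struct toy_device *dev, struct toy_driver *drv)
{
	if (!dev->bound && strcmp(dev->name, drv->name) == 0 &&
	    drv->probe(dev) == 0)
		dev->bound = 1;
}

/* The exposing driver registers a device; it works even if nothing binds. */
static int toy_device_register(struct toy_device *dev)
{
	if (n_devices == MAX_ENTRIES)
		return -1;
	devices[n_devices++] = dev;
	for (size_t i = 0; i < n_drivers; i++)
		try_bind(dev, drivers[i]);
	return 0;
}

/* The sub-driver registers later (or never, e.g. when blacklisted). */
static int toy_driver_register(struct toy_driver *drv)
{
	if (n_drivers == MAX_ENTRIES)
		return -1;
	drivers[n_drivers++] = drv;
	for (size_t i = 0; i < n_devices; i++)
		try_bind(devices[i], drv);
	return 0;
}

static int kfd_probe(struct toy_device *dev)
{
	(void)dev;
	return 0;
}

static struct toy_device radeon_compute_dev = { .name = "radeon-kfd", .bound = 0 };
static struct toy_driver kfd_driver = { .name = "radeon-kfd", .probe = kfd_probe };
```

Registering radeon_compute_dev alone leaves it unbound, and binding happens whenever kfd_driver shows up, in either order. That is the property that makes blacklisting the sub-driver harmless.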
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 116+ messages in thread


end of thread, other threads:[~2014-07-17 13:31 UTC | newest]

Thread overview: 116+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-07-10 21:50 [PATCH 02/83] drm/radeon: reduce number of free VMIDs and pipes in KV Oded Gabbay
2014-07-10 21:50 ` [PATCH 03/83] drm/radeon: Report doorbell configuration to kfd Oded Gabbay
2014-07-11 16:16   ` Jerome Glisse
2014-07-11 16:16     ` Jerome Glisse
2014-07-10 21:50 ` [PATCH 04/83] drm/radeon: Add radeon <--> kfd interface Oded Gabbay
2014-07-10 22:38   ` Joe Perches
2014-07-10 22:38     ` Joe Perches
2014-07-11 16:24     ` Jerome Glisse
2014-07-11 16:24       ` Jerome Glisse
2014-07-17 11:55       ` Oded Gabbay
2014-07-10 21:50 ` [PATCH 05/83] drm/radeon: Add kfd-->kgd interface to get virtual ram size Oded Gabbay
2014-07-11 16:27   ` Jerome Glisse
2014-07-11 16:27     ` Jerome Glisse
2014-07-10 21:50 ` [PATCH 06/83] drm/radeon: Add kfd-->kgd interfaces of memory allocation/mapping Oded Gabbay
2014-07-11 16:32   ` Jerome Glisse
2014-07-11 16:32     ` Jerome Glisse
2014-07-10 21:50 ` [PATCH 07/83] drm/radeon: Add kfd-->kgd interface of locking srbm_gfx_cntl register Oded Gabbay
2014-07-11 16:34   ` Jerome Glisse
2014-07-11 16:34     ` Jerome Glisse
2014-07-11 17:48     ` Bridgman, John
2014-07-11 17:48       ` Bridgman, John
2014-07-12  0:36       ` Bridgman, John
2014-07-12  0:36         ` Bridgman, John
2014-07-12  0:37       ` Bridgman, John
2014-07-12  0:37         ` Bridgman, John
2014-07-10 21:50 ` [PATCH 08/83] drm/radeon: Add calls to initialize and finalize kfd from radeon Oded Gabbay
2014-07-11 16:36   ` Jerome Glisse
2014-07-11 16:36     ` Jerome Glisse
2014-07-17 11:57     ` Oded Gabbay
2014-07-17 11:57       ` Oded Gabbay
2014-07-17 12:29       ` Christian König
2014-07-17 12:29         ` Christian König
2014-07-17 12:30         ` Oded Gabbay
2014-07-17 12:30           ` Oded Gabbay
2014-07-17 12:45           ` Christian König
2014-07-17 13:31             ` Daniel Vetter
2014-07-17 13:31               ` Daniel Vetter
2014-07-10 21:50 ` [PATCH 09/83] hsa/radeon: Add code base of hsa driver for AMD's GPUs Oded Gabbay
2014-07-11 17:04   ` Jerome Glisse
2014-07-11 17:04     ` Jerome Glisse
2014-07-11 17:28     ` Joe Perches
2014-07-11 17:28       ` Joe Perches
2014-07-17 11:51       ` Oded Gabbay
2014-07-17 11:51         ` Oded Gabbay
2014-07-11 17:40     ` Daniel Vetter
2014-07-11 17:40       ` Daniel Vetter
2014-07-11 18:02     ` Bridgman, John
2014-07-11 18:02       ` Bridgman, John
2014-07-11 18:10       ` Jerome Glisse
2014-07-11 18:10         ` Jerome Glisse
2014-07-11 18:46         ` Bridgman, John
2014-07-11 18:46           ` Bridgman, John
2014-07-11 18:51           ` Jerome Glisse
2014-07-11 18:51             ` Jerome Glisse
2014-07-11 18:56             ` Bridgman, John
2014-07-11 18:56               ` Bridgman, John
2014-07-11 19:22               ` Jerome Glisse
2014-07-11 19:22                 ` Jerome Glisse
2014-07-11 19:38                 ` Joe Perches
2014-07-11 19:38                   ` Joe Perches
2014-07-17 11:51                 ` Oded Gabbay
2014-07-17 11:51                   ` Oded Gabbay
2014-07-10 21:50 ` [PATCH 10/83] hsa/radeon: Add initialization and unmapping of doorbell aperture Oded Gabbay
2014-07-10 21:50 ` [PATCH 11/83] hsa/radeon: Add scheduler code Oded Gabbay
2014-07-11 18:25   ` Jerome Glisse
2014-07-11 18:25     ` Jerome Glisse
2014-07-17 11:57     ` Oded Gabbay
2014-07-17 11:57       ` Oded Gabbay
2014-07-10 21:50 ` [PATCH 12/83] hsa/radeon: Add kfd mmap handler Oded Gabbay
2014-07-11 18:47   ` Jerome Glisse
2014-07-11 18:47     ` Jerome Glisse
2014-07-10 21:50 ` [PATCH 13/83] hsa/radeon: Add 2 new IOCTL to kfd, CREATE_QUEUE and DESTROY_QUEUE Oded Gabbay
2014-07-11 19:19   ` Jerome Glisse
2014-07-11 19:19     ` Jerome Glisse
2014-07-11 21:01   ` Jerome Glisse
2014-07-11 21:01     ` Jerome Glisse
2014-07-11 21:42   ` Dave Airlie
2014-07-11 21:42     ` Dave Airlie
2014-07-14  7:33     ` Gabbay, Oded
2014-07-14  7:33       ` Gabbay, Oded
2014-07-10 21:50 ` [PATCH 14/83] hsa/radeon: Update MAINTAINERS and CREDITS files Oded Gabbay
2014-07-10 21:50 ` [PATCH 15/83] hsa/radeon: Add interrupt handling module Oded Gabbay
2014-07-11 19:57   ` Jerome Glisse
2014-07-11 19:57     ` Jerome Glisse
2014-07-10 21:50 ` [PATCH 16/83] hsa/radeon: Add the isr function of the KFD scehduler Oded Gabbay
2014-07-10 21:50 ` [PATCH 17/83] hsa/radeon: Handle deactivation of queues using interrupts Oded Gabbay
2014-07-10 21:50 ` [PATCH 18/83] hsa/radeon: Enable interrupts in KFD scheduler Oded Gabbay
2014-07-10 21:50 ` [PATCH 19/83] hsa/radeon: Enable/Disable KFD interrupt module Oded Gabbay
2014-07-10 21:50 ` [PATCH 20/83] hsa/radeon: Add interrupt callback function to kgd2kfd interface Oded Gabbay
2014-07-10 21:50 ` [PATCH 21/83] hsa/radeon: Add kgd-->kfd interfaces for suspend and resume Oded Gabbay
2014-07-10 21:50 ` [PATCH 22/83] drm/radeon: Add calls to suspend and resume of kfd driver Oded Gabbay
2014-07-10 21:50 ` [PATCH 23/83] drm/radeon/cik: Don't touch int of pipes 1-7 Oded Gabbay
2014-07-10 21:50 ` [PATCH 24/83] drm/radeon/cik: Call kfd isr function Oded Gabbay
2014-07-10 21:50 ` [PATCH 25/83] hsa/radeon: fix the OEMID assignment in kfd_topology Oded Gabbay
2014-07-10 21:50 ` [PATCH 26/83] hsa/radeon: Make binding of process to device permanent Oded Gabbay
2014-07-10 21:50 ` [PATCH 27/83] hsa/radeon: Implement hsaKmtSetMemoryPolicy Oded Gabbay
2014-07-10 21:50   ` Oded Gabbay
2014-07-11 16:05 ` [PATCH 02/83] drm/radeon: reduce number of free VMIDs and pipes in KV Jerome Glisse
2014-07-11 16:05   ` Jerome Glisse
2014-07-11 16:18   ` Christian König
2014-07-11 16:18     ` Christian König
2014-07-11 16:22     ` Alex Deucher
2014-07-11 16:22       ` Alex Deucher
2014-07-11 17:07       ` Bridgman, John
2014-07-11 17:07         ` Bridgman, John
2014-07-11 17:59         ` Ilyes Gouta
2014-07-11 22:54           ` Bridgman, John
2014-07-11 22:54             ` Bridgman, John
2014-07-12  9:00       ` Christian König
2014-07-12  9:00         ` Christian König
2014-07-14  7:31         ` Michel Dänzer
2014-07-14  7:31           ` Michel Dänzer
2014-07-14  7:38 ` Michel Dänzer
2014-07-14  7:58   ` Christian König
2014-07-17 11:47     ` Oded Gabbay
2014-07-17 11:47       ` Oded Gabbay
