All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 00/21] GFXv9/Vega10 support for KFD
@ 2018-04-10 21:32 Felix Kuehling
       [not found] ` <1523395998-31314-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 38+ messages in thread
From: Felix Kuehling @ 2018-04-10 21:32 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling

This patch series adds support for GFXv9 GPUs to KFD. In this series it
enables support for Vega10. Raven support requires some extra work that
will follow shortly, but Raven support is already included and I didn't
go out of my way to keep it out.

Felix Kuehling (19):
  drm/amdgpu: Remove unused interface from kfd2kgd interface
  drm/amd: Update GFXv9 SDMA MQD structure
  drm/amdgpu: Add GFXv9 TLB invalidation packet definition
  drm/amdgpu: Add GFXv9 kfd2kgd interface functions
  drm/amdgpu: Add doorbell routing info to kgd2kfd_shared_resources
  drm/amdkfd: Make doorbell size ASIC-dependent
  drm/amdkfd: Implement doorbell allocation for SOC15
  drm/amdkfd: Move packet writer functions into ASIC-specific file
  drm/amdkfd: Add GFXv9 PM4 packet writer functions
  drm/amdkfd: Add GFXv9 MQD manager
  drm/amdkfd: Add GFXv9 device queue manager
  drm/amdkfd: Add SOC15 interrupt processing support
  drm/amdkfd: Fix goto usage
  drm/amdkfd: Fix kernel queue rollback_packet
  drm/amdkfd: Add 64-bit doorbell and wptr support to kernel queue
  drm/amdkfd: Remove limit on number of GPUs (follow-up)
  drm/amdkfd: Support flat memory apertures for GFXv9
  drm/amdkfd: Add GFXv9 CWSR trap handler
  drm/amdkfd: Add Vega10 topology and device info

Harish Kasiviswanathan (1):
  drm/amdkfd: Clean up KFD_MMAP_ offset handling

welu (1):
  drm/amdkfd: Try to enable atomics for all GPUs

 MAINTAINERS                                        |    2 +
 drivers/gpu/drm/amd/amdgpu/Makefile                |    3 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c         |   26 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h         |    1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c  |   10 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c  |   10 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c  | 1043 ++++++++++++++
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c              |    1 +
 drivers/gpu/drm/amd/amdgpu/soc15d.h                |    5 +
 drivers/gpu/drm/amd/amdkfd/Makefile                |   10 +-
 .../gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm  | 1495 ++++++++++++++++++++
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c           |   42 +-
 drivers/gpu/drm/amd/amdkfd/kfd_crat.c              |   11 +
 drivers/gpu/drm/amd/amdkfd/kfd_device.c            |   89 +-
 .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c  |  102 +-
 .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.h  |    2 +
 .../drm/amd/amdkfd/kfd_device_queue_manager_v9.c   |   84 ++
 drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c          |   65 +-
 drivers/gpu/drm/amd/amdkfd/kfd_events.c            |    2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c       |  119 +-
 drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c    |   84 ++
 drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c      |   39 +-
 drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.h      |    7 +-
 drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_cik.c  |    9 +
 drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c   |  340 +++++
 drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_vi.c   |  319 +++++
 drivers/gpu/drm/amd/amdkfd/kfd_module.c            |    5 +
 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c       |    3 +
 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c    |  443 ++++++
 drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c    |  385 +----
 drivers/gpu/drm/amd/amdkfd/kfd_pm4_headers_ai.h    |  583 ++++++++
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h              |  106 +-
 drivers/gpu/drm/amd/amdkfd/kfd_process.c           |   40 +-
 .../gpu/drm/amd/amdkfd/kfd_process_queue_manager.c |   12 +-
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c          |    6 +
 drivers/gpu/drm/amd/amdkfd/kfd_topology.h          |    1 +
 drivers/gpu/drm/amd/amdkfd/soc15_int.h             |   47 +
 drivers/gpu/drm/amd/include/kgd_kfd_interface.h    |   20 +-
 drivers/gpu/drm/amd/include/v9_structs.h           |   48 +-
 39 files changed, 5118 insertions(+), 501 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
 create mode 100644 drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager_v9.c
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_pm4_headers_ai.h
 create mode 100644 drivers/gpu/drm/amd/amdkfd/soc15_int.h

-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 01/21] drm/amdgpu: Remove unused interface from kfd2kgd interface
       [not found] ` <1523395998-31314-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
@ 2018-04-10 21:32   ` Felix Kuehling
  2018-04-10 21:32   ` [PATCH 02/21] drm/amd: Update GFXv9 SDMA MQD structure Felix Kuehling
                     ` (22 subsequent siblings)
  23 siblings, 0 replies; 38+ messages in thread
From: Felix Kuehling @ 2018-04-10 21:32 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c | 10 ----------
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c | 10 ----------
 drivers/gpu/drm/amd/include/kgd_kfd_interface.h   |  5 -----
 3 files changed, 25 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
index ea54e53..0ff36d4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
@@ -98,8 +98,6 @@ static void kgd_program_sh_mem_settings(struct kgd_dev *kgd, uint32_t vmid,
 static int kgd_set_pasid_vmid_mapping(struct kgd_dev *kgd, unsigned int pasid,
 					unsigned int vmid);
 
-static int kgd_init_pipeline(struct kgd_dev *kgd, uint32_t pipe_id,
-				uint32_t hpd_size, uint64_t hpd_gpu_addr);
 static int kgd_init_interrupts(struct kgd_dev *kgd, uint32_t pipe_id);
 static int kgd_hqd_load(struct kgd_dev *kgd, void *mqd, uint32_t pipe_id,
 			uint32_t queue_id, uint32_t __user *wptr,
@@ -183,7 +181,6 @@ static const struct kfd2kgd_calls kfd2kgd = {
 	.free_pasid = amdgpu_pasid_free,
 	.program_sh_mem_settings = kgd_program_sh_mem_settings,
 	.set_pasid_vmid_mapping = kgd_set_pasid_vmid_mapping,
-	.init_pipeline = kgd_init_pipeline,
 	.init_interrupts = kgd_init_interrupts,
 	.hqd_load = kgd_hqd_load,
 	.hqd_sdma_load = kgd_hqd_sdma_load,
@@ -309,13 +306,6 @@ static int kgd_set_pasid_vmid_mapping(struct kgd_dev *kgd, unsigned int pasid,
 	return 0;
 }
 
-static int kgd_init_pipeline(struct kgd_dev *kgd, uint32_t pipe_id,
-				uint32_t hpd_size, uint64_t hpd_gpu_addr)
-{
-	/* amdgpu owns the per-pipe state */
-	return 0;
-}
-
 static int kgd_init_interrupts(struct kgd_dev *kgd, uint32_t pipe_id)
 {
 	struct amdgpu_device *adev = get_amdgpu_device(kgd);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
index 89264c9..6ef9762 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
@@ -57,8 +57,6 @@ static void kgd_program_sh_mem_settings(struct kgd_dev *kgd, uint32_t vmid,
 		uint32_t sh_mem_bases);
 static int kgd_set_pasid_vmid_mapping(struct kgd_dev *kgd, unsigned int pasid,
 		unsigned int vmid);
-static int kgd_init_pipeline(struct kgd_dev *kgd, uint32_t pipe_id,
-		uint32_t hpd_size, uint64_t hpd_gpu_addr);
 static int kgd_init_interrupts(struct kgd_dev *kgd, uint32_t pipe_id);
 static int kgd_hqd_load(struct kgd_dev *kgd, void *mqd, uint32_t pipe_id,
 			uint32_t queue_id, uint32_t __user *wptr,
@@ -141,7 +139,6 @@ static const struct kfd2kgd_calls kfd2kgd = {
 	.free_pasid = amdgpu_pasid_free,
 	.program_sh_mem_settings = kgd_program_sh_mem_settings,
 	.set_pasid_vmid_mapping = kgd_set_pasid_vmid_mapping,
-	.init_pipeline = kgd_init_pipeline,
 	.init_interrupts = kgd_init_interrupts,
 	.hqd_load = kgd_hqd_load,
 	.hqd_sdma_load = kgd_hqd_sdma_load,
@@ -270,13 +267,6 @@ static int kgd_set_pasid_vmid_mapping(struct kgd_dev *kgd, unsigned int pasid,
 	return 0;
 }
 
-static int kgd_init_pipeline(struct kgd_dev *kgd, uint32_t pipe_id,
-				uint32_t hpd_size, uint64_t hpd_gpu_addr)
-{
-	/* amdgpu owns the per-pipe state */
-	return 0;
-}
-
 static int kgd_init_interrupts(struct kgd_dev *kgd, uint32_t pipe_id)
 {
 	struct amdgpu_device *adev = get_amdgpu_device(kgd);
diff --git a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
index 286cfe7..7cf3506 100644
--- a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
+++ b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
@@ -173,8 +173,6 @@ struct tile_config {
  * @set_pasid_vmid_mapping: Exposes pasid/vmid pair to the H/W for no cp
  * scheduling mode. Only used for no cp scheduling mode.
  *
- * @init_pipeline: Initialized the compute pipelines.
- *
  * @hqd_load: Loads the mqd structure to a H/W hqd slot. used only for no cp
  * sceduling mode.
  *
@@ -274,9 +272,6 @@ struct kfd2kgd_calls {
 	int (*set_pasid_vmid_mapping)(struct kgd_dev *kgd, unsigned int pasid,
 					unsigned int vmid);
 
-	int (*init_pipeline)(struct kgd_dev *kgd, uint32_t pipe_id,
-				uint32_t hpd_size, uint64_t hpd_gpu_addr);
-
 	int (*init_interrupts)(struct kgd_dev *kgd, uint32_t pipe_id);
 
 	int (*hqd_load)(struct kgd_dev *kgd, void *mqd, uint32_t pipe_id,
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 02/21] drm/amd: Update GFXv9 SDMA MQD structure
       [not found] ` <1523395998-31314-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
  2018-04-10 21:32   ` [PATCH 01/21] drm/amdgpu: Remove unused interface from kfd2kgd interface Felix Kuehling
@ 2018-04-10 21:32   ` Felix Kuehling
  2018-04-10 21:33   ` [PATCH 03/21] drm/amdgpu: Add GFXv9 TLB invalidation packet definition Felix Kuehling
                     ` (21 subsequent siblings)
  23 siblings, 0 replies; 38+ messages in thread
From: Felix Kuehling @ 2018-04-10 21:32 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling

This matches what the HWS firmware expects on GFXv9 chips.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 MAINTAINERS                              |  1 +
 drivers/gpu/drm/amd/include/v9_structs.h | 48 ++++++++++++++++----------------
 2 files changed, 25 insertions(+), 24 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 004d2c1..6804170 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -772,6 +772,7 @@ F:	drivers/gpu/drm/amd/amdkfd/
 F:	drivers/gpu/drm/amd/include/cik_structs.h
 F:	drivers/gpu/drm/amd/include/kgd_kfd_interface.h
 F:	drivers/gpu/drm/amd/include/vi_structs.h
+F:	drivers/gpu/drm/amd/include/v9_structs.h
 F:	include/uapi/linux/kfd_ioctl.h
 
 AMD SEATTLE DEVICE TREE SUPPORT
diff --git a/drivers/gpu/drm/amd/include/v9_structs.h b/drivers/gpu/drm/amd/include/v9_structs.h
index 2fb25ab..ceaf493 100644
--- a/drivers/gpu/drm/amd/include/v9_structs.h
+++ b/drivers/gpu/drm/amd/include/v9_structs.h
@@ -29,10 +29,10 @@ struct v9_sdma_mqd {
 	uint32_t sdmax_rlcx_rb_base;
 	uint32_t sdmax_rlcx_rb_base_hi;
 	uint32_t sdmax_rlcx_rb_rptr;
+	uint32_t sdmax_rlcx_rb_rptr_hi;
 	uint32_t sdmax_rlcx_rb_wptr;
+	uint32_t sdmax_rlcx_rb_wptr_hi;
 	uint32_t sdmax_rlcx_rb_wptr_poll_cntl;
-	uint32_t sdmax_rlcx_rb_wptr_poll_addr_hi;
-	uint32_t sdmax_rlcx_rb_wptr_poll_addr_lo;
 	uint32_t sdmax_rlcx_rb_rptr_addr_hi;
 	uint32_t sdmax_rlcx_rb_rptr_addr_lo;
 	uint32_t sdmax_rlcx_ib_cntl;
@@ -44,29 +44,29 @@ struct v9_sdma_mqd {
 	uint32_t sdmax_rlcx_skip_cntl;
 	uint32_t sdmax_rlcx_context_status;
 	uint32_t sdmax_rlcx_doorbell;
-	uint32_t sdmax_rlcx_virtual_addr;
-	uint32_t sdmax_rlcx_ape1_cntl;
+	uint32_t sdmax_rlcx_status;
 	uint32_t sdmax_rlcx_doorbell_log;
-	uint32_t reserved_22;
-	uint32_t reserved_23;
-	uint32_t reserved_24;
-	uint32_t reserved_25;
-	uint32_t reserved_26;
-	uint32_t reserved_27;
-	uint32_t reserved_28;
-	uint32_t reserved_29;
-	uint32_t reserved_30;
-	uint32_t reserved_31;
-	uint32_t reserved_32;
-	uint32_t reserved_33;
-	uint32_t reserved_34;
-	uint32_t reserved_35;
-	uint32_t reserved_36;
-	uint32_t reserved_37;
-	uint32_t reserved_38;
-	uint32_t reserved_39;
-	uint32_t reserved_40;
-	uint32_t reserved_41;
+	uint32_t sdmax_rlcx_watermark;
+	uint32_t sdmax_rlcx_doorbell_offset;
+	uint32_t sdmax_rlcx_csa_addr_lo;
+	uint32_t sdmax_rlcx_csa_addr_hi;
+	uint32_t sdmax_rlcx_ib_sub_remain;
+	uint32_t sdmax_rlcx_preempt;
+	uint32_t sdmax_rlcx_dummy_reg;
+	uint32_t sdmax_rlcx_rb_wptr_poll_addr_hi;
+	uint32_t sdmax_rlcx_rb_wptr_poll_addr_lo;
+	uint32_t sdmax_rlcx_rb_aql_cntl;
+	uint32_t sdmax_rlcx_minor_ptr_update;
+	uint32_t sdmax_rlcx_midcmd_data0;
+	uint32_t sdmax_rlcx_midcmd_data1;
+	uint32_t sdmax_rlcx_midcmd_data2;
+	uint32_t sdmax_rlcx_midcmd_data3;
+	uint32_t sdmax_rlcx_midcmd_data4;
+	uint32_t sdmax_rlcx_midcmd_data5;
+	uint32_t sdmax_rlcx_midcmd_data6;
+	uint32_t sdmax_rlcx_midcmd_data7;
+	uint32_t sdmax_rlcx_midcmd_data8;
+	uint32_t sdmax_rlcx_midcmd_cntl;
 	uint32_t reserved_42;
 	uint32_t reserved_43;
 	uint32_t reserved_44;
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 03/21] drm/amdgpu: Add GFXv9 TLB invalidation packet definition
       [not found] ` <1523395998-31314-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
  2018-04-10 21:32   ` [PATCH 01/21] drm/amdgpu: Remove unused interface from kfd2kgd interface Felix Kuehling
  2018-04-10 21:32   ` [PATCH 02/21] drm/amd: Update GFXv9 SDMA MQD structure Felix Kuehling
@ 2018-04-10 21:33   ` Felix Kuehling
  2018-04-10 21:33   ` [PATCH 04/21] drm/amdgpu: Add GFXv9 kfd2kgd interface functions Felix Kuehling
                     ` (20 subsequent siblings)
  23 siblings, 0 replies; 38+ messages in thread
From: Felix Kuehling @ 2018-04-10 21:33 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling, Jay Cornwall, Shaoyun Liu

Signed-off-by: Shaoyun Liu <Shaoyun.Liu@amd.com>
Signed-off-by: Jay Cornwall <Jay.Cornwall@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/soc15d.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/soc15d.h b/drivers/gpu/drm/amd/amdgpu/soc15d.h
index 7f408f8..f22f7a8 100644
--- a/drivers/gpu/drm/amd/amdgpu/soc15d.h
+++ b/drivers/gpu/drm/amd/amdgpu/soc15d.h
@@ -268,6 +268,11 @@
 			 * x=1: tmz_end
 			 */
 
+#define	PACKET3_INVALIDATE_TLBS				0x98
+#              define PACKET3_INVALIDATE_TLBS_DST_SEL(x)     ((x) << 0)
+#              define PACKET3_INVALIDATE_TLBS_ALL_HUB(x)     ((x) << 4)
+#              define PACKET3_INVALIDATE_TLBS_PASID(x)       ((x) << 5)
+#              define PACKET3_INVALIDATE_TLBS_FLUSH_TYPE(x)  ((x) << 29)
 #define PACKET3_SET_RESOURCES				0xA0
 /* 1. header
  * 2. CONTROL
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 04/21] drm/amdgpu: Add GFXv9 kfd2kgd interface functions
       [not found] ` <1523395998-31314-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (2 preceding siblings ...)
  2018-04-10 21:33   ` [PATCH 03/21] drm/amdgpu: Add GFXv9 TLB invalidation packet definition Felix Kuehling
@ 2018-04-10 21:33   ` Felix Kuehling
       [not found]     ` <1523395998-31314-5-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
  2018-04-10 21:33   ` [PATCH 05/21] drm/amdgpu: Add doorbell routing info to kgd2kfd_shared_resources Felix Kuehling
                     ` (19 subsequent siblings)
  23 siblings, 1 reply; 38+ messages in thread
From: Felix Kuehling @ 2018-04-10 21:33 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Yong Zhao, Jay Cornwall, Felix Kuehling, Shaoyun Liu, John Bridgman

Signed-off-by: John Bridgman <john.bridgman@amd.com>
Signed-off-by: Shaoyun Liu <Shaoyun.Liu@amd.com>
Signed-off-by: Jay Cornwall <Jay.Cornwall@amd.com>
Signed-off-by: Yong Zhao <yong.zhao@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 MAINTAINERS                                       |    1 +
 drivers/gpu/drm/amd/amdgpu/Makefile               |    3 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c        |    4 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h        |    1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 1043 +++++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c             |    1 +
 6 files changed, 1052 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 6804170..9bfb765 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -766,6 +766,7 @@ F:	drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
 F:	drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
 F:	drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
 F:	drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
+F:	drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
 F:	drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
 F:	drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
 F:	drivers/gpu/drm/amd/amdkfd/
diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile
index 2ca2b51..f300202 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -130,7 +130,8 @@ amdgpu-y += \
 	 amdgpu_amdkfd.o \
 	 amdgpu_amdkfd_fence.o \
 	 amdgpu_amdkfd_gpuvm.o \
-	 amdgpu_amdkfd_gfx_v8.o
+	 amdgpu_amdkfd_gfx_v8.o \
+	 amdgpu_amdkfd_gfx_v9.o
 
 # add cgs
 amdgpu-y += amdgpu_cgs.o
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index 4d36203..fcd10db 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -92,6 +92,10 @@ void amdgpu_amdkfd_device_probe(struct amdgpu_device *adev)
 	case CHIP_POLARIS11:
 		kfd2kgd = amdgpu_amdkfd_gfx_8_0_get_functions();
 		break;
+	case CHIP_VEGA10:
+	case CHIP_RAVEN:
+		kfd2kgd = amdgpu_amdkfd_gfx_9_0_get_functions();
+		break;
 	default:
 		dev_dbg(adev->dev, "kfd not supported on this ASIC\n");
 		return;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index c3024b1..12367a9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -122,6 +122,7 @@ int amdgpu_amdkfd_submit_ib(struct kgd_dev *kgd, enum kgd_engine_type engine,
 
 struct kfd2kgd_calls *amdgpu_amdkfd_gfx_7_get_functions(void);
 struct kfd2kgd_calls *amdgpu_amdkfd_gfx_8_0_get_functions(void);
+struct kfd2kgd_calls *amdgpu_amdkfd_gfx_9_0_get_functions(void);
 
 bool amdgpu_amdkfd_is_kfd_vmid(struct amdgpu_device *adev, u32 vmid);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
new file mode 100644
index 0000000..8f37991
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
@@ -0,0 +1,1043 @@
+/*
+ * Copyright 2014-2018 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#define pr_fmt(fmt) "kfd2kgd: " fmt
+
+#include <linux/module.h>
+#include <linux/fdtable.h>
+#include <linux/uaccess.h>
+#include <linux/firmware.h>
+#include <drm/drmP.h>
+#include "amdgpu.h"
+#include "amdgpu_amdkfd.h"
+#include "amdgpu_ucode.h"
+#include "soc15_hw_ip.h"
+#include "gc/gc_9_0_offset.h"
+#include "gc/gc_9_0_sh_mask.h"
+#include "vega10_enum.h"
+#include "sdma0/sdma0_4_0_offset.h"
+#include "sdma0/sdma0_4_0_sh_mask.h"
+#include "sdma1/sdma1_4_0_offset.h"
+#include "sdma1/sdma1_4_0_sh_mask.h"
+#include "athub/athub_1_0_offset.h"
+#include "athub/athub_1_0_sh_mask.h"
+#include "oss/osssys_4_0_offset.h"
+#include "oss/osssys_4_0_sh_mask.h"
+#include "soc15_common.h"
+#include "v9_structs.h"
+#include "soc15.h"
+#include "soc15d.h"
+
+/* HACK: MMHUB and GC both have VM-related register with the same
+ * names but different offsets. Define the MMHUB register we need here
+ * with a prefix. A proper solution would be to move the functions
+ * programming these registers into gfx_v9_0.c and mmhub_v1_0.c
+ * respectively.
+ */
+#define mmMMHUB_VM_INVALIDATE_ENG16_REQ				0x06f3
+#define mmMMHUB_VM_INVALIDATE_ENG16_REQ_BASE_IDX		0
+
+#define mmMMHUB_VM_INVALIDATE_ENG16_ACK				0x0705
+#define mmMMHUB_VM_INVALIDATE_ENG16_ACK_BASE_IDX		0
+
+#define mmMMHUB_VM_CONTEXT0_PAGE_TABLE_BASE_ADDR_LO32		0x072b
+#define mmMMHUB_VM_CONTEXT0_PAGE_TABLE_BASE_ADDR_LO32_BASE_IDX	0
+#define mmMMHUB_VM_CONTEXT0_PAGE_TABLE_BASE_ADDR_HI32		0x072c
+#define mmMMHUB_VM_CONTEXT0_PAGE_TABLE_BASE_ADDR_HI32_BASE_IDX	0
+
+#define mmMMHUB_VM_CONTEXT0_PAGE_TABLE_START_ADDR_LO32		0x074b
+#define mmMMHUB_VM_CONTEXT0_PAGE_TABLE_START_ADDR_LO32_BASE_IDX	0
+#define mmMMHUB_VM_CONTEXT0_PAGE_TABLE_START_ADDR_HI32		0x074c
+#define mmMMHUB_VM_CONTEXT0_PAGE_TABLE_START_ADDR_HI32_BASE_IDX	0
+
+#define mmMMHUB_VM_CONTEXT0_PAGE_TABLE_END_ADDR_LO32		0x076b
+#define mmMMHUB_VM_CONTEXT0_PAGE_TABLE_END_ADDR_LO32_BASE_IDX	0
+#define mmMMHUB_VM_CONTEXT0_PAGE_TABLE_END_ADDR_HI32		0x076c
+#define mmMMHUB_VM_CONTEXT0_PAGE_TABLE_END_ADDR_HI32_BASE_IDX	0
+
+#define mmMMHUB_VM_INVALIDATE_ENG16_ADDR_RANGE_LO32		0x0727
+#define mmMMHUB_VM_INVALIDATE_ENG16_ADDR_RANGE_LO32_BASE_IDX	0
+#define mmMMHUB_VM_INVALIDATE_ENG16_ADDR_RANGE_HI32		0x0728
+#define mmMMHUB_VM_INVALIDATE_ENG16_ADDR_RANGE_HI32_BASE_IDX	0
+
+#define V9_PIPE_PER_MEC		(4)
+#define V9_QUEUES_PER_PIPE_MEC	(8)
+
+enum hqd_dequeue_request_type {
+	NO_ACTION = 0,
+	DRAIN_PIPE,
+	RESET_WAVES
+};
+
+/*
+ * Register access functions
+ */
+
+static void kgd_program_sh_mem_settings(struct kgd_dev *kgd, uint32_t vmid,
+		uint32_t sh_mem_config,
+		uint32_t sh_mem_ape1_base, uint32_t sh_mem_ape1_limit,
+		uint32_t sh_mem_bases);
+static int kgd_set_pasid_vmid_mapping(struct kgd_dev *kgd, unsigned int pasid,
+		unsigned int vmid);
+static int kgd_init_interrupts(struct kgd_dev *kgd, uint32_t pipe_id);
+static int kgd_hqd_load(struct kgd_dev *kgd, void *mqd, uint32_t pipe_id,
+			uint32_t queue_id, uint32_t __user *wptr,
+			uint32_t wptr_shift, uint32_t wptr_mask,
+			struct mm_struct *mm);
+static int kgd_hqd_dump(struct kgd_dev *kgd,
+			uint32_t pipe_id, uint32_t queue_id,
+			uint32_t (**dump)[2], uint32_t *n_regs);
+static int kgd_hqd_sdma_load(struct kgd_dev *kgd, void *mqd,
+			     uint32_t __user *wptr, struct mm_struct *mm);
+static int kgd_hqd_sdma_dump(struct kgd_dev *kgd,
+			     uint32_t engine_id, uint32_t queue_id,
+			     uint32_t (**dump)[2], uint32_t *n_regs);
+static bool kgd_hqd_is_occupied(struct kgd_dev *kgd, uint64_t queue_address,
+		uint32_t pipe_id, uint32_t queue_id);
+static bool kgd_hqd_sdma_is_occupied(struct kgd_dev *kgd, void *mqd);
+static int kgd_hqd_destroy(struct kgd_dev *kgd, void *mqd,
+				enum kfd_preempt_type reset_type,
+				unsigned int utimeout, uint32_t pipe_id,
+				uint32_t queue_id);
+static int kgd_hqd_sdma_destroy(struct kgd_dev *kgd, void *mqd,
+				unsigned int utimeout);
+static int kgd_address_watch_disable(struct kgd_dev *kgd);
+static int kgd_address_watch_execute(struct kgd_dev *kgd,
+					unsigned int watch_point_id,
+					uint32_t cntl_val,
+					uint32_t addr_hi,
+					uint32_t addr_lo);
+static int kgd_wave_control_execute(struct kgd_dev *kgd,
+					uint32_t gfx_index_val,
+					uint32_t sq_cmd);
+static uint32_t kgd_address_watch_get_offset(struct kgd_dev *kgd,
+					unsigned int watch_point_id,
+					unsigned int reg_offset);
+
+static bool get_atc_vmid_pasid_mapping_valid(struct kgd_dev *kgd,
+		uint8_t vmid);
+static uint16_t get_atc_vmid_pasid_mapping_pasid(struct kgd_dev *kgd,
+		uint8_t vmid);
+static void set_vm_context_page_table_base(struct kgd_dev *kgd, uint32_t vmid,
+		uint32_t page_table_base);
+static uint16_t get_fw_version(struct kgd_dev *kgd, enum kgd_engine_type type);
+static void set_scratch_backing_va(struct kgd_dev *kgd,
+					uint64_t va, uint32_t vmid);
+static int invalidate_tlbs(struct kgd_dev *kgd, uint16_t pasid);
+static int invalidate_tlbs_vmid(struct kgd_dev *kgd, uint16_t vmid);
+
+/* Because of REG_GET_FIELD() being used, we put this function in the
+ * asic specific file.
+ */
+static int amdgpu_amdkfd_get_tile_config(struct kgd_dev *kgd,
+		struct tile_config *config)
+{
+	struct amdgpu_device *adev = (struct amdgpu_device *)kgd;
+
+	config->gb_addr_config = adev->gfx.config.gb_addr_config;
+
+	config->tile_config_ptr = adev->gfx.config.tile_mode_array;
+	config->num_tile_configs =
+			ARRAY_SIZE(adev->gfx.config.tile_mode_array);
+	config->macro_tile_config_ptr =
+			adev->gfx.config.macrotile_mode_array;
+	config->num_macro_tile_configs =
+			ARRAY_SIZE(adev->gfx.config.macrotile_mode_array);
+
+	return 0;
+}
+
+static const struct kfd2kgd_calls kfd2kgd = {
+	.init_gtt_mem_allocation = alloc_gtt_mem,
+	.free_gtt_mem = free_gtt_mem,
+	.get_local_mem_info = get_local_mem_info,
+	.get_gpu_clock_counter = get_gpu_clock_counter,
+	.get_max_engine_clock_in_mhz = get_max_engine_clock_in_mhz,
+	.alloc_pasid = amdgpu_pasid_alloc,
+	.free_pasid = amdgpu_pasid_free,
+	.program_sh_mem_settings = kgd_program_sh_mem_settings,
+	.set_pasid_vmid_mapping = kgd_set_pasid_vmid_mapping,
+	.init_interrupts = kgd_init_interrupts,
+	.hqd_load = kgd_hqd_load,
+	.hqd_sdma_load = kgd_hqd_sdma_load,
+	.hqd_dump = kgd_hqd_dump,
+	.hqd_sdma_dump = kgd_hqd_sdma_dump,
+	.hqd_is_occupied = kgd_hqd_is_occupied,
+	.hqd_sdma_is_occupied = kgd_hqd_sdma_is_occupied,
+	.hqd_destroy = kgd_hqd_destroy,
+	.hqd_sdma_destroy = kgd_hqd_sdma_destroy,
+	.address_watch_disable = kgd_address_watch_disable,
+	.address_watch_execute = kgd_address_watch_execute,
+	.wave_control_execute = kgd_wave_control_execute,
+	.address_watch_get_offset = kgd_address_watch_get_offset,
+	.get_atc_vmid_pasid_mapping_pasid =
+			get_atc_vmid_pasid_mapping_pasid,
+	.get_atc_vmid_pasid_mapping_valid =
+			get_atc_vmid_pasid_mapping_valid,
+	.get_fw_version = get_fw_version,
+	.set_scratch_backing_va = set_scratch_backing_va,
+	.get_tile_config = amdgpu_amdkfd_get_tile_config,
+	.get_cu_info = get_cu_info,
+	.get_vram_usage = amdgpu_amdkfd_get_vram_usage,
+	.create_process_vm = amdgpu_amdkfd_gpuvm_create_process_vm,
+	.acquire_process_vm = amdgpu_amdkfd_gpuvm_acquire_process_vm,
+	.destroy_process_vm = amdgpu_amdkfd_gpuvm_destroy_process_vm,
+	.get_process_page_dir = amdgpu_amdkfd_gpuvm_get_process_page_dir,
+	.set_vm_context_page_table_base = set_vm_context_page_table_base,
+	.alloc_memory_of_gpu = amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu,
+	.free_memory_of_gpu = amdgpu_amdkfd_gpuvm_free_memory_of_gpu,
+	.map_memory_to_gpu = amdgpu_amdkfd_gpuvm_map_memory_to_gpu,
+	.unmap_memory_to_gpu = amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu,
+	.sync_memory = amdgpu_amdkfd_gpuvm_sync_memory,
+	.map_gtt_bo_to_kernel = amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel,
+	.restore_process_bos = amdgpu_amdkfd_gpuvm_restore_process_bos,
+	.invalidate_tlbs = invalidate_tlbs,
+	.invalidate_tlbs_vmid = invalidate_tlbs_vmid,
+	.submit_ib = amdgpu_amdkfd_submit_ib,
+};
+
+struct kfd2kgd_calls *amdgpu_amdkfd_gfx_9_0_get_functions(void)
+{
+	return (struct kfd2kgd_calls *)&kfd2kgd;
+}
+
+static inline struct amdgpu_device *get_amdgpu_device(struct kgd_dev *kgd)
+{
+	return (struct amdgpu_device *)kgd;
+}
+
+static void lock_srbm(struct kgd_dev *kgd, uint32_t mec, uint32_t pipe,
+			uint32_t queue, uint32_t vmid)
+{
+	struct amdgpu_device *adev = get_amdgpu_device(kgd);
+
+	mutex_lock(&adev->srbm_mutex);
+	soc15_grbm_select(adev, mec, pipe, queue, vmid);
+}
+
+static void unlock_srbm(struct kgd_dev *kgd)
+{
+	struct amdgpu_device *adev = get_amdgpu_device(kgd);
+
+	soc15_grbm_select(adev, 0, 0, 0, 0);
+	mutex_unlock(&adev->srbm_mutex);
+}
+
+static void acquire_queue(struct kgd_dev *kgd, uint32_t pipe_id,
+				uint32_t queue_id)
+{
+	struct amdgpu_device *adev = get_amdgpu_device(kgd);
+
+	uint32_t mec = (pipe_id / adev->gfx.mec.num_pipe_per_mec) + 1;
+	uint32_t pipe = (pipe_id % adev->gfx.mec.num_pipe_per_mec);
+
+	lock_srbm(kgd, mec, pipe, queue_id, 0);
+}
+
+static uint32_t get_queue_mask(struct amdgpu_device *adev,
+			       uint32_t pipe_id, uint32_t queue_id)
+{
+	unsigned int bit = (pipe_id * adev->gfx.mec.num_queue_per_pipe +
+			    queue_id) & 31;
+
+	return ((uint32_t)1) << bit;
+}
+
+static void release_queue(struct kgd_dev *kgd)
+{
+	unlock_srbm(kgd);
+}
+
+static void kgd_program_sh_mem_settings(struct kgd_dev *kgd, uint32_t vmid,
+					uint32_t sh_mem_config,
+					uint32_t sh_mem_ape1_base,
+					uint32_t sh_mem_ape1_limit,
+					uint32_t sh_mem_bases)
+{
+	struct amdgpu_device *adev = get_amdgpu_device(kgd);
+
+	lock_srbm(kgd, 0, 0, 0, vmid);
+
+	WREG32(SOC15_REG_OFFSET(GC, 0, mmSH_MEM_CONFIG), sh_mem_config);
+	WREG32(SOC15_REG_OFFSET(GC, 0, mmSH_MEM_BASES), sh_mem_bases);
+	/* APE1 no longer exists on GFX9 */
+
+	unlock_srbm(kgd);
+}
+
+static int kgd_set_pasid_vmid_mapping(struct kgd_dev *kgd, unsigned int pasid,
+					unsigned int vmid)
+{
+	struct amdgpu_device *adev = get_amdgpu_device(kgd);
+
+	/*
+	 * We have to assume that there is no outstanding mapping.
+	 * The ATC_VMID_PASID_MAPPING_UPDATE_STATUS bit could be 0 because
+	 * a mapping is in progress or because a mapping finished
+	 * and the SW cleared it.
+	 * So the protocol is to always wait & clear.
+	 */
+	uint32_t pasid_mapping = (pasid == 0) ? 0 : (uint32_t)pasid |
+			ATC_VMID0_PASID_MAPPING__VALID_MASK;
+
+	/*
+	 * need to do this twice, once for gfx and once for mmhub
+	 * for ATC add 16 to VMID for mmhub, for IH different registers.
+	 * ATC_VMID0..15 registers are separate from ATC_VMID16..31.
+	 */
+
+	WREG32(SOC15_REG_OFFSET(ATHUB, 0, mmATC_VMID0_PASID_MAPPING) + vmid,
+	       pasid_mapping);
+
+	while (!(RREG32(SOC15_REG_OFFSET(
+				ATHUB, 0,
+				mmATC_VMID_PASID_MAPPING_UPDATE_STATUS)) &
+		 (1U << vmid)))
+		cpu_relax();
+
+	WREG32(SOC15_REG_OFFSET(ATHUB, 0,
+				mmATC_VMID_PASID_MAPPING_UPDATE_STATUS),
+	       1U << vmid);
+
+	/* Mapping vmid to pasid also for IH block */
+	WREG32(SOC15_REG_OFFSET(OSSSYS, 0, mmIH_VMID_0_LUT) + vmid,
+	       pasid_mapping);
+
+	WREG32(SOC15_REG_OFFSET(ATHUB, 0, mmATC_VMID16_PASID_MAPPING) + vmid,
+	       pasid_mapping);
+
+	while (!(RREG32(SOC15_REG_OFFSET(
+				ATHUB, 0,
+				mmATC_VMID_PASID_MAPPING_UPDATE_STATUS)) &
+		 (1U << (vmid + 16))))
+		cpu_relax();
+
+	WREG32(SOC15_REG_OFFSET(ATHUB, 0,
+				mmATC_VMID_PASID_MAPPING_UPDATE_STATUS),
+	       1U << (vmid + 16));
+
+	/* Mapping vmid to pasid also for IH block */
+	WREG32(SOC15_REG_OFFSET(OSSSYS, 0, mmIH_VMID_0_LUT_MM) + vmid,
+	       pasid_mapping);
+	return 0;
+}
+
+/* TODO - RING0 form of field is obsolete, seems to date back to SI
+ * but still works
+ */
+
+static int kgd_init_interrupts(struct kgd_dev *kgd, uint32_t pipe_id)
+{
+	struct amdgpu_device *adev = get_amdgpu_device(kgd);
+	uint32_t mec;
+	uint32_t pipe;
+
+	mec = (pipe_id / adev->gfx.mec.num_pipe_per_mec) + 1;
+	pipe = (pipe_id % adev->gfx.mec.num_pipe_per_mec);
+
+	lock_srbm(kgd, mec, pipe, 0, 0);
+
+	WREG32(SOC15_REG_OFFSET(GC, 0, mmCPC_INT_CNTL),
+		CP_INT_CNTL_RING0__TIME_STAMP_INT_ENABLE_MASK |
+		CP_INT_CNTL_RING0__OPCODE_ERROR_INT_ENABLE_MASK);
+
+	unlock_srbm(kgd);
+
+	return 0;
+}
+
+static uint32_t get_sdma_base_addr(struct amdgpu_device *adev,
+				unsigned int engine_id,
+				unsigned int queue_id)
+{
+	uint32_t base[2] = {
+		SOC15_REG_OFFSET(SDMA0, 0,
+				 mmSDMA0_RLC0_RB_CNTL) - mmSDMA0_RLC0_RB_CNTL,
+		SOC15_REG_OFFSET(SDMA1, 0,
+				 mmSDMA1_RLC0_RB_CNTL) - mmSDMA1_RLC0_RB_CNTL
+	};
+	uint32_t retval;
+
+	retval = base[engine_id] + queue_id * (mmSDMA0_RLC1_RB_CNTL -
+					       mmSDMA0_RLC0_RB_CNTL);
+
+	pr_debug("sdma base address: 0x%x\n", retval);
+
+	return retval;
+}
+
+static inline struct v9_mqd *get_mqd(void *mqd)
+{
+	return (struct v9_mqd *)mqd;
+}
+
+static inline struct v9_sdma_mqd *get_sdma_mqd(void *mqd)
+{
+	return (struct v9_sdma_mqd *)mqd;
+}
+
+static int kgd_hqd_load(struct kgd_dev *kgd, void *mqd, uint32_t pipe_id,
+			uint32_t queue_id, uint32_t __user *wptr,
+			uint32_t wptr_shift, uint32_t wptr_mask,
+			struct mm_struct *mm)
+{
+	struct amdgpu_device *adev = get_amdgpu_device(kgd);
+	struct v9_mqd *m;
+	uint32_t *mqd_hqd;
+	uint32_t reg, hqd_base, data;
+
+	m = get_mqd(mqd);
+
+	acquire_queue(kgd, pipe_id, queue_id);
+
+	/* HIQ is set during driver init period with vmid set to 0*/
+	if (m->cp_hqd_vmid == 0) {
+		uint32_t value, mec, pipe;
+
+		mec = (pipe_id / adev->gfx.mec.num_pipe_per_mec) + 1;
+		pipe = (pipe_id % adev->gfx.mec.num_pipe_per_mec);
+
+		pr_debug("kfd: set HIQ, mec:%d, pipe:%d, queue:%d.\n",
+			mec, pipe, queue_id);
+		value = RREG32(SOC15_REG_OFFSET(GC, 0, mmRLC_CP_SCHEDULERS));
+		value = REG_SET_FIELD(value, RLC_CP_SCHEDULERS, scheduler1,
+			((mec << 5) | (pipe << 3) | queue_id | 0x80));
+		WREG32(SOC15_REG_OFFSET(GC, 0, mmRLC_CP_SCHEDULERS), value);
+	}
+
+	/* HQD registers extend from CP_MQD_BASE_ADDR to CP_HQD_EOP_WPTR_MEM. */
+	mqd_hqd = &m->cp_mqd_base_addr_lo;
+	hqd_base = SOC15_REG_OFFSET(GC, 0, mmCP_MQD_BASE_ADDR);
+
+	for (reg = hqd_base;
+	     reg <= SOC15_REG_OFFSET(GC, 0, mmCP_HQD_PQ_WPTR_HI); reg++)
+		WREG32(reg, mqd_hqd[reg - hqd_base]);
+
+
+	/* Activate doorbell logic before triggering WPTR poll. */
+	data = REG_SET_FIELD(m->cp_hqd_pq_doorbell_control,
+			     CP_HQD_PQ_DOORBELL_CONTROL, DOORBELL_EN, 1);
+	WREG32(SOC15_REG_OFFSET(GC, 0, mmCP_HQD_PQ_DOORBELL_CONTROL), data);
+
+	if (wptr) {
+		/* Don't read wptr with get_user because the user
+		 * context may not be accessible (if this function
+		 * runs in a work queue). Instead trigger a one-shot
+		 * polling read from memory in the CP. This assumes
+		 * that wptr is GPU-accessible in the queue's VMID via
+		 * ATC or SVM. WPTR==RPTR before starting the poll so
+		 * the CP starts fetching new commands from the right
+		 * place.
+		 *
+		 * Guessing a 64-bit WPTR from a 32-bit RPTR is a bit
+		 * tricky. Assume that the queue didn't overflow. The
+		 * number of valid bits in the 32-bit RPTR depends on
+		 * the queue size. The remaining bits are taken from
+		 * the saved 64-bit WPTR. If the WPTR wrapped, add the
+		 * queue size.
+		 */
+		uint32_t queue_size =
+			2 << REG_GET_FIELD(m->cp_hqd_pq_control,
+					   CP_HQD_PQ_CONTROL, QUEUE_SIZE);
+		uint64_t guessed_wptr = m->cp_hqd_pq_rptr & (queue_size - 1);
+
+		if ((m->cp_hqd_pq_wptr_lo & (queue_size - 1)) < guessed_wptr)
+			guessed_wptr += queue_size;
+		guessed_wptr += m->cp_hqd_pq_wptr_lo & ~(queue_size - 1);
+		guessed_wptr += (uint64_t)m->cp_hqd_pq_wptr_hi << 32;
+
+		WREG32(SOC15_REG_OFFSET(GC, 0, mmCP_HQD_PQ_WPTR_LO),
+		       lower_32_bits(guessed_wptr));
+		WREG32(SOC15_REG_OFFSET(GC, 0, mmCP_HQD_PQ_WPTR_HI),
+		       upper_32_bits(guessed_wptr));
+		WREG32(SOC15_REG_OFFSET(GC, 0, mmCP_HQD_PQ_WPTR_POLL_ADDR),
+		       lower_32_bits((uint64_t)wptr));
+		WREG32(SOC15_REG_OFFSET(GC, 0, mmCP_HQD_PQ_WPTR_POLL_ADDR_HI),
+		       upper_32_bits((uint64_t)wptr));
+		WREG32(SOC15_REG_OFFSET(GC, 0, mmCP_PQ_WPTR_POLL_CNTL1),
+		       get_queue_mask(adev, pipe_id, queue_id));
+	}
+
+	/* Start the EOP fetcher */
+	WREG32(SOC15_REG_OFFSET(GC, 0, mmCP_HQD_EOP_RPTR),
+	       REG_SET_FIELD(m->cp_hqd_eop_rptr,
+			     CP_HQD_EOP_RPTR, INIT_FETCHER, 1));
+
+	data = REG_SET_FIELD(m->cp_hqd_active, CP_HQD_ACTIVE, ACTIVE, 1);
+	WREG32(SOC15_REG_OFFSET(GC, 0, mmCP_HQD_ACTIVE), data);
+
+	release_queue(kgd);
+
+	return 0;
+}
+
+static int kgd_hqd_dump(struct kgd_dev *kgd,
+			uint32_t pipe_id, uint32_t queue_id,
+			uint32_t (**dump)[2], uint32_t *n_regs)
+{
+	struct amdgpu_device *adev = get_amdgpu_device(kgd);
+	uint32_t i = 0, reg;
+#define HQD_N_REGS 56
+#define DUMP_REG(addr) do {				\
+		if (WARN_ON_ONCE(i >= HQD_N_REGS))	\
+			break;				\
+		(*dump)[i][0] = (addr) << 2;		\
+		(*dump)[i++][1] = RREG32(addr);		\
+	} while (0)
+
+	*dump = kmalloc(HQD_N_REGS*2*sizeof(uint32_t), GFP_KERNEL);
+	if (*dump == NULL)
+		return -ENOMEM;
+
+	acquire_queue(kgd, pipe_id, queue_id);
+
+	for (reg = SOC15_REG_OFFSET(GC, 0, mmCP_MQD_BASE_ADDR);
+	     reg <= SOC15_REG_OFFSET(GC, 0, mmCP_HQD_PQ_WPTR_HI); reg++)
+		DUMP_REG(reg);
+
+	release_queue(kgd);
+
+	WARN_ON_ONCE(i != HQD_N_REGS);
+	*n_regs = i;
+
+	return 0;
+}
+
+static int kgd_hqd_sdma_load(struct kgd_dev *kgd, void *mqd,
+			     uint32_t __user *wptr, struct mm_struct *mm)
+{
+	struct amdgpu_device *adev = get_amdgpu_device(kgd);
+	struct v9_sdma_mqd *m;
+	uint32_t sdma_base_addr, sdmax_gfx_context_cntl;
+	unsigned long end_jiffies;
+	uint32_t data;
+	uint64_t data64;
+	uint64_t __user *wptr64 = (uint64_t __user *)wptr;
+
+	m = get_sdma_mqd(mqd);
+	sdma_base_addr = get_sdma_base_addr(adev, m->sdma_engine_id,
+					    m->sdma_queue_id);
+	sdmax_gfx_context_cntl = m->sdma_engine_id ?
+		SOC15_REG_OFFSET(SDMA1, 0, mmSDMA1_GFX_CONTEXT_CNTL) :
+		SOC15_REG_OFFSET(SDMA0, 0, mmSDMA0_GFX_CONTEXT_CNTL);
+
+	WREG32(sdma_base_addr + mmSDMA0_RLC0_RB_CNTL,
+		m->sdmax_rlcx_rb_cntl & (~SDMA0_RLC0_RB_CNTL__RB_ENABLE_MASK));
+
+	end_jiffies = msecs_to_jiffies(2000) + jiffies;
+	while (true) {
+		data = RREG32(sdma_base_addr + mmSDMA0_RLC0_CONTEXT_STATUS);
+		if (data & SDMA0_RLC0_CONTEXT_STATUS__IDLE_MASK)
+			break;
+		if (time_after(jiffies, end_jiffies))
+			return -ETIME;
+		usleep_range(500, 1000);
+	}
+	data = RREG32(sdmax_gfx_context_cntl);
+	data = REG_SET_FIELD(data, SDMA0_GFX_CONTEXT_CNTL,
+			     RESUME_CTX, 0);
+	WREG32(sdmax_gfx_context_cntl, data);
+
+	WREG32(sdma_base_addr + mmSDMA0_RLC0_DOORBELL_OFFSET,
+	       m->sdmax_rlcx_doorbell_offset);
+
+	data = REG_SET_FIELD(m->sdmax_rlcx_doorbell, SDMA0_RLC0_DOORBELL,
+			     ENABLE, 1);
+	WREG32(sdma_base_addr + mmSDMA0_RLC0_DOORBELL, data);
+	WREG32(sdma_base_addr + mmSDMA0_RLC0_RB_RPTR, m->sdmax_rlcx_rb_rptr);
+	WREG32(sdma_base_addr + mmSDMA0_RLC0_RB_RPTR_HI,
+				m->sdmax_rlcx_rb_rptr_hi);
+
+	WREG32(sdma_base_addr + mmSDMA0_RLC0_MINOR_PTR_UPDATE, 1);
+	if (read_user_wptr(mm, wptr64, data64)) {
+		WREG32(sdma_base_addr + mmSDMA0_RLC0_RB_WPTR,
+		       lower_32_bits(data64));
+		WREG32(sdma_base_addr + mmSDMA0_RLC0_RB_WPTR_HI,
+		       upper_32_bits(data64));
+	} else {
+		WREG32(sdma_base_addr + mmSDMA0_RLC0_RB_WPTR,
+		       m->sdmax_rlcx_rb_rptr);
+		WREG32(sdma_base_addr + mmSDMA0_RLC0_RB_WPTR_HI,
+		       m->sdmax_rlcx_rb_rptr_hi);
+	}
+	WREG32(sdma_base_addr + mmSDMA0_RLC0_MINOR_PTR_UPDATE, 0);
+
+	WREG32(sdma_base_addr + mmSDMA0_RLC0_RB_BASE, m->sdmax_rlcx_rb_base);
+	WREG32(sdma_base_addr + mmSDMA0_RLC0_RB_BASE_HI,
+			m->sdmax_rlcx_rb_base_hi);
+	WREG32(sdma_base_addr + mmSDMA0_RLC0_RB_RPTR_ADDR_LO,
+			m->sdmax_rlcx_rb_rptr_addr_lo);
+	WREG32(sdma_base_addr + mmSDMA0_RLC0_RB_RPTR_ADDR_HI,
+			m->sdmax_rlcx_rb_rptr_addr_hi);
+
+	data = REG_SET_FIELD(m->sdmax_rlcx_rb_cntl, SDMA0_RLC0_RB_CNTL,
+			     RB_ENABLE, 1);
+	WREG32(sdma_base_addr + mmSDMA0_RLC0_RB_CNTL, data);
+
+	return 0;
+}
+
+static int kgd_hqd_sdma_dump(struct kgd_dev *kgd,
+			     uint32_t engine_id, uint32_t queue_id,
+			     uint32_t (**dump)[2], uint32_t *n_regs)
+{
+	struct amdgpu_device *adev = get_amdgpu_device(kgd);
+	uint32_t sdma_base_addr = get_sdma_base_addr(adev, engine_id, queue_id);
+	uint32_t i = 0, reg;
+#undef HQD_N_REGS
+#define HQD_N_REGS (19+6+7+10)
+
+	*dump = kmalloc(HQD_N_REGS*2*sizeof(uint32_t), GFP_KERNEL);
+	if (*dump == NULL)
+		return -ENOMEM;
+
+	for (reg = mmSDMA0_RLC0_RB_CNTL; reg <= mmSDMA0_RLC0_DOORBELL; reg++)
+		DUMP_REG(sdma_base_addr + reg);
+	for (reg = mmSDMA0_RLC0_STATUS; reg <= mmSDMA0_RLC0_CSA_ADDR_HI; reg++)
+		DUMP_REG(sdma_base_addr + reg);
+	for (reg = mmSDMA0_RLC0_IB_SUB_REMAIN;
+	     reg <= mmSDMA0_RLC0_MINOR_PTR_UPDATE; reg++)
+		DUMP_REG(sdma_base_addr + reg);
+	for (reg = mmSDMA0_RLC0_MIDCMD_DATA0;
+	     reg <= mmSDMA0_RLC0_MIDCMD_CNTL; reg++)
+		DUMP_REG(sdma_base_addr + reg);
+
+	WARN_ON_ONCE(i != HQD_N_REGS);
+	*n_regs = i;
+
+	return 0;
+}
+
+static bool kgd_hqd_is_occupied(struct kgd_dev *kgd, uint64_t queue_address,
+				uint32_t pipe_id, uint32_t queue_id)
+{
+	struct amdgpu_device *adev = get_amdgpu_device(kgd);
+	uint32_t act;
+	bool retval = false;
+	uint32_t low, high;
+
+	acquire_queue(kgd, pipe_id, queue_id);
+	act = RREG32(SOC15_REG_OFFSET(GC, 0, mmCP_HQD_ACTIVE));
+	if (act) {
+		low = lower_32_bits(queue_address >> 8);
+		high = upper_32_bits(queue_address >> 8);
+
+		if (low == RREG32(SOC15_REG_OFFSET(GC, 0, mmCP_HQD_PQ_BASE)) &&
+		   high == RREG32(SOC15_REG_OFFSET(GC, 0, mmCP_HQD_PQ_BASE_HI)))
+			retval = true;
+	}
+	release_queue(kgd);
+	return retval;
+}
+
+static bool kgd_hqd_sdma_is_occupied(struct kgd_dev *kgd, void *mqd)
+{
+	struct amdgpu_device *adev = get_amdgpu_device(kgd);
+	struct v9_sdma_mqd *m;
+	uint32_t sdma_base_addr;
+	uint32_t sdma_rlc_rb_cntl;
+
+	m = get_sdma_mqd(mqd);
+	sdma_base_addr = get_sdma_base_addr(adev, m->sdma_engine_id,
+					    m->sdma_queue_id);
+
+	sdma_rlc_rb_cntl = RREG32(sdma_base_addr + mmSDMA0_RLC0_RB_CNTL);
+
+	if (sdma_rlc_rb_cntl & SDMA0_RLC0_RB_CNTL__RB_ENABLE_MASK)
+		return true;
+
+	return false;
+}
+
+static int kgd_hqd_destroy(struct kgd_dev *kgd, void *mqd,
+				enum kfd_preempt_type reset_type,
+				unsigned int utimeout, uint32_t pipe_id,
+				uint32_t queue_id)
+{
+	struct amdgpu_device *adev = get_amdgpu_device(kgd);
+	enum hqd_dequeue_request_type type;
+	unsigned long end_jiffies;
+	uint32_t temp;
+	struct v9_mqd *m = get_mqd(mqd);
+
+	acquire_queue(kgd, pipe_id, queue_id);
+
+	if (m->cp_hqd_vmid == 0)
+		WREG32_FIELD15(GC, 0, RLC_CP_SCHEDULERS, scheduler1, 0);
+
+	switch (reset_type) {
+	case KFD_PREEMPT_TYPE_WAVEFRONT_DRAIN:
+		type = DRAIN_PIPE;
+		break;
+	case KFD_PREEMPT_TYPE_WAVEFRONT_RESET:
+		type = RESET_WAVES;
+		break;
+	default:
+		type = DRAIN_PIPE;
+		break;
+	}
+
+	WREG32(SOC15_REG_OFFSET(GC, 0, mmCP_HQD_DEQUEUE_REQUEST), type);
+
+	end_jiffies = (utimeout * HZ / 1000) + jiffies;
+	while (true) {
+		temp = RREG32(SOC15_REG_OFFSET(GC, 0, mmCP_HQD_ACTIVE));
+		if (!(temp & CP_HQD_ACTIVE__ACTIVE_MASK))
+			break;
+		if (time_after(jiffies, end_jiffies)) {
+			pr_err("cp queue preemption time out.\n");
+			release_queue(kgd);
+			return -ETIME;
+		}
+		usleep_range(500, 1000);
+	}
+
+	release_queue(kgd);
+	return 0;
+}
+
+static int kgd_hqd_sdma_destroy(struct kgd_dev *kgd, void *mqd,
+				unsigned int utimeout)
+{
+	struct amdgpu_device *adev = get_amdgpu_device(kgd);
+	struct v9_sdma_mqd *m;
+	uint32_t sdma_base_addr;
+	uint32_t temp;
+	unsigned long end_jiffies = (utimeout * HZ / 1000) + jiffies;
+
+	m = get_sdma_mqd(mqd);
+	sdma_base_addr = get_sdma_base_addr(adev, m->sdma_engine_id,
+					    m->sdma_queue_id);
+
+	temp = RREG32(sdma_base_addr + mmSDMA0_RLC0_RB_CNTL);
+	temp = temp & ~SDMA0_RLC0_RB_CNTL__RB_ENABLE_MASK;
+	WREG32(sdma_base_addr + mmSDMA0_RLC0_RB_CNTL, temp);
+
+	while (true) {
+		temp = RREG32(sdma_base_addr + mmSDMA0_RLC0_CONTEXT_STATUS);
+		if (temp & SDMA0_RLC0_CONTEXT_STATUS__IDLE_MASK)
+			break;
+		if (time_after(jiffies, end_jiffies))
+			return -ETIME;
+		usleep_range(500, 1000);
+	}
+
+	WREG32(sdma_base_addr + mmSDMA0_RLC0_DOORBELL, 0);
+	WREG32(sdma_base_addr + mmSDMA0_RLC0_RB_CNTL,
+		RREG32(sdma_base_addr + mmSDMA0_RLC0_RB_CNTL) |
+		SDMA0_RLC0_RB_CNTL__RB_ENABLE_MASK);
+
+	m->sdmax_rlcx_rb_rptr = RREG32(sdma_base_addr + mmSDMA0_RLC0_RB_RPTR);
+	m->sdmax_rlcx_rb_rptr_hi =
+		RREG32(sdma_base_addr + mmSDMA0_RLC0_RB_RPTR_HI);
+
+	return 0;
+}
+
+static bool get_atc_vmid_pasid_mapping_valid(struct kgd_dev *kgd,
+							uint8_t vmid)
+{
+	uint32_t reg;
+	struct amdgpu_device *adev = (struct amdgpu_device *) kgd;
+
+	reg = RREG32(SOC15_REG_OFFSET(ATHUB, 0, mmATC_VMID0_PASID_MAPPING)
+		     + vmid);
+	return reg & ATC_VMID0_PASID_MAPPING__VALID_MASK;
+}
+
+static uint16_t get_atc_vmid_pasid_mapping_pasid(struct kgd_dev *kgd,
+								uint8_t vmid)
+{
+	uint32_t reg;
+	struct amdgpu_device *adev = (struct amdgpu_device *) kgd;
+
+	reg = RREG32(SOC15_REG_OFFSET(ATHUB, 0, mmATC_VMID0_PASID_MAPPING)
+		     + vmid);
+	return reg & ATC_VMID0_PASID_MAPPING__PASID_MASK;
+}
+
+static void write_vmid_invalidate_request(struct kgd_dev *kgd, uint8_t vmid)
+{
+	struct amdgpu_device *adev = (struct amdgpu_device *) kgd;
+	uint32_t req = (1 << vmid) |
+		(0 << VM_INVALIDATE_ENG16_REQ__FLUSH_TYPE__SHIFT) | /* legacy */
+		VM_INVALIDATE_ENG16_REQ__INVALIDATE_L2_PTES_MASK |
+		VM_INVALIDATE_ENG16_REQ__INVALIDATE_L2_PDE0_MASK |
+		VM_INVALIDATE_ENG16_REQ__INVALIDATE_L2_PDE1_MASK |
+		VM_INVALIDATE_ENG16_REQ__INVALIDATE_L2_PDE2_MASK |
+		VM_INVALIDATE_ENG16_REQ__INVALIDATE_L1_PTES_MASK;
+
+	mutex_lock(&adev->srbm_mutex);
+
+	/* Use legacy mode tlb invalidation.
+	 *
+	 * Currently on Raven the code below is broken for anything but
+	 * legacy mode due to a MMHUB power gating problem. A workaround
+	 * is for MMHUB to wait until the condition PER_VMID_INVALIDATE_REQ
+	 * == PER_VMID_INVALIDATE_ACK instead of simply waiting for the ack
+	 * bit.
+	 *
+	 * TODO 1: agree on the right set of invalidation registers for
+	 * KFD use. Use the last one for now. Invalidate both GC and
+	 * MMHUB.
+	 *
+	 * TODO 2: support range-based invalidation, requires kfg2kgd
+	 * interface change
+	 */
+	WREG32(SOC15_REG_OFFSET(GC, 0, mmVM_INVALIDATE_ENG16_ADDR_RANGE_LO32),
+				0xffffffff);
+	WREG32(SOC15_REG_OFFSET(GC, 0, mmVM_INVALIDATE_ENG16_ADDR_RANGE_HI32),
+				0x0000001f);
+
+	WREG32(SOC15_REG_OFFSET(MMHUB, 0,
+				mmMMHUB_VM_INVALIDATE_ENG16_ADDR_RANGE_LO32),
+				0xffffffff);
+	WREG32(SOC15_REG_OFFSET(MMHUB, 0,
+				mmMMHUB_VM_INVALIDATE_ENG16_ADDR_RANGE_HI32),
+				0x0000001f);
+
+	WREG32(SOC15_REG_OFFSET(GC, 0, mmVM_INVALIDATE_ENG16_REQ), req);
+
+	WREG32(SOC15_REG_OFFSET(MMHUB, 0, mmMMHUB_VM_INVALIDATE_ENG16_REQ),
+				req);
+
+	while (!(RREG32(SOC15_REG_OFFSET(GC, 0, mmVM_INVALIDATE_ENG16_ACK)) &
+					(1 << vmid)))
+		cpu_relax();
+
+	while (!(RREG32(SOC15_REG_OFFSET(MMHUB, 0,
+					mmMMHUB_VM_INVALIDATE_ENG16_ACK)) &
+					(1 << vmid)))
+		cpu_relax();
+
+	mutex_unlock(&adev->srbm_mutex);
+
+}
+
+static int invalidate_tlbs_with_kiq(struct amdgpu_device *adev, uint16_t pasid)
+{
+	signed long r;
+	uint32_t seq;
+	struct amdgpu_ring *ring = &adev->gfx.kiq.ring;
+
+	spin_lock(&adev->gfx.kiq.ring_lock);
+	amdgpu_ring_alloc(ring, 12); /* fence + invalidate_tlbs package*/
+	amdgpu_ring_write(ring, PACKET3(PACKET3_INVALIDATE_TLBS, 0));
+	amdgpu_ring_write(ring,
+			PACKET3_INVALIDATE_TLBS_DST_SEL(1) |
+			PACKET3_INVALIDATE_TLBS_ALL_HUB(1) |
+			PACKET3_INVALIDATE_TLBS_PASID(pasid) |
+			PACKET3_INVALIDATE_TLBS_FLUSH_TYPE(0)); /* legacy */
+	amdgpu_fence_emit_polling(ring, &seq);
+	amdgpu_ring_commit(ring);
+	spin_unlock(&adev->gfx.kiq.ring_lock);
+
+	r = amdgpu_fence_wait_polling(ring, seq, adev->usec_timeout);
+	if (r < 1) {
+		DRM_ERROR("wait for kiq fence error: %ld.\n", r);
+		return -ETIME;
+	}
+
+	return 0;
+}
+
+static int invalidate_tlbs(struct kgd_dev *kgd, uint16_t pasid)
+{
+	struct amdgpu_device *adev = (struct amdgpu_device *) kgd;
+	int vmid;
+	struct amdgpu_ring *ring = &adev->gfx.kiq.ring;
+
+	if (ring->ready)
+		return invalidate_tlbs_with_kiq(adev, pasid);
+
+	for (vmid = 0; vmid < 16; vmid++) {
+		if (!amdgpu_amdkfd_is_kfd_vmid(adev, vmid))
+			continue;
+		if (get_atc_vmid_pasid_mapping_valid(kgd, vmid)) {
+			if (get_atc_vmid_pasid_mapping_pasid(kgd, vmid)
+				== pasid) {
+				write_vmid_invalidate_request(kgd, vmid);
+				break;
+			}
+		}
+	}
+
+	return 0;
+}
+
+static int invalidate_tlbs_vmid(struct kgd_dev *kgd, uint16_t vmid)
+{
+	struct amdgpu_device *adev = (struct amdgpu_device *) kgd;
+
+	if (!amdgpu_amdkfd_is_kfd_vmid(adev, vmid)) {
+		pr_err("non kfd vmid %d\n", vmid);
+		return 0;
+	}
+
+	write_vmid_invalidate_request(kgd, vmid);
+	return 0;
+}
+
+static int kgd_address_watch_disable(struct kgd_dev *kgd)
+{
+	return 0;
+}
+
+static int kgd_address_watch_execute(struct kgd_dev *kgd,
+					unsigned int watch_point_id,
+					uint32_t cntl_val,
+					uint32_t addr_hi,
+					uint32_t addr_lo)
+{
+	return 0;
+}
+
+static int kgd_wave_control_execute(struct kgd_dev *kgd,
+					uint32_t gfx_index_val,
+					uint32_t sq_cmd)
+{
+	struct amdgpu_device *adev = get_amdgpu_device(kgd);
+	uint32_t data = 0;
+
+	mutex_lock(&adev->grbm_idx_mutex);
+
+	WREG32(SOC15_REG_OFFSET(GC, 0, mmGRBM_GFX_INDEX), gfx_index_val);
+	WREG32(SOC15_REG_OFFSET(GC, 0, mmSQ_CMD), sq_cmd);
+
+	data = REG_SET_FIELD(data, GRBM_GFX_INDEX,
+		INSTANCE_BROADCAST_WRITES, 1);
+	data = REG_SET_FIELD(data, GRBM_GFX_INDEX,
+		SH_BROADCAST_WRITES, 1);
+	data = REG_SET_FIELD(data, GRBM_GFX_INDEX,
+		SE_BROADCAST_WRITES, 1);
+
+	WREG32(SOC15_REG_OFFSET(GC, 0, mmGRBM_GFX_INDEX), data);
+	mutex_unlock(&adev->grbm_idx_mutex);
+
+	return 0;
+}
+
+static uint32_t kgd_address_watch_get_offset(struct kgd_dev *kgd,
+					unsigned int watch_point_id,
+					unsigned int reg_offset)
+{
+	return 0;
+}
+
+static void set_scratch_backing_va(struct kgd_dev *kgd,
+					uint64_t va, uint32_t vmid)
+{
+	/* No longer needed on GFXv9. The scratch base address is
+	 * passed to the shader by the CP. It's the user mode driver's
+	 * responsibility.
+	 */
+}
+
+/* FIXME: Does this need to be ASIC-specific code? */
+static uint16_t get_fw_version(struct kgd_dev *kgd, enum kgd_engine_type type)
+{
+	struct amdgpu_device *adev = (struct amdgpu_device *) kgd;
+	const union amdgpu_firmware_header *hdr;
+
+	switch (type) {
+	case KGD_ENGINE_PFP:
+		hdr = (const union amdgpu_firmware_header *)adev->gfx.pfp_fw->data;
+		break;
+
+	case KGD_ENGINE_ME:
+		hdr = (const union amdgpu_firmware_header *)adev->gfx.me_fw->data;
+		break;
+
+	case KGD_ENGINE_CE:
+		hdr = (const union amdgpu_firmware_header *)adev->gfx.ce_fw->data;
+		break;
+
+	case KGD_ENGINE_MEC1:
+		hdr = (const union amdgpu_firmware_header *)adev->gfx.mec_fw->data;
+		break;
+
+	case KGD_ENGINE_MEC2:
+		hdr = (const union amdgpu_firmware_header *)adev->gfx.mec2_fw->data;
+		break;
+
+	case KGD_ENGINE_RLC:
+		hdr = (const union amdgpu_firmware_header *)adev->gfx.rlc_fw->data;
+		break;
+
+	case KGD_ENGINE_SDMA1:
+		hdr = (const union amdgpu_firmware_header *)adev->sdma.instance[0].fw->data;
+		break;
+
+	case KGD_ENGINE_SDMA2:
+		hdr = (const union amdgpu_firmware_header *)adev->sdma.instance[1].fw->data;
+		break;
+
+	default:
+		return 0;
+	}
+
+	if (hdr == NULL)
+		return 0;
+
+	/* Only 12 bit in use*/
+	return hdr->common.ucode_version;
+}
+
+static void set_vm_context_page_table_base(struct kgd_dev *kgd, uint32_t vmid,
+		uint32_t page_table_base)
+{
+	struct amdgpu_device *adev = get_amdgpu_device(kgd);
+	uint64_t base = (uint64_t)page_table_base << PAGE_SHIFT |
+		AMDGPU_PTE_VALID;
+
+	if (!amdgpu_amdkfd_is_kfd_vmid(adev, vmid)) {
+		pr_err("trying to set page table base for wrong VMID %u\n",
+		       vmid);
+		return;
+	}
+
+	/* TODO: take advantage of per-process address space size. For
+	 * now, all processes share the same address space size, like
+	 * on GFX8 and older.
+	 */
+	WREG32(SOC15_REG_OFFSET(MMHUB, 0, mmMMHUB_VM_CONTEXT0_PAGE_TABLE_START_ADDR_LO32) + (vmid*2), 0);
+	WREG32(SOC15_REG_OFFSET(MMHUB, 0, mmMMHUB_VM_CONTEXT0_PAGE_TABLE_START_ADDR_HI32) + (vmid*2), 0);
+
+	WREG32(SOC15_REG_OFFSET(MMHUB, 0, mmMMHUB_VM_CONTEXT0_PAGE_TABLE_END_ADDR_LO32) + (vmid*2),
+			lower_32_bits(adev->vm_manager.max_pfn - 1));
+	WREG32(SOC15_REG_OFFSET(MMHUB, 0, mmMMHUB_VM_CONTEXT0_PAGE_TABLE_END_ADDR_HI32) + (vmid*2),
+			upper_32_bits(adev->vm_manager.max_pfn - 1));
+
+	WREG32(SOC15_REG_OFFSET(MMHUB, 0, mmMMHUB_VM_CONTEXT0_PAGE_TABLE_BASE_ADDR_LO32) + (vmid*2), lower_32_bits(base));
+	WREG32(SOC15_REG_OFFSET(MMHUB, 0, mmMMHUB_VM_CONTEXT0_PAGE_TABLE_BASE_ADDR_HI32) + (vmid*2), upper_32_bits(base));
+
+	WREG32(SOC15_REG_OFFSET(GC, 0, mmVM_CONTEXT0_PAGE_TABLE_START_ADDR_LO32) + (vmid*2), 0);
+	WREG32(SOC15_REG_OFFSET(GC, 0, mmVM_CONTEXT0_PAGE_TABLE_START_ADDR_HI32) + (vmid*2), 0);
+
+	WREG32(SOC15_REG_OFFSET(GC, 0, mmVM_CONTEXT0_PAGE_TABLE_END_ADDR_LO32) + (vmid*2),
+			lower_32_bits(adev->vm_manager.max_pfn - 1));
+	WREG32(SOC15_REG_OFFSET(GC, 0, mmVM_CONTEXT0_PAGE_TABLE_END_ADDR_HI32) + (vmid*2),
+			upper_32_bits(adev->vm_manager.max_pfn - 1));
+
+	WREG32(SOC15_REG_OFFSET(GC, 0, mmVM_CONTEXT0_PAGE_TABLE_BASE_ADDR_LO32) + (vmid*2), lower_32_bits(base));
+	WREG32(SOC15_REG_OFFSET(GC, 0, mmVM_CONTEXT0_PAGE_TABLE_BASE_ADDR_HI32) + (vmid*2), upper_32_bits(base));
+}
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index 1ae3de1..68aad8b 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -4627,6 +4627,7 @@ static int gfx_v9_0_get_cu_info(struct amdgpu_device *adev,
 
 	cu_info->number = active_cu_number;
 	cu_info->ao_cu_mask = ao_cu_mask;
+	cu_info->simd_per_cu = NUM_SIMD_PER_CU;
 
 	return 0;
 }
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 05/21] drm/amdgpu: Add doorbell routing info to kgd2kfd_shared_resources
       [not found] ` <1523395998-31314-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (3 preceding siblings ...)
  2018-04-10 21:33   ` [PATCH 04/21] drm/amdgpu: Add GFXv9 kfd2kgd interface functions Felix Kuehling
@ 2018-04-10 21:33   ` Felix Kuehling
  2018-04-10 21:33   ` [PATCH 06/21] drm/amdkfd: Make doorbell size ASIC-dependent Felix Kuehling
                     ` (18 subsequent siblings)
  23 siblings, 0 replies; 38+ messages in thread
From: Felix Kuehling @ 2018-04-10 21:33 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling

This is needed for Vega10 and later ASICs to let KFD know which
doorbells can be used for SDMA and CP queues respectively.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c      | 22 ++++++++++++++++++++++
 drivers/gpu/drm/amd/include/kgd_kfd_interface.h | 15 +++++++++++++++
 2 files changed, 37 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index fcd10db..cd0e8f1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -179,6 +179,28 @@ void amdgpu_amdkfd_device_init(struct amdgpu_device *adev)
 				&gpu_resources.doorbell_physical_address,
 				&gpu_resources.doorbell_aperture_size,
 				&gpu_resources.doorbell_start_offset);
+		if (adev->asic_type >= CHIP_VEGA10) {
+			/* On SOC15 the BIF is involved in routing
+			 * doorbells using the low 12 bits of the
+			 * address. Communicate the assignments to
+			 * KFD. KFD uses two doorbell pages per
+			 * process in case of 64-bit doorbells so we
+			 * can use each doorbell assignment twice.
+			 */
+			gpu_resources.sdma_doorbell[0][0] =
+				AMDGPU_DOORBELL64_sDMA_ENGINE0;
+			gpu_resources.sdma_doorbell[0][1] =
+				AMDGPU_DOORBELL64_sDMA_ENGINE0 + 0x200;
+			gpu_resources.sdma_doorbell[1][0] =
+				AMDGPU_DOORBELL64_sDMA_ENGINE1;
+			gpu_resources.sdma_doorbell[1][1] =
+				AMDGPU_DOORBELL64_sDMA_ENGINE1 + 0x200;
+			/* Doorbells 0x0f0-0ff and 0x2f0-2ff are reserved for
+			 * SDMA, IH and VCN. So don't use them for the CP.
+			 */
+			gpu_resources.reserved_doorbell_mask = 0x1f0;
+			gpu_resources.reserved_doorbell_val  = 0x0f0;
+		}
 
 		kgd2kfd->device_init(adev->kfd, &gpu_resources);
 	}
diff --git a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
index 7cf3506..5733fbe 100644
--- a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
+++ b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
@@ -100,6 +100,21 @@ struct kgd2kfd_shared_resources {
 	/* Bit n == 1 means Queue n is available for KFD */
 	DECLARE_BITMAP(queue_bitmap, KGD_MAX_QUEUES);
 
+	/* Doorbell assignments (SOC15 and later chips only). Only
+	 * specific doorbells are routed to each SDMA engine. Others
+	 * are routed to IH and VCN. They are not usable by the CP.
+	 *
+	 * Any doorbell number D that satisfies the following condition
+	 * is reserved: (D & reserved_doorbell_mask) == reserved_doorbell_val
+	 *
+	 * KFD currently uses 1024 (= 0x3ff) doorbells per process. If
+	 * doorbells 0x0f0-0x0f7 and 0x2f-0x2f7 are reserved, that means
+	 * mask would be set to 0x1f8 and val set to 0x0f0.
+	 */
+	unsigned int sdma_doorbell[2][2];
+	unsigned int reserved_doorbell_mask;
+	unsigned int reserved_doorbell_val;
+
 	/* Base address of doorbell aperture. */
 	phys_addr_t doorbell_physical_address;
 
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 06/21] drm/amdkfd: Make doorbell size ASIC-dependent
       [not found] ` <1523395998-31314-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (4 preceding siblings ...)
  2018-04-10 21:33   ` [PATCH 05/21] drm/amdgpu: Add doorbell routing info to kgd2kfd_shared_resources Felix Kuehling
@ 2018-04-10 21:33   ` Felix Kuehling
  2018-04-10 21:33   ` [PATCH 07/21] drm/amdkfd: Clean up KFD_MMAP_ offset handling Felix Kuehling
                     ` (17 subsequent siblings)
  23 siblings, 0 replies; 38+ messages in thread
From: Felix Kuehling @ 2018-04-10 21:33 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling

This prepares for GFXv9 (Vega10), which has 64-bit doorbells.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_device.c   | 10 +++++++
 drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c | 48 ++++++++++++++++---------------
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h     |  7 +++--
 3 files changed, 39 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index 7b57995..f563acb 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -41,6 +41,7 @@ static const struct kfd_device_info kaveri_device_info = {
 	.max_pasid_bits = 16,
 	/* max num of queues for KV.TODO should be a dynamic value */
 	.max_no_of_hqd	= 24,
+	.doorbell_size  = 4,
 	.ih_ring_entry_size = 4 * sizeof(uint32_t),
 	.event_interrupt_class = &event_interrupt_class_cik,
 	.num_of_watch_points = 4,
@@ -55,6 +56,7 @@ static const struct kfd_device_info carrizo_device_info = {
 	.max_pasid_bits = 16,
 	/* max num of queues for CZ.TODO should be a dynamic value */
 	.max_no_of_hqd	= 24,
+	.doorbell_size  = 4,
 	.ih_ring_entry_size = 4 * sizeof(uint32_t),
 	.event_interrupt_class = &event_interrupt_class_cik,
 	.num_of_watch_points = 4,
@@ -70,6 +72,7 @@ static const struct kfd_device_info hawaii_device_info = {
 	.max_pasid_bits = 16,
 	/* max num of queues for KV.TODO should be a dynamic value */
 	.max_no_of_hqd	= 24,
+	.doorbell_size  = 4,
 	.ih_ring_entry_size = 4 * sizeof(uint32_t),
 	.event_interrupt_class = &event_interrupt_class_cik,
 	.num_of_watch_points = 4,
@@ -83,6 +86,7 @@ static const struct kfd_device_info tonga_device_info = {
 	.asic_family = CHIP_TONGA,
 	.max_pasid_bits = 16,
 	.max_no_of_hqd  = 24,
+	.doorbell_size  = 4,
 	.ih_ring_entry_size = 4 * sizeof(uint32_t),
 	.event_interrupt_class = &event_interrupt_class_cik,
 	.num_of_watch_points = 4,
@@ -96,6 +100,7 @@ static const struct kfd_device_info tonga_vf_device_info = {
 	.asic_family = CHIP_TONGA,
 	.max_pasid_bits = 16,
 	.max_no_of_hqd  = 24,
+	.doorbell_size  = 4,
 	.ih_ring_entry_size = 4 * sizeof(uint32_t),
 	.event_interrupt_class = &event_interrupt_class_cik,
 	.num_of_watch_points = 4,
@@ -109,6 +114,7 @@ static const struct kfd_device_info fiji_device_info = {
 	.asic_family = CHIP_FIJI,
 	.max_pasid_bits = 16,
 	.max_no_of_hqd  = 24,
+	.doorbell_size  = 4,
 	.ih_ring_entry_size = 4 * sizeof(uint32_t),
 	.event_interrupt_class = &event_interrupt_class_cik,
 	.num_of_watch_points = 4,
@@ -122,6 +128,7 @@ static const struct kfd_device_info fiji_vf_device_info = {
 	.asic_family = CHIP_FIJI,
 	.max_pasid_bits = 16,
 	.max_no_of_hqd  = 24,
+	.doorbell_size  = 4,
 	.ih_ring_entry_size = 4 * sizeof(uint32_t),
 	.event_interrupt_class = &event_interrupt_class_cik,
 	.num_of_watch_points = 4,
@@ -136,6 +143,7 @@ static const struct kfd_device_info polaris10_device_info = {
 	.asic_family = CHIP_POLARIS10,
 	.max_pasid_bits = 16,
 	.max_no_of_hqd  = 24,
+	.doorbell_size  = 4,
 	.ih_ring_entry_size = 4 * sizeof(uint32_t),
 	.event_interrupt_class = &event_interrupt_class_cik,
 	.num_of_watch_points = 4,
@@ -149,6 +157,7 @@ static const struct kfd_device_info polaris10_vf_device_info = {
 	.asic_family = CHIP_POLARIS10,
 	.max_pasid_bits = 16,
 	.max_no_of_hqd  = 24,
+	.doorbell_size  = 4,
 	.ih_ring_entry_size = 4 * sizeof(uint32_t),
 	.event_interrupt_class = &event_interrupt_class_cik,
 	.num_of_watch_points = 4,
@@ -162,6 +171,7 @@ static const struct kfd_device_info polaris11_device_info = {
 	.asic_family = CHIP_POLARIS11,
 	.max_pasid_bits = 16,
 	.max_no_of_hqd  = 24,
+	.doorbell_size  = 4,
 	.ih_ring_entry_size = 4 * sizeof(uint32_t),
 	.event_interrupt_class = &event_interrupt_class_cik,
 	.num_of_watch_points = 4,
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c b/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
index ebb4da14..4840314 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
@@ -33,7 +33,6 @@
 
 static DEFINE_IDA(doorbell_ida);
 static unsigned int max_doorbell_slices;
-#define KFD_SIZE_OF_DOORBELL_IN_BYTES 4
 
 /*
  * Each device exposes a doorbell aperture, a PCI MMIO aperture that
@@ -50,9 +49,9 @@ static unsigned int max_doorbell_slices;
  */
 
 /* # of doorbell bytes allocated for each process. */
-static inline size_t doorbell_process_allocation(void)
+static size_t kfd_doorbell_process_slice(struct kfd_dev *kfd)
 {
-	return roundup(KFD_SIZE_OF_DOORBELL_IN_BYTES *
+	return roundup(kfd->device_info->doorbell_size *
 			KFD_MAX_NUM_OF_QUEUES_PER_PROCESS,
 			PAGE_SIZE);
 }
@@ -72,16 +71,16 @@ int kfd_doorbell_init(struct kfd_dev *kfd)
 
 	doorbell_start_offset =
 			roundup(kfd->shared_resources.doorbell_start_offset,
-					doorbell_process_allocation());
+					kfd_doorbell_process_slice(kfd));
 
 	doorbell_aperture_size =
 			rounddown(kfd->shared_resources.doorbell_aperture_size,
-					doorbell_process_allocation());
+					kfd_doorbell_process_slice(kfd));
 
 	if (doorbell_aperture_size > doorbell_start_offset)
 		doorbell_process_limit =
 			(doorbell_aperture_size - doorbell_start_offset) /
-						doorbell_process_allocation();
+						kfd_doorbell_process_slice(kfd);
 	else
 		return -ENOSPC;
 
@@ -95,7 +94,7 @@ int kfd_doorbell_init(struct kfd_dev *kfd)
 	kfd->doorbell_id_offset = doorbell_start_offset / sizeof(u32);
 
 	kfd->doorbell_kernel_ptr = ioremap(kfd->doorbell_base,
-						doorbell_process_allocation());
+					   kfd_doorbell_process_slice(kfd));
 
 	if (!kfd->doorbell_kernel_ptr)
 		return -ENOMEM;
@@ -132,16 +131,16 @@ int kfd_doorbell_mmap(struct kfd_process *process, struct vm_area_struct *vma)
 	phys_addr_t address;
 	struct kfd_dev *dev;
 
+	/* Find kfd device according to gpu id */
+	dev = kfd_device_by_id(vma->vm_pgoff);
+	if (!dev)
+		return -EINVAL;
+
 	/*
 	 * For simplicitly we only allow mapping of the entire doorbell
 	 * allocation of a single device & process.
 	 */
-	if (vma->vm_end - vma->vm_start != doorbell_process_allocation())
-		return -EINVAL;
-
-	/* Find kfd device according to gpu id */
-	dev = kfd_device_by_id(vma->vm_pgoff);
-	if (!dev)
+	if (vma->vm_end - vma->vm_start != kfd_doorbell_process_slice(dev))
 		return -EINVAL;
 
 	/* Calculate physical address of doorbell */
@@ -158,19 +157,19 @@ int kfd_doorbell_mmap(struct kfd_process *process, struct vm_area_struct *vma)
 		 "     vm_flags            == 0x%04lX\n"
 		 "     size                == 0x%04lX\n",
 		 (unsigned long long) vma->vm_start, address, vma->vm_flags,
-		 doorbell_process_allocation());
+		 kfd_doorbell_process_slice(dev));
 
 
 	return io_remap_pfn_range(vma,
 				vma->vm_start,
 				address >> PAGE_SHIFT,
-				doorbell_process_allocation(),
+				kfd_doorbell_process_slice(dev),
 				vma->vm_page_prot);
 }
 
 
 /* get kernel iomem pointer for a doorbell */
-u32 __iomem *kfd_get_kernel_doorbell(struct kfd_dev *kfd,
+void __iomem *kfd_get_kernel_doorbell(struct kfd_dev *kfd,
 					unsigned int *doorbell_off)
 {
 	u32 inx;
@@ -185,6 +184,8 @@ u32 __iomem *kfd_get_kernel_doorbell(struct kfd_dev *kfd,
 	if (inx >= KFD_MAX_NUM_OF_QUEUES_PER_PROCESS)
 		return NULL;
 
+	inx *= kfd->device_info->doorbell_size / sizeof(u32);
+
 	/*
 	 * Calculating the kernel doorbell offset using the first
 	 * doorbell page.
@@ -210,7 +211,7 @@ void kfd_release_kernel_doorbell(struct kfd_dev *kfd, u32 __iomem *db_addr)
 	mutex_unlock(&kfd->doorbell_mutex);
 }
 
-inline void write_kernel_doorbell(u32 __iomem *db, u32 value)
+void write_kernel_doorbell(void __iomem *db, u32 value)
 {
 	if (db) {
 		writel(value, db);
@@ -228,20 +229,21 @@ unsigned int kfd_queue_id_to_doorbell(struct kfd_dev *kfd,
 {
 	/*
 	 * doorbell_id_offset accounts for doorbells taken by KGD.
-	 * index * doorbell_process_allocation/sizeof(u32) adjusts to
-	 * the process's doorbells.
+	 * index * kfd_doorbell_process_slice/sizeof(u32) adjusts to
+	 * the process's doorbells. The offset returned is in dword
+	 * units regardless of the ASIC-dependent doorbell size.
 	 */
 	return kfd->doorbell_id_offset +
 		process->doorbell_index
-		* doorbell_process_allocation() / sizeof(u32) +
-		queue_id;
+		* kfd_doorbell_process_slice(kfd) / sizeof(u32) +
+		queue_id * kfd->device_info->doorbell_size / sizeof(u32);
 }
 
 uint64_t kfd_get_number_elems(struct kfd_dev *kfd)
 {
 	uint64_t num_of_elems = (kfd->shared_resources.doorbell_aperture_size -
 				kfd->shared_resources.doorbell_start_offset) /
-					doorbell_process_allocation() + 1;
+					kfd_doorbell_process_slice(kfd) + 1;
 
 	return num_of_elems;
 
@@ -251,7 +253,7 @@ phys_addr_t kfd_get_process_doorbells(struct kfd_dev *dev,
 					struct kfd_process *process)
 {
 	return dev->doorbell_base +
-		process->doorbell_index * doorbell_process_allocation();
+		process->doorbell_index * kfd_doorbell_process_slice(dev);
 }
 
 int kfd_alloc_process_doorbells(struct kfd_process *process)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 4d5c49e..d9c0fe12 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -160,6 +160,7 @@ struct kfd_device_info {
 	const struct kfd_event_interrupt_class *event_interrupt_class;
 	unsigned int max_pasid_bits;
 	unsigned int max_no_of_hqd;
+	unsigned int doorbell_size;
 	size_t ih_ring_entry_size;
 	uint8_t num_of_watch_points;
 	uint16_t mqd_size_aligned;
@@ -364,7 +365,7 @@ struct queue_properties {
 	uint32_t queue_percent;
 	uint32_t *read_ptr;
 	uint32_t *write_ptr;
-	uint32_t __iomem *doorbell_ptr;
+	void __iomem *doorbell_ptr;
 	uint32_t doorbell_off;
 	bool is_interop;
 	bool is_evicted;
@@ -728,11 +729,11 @@ void kfd_pasid_free(unsigned int pasid);
 int kfd_doorbell_init(struct kfd_dev *kfd);
 void kfd_doorbell_fini(struct kfd_dev *kfd);
 int kfd_doorbell_mmap(struct kfd_process *process, struct vm_area_struct *vma);
-u32 __iomem *kfd_get_kernel_doorbell(struct kfd_dev *kfd,
+void __iomem *kfd_get_kernel_doorbell(struct kfd_dev *kfd,
 					unsigned int *doorbell_off);
 void kfd_release_kernel_doorbell(struct kfd_dev *kfd, u32 __iomem *db_addr);
 u32 read_kernel_doorbell(u32 __iomem *db);
-void write_kernel_doorbell(u32 __iomem *db, u32 value);
+void write_kernel_doorbell(void __iomem *db, u32 value);
 unsigned int kfd_queue_id_to_doorbell(struct kfd_dev *kfd,
 					struct kfd_process *process,
 					unsigned int queue_id);
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 07/21] drm/amdkfd: Clean up KFD_MMAP_ offset handling
       [not found] ` <1523395998-31314-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (5 preceding siblings ...)
  2018-04-10 21:33   ` [PATCH 06/21] drm/amdkfd: Make doorbell size ASIC-dependent Felix Kuehling
@ 2018-04-10 21:33   ` Felix Kuehling
       [not found]     ` <1523395998-31314-8-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
  2018-04-10 21:33   ` [PATCH 08/21] drm/amdkfd: Implement doorbell allocation for SOC15 Felix Kuehling
                     ` (16 subsequent siblings)
  23 siblings, 1 reply; 38+ messages in thread
From: Felix Kuehling @ 2018-04-10 21:33 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling, Harish Kasiviswanathan

From: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>

Use bit-rotate for better clarity and remove _MASK from the #defines as
these represent mmap types.

Centralize all the parsing of the mmap offset in kfd_mmap and add device
parameter to doorbell and reserved_mem map functions.

Encode gpu_id into upper bits of vm_pgoff. This frees up the lower bits
for encoding the the doorbell ID on Vega10.

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  | 35 ++++++++++++++++++----------
 drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c |  9 ++------
 drivers/gpu/drm/amd/amdkfd/kfd_events.c   |  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h     | 38 ++++++++++++++++++++++++-------
 drivers/gpu/drm/amd/amdkfd/kfd_process.c  |  8 +++----
 5 files changed, 59 insertions(+), 33 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index b5e5f0e..f6b35f4 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -292,7 +292,8 @@ static int kfd_ioctl_create_queue(struct file *filep, struct kfd_process *p,
 
 
 	/* Return gpu_id as doorbell offset for mmap usage */
-	args->doorbell_offset = (KFD_MMAP_DOORBELL_MASK | args->gpu_id);
+	args->doorbell_offset = KFD_MMAP_TYPE_DOORBELL;
+	args->doorbell_offset |= KFD_MMAP_GPU_ID(args->gpu_id);
 	args->doorbell_offset <<= PAGE_SHIFT;
 
 	mutex_unlock(&p->mutex);
@@ -1645,23 +1646,33 @@ static long kfd_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
 static int kfd_mmap(struct file *filp, struct vm_area_struct *vma)
 {
 	struct kfd_process *process;
+	struct kfd_dev *dev = NULL;
+	unsigned long vm_pgoff;
+	unsigned int gpu_id;
 
 	process = kfd_get_process(current);
 	if (IS_ERR(process))
 		return PTR_ERR(process);
 
-	if ((vma->vm_pgoff & KFD_MMAP_DOORBELL_MASK) ==
-			KFD_MMAP_DOORBELL_MASK) {
-		vma->vm_pgoff = vma->vm_pgoff ^ KFD_MMAP_DOORBELL_MASK;
-		return kfd_doorbell_mmap(process, vma);
-	} else if ((vma->vm_pgoff & KFD_MMAP_EVENTS_MASK) ==
-			KFD_MMAP_EVENTS_MASK) {
-		vma->vm_pgoff = vma->vm_pgoff ^ KFD_MMAP_EVENTS_MASK;
+	vm_pgoff = vma->vm_pgoff;
+	vma->vm_pgoff = KFD_MMAP_OFFSET_VALUE_GET(vm_pgoff);
+	gpu_id = KFD_MMAP_GPU_ID_GET(vm_pgoff);
+	if (gpu_id)
+		dev = kfd_device_by_id(gpu_id);
+
+	switch (vm_pgoff & KFD_MMAP_TYPE_MASK) {
+	case KFD_MMAP_TYPE_DOORBELL:
+		if (!dev)
+			return -ENODEV;
+		return kfd_doorbell_mmap(dev, process, vma);
+
+	case KFD_MMAP_TYPE_EVENTS:
 		return kfd_event_mmap(process, vma);
-	} else if ((vma->vm_pgoff & KFD_MMAP_RESERVED_MEM_MASK) ==
-			KFD_MMAP_RESERVED_MEM_MASK) {
-		vma->vm_pgoff = vma->vm_pgoff ^ KFD_MMAP_RESERVED_MEM_MASK;
-		return kfd_reserved_mem_mmap(process, vma);
+
+	case KFD_MMAP_TYPE_RESERVED_MEM:
+		if (!dev)
+			return -ENODEV;
+		return kfd_reserved_mem_mmap(dev, process, vma);
 	}
 
 	return -EFAULT;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c b/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
index 4840314..efc59de 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
@@ -126,15 +126,10 @@ void kfd_doorbell_fini(struct kfd_dev *kfd)
 		iounmap(kfd->doorbell_kernel_ptr);
 }
 
-int kfd_doorbell_mmap(struct kfd_process *process, struct vm_area_struct *vma)
+int kfd_doorbell_mmap(struct kfd_dev *dev, struct kfd_process *process,
+		      struct vm_area_struct *vma)
 {
 	phys_addr_t address;
-	struct kfd_dev *dev;
-
-	/* Find kfd device according to gpu id */
-	dev = kfd_device_by_id(vma->vm_pgoff);
-	if (!dev)
-		return -EINVAL;
 
 	/*
 	 * For simplicitly we only allow mapping of the entire doorbell
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
index 4890a90..bccf2f7 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
@@ -345,7 +345,7 @@ int kfd_event_create(struct file *devkfd, struct kfd_process *p,
 	case KFD_EVENT_TYPE_DEBUG:
 		ret = create_signal_event(devkfd, p, ev);
 		if (!ret) {
-			*event_page_offset = KFD_MMAP_EVENTS_MASK;
+			*event_page_offset = KFD_MMAP_TYPE_EVENTS;
 			*event_page_offset <<= PAGE_SHIFT;
 			*event_slot_index = ev->event_id;
 		}
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index d9c0fe12..2d575c0 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -41,9 +41,33 @@
 
 #define KFD_SYSFS_FILE_MODE 0444
 
-#define KFD_MMAP_DOORBELL_MASK 0x8000000000000ull
-#define KFD_MMAP_EVENTS_MASK 0x4000000000000ull
-#define KFD_MMAP_RESERVED_MEM_MASK 0x2000000000000ull
+/* GPU ID hash width in bits */
+#define KFD_GPU_ID_HASH_WIDTH 16
+
+/* Use upper bits of mmap offset to store KFD driver specific information.
+ * BITS[63:62] - Encode MMAP type
+ * BITS[61:46] - Encode gpu_id. To identify to which GPU the offset belongs to
+ * BITS[45:0]  - MMAP offset value
+ *
+ * NOTE: struct vm_area_struct.vm_pgoff uses offset in pages. Hence, these
+ *  defines are w.r.t to PAGE_SIZE
+ */
+#define KFD_MMAP_TYPE_SHIFT	(62 - PAGE_SHIFT)
+#define KFD_MMAP_TYPE_MASK	(0x3ULL << KFD_MMAP_TYPE_SHIFT)
+#define KFD_MMAP_TYPE_DOORBELL	(0x3ULL << KFD_MMAP_TYPE_SHIFT)
+#define KFD_MMAP_TYPE_EVENTS	(0x2ULL << KFD_MMAP_TYPE_SHIFT)
+#define KFD_MMAP_TYPE_RESERVED_MEM	(0x1ULL << KFD_MMAP_TYPE_SHIFT)
+
+#define KFD_MMAP_GPU_ID_SHIFT (46 - PAGE_SHIFT)
+#define KFD_MMAP_GPU_ID_MASK (((1ULL << KFD_GPU_ID_HASH_WIDTH) - 1) \
+				<< KFD_MMAP_GPU_ID_SHIFT)
+#define KFD_MMAP_GPU_ID(gpu_id) ((((uint64_t)gpu_id) << KFD_MMAP_GPU_ID_SHIFT)\
+				& KFD_MMAP_GPU_ID_MASK)
+#define KFD_MMAP_GPU_ID_GET(offset)    ((offset & KFD_MMAP_GPU_ID_MASK) \
+				>> KFD_MMAP_GPU_ID_SHIFT)
+
+#define KFD_MMAP_OFFSET_VALUE_MASK	(0x3FFFFFFFFFFFULL >> PAGE_SHIFT)
+#define KFD_MMAP_OFFSET_VALUE_GET(offset) (offset & KFD_MMAP_OFFSET_VALUE_MASK)
 
 /*
  * When working with cp scheduler we should assign the HIQ manually or via
@@ -55,9 +79,6 @@
 #define KFD_CIK_HIQ_PIPE 4
 #define KFD_CIK_HIQ_QUEUE 0
 
-/* GPU ID hash width in bits */
-#define KFD_GPU_ID_HASH_WIDTH 16
-
 /* Macro for allocating structures */
 #define kfd_alloc_struct(ptr_to_struct)	\
 	((typeof(ptr_to_struct)) kzalloc(sizeof(*ptr_to_struct), GFP_KERNEL))
@@ -698,7 +719,7 @@ struct kfd_process_device *kfd_get_process_device_data(struct kfd_dev *dev,
 struct kfd_process_device *kfd_create_process_device_data(struct kfd_dev *dev,
 							struct kfd_process *p);
 
-int kfd_reserved_mem_mmap(struct kfd_process *process,
+int kfd_reserved_mem_mmap(struct kfd_dev *dev, struct kfd_process *process,
 			  struct vm_area_struct *vma);
 
 /* KFD process API for creating and translating handles */
@@ -728,7 +749,8 @@ void kfd_pasid_free(unsigned int pasid);
 /* Doorbells */
 int kfd_doorbell_init(struct kfd_dev *kfd);
 void kfd_doorbell_fini(struct kfd_dev *kfd);
-int kfd_doorbell_mmap(struct kfd_process *process, struct vm_area_struct *vma);
+int kfd_doorbell_mmap(struct kfd_dev *dev, struct kfd_process *process,
+		      struct vm_area_struct *vma);
 void __iomem *kfd_get_kernel_doorbell(struct kfd_dev *kfd,
 					unsigned int *doorbell_off);
 void kfd_release_kernel_doorbell(struct kfd_dev *kfd, u32 __iomem *db_addr);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index 2791e72..131fe2a 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -451,7 +451,8 @@ static int kfd_process_init_cwsr_apu(struct kfd_process *p, struct file *filep)
 		if (!dev->cwsr_enabled || qpd->cwsr_kaddr || qpd->cwsr_base)
 			continue;
 
-		offset = (dev->id | KFD_MMAP_RESERVED_MEM_MASK) << PAGE_SHIFT;
+		offset = (KFD_MMAP_TYPE_RESERVED_MEM | KFD_MMAP_GPU_ID(dev->id))
+			<< PAGE_SHIFT;
 		qpd->tba_addr = (int64_t)vm_mmap(filep, 0,
 			KFD_CWSR_TBA_TMA_SIZE, PROT_READ | PROT_EXEC,
 			MAP_SHARED, offset);
@@ -989,15 +990,12 @@ int kfd_resume_all_processes(void)
 	return ret;
 }
 
-int kfd_reserved_mem_mmap(struct kfd_process *process,
+int kfd_reserved_mem_mmap(struct kfd_dev *dev, struct kfd_process *process,
 			  struct vm_area_struct *vma)
 {
-	struct kfd_dev *dev = kfd_device_by_id(vma->vm_pgoff);
 	struct kfd_process_device *pdd;
 	struct qcm_process_device *qpd;
 
-	if (!dev)
-		return -EINVAL;
 	if ((vma->vm_end - vma->vm_start) != KFD_CWSR_TBA_TMA_SIZE) {
 		pr_err("Incorrect CWSR mapping size.\n");
 		return -EINVAL;
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 08/21] drm/amdkfd: Implement doorbell allocation for SOC15
       [not found] ` <1523395998-31314-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (6 preceding siblings ...)
  2018-04-10 21:33   ` [PATCH 07/21] drm/amdkfd: Clean up KFD_MMAP_ offset handling Felix Kuehling
@ 2018-04-10 21:33   ` Felix Kuehling
  2018-04-10 21:33   ` [PATCH 09/21] drm/amdkfd: Move packet writer functions into ASIC-specific file Felix Kuehling
                     ` (15 subsequent siblings)
  23 siblings, 0 replies; 38+ messages in thread
From: Felix Kuehling @ 2018-04-10 21:33 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling, Shaoyun Liu

Allocate doorbells according to the doorbell routing information on
SOC15 ASICs (Vega10 and later). On older ASICs we continue to use the
queue_id as the doorbell ID to maintain compatibility with the Thunk.

Signed-off-by: Shaoyun Liu <Shaoyun.Liu@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c           |  7 ++
 .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c  | 82 ++++++++++++++++++++--
 drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c          | 12 ++--
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h              | 11 ++-
 drivers/gpu/drm/amd/amdkfd/kfd_process.c           | 32 +++++++++
 .../gpu/drm/amd/amdkfd/kfd_process_queue_manager.c | 12 +++-
 6 files changed, 139 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index f6b35f4..1a4d8dc 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -295,6 +295,13 @@ static int kfd_ioctl_create_queue(struct file *filep, struct kfd_process *p,
 	args->doorbell_offset = KFD_MMAP_TYPE_DOORBELL;
 	args->doorbell_offset |= KFD_MMAP_GPU_ID(args->gpu_id);
 	args->doorbell_offset <<= PAGE_SHIFT;
+	if (KFD_IS_SOC15(dev->device_info->asic_family))
+		/* On SOC15 ASICs, doorbell allocation must be
+		 * per-device, and independent from the per-process
+		 * queue_id. Return the doorbell offset within the
+		 * doorbell aperture to user mode.
+		 */
+		args->doorbell_offset |= q_properties.doorbell_off;
 
 	mutex_unlock(&p->mutex);
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index d55d29d..e9c72d8 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -110,6 +110,57 @@ void program_sh_mem_settings(struct device_queue_manager *dqm,
 						qpd->sh_mem_bases);
 }
 
+static int allocate_doorbell(struct qcm_process_device *qpd, struct queue *q)
+{
+	struct kfd_dev *dev = qpd->dqm->dev;
+
+	if (!KFD_IS_SOC15(dev->device_info->asic_family)) {
+		/* On pre-SOC15 chips we need to use the queue ID to
+		 * preserve the user mode ABI.
+		 */
+		q->doorbell_id = q->properties.queue_id;
+	} else if (q->properties.type == KFD_QUEUE_TYPE_SDMA) {
+		/* For SDMA queues on SOC15, use static doorbell
+		 * assignments based on the engine and queue.
+		 */
+		q->doorbell_id = dev->shared_resources.sdma_doorbell
+			[q->properties.sdma_engine_id]
+			[q->properties.sdma_queue_id];
+	} else {
+		/* For CP queues on SOC15 reserve a free doorbell ID */
+		unsigned int found;
+
+		found = find_first_zero_bit(qpd->doorbell_bitmap,
+					    KFD_MAX_NUM_OF_QUEUES_PER_PROCESS);
+		if (found >= KFD_MAX_NUM_OF_QUEUES_PER_PROCESS) {
+			pr_debug("No doorbells available");
+			return -EBUSY;
+		}
+		set_bit(found, qpd->doorbell_bitmap);
+		q->doorbell_id = found;
+	}
+
+	q->properties.doorbell_off =
+		kfd_doorbell_id_to_offset(dev, q->process,
+					  q->doorbell_id);
+
+	return 0;
+}
+
+static void deallocate_doorbell(struct qcm_process_device *qpd,
+				struct queue *q)
+{
+	unsigned int old;
+	struct kfd_dev *dev = qpd->dqm->dev;
+
+	if (!KFD_IS_SOC15(dev->device_info->asic_family) ||
+	    q->properties.type == KFD_QUEUE_TYPE_SDMA)
+		return;
+
+	old = test_and_clear_bit(q->doorbell_id, qpd->doorbell_bitmap);
+	WARN_ON(!old);
+}
+
 static int allocate_vmid(struct device_queue_manager *dqm,
 			struct qcm_process_device *qpd,
 			struct queue *q)
@@ -301,10 +352,14 @@ static int create_compute_queue_nocpsch(struct device_queue_manager *dqm,
 	if (retval)
 		return retval;
 
+	retval = allocate_doorbell(qpd, q);
+	if (retval)
+		goto out_deallocate_hqd;
+
 	retval = mqd->init_mqd(mqd, &q->mqd, &q->mqd_mem_obj,
 				&q->gart_mqd_addr, &q->properties);
 	if (retval)
-		goto out_deallocate_hqd;
+		goto out_deallocate_doorbell;
 
 	pr_debug("Loading mqd to hqd on pipe %d, queue %d\n",
 			q->pipe, q->queue);
@@ -324,6 +379,8 @@ static int create_compute_queue_nocpsch(struct device_queue_manager *dqm,
 
 out_uninit_mqd:
 	mqd->uninit_mqd(mqd, q->mqd, q->mqd_mem_obj);
+out_deallocate_doorbell:
+	deallocate_doorbell(qpd, q);
 out_deallocate_hqd:
 	deallocate_hqd(dqm, q);
 
@@ -357,6 +414,8 @@ static int destroy_queue_nocpsch_locked(struct device_queue_manager *dqm,
 	}
 	dqm->total_queue_count--;
 
+	deallocate_doorbell(qpd, q);
+
 	retval = mqd->destroy_mqd(mqd, q->mqd,
 				KFD_PREEMPT_TYPE_WAVEFRONT_RESET,
 				KFD_UNMAP_LATENCY_MS,
@@ -861,6 +920,10 @@ static int create_sdma_queue_nocpsch(struct device_queue_manager *dqm,
 	q->properties.sdma_queue_id = q->sdma_id / CIK_SDMA_QUEUES_PER_ENGINE;
 	q->properties.sdma_engine_id = q->sdma_id % CIK_SDMA_QUEUES_PER_ENGINE;
 
+	retval = allocate_doorbell(qpd, q);
+	if (retval)
+		goto out_deallocate_sdma_queue;
+
 	pr_debug("SDMA id is:    %d\n", q->sdma_id);
 	pr_debug("SDMA queue id: %d\n", q->properties.sdma_queue_id);
 	pr_debug("SDMA engine id: %d\n", q->properties.sdma_engine_id);
@@ -869,7 +932,7 @@ static int create_sdma_queue_nocpsch(struct device_queue_manager *dqm,
 	retval = mqd->init_mqd(mqd, &q->mqd, &q->mqd_mem_obj,
 				&q->gart_mqd_addr, &q->properties);
 	if (retval)
-		goto out_deallocate_sdma_queue;
+		goto out_deallocate_doorbell;
 
 	retval = mqd->load_mqd(mqd, q->mqd, 0, 0, &q->properties, NULL);
 	if (retval)
@@ -879,6 +942,8 @@ static int create_sdma_queue_nocpsch(struct device_queue_manager *dqm,
 
 out_uninit_mqd:
 	mqd->uninit_mqd(mqd, q->mqd, q->mqd_mem_obj);
+out_deallocate_doorbell:
+	deallocate_doorbell(qpd, q);
 out_deallocate_sdma_queue:
 	deallocate_sdma_queue(dqm, q->sdma_id);
 
@@ -1070,12 +1135,17 @@ static int create_queue_cpsch(struct device_queue_manager *dqm, struct queue *q,
 		q->properties.sdma_engine_id =
 			q->sdma_id % CIK_SDMA_QUEUES_PER_ENGINE;
 	}
+
+	retval = allocate_doorbell(qpd, q);
+	if (retval)
+		goto out_deallocate_sdma_queue;
+
 	mqd = dqm->ops.get_mqd_manager(dqm,
 			get_mqd_type_from_queue_type(q->properties.type));
 
 	if (!mqd) {
 		retval = -ENOMEM;
-		goto out_deallocate_sdma_queue;
+		goto out_deallocate_doorbell;
 	}
 	/*
 	 * Eviction state logic: we only mark active queues as evicted
@@ -1093,7 +1163,7 @@ static int create_queue_cpsch(struct device_queue_manager *dqm, struct queue *q,
 	retval = mqd->init_mqd(mqd, &q->mqd, &q->mqd_mem_obj,
 				&q->gart_mqd_addr, &q->properties);
 	if (retval)
-		goto out_deallocate_sdma_queue;
+		goto out_deallocate_doorbell;
 
 	list_add(&q->list, &qpd->queues_list);
 	qpd->queue_count++;
@@ -1117,6 +1187,8 @@ static int create_queue_cpsch(struct device_queue_manager *dqm, struct queue *q,
 	mutex_unlock(&dqm->lock);
 	return retval;
 
+out_deallocate_doorbell:
+	deallocate_doorbell(qpd, q);
 out_deallocate_sdma_queue:
 	if (q->properties.type == KFD_QUEUE_TYPE_SDMA)
 		deallocate_sdma_queue(dqm, q->sdma_id);
@@ -1257,6 +1329,8 @@ static int destroy_queue_cpsch(struct device_queue_manager *dqm,
 		goto failed;
 	}
 
+	deallocate_doorbell(qpd, q);
+
 	if (q->properties.type == KFD_QUEUE_TYPE_SDMA) {
 		dqm->sdma_queue_count--;
 		deallocate_sdma_queue(dqm, q->sdma_id);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c b/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
index efc59de..36c9269e 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
@@ -49,7 +49,7 @@ static unsigned int max_doorbell_slices;
  */
 
 /* # of doorbell bytes allocated for each process. */
-static size_t kfd_doorbell_process_slice(struct kfd_dev *kfd)
+size_t kfd_doorbell_process_slice(struct kfd_dev *kfd)
 {
 	return roundup(kfd->device_info->doorbell_size *
 			KFD_MAX_NUM_OF_QUEUES_PER_PROCESS,
@@ -214,13 +214,9 @@ void write_kernel_doorbell(void __iomem *db, u32 value)
 	}
 }
 
-/*
- * queue_ids are in the range [0,MAX_PROCESS_QUEUES) and are mapped 1:1
- * to doorbells with the process's doorbell page
- */
-unsigned int kfd_queue_id_to_doorbell(struct kfd_dev *kfd,
+unsigned int kfd_doorbell_id_to_offset(struct kfd_dev *kfd,
 					struct kfd_process *process,
-					unsigned int queue_id)
+					unsigned int doorbell_id)
 {
 	/*
 	 * doorbell_id_offset accounts for doorbells taken by KGD.
@@ -231,7 +227,7 @@ unsigned int kfd_queue_id_to_doorbell(struct kfd_dev *kfd,
 	return kfd->doorbell_id_offset +
 		process->doorbell_index
 		* kfd_doorbell_process_slice(kfd) / sizeof(u32) +
-		queue_id * kfd->device_info->doorbell_size / sizeof(u32);
+		doorbell_id * kfd->device_info->doorbell_size / sizeof(u32);
 }
 
 uint64_t kfd_get_number_elems(struct kfd_dev *kfd)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 2d575c0..ddb3c8c 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -169,6 +169,8 @@ enum cache_policy {
 	cache_policy_noncoherent
 };
 
+#define KFD_IS_SOC15(chip) ((chip) >= CHIP_VEGA10)
+
 struct kfd_event_interrupt_class {
 	bool (*interrupt_isr)(struct kfd_dev *dev,
 				const uint32_t *ih_ring_entry);
@@ -449,6 +451,7 @@ struct queue {
 	uint32_t queue;
 
 	unsigned int sdma_id;
+	unsigned int doorbell_id;
 
 	struct kfd_process	*process;
 	struct kfd_dev		*device;
@@ -523,6 +526,9 @@ struct qcm_process_device {
 	/* IB memory */
 	uint64_t ib_base;
 	void *ib_kaddr;
+
+	/* doorbell resources per process per device */
+	unsigned long *doorbell_bitmap;
 };
 
 /* KFD Memory Eviction */
@@ -747,6 +753,7 @@ unsigned int kfd_pasid_alloc(void);
 void kfd_pasid_free(unsigned int pasid);
 
 /* Doorbells */
+size_t kfd_doorbell_process_slice(struct kfd_dev *kfd);
 int kfd_doorbell_init(struct kfd_dev *kfd);
 void kfd_doorbell_fini(struct kfd_dev *kfd);
 int kfd_doorbell_mmap(struct kfd_dev *dev, struct kfd_process *process,
@@ -756,9 +763,9 @@ void __iomem *kfd_get_kernel_doorbell(struct kfd_dev *kfd,
 void kfd_release_kernel_doorbell(struct kfd_dev *kfd, u32 __iomem *db_addr);
 u32 read_kernel_doorbell(u32 __iomem *db);
 void write_kernel_doorbell(void __iomem *db, u32 value);
-unsigned int kfd_queue_id_to_doorbell(struct kfd_dev *kfd,
+unsigned int kfd_doorbell_id_to_offset(struct kfd_dev *kfd,
 					struct kfd_process *process,
-					unsigned int queue_id);
+					unsigned int doorbell_id);
 phys_addr_t kfd_get_process_doorbells(struct kfd_dev *dev,
 					struct kfd_process *process);
 int kfd_alloc_process_doorbells(struct kfd_process *process);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index 131fe2a..1d80b4f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -332,6 +332,7 @@ static void kfd_process_destroy_pdds(struct kfd_process *p)
 			free_pages((unsigned long)pdd->qpd.cwsr_kaddr,
 				get_order(KFD_CWSR_TBA_TMA_SIZE));
 
+		kfree(pdd->qpd.doorbell_bitmap);
 		idr_destroy(&pdd->alloc_idr);
 
 		kfree(pdd);
@@ -586,6 +587,31 @@ static struct kfd_process *create_process(const struct task_struct *thread,
 	return ERR_PTR(err);
 }
 
+static int init_doorbell_bitmap(struct qcm_process_device *qpd,
+			struct kfd_dev *dev)
+{
+	unsigned int i;
+
+	if (!KFD_IS_SOC15(dev->device_info->asic_family))
+		return 0;
+
+	qpd->doorbell_bitmap =
+		kzalloc(DIV_ROUND_UP(KFD_MAX_NUM_OF_QUEUES_PER_PROCESS,
+				     BITS_PER_BYTE), GFP_KERNEL);
+	if (!qpd->doorbell_bitmap)
+		return -ENOMEM;
+
+	/* Mask out any reserved doorbells */
+	for (i = 0; i < KFD_MAX_NUM_OF_QUEUES_PER_PROCESS; i++)
+		if ((dev->shared_resources.reserved_doorbell_mask & i) ==
+		    dev->shared_resources.reserved_doorbell_val) {
+			set_bit(i, qpd->doorbell_bitmap);
+			pr_debug("reserved doorbell 0x%03x\n", i);
+		}
+
+	return 0;
+}
+
 struct kfd_process_device *kfd_get_process_device_data(struct kfd_dev *dev,
 							struct kfd_process *p)
 {
@@ -607,6 +633,12 @@ struct kfd_process_device *kfd_create_process_device_data(struct kfd_dev *dev,
 	if (!pdd)
 		return NULL;
 
+	if (init_doorbell_bitmap(&pdd->qpd, dev)) {
+		pr_err("Failed to init doorbell for process\n");
+		kfree(pdd);
+		return NULL;
+	}
+
 	pdd->dev = dev;
 	INIT_LIST_HEAD(&pdd->qpd.queues_list);
 	INIT_LIST_HEAD(&pdd->qpd.priv_queue_list);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
index 7817e32..3045aeb 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
@@ -119,9 +119,6 @@ static int create_cp_queue(struct process_queue_manager *pqm,
 	/* Doorbell initialized in user space*/
 	q_properties->doorbell_ptr = NULL;
 
-	q_properties->doorbell_off =
-			kfd_queue_id_to_doorbell(dev, pqm->process, qid);
-
 	/* let DQM handle it*/
 	q_properties->vmid = 0;
 	q_properties->queue_id = qid;
@@ -248,6 +245,15 @@ int pqm_create_queue(struct process_queue_manager *pqm,
 		goto err_create_queue;
 	}
 
+	if (q)
+		/* Return the doorbell offset within the doorbell page
+		 * to the caller so it can be passed up to user mode
+		 * (in bytes).
+		 */
+		properties->doorbell_off =
+			(q->properties.doorbell_off * sizeof(uint32_t)) &
+			(kfd_doorbell_process_slice(dev) - 1);
+
 	pr_debug("PQM After DQM create queue\n");
 
 	list_add(&pqn->process_queue_list, &pqm->queues);
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 09/21] drm/amdkfd: Move packet writer functions into ASIC-specific file
       [not found] ` <1523395998-31314-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (7 preceding siblings ...)
  2018-04-10 21:33   ` [PATCH 08/21] drm/amdkfd: Implement doorbell allocation for SOC15 Felix Kuehling
@ 2018-04-10 21:33   ` Felix Kuehling
  2018-04-10 21:33   ` [PATCH 10/21] drm/amdkfd: Add GFXv9 PM4 packet writer functions Felix Kuehling
                     ` (14 subsequent siblings)
  23 siblings, 0 replies; 38+ messages in thread
From: Felix Kuehling @ 2018-04-10 21:33 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling, Shaoyun Liu

This is in preparation for GFXv9 (Vega10) which uses incompatible PM4
packet formats from previous ASIC generations.

Signed-off-by: Shaoyun Liu <Shaoyun.Liu@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c  |  10 +-
 drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_vi.c   | 310 +++++++++++++++++
 drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c    | 381 ++++-----------------
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h              |  35 +-
 4 files changed, 420 insertions(+), 316 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index e9c72d8..500f022 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -196,15 +196,19 @@ static int allocate_vmid(struct device_queue_manager *dqm,
 static int flush_texture_cache_nocpsch(struct kfd_dev *kdev,
 				struct qcm_process_device *qpd)
 {
-	uint32_t len;
+	const struct packet_manager_funcs *pmf = qpd->dqm->packets.pmf;
+	int ret;
 
 	if (!qpd->ib_kaddr)
 		return -ENOMEM;
 
-	len = pm_create_release_mem(qpd->ib_base, (uint32_t *)qpd->ib_kaddr);
+	ret = pmf->release_mem(qpd->ib_base, (uint32_t *)qpd->ib_kaddr);
+	if (ret)
+		return ret;
 
 	return kdev->kfd2kgd->submit_ib(kdev->kgd, KGD_ENGINE_MEC1, qpd->vmid,
-				qpd->ib_base, (uint32_t *)qpd->ib_kaddr, len);
+				qpd->ib_base, (uint32_t *)qpd->ib_kaddr,
+				pmf->release_mem_size / sizeof(uint32_t));
 }
 
 static void deallocate_vmid(struct device_queue_manager *dqm,
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_vi.c b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_vi.c
index f1d4828..7ee326f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_vi.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_vi.c
@@ -22,6 +22,9 @@
  */
 
 #include "kfd_kernel_queue.h"
+#include "kfd_device_queue_manager.h"
+#include "kfd_pm4_headers_vi.h"
+#include "kfd_pm4_opcodes.h"
 
 static bool initialize_vi(struct kernel_queue *kq, struct kfd_dev *dev,
 			enum kfd_queue_type type, unsigned int queue_size);
@@ -54,3 +57,310 @@ static void uninitialize_vi(struct kernel_queue *kq)
 {
 	kfd_gtt_sa_free(kq->dev, kq->eop_mem);
 }
+
+static unsigned int build_pm4_header(unsigned int opcode, size_t packet_size)
+{
+	union PM4_MES_TYPE_3_HEADER header;
+
+	header.u32All = 0;
+	header.opcode = opcode;
+	header.count = packet_size / 4 - 2;
+	header.type = PM4_TYPE_3;
+
+	return header.u32All;
+}
+
+static int pm_map_process_vi(struct packet_manager *pm, uint32_t *buffer,
+				struct qcm_process_device *qpd)
+{
+	struct pm4_mes_map_process *packet;
+
+	packet = (struct pm4_mes_map_process *)buffer;
+
+	memset(buffer, 0, sizeof(struct pm4_mes_map_process));
+
+	packet->header.u32All = build_pm4_header(IT_MAP_PROCESS,
+					sizeof(struct pm4_mes_map_process));
+	packet->bitfields2.diq_enable = (qpd->is_debug) ? 1 : 0;
+	packet->bitfields2.process_quantum = 1;
+	packet->bitfields2.pasid = qpd->pqm->process->pasid;
+	packet->bitfields3.page_table_base = qpd->page_table_base;
+	packet->bitfields10.gds_size = qpd->gds_size;
+	packet->bitfields10.num_gws = qpd->num_gws;
+	packet->bitfields10.num_oac = qpd->num_oac;
+	packet->bitfields10.num_queues = (qpd->is_debug) ? 0 : qpd->queue_count;
+
+	packet->sh_mem_config = qpd->sh_mem_config;
+	packet->sh_mem_bases = qpd->sh_mem_bases;
+	packet->sh_mem_ape1_base = qpd->sh_mem_ape1_base;
+	packet->sh_mem_ape1_limit = qpd->sh_mem_ape1_limit;
+
+	packet->sh_hidden_private_base_vmid = qpd->sh_hidden_private_base;
+
+	packet->gds_addr_lo = lower_32_bits(qpd->gds_context_area);
+	packet->gds_addr_hi = upper_32_bits(qpd->gds_context_area);
+
+	return 0;
+}
+
+static int pm_runlist_vi(struct packet_manager *pm, uint32_t *buffer,
+			uint64_t ib, size_t ib_size_in_dwords, bool chain)
+{
+	struct pm4_mes_runlist *packet;
+	int concurrent_proc_cnt = 0;
+	struct kfd_dev *kfd = pm->dqm->dev;
+
+	if (WARN_ON(!ib))
+		return -EFAULT;
+
+	/* Determine the number of processes to map together to HW:
+	 * it can not exceed the number of VMIDs available to the
+	 * scheduler, and it is determined by the smaller of the number
+	 * of processes in the runlist and kfd module parameter
+	 * hws_max_conc_proc.
+	 * Note: the arbitration between the number of VMIDs and
+	 * hws_max_conc_proc has been done in
+	 * kgd2kfd_device_init().
+	 */
+	concurrent_proc_cnt = min(pm->dqm->processes_count,
+			kfd->max_proc_per_quantum);
+
+	packet = (struct pm4_mes_runlist *)buffer;
+
+	memset(buffer, 0, sizeof(struct pm4_mes_runlist));
+	packet->header.u32All = build_pm4_header(IT_RUN_LIST,
+						sizeof(struct pm4_mes_runlist));
+
+	packet->bitfields4.ib_size = ib_size_in_dwords;
+	packet->bitfields4.chain = chain ? 1 : 0;
+	packet->bitfields4.offload_polling = 0;
+	packet->bitfields4.valid = 1;
+	packet->bitfields4.process_cnt = concurrent_proc_cnt;
+	packet->ordinal2 = lower_32_bits(ib);
+	packet->bitfields3.ib_base_hi = upper_32_bits(ib);
+
+	return 0;
+}
+
+static int pm_set_resources_vi(struct packet_manager *pm, uint32_t *buffer,
+				struct scheduling_resources *res)
+{
+	struct pm4_mes_set_resources *packet;
+
+	packet = (struct pm4_mes_set_resources *)buffer;
+	memset(buffer, 0, sizeof(struct pm4_mes_set_resources));
+
+	packet->header.u32All = build_pm4_header(IT_SET_RESOURCES,
+					sizeof(struct pm4_mes_set_resources));
+
+	packet->bitfields2.queue_type =
+			queue_type__mes_set_resources__hsa_interface_queue_hiq;
+	packet->bitfields2.vmid_mask = res->vmid_mask;
+	packet->bitfields2.unmap_latency = KFD_UNMAP_LATENCY_MS / 100;
+	packet->bitfields7.oac_mask = res->oac_mask;
+	packet->bitfields8.gds_heap_base = res->gds_heap_base;
+	packet->bitfields8.gds_heap_size = res->gds_heap_size;
+
+	packet->gws_mask_lo = lower_32_bits(res->gws_mask);
+	packet->gws_mask_hi = upper_32_bits(res->gws_mask);
+
+	packet->queue_mask_lo = lower_32_bits(res->queue_mask);
+	packet->queue_mask_hi = upper_32_bits(res->queue_mask);
+
+	return 0;
+}
+
+static int pm_map_queues_vi(struct packet_manager *pm, uint32_t *buffer,
+		struct queue *q, bool is_static)
+{
+	struct pm4_mes_map_queues *packet;
+	bool use_static = is_static;
+
+	packet = (struct pm4_mes_map_queues *)buffer;
+	memset(buffer, 0, sizeof(struct pm4_mes_map_queues));
+
+	packet->header.u32All = build_pm4_header(IT_MAP_QUEUES,
+					sizeof(struct pm4_mes_map_queues));
+	packet->bitfields2.alloc_format =
+		alloc_format__mes_map_queues__one_per_pipe_vi;
+	packet->bitfields2.num_queues = 1;
+	packet->bitfields2.queue_sel =
+		queue_sel__mes_map_queues__map_to_hws_determined_queue_slots_vi;
+
+	packet->bitfields2.engine_sel =
+		engine_sel__mes_map_queues__compute_vi;
+	packet->bitfields2.queue_type =
+		queue_type__mes_map_queues__normal_compute_vi;
+
+	switch (q->properties.type) {
+	case KFD_QUEUE_TYPE_COMPUTE:
+		if (use_static)
+			packet->bitfields2.queue_type =
+		queue_type__mes_map_queues__normal_latency_static_queue_vi;
+		break;
+	case KFD_QUEUE_TYPE_DIQ:
+		packet->bitfields2.queue_type =
+			queue_type__mes_map_queues__debug_interface_queue_vi;
+		break;
+	case KFD_QUEUE_TYPE_SDMA:
+		packet->bitfields2.engine_sel = q->properties.sdma_engine_id +
+				engine_sel__mes_map_queues__sdma0_vi;
+		use_static = false; /* no static queues under SDMA */
+		break;
+	default:
+		WARN(1, "queue type %d", q->properties.type);
+		return -EINVAL;
+	}
+	packet->bitfields3.doorbell_offset =
+			q->properties.doorbell_off;
+
+	packet->mqd_addr_lo =
+			lower_32_bits(q->gart_mqd_addr);
+
+	packet->mqd_addr_hi =
+			upper_32_bits(q->gart_mqd_addr);
+
+	packet->wptr_addr_lo =
+			lower_32_bits((uint64_t)q->properties.write_ptr);
+
+	packet->wptr_addr_hi =
+			upper_32_bits((uint64_t)q->properties.write_ptr);
+
+	return 0;
+}
+
+static int pm_unmap_queues_vi(struct packet_manager *pm, uint32_t *buffer,
+			enum kfd_queue_type type,
+			enum kfd_unmap_queues_filter filter,
+			uint32_t filter_param, bool reset,
+			unsigned int sdma_engine)
+{
+	struct pm4_mes_unmap_queues *packet;
+
+	packet = (struct pm4_mes_unmap_queues *)buffer;
+	memset(buffer, 0, sizeof(struct pm4_mes_unmap_queues));
+
+	packet->header.u32All = build_pm4_header(IT_UNMAP_QUEUES,
+					sizeof(struct pm4_mes_unmap_queues));
+	switch (type) {
+	case KFD_QUEUE_TYPE_COMPUTE:
+	case KFD_QUEUE_TYPE_DIQ:
+		packet->bitfields2.engine_sel =
+			engine_sel__mes_unmap_queues__compute;
+		break;
+	case KFD_QUEUE_TYPE_SDMA:
+		packet->bitfields2.engine_sel =
+			engine_sel__mes_unmap_queues__sdma0 + sdma_engine;
+		break;
+	default:
+		WARN(1, "queue type %d", type);
+		return -EINVAL;
+	}
+
+	if (reset)
+		packet->bitfields2.action =
+			action__mes_unmap_queues__reset_queues;
+	else
+		packet->bitfields2.action =
+			action__mes_unmap_queues__preempt_queues;
+
+	switch (filter) {
+	case KFD_UNMAP_QUEUES_FILTER_SINGLE_QUEUE:
+		packet->bitfields2.queue_sel =
+			queue_sel__mes_unmap_queues__perform_request_on_specified_queues;
+		packet->bitfields2.num_queues = 1;
+		packet->bitfields3b.doorbell_offset0 = filter_param;
+		break;
+	case KFD_UNMAP_QUEUES_FILTER_BY_PASID:
+		packet->bitfields2.queue_sel =
+			queue_sel__mes_unmap_queues__perform_request_on_pasid_queues;
+		packet->bitfields3a.pasid = filter_param;
+		break;
+	case KFD_UNMAP_QUEUES_FILTER_ALL_QUEUES:
+		packet->bitfields2.queue_sel =
+			queue_sel__mes_unmap_queues__unmap_all_queues;
+		break;
+	case KFD_UNMAP_QUEUES_FILTER_DYNAMIC_QUEUES:
+		/* in this case, we do not preempt static queues */
+		packet->bitfields2.queue_sel =
+			queue_sel__mes_unmap_queues__unmap_all_non_static_queues;
+		break;
+	default:
+		WARN(1, "filter %d", filter);
+		return -EINVAL;
+	}
+
+	return 0;
+
+}
+
+static int pm_query_status_vi(struct packet_manager *pm, uint32_t *buffer,
+			uint64_t fence_address,	uint32_t fence_value)
+{
+	struct pm4_mes_query_status *packet;
+
+	packet = (struct pm4_mes_query_status *)buffer;
+	memset(buffer, 0, sizeof(struct pm4_mes_query_status));
+
+	packet->header.u32All = build_pm4_header(IT_QUERY_STATUS,
+					sizeof(struct pm4_mes_query_status));
+
+	packet->bitfields2.context_id = 0;
+	packet->bitfields2.interrupt_sel =
+			interrupt_sel__mes_query_status__completion_status;
+	packet->bitfields2.command =
+			command__mes_query_status__fence_only_after_write_ack;
+
+	packet->addr_hi = upper_32_bits((uint64_t)fence_address);
+	packet->addr_lo = lower_32_bits((uint64_t)fence_address);
+	packet->data_hi = upper_32_bits((uint64_t)fence_value);
+	packet->data_lo = lower_32_bits((uint64_t)fence_value);
+
+	return 0;
+}
+
+static int pm_release_mem_vi(uint64_t gpu_addr, uint32_t *buffer)
+{
+	struct pm4_mec_release_mem *packet;
+
+	packet = (struct pm4_mec_release_mem *)buffer;
+	memset(buffer, 0, sizeof(*packet));
+
+	packet->header.u32All = build_pm4_header(IT_RELEASE_MEM,
+						 sizeof(*packet));
+
+	packet->bitfields2.event_type = CACHE_FLUSH_AND_INV_TS_EVENT;
+	packet->bitfields2.event_index = event_index___release_mem__end_of_pipe;
+	packet->bitfields2.tcl1_action_ena = 1;
+	packet->bitfields2.tc_action_ena = 1;
+	packet->bitfields2.cache_policy = cache_policy___release_mem__lru;
+	packet->bitfields2.atc = 0;
+
+	packet->bitfields3.data_sel = data_sel___release_mem__send_32_bit_low;
+	packet->bitfields3.int_sel =
+		int_sel___release_mem__send_interrupt_after_write_confirm;
+
+	packet->bitfields4.address_lo_32b = (gpu_addr & 0xffffffff) >> 2;
+	packet->address_hi = upper_32_bits(gpu_addr);
+
+	packet->data_lo = 0;
+
+	return 0;
+}
+
+const struct packet_manager_funcs kfd_vi_pm_funcs = {
+	.map_process		= pm_map_process_vi,
+	.runlist		= pm_runlist_vi,
+	.set_resources		= pm_set_resources_vi,
+	.map_queues		= pm_map_queues_vi,
+	.unmap_queues		= pm_unmap_queues_vi,
+	.query_status		= pm_query_status_vi,
+	.release_mem		= pm_release_mem_vi,
+	.map_process_size	= sizeof(struct pm4_mes_map_process),
+	.runlist_size		= sizeof(struct pm4_mes_runlist),
+	.set_resources_size	= sizeof(struct pm4_mes_set_resources),
+	.map_queues_size	= sizeof(struct pm4_mes_map_queues),
+	.unmap_queues_size	= sizeof(struct pm4_mes_unmap_queues),
+	.query_status_size	= sizeof(struct pm4_mes_query_status),
+	.release_mem_size	= sizeof(struct pm4_mec_release_mem)
+};
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
index 89ba4c6..860ff24 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
@@ -26,8 +26,6 @@
 #include "kfd_device_queue_manager.h"
 #include "kfd_kernel_queue.h"
 #include "kfd_priv.h"
-#include "kfd_pm4_headers_vi.h"
-#include "kfd_pm4_opcodes.h"
 
 static inline void inc_wptr(unsigned int *wptr, unsigned int increment_bytes,
 				unsigned int buffer_size_bytes)
@@ -39,18 +37,6 @@ static inline void inc_wptr(unsigned int *wptr, unsigned int increment_bytes,
 	*wptr = temp;
 }
 
-static unsigned int build_pm4_header(unsigned int opcode, size_t packet_size)
-{
-	union PM4_MES_TYPE_3_HEADER header;
-
-	header.u32All = 0;
-	header.opcode = opcode;
-	header.count = packet_size / 4 - 2;
-	header.type = PM4_TYPE_3;
-
-	return header.u32All;
-}
-
 static void pm_calc_rlib_size(struct packet_manager *pm,
 				unsigned int *rlib_size,
 				bool *over_subscription)
@@ -80,9 +66,9 @@ static void pm_calc_rlib_size(struct packet_manager *pm,
 		pr_debug("Over subscribed runlist\n");
 	}
 
-	map_queue_size = sizeof(struct pm4_mes_map_queues);
+	map_queue_size = pm->pmf->map_queues_size;
 	/* calculate run list ib allocation size */
-	*rlib_size = process_count * sizeof(struct pm4_mes_map_process) +
+	*rlib_size = process_count * pm->pmf->map_process_size +
 		     queue_count * map_queue_size;
 
 	/*
@@ -90,7 +76,7 @@ static void pm_calc_rlib_size(struct packet_manager *pm,
 	 * when over subscription
 	 */
 	if (*over_subscription)
-		*rlib_size += sizeof(struct pm4_mes_runlist);
+		*rlib_size += pm->pmf->runlist_size;
 
 	pr_debug("runlist ib size %d\n", *rlib_size);
 }
@@ -124,137 +110,6 @@ static int pm_allocate_runlist_ib(struct packet_manager *pm,
 	return retval;
 }
 
-static int pm_create_runlist(struct packet_manager *pm, uint32_t *buffer,
-			uint64_t ib, size_t ib_size_in_dwords, bool chain)
-{
-	struct pm4_mes_runlist *packet;
-	int concurrent_proc_cnt = 0;
-	struct kfd_dev *kfd = pm->dqm->dev;
-
-	if (WARN_ON(!ib))
-		return -EFAULT;
-
-	/* Determine the number of processes to map together to HW:
-	 * it can not exceed the number of VMIDs available to the
-	 * scheduler, and it is determined by the smaller of the number
-	 * of processes in the runlist and kfd module parameter
-	 * hws_max_conc_proc.
-	 * Note: the arbitration between the number of VMIDs and
-	 * hws_max_conc_proc has been done in
-	 * kgd2kfd_device_init().
-	 */
-	concurrent_proc_cnt = min(pm->dqm->processes_count,
-			kfd->max_proc_per_quantum);
-
-	packet = (struct pm4_mes_runlist *)buffer;
-
-	memset(buffer, 0, sizeof(struct pm4_mes_runlist));
-	packet->header.u32All = build_pm4_header(IT_RUN_LIST,
-						sizeof(struct pm4_mes_runlist));
-
-	packet->bitfields4.ib_size = ib_size_in_dwords;
-	packet->bitfields4.chain = chain ? 1 : 0;
-	packet->bitfields4.offload_polling = 0;
-	packet->bitfields4.valid = 1;
-	packet->bitfields4.process_cnt = concurrent_proc_cnt;
-	packet->ordinal2 = lower_32_bits(ib);
-	packet->bitfields3.ib_base_hi = upper_32_bits(ib);
-
-	return 0;
-}
-
-static int pm_create_map_process(struct packet_manager *pm, uint32_t *buffer,
-				struct qcm_process_device *qpd)
-{
-	struct pm4_mes_map_process *packet;
-
-	packet = (struct pm4_mes_map_process *)buffer;
-
-	memset(buffer, 0, sizeof(struct pm4_mes_map_process));
-
-	packet->header.u32All = build_pm4_header(IT_MAP_PROCESS,
-					sizeof(struct pm4_mes_map_process));
-	packet->bitfields2.diq_enable = (qpd->is_debug) ? 1 : 0;
-	packet->bitfields2.process_quantum = 1;
-	packet->bitfields2.pasid = qpd->pqm->process->pasid;
-	packet->bitfields3.page_table_base = qpd->page_table_base;
-	packet->bitfields10.gds_size = qpd->gds_size;
-	packet->bitfields10.num_gws = qpd->num_gws;
-	packet->bitfields10.num_oac = qpd->num_oac;
-	packet->bitfields10.num_queues = (qpd->is_debug) ? 0 : qpd->queue_count;
-
-	packet->sh_mem_config = qpd->sh_mem_config;
-	packet->sh_mem_bases = qpd->sh_mem_bases;
-	packet->sh_mem_ape1_base = qpd->sh_mem_ape1_base;
-	packet->sh_mem_ape1_limit = qpd->sh_mem_ape1_limit;
-
-	packet->sh_hidden_private_base_vmid = qpd->sh_hidden_private_base;
-
-	packet->gds_addr_lo = lower_32_bits(qpd->gds_context_area);
-	packet->gds_addr_hi = upper_32_bits(qpd->gds_context_area);
-
-	return 0;
-}
-
-static int pm_create_map_queue(struct packet_manager *pm, uint32_t *buffer,
-		struct queue *q, bool is_static)
-{
-	struct pm4_mes_map_queues *packet;
-	bool use_static = is_static;
-
-	packet = (struct pm4_mes_map_queues *)buffer;
-	memset(buffer, 0, sizeof(struct pm4_mes_map_queues));
-
-	packet->header.u32All = build_pm4_header(IT_MAP_QUEUES,
-						sizeof(struct pm4_mes_map_queues));
-	packet->bitfields2.alloc_format =
-		alloc_format__mes_map_queues__one_per_pipe_vi;
-	packet->bitfields2.num_queues = 1;
-	packet->bitfields2.queue_sel =
-		queue_sel__mes_map_queues__map_to_hws_determined_queue_slots_vi;
-
-	packet->bitfields2.engine_sel =
-		engine_sel__mes_map_queues__compute_vi;
-	packet->bitfields2.queue_type =
-		queue_type__mes_map_queues__normal_compute_vi;
-
-	switch (q->properties.type) {
-	case KFD_QUEUE_TYPE_COMPUTE:
-		if (use_static)
-			packet->bitfields2.queue_type =
-		queue_type__mes_map_queues__normal_latency_static_queue_vi;
-		break;
-	case KFD_QUEUE_TYPE_DIQ:
-		packet->bitfields2.queue_type =
-			queue_type__mes_map_queues__debug_interface_queue_vi;
-		break;
-	case KFD_QUEUE_TYPE_SDMA:
-		packet->bitfields2.engine_sel = q->properties.sdma_engine_id +
-				engine_sel__mes_map_queues__sdma0_vi;
-		use_static = false; /* no static queues under SDMA */
-		break;
-	default:
-		WARN(1, "queue type %d", q->properties.type);
-		return -EINVAL;
-	}
-	packet->bitfields3.doorbell_offset =
-			q->properties.doorbell_off;
-
-	packet->mqd_addr_lo =
-			lower_32_bits(q->gart_mqd_addr);
-
-	packet->mqd_addr_hi =
-			upper_32_bits(q->gart_mqd_addr);
-
-	packet->wptr_addr_lo =
-			lower_32_bits((uint64_t)q->properties.write_ptr);
-
-	packet->wptr_addr_hi =
-			upper_32_bits((uint64_t)q->properties.write_ptr);
-
-	return 0;
-}
-
 static int pm_create_runlist_ib(struct packet_manager *pm,
 				struct list_head *queues,
 				uint64_t *rl_gpu_addr,
@@ -292,12 +147,12 @@ static int pm_create_runlist_ib(struct packet_manager *pm,
 			return -ENOMEM;
 		}
 
-		retval = pm_create_map_process(pm, &rl_buffer[rl_wptr], qpd);
+		retval = pm->pmf->map_process(pm, &rl_buffer[rl_wptr], qpd);
 		if (retval)
 			return retval;
 
 		proccesses_mapped++;
-		inc_wptr(&rl_wptr, sizeof(struct pm4_mes_map_process),
+		inc_wptr(&rl_wptr, pm->pmf->map_process_size,
 				alloc_size_bytes);
 
 		list_for_each_entry(kq, &qpd->priv_queue_list, list) {
@@ -307,7 +162,7 @@ static int pm_create_runlist_ib(struct packet_manager *pm,
 			pr_debug("static_queue, mapping kernel q %d, is debug status %d\n",
 				kq->queue->queue, qpd->is_debug);
 
-			retval = pm_create_map_queue(pm,
+			retval = pm->pmf->map_queues(pm,
 						&rl_buffer[rl_wptr],
 						kq->queue,
 						qpd->is_debug);
@@ -315,7 +170,7 @@ static int pm_create_runlist_ib(struct packet_manager *pm,
 				return retval;
 
 			inc_wptr(&rl_wptr,
-				sizeof(struct pm4_mes_map_queues),
+				pm->pmf->map_queues_size,
 				alloc_size_bytes);
 		}
 
@@ -326,7 +181,7 @@ static int pm_create_runlist_ib(struct packet_manager *pm,
 			pr_debug("static_queue, mapping user queue %d, is debug status %d\n",
 				q->queue, qpd->is_debug);
 
-			retval = pm_create_map_queue(pm,
+			retval = pm->pmf->map_queues(pm,
 						&rl_buffer[rl_wptr],
 						q,
 						qpd->is_debug);
@@ -335,7 +190,7 @@ static int pm_create_runlist_ib(struct packet_manager *pm,
 				return retval;
 
 			inc_wptr(&rl_wptr,
-				sizeof(struct pm4_mes_map_queues),
+				pm->pmf->map_queues_size,
 				alloc_size_bytes);
 		}
 	}
@@ -343,7 +198,7 @@ static int pm_create_runlist_ib(struct packet_manager *pm,
 	pr_debug("Finished map process and queues to runlist\n");
 
 	if (is_over_subscription)
-		retval = pm_create_runlist(pm, &rl_buffer[rl_wptr],
+		retval = pm->pmf->runlist(pm, &rl_buffer[rl_wptr],
 					*rl_gpu_addr,
 					alloc_size_bytes / sizeof(uint32_t),
 					true);
@@ -355,45 +210,25 @@ static int pm_create_runlist_ib(struct packet_manager *pm,
 	return retval;
 }
 
-/* pm_create_release_mem - Create a RELEASE_MEM packet and return the size
- *     of this packet
- *     @gpu_addr - GPU address of the packet. It's a virtual address.
- *     @buffer - buffer to fill up with the packet. It's a CPU kernel pointer
- *     Return - length of the packet
- */
-uint32_t pm_create_release_mem(uint64_t gpu_addr, uint32_t *buffer)
-{
-	struct pm4_mec_release_mem *packet;
-
-	WARN_ON(!buffer);
-
-	packet = (struct pm4_mec_release_mem *)buffer;
-	memset(buffer, 0, sizeof(*packet));
-
-	packet->header.u32All = build_pm4_header(IT_RELEASE_MEM,
-						 sizeof(*packet));
-
-	packet->bitfields2.event_type = CACHE_FLUSH_AND_INV_TS_EVENT;
-	packet->bitfields2.event_index = event_index___release_mem__end_of_pipe;
-	packet->bitfields2.tcl1_action_ena = 1;
-	packet->bitfields2.tc_action_ena = 1;
-	packet->bitfields2.cache_policy = cache_policy___release_mem__lru;
-	packet->bitfields2.atc = 0;
-
-	packet->bitfields3.data_sel = data_sel___release_mem__send_32_bit_low;
-	packet->bitfields3.int_sel =
-		int_sel___release_mem__send_interrupt_after_write_confirm;
-
-	packet->bitfields4.address_lo_32b = (gpu_addr & 0xffffffff) >> 2;
-	packet->address_hi = upper_32_bits(gpu_addr);
-
-	packet->data_lo = 0;
-
-	return sizeof(*packet) / sizeof(unsigned int);
-}
-
 int pm_init(struct packet_manager *pm, struct device_queue_manager *dqm)
 {
+	switch (dqm->dev->device_info->asic_family) {
+	case CHIP_KAVERI:
+	case CHIP_HAWAII:
+		/* PM4 packet structures on CIK are the same as on VI */
+	case CHIP_CARRIZO:
+	case CHIP_TONGA:
+	case CHIP_FIJI:
+	case CHIP_POLARIS10:
+	case CHIP_POLARIS11:
+		pm->pmf = &kfd_vi_pm_funcs;
+		break;
+	default:
+		WARN(1, "Unexpected ASIC family %u",
+		     dqm->dev->device_info->asic_family);
+		return -EINVAL;
+	}
+
 	pm->dqm = dqm;
 	mutex_init(&pm->lock);
 	pm->priv_queue = kernel_queue_init(dqm->dev, KFD_QUEUE_TYPE_HIQ);
@@ -415,38 +250,25 @@ void pm_uninit(struct packet_manager *pm)
 int pm_send_set_resources(struct packet_manager *pm,
 				struct scheduling_resources *res)
 {
-	struct pm4_mes_set_resources *packet;
+	uint32_t *buffer, size;
 	int retval = 0;
 
+	size = pm->pmf->set_resources_size;
 	mutex_lock(&pm->lock);
 	pm->priv_queue->ops.acquire_packet_buffer(pm->priv_queue,
-					sizeof(*packet) / sizeof(uint32_t),
-					(unsigned int **)&packet);
-	if (!packet) {
+					size / sizeof(uint32_t),
+					(unsigned int **)&buffer);
+	if (!buffer) {
 		pr_err("Failed to allocate buffer on kernel queue\n");
 		retval = -ENOMEM;
 		goto out;
 	}
 
-	memset(packet, 0, sizeof(struct pm4_mes_set_resources));
-	packet->header.u32All = build_pm4_header(IT_SET_RESOURCES,
-					sizeof(struct pm4_mes_set_resources));
-
-	packet->bitfields2.queue_type =
-			queue_type__mes_set_resources__hsa_interface_queue_hiq;
-	packet->bitfields2.vmid_mask = res->vmid_mask;
-	packet->bitfields2.unmap_latency = KFD_UNMAP_LATENCY_MS / 100;
-	packet->bitfields7.oac_mask = res->oac_mask;
-	packet->bitfields8.gds_heap_base = res->gds_heap_base;
-	packet->bitfields8.gds_heap_size = res->gds_heap_size;
-
-	packet->gws_mask_lo = lower_32_bits(res->gws_mask);
-	packet->gws_mask_hi = upper_32_bits(res->gws_mask);
-
-	packet->queue_mask_lo = lower_32_bits(res->queue_mask);
-	packet->queue_mask_hi = upper_32_bits(res->queue_mask);
-
-	pm->priv_queue->ops.submit_packet(pm->priv_queue);
+	retval = pm->pmf->set_resources(pm, buffer, res);
+	if (!retval)
+		pm->priv_queue->ops.submit_packet(pm->priv_queue);
+	else
+		pm->priv_queue->ops.rollback_packet(pm->priv_queue);
 
 out:
 	mutex_unlock(&pm->lock);
@@ -468,7 +290,7 @@ int pm_send_runlist(struct packet_manager *pm, struct list_head *dqm_queues)
 
 	pr_debug("runlist IB address: 0x%llX\n", rl_gpu_ib_addr);
 
-	packet_size_dwords = sizeof(struct pm4_mes_runlist) / sizeof(uint32_t);
+	packet_size_dwords = pm->pmf->runlist_size / sizeof(uint32_t);
 	mutex_lock(&pm->lock);
 
 	retval = pm->priv_queue->ops.acquire_packet_buffer(pm->priv_queue,
@@ -476,7 +298,7 @@ int pm_send_runlist(struct packet_manager *pm, struct list_head *dqm_queues)
 	if (retval)
 		goto fail_acquire_packet_buffer;
 
-	retval = pm_create_runlist(pm, rl_buffer, rl_gpu_ib_addr,
+	retval = pm->pmf->runlist(pm, rl_buffer, rl_gpu_ib_addr,
 					rl_ib_size / sizeof(uint32_t), false);
 	if (retval)
 		goto fail_create_runlist;
@@ -499,37 +321,29 @@ int pm_send_runlist(struct packet_manager *pm, struct list_head *dqm_queues)
 int pm_send_query_status(struct packet_manager *pm, uint64_t fence_address,
 			uint32_t fence_value)
 {
-	int retval;
-	struct pm4_mes_query_status *packet;
+	uint32_t *buffer, size;
+	int retval = 0;
 
 	if (WARN_ON(!fence_address))
 		return -EFAULT;
 
+	size = pm->pmf->query_status_size;
 	mutex_lock(&pm->lock);
-	retval = pm->priv_queue->ops.acquire_packet_buffer(
-			pm->priv_queue,
-			sizeof(struct pm4_mes_query_status) / sizeof(uint32_t),
-			(unsigned int **)&packet);
-	if (retval)
-		goto fail_acquire_packet_buffer;
-
-	packet->header.u32All = build_pm4_header(IT_QUERY_STATUS,
-					sizeof(struct pm4_mes_query_status));
-
-	packet->bitfields2.context_id = 0;
-	packet->bitfields2.interrupt_sel =
-			interrupt_sel__mes_query_status__completion_status;
-	packet->bitfields2.command =
-			command__mes_query_status__fence_only_after_write_ack;
-
-	packet->addr_hi = upper_32_bits((uint64_t)fence_address);
-	packet->addr_lo = lower_32_bits((uint64_t)fence_address);
-	packet->data_hi = upper_32_bits((uint64_t)fence_value);
-	packet->data_lo = lower_32_bits((uint64_t)fence_value);
+	pm->priv_queue->ops.acquire_packet_buffer(pm->priv_queue,
+			size / sizeof(uint32_t), (unsigned int **)&buffer);
+	if (!buffer) {
+		pr_err("Failed to allocate buffer on kernel queue\n");
+		retval = -ENOMEM;
+		goto out;
+	}
 
-	pm->priv_queue->ops.submit_packet(pm->priv_queue);
+	retval = pm->pmf->query_status(pm, buffer, fence_address, fence_value);
+	if (!retval)
+		pm->priv_queue->ops.submit_packet(pm->priv_queue);
+	else
+		pm->priv_queue->ops.rollback_packet(pm->priv_queue);
 
-fail_acquire_packet_buffer:
+out:
 	mutex_unlock(&pm->lock);
 	return retval;
 }
@@ -539,82 +353,27 @@ int pm_send_unmap_queue(struct packet_manager *pm, enum kfd_queue_type type,
 			uint32_t filter_param, bool reset,
 			unsigned int sdma_engine)
 {
-	int retval;
-	uint32_t *buffer;
-	struct pm4_mes_unmap_queues *packet;
+	uint32_t *buffer, size;
+	int retval = 0;
 
+	size = pm->pmf->unmap_queues_size;
 	mutex_lock(&pm->lock);
-	retval = pm->priv_queue->ops.acquire_packet_buffer(
-			pm->priv_queue,
-			sizeof(struct pm4_mes_unmap_queues) / sizeof(uint32_t),
-			&buffer);
-	if (retval)
-		goto err_acquire_packet_buffer;
-
-	packet = (struct pm4_mes_unmap_queues *)buffer;
-	memset(buffer, 0, sizeof(struct pm4_mes_unmap_queues));
-	pr_debug("static_queue: unmapping queues: filter is %d , reset is %d , type is %d\n",
-		filter, reset, type);
-	packet->header.u32All = build_pm4_header(IT_UNMAP_QUEUES,
-					sizeof(struct pm4_mes_unmap_queues));
-	switch (type) {
-	case KFD_QUEUE_TYPE_COMPUTE:
-	case KFD_QUEUE_TYPE_DIQ:
-		packet->bitfields2.engine_sel =
-			engine_sel__mes_unmap_queues__compute;
-		break;
-	case KFD_QUEUE_TYPE_SDMA:
-		packet->bitfields2.engine_sel =
-			engine_sel__mes_unmap_queues__sdma0 + sdma_engine;
-		break;
-	default:
-		WARN(1, "queue type %d", type);
-		retval = -EINVAL;
-		goto err_invalid;
+	pm->priv_queue->ops.acquire_packet_buffer(pm->priv_queue,
+			size / sizeof(uint32_t), (unsigned int **)&buffer);
+	if (!buffer) {
+		pr_err("Failed to allocate buffer on kernel queue\n");
+		retval = -ENOMEM;
+		goto out;
 	}
 
-	if (reset)
-		packet->bitfields2.action =
-				action__mes_unmap_queues__reset_queues;
+	retval = pm->pmf->unmap_queues(pm, buffer, type, filter, filter_param,
+				       reset, sdma_engine);
+	if (!retval)
+		pm->priv_queue->ops.submit_packet(pm->priv_queue);
 	else
-		packet->bitfields2.action =
-				action__mes_unmap_queues__preempt_queues;
-
-	switch (filter) {
-	case KFD_UNMAP_QUEUES_FILTER_SINGLE_QUEUE:
-		packet->bitfields2.queue_sel =
-				queue_sel__mes_unmap_queues__perform_request_on_specified_queues;
-		packet->bitfields2.num_queues = 1;
-		packet->bitfields3b.doorbell_offset0 = filter_param;
-		break;
-	case KFD_UNMAP_QUEUES_FILTER_BY_PASID:
-		packet->bitfields2.queue_sel =
-				queue_sel__mes_unmap_queues__perform_request_on_pasid_queues;
-		packet->bitfields3a.pasid = filter_param;
-		break;
-	case KFD_UNMAP_QUEUES_FILTER_ALL_QUEUES:
-		packet->bitfields2.queue_sel =
-				queue_sel__mes_unmap_queues__unmap_all_queues;
-		break;
-	case KFD_UNMAP_QUEUES_FILTER_DYNAMIC_QUEUES:
-		/* in this case, we do not preempt static queues */
-		packet->bitfields2.queue_sel =
-				queue_sel__mes_unmap_queues__unmap_all_non_static_queues;
-		break;
-	default:
-		WARN(1, "filter %d", filter);
-		retval = -EINVAL;
-		goto err_invalid;
-	}
+		pm->priv_queue->ops.rollback_packet(pm->priv_queue);
 
-	pm->priv_queue->ops.submit_packet(pm->priv_queue);
-
-	mutex_unlock(&pm->lock);
-	return 0;
-
-err_invalid:
-	pm->priv_queue->ops.rollback_packet(pm->priv_queue);
-err_acquire_packet_buffer:
+out:
 	mutex_unlock(&pm->lock);
 	return retval;
 }
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index ddb3c8c..873a8fb 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -866,8 +866,41 @@ struct packet_manager {
 	bool allocated;
 	struct kfd_mem_obj *ib_buffer_obj;
 	unsigned int ib_size_bytes;
+
+	const struct packet_manager_funcs *pmf;
+};
+
+struct packet_manager_funcs {
+	/* Support ASIC-specific packet formats for PM4 packets */
+	int (*map_process)(struct packet_manager *pm, uint32_t *buffer,
+			struct qcm_process_device *qpd);
+	int (*runlist)(struct packet_manager *pm, uint32_t *buffer,
+			uint64_t ib, size_t ib_size_in_dwords, bool chain);
+	int (*set_resources)(struct packet_manager *pm, uint32_t *buffer,
+			struct scheduling_resources *res);
+	int (*map_queues)(struct packet_manager *pm, uint32_t *buffer,
+			struct queue *q, bool is_static);
+	int (*unmap_queues)(struct packet_manager *pm, uint32_t *buffer,
+			enum kfd_queue_type type,
+			enum kfd_unmap_queues_filter mode,
+			uint32_t filter_param, bool reset,
+			unsigned int sdma_engine);
+	int (*query_status)(struct packet_manager *pm, uint32_t *buffer,
+			uint64_t fence_address,	uint32_t fence_value);
+	int (*release_mem)(uint64_t gpu_addr, uint32_t *buffer);
+
+	/* Packet sizes */
+	int map_process_size;
+	int runlist_size;
+	int set_resources_size;
+	int map_queues_size;
+	int unmap_queues_size;
+	int query_status_size;
+	int release_mem_size;
 };
 
+extern const struct packet_manager_funcs kfd_vi_pm_funcs;
+
 int pm_init(struct packet_manager *pm, struct device_queue_manager *dqm);
 void pm_uninit(struct packet_manager *pm);
 int pm_send_set_resources(struct packet_manager *pm,
@@ -883,8 +916,6 @@ int pm_send_unmap_queue(struct packet_manager *pm, enum kfd_queue_type type,
 
 void pm_release_ib(struct packet_manager *pm);
 
-uint32_t pm_create_release_mem(uint64_t gpu_addr, uint32_t *buffer);
-
 uint64_t kfd_get_number_elems(struct kfd_dev *kfd);
 
 /* Events */
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 10/21] drm/amdkfd: Add GFXv9 PM4 packet writer functions
       [not found] ` <1523395998-31314-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (8 preceding siblings ...)
  2018-04-10 21:33   ` [PATCH 09/21] drm/amdkfd: Move packet writer functions into ASIC-specific file Felix Kuehling
@ 2018-04-10 21:33   ` Felix Kuehling
  2018-04-10 21:33   ` [PATCH 11/21] drm/amdkfd: Add GFXv9 MQD manager Felix Kuehling
                     ` (13 subsequent siblings)
  23 siblings, 0 replies; 38+ messages in thread
From: Felix Kuehling @ 2018-04-10 21:33 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling, Shaoyun Liu

Signed-off-by: Shaoyun Liu <Shaoyun.Liu@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/Makefile              |   7 +-
 drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c | 331 +++++++++++++
 drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_vi.c |  18 +-
 drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c  |   4 +
 drivers/gpu/drm/amd/amdkfd/kfd_pm4_headers_ai.h  | 583 +++++++++++++++++++++++
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h            |   6 +
 6 files changed, 937 insertions(+), 12 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_pm4_headers_ai.h

diff --git a/drivers/gpu/drm/amd/amdkfd/Makefile b/drivers/gpu/drm/amd/amdkfd/Makefile
index 0d02422..52b3c1b 100644
--- a/drivers/gpu/drm/amd/amdkfd/Makefile
+++ b/drivers/gpu/drm/amd/amdkfd/Makefile
@@ -31,9 +31,10 @@ amdkfd-y	:= kfd_module.o kfd_device.o kfd_chardev.o kfd_topology.o \
 		kfd_process.o kfd_queue.o kfd_mqd_manager.o \
 		kfd_mqd_manager_cik.o kfd_mqd_manager_vi.o \
 		kfd_kernel_queue.o kfd_kernel_queue_cik.o \
-		kfd_kernel_queue_vi.o kfd_packet_manager.o \
-		kfd_process_queue_manager.o kfd_device_queue_manager.o \
-		kfd_device_queue_manager_cik.o kfd_device_queue_manager_vi.o \
+		kfd_kernel_queue_vi.o kfd_kernel_queue_v9.o \
+		kfd_packet_manager.o kfd_process_queue_manager.o \
+		kfd_device_queue_manager.o kfd_device_queue_manager_cik.o \
+		kfd_device_queue_manager_vi.o \
 		kfd_interrupt.o kfd_events.o cik_event_interrupt.o \
 		kfd_dbgdev.o kfd_dbgmgr.o kfd_crat.o
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c
new file mode 100644
index 0000000..ece7d59
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c
@@ -0,0 +1,331 @@
+/*
+ * Copyright 2016-2018 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#include "kfd_kernel_queue.h"
+#include "kfd_device_queue_manager.h"
+#include "kfd_pm4_headers_ai.h"
+#include "kfd_pm4_opcodes.h"
+
+static bool initialize_v9(struct kernel_queue *kq, struct kfd_dev *dev,
+			enum kfd_queue_type type, unsigned int queue_size);
+static void uninitialize_v9(struct kernel_queue *kq);
+
+void kernel_queue_init_v9(struct kernel_queue_ops *ops)
+{
+	ops->initialize = initialize_v9;
+	ops->uninitialize = uninitialize_v9;
+}
+
+static bool initialize_v9(struct kernel_queue *kq, struct kfd_dev *dev,
+			enum kfd_queue_type type, unsigned int queue_size)
+{
+	int retval;
+
+	retval = kfd_gtt_sa_allocate(dev, PAGE_SIZE, &kq->eop_mem);
+	if (retval)
+		return false;
+
+	kq->eop_gpu_addr = kq->eop_mem->gpu_addr;
+	kq->eop_kernel_addr = kq->eop_mem->cpu_ptr;
+
+	memset(kq->eop_kernel_addr, 0, PAGE_SIZE);
+
+	return true;
+}
+
+static void uninitialize_v9(struct kernel_queue *kq)
+{
+	kfd_gtt_sa_free(kq->dev, kq->eop_mem);
+}
+
+static int pm_map_process_v9(struct packet_manager *pm,
+		uint32_t *buffer, struct qcm_process_device *qpd)
+{
+	struct pm4_mes_map_process *packet;
+	uint64_t vm_page_table_base_addr =
+		(uint64_t)(qpd->page_table_base) << 12;
+
+	packet = (struct pm4_mes_map_process *)buffer;
+	memset(buffer, 0, sizeof(struct pm4_mes_map_process));
+
+	packet->header.u32All = pm_build_pm4_header(IT_MAP_PROCESS,
+					sizeof(struct pm4_mes_map_process));
+	packet->bitfields2.diq_enable = (qpd->is_debug) ? 1 : 0;
+	packet->bitfields2.process_quantum = 1;
+	packet->bitfields2.pasid = qpd->pqm->process->pasid;
+	packet->bitfields14.gds_size = qpd->gds_size;
+	packet->bitfields14.num_gws = qpd->num_gws;
+	packet->bitfields14.num_oac = qpd->num_oac;
+	packet->bitfields14.sdma_enable = 1;
+	packet->bitfields14.num_queues = (qpd->is_debug) ? 0 : qpd->queue_count;
+
+	packet->sh_mem_config = qpd->sh_mem_config;
+	packet->sh_mem_bases = qpd->sh_mem_bases;
+	packet->sq_shader_tba_lo = lower_32_bits(qpd->tba_addr >> 8);
+	packet->sq_shader_tba_hi = upper_32_bits(qpd->tba_addr >> 8);
+	packet->sq_shader_tma_lo = lower_32_bits(qpd->tma_addr >> 8);
+	packet->sq_shader_tma_hi = upper_32_bits(qpd->tma_addr >> 8);
+
+	packet->gds_addr_lo = lower_32_bits(qpd->gds_context_area);
+	packet->gds_addr_hi = upper_32_bits(qpd->gds_context_area);
+
+	packet->vm_context_page_table_base_addr_lo32 =
+			lower_32_bits(vm_page_table_base_addr);
+	packet->vm_context_page_table_base_addr_hi32 =
+			upper_32_bits(vm_page_table_base_addr);
+
+	return 0;
+}
+
+static int pm_runlist_v9(struct packet_manager *pm, uint32_t *buffer,
+			uint64_t ib, size_t ib_size_in_dwords, bool chain)
+{
+	struct pm4_mes_runlist *packet;
+
+	int concurrent_proc_cnt = 0;
+	struct kfd_dev *kfd = pm->dqm->dev;
+
+	/* Determine the number of processes to map together to HW:
+	 * it can not exceed the number of VMIDs available to the
+	 * scheduler, and it is determined by the smaller of the number
+	 * of processes in the runlist and kfd module parameter
+	 * hws_max_conc_proc.
+	 * Note: the arbitration between the number of VMIDs and
+	 * hws_max_conc_proc has been done in
+	 * kgd2kfd_device_init().
+	 */
+	concurrent_proc_cnt = min(pm->dqm->processes_count,
+			kfd->max_proc_per_quantum);
+
+	packet = (struct pm4_mes_runlist *)buffer;
+
+	memset(buffer, 0, sizeof(struct pm4_mes_runlist));
+	packet->header.u32All = pm_build_pm4_header(IT_RUN_LIST,
+						sizeof(struct pm4_mes_runlist));
+
+	packet->bitfields4.ib_size = ib_size_in_dwords;
+	packet->bitfields4.chain = chain ? 1 : 0;
+	packet->bitfields4.offload_polling = 0;
+	packet->bitfields4.valid = 1;
+	packet->bitfields4.process_cnt = concurrent_proc_cnt;
+	packet->ordinal2 = lower_32_bits(ib);
+	packet->ib_base_hi = upper_32_bits(ib);
+
+	return 0;
+}
+
+static int pm_map_queues_v9(struct packet_manager *pm, uint32_t *buffer,
+		struct queue *q, bool is_static)
+{
+	struct pm4_mes_map_queues *packet;
+	bool use_static = is_static;
+
+	packet = (struct pm4_mes_map_queues *)buffer;
+	memset(buffer, 0, sizeof(struct pm4_mes_map_queues));
+
+	packet->header.u32All = pm_build_pm4_header(IT_MAP_QUEUES,
+					sizeof(struct pm4_mes_map_queues));
+	packet->bitfields2.alloc_format =
+		alloc_format__mes_map_queues__one_per_pipe_vi;
+	packet->bitfields2.num_queues = 1;
+	packet->bitfields2.queue_sel =
+		queue_sel__mes_map_queues__map_to_hws_determined_queue_slots_vi;
+
+	packet->bitfields2.engine_sel =
+		engine_sel__mes_map_queues__compute_vi;
+	packet->bitfields2.queue_type =
+		queue_type__mes_map_queues__normal_compute_vi;
+
+	switch (q->properties.type) {
+	case KFD_QUEUE_TYPE_COMPUTE:
+		if (use_static)
+			packet->bitfields2.queue_type =
+		queue_type__mes_map_queues__normal_latency_static_queue_vi;
+		break;
+	case KFD_QUEUE_TYPE_DIQ:
+		packet->bitfields2.queue_type =
+			queue_type__mes_map_queues__debug_interface_queue_vi;
+		break;
+	case KFD_QUEUE_TYPE_SDMA:
+		packet->bitfields2.engine_sel = q->properties.sdma_engine_id +
+				engine_sel__mes_map_queues__sdma0_vi;
+		use_static = false; /* no static queues under SDMA */
+		break;
+	default:
+		WARN(1, "queue type %d", q->properties.type);
+		return -EINVAL;
+	}
+	packet->bitfields3.doorbell_offset =
+			q->properties.doorbell_off;
+
+	packet->mqd_addr_lo =
+			lower_32_bits(q->gart_mqd_addr);
+
+	packet->mqd_addr_hi =
+			upper_32_bits(q->gart_mqd_addr);
+
+	packet->wptr_addr_lo =
+			lower_32_bits((uint64_t)q->properties.write_ptr);
+
+	packet->wptr_addr_hi =
+			upper_32_bits((uint64_t)q->properties.write_ptr);
+
+	return 0;
+}
+
+static int pm_unmap_queues_v9(struct packet_manager *pm, uint32_t *buffer,
+			enum kfd_queue_type type,
+			enum kfd_unmap_queues_filter filter,
+			uint32_t filter_param, bool reset,
+			unsigned int sdma_engine)
+{
+	struct pm4_mes_unmap_queues *packet;
+
+	packet = (struct pm4_mes_unmap_queues *)buffer;
+	memset(buffer, 0, sizeof(struct pm4_mes_unmap_queues));
+
+	packet->header.u32All = pm_build_pm4_header(IT_UNMAP_QUEUES,
+					sizeof(struct pm4_mes_unmap_queues));
+	switch (type) {
+	case KFD_QUEUE_TYPE_COMPUTE:
+	case KFD_QUEUE_TYPE_DIQ:
+		packet->bitfields2.engine_sel =
+			engine_sel__mes_unmap_queues__compute;
+		break;
+	case KFD_QUEUE_TYPE_SDMA:
+		packet->bitfields2.engine_sel =
+			engine_sel__mes_unmap_queues__sdma0 + sdma_engine;
+		break;
+	default:
+		WARN(1, "queue type %d", type);
+		return -EINVAL;
+	}
+
+	if (reset)
+		packet->bitfields2.action =
+			action__mes_unmap_queues__reset_queues;
+	else
+		packet->bitfields2.action =
+			action__mes_unmap_queues__preempt_queues;
+
+	switch (filter) {
+	case KFD_UNMAP_QUEUES_FILTER_SINGLE_QUEUE:
+		packet->bitfields2.queue_sel =
+			queue_sel__mes_unmap_queues__perform_request_on_specified_queues;
+		packet->bitfields2.num_queues = 1;
+		packet->bitfields3b.doorbell_offset0 = filter_param;
+		break;
+	case KFD_UNMAP_QUEUES_FILTER_BY_PASID:
+		packet->bitfields2.queue_sel =
+			queue_sel__mes_unmap_queues__perform_request_on_pasid_queues;
+		packet->bitfields3a.pasid = filter_param;
+		break;
+	case KFD_UNMAP_QUEUES_FILTER_ALL_QUEUES:
+		packet->bitfields2.queue_sel =
+			queue_sel__mes_unmap_queues__unmap_all_queues;
+		break;
+	case KFD_UNMAP_QUEUES_FILTER_DYNAMIC_QUEUES:
+		/* in this case, we do not preempt static queues */
+		packet->bitfields2.queue_sel =
+			queue_sel__mes_unmap_queues__unmap_all_non_static_queues;
+		break;
+	default:
+		WARN(1, "filter %d", filter);
+		return -EINVAL;
+	}
+
+	return 0;
+
+}
+
+static int pm_query_status_v9(struct packet_manager *pm, uint32_t *buffer,
+			uint64_t fence_address,	uint32_t fence_value)
+{
+	struct pm4_mes_query_status *packet;
+
+	packet = (struct pm4_mes_query_status *)buffer;
+	memset(buffer, 0, sizeof(struct pm4_mes_query_status));
+
+
+	packet->header.u32All = pm_build_pm4_header(IT_QUERY_STATUS,
+					sizeof(struct pm4_mes_query_status));
+
+	packet->bitfields2.context_id = 0;
+	packet->bitfields2.interrupt_sel =
+			interrupt_sel__mes_query_status__completion_status;
+	packet->bitfields2.command =
+			command__mes_query_status__fence_only_after_write_ack;
+
+	packet->addr_hi = upper_32_bits((uint64_t)fence_address);
+	packet->addr_lo = lower_32_bits((uint64_t)fence_address);
+	packet->data_hi = upper_32_bits((uint64_t)fence_value);
+	packet->data_lo = lower_32_bits((uint64_t)fence_value);
+
+	return 0;
+}
+
+
+static int pm_release_mem_v9(uint64_t gpu_addr, uint32_t *buffer)
+{
+	struct pm4_mec_release_mem *packet;
+
+	packet = (struct pm4_mec_release_mem *)buffer;
+	memset(buffer, 0, sizeof(struct pm4_mec_release_mem));
+
+	packet->header.u32All = pm_build_pm4_header(IT_RELEASE_MEM,
+					sizeof(struct pm4_mec_release_mem));
+
+	packet->bitfields2.event_type = CACHE_FLUSH_AND_INV_TS_EVENT;
+	packet->bitfields2.event_index = event_index__mec_release_mem__end_of_pipe;
+	packet->bitfields2.tcl1_action_ena = 1;
+	packet->bitfields2.tc_action_ena = 1;
+	packet->bitfields2.cache_policy = cache_policy__mec_release_mem__lru;
+
+	packet->bitfields3.data_sel = data_sel__mec_release_mem__send_32_bit_low;
+	packet->bitfields3.int_sel =
+		int_sel__mec_release_mem__send_interrupt_after_write_confirm;
+
+	packet->bitfields4.address_lo_32b = (gpu_addr & 0xffffffff) >> 2;
+	packet->address_hi = upper_32_bits(gpu_addr);
+
+	packet->data_lo = 0;
+
+	return 0;
+}
+
+const struct packet_manager_funcs kfd_v9_pm_funcs = {
+	.map_process		= pm_map_process_v9,
+	.runlist		= pm_runlist_v9,
+	.set_resources		= pm_set_resources_vi,
+	.map_queues		= pm_map_queues_v9,
+	.unmap_queues		= pm_unmap_queues_v9,
+	.query_status		= pm_query_status_v9,
+	.release_mem		= pm_release_mem_v9,
+	.map_process_size	= sizeof(struct pm4_mes_map_process),
+	.runlist_size		= sizeof(struct pm4_mes_runlist),
+	.set_resources_size	= sizeof(struct pm4_mes_set_resources),
+	.map_queues_size	= sizeof(struct pm4_mes_map_queues),
+	.unmap_queues_size	= sizeof(struct pm4_mes_unmap_queues),
+	.query_status_size	= sizeof(struct pm4_mes_query_status),
+	.release_mem_size	= sizeof(struct pm4_mec_release_mem)
+};
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_vi.c b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_vi.c
index 7ee326f..f9019ef 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_vi.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_vi.c
@@ -58,7 +58,7 @@ static void uninitialize_vi(struct kernel_queue *kq)
 	kfd_gtt_sa_free(kq->dev, kq->eop_mem);
 }
 
-static unsigned int build_pm4_header(unsigned int opcode, size_t packet_size)
+unsigned int pm_build_pm4_header(unsigned int opcode, size_t packet_size)
 {
 	union PM4_MES_TYPE_3_HEADER header;
 
@@ -79,7 +79,7 @@ static int pm_map_process_vi(struct packet_manager *pm, uint32_t *buffer,
 
 	memset(buffer, 0, sizeof(struct pm4_mes_map_process));
 
-	packet->header.u32All = build_pm4_header(IT_MAP_PROCESS,
+	packet->header.u32All = pm_build_pm4_header(IT_MAP_PROCESS,
 					sizeof(struct pm4_mes_map_process));
 	packet->bitfields2.diq_enable = (qpd->is_debug) ? 1 : 0;
 	packet->bitfields2.process_quantum = 1;
@@ -128,7 +128,7 @@ static int pm_runlist_vi(struct packet_manager *pm, uint32_t *buffer,
 	packet = (struct pm4_mes_runlist *)buffer;
 
 	memset(buffer, 0, sizeof(struct pm4_mes_runlist));
-	packet->header.u32All = build_pm4_header(IT_RUN_LIST,
+	packet->header.u32All = pm_build_pm4_header(IT_RUN_LIST,
 						sizeof(struct pm4_mes_runlist));
 
 	packet->bitfields4.ib_size = ib_size_in_dwords;
@@ -142,7 +142,7 @@ static int pm_runlist_vi(struct packet_manager *pm, uint32_t *buffer,
 	return 0;
 }
 
-static int pm_set_resources_vi(struct packet_manager *pm, uint32_t *buffer,
+int pm_set_resources_vi(struct packet_manager *pm, uint32_t *buffer,
 				struct scheduling_resources *res)
 {
 	struct pm4_mes_set_resources *packet;
@@ -150,7 +150,7 @@ static int pm_set_resources_vi(struct packet_manager *pm, uint32_t *buffer,
 	packet = (struct pm4_mes_set_resources *)buffer;
 	memset(buffer, 0, sizeof(struct pm4_mes_set_resources));
 
-	packet->header.u32All = build_pm4_header(IT_SET_RESOURCES,
+	packet->header.u32All = pm_build_pm4_header(IT_SET_RESOURCES,
 					sizeof(struct pm4_mes_set_resources));
 
 	packet->bitfields2.queue_type =
@@ -179,7 +179,7 @@ static int pm_map_queues_vi(struct packet_manager *pm, uint32_t *buffer,
 	packet = (struct pm4_mes_map_queues *)buffer;
 	memset(buffer, 0, sizeof(struct pm4_mes_map_queues));
 
-	packet->header.u32All = build_pm4_header(IT_MAP_QUEUES,
+	packet->header.u32All = pm_build_pm4_header(IT_MAP_QUEUES,
 					sizeof(struct pm4_mes_map_queues));
 	packet->bitfields2.alloc_format =
 		alloc_format__mes_map_queues__one_per_pipe_vi;
@@ -240,7 +240,7 @@ static int pm_unmap_queues_vi(struct packet_manager *pm, uint32_t *buffer,
 	packet = (struct pm4_mes_unmap_queues *)buffer;
 	memset(buffer, 0, sizeof(struct pm4_mes_unmap_queues));
 
-	packet->header.u32All = build_pm4_header(IT_UNMAP_QUEUES,
+	packet->header.u32All = pm_build_pm4_header(IT_UNMAP_QUEUES,
 					sizeof(struct pm4_mes_unmap_queues));
 	switch (type) {
 	case KFD_QUEUE_TYPE_COMPUTE:
@@ -302,7 +302,7 @@ static int pm_query_status_vi(struct packet_manager *pm, uint32_t *buffer,
 	packet = (struct pm4_mes_query_status *)buffer;
 	memset(buffer, 0, sizeof(struct pm4_mes_query_status));
 
-	packet->header.u32All = build_pm4_header(IT_QUERY_STATUS,
+	packet->header.u32All = pm_build_pm4_header(IT_QUERY_STATUS,
 					sizeof(struct pm4_mes_query_status));
 
 	packet->bitfields2.context_id = 0;
@@ -326,7 +326,7 @@ static int pm_release_mem_vi(uint64_t gpu_addr, uint32_t *buffer)
 	packet = (struct pm4_mec_release_mem *)buffer;
 	memset(buffer, 0, sizeof(*packet));
 
-	packet->header.u32All = build_pm4_header(IT_RELEASE_MEM,
+	packet->header.u32All = pm_build_pm4_header(IT_RELEASE_MEM,
 						 sizeof(*packet));
 
 	packet->bitfields2.event_type = CACHE_FLUSH_AND_INV_TS_EVENT;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
index 860ff24..91f0350 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
@@ -223,6 +223,10 @@ int pm_init(struct packet_manager *pm, struct device_queue_manager *dqm)
 	case CHIP_POLARIS11:
 		pm->pmf = &kfd_vi_pm_funcs;
 		break;
+	case CHIP_VEGA10:
+	case CHIP_RAVEN:
+		pm->pmf = &kfd_v9_pm_funcs;
+		break;
 	default:
 		WARN(1, "Unexpected ASIC family %u",
 		     dqm->dev->device_info->asic_family);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pm4_headers_ai.h b/drivers/gpu/drm/amd/amdkfd/kfd_pm4_headers_ai.h
new file mode 100644
index 0000000..ddad9be
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pm4_headers_ai.h
@@ -0,0 +1,583 @@
+/*
+ * Copyright 2016 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#ifndef F32_MES_PM4_PACKETS_H
+#define F32_MES_PM4_PACKETS_H
+
+#ifndef PM4_MES_HEADER_DEFINED
+#define PM4_MES_HEADER_DEFINED
+union PM4_MES_TYPE_3_HEADER {
+	struct {
+		uint32_t reserved1 : 8; /* < reserved */
+		uint32_t opcode    : 8; /* < IT opcode */
+		uint32_t count     : 14;/* < number of DWORDs - 1 in the
+					 *   information body.
+					 */
+		uint32_t type      : 2; /* < packet identifier.
+					 *   It should be 3 for type 3 packets
+					 */
+	};
+	uint32_t u32All;
+};
+#endif /* PM4_MES_HEADER_DEFINED */
+
+/*--------------------MES_SET_RESOURCES--------------------*/
+
+#ifndef PM4_MES_SET_RESOURCES_DEFINED
+#define PM4_MES_SET_RESOURCES_DEFINED
+enum mes_set_resources_queue_type_enum {
+	queue_type__mes_set_resources__kernel_interface_queue_kiq = 0,
+	queue_type__mes_set_resources__hsa_interface_queue_hiq = 1,
+	queue_type__mes_set_resources__hsa_debug_interface_queue = 4
+};
+
+
+struct pm4_mes_set_resources {
+	union {
+		union PM4_MES_TYPE_3_HEADER	header;		/* header */
+		uint32_t			ordinal1;
+	};
+
+	union {
+		struct {
+			uint32_t vmid_mask:16;
+			uint32_t unmap_latency:8;
+			uint32_t reserved1:5;
+			enum mes_set_resources_queue_type_enum queue_type:3;
+		} bitfields2;
+		uint32_t ordinal2;
+	};
+
+	uint32_t queue_mask_lo;
+	uint32_t queue_mask_hi;
+	uint32_t gws_mask_lo;
+	uint32_t gws_mask_hi;
+
+	union {
+		struct {
+			uint32_t oac_mask:16;
+			uint32_t reserved2:16;
+		} bitfields7;
+		uint32_t ordinal7;
+	};
+
+	union {
+		struct {
+		uint32_t gds_heap_base:6;
+		uint32_t reserved3:5;
+		uint32_t gds_heap_size:6;
+		uint32_t reserved4:15;
+		} bitfields8;
+		uint32_t ordinal8;
+	};
+
+};
+#endif
+
+/*--------------------MES_RUN_LIST--------------------*/
+
+#ifndef PM4_MES_RUN_LIST_DEFINED
+#define PM4_MES_RUN_LIST_DEFINED
+
+struct pm4_mes_runlist {
+	union {
+	    union PM4_MES_TYPE_3_HEADER   header;            /* header */
+	    uint32_t            ordinal1;
+	};
+
+	union {
+		struct {
+			uint32_t reserved1:2;
+			uint32_t ib_base_lo:30;
+		} bitfields2;
+		uint32_t ordinal2;
+	};
+
+	uint32_t ib_base_hi;
+
+	union {
+		struct {
+			uint32_t ib_size:20;
+			uint32_t chain:1;
+			uint32_t offload_polling:1;
+			uint32_t reserved2:1;
+			uint32_t valid:1;
+			uint32_t process_cnt:4;
+			uint32_t reserved3:4;
+		} bitfields4;
+		uint32_t ordinal4;
+	};
+
+};
+#endif
+
+/*--------------------MES_MAP_PROCESS--------------------*/
+
+#ifndef PM4_MES_MAP_PROCESS_DEFINED
+#define PM4_MES_MAP_PROCESS_DEFINED
+
+struct pm4_mes_map_process {
+	union {
+		union PM4_MES_TYPE_3_HEADER header;	/* header */
+		uint32_t ordinal1;
+	};
+
+	union {
+		struct {
+			uint32_t pasid:16;
+			uint32_t reserved1:8;
+			uint32_t diq_enable:1;
+			uint32_t process_quantum:7;
+		} bitfields2;
+		uint32_t ordinal2;
+	};
+
+	uint32_t vm_context_page_table_base_addr_lo32;
+
+	uint32_t vm_context_page_table_base_addr_hi32;
+
+	uint32_t sh_mem_bases;
+
+	uint32_t sh_mem_config;
+
+	uint32_t sq_shader_tba_lo;
+
+	uint32_t sq_shader_tba_hi;
+
+	uint32_t sq_shader_tma_lo;
+
+	uint32_t sq_shader_tma_hi;
+
+	uint32_t reserved6;
+
+	uint32_t gds_addr_lo;
+
+	uint32_t gds_addr_hi;
+
+	union {
+		struct {
+			uint32_t num_gws:6;
+			uint32_t reserved7:1;
+			uint32_t sdma_enable:1;
+			uint32_t num_oac:4;
+			uint32_t reserved8:4;
+			uint32_t gds_size:6;
+			uint32_t num_queues:10;
+		} bitfields14;
+		uint32_t ordinal14;
+	};
+
+	uint32_t completion_signal_lo;
+
+	uint32_t completion_signal_hi;
+
+};
+
+#endif
+
+/*--------------------MES_MAP_PROCESS_VM--------------------*/
+
+#ifndef PM4_MES_MAP_PROCESS_VM_DEFINED
+#define PM4_MES_MAP_PROCESS_VM_DEFINED
+
+struct PM4_MES_MAP_PROCESS_VM {
+	union {
+		union PM4_MES_TYPE_3_HEADER header;	/* header */
+		uint32_t ordinal1;
+	};
+
+	uint32_t reserved1;
+
+	uint32_t vm_context_cntl;
+
+	uint32_t reserved2;
+
+	uint32_t vm_context_page_table_end_addr_lo32;
+
+	uint32_t vm_context_page_table_end_addr_hi32;
+
+	uint32_t vm_context_page_table_start_addr_lo32;
+
+	uint32_t vm_context_page_table_start_addr_hi32;
+
+	uint32_t reserved3;
+
+	uint32_t reserved4;
+
+	uint32_t reserved5;
+
+	uint32_t reserved6;
+
+	uint32_t reserved7;
+
+	uint32_t reserved8;
+
+	uint32_t completion_signal_lo32;
+
+	uint32_t completion_signal_hi32;
+
+};
+#endif
+
+/*--------------------MES_MAP_QUEUES--------------------*/
+
+#ifndef PM4_MES_MAP_QUEUES_VI_DEFINED
+#define PM4_MES_MAP_QUEUES_VI_DEFINED
+enum mes_map_queues_queue_sel_enum {
+	queue_sel__mes_map_queues__map_to_specified_queue_slots_vi = 0,
+queue_sel__mes_map_queues__map_to_hws_determined_queue_slots_vi = 1
+};
+
+enum mes_map_queues_queue_type_enum {
+	queue_type__mes_map_queues__normal_compute_vi = 0,
+	queue_type__mes_map_queues__debug_interface_queue_vi = 1,
+	queue_type__mes_map_queues__normal_latency_static_queue_vi = 2,
+queue_type__mes_map_queues__low_latency_static_queue_vi = 3
+};
+
+enum mes_map_queues_alloc_format_enum {
+	alloc_format__mes_map_queues__one_per_pipe_vi = 0,
+alloc_format__mes_map_queues__all_on_one_pipe_vi = 1
+};
+
+enum mes_map_queues_engine_sel_enum {
+	engine_sel__mes_map_queues__compute_vi = 0,
+	engine_sel__mes_map_queues__sdma0_vi = 2,
+	engine_sel__mes_map_queues__sdma1_vi = 3
+};
+
+
+struct pm4_mes_map_queues {
+	union {
+		union PM4_MES_TYPE_3_HEADER   header;            /* header */
+		uint32_t            ordinal1;
+	};
+
+	union {
+		struct {
+			uint32_t reserved1:4;
+			enum mes_map_queues_queue_sel_enum queue_sel:2;
+			uint32_t reserved2:15;
+			enum mes_map_queues_queue_type_enum queue_type:3;
+			enum mes_map_queues_alloc_format_enum alloc_format:2;
+			enum mes_map_queues_engine_sel_enum engine_sel:3;
+			uint32_t num_queues:3;
+		} bitfields2;
+		uint32_t ordinal2;
+	};
+
+	union {
+		struct {
+			uint32_t reserved3:1;
+			uint32_t check_disable:1;
+			uint32_t doorbell_offset:26;
+			uint32_t reserved4:4;
+		} bitfields3;
+		uint32_t ordinal3;
+	};
+
+	uint32_t mqd_addr_lo;
+	uint32_t mqd_addr_hi;
+	uint32_t wptr_addr_lo;
+	uint32_t wptr_addr_hi;
+};
+#endif
+
+/*--------------------MES_QUERY_STATUS--------------------*/
+
+#ifndef PM4_MES_QUERY_STATUS_DEFINED
+#define PM4_MES_QUERY_STATUS_DEFINED
+enum mes_query_status_interrupt_sel_enum {
+	interrupt_sel__mes_query_status__completion_status = 0,
+	interrupt_sel__mes_query_status__process_status = 1,
+	interrupt_sel__mes_query_status__queue_status = 2
+};
+
+enum mes_query_status_command_enum {
+	command__mes_query_status__interrupt_only = 0,
+	command__mes_query_status__fence_only_immediate = 1,
+	command__mes_query_status__fence_only_after_write_ack = 2,
+	command__mes_query_status__fence_wait_for_write_ack_send_interrupt = 3
+};
+
+enum mes_query_status_engine_sel_enum {
+	engine_sel__mes_query_status__compute = 0,
+	engine_sel__mes_query_status__sdma0_queue = 2,
+	engine_sel__mes_query_status__sdma1_queue = 3
+};
+
+struct pm4_mes_query_status {
+	union {
+		union PM4_MES_TYPE_3_HEADER   header;            /* header */
+		uint32_t            ordinal1;
+	};
+
+	union {
+		struct {
+			uint32_t context_id:28;
+			enum mes_query_status_interrupt_sel_enum	interrupt_sel:2;
+			enum mes_query_status_command_enum command:2;
+		} bitfields2;
+		uint32_t ordinal2;
+	};
+
+	union {
+		struct {
+			uint32_t pasid:16;
+			uint32_t reserved1:16;
+		} bitfields3a;
+		struct {
+			uint32_t reserved2:2;
+			uint32_t doorbell_offset:26;
+			enum mes_query_status_engine_sel_enum engine_sel:3;
+			uint32_t reserved3:1;
+		} bitfields3b;
+		uint32_t ordinal3;
+	};
+
+	uint32_t addr_lo;
+	uint32_t addr_hi;
+	uint32_t data_lo;
+	uint32_t data_hi;
+};
+#endif
+
+/*--------------------MES_UNMAP_QUEUES--------------------*/
+
+#ifndef PM4_MES_UNMAP_QUEUES_DEFINED
+#define PM4_MES_UNMAP_QUEUES_DEFINED
+enum mes_unmap_queues_action_enum {
+	action__mes_unmap_queues__preempt_queues = 0,
+	action__mes_unmap_queues__reset_queues = 1,
+	action__mes_unmap_queues__disable_process_queues = 2,
+	action__mes_unmap_queues__reserved = 3
+};
+
+enum mes_unmap_queues_queue_sel_enum {
+	queue_sel__mes_unmap_queues__perform_request_on_specified_queues = 0,
+	queue_sel__mes_unmap_queues__perform_request_on_pasid_queues = 1,
+	queue_sel__mes_unmap_queues__unmap_all_queues = 2,
+	queue_sel__mes_unmap_queues__unmap_all_non_static_queues = 3
+};
+
+enum mes_unmap_queues_engine_sel_enum {
+	engine_sel__mes_unmap_queues__compute = 0,
+	engine_sel__mes_unmap_queues__sdma0 = 2,
+	engine_sel__mes_unmap_queues__sdmal = 3
+};
+
+struct pm4_mes_unmap_queues {
+	union {
+		union PM4_MES_TYPE_3_HEADER   header;            /* header */
+		uint32_t            ordinal1;
+	};
+
+	union {
+		struct {
+			enum mes_unmap_queues_action_enum action:2;
+			uint32_t reserved1:2;
+			enum mes_unmap_queues_queue_sel_enum queue_sel:2;
+			uint32_t reserved2:20;
+			enum mes_unmap_queues_engine_sel_enum engine_sel:3;
+			uint32_t num_queues:3;
+		} bitfields2;
+		uint32_t ordinal2;
+	};
+
+	union {
+		struct {
+			uint32_t pasid:16;
+			uint32_t reserved3:16;
+		} bitfields3a;
+		struct {
+			uint32_t reserved4:2;
+			uint32_t doorbell_offset0:26;
+			int32_t reserved5:4;
+		} bitfields3b;
+		uint32_t ordinal3;
+	};
+
+	union {
+	struct {
+			uint32_t reserved6:2;
+			uint32_t doorbell_offset1:26;
+			uint32_t reserved7:4;
+		} bitfields4;
+		uint32_t ordinal4;
+	};
+
+	union {
+		struct {
+			uint32_t reserved8:2;
+			uint32_t doorbell_offset2:26;
+			uint32_t reserved9:4;
+		} bitfields5;
+		uint32_t ordinal5;
+	};
+
+	union {
+		struct {
+			uint32_t reserved10:2;
+			uint32_t doorbell_offset3:26;
+			uint32_t reserved11:4;
+		} bitfields6;
+		uint32_t ordinal6;
+	};
+};
+#endif
+
+#ifndef PM4_MEC_RELEASE_MEM_DEFINED
+#define PM4_MEC_RELEASE_MEM_DEFINED
+
+enum mec_release_mem_event_index_enum {
+	event_index__mec_release_mem__end_of_pipe = 5,
+	event_index__mec_release_mem__shader_done = 6
+};
+
+enum mec_release_mem_cache_policy_enum {
+	cache_policy__mec_release_mem__lru = 0,
+	cache_policy__mec_release_mem__stream = 1
+};
+
+enum mec_release_mem_pq_exe_status_enum {
+	pq_exe_status__mec_release_mem__default = 0,
+	pq_exe_status__mec_release_mem__phase_update = 1
+};
+
+enum mec_release_mem_dst_sel_enum {
+	dst_sel__mec_release_mem__memory_controller = 0,
+	dst_sel__mec_release_mem__tc_l2 = 1,
+	dst_sel__mec_release_mem__queue_write_pointer_register = 2,
+	dst_sel__mec_release_mem__queue_write_pointer_poll_mask_bit = 3
+};
+
+enum mec_release_mem_int_sel_enum {
+	int_sel__mec_release_mem__none = 0,
+	int_sel__mec_release_mem__send_interrupt_only = 1,
+	int_sel__mec_release_mem__send_interrupt_after_write_confirm = 2,
+	int_sel__mec_release_mem__send_data_after_write_confirm = 3,
+	int_sel__mec_release_mem__unconditionally_send_int_ctxid = 4,
+	int_sel__mec_release_mem__conditionally_send_int_ctxid_based_on_32_bit_compare = 5,
+	int_sel__mec_release_mem__conditionally_send_int_ctxid_based_on_64_bit_compare = 6
+};
+
+enum mec_release_mem_data_sel_enum {
+	data_sel__mec_release_mem__none = 0,
+	data_sel__mec_release_mem__send_32_bit_low = 1,
+	data_sel__mec_release_mem__send_64_bit_data = 2,
+	data_sel__mec_release_mem__send_gpu_clock_counter = 3,
+	data_sel__mec_release_mem__send_cp_perfcounter_hi_lo = 4,
+	data_sel__mec_release_mem__store_gds_data_to_memory = 5
+};
+
+struct pm4_mec_release_mem {
+	union {
+		union PM4_MES_TYPE_3_HEADER header;     /*header */
+		unsigned int ordinal1;
+	};
+
+	union {
+		struct {
+			unsigned int event_type:6;
+			unsigned int reserved1:2;
+			enum mec_release_mem_event_index_enum event_index:4;
+			unsigned int tcl1_vol_action_ena:1;
+			unsigned int tc_vol_action_ena:1;
+			unsigned int reserved2:1;
+			unsigned int tc_wb_action_ena:1;
+			unsigned int tcl1_action_ena:1;
+			unsigned int tc_action_ena:1;
+			uint32_t reserved3:1;
+			uint32_t tc_nc_action_ena:1;
+			uint32_t tc_wc_action_ena:1;
+			uint32_t tc_md_action_ena:1;
+			uint32_t reserved4:3;
+			enum mec_release_mem_cache_policy_enum cache_policy:2;
+			uint32_t reserved5:2;
+			enum mec_release_mem_pq_exe_status_enum pq_exe_status:1;
+			uint32_t reserved6:2;
+		} bitfields2;
+		unsigned int ordinal2;
+	};
+
+	union {
+		struct {
+			uint32_t reserved7:16;
+			enum mec_release_mem_dst_sel_enum dst_sel:2;
+			uint32_t reserved8:6;
+			enum mec_release_mem_int_sel_enum int_sel:3;
+			uint32_t reserved9:2;
+			enum mec_release_mem_data_sel_enum data_sel:3;
+		} bitfields3;
+		unsigned int ordinal3;
+	};
+
+	union {
+		struct {
+			uint32_t reserved10:2;
+			unsigned int address_lo_32b:30;
+		} bitfields4;
+		struct {
+			uint32_t reserved11:3;
+			uint32_t address_lo_64b:29;
+		} bitfields4b;
+		uint32_t reserved12;
+		unsigned int ordinal4;
+	};
+
+	union {
+		uint32_t address_hi;
+		uint32_t reserved13;
+		uint32_t ordinal5;
+	};
+
+	union {
+		uint32_t data_lo;
+		uint32_t cmp_data_lo;
+		struct {
+			uint32_t dw_offset:16;
+			uint32_t num_dwords:16;
+		} bitfields6c;
+		uint32_t reserved14;
+		uint32_t ordinal6;
+	};
+
+	union {
+		uint32_t data_hi;
+		uint32_t cmp_data_hi;
+		uint32_t reserved15;
+		uint32_t reserved16;
+		uint32_t ordinal7;
+	};
+
+	uint32_t int_ctxid;
+
+};
+
+#endif
+
+enum {
+	CACHE_FLUSH_AND_INV_TS_EVENT = 0x00000014
+};
+#endif
+
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 873a8fb..b68299a 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -900,6 +900,7 @@ struct packet_manager_funcs {
 };
 
 extern const struct packet_manager_funcs kfd_vi_pm_funcs;
+extern const struct packet_manager_funcs kfd_v9_pm_funcs;
 
 int pm_init(struct packet_manager *pm, struct device_queue_manager *dqm);
 void pm_uninit(struct packet_manager *pm);
@@ -916,6 +917,11 @@ int pm_send_unmap_queue(struct packet_manager *pm, enum kfd_queue_type type,
 
 void pm_release_ib(struct packet_manager *pm);
 
+/* Following PM funcs can be shared among VI and AI */
+unsigned int pm_build_pm4_header(unsigned int opcode, size_t packet_size);
+int pm_set_resources_vi(struct packet_manager *pm, uint32_t *buffer,
+				struct scheduling_resources *res);
+
 uint64_t kfd_get_number_elems(struct kfd_dev *kfd);
 
 /* Events */
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 11/21] drm/amdkfd: Add GFXv9 MQD manager
       [not found] ` <1523395998-31314-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (9 preceding siblings ...)
  2018-04-10 21:33   ` [PATCH 10/21] drm/amdkfd: Add GFXv9 PM4 packet writer functions Felix Kuehling
@ 2018-04-10 21:33   ` Felix Kuehling
       [not found]     ` <1523395998-31314-12-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
  2018-04-10 21:33   ` [PATCH 12/21] drm/amdkfd: Add GFXv9 device queue manager Felix Kuehling
                     ` (12 subsequent siblings)
  23 siblings, 1 reply; 38+ messages in thread
From: Felix Kuehling @ 2018-04-10 21:33 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling, Jay Cornwall, John Bridgman

Signed-off-by: John Bridgman <john.bridgman@amd.com>
Signed-off-by: Jay Cornwall <Jay.Cornwall@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/Makefile             |   1 +
 drivers/gpu/drm/amd/amdkfd/kfd_device.c         |   2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c    |   3 +
 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c | 443 ++++++++++++++++++++++++
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h           |   3 +
 5 files changed, 451 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c

diff --git a/drivers/gpu/drm/amd/amdkfd/Makefile b/drivers/gpu/drm/amd/amdkfd/Makefile
index 52b3c1b..094b591 100644
--- a/drivers/gpu/drm/amd/amdkfd/Makefile
+++ b/drivers/gpu/drm/amd/amdkfd/Makefile
@@ -30,6 +30,7 @@ amdkfd-y	:= kfd_module.o kfd_device.o kfd_chardev.o kfd_topology.o \
 		kfd_pasid.o kfd_doorbell.o kfd_flat_memory.o \
 		kfd_process.o kfd_queue.o kfd_mqd_manager.o \
 		kfd_mqd_manager_cik.o kfd_mqd_manager_vi.o \
+		kfd_mqd_manager_v9.o \
 		kfd_kernel_queue.o kfd_kernel_queue_cik.o \
 		kfd_kernel_queue_vi.o kfd_kernel_queue_v9.o \
 		kfd_packet_manager.o kfd_process_queue_manager.o \
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index f563acb..c368ce3 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -700,7 +700,7 @@ int kfd_gtt_sa_allocate(struct kfd_dev *kfd, unsigned int size,
 	if (size > kfd->gtt_sa_num_of_chunks * kfd->gtt_sa_chunk_size)
 		return -ENOMEM;
 
-	*mem_obj = kmalloc(sizeof(struct kfd_mem_obj), GFP_NOIO);
+	*mem_obj = kzalloc(sizeof(struct kfd_mem_obj), GFP_NOIO);
 	if ((*mem_obj) == NULL)
 		return -ENOMEM;
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c
index ee7061e..4b8eb50 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c
@@ -38,6 +38,9 @@ struct mqd_manager *mqd_manager_init(enum KFD_MQD_TYPE type,
 	case CHIP_POLARIS10:
 	case CHIP_POLARIS11:
 		return mqd_manager_init_vi_tonga(type, dev);
+	case CHIP_VEGA10:
+	case CHIP_RAVEN:
+		return mqd_manager_init_v9(type, dev);
 	default:
 		WARN(1, "Unexpected ASIC family %u",
 		     dev->device_info->asic_family);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
new file mode 100644
index 0000000..684054f
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
@@ -0,0 +1,443 @@
+/*
+ * Copyright 2016-2018 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#include <linux/printk.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+#include "kfd_priv.h"
+#include "kfd_mqd_manager.h"
+#include "v9_structs.h"
+#include "gc/gc_9_0_offset.h"
+#include "gc/gc_9_0_sh_mask.h"
+#include "sdma0/sdma0_4_0_sh_mask.h"
+
+static inline struct v9_mqd *get_mqd(void *mqd)
+{
+	return (struct v9_mqd *)mqd;
+}
+
+static inline struct v9_sdma_mqd *get_sdma_mqd(void *mqd)
+{
+	return (struct v9_sdma_mqd *)mqd;
+}
+
+static int init_mqd(struct mqd_manager *mm, void **mqd,
+			struct kfd_mem_obj **mqd_mem_obj, uint64_t *gart_addr,
+			struct queue_properties *q)
+{
+	int retval;
+	uint64_t addr;
+	struct v9_mqd *m;
+	struct kfd_dev *kfd = mm->dev;
+
+	/* From V9,  for CWSR, the control stack is located on the next page
+	 * boundary after the mqd, we will use the gtt allocation function
+	 * instead of sub-allocation function.
+	 */
+	if (kfd->cwsr_enabled && (q->type == KFD_QUEUE_TYPE_COMPUTE)) {
+		*mqd_mem_obj = kzalloc(sizeof(struct kfd_mem_obj), GFP_NOIO);
+		if (!*mqd_mem_obj)
+			return -ENOMEM;
+		retval = kfd->kfd2kgd->init_gtt_mem_allocation(kfd->kgd,
+			ALIGN(q->ctl_stack_size, PAGE_SIZE) +
+				ALIGN(sizeof(struct v9_mqd), PAGE_SIZE),
+			&((*mqd_mem_obj)->gtt_mem),
+			&((*mqd_mem_obj)->gpu_addr),
+			(void *)&((*mqd_mem_obj)->cpu_ptr));
+	} else
+		retval = kfd_gtt_sa_allocate(mm->dev, sizeof(struct v9_mqd),
+				mqd_mem_obj);
+	if (retval != 0)
+		return -ENOMEM;
+
+	m = (struct v9_mqd *) (*mqd_mem_obj)->cpu_ptr;
+	addr = (*mqd_mem_obj)->gpu_addr;
+
+	memset(m, 0, sizeof(struct v9_mqd));
+
+	m->header = 0xC0310800;
+	m->compute_pipelinestat_enable = 1;
+	m->compute_static_thread_mgmt_se0 = 0xFFFFFFFF;
+	m->compute_static_thread_mgmt_se1 = 0xFFFFFFFF;
+	m->compute_static_thread_mgmt_se2 = 0xFFFFFFFF;
+	m->compute_static_thread_mgmt_se3 = 0xFFFFFFFF;
+
+	m->cp_hqd_persistent_state = CP_HQD_PERSISTENT_STATE__PRELOAD_REQ_MASK |
+			0x53 << CP_HQD_PERSISTENT_STATE__PRELOAD_SIZE__SHIFT;
+
+	m->cp_mqd_control = 1 << CP_MQD_CONTROL__PRIV_STATE__SHIFT;
+
+	m->cp_mqd_base_addr_lo        = lower_32_bits(addr);
+	m->cp_mqd_base_addr_hi        = upper_32_bits(addr);
+
+	m->cp_hqd_quantum = 1 << CP_HQD_QUANTUM__QUANTUM_EN__SHIFT |
+			1 << CP_HQD_QUANTUM__QUANTUM_SCALE__SHIFT |
+			10 << CP_HQD_QUANTUM__QUANTUM_DURATION__SHIFT;
+
+	m->cp_hqd_pipe_priority = 1;
+	m->cp_hqd_queue_priority = 15;
+
+	if (q->format == KFD_QUEUE_FORMAT_AQL) {
+		m->cp_hqd_aql_control =
+			1 << CP_HQD_AQL_CONTROL__CONTROL0__SHIFT;
+	}
+
+	if (q->tba_addr) {
+		m->compute_pgm_rsrc2 |=
+			(1 << COMPUTE_PGM_RSRC2__TRAP_PRESENT__SHIFT);
+	}
+
+	if (mm->dev->cwsr_enabled && q->ctx_save_restore_area_address) {
+		m->cp_hqd_persistent_state |=
+			(1 << CP_HQD_PERSISTENT_STATE__QSWITCH_MODE__SHIFT);
+		m->cp_hqd_ctx_save_base_addr_lo =
+			lower_32_bits(q->ctx_save_restore_area_address);
+		m->cp_hqd_ctx_save_base_addr_hi =
+			upper_32_bits(q->ctx_save_restore_area_address);
+		m->cp_hqd_ctx_save_size = q->ctx_save_restore_area_size;
+		m->cp_hqd_cntl_stack_size = q->ctl_stack_size;
+		m->cp_hqd_cntl_stack_offset = q->ctl_stack_size;
+		m->cp_hqd_wg_state_offset = q->ctl_stack_size;
+	}
+
+	*mqd = m;
+	if (gart_addr)
+		*gart_addr = addr;
+	retval = mm->update_mqd(mm, m, q);
+
+	return retval;
+}
+
+static int load_mqd(struct mqd_manager *mm, void *mqd,
+			uint32_t pipe_id, uint32_t queue_id,
+			struct queue_properties *p, struct mm_struct *mms)
+{
+	/* AQL write pointer counts in 64B packets, PM4/CP counts in dwords. */
+	uint32_t wptr_shift = (p->format == KFD_QUEUE_FORMAT_AQL ? 4 : 0);
+
+	return mm->dev->kfd2kgd->hqd_load(mm->dev->kgd, mqd, pipe_id, queue_id,
+					  (uint32_t __user *)p->write_ptr,
+					  wptr_shift, 0, mms);
+}
+
+static int update_mqd(struct mqd_manager *mm, void *mqd,
+		      struct queue_properties *q)
+{
+	struct v9_mqd *m;
+
+	m = get_mqd(mqd);
+
+	m->cp_hqd_pq_control = 5 << CP_HQD_PQ_CONTROL__RPTR_BLOCK_SIZE__SHIFT;
+	m->cp_hqd_pq_control |= order_base_2(q->queue_size / 4) - 1;
+	pr_debug("cp_hqd_pq_control 0x%x\n", m->cp_hqd_pq_control);
+
+	m->cp_hqd_pq_base_lo = lower_32_bits((uint64_t)q->queue_address >> 8);
+	m->cp_hqd_pq_base_hi = upper_32_bits((uint64_t)q->queue_address >> 8);
+
+	m->cp_hqd_pq_rptr_report_addr_lo = lower_32_bits((uint64_t)q->read_ptr);
+	m->cp_hqd_pq_rptr_report_addr_hi = upper_32_bits((uint64_t)q->read_ptr);
+	m->cp_hqd_pq_wptr_poll_addr_lo = lower_32_bits((uint64_t)q->write_ptr);
+	m->cp_hqd_pq_wptr_poll_addr_hi = upper_32_bits((uint64_t)q->write_ptr);
+
+	m->cp_hqd_pq_doorbell_control =
+		q->doorbell_off <<
+			CP_HQD_PQ_DOORBELL_CONTROL__DOORBELL_OFFSET__SHIFT;
+	pr_debug("cp_hqd_pq_doorbell_control 0x%x\n",
+			m->cp_hqd_pq_doorbell_control);
+
+	m->cp_hqd_ib_control =
+		3 << CP_HQD_IB_CONTROL__MIN_IB_AVAIL_SIZE__SHIFT |
+		1 << CP_HQD_IB_CONTROL__IB_EXE_DISABLE__SHIFT;
+
+	/*
+	 * HW does not clamp this field correctly. Maximum EOP queue size
+	 * is constrained by per-SE EOP done signal count, which is 8-bit.
+	 * Limit is 0xFF EOP entries (= 0x7F8 dwords). CP will not submit
+	 * more than (EOP entry count - 1) so a queue size of 0x800 dwords
+	 * is safe, giving a maximum field value of 0xA.
+	 */
+	m->cp_hqd_eop_control = min(0xA,
+		order_base_2(q->eop_ring_buffer_size / 4) - 1);
+	m->cp_hqd_eop_base_addr_lo =
+			lower_32_bits(q->eop_ring_buffer_address >> 8);
+	m->cp_hqd_eop_base_addr_hi =
+			upper_32_bits(q->eop_ring_buffer_address >> 8);
+
+	m->cp_hqd_iq_timer = 0;
+
+	m->cp_hqd_vmid = q->vmid;
+
+	if (q->format == KFD_QUEUE_FORMAT_AQL) {
+		m->cp_hqd_pq_control |= CP_HQD_PQ_CONTROL__NO_UPDATE_RPTR_MASK |
+				2 << CP_HQD_PQ_CONTROL__SLOT_BASED_WPTR__SHIFT |
+				1 << CP_HQD_PQ_CONTROL__QUEUE_FULL_EN__SHIFT |
+				1 << CP_HQD_PQ_CONTROL__WPP_CLAMP_EN__SHIFT;
+		m->cp_hqd_pq_doorbell_control |= 1 <<
+			CP_HQD_PQ_DOORBELL_CONTROL__DOORBELL_BIF_DROP__SHIFT;
+	}
+	if (mm->dev->cwsr_enabled && q->ctx_save_restore_area_address)
+		m->cp_hqd_ctx_save_control = 0;
+
+	q->is_active = (q->queue_size > 0 &&
+			q->queue_address != 0 &&
+			q->queue_percent > 0 &&
+			!q->is_evicted);
+
+	return 0;
+}
+
+
+static int destroy_mqd(struct mqd_manager *mm, void *mqd,
+			enum kfd_preempt_type type,
+			unsigned int timeout, uint32_t pipe_id,
+			uint32_t queue_id)
+{
+	return mm->dev->kfd2kgd->hqd_destroy
+		(mm->dev->kgd, mqd, type, timeout,
+		pipe_id, queue_id);
+}
+
+static void uninit_mqd(struct mqd_manager *mm, void *mqd,
+			struct kfd_mem_obj *mqd_mem_obj)
+{
+	struct kfd_dev *kfd = mm->dev;
+
+	if (mqd_mem_obj->gtt_mem) {
+		kfd->kfd2kgd->free_gtt_mem(kfd->kgd, mqd_mem_obj->gtt_mem);
+		kfree(mqd_mem_obj);
+	} else {
+		kfd_gtt_sa_free(mm->dev, mqd_mem_obj);
+	}
+}
+
+static bool is_occupied(struct mqd_manager *mm, void *mqd,
+			uint64_t queue_address,	uint32_t pipe_id,
+			uint32_t queue_id)
+{
+	return mm->dev->kfd2kgd->hqd_is_occupied(
+		mm->dev->kgd, queue_address,
+		pipe_id, queue_id);
+}
+
+static int init_mqd_hiq(struct mqd_manager *mm, void **mqd,
+			struct kfd_mem_obj **mqd_mem_obj, uint64_t *gart_addr,
+			struct queue_properties *q)
+{
+	struct v9_mqd *m;
+	int retval = init_mqd(mm, mqd, mqd_mem_obj, gart_addr, q);
+
+	if (retval != 0)
+		return retval;
+
+	m = get_mqd(*mqd);
+
+	m->cp_hqd_pq_control |= 1 << CP_HQD_PQ_CONTROL__PRIV_STATE__SHIFT |
+			1 << CP_HQD_PQ_CONTROL__KMD_QUEUE__SHIFT;
+
+	return retval;
+}
+
+static int update_mqd_hiq(struct mqd_manager *mm, void *mqd,
+			struct queue_properties *q)
+{
+	struct v9_mqd *m;
+	int retval = update_mqd(mm, mqd, q);
+
+	if (retval != 0)
+		return retval;
+
+	/* TODO: what's the point? update_mqd already does this. */
+	m = get_mqd(mqd);
+	m->cp_hqd_vmid = q->vmid;
+	return retval;
+}
+
+static int init_mqd_sdma(struct mqd_manager *mm, void **mqd,
+		struct kfd_mem_obj **mqd_mem_obj, uint64_t *gart_addr,
+		struct queue_properties *q)
+{
+	int retval;
+	struct v9_sdma_mqd *m;
+
+
+	retval = kfd_gtt_sa_allocate(mm->dev,
+			sizeof(struct v9_sdma_mqd),
+			mqd_mem_obj);
+
+	if (retval != 0)
+		return -ENOMEM;
+
+	m = (struct v9_sdma_mqd *) (*mqd_mem_obj)->cpu_ptr;
+
+	memset(m, 0, sizeof(struct v9_sdma_mqd));
+
+	*mqd = m;
+	if (gart_addr)
+		*gart_addr = (*mqd_mem_obj)->gpu_addr;
+
+	retval = mm->update_mqd(mm, m, q);
+
+	return retval;
+}
+
+static void uninit_mqd_sdma(struct mqd_manager *mm, void *mqd,
+		struct kfd_mem_obj *mqd_mem_obj)
+{
+	kfd_gtt_sa_free(mm->dev, mqd_mem_obj);
+}
+
+static int load_mqd_sdma(struct mqd_manager *mm, void *mqd,
+		uint32_t pipe_id, uint32_t queue_id,
+		struct queue_properties *p, struct mm_struct *mms)
+{
+	return mm->dev->kfd2kgd->hqd_sdma_load(mm->dev->kgd, mqd,
+					       (uint32_t __user *)p->write_ptr,
+					       mms);
+}
+
+#define SDMA_RLC_DUMMY_DEFAULT 0xf
+
+static int update_mqd_sdma(struct mqd_manager *mm, void *mqd,
+		struct queue_properties *q)
+{
+	struct v9_sdma_mqd *m;
+
+	m = get_sdma_mqd(mqd);
+	m->sdmax_rlcx_rb_cntl = order_base_2(q->queue_size / 4)
+		<< SDMA0_RLC0_RB_CNTL__RB_SIZE__SHIFT |
+		q->vmid << SDMA0_RLC0_RB_CNTL__RB_VMID__SHIFT |
+		1 << SDMA0_RLC0_RB_CNTL__RPTR_WRITEBACK_ENABLE__SHIFT |
+		6 << SDMA0_RLC0_RB_CNTL__RPTR_WRITEBACK_TIMER__SHIFT;
+
+	m->sdmax_rlcx_rb_base = lower_32_bits(q->queue_address >> 8);
+	m->sdmax_rlcx_rb_base_hi = upper_32_bits(q->queue_address >> 8);
+	m->sdmax_rlcx_rb_rptr_addr_lo = lower_32_bits((uint64_t)q->read_ptr);
+	m->sdmax_rlcx_rb_rptr_addr_hi = upper_32_bits((uint64_t)q->read_ptr);
+	m->sdmax_rlcx_doorbell_offset =
+		q->doorbell_off << SDMA0_RLC0_DOORBELL_OFFSET__OFFSET__SHIFT;
+
+	m->sdma_engine_id = q->sdma_engine_id;
+	m->sdma_queue_id = q->sdma_queue_id;
+	m->sdmax_rlcx_dummy_reg = SDMA_RLC_DUMMY_DEFAULT;
+
+	q->is_active = (q->queue_size > 0 &&
+			q->queue_address != 0 &&
+			q->queue_percent > 0 &&
+			!q->is_evicted);
+
+	return 0;
+}
+
+/*
+ *  * preempt type here is ignored because there is only one way
+ *  * to preempt sdma queue
+ */
+static int destroy_mqd_sdma(struct mqd_manager *mm, void *mqd,
+		enum kfd_preempt_type type,
+		unsigned int timeout, uint32_t pipe_id,
+		uint32_t queue_id)
+{
+	return mm->dev->kfd2kgd->hqd_sdma_destroy(mm->dev->kgd, mqd, timeout);
+}
+
+static bool is_occupied_sdma(struct mqd_manager *mm, void *mqd,
+		uint64_t queue_address, uint32_t pipe_id,
+		uint32_t queue_id)
+{
+	return mm->dev->kfd2kgd->hqd_sdma_is_occupied(mm->dev->kgd, mqd);
+}
+
+#if defined(CONFIG_DEBUG_FS)
+
+static int debugfs_show_mqd(struct seq_file *m, void *data)
+{
+	seq_hex_dump(m, "    ", DUMP_PREFIX_OFFSET, 32, 4,
+		     data, sizeof(struct v9_mqd), false);
+	return 0;
+}
+
+static int debugfs_show_mqd_sdma(struct seq_file *m, void *data)
+{
+	seq_hex_dump(m, "    ", DUMP_PREFIX_OFFSET, 32, 4,
+		     data, sizeof(struct v9_sdma_mqd), false);
+	return 0;
+}
+
+#endif
+
+struct mqd_manager *mqd_manager_init_v9(enum KFD_MQD_TYPE type,
+		struct kfd_dev *dev)
+{
+	struct mqd_manager *mqd;
+
+	if (WARN_ON(type >= KFD_MQD_TYPE_MAX))
+		return NULL;
+
+	mqd = kzalloc(sizeof(*mqd), GFP_NOIO);
+	if (!mqd)
+		return NULL;
+
+	mqd->dev = dev;
+
+	switch (type) {
+	case KFD_MQD_TYPE_CP:
+	case KFD_MQD_TYPE_COMPUTE:
+		mqd->init_mqd = init_mqd;
+		mqd->uninit_mqd = uninit_mqd;
+		mqd->load_mqd = load_mqd;
+		mqd->update_mqd = update_mqd;
+		mqd->destroy_mqd = destroy_mqd;
+		mqd->is_occupied = is_occupied;
+#if defined(CONFIG_DEBUG_FS)
+		mqd->debugfs_show_mqd = debugfs_show_mqd;
+#endif
+		break;
+	case KFD_MQD_TYPE_HIQ:
+		mqd->init_mqd = init_mqd_hiq;
+		mqd->uninit_mqd = uninit_mqd;
+		mqd->load_mqd = load_mqd;
+		mqd->update_mqd = update_mqd_hiq;
+		mqd->destroy_mqd = destroy_mqd;
+		mqd->is_occupied = is_occupied;
+#if defined(CONFIG_DEBUG_FS)
+		mqd->debugfs_show_mqd = debugfs_show_mqd;
+#endif
+		break;
+	case KFD_MQD_TYPE_SDMA:
+		mqd->init_mqd = init_mqd_sdma;
+		mqd->uninit_mqd = uninit_mqd_sdma;
+		mqd->load_mqd = load_mqd_sdma;
+		mqd->update_mqd = update_mqd_sdma;
+		mqd->destroy_mqd = destroy_mqd_sdma;
+		mqd->is_occupied = is_occupied_sdma;
+#if defined(CONFIG_DEBUG_FS)
+		mqd->debugfs_show_mqd = debugfs_show_mqd_sdma;
+#endif
+		break;
+	default:
+		kfree(mqd);
+		return NULL;
+	}
+
+	return mqd;
+}
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index b68299a..fac2882 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -197,6 +197,7 @@ struct kfd_mem_obj {
 	uint32_t range_end;
 	uint64_t gpu_addr;
 	uint32_t *cpu_ptr;
+	void *gtt_mem;
 };
 
 struct kfd_vmid_info {
@@ -822,6 +823,8 @@ struct mqd_manager *mqd_manager_init_vi(enum KFD_MQD_TYPE type,
 		struct kfd_dev *dev);
 struct mqd_manager *mqd_manager_init_vi_tonga(enum KFD_MQD_TYPE type,
 		struct kfd_dev *dev);
+struct mqd_manager *mqd_manager_init_v9(enum KFD_MQD_TYPE type,
+		struct kfd_dev *dev);
 struct device_queue_manager *device_queue_manager_init(struct kfd_dev *dev);
 void device_queue_manager_uninit(struct device_queue_manager *dqm);
 struct kernel_queue *kernel_queue_init(struct kfd_dev *dev,
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 12/21] drm/amdkfd: Add GFXv9 device queue manager
       [not found] ` <1523395998-31314-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (10 preceding siblings ...)
  2018-04-10 21:33   ` [PATCH 11/21] drm/amdkfd: Add GFXv9 MQD manager Felix Kuehling
@ 2018-04-10 21:33   ` Felix Kuehling
  2018-04-10 21:33   ` [PATCH 13/21] drm/amdkfd: Add SOC15 interrupt processing support Felix Kuehling
                     ` (11 subsequent siblings)
  23 siblings, 0 replies; 38+ messages in thread
From: Felix Kuehling @ 2018-04-10 21:33 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling, John Bridgman

Signed-off-by: John Bridgman <john.bridgman@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/Makefile                |  2 +-
 .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c  | 10 ++-
 .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.h  |  2 +
 .../drm/amd/amdkfd/kfd_device_queue_manager_v9.c   | 84 ++++++++++++++++++++++
 drivers/gpu/drm/amd/amdkfd/kfd_module.c            |  5 ++
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h              |  5 ++
 6 files changed, 106 insertions(+), 2 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager_v9.c

diff --git a/drivers/gpu/drm/amd/amdkfd/Makefile b/drivers/gpu/drm/amd/amdkfd/Makefile
index 094b591..ff8b5aa 100644
--- a/drivers/gpu/drm/amd/amdkfd/Makefile
+++ b/drivers/gpu/drm/amd/amdkfd/Makefile
@@ -35,7 +35,7 @@ amdkfd-y	:= kfd_module.o kfd_device.o kfd_chardev.o kfd_topology.o \
 		kfd_kernel_queue_vi.o kfd_kernel_queue_v9.o \
 		kfd_packet_manager.o kfd_process_queue_manager.o \
 		kfd_device_queue_manager.o kfd_device_queue_manager_cik.o \
-		kfd_device_queue_manager_vi.o \
+		kfd_device_queue_manager_vi.o kfd_device_queue_manager_v9.o \
 		kfd_interrupt.o kfd_events.o cik_event_interrupt.o \
 		kfd_dbgdev.o kfd_dbgmgr.o kfd_crat.o
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index 500f022..9af94b1 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -1386,7 +1386,10 @@ static bool set_cache_memory_policy(struct device_queue_manager *dqm,
 				   void __user *alternate_aperture_base,
 				   uint64_t alternate_aperture_size)
 {
-	bool retval;
+	bool retval = true;
+
+	if (!dqm->asic_ops.set_cache_memory_policy)
+		return retval;
 
 	mutex_lock(&dqm->lock);
 
@@ -1655,6 +1658,11 @@ struct device_queue_manager *device_queue_manager_init(struct kfd_dev *dev)
 	case CHIP_POLARIS11:
 		device_queue_manager_init_vi_tonga(&dqm->asic_ops);
 		break;
+
+	case CHIP_VEGA10:
+	case CHIP_RAVEN:
+		device_queue_manager_init_v9(&dqm->asic_ops);
+		break;
 	default:
 		WARN(1, "Unexpected ASIC family %u",
 		     dev->device_info->asic_family);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
index 412beff..59a6b19 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
@@ -200,6 +200,8 @@ void device_queue_manager_init_vi(
 		struct device_queue_manager_asic_ops *asic_ops);
 void device_queue_manager_init_vi_tonga(
 		struct device_queue_manager_asic_ops *asic_ops);
+void device_queue_manager_init_v9(
+		struct device_queue_manager_asic_ops *asic_ops);
 void program_sh_mem_settings(struct device_queue_manager *dqm,
 					struct qcm_process_device *qpd);
 unsigned int get_queues_num(struct device_queue_manager *dqm);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager_v9.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager_v9.c
new file mode 100644
index 0000000..79e5bcf
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager_v9.c
@@ -0,0 +1,84 @@
+/*
+ * Copyright 2016-2018 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#include "kfd_device_queue_manager.h"
+#include "vega10_enum.h"
+#include "gc/gc_9_0_offset.h"
+#include "gc/gc_9_0_sh_mask.h"
+#include "sdma0/sdma0_4_0_sh_mask.h"
+
+static int update_qpd_v9(struct device_queue_manager *dqm,
+			 struct qcm_process_device *qpd);
+static void init_sdma_vm_v9(struct device_queue_manager *dqm, struct queue *q,
+			    struct qcm_process_device *qpd);
+
+void device_queue_manager_init_v9(
+	struct device_queue_manager_asic_ops *asic_ops)
+{
+	asic_ops->update_qpd = update_qpd_v9;
+	asic_ops->init_sdma_vm = init_sdma_vm_v9;
+}
+
+static uint32_t compute_sh_mem_bases_64bit(struct kfd_process_device *pdd)
+{
+	uint32_t shared_base = pdd->lds_base >> 48;
+	uint32_t private_base = pdd->scratch_base >> 48;
+
+	return (shared_base << SH_MEM_BASES__SHARED_BASE__SHIFT) |
+		private_base;
+}
+
+static int update_qpd_v9(struct device_queue_manager *dqm,
+			 struct qcm_process_device *qpd)
+{
+	struct kfd_process_device *pdd;
+
+	pdd = qpd_to_pdd(qpd);
+
+	/* check if sh_mem_config register already configured */
+	if (qpd->sh_mem_config == 0) {
+		qpd->sh_mem_config =
+				SH_MEM_ALIGNMENT_MODE_UNALIGNED <<
+					SH_MEM_CONFIG__ALIGNMENT_MODE__SHIFT;
+		if (vega10_noretry &&
+		    !dqm->dev->device_info->needs_iommu_device)
+			qpd->sh_mem_config |=
+				1 << SH_MEM_CONFIG__RETRY_DISABLE__SHIFT;
+
+		qpd->sh_mem_ape1_limit = 0;
+		qpd->sh_mem_ape1_base = 0;
+	}
+
+	qpd->sh_mem_bases = compute_sh_mem_bases_64bit(pdd);
+
+	pr_debug("sh_mem_bases 0x%X\n", qpd->sh_mem_bases);
+
+	return 0;
+}
+
+static void init_sdma_vm_v9(struct device_queue_manager *dqm, struct queue *q,
+			    struct qcm_process_device *qpd)
+{
+	/* Not needed on SDMAv4 any more */
+	q->properties.sdma_vm_addr = 0;
+}
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_module.c b/drivers/gpu/drm/amd/amdkfd/kfd_module.c
index 45bc458..76bf2dc 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_module.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_module.c
@@ -83,6 +83,11 @@ module_param(ignore_crat, int, 0444);
 MODULE_PARM_DESC(ignore_crat,
 	"Ignore CRAT table during KFD initialization (0 = use CRAT (default), 1 = ignore CRAT)");
 
+int vega10_noretry;
+module_param_named(noretry, vega10_noretry, int, 0644);
+MODULE_PARM_DESC(noretry,
+	"Set sh_mem_config.retry_disable on Vega10 (0 = retry enabled (default), 1 = retry disabled)");
+
 static int amdkfd_init_completed;
 
 int kgd2kfd_init(unsigned int interface_version,
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index fac2882..d5cdb5d 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -137,6 +137,11 @@ extern int debug_largebar;
  */
 extern int ignore_crat;
 
+/*
+ * Set sh_mem_config.retry_disable on Vega10
+ */
+extern int vega10_noretry;
+
 /**
  * enum kfd_sched_policy
  *
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 13/21] drm/amdkfd: Add SOC15 interrupt processing support
       [not found] ` <1523395998-31314-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (11 preceding siblings ...)
  2018-04-10 21:33   ` [PATCH 12/21] drm/amdkfd: Add GFXv9 device queue manager Felix Kuehling
@ 2018-04-10 21:33   ` Felix Kuehling
  2018-04-10 21:33   ` [PATCH 14/21] drm/amdkfd: Fix goto usage Felix Kuehling
                     ` (10 subsequent siblings)
  23 siblings, 0 replies; 38+ messages in thread
From: Felix Kuehling @ 2018-04-10 21:33 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling, Oak Zeng, Shaoyun Liu

Signed-off-by: Shaoyun Liu <Shaoyun.Liu@amd.com>
Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/Makefile             |  2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c | 84 +++++++++++++++++++++++++
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h           |  2 +
 drivers/gpu/drm/amd/amdkfd/soc15_int.h          | 47 ++++++++++++++
 4 files changed, 134 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
 create mode 100644 drivers/gpu/drm/amd/amdkfd/soc15_int.h

diff --git a/drivers/gpu/drm/amd/amdkfd/Makefile b/drivers/gpu/drm/amd/amdkfd/Makefile
index ff8b5aa..ffd096f 100644
--- a/drivers/gpu/drm/amd/amdkfd/Makefile
+++ b/drivers/gpu/drm/amd/amdkfd/Makefile
@@ -37,7 +37,7 @@ amdkfd-y	:= kfd_module.o kfd_device.o kfd_chardev.o kfd_topology.o \
 		kfd_device_queue_manager.o kfd_device_queue_manager_cik.o \
 		kfd_device_queue_manager_vi.o kfd_device_queue_manager_v9.o \
 		kfd_interrupt.o kfd_events.o cik_event_interrupt.o \
-		kfd_dbgdev.o kfd_dbgmgr.o kfd_crat.o
+		kfd_int_process_v9.o kfd_dbgdev.o kfd_dbgmgr.o kfd_crat.o
 
 ifneq ($(CONFIG_AMD_IOMMU_V2),)
 amdkfd-y += kfd_iommu.o
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
new file mode 100644
index 0000000..39d4115
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
@@ -0,0 +1,84 @@
+/*
+ * Copyright 2016-2018 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#include "kfd_priv.h"
+#include "kfd_events.h"
+#include "soc15_int.h"
+
+
+static bool event_interrupt_isr_v9(struct kfd_dev *dev,
+					const uint32_t *ih_ring_entry)
+{
+	uint16_t source_id, client_id, pasid, vmid;
+
+	source_id = SOC15_SOURCE_ID_FROM_IH_ENTRY(ih_ring_entry);
+	client_id = SOC15_CLIENT_ID_FROM_IH_ENTRY(ih_ring_entry);
+	pasid = SOC15_PASID_FROM_IH_ENTRY(ih_ring_entry);
+	vmid = SOC15_VMID_FROM_IH_ENTRY(ih_ring_entry);
+
+	if (pasid) {
+		const uint32_t *data = ih_ring_entry;
+
+		pr_debug("client id 0x%x, source id %d, pasid 0x%x. raw data:\n",
+			 client_id, source_id, pasid);
+		pr_debug("%8X, %8X, %8X, %8X, %8X, %8X, %8X, %8X.\n",
+			 data[0], data[1], data[2], data[3],
+			 data[4], data[5], data[6], data[7]);
+	}
+
+	return (pasid != 0) &&
+		(source_id == SOC15_INTSRC_CP_END_OF_PIPE ||
+		 source_id == SOC15_INTSRC_SDMA_TRAP ||
+		 source_id == SOC15_INTSRC_SQ_INTERRUPT_MSG ||
+		 source_id == SOC15_INTSRC_CP_BAD_OPCODE);
+}
+
+static void event_interrupt_wq_v9(struct kfd_dev *dev,
+					const uint32_t *ih_ring_entry)
+{
+	uint16_t source_id, client_id, pasid, vmid;
+	uint32_t context_id;
+
+	source_id = SOC15_SOURCE_ID_FROM_IH_ENTRY(ih_ring_entry);
+	client_id = SOC15_CLIENT_ID_FROM_IH_ENTRY(ih_ring_entry);
+	pasid = SOC15_PASID_FROM_IH_ENTRY(ih_ring_entry);
+	vmid = SOC15_VMID_FROM_IH_ENTRY(ih_ring_entry);
+	context_id = SOC15_CONTEXT_ID0_FROM_IH_ENTRY(ih_ring_entry);
+
+	if (source_id == SOC15_INTSRC_CP_END_OF_PIPE)
+		kfd_signal_event_interrupt(pasid, context_id, 32);
+	else if (source_id == SOC15_INTSRC_SDMA_TRAP)
+		kfd_signal_event_interrupt(pasid, context_id & 0xfffffff, 28);
+	else if (source_id == SOC15_INTSRC_SQ_INTERRUPT_MSG)
+		kfd_signal_event_interrupt(pasid, context_id & 0xffffff, 24);
+	else if (source_id == SOC15_INTSRC_CP_BAD_OPCODE)
+		kfd_signal_hw_exception_event(pasid);
+	else if (client_id == SOC15_IH_CLIENTID_VMC ||
+		 client_id == SOC15_IH_CLIENTID_UTCL2) {
+		/* TODO */
+	}
+}
+
+const struct kfd_event_interrupt_class event_interrupt_class_v9 = {
+	.interrupt_isr = event_interrupt_isr_v9,
+	.interrupt_wq = event_interrupt_wq_v9,
+};
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index d5cdb5d..06b210b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -934,6 +934,8 @@ uint64_t kfd_get_number_elems(struct kfd_dev *kfd);
 
 /* Events */
 extern const struct kfd_event_interrupt_class event_interrupt_class_cik;
+extern const struct kfd_event_interrupt_class event_interrupt_class_v9;
+
 extern const struct kfd_device_global_init_class device_global_init_class_cik;
 
 void kfd_event_init_process(struct kfd_process *p);
diff --git a/drivers/gpu/drm/amd/amdkfd/soc15_int.h b/drivers/gpu/drm/amd/amdkfd/soc15_int.h
new file mode 100644
index 0000000..0bc0b25
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdkfd/soc15_int.h
@@ -0,0 +1,47 @@
+/*
+ * Copyright 2016-2018 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#ifndef HSA_SOC15_INT_H_INCLUDED
+#define HSA_SOC15_INT_H_INCLUDED
+
+#include "soc15_ih_clientid.h"
+
+#define SOC15_INTSRC_CP_END_OF_PIPE	181
+#define SOC15_INTSRC_CP_BAD_OPCODE	183
+#define SOC15_INTSRC_SQ_INTERRUPT_MSG	239
+#define SOC15_INTSRC_VMC_FAULT		0
+#define SOC15_INTSRC_SDMA_TRAP		224
+
+
+#define SOC15_CLIENT_ID_FROM_IH_ENTRY(entry) (le32_to_cpu(entry[0]) & 0xff)
+#define SOC15_SOURCE_ID_FROM_IH_ENTRY(entry) (le32_to_cpu(entry[0]) >> 8 & 0xff)
+#define SOC15_RING_ID_FROM_IH_ENTRY(entry) (le32_to_cpu(entry[0]) >> 16 & 0xff)
+#define SOC15_VMID_FROM_IH_ENTRY(entry) (le32_to_cpu(entry[0]) >> 24 & 0xf)
+#define SOC15_VMID_TYPE_FROM_IH_ENTRY(entry) (le32_to_cpu(entry[0]) >> 31 & 0x1)
+#define SOC15_PASID_FROM_IH_ENTRY(entry) (le32_to_cpu(entry[3]) & 0xffff)
+#define SOC15_CONTEXT_ID0_FROM_IH_ENTRY(entry) (le32_to_cpu(entry[4]))
+#define SOC15_CONTEXT_ID1_FROM_IH_ENTRY(entry) (le32_to_cpu(entry[5]))
+#define SOC15_CONTEXT_ID2_FROM_IH_ENTRY(entry) (le32_to_cpu(entry[6]))
+#define SOC15_CONTEXT_ID3_FROM_IH_ENTRY(entry) (le32_to_cpu(entry[7]))
+
+#endif
+
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 14/21] drm/amdkfd: Fix goto usage
       [not found] ` <1523395998-31314-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (12 preceding siblings ...)
  2018-04-10 21:33   ` [PATCH 13/21] drm/amdkfd: Add SOC15 interrupt processing support Felix Kuehling
@ 2018-04-10 21:33   ` Felix Kuehling
  2018-04-10 21:33   ` [PATCH 15/21] drm/amdkfd: Fix kernel queue rollback_packet Felix Kuehling
                     ` (9 subsequent siblings)
  23 siblings, 0 replies; 38+ messages in thread
From: Felix Kuehling @ 2018-04-10 21:33 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling, Kent Russell

Missed a spot in previous cleanup commit:
Remove gotos that do not feature any common cleanup, and use gotos
instead of repeating cleanup commands.

According to kernel.org: "The goto statement comes in handy when a
function exits from multiple locations and some common work such as
cleanup has to be done. If there is no cleanup needed then just return
directly."

Signed-off-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
index 69f4964..23e586b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
@@ -232,18 +232,16 @@ static int acquire_packet_buffer(struct kernel_queue *kq,
 		 * make sure calling functions know
 		 * acquire_packet_buffer() failed
 		 */
-		*buffer_ptr = NULL;
-		return -ENOMEM;
+		goto err_no_space;
 	}
 
 	if (wptr + packet_size_in_dwords >= queue_size_dwords) {
 		/* make sure after rolling back to position 0, there is
 		 * still enough space.
 		 */
-		if (packet_size_in_dwords >= rptr) {
-			*buffer_ptr = NULL;
-			return -ENOMEM;
-		}
+		if (packet_size_in_dwords >= rptr)
+			goto err_no_space;
+
 		/* fill nops, roll back and start at position 0 */
 		while (wptr > 0) {
 			queue_address[wptr] = kq->nop_packet;
@@ -255,6 +253,10 @@ static int acquire_packet_buffer(struct kernel_queue *kq,
 	kq->pending_wptr = wptr + packet_size_in_dwords;
 
 	return 0;
+
+err_no_space:
+	*buffer_ptr = NULL;
+	return -ENOMEM;
 }
 
 static void submit_packet(struct kernel_queue *kq)
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 15/21] drm/amdkfd: Fix kernel queue rollback_packet
       [not found] ` <1523395998-31314-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (13 preceding siblings ...)
  2018-04-10 21:33   ` [PATCH 14/21] drm/amdkfd: Fix goto usage Felix Kuehling
@ 2018-04-10 21:33   ` Felix Kuehling
  2018-04-10 21:33   ` [PATCH 16/21] drm/amdkfd: Add 64-bit doorbell and wptr support to kernel queue Felix Kuehling
                     ` (8 subsequent siblings)
  23 siblings, 0 replies; 38+ messages in thread
From: Felix Kuehling @ 2018-04-10 21:33 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling

kq->queue->properties.write_ptr is a GPU address which can'd be
derefenced in the kernel. Use kq->wptr_kernel instead, which is the
kernel CPU address of the same buffer.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
index 23e586b..9f38161 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
@@ -279,7 +279,7 @@ static void submit_packet(struct kernel_queue *kq)
 
 static void rollback_packet(struct kernel_queue *kq)
 {
-	kq->pending_wptr = *kq->queue->properties.write_ptr;
+	kq->pending_wptr = *kq->wptr_kernel;
 }
 
 struct kernel_queue *kernel_queue_init(struct kfd_dev *dev,
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 16/21] drm/amdkfd: Add 64-bit doorbell and wptr support to kernel queue
       [not found] ` <1523395998-31314-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (14 preceding siblings ...)
  2018-04-10 21:33   ` [PATCH 15/21] drm/amdkfd: Fix kernel queue rollback_packet Felix Kuehling
@ 2018-04-10 21:33   ` Felix Kuehling
       [not found]     ` <1523395998-31314-17-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
  2018-04-10 21:33   ` [PATCH 17/21] drm/amdkfd: Remove limit on number of GPUs (follow-up) Felix Kuehling
                     ` (7 subsequent siblings)
  23 siblings, 1 reply; 38+ messages in thread
From: Felix Kuehling @ 2018-04-10 21:33 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c         | 10 +++++++++
 drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c     | 25 +++++++++++++++++------
 drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.h     |  7 ++++++-
 drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_cik.c |  9 ++++++++
 drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c  |  9 ++++++++
 drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_vi.c  |  9 ++++++++
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h             |  1 +
 7 files changed, 63 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c b/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
index 36c9269e..5d7cccc 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
@@ -214,6 +214,16 @@ void write_kernel_doorbell(void __iomem *db, u32 value)
 	}
 }
 
+void write_kernel_doorbell64(void __iomem *db, u64 value)
+{
+	if (db) {
+		WARN(((unsigned long)db & 7) != 0,
+		     "Unaligned 64-bit doorbell");
+		writeq(value, (u64 __iomem *)db);
+		pr_debug("writing %llu to doorbell address 0x%p\n", value, db);
+	}
+}
+
 unsigned int kfd_doorbell_id_to_offset(struct kfd_dev *kfd,
 					struct kfd_process *process,
 					unsigned int doorbell_id)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
index 9f38161..476951d 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
@@ -99,7 +99,7 @@ static bool initialize(struct kernel_queue *kq, struct kfd_dev *dev,
 	kq->rptr_kernel = kq->rptr_mem->cpu_ptr;
 	kq->rptr_gpu_addr = kq->rptr_mem->gpu_addr;
 
-	retval = kfd_gtt_sa_allocate(dev, sizeof(*kq->wptr_kernel),
+	retval = kfd_gtt_sa_allocate(dev, dev->device_info->doorbell_size,
 					&kq->wptr_mem);
 
 	if (retval != 0)
@@ -208,6 +208,7 @@ static int acquire_packet_buffer(struct kernel_queue *kq,
 	size_t available_size;
 	size_t queue_size_dwords;
 	uint32_t wptr, rptr;
+	uint64_t wptr64;
 	unsigned int *queue_address;
 
 	/* When rptr == wptr, the buffer is empty.
@@ -216,7 +217,8 @@ static int acquire_packet_buffer(struct kernel_queue *kq,
 	 * the opposite. So we can only use up to queue_size_dwords - 1 dwords.
 	 */
 	rptr = *kq->rptr_kernel;
-	wptr = *kq->wptr_kernel;
+	wptr = kq->pending_wptr;
+	wptr64 = kq->pending_wptr64;
 	queue_address = (unsigned int *)kq->pq_kernel_addr;
 	queue_size_dwords = kq->queue->properties.queue_size / 4;
 
@@ -246,11 +248,13 @@ static int acquire_packet_buffer(struct kernel_queue *kq,
 		while (wptr > 0) {
 			queue_address[wptr] = kq->nop_packet;
 			wptr = (wptr + 1) % queue_size_dwords;
+			wptr64++;
 		}
 	}
 
 	*buffer_ptr = &queue_address[wptr];
 	kq->pending_wptr = wptr + packet_size_in_dwords;
+	kq->pending_wptr64 = wptr64 + packet_size_in_dwords;
 
 	return 0;
 
@@ -272,14 +276,18 @@ static void submit_packet(struct kernel_queue *kq)
 	pr_debug("\n");
 #endif
 
-	*kq->wptr_kernel = kq->pending_wptr;
-	write_kernel_doorbell(kq->queue->properties.doorbell_ptr,
-				kq->pending_wptr);
+	kq->ops_asic_specific.submit_packet(kq);
 }
 
 static void rollback_packet(struct kernel_queue *kq)
 {
-	kq->pending_wptr = *kq->wptr_kernel;
+	if (kq->dev->device_info->doorbell_size == 8) {
+		kq->pending_wptr64 = *kq->wptr64_kernel;
+		kq->pending_wptr = *kq->wptr_kernel %
+			(kq->queue->properties.queue_size / 4);
+	} else {
+		kq->pending_wptr = *kq->wptr_kernel;
+	}
 }
 
 struct kernel_queue *kernel_queue_init(struct kfd_dev *dev,
@@ -310,6 +318,11 @@ struct kernel_queue *kernel_queue_init(struct kfd_dev *dev,
 	case CHIP_HAWAII:
 		kernel_queue_init_cik(&kq->ops_asic_specific);
 		break;
+
+	case CHIP_VEGA10:
+	case CHIP_RAVEN:
+		kernel_queue_init_v9(&kq->ops_asic_specific);
+		break;
 	default:
 		WARN(1, "Unexpected ASIC family %u",
 		     dev->device_info->asic_family);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.h b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.h
index 5940531..97aff20 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.h
@@ -72,6 +72,7 @@ struct kernel_queue {
 	struct kfd_dev		*dev;
 	struct mqd_manager	*mqd;
 	struct queue		*queue;
+	uint64_t		pending_wptr64;
 	uint32_t		pending_wptr;
 	unsigned int		nop_packet;
 
@@ -79,7 +80,10 @@ struct kernel_queue {
 	uint32_t		*rptr_kernel;
 	uint64_t		rptr_gpu_addr;
 	struct kfd_mem_obj	*wptr_mem;
-	uint32_t		*wptr_kernel;
+	union {
+		uint64_t	*wptr64_kernel;
+		uint32_t	*wptr_kernel;
+	};
 	uint64_t		wptr_gpu_addr;
 	struct kfd_mem_obj	*pq;
 	uint64_t		pq_gpu_addr;
@@ -97,5 +101,6 @@ struct kernel_queue {
 
 void kernel_queue_init_cik(struct kernel_queue_ops *ops);
 void kernel_queue_init_vi(struct kernel_queue_ops *ops);
+void kernel_queue_init_v9(struct kernel_queue_ops *ops);
 
 #endif /* KFD_KERNEL_QUEUE_H_ */
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_cik.c b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_cik.c
index a90eb44..19e54ac 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_cik.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_cik.c
@@ -26,11 +26,13 @@
 static bool initialize_cik(struct kernel_queue *kq, struct kfd_dev *dev,
 			enum kfd_queue_type type, unsigned int queue_size);
 static void uninitialize_cik(struct kernel_queue *kq);
+static void submit_packet_cik(struct kernel_queue *kq);
 
 void kernel_queue_init_cik(struct kernel_queue_ops *ops)
 {
 	ops->initialize = initialize_cik;
 	ops->uninitialize = uninitialize_cik;
+	ops->submit_packet = submit_packet_cik;
 }
 
 static bool initialize_cik(struct kernel_queue *kq, struct kfd_dev *dev,
@@ -42,3 +44,10 @@ static bool initialize_cik(struct kernel_queue *kq, struct kfd_dev *dev,
 static void uninitialize_cik(struct kernel_queue *kq)
 {
 }
+
+static void submit_packet_cik(struct kernel_queue *kq)
+{
+	*kq->wptr_kernel = kq->pending_wptr;
+	write_kernel_doorbell(kq->queue->properties.doorbell_ptr,
+				kq->pending_wptr);
+}
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c
index ece7d59..684a3bf 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c
@@ -29,11 +29,13 @@
 static bool initialize_v9(struct kernel_queue *kq, struct kfd_dev *dev,
 			enum kfd_queue_type type, unsigned int queue_size);
 static void uninitialize_v9(struct kernel_queue *kq);
+static void submit_packet_v9(struct kernel_queue *kq);
 
 void kernel_queue_init_v9(struct kernel_queue_ops *ops)
 {
 	ops->initialize = initialize_v9;
 	ops->uninitialize = uninitialize_v9;
+	ops->submit_packet = submit_packet_v9;
 }
 
 static bool initialize_v9(struct kernel_queue *kq, struct kfd_dev *dev,
@@ -58,6 +60,13 @@ static void uninitialize_v9(struct kernel_queue *kq)
 	kfd_gtt_sa_free(kq->dev, kq->eop_mem);
 }
 
+static void submit_packet_v9(struct kernel_queue *kq)
+{
+	*kq->wptr64_kernel = kq->pending_wptr64;
+	write_kernel_doorbell64(kq->queue->properties.doorbell_ptr,
+				kq->pending_wptr64);
+}
+
 static int pm_map_process_v9(struct packet_manager *pm,
 		uint32_t *buffer, struct qcm_process_device *qpd)
 {
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_vi.c b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_vi.c
index f9019ef..bf20c6d 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_vi.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_vi.c
@@ -29,11 +29,13 @@
 static bool initialize_vi(struct kernel_queue *kq, struct kfd_dev *dev,
 			enum kfd_queue_type type, unsigned int queue_size);
 static void uninitialize_vi(struct kernel_queue *kq);
+static void submit_packet_vi(struct kernel_queue *kq);
 
 void kernel_queue_init_vi(struct kernel_queue_ops *ops)
 {
 	ops->initialize = initialize_vi;
 	ops->uninitialize = uninitialize_vi;
+	ops->submit_packet = submit_packet_vi;
 }
 
 static bool initialize_vi(struct kernel_queue *kq, struct kfd_dev *dev,
@@ -58,6 +60,13 @@ static void uninitialize_vi(struct kernel_queue *kq)
 	kfd_gtt_sa_free(kq->dev, kq->eop_mem);
 }
 
+static void submit_packet_vi(struct kernel_queue *kq)
+{
+	*kq->wptr_kernel = kq->pending_wptr;
+	write_kernel_doorbell(kq->queue->properties.doorbell_ptr,
+				kq->pending_wptr);
+}
+
 unsigned int pm_build_pm4_header(unsigned int opcode, size_t packet_size)
 {
 	union PM4_MES_TYPE_3_HEADER header;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 06b210b..10d5b54 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -769,6 +769,7 @@ void __iomem *kfd_get_kernel_doorbell(struct kfd_dev *kfd,
 void kfd_release_kernel_doorbell(struct kfd_dev *kfd, u32 __iomem *db_addr);
 u32 read_kernel_doorbell(u32 __iomem *db);
 void write_kernel_doorbell(void __iomem *db, u32 value);
+void write_kernel_doorbell64(void __iomem *db, u64 value);
 unsigned int kfd_doorbell_id_to_offset(struct kfd_dev *kfd,
 					struct kfd_process *process,
 					unsigned int doorbell_id);
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 17/21] drm/amdkfd: Remove limit on number of GPUs (follow-up)
       [not found] ` <1523395998-31314-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (15 preceding siblings ...)
  2018-04-10 21:33   ` [PATCH 16/21] drm/amdkfd: Add 64-bit doorbell and wptr support to kernel queue Felix Kuehling
@ 2018-04-10 21:33   ` Felix Kuehling
  2018-04-10 21:33   ` [PATCH 18/21] drm/amdkfd: Support flat memory apertures for GFXv9 Felix Kuehling
                     ` (6 subsequent siblings)
  23 siblings, 0 replies; 38+ messages in thread
From: Felix Kuehling @ 2018-04-10 21:33 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling

This condition was missed in a previous commit with the same title.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c b/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
index 66852de..f16ac2b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
@@ -307,9 +307,7 @@ int kfd_init_apertures(struct kfd_process *process)
 	struct kfd_process_device *pdd;
 
 	/*Iterating over all devices*/
-	while (kfd_topology_enum_kfd_devices(id, &dev) == 0 &&
-		id < NUM_OF_SUPPORTED_GPUS) {
-
+	while (kfd_topology_enum_kfd_devices(id, &dev) == 0) {
 		if (!dev) {
 			id++; /* Skip non GPU devices */
 			continue;
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 18/21] drm/amdkfd: Support flat memory apertures for GFXv9
       [not found] ` <1523395998-31314-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (16 preceding siblings ...)
  2018-04-10 21:33   ` [PATCH 17/21] drm/amdkfd: Remove limit on number of GPUs (follow-up) Felix Kuehling
@ 2018-04-10 21:33   ` Felix Kuehling
  2018-04-10 21:33   ` [PATCH 19/21] drm/amdkfd: Add GFXv9 CWSR trap handler Felix Kuehling
                     ` (5 subsequent siblings)
  23 siblings, 0 replies; 38+ messages in thread
From: Felix Kuehling @ 2018-04-10 21:33 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c | 115 ++++++++++++++++++++-------
 1 file changed, 87 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c b/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
index f16ac2b..97d5423 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
@@ -275,23 +275,35 @@
  * for FLAT_* / S_LOAD operations.
  */
 
-#define MAKE_GPUVM_APP_BASE(gpu_num) \
+#define MAKE_GPUVM_APP_BASE_VI(gpu_num) \
 	(((uint64_t)(gpu_num) << 61) + 0x1000000000000L)
 
 #define MAKE_GPUVM_APP_LIMIT(base, size) \
 	(((uint64_t)(base) & 0xFFFFFF0000000000UL) + (size) - 1)
 
-#define MAKE_SCRATCH_APP_BASE() \
+#define MAKE_SCRATCH_APP_BASE_VI() \
 	(((uint64_t)(0x1UL) << 61) + 0x100000000L)
 
 #define MAKE_SCRATCH_APP_LIMIT(base) \
 	(((uint64_t)base & 0xFFFFFFFF00000000UL) | 0xFFFFFFFF)
 
-#define MAKE_LDS_APP_BASE() \
+#define MAKE_LDS_APP_BASE_VI() \
 	(((uint64_t)(0x1UL) << 61) + 0x0)
 #define MAKE_LDS_APP_LIMIT(base) \
 	(((uint64_t)(base) & 0xFFFFFFFF00000000UL) | 0xFFFFFFFF)
 
+/* On GFXv9 the LDS and scratch apertures are programmed independently
+ * using the high 16 bits of the 64-bit virtual address. They must be
+ * in the hole, which will be the case as long as the high 16 bits are
+ * not 0.
+ *
+ * The aperture sizes are still 4GB implicitly.
+ *
+ * A GPUVM aperture is not applicable on GFXv9.
+ */
+#define MAKE_LDS_APP_BASE_V9() ((uint64_t)(0x1UL) << 48)
+#define MAKE_SCRATCH_APP_BASE_V9() ((uint64_t)(0x2UL) << 48)
+
 /* User mode manages most of the SVM aperture address space. The low
  * 16MB are reserved for kernel use (CWSR trap handler and kernel IB
  * for now).
@@ -300,6 +312,55 @@
 #define SVM_CWSR_BASE (SVM_USER_BASE - KFD_CWSR_TBA_TMA_SIZE)
 #define SVM_IB_BASE   (SVM_CWSR_BASE - PAGE_SIZE)
 
+static void kfd_init_apertures_vi(struct kfd_process_device *pdd, uint8_t id)
+{
+	/*
+	 * node id couldn't be 0 - the three MSB bits of
+	 * aperture shoudn't be 0
+	 */
+	pdd->lds_base = MAKE_LDS_APP_BASE_VI();
+	pdd->lds_limit = MAKE_LDS_APP_LIMIT(pdd->lds_base);
+
+	if (!pdd->dev->device_info->needs_iommu_device) {
+		/* dGPUs: SVM aperture starting at 0
+		 * with small reserved space for kernel.
+		 * Set them to CANONICAL addresses.
+		 */
+		pdd->gpuvm_base = SVM_USER_BASE;
+		pdd->gpuvm_limit =
+			pdd->dev->shared_resources.gpuvm_size - 1;
+	} else {
+		/* set them to non CANONICAL addresses, and no SVM is
+		 * allocated.
+		 */
+		pdd->gpuvm_base = MAKE_GPUVM_APP_BASE_VI(id + 1);
+		pdd->gpuvm_limit = MAKE_GPUVM_APP_LIMIT(pdd->gpuvm_base,
+				pdd->dev->shared_resources.gpuvm_size);
+	}
+
+	pdd->scratch_base = MAKE_SCRATCH_APP_BASE_VI();
+	pdd->scratch_limit = MAKE_SCRATCH_APP_LIMIT(pdd->scratch_base);
+}
+
+static void kfd_init_apertures_v9(struct kfd_process_device *pdd, uint8_t id)
+{
+	pdd->lds_base = MAKE_LDS_APP_BASE_V9();
+	pdd->lds_limit = MAKE_LDS_APP_LIMIT(pdd->lds_base);
+
+	/* Raven needs SVM to support graphic handle, etc. Leave the small
+	 * reserved space before SVM on Raven as well, even though we don't
+	 * have to.
+	 * Set gpuvm_base and gpuvm_limit to CANONICAL addresses so that they
+	 * are used in Thunk to reserve SVM.
+	 */
+	pdd->gpuvm_base = SVM_USER_BASE;
+	pdd->gpuvm_limit =
+		pdd->dev->shared_resources.gpuvm_size - 1;
+
+	pdd->scratch_base = MAKE_SCRATCH_APP_BASE_V9();
+	pdd->scratch_limit = MAKE_SCRATCH_APP_LIMIT(pdd->scratch_base);
+}
+
 int kfd_init_apertures(struct kfd_process *process)
 {
 	uint8_t id  = 0;
@@ -316,7 +377,7 @@ int kfd_init_apertures(struct kfd_process *process)
 		pdd = kfd_create_process_device_data(dev, process);
 		if (!pdd) {
 			pr_err("Failed to create process device data\n");
-			return -1;
+			return -ENOMEM;
 		}
 		/*
 		 * For 64 bit process apertures will be statically reserved in
@@ -328,32 +389,30 @@ int kfd_init_apertures(struct kfd_process *process)
 			pdd->gpuvm_base = pdd->gpuvm_limit = 0;
 			pdd->scratch_base = pdd->scratch_limit = 0;
 		} else {
-			/* Same LDS and scratch apertures can be used
-			 * on all GPUs. This allows using more dGPUs
-			 * than placement options for apertures.
-			 */
-			pdd->lds_base = MAKE_LDS_APP_BASE();
-			pdd->lds_limit = MAKE_LDS_APP_LIMIT(pdd->lds_base);
-
-			pdd->scratch_base = MAKE_SCRATCH_APP_BASE();
-			pdd->scratch_limit =
-				MAKE_SCRATCH_APP_LIMIT(pdd->scratch_base);
+			switch (dev->device_info->asic_family) {
+			case CHIP_KAVERI:
+			case CHIP_HAWAII:
+			case CHIP_CARRIZO:
+			case CHIP_TONGA:
+			case CHIP_FIJI:
+			case CHIP_POLARIS10:
+			case CHIP_POLARIS11:
+				kfd_init_apertures_vi(pdd, id);
+				break;
+			case CHIP_VEGA10:
+			case CHIP_RAVEN:
+				kfd_init_apertures_v9(pdd, id);
+				break;
+			default:
+				WARN(1, "Unexpected ASIC family %u",
+				     dev->device_info->asic_family);
+				return -EINVAL;
+			}
 
-			if (dev->device_info->needs_iommu_device) {
-				/* APUs: GPUVM aperture in
-				 * non-canonical address space
-				 */
-				pdd->gpuvm_base = MAKE_GPUVM_APP_BASE(id + 1);
-				pdd->gpuvm_limit = MAKE_GPUVM_APP_LIMIT(
-					pdd->gpuvm_base,
-					dev->shared_resources.gpuvm_size);
-			} else {
-				/* dGPUs: SVM aperture starting at 0
-				 * with small reserved space for kernel
+			if (!dev->device_info->needs_iommu_device) {
+				/* dGPUs: the reserved space for kernel
+				 * before SVM
 				 */
-				pdd->gpuvm_base = SVM_USER_BASE;
-				pdd->gpuvm_limit =
-					dev->shared_resources.gpuvm_size - 1;
 				pdd->qpd.cwsr_base = SVM_CWSR_BASE;
 				pdd->qpd.ib_base = SVM_IB_BASE;
 			}
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 19/21] drm/amdkfd: Add GFXv9 CWSR trap handler
       [not found] ` <1523395998-31314-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (17 preceding siblings ...)
  2018-04-10 21:33   ` [PATCH 18/21] drm/amdkfd: Support flat memory apertures for GFXv9 Felix Kuehling
@ 2018-04-10 21:33   ` Felix Kuehling
  2018-04-10 21:33   ` [PATCH 20/21] drm/amdkfd: Try to enable atomics for all GPUs Felix Kuehling
                     ` (4 subsequent siblings)
  23 siblings, 0 replies; 38+ messages in thread
From: Felix Kuehling @ 2018-04-10 21:33 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling, Jay Cornwall, Shaoyun Liu

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset=UTF-8, Size: 67309 bytes --]

Signed-off-by: Shaoyun Liu <Shaoyun.Liu-5C7GfCeVMHo@public.gmane.org>
Signed-off-by: Jay Cornwall <Jay.Cornwall-5C7GfCeVMHo@public.gmane.org>
Signed-off-by: Felix Kuehling <Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
---
 .../gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm  | 1495 ++++++++++++++++++++
 drivers/gpu/drm/amd/amdkfd/kfd_device.c            |   13 +-
 2 files changed, 1505 insertions(+), 3 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm

diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm
new file mode 100644
index 0000000..da09794
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm
@@ -0,0 +1,1495 @@
+/*
+ * Copyright 2016 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#if 0
+HW (GFX9) source code for CWSR trap handler
+#Version 18 + multiple trap handler
+
+// this performance-optimal version was originally from Seven Xu at SRDC
+
+// Revison #18	 --...
+/* Rev History
+** #1. Branch from gc dv.   //gfxip/gfx9/main/src/test/suites/block/cs/sr/cs_trap_handler.sp3#1,#50, #51, #52-53(Skip, Already Fixed by PV), #54-56(merged),#57-58(mergerd, skiped-already fixed by PV)
+** #4. SR Memory Layout:
+**			 1. VGPR-SGPR-HWREG-{LDS}
+**			 2. tba_hi.bits.26 - reconfigured as the first wave in tg bits, for defer Save LDS for a threadgroup.. performance concern..
+** #5. Update: 1. Accurate g8sr_ts_save_d timestamp
+** #6. Update: 1. Fix s_barrier usage; 2. VGPR s/r using swizzle buffer?(NoNeed, already matched the swizzle pattern, more investigation)
+** #7. Update: 1. don't barrier if noLDS
+** #8. Branch: 1. Branch to ver#0, which is very similar to gc dv version
+**	       2. Fix SQ issue by s_sleep 2
+** #9. Update: 1. Fix scc restore failed issue, restore wave_status at last
+**	       2. optimize s_buffer save by burst 16sgprs...
+** #10. Update 1. Optimize restore sgpr by busrt 16 sgprs.
+** #11. Update 1. Add 2 more timestamp for debug version
+** #12. Update 1. Add VGPR SR using DWx4, some case improve and some case drop performance
+** #13. Integ  1. Always use MUBUF for PV trap shader...
+** #14. Update 1. s_buffer_store soft clause...
+** #15. Update 1. PERF - sclar write with glc:0/mtype0 to allow L2 combine. perf improvement a lot.
+** #16. Update 1. PRRF - UNROLL LDS_DMA got 2500cycle save in IP tree
+** #17. Update 1. FUNC - LDS_DMA has issues while ATC, replace with ds_read/buffer_store for save part[TODO restore part]
+**	       2. PERF - Save LDS before save VGPR to cover LDS save long latency...
+** #18. Update 1. FUNC - Implicitly estore STATUS.VCCZ, which is not writable by s_setreg_b32
+**	       2. FUNC - Handle non-CWSR traps
+*/
+
+var G8SR_WDMEM_HWREG_OFFSET = 0
+var G8SR_WDMEM_SGPR_OFFSET  = 128  // in bytes
+
+// Keep definition same as the app shader, These 2 time stamps are part of the app shader... Should before any Save and after restore.
+
+var G8SR_DEBUG_TIMESTAMP = 0
+var G8SR_DEBUG_TS_SAVE_D_OFFSET = 40*4	// ts_save_d timestamp offset relative to SGPR_SR_memory_offset
+var s_g8sr_ts_save_s	= s[34:35]   // save start
+var s_g8sr_ts_sq_save_msg  = s[36:37]	// The save shader send SAVEWAVE msg to spi
+var s_g8sr_ts_spi_wrexec   = s[38:39]	// the SPI write the sr address to SQ
+var s_g8sr_ts_save_d	= s[40:41]   // save end
+var s_g8sr_ts_restore_s = s[42:43]   // restore start
+var s_g8sr_ts_restore_d = s[44:45]   // restore end
+
+var G8SR_VGPR_SR_IN_DWX4 = 0
+var G8SR_SAVE_BUF_RSRC_WORD1_STRIDE_DWx4 = 0x00100000	 // DWx4 stride is 4*4Bytes
+var G8SR_RESTORE_BUF_RSRC_WORD1_STRIDE_DWx4  = G8SR_SAVE_BUF_RSRC_WORD1_STRIDE_DWx4
+
+
+/*************************************************************************/
+/*		    control on how to run the shader			 */
+/*************************************************************************/
+//any hack that needs to be made to run this code in EMU (either because various EMU code are not ready or no compute save & restore in EMU run)
+var EMU_RUN_HACK		    =	0
+var EMU_RUN_HACK_RESTORE_NORMAL	    =	0
+var EMU_RUN_HACK_SAVE_NORMAL_EXIT   =	0
+var EMU_RUN_HACK_SAVE_SINGLE_WAVE   =	0
+var EMU_RUN_HACK_SAVE_FIRST_TIME    =	0		    //for interrupted restore in which the first save is through EMU_RUN_HACK
+var SAVE_LDS			    =	1
+var WG_BASE_ADDR_LO		    =	0x9000a000
+var WG_BASE_ADDR_HI		    =	0x0
+var WAVE_SPACE			    =	0x5000		    //memory size that each wave occupies in workgroup state mem
+var CTX_SAVE_CONTROL		    =	0x0
+var CTX_RESTORE_CONTROL		    =	CTX_SAVE_CONTROL
+var SIM_RUN_HACK		    =	0		    //any hack that needs to be made to run this code in SIM (either because various RTL code are not ready or no compute save & restore in RTL run)
+var SGPR_SAVE_USE_SQC		    =	1		    //use SQC D$ to do the write
+var USE_MTBUF_INSTEAD_OF_MUBUF	    =	0		    //because TC EMU currently asserts on 0 of // overload DFMT field to carry 4 more bits of stride for MUBUF opcodes
+var SWIZZLE_EN			    =	0		    //whether we use swizzled buffer addressing
+var ACK_SQC_STORE		    =	1		    //workaround for suspected SQC store bug causing incorrect stores under concurrency
+
+/**************************************************************************/
+/*			variables					  */
+/**************************************************************************/
+var SQ_WAVE_STATUS_INST_ATC_SHIFT  = 23
+var SQ_WAVE_STATUS_INST_ATC_MASK   = 0x00800000
+var SQ_WAVE_STATUS_SPI_PRIO_MASK   = 0x00000006
+var SQ_WAVE_STATUS_HALT_MASK       = 0x2000
+
+var SQ_WAVE_LDS_ALLOC_LDS_SIZE_SHIFT	= 12
+var SQ_WAVE_LDS_ALLOC_LDS_SIZE_SIZE	= 9
+var SQ_WAVE_GPR_ALLOC_VGPR_SIZE_SHIFT	= 8
+var SQ_WAVE_GPR_ALLOC_VGPR_SIZE_SIZE	= 6
+var SQ_WAVE_GPR_ALLOC_SGPR_SIZE_SHIFT	= 24
+var SQ_WAVE_GPR_ALLOC_SGPR_SIZE_SIZE	= 3			//FIXME	 sq.blk still has 4 bits at this time while SQ programming guide has 3 bits
+
+var SQ_WAVE_TRAPSTS_SAVECTX_MASK    =	0x400
+var SQ_WAVE_TRAPSTS_EXCE_MASK	    =	0x1FF			// Exception mask
+var SQ_WAVE_TRAPSTS_SAVECTX_SHIFT   =	10
+var SQ_WAVE_TRAPSTS_MEM_VIOL_MASK   =	0x100
+var SQ_WAVE_TRAPSTS_MEM_VIOL_SHIFT  =	8
+var SQ_WAVE_TRAPSTS_PRE_SAVECTX_MASK	=   0x3FF
+var SQ_WAVE_TRAPSTS_PRE_SAVECTX_SHIFT	=   0x0
+var SQ_WAVE_TRAPSTS_PRE_SAVECTX_SIZE	=   10
+var SQ_WAVE_TRAPSTS_POST_SAVECTX_MASK	=   0xFFFFF800
+var SQ_WAVE_TRAPSTS_POST_SAVECTX_SHIFT	=   11
+var SQ_WAVE_TRAPSTS_POST_SAVECTX_SIZE	=   21
+var SQ_WAVE_TRAPSTS_ILLEGAL_INST_MASK	=   0x800
+
+var SQ_WAVE_IB_STS_RCNT_SHIFT		=   16			//FIXME
+var SQ_WAVE_IB_STS_FIRST_REPLAY_SHIFT	=   15			//FIXME
+var SQ_WAVE_IB_STS_RCNT_FIRST_REPLAY_MASK	= 0x1F8000
+var SQ_WAVE_IB_STS_RCNT_FIRST_REPLAY_MASK_NEG	= 0x00007FFF	//FIXME
+
+var SQ_BUF_RSRC_WORD1_ATC_SHIFT	    =	24
+var SQ_BUF_RSRC_WORD3_MTYPE_SHIFT   =	27
+
+var TTMP11_SAVE_RCNT_FIRST_REPLAY_SHIFT	=   26			// bits [31:26] unused by SPI debug data
+var TTMP11_SAVE_RCNT_FIRST_REPLAY_MASK	=   0xFC000000
+
+/*	Save	    */
+var S_SAVE_BUF_RSRC_WORD1_STRIDE	=   0x00040000		//stride is 4 bytes
+var S_SAVE_BUF_RSRC_WORD3_MISC		=   0x00807FAC		//SQ_SEL_X/Y/Z/W, BUF_NUM_FORMAT_FLOAT, (0 for MUBUF stride[17:14] when ADD_TID_ENABLE and BUF_DATA_FORMAT_32 for MTBUF), ADD_TID_ENABLE
+
+var S_SAVE_SPI_INIT_ATC_MASK		=   0x08000000		//bit[27]: ATC bit
+var S_SAVE_SPI_INIT_ATC_SHIFT		=   27
+var S_SAVE_SPI_INIT_MTYPE_MASK		=   0x70000000		//bit[30:28]: Mtype
+var S_SAVE_SPI_INIT_MTYPE_SHIFT		=   28
+var S_SAVE_SPI_INIT_FIRST_WAVE_MASK	=   0x04000000		//bit[26]: FirstWaveInTG
+var S_SAVE_SPI_INIT_FIRST_WAVE_SHIFT	=   26
+
+var S_SAVE_PC_HI_RCNT_SHIFT		=   28			//FIXME	 check with Brian to ensure all fields other than PC[47:0] can be used
+var S_SAVE_PC_HI_RCNT_MASK		=   0xF0000000		//FIXME
+var S_SAVE_PC_HI_FIRST_REPLAY_SHIFT	=   27			//FIXME
+var S_SAVE_PC_HI_FIRST_REPLAY_MASK	=   0x08000000		//FIXME
+
+var s_save_spi_init_lo		    =	exec_lo
+var s_save_spi_init_hi		    =	exec_hi
+
+var s_save_pc_lo	    =	ttmp0		//{TTMP1, TTMP0} = {3¡¯h0,pc_rewind[3:0], HT[0],trapID[7:0], PC[47:0]}
+var s_save_pc_hi	    =	ttmp1
+var s_save_exec_lo	    =	ttmp2
+var s_save_exec_hi	    =	ttmp3
+var s_save_tmp		    =	ttmp4
+var s_save_trapsts	    =	ttmp5		//not really used until the end of the SAVE routine
+var s_save_xnack_mask_lo    =	ttmp6
+var s_save_xnack_mask_hi    =	ttmp7
+var s_save_buf_rsrc0	    =	ttmp8
+var s_save_buf_rsrc1	    =	ttmp9
+var s_save_buf_rsrc2	    =	ttmp10
+var s_save_buf_rsrc3	    =	ttmp11
+var s_save_status	    =	ttmp12
+var s_save_mem_offset	    =	ttmp14
+var s_save_alloc_size	    =	s_save_trapsts		//conflict
+var s_save_m0		    =	ttmp15
+var s_save_ttmps_lo	    =	s_save_tmp		//no conflict
+var s_save_ttmps_hi	    =	s_save_trapsts		//no conflict
+
+/*	Restore	    */
+var S_RESTORE_BUF_RSRC_WORD1_STRIDE	    =	S_SAVE_BUF_RSRC_WORD1_STRIDE
+var S_RESTORE_BUF_RSRC_WORD3_MISC	    =	S_SAVE_BUF_RSRC_WORD3_MISC
+
+var S_RESTORE_SPI_INIT_ATC_MASK		    =	0x08000000	    //bit[27]: ATC bit
+var S_RESTORE_SPI_INIT_ATC_SHIFT	    =	27
+var S_RESTORE_SPI_INIT_MTYPE_MASK	    =	0x70000000	    //bit[30:28]: Mtype
+var S_RESTORE_SPI_INIT_MTYPE_SHIFT	    =	28
+var S_RESTORE_SPI_INIT_FIRST_WAVE_MASK	    =	0x04000000	    //bit[26]: FirstWaveInTG
+var S_RESTORE_SPI_INIT_FIRST_WAVE_SHIFT	    =	26
+
+var S_RESTORE_PC_HI_RCNT_SHIFT		    =	S_SAVE_PC_HI_RCNT_SHIFT
+var S_RESTORE_PC_HI_RCNT_MASK		    =	S_SAVE_PC_HI_RCNT_MASK
+var S_RESTORE_PC_HI_FIRST_REPLAY_SHIFT	    =	S_SAVE_PC_HI_FIRST_REPLAY_SHIFT
+var S_RESTORE_PC_HI_FIRST_REPLAY_MASK	    =	S_SAVE_PC_HI_FIRST_REPLAY_MASK
+
+var s_restore_spi_init_lo		    =	exec_lo
+var s_restore_spi_init_hi		    =	exec_hi
+
+var s_restore_mem_offset	=   ttmp12
+var s_restore_alloc_size	=   ttmp3
+var s_restore_tmp		=   ttmp2
+var s_restore_mem_offset_save	=   s_restore_tmp	//no conflict
+
+var s_restore_m0	    =	s_restore_alloc_size	//no conflict
+
+var s_restore_mode	    =	ttmp7
+
+var s_restore_pc_lo	    =	ttmp0
+var s_restore_pc_hi	    =	ttmp1
+var s_restore_exec_lo	    =	ttmp14
+var s_restore_exec_hi	    = 	ttmp15
+var s_restore_status	    =	ttmp4
+var s_restore_trapsts	    =	ttmp5
+var s_restore_xnack_mask_lo =	xnack_mask_lo
+var s_restore_xnack_mask_hi =	xnack_mask_hi
+var s_restore_buf_rsrc0	    =	ttmp8
+var s_restore_buf_rsrc1	    =	ttmp9
+var s_restore_buf_rsrc2	    =	ttmp10
+var s_restore_buf_rsrc3	    =	ttmp11
+var s_restore_ttmps_lo	    =	s_restore_tmp		//no conflict
+var s_restore_ttmps_hi	    =	s_restore_alloc_size	//no conflict
+
+/**************************************************************************/
+/*			trap handler entry points			  */
+/**************************************************************************/
+/* Shader Main*/
+
+shader main
+  asic(GFX9)
+  type(CS)
+
+
+    if ((EMU_RUN_HACK) && (!EMU_RUN_HACK_RESTORE_NORMAL))		    //hack to use trap_id for determining save/restore
+	//FIXME VCCZ un-init assertion s_getreg_b32	s_save_status, hwreg(HW_REG_STATUS)	    //save STATUS since we will change SCC
+	s_and_b32 s_save_tmp, s_save_pc_hi, 0xffff0000		    //change SCC
+	s_cmp_eq_u32 s_save_tmp, 0x007e0000			    //Save: trap_id = 0x7e. Restore: trap_id = 0x7f.
+	s_cbranch_scc0 L_JUMP_TO_RESTORE			    //do not need to recover STATUS here  since we are going to RESTORE
+	//FIXME	 s_setreg_b32	hwreg(HW_REG_STATUS),	s_save_status	    //need to recover STATUS since we are going to SAVE
+	s_branch L_SKIP_RESTORE					    //NOT restore, SAVE actually
+    else
+	s_branch L_SKIP_RESTORE					    //NOT restore. might be a regular trap or save
+    end
+
+L_JUMP_TO_RESTORE:
+    s_branch L_RESTORE						    //restore
+
+L_SKIP_RESTORE:
+
+    s_getreg_b32    s_save_status, hwreg(HW_REG_STATUS)				    //save STATUS since we will change SCC
+    s_andn2_b32	    s_save_status, s_save_status, SQ_WAVE_STATUS_SPI_PRIO_MASK	    //check whether this is for save
+    s_getreg_b32    s_save_trapsts, hwreg(HW_REG_TRAPSTS)
+    s_and_b32       ttmp2, s_save_trapsts, SQ_WAVE_TRAPSTS_SAVECTX_MASK    //check whether this is for save
+    s_cbranch_scc1  L_SAVE					//this is the operation for save
+
+    // *********    Handle non-CWSR traps	*******************
+if (!EMU_RUN_HACK)
+    // Illegal instruction is a non-maskable exception which blocks context save.
+    // Halt the wavefront and return from the trap.
+    s_and_b32       ttmp2, s_save_trapsts, SQ_WAVE_TRAPSTS_ILLEGAL_INST_MASK
+    s_cbranch_scc1  L_HALT_WAVE
+
+    // If STATUS.MEM_VIOL is asserted then we cannot fetch from the TMA.
+    // Instead, halt the wavefront and return from the trap.
+    s_and_b32       ttmp2, s_save_trapsts, SQ_WAVE_TRAPSTS_MEM_VIOL_MASK
+    s_cbranch_scc0  L_FETCH_2ND_TRAP
+
+L_HALT_WAVE:
+    // If STATUS.HALT is set then this fault must come from SQC instruction fetch.
+    // We cannot prevent further faults so just terminate the wavefront.
+    s_and_b32       ttmp2, s_save_status, SQ_WAVE_STATUS_HALT_MASK
+    s_cbranch_scc0  L_NOT_ALREADY_HALTED
+    s_endpgm
+L_NOT_ALREADY_HALTED:
+    s_or_b32        s_save_status, s_save_status, SQ_WAVE_STATUS_HALT_MASK
+
+    // If the PC points to S_ENDPGM then context save will fail if STATUS.HALT is set.
+    // Rewind the PC to prevent this from occurring. The debugger compensates for this.
+    s_sub_u32       ttmp0, ttmp0, 0x8
+    s_subb_u32      ttmp1, ttmp1, 0x0
+
+L_FETCH_2ND_TRAP:
+    // Preserve and clear scalar XNACK state before issuing scalar reads.
+    // Save IB_STS.FIRST_REPLAY[15] and IB_STS.RCNT[20:16] into unused space ttmp11[31:26].
+    s_getreg_b32    ttmp2, hwreg(HW_REG_IB_STS)
+    s_and_b32       ttmp3, ttmp2, SQ_WAVE_IB_STS_RCNT_FIRST_REPLAY_MASK
+    s_lshl_b32      ttmp3, ttmp3, (TTMP11_SAVE_RCNT_FIRST_REPLAY_SHIFT - SQ_WAVE_IB_STS_FIRST_REPLAY_SHIFT)
+    s_andn2_b32     ttmp11, ttmp11, TTMP11_SAVE_RCNT_FIRST_REPLAY_MASK
+    s_or_b32        ttmp11, ttmp11, ttmp3
+
+    s_andn2_b32     ttmp2, ttmp2, SQ_WAVE_IB_STS_RCNT_FIRST_REPLAY_MASK
+    s_setreg_b32    hwreg(HW_REG_IB_STS), ttmp2
+
+    // Read second-level TBA/TMA from first-level TMA and jump if available.
+    // ttmp[2:5] and ttmp12 can be used (others hold SPI-initialized debug data)
+    // ttmp12 holds SQ_WAVE_STATUS
+    s_getreg_b32    ttmp4, hwreg(HW_REG_SQ_SHADER_TMA_LO)
+    s_getreg_b32    ttmp5, hwreg(HW_REG_SQ_SHADER_TMA_HI)
+    s_lshl_b64      [ttmp4, ttmp5], [ttmp4, ttmp5], 0x8
+    s_load_dwordx2  [ttmp2, ttmp3], [ttmp4, ttmp5], 0x0 glc:1 // second-level TBA
+    s_waitcnt       lgkmcnt(0)
+    s_load_dwordx2  [ttmp4, ttmp5], [ttmp4, ttmp5], 0x8 glc:1 // second-level TMA
+    s_waitcnt       lgkmcnt(0)
+    s_and_b64       [ttmp2, ttmp3], [ttmp2, ttmp3], [ttmp2, ttmp3]
+    s_cbranch_scc0  L_NO_NEXT_TRAP // second-level trap handler not been set
+    s_setpc_b64     [ttmp2, ttmp3] // jump to second-level trap handler
+
+L_NO_NEXT_TRAP:
+    s_getreg_b32    s_save_trapsts, hwreg(HW_REG_TRAPSTS)
+    s_and_b32	    s_save_trapsts, s_save_trapsts, SQ_WAVE_TRAPSTS_EXCE_MASK // Check whether it is an exception
+    s_cbranch_scc1  L_EXCP_CASE	  // Exception, jump back to the shader program directly.
+    s_add_u32	    ttmp0, ttmp0, 4   // S_TRAP case, add 4 to ttmp0
+    s_addc_u32	ttmp1, ttmp1, 0
+L_EXCP_CASE:
+    s_and_b32	ttmp1, ttmp1, 0xFFFF
+
+    // Restore SQ_WAVE_IB_STS.
+    s_lshr_b32      ttmp2, ttmp11, (TTMP11_SAVE_RCNT_FIRST_REPLAY_SHIFT - SQ_WAVE_IB_STS_FIRST_REPLAY_SHIFT)
+    s_and_b32       ttmp2, ttmp2, SQ_WAVE_IB_STS_RCNT_FIRST_REPLAY_MASK
+    s_setreg_b32    hwreg(HW_REG_IB_STS), ttmp2
+
+    // Restore SQ_WAVE_STATUS.
+    s_and_b64       exec, exec, exec // Restore STATUS.EXECZ, not writable by s_setreg_b32
+    s_and_b64       vcc, vcc, vcc    // Restore STATUS.VCCZ, not writable by s_setreg_b32
+    s_setreg_b32    hwreg(HW_REG_STATUS), s_save_status
+
+    s_rfe_b64       [ttmp0, ttmp1]
+end
+    // *********	End handling of non-CWSR traps	 *******************
+
+/**************************************************************************/
+/*			save routine					  */
+/**************************************************************************/
+
+L_SAVE:
+
+if G8SR_DEBUG_TIMESTAMP
+	s_memrealtime	s_g8sr_ts_save_s
+	s_waitcnt lgkmcnt(0)	     //FIXME, will cause xnack??
+end
+
+    s_and_b32	    s_save_pc_hi, s_save_pc_hi, 0x0000ffff    //pc[47:32]
+
+    s_mov_b32	    s_save_tmp, 0							    //clear saveCtx bit
+    s_setreg_b32    hwreg(HW_REG_TRAPSTS, SQ_WAVE_TRAPSTS_SAVECTX_SHIFT, 1), s_save_tmp	    //clear saveCtx bit
+
+    s_getreg_b32    s_save_tmp, hwreg(HW_REG_IB_STS, SQ_WAVE_IB_STS_RCNT_SHIFT, SQ_WAVE_IB_STS_RCNT_SIZE)		    //save RCNT
+    s_lshl_b32	    s_save_tmp, s_save_tmp, S_SAVE_PC_HI_RCNT_SHIFT
+    s_or_b32	    s_save_pc_hi, s_save_pc_hi, s_save_tmp
+    s_getreg_b32    s_save_tmp, hwreg(HW_REG_IB_STS, SQ_WAVE_IB_STS_FIRST_REPLAY_SHIFT, SQ_WAVE_IB_STS_FIRST_REPLAY_SIZE)   //save FIRST_REPLAY
+    s_lshl_b32	    s_save_tmp, s_save_tmp, S_SAVE_PC_HI_FIRST_REPLAY_SHIFT
+    s_or_b32	    s_save_pc_hi, s_save_pc_hi, s_save_tmp
+    s_getreg_b32    s_save_tmp, hwreg(HW_REG_IB_STS)					    //clear RCNT and FIRST_REPLAY in IB_STS
+    s_and_b32	    s_save_tmp, s_save_tmp, SQ_WAVE_IB_STS_RCNT_FIRST_REPLAY_MASK_NEG
+
+    s_setreg_b32    hwreg(HW_REG_IB_STS), s_save_tmp
+
+    /*	    inform SPI the readiness and wait for SPI's go signal */
+    s_mov_b32	    s_save_exec_lo, exec_lo						    //save EXEC and use EXEC for the go signal from SPI
+    s_mov_b32	    s_save_exec_hi, exec_hi
+    s_mov_b64	    exec,   0x0								    //clear EXEC to get ready to receive
+
+if G8SR_DEBUG_TIMESTAMP
+	s_memrealtime  s_g8sr_ts_sq_save_msg
+	s_waitcnt lgkmcnt(0)
+end
+
+    if (EMU_RUN_HACK)
+
+    else
+	s_sendmsg   sendmsg(MSG_SAVEWAVE)  //send SPI a message and wait for SPI's write to EXEC
+    end
+
+  L_SLEEP:
+    s_sleep 0x2		       // sleep 1 (64clk) is not enough for 8 waves per SIMD, which will cause SQ hang, since the 7,8th wave could not get arbit to exec inst, while other waves are stuck into the sleep-loop and waiting for wrexec!=0
+
+    if (EMU_RUN_HACK)
+
+    else
+	s_cbranch_execz L_SLEEP
+    end
+
+if G8SR_DEBUG_TIMESTAMP
+	s_memrealtime  s_g8sr_ts_spi_wrexec
+	s_waitcnt lgkmcnt(0)
+end
+
+    if ((EMU_RUN_HACK) && (!EMU_RUN_HACK_SAVE_SINGLE_WAVE))
+	//calculate wd_addr using absolute thread id
+	v_readlane_b32 s_save_tmp, v9, 0
+	s_lshr_b32 s_save_tmp, s_save_tmp, 6
+	s_mul_i32 s_save_tmp, s_save_tmp, WAVE_SPACE
+	s_add_i32 s_save_spi_init_lo, s_save_tmp, WG_BASE_ADDR_LO
+	s_mov_b32 s_save_spi_init_hi, WG_BASE_ADDR_HI
+	s_and_b32 s_save_spi_init_hi, s_save_spi_init_hi, CTX_SAVE_CONTROL
+    else
+    end
+    if ((EMU_RUN_HACK) && (EMU_RUN_HACK_SAVE_SINGLE_WAVE))
+	s_add_i32 s_save_spi_init_lo, s_save_tmp, WG_BASE_ADDR_LO
+	s_mov_b32 s_save_spi_init_hi, WG_BASE_ADDR_HI
+	s_and_b32 s_save_spi_init_hi, s_save_spi_init_hi, CTX_SAVE_CONTROL
+    else
+    end
+
+    // Save trap temporaries 6-11, 13-15 initialized by SPI debug dispatch logic
+    // ttmp SR memory offset : size(VGPR)+size(SGPR)+0x40
+    get_vgpr_size_bytes(s_save_ttmps_lo)
+    get_sgpr_size_bytes(s_save_ttmps_hi)
+    s_add_u32	    s_save_ttmps_lo, s_save_ttmps_lo, s_save_ttmps_hi
+    s_add_u32	    s_save_ttmps_lo, s_save_ttmps_lo, s_save_spi_init_lo
+    s_addc_u32	    s_save_ttmps_hi, s_save_spi_init_hi, 0x0
+    s_and_b32	    s_save_ttmps_hi, s_save_ttmps_hi, 0xFFFF
+    s_store_dwordx2 [ttmp6, ttmp7], [s_save_ttmps_lo, s_save_ttmps_hi], 0x40 glc:1
+    ack_sqc_store_workaround()
+    s_store_dwordx4 [ttmp8, ttmp9, ttmp10, ttmp11], [s_save_ttmps_lo, s_save_ttmps_hi], 0x48 glc:1
+    ack_sqc_store_workaround()
+    s_store_dword   ttmp13, [s_save_ttmps_lo, s_save_ttmps_hi], 0x58 glc:1
+    ack_sqc_store_workaround()
+    s_store_dwordx2 [ttmp14, ttmp15], [s_save_ttmps_lo, s_save_ttmps_hi], 0x5C glc:1
+    ack_sqc_store_workaround()
+
+    /*	    setup Resource Contants    */
+    s_mov_b32	    s_save_buf_rsrc0,	s_save_spi_init_lo							//base_addr_lo
+    s_and_b32	    s_save_buf_rsrc1,	s_save_spi_init_hi, 0x0000FFFF						//base_addr_hi
+    s_or_b32	    s_save_buf_rsrc1,	s_save_buf_rsrc1,  S_SAVE_BUF_RSRC_WORD1_STRIDE
+    s_mov_b32	    s_save_buf_rsrc2,	0									//NUM_RECORDS initial value = 0 (in bytes) although not neccessarily inited
+    s_mov_b32	    s_save_buf_rsrc3,	S_SAVE_BUF_RSRC_WORD3_MISC
+    s_and_b32	    s_save_tmp,		s_save_spi_init_hi, S_SAVE_SPI_INIT_ATC_MASK
+    s_lshr_b32	    s_save_tmp,		s_save_tmp, (S_SAVE_SPI_INIT_ATC_SHIFT-SQ_BUF_RSRC_WORD1_ATC_SHIFT)	    //get ATC bit into position
+    s_or_b32	    s_save_buf_rsrc3,	s_save_buf_rsrc3,  s_save_tmp						//or ATC
+    s_and_b32	    s_save_tmp,		s_save_spi_init_hi, S_SAVE_SPI_INIT_MTYPE_MASK
+    s_lshr_b32	    s_save_tmp,		s_save_tmp, (S_SAVE_SPI_INIT_MTYPE_SHIFT-SQ_BUF_RSRC_WORD3_MTYPE_SHIFT)	    //get MTYPE bits into position
+    s_or_b32	    s_save_buf_rsrc3,	s_save_buf_rsrc3,  s_save_tmp						//or MTYPE
+
+    //FIXME  right now s_save_m0/s_save_mem_offset use tma_lo/tma_hi  (might need to save them before using them?)
+    s_mov_b32	    s_save_m0,		m0								    //save M0
+
+    /*	    global mem offset		*/
+    s_mov_b32	    s_save_mem_offset,	0x0									//mem offset initial value = 0
+
+
+
+
+    /*	    save HW registers	*/
+    //////////////////////////////
+
+  L_SAVE_HWREG:
+	// HWREG SR memory offset : size(VGPR)+size(SGPR)
+       get_vgpr_size_bytes(s_save_mem_offset)
+       get_sgpr_size_bytes(s_save_tmp)
+       s_add_u32 s_save_mem_offset, s_save_mem_offset, s_save_tmp
+
+
+    s_mov_b32	    s_save_buf_rsrc2, 0x4				//NUM_RECORDS	in bytes
+    if (SWIZZLE_EN)
+	s_add_u32	s_save_buf_rsrc2, s_save_buf_rsrc2, 0x0			    //FIXME need to use swizzle to enable bounds checking?
+    else
+	s_mov_b32	s_save_buf_rsrc2,  0x1000000				    //NUM_RECORDS in bytes
+    end
+
+
+    write_hwreg_to_mem(s_save_m0, s_save_buf_rsrc0, s_save_mem_offset)			//M0
+
+    if ((EMU_RUN_HACK) && (EMU_RUN_HACK_SAVE_FIRST_TIME))
+	s_add_u32 s_save_pc_lo, s_save_pc_lo, 4		    //pc[31:0]+4
+	s_addc_u32 s_save_pc_hi, s_save_pc_hi, 0x0	    //carry bit over
+    end
+
+    write_hwreg_to_mem(s_save_pc_lo, s_save_buf_rsrc0, s_save_mem_offset)		    //PC
+    write_hwreg_to_mem(s_save_pc_hi, s_save_buf_rsrc0, s_save_mem_offset)
+    write_hwreg_to_mem(s_save_exec_lo, s_save_buf_rsrc0, s_save_mem_offset)		//EXEC
+    write_hwreg_to_mem(s_save_exec_hi, s_save_buf_rsrc0, s_save_mem_offset)
+    write_hwreg_to_mem(s_save_status, s_save_buf_rsrc0, s_save_mem_offset)		//STATUS
+
+    //s_save_trapsts conflicts with s_save_alloc_size
+    s_getreg_b32    s_save_trapsts, hwreg(HW_REG_TRAPSTS)
+    write_hwreg_to_mem(s_save_trapsts, s_save_buf_rsrc0, s_save_mem_offset)		//TRAPSTS
+
+    write_hwreg_to_mem(xnack_mask_lo, s_save_buf_rsrc0, s_save_mem_offset)	    //XNACK_MASK_LO
+    write_hwreg_to_mem(xnack_mask_hi, s_save_buf_rsrc0, s_save_mem_offset)	    //XNACK_MASK_HI
+
+    //use s_save_tmp would introduce conflict here between s_save_tmp and s_save_buf_rsrc2
+    s_getreg_b32    s_save_m0, hwreg(HW_REG_MODE)						    //MODE
+    write_hwreg_to_mem(s_save_m0, s_save_buf_rsrc0, s_save_mem_offset)
+
+
+
+    /*	    the first wave in the threadgroup	 */
+    s_and_b32	    s_save_tmp, s_save_spi_init_hi, S_SAVE_SPI_INIT_FIRST_WAVE_MASK	// extract fisrt wave bit
+    s_mov_b32	     s_save_exec_hi, 0x0
+    s_or_b32	     s_save_exec_hi, s_save_tmp, s_save_exec_hi				 // save first wave bit to s_save_exec_hi.bits[26]
+
+
+    /*		save SGPRs	*/
+	// Save SGPR before LDS save, then the s0 to s4 can be used during LDS save...
+    //////////////////////////////
+
+    // SGPR SR memory offset : size(VGPR)
+    get_vgpr_size_bytes(s_save_mem_offset)
+    // TODO, change RSRC word to rearrange memory layout for SGPRS
+
+    s_getreg_b32    s_save_alloc_size, hwreg(HW_REG_GPR_ALLOC,SQ_WAVE_GPR_ALLOC_SGPR_SIZE_SHIFT,SQ_WAVE_GPR_ALLOC_SGPR_SIZE_SIZE)		//spgr_size
+    s_add_u32	    s_save_alloc_size, s_save_alloc_size, 1
+    s_lshl_b32	    s_save_alloc_size, s_save_alloc_size, 4			    //Number of SGPRs = (sgpr_size + 1) * 16   (non-zero value)
+
+    if (SGPR_SAVE_USE_SQC)
+	s_lshl_b32	s_save_buf_rsrc2,   s_save_alloc_size, 2		    //NUM_RECORDS in bytes
+    else
+	s_lshl_b32	s_save_buf_rsrc2,   s_save_alloc_size, 8		    //NUM_RECORDS in bytes (64 threads)
+    end
+
+    if (SWIZZLE_EN)
+	s_add_u32	s_save_buf_rsrc2, s_save_buf_rsrc2, 0x0			    //FIXME need to use swizzle to enable bounds checking?
+    else
+	s_mov_b32	s_save_buf_rsrc2,  0x1000000				    //NUM_RECORDS in bytes
+    end
+
+
+    // backup s_save_buf_rsrc0,1 to s_save_pc_lo/hi, since write_16sgpr_to_mem function will change the rsrc0
+    //s_mov_b64 s_save_pc_lo, s_save_buf_rsrc0
+    s_mov_b64 s_save_xnack_mask_lo, s_save_buf_rsrc0
+    s_add_u32 s_save_buf_rsrc0, s_save_buf_rsrc0, s_save_mem_offset
+    s_addc_u32 s_save_buf_rsrc1, s_save_buf_rsrc1, 0
+
+    s_mov_b32	    m0, 0x0			    //SGPR initial index value =0
+    s_nop	    0x0				    //Manually inserted wait states
+  L_SAVE_SGPR_LOOP:
+    // SGPR is allocated in 16 SGPR granularity
+    s_movrels_b64   s0, s0     //s0 = s[0+m0], s1 = s[1+m0]
+    s_movrels_b64   s2, s2     //s2 = s[2+m0], s3 = s[3+m0]
+    s_movrels_b64   s4, s4     //s4 = s[4+m0], s5 = s[5+m0]
+    s_movrels_b64   s6, s6     //s6 = s[6+m0], s7 = s[7+m0]
+    s_movrels_b64   s8, s8     //s8 = s[8+m0], s9 = s[9+m0]
+    s_movrels_b64   s10, s10   //s10 = s[10+m0], s11 = s[11+m0]
+    s_movrels_b64   s12, s12   //s12 = s[12+m0], s13 = s[13+m0]
+    s_movrels_b64   s14, s14   //s14 = s[14+m0], s15 = s[15+m0]
+
+    write_16sgpr_to_mem(s0, s_save_buf_rsrc0, s_save_mem_offset) //PV: the best performance should be using s_buffer_store_dwordx4
+    s_add_u32	    m0, m0, 16							    //next sgpr index
+    s_cmp_lt_u32    m0, s_save_alloc_size					    //scc = (m0 < s_save_alloc_size) ? 1 : 0
+    s_cbranch_scc1  L_SAVE_SGPR_LOOP					//SGPR save is complete?
+    // restore s_save_buf_rsrc0,1
+    //s_mov_b64 s_save_buf_rsrc0, s_save_pc_lo
+    s_mov_b64 s_save_buf_rsrc0, s_save_xnack_mask_lo
+
+
+
+
+    /*		save first 4 VGPR, then LDS save could use   */
+	// each wave will alloc 4 vgprs at least...
+    /////////////////////////////////////////////////////////////////////////////////////
+
+    s_mov_b32	    s_save_mem_offset, 0
+    s_mov_b32	    exec_lo, 0xFFFFFFFF						    //need every thread from now on
+    s_mov_b32	    exec_hi, 0xFFFFFFFF
+    s_mov_b32	    xnack_mask_lo, 0x0
+    s_mov_b32	    xnack_mask_hi, 0x0
+
+    if (SWIZZLE_EN)
+	s_add_u32	s_save_buf_rsrc2, s_save_buf_rsrc2, 0x0			    //FIXME need to use swizzle to enable bounds checking?
+    else
+	s_mov_b32	s_save_buf_rsrc2,  0x1000000				    //NUM_RECORDS in bytes
+    end
+
+
+    // VGPR Allocated in 4-GPR granularity
+
+if G8SR_VGPR_SR_IN_DWX4
+	// the const stride for DWx4 is 4*4 bytes
+	s_and_b32 s_save_buf_rsrc1, s_save_buf_rsrc1, 0x0000FFFF   // reset const stride to 0
+	s_or_b32  s_save_buf_rsrc1, s_save_buf_rsrc1, G8SR_SAVE_BUF_RSRC_WORD1_STRIDE_DWx4  // const stride to 4*4 bytes
+
+	buffer_store_dwordx4 v0, v0, s_save_buf_rsrc0, s_save_mem_offset slc:1 glc:1
+
+	s_and_b32 s_save_buf_rsrc1, s_save_buf_rsrc1, 0x0000FFFF   // reset const stride to 0
+	s_or_b32  s_save_buf_rsrc1, s_save_buf_rsrc1, S_SAVE_BUF_RSRC_WORD1_STRIDE  // reset const stride to 4 bytes
+else
+	buffer_store_dword v0, v0, s_save_buf_rsrc0, s_save_mem_offset slc:1 glc:1
+	buffer_store_dword v1, v0, s_save_buf_rsrc0, s_save_mem_offset slc:1 glc:1  offset:256
+	buffer_store_dword v2, v0, s_save_buf_rsrc0, s_save_mem_offset slc:1 glc:1  offset:256*2
+	buffer_store_dword v3, v0, s_save_buf_rsrc0, s_save_mem_offset slc:1 glc:1  offset:256*3
+end
+
+
+
+    /*		save LDS	*/
+    //////////////////////////////
+
+  L_SAVE_LDS:
+
+	// Change EXEC to all threads...
+    s_mov_b32	    exec_lo, 0xFFFFFFFF	  //need every thread from now on
+    s_mov_b32	    exec_hi, 0xFFFFFFFF
+
+    s_getreg_b32    s_save_alloc_size, hwreg(HW_REG_LDS_ALLOC,SQ_WAVE_LDS_ALLOC_LDS_SIZE_SHIFT,SQ_WAVE_LDS_ALLOC_LDS_SIZE_SIZE)		    //lds_size
+    s_and_b32	    s_save_alloc_size, s_save_alloc_size, 0xFFFFFFFF		    //lds_size is zero?
+    s_cbranch_scc0  L_SAVE_LDS_DONE									       //no lds used? jump to L_SAVE_DONE
+
+    s_barrier		    //LDS is used? wait for other waves in the same TG
+    s_and_b32	    s_save_tmp, s_save_exec_hi, S_SAVE_SPI_INIT_FIRST_WAVE_MASK		       //exec is still used here
+    s_cbranch_scc0  L_SAVE_LDS_DONE
+
+	// first wave do LDS save;
+
+    s_lshl_b32	    s_save_alloc_size, s_save_alloc_size, 6			    //LDS size in dwords = lds_size * 64dw
+    s_lshl_b32	    s_save_alloc_size, s_save_alloc_size, 2			    //LDS size in bytes
+    s_mov_b32	    s_save_buf_rsrc2,  s_save_alloc_size			    //NUM_RECORDS in bytes
+
+    // LDS at offset: size(VGPR)+SIZE(SGPR)+SIZE(HWREG)
+    //
+    get_vgpr_size_bytes(s_save_mem_offset)
+    get_sgpr_size_bytes(s_save_tmp)
+    s_add_u32  s_save_mem_offset, s_save_mem_offset, s_save_tmp
+    s_add_u32 s_save_mem_offset, s_save_mem_offset, get_hwreg_size_bytes()
+
+
+    if (SWIZZLE_EN)
+	s_add_u32	s_save_buf_rsrc2, s_save_buf_rsrc2, 0x0	      //FIXME need to use swizzle to enable bounds checking?
+    else
+	s_mov_b32	s_save_buf_rsrc2,  0x1000000		      //NUM_RECORDS in bytes
+    end
+
+    s_mov_b32	    m0, 0x0						  //lds_offset initial value = 0
+
+
+var LDS_DMA_ENABLE = 0
+var UNROLL = 0
+if UNROLL==0 && LDS_DMA_ENABLE==1
+	s_mov_b32  s3, 256*2
+	s_nop 0
+	s_nop 0
+	s_nop 0
+  L_SAVE_LDS_LOOP:
+	//TODO: looks the 2 buffer_store/load clause for s/r will hurt performance.???
+    if (SAVE_LDS)     //SPI always alloc LDS space in 128DW granularity
+	    buffer_store_lds_dword s_save_buf_rsrc0, s_save_mem_offset lds:1		// first 64DW
+	    buffer_store_lds_dword s_save_buf_rsrc0, s_save_mem_offset lds:1 offset:256 // second 64DW
+    end
+
+    s_add_u32	    m0, m0, s3						//every buffer_store_lds does 256 bytes
+    s_add_u32	    s_save_mem_offset, s_save_mem_offset, s3				//mem offset increased by 256 bytes
+    s_cmp_lt_u32    m0, s_save_alloc_size						//scc=(m0 < s_save_alloc_size) ? 1 : 0
+    s_cbranch_scc1  L_SAVE_LDS_LOOP							//LDS save is complete?
+
+elsif LDS_DMA_ENABLE==1 && UNROLL==1 // UNROOL	, has ichace miss
+      // store from higest LDS address to lowest
+      s_mov_b32	 s3, 256*2
+      s_sub_u32	 m0, s_save_alloc_size, s3
+      s_add_u32 s_save_mem_offset, s_save_mem_offset, m0
+      s_lshr_b32 s_save_alloc_size, s_save_alloc_size, 9   // how many 128 trunks...
+      s_sub_u32 s_save_alloc_size, 128, s_save_alloc_size   // store from higheset addr to lowest
+      s_mul_i32 s_save_alloc_size, s_save_alloc_size, 6*4   // PC offset increment,  each LDS save block cost 6*4 Bytes instruction
+      s_add_u32 s_save_alloc_size, s_save_alloc_size, 3*4   //2is the below 2 inst...//s_addc and s_setpc
+      s_nop 0
+      s_nop 0
+      s_nop 0	//pad 3 dw to let LDS_DMA align with 64Bytes
+      s_getpc_b64 s[0:1]			      // reuse s[0:1], since s[0:1] already saved
+      s_add_u32	  s0, s0,s_save_alloc_size
+      s_addc_u32  s1, s1, 0
+      s_setpc_b64 s[0:1]
+
+
+       for var i =0; i< 128; i++
+	    // be careful to make here a 64Byte aligned address, which could improve performance...
+	    buffer_store_lds_dword s_save_buf_rsrc0, s_save_mem_offset lds:1 offset:0		// first 64DW
+	    buffer_store_lds_dword s_save_buf_rsrc0, s_save_mem_offset lds:1 offset:256		  // second 64DW
+
+	if i!=127
+	s_sub_u32  m0, m0, s3	   // use a sgpr to shrink 2DW-inst to 1DW inst to improve performance , i.e.  pack more LDS_DMA inst to one Cacheline
+	    s_sub_u32  s_save_mem_offset, s_save_mem_offset,  s3
+	    end
+       end
+
+else   // BUFFER_STORE
+      v_mbcnt_lo_u32_b32 v2, 0xffffffff, 0x0
+      v_mbcnt_hi_u32_b32 v3, 0xffffffff, v2	// tid
+      v_mul_i32_i24 v2, v3, 8	// tid*8
+      v_mov_b32 v3, 256*2
+      s_mov_b32 m0, 0x10000
+      s_mov_b32 s0, s_save_buf_rsrc3
+      s_and_b32 s_save_buf_rsrc3, s_save_buf_rsrc3, 0xFF7FFFFF	  // disable add_tid
+      s_or_b32 s_save_buf_rsrc3, s_save_buf_rsrc3, 0x58000   //DFMT
+
+L_SAVE_LDS_LOOP_VECTOR:
+      ds_read_b64 v[0:1], v2	//x =LDS[a], byte address
+      s_waitcnt lgkmcnt(0)
+      buffer_store_dwordx2  v[0:1], v2, s_save_buf_rsrc0, s_save_mem_offset offen:1  glc:1  slc:1
+//	s_waitcnt vmcnt(0)
+//	v_add_u32 v2, vcc[0:1], v2, v3
+      v_add_u32 v2, v2, v3
+      v_cmp_lt_u32 vcc[0:1], v2, s_save_alloc_size
+      s_cbranch_vccnz L_SAVE_LDS_LOOP_VECTOR
+
+      // restore rsrc3
+      s_mov_b32 s_save_buf_rsrc3, s0
+
+end
+
+L_SAVE_LDS_DONE:
+
+
+    /*		save VGPRs  - set the Rest VGPRs	*/
+    //////////////////////////////////////////////////////////////////////////////////////
+  L_SAVE_VGPR:
+    // VGPR SR memory offset: 0
+    // TODO rearrange the RSRC words to use swizzle for VGPR save...
+
+    s_mov_b32	    s_save_mem_offset, (0+256*4)				    // for the rest VGPRs
+    s_mov_b32	    exec_lo, 0xFFFFFFFF						    //need every thread from now on
+    s_mov_b32	    exec_hi, 0xFFFFFFFF
+
+    s_getreg_b32    s_save_alloc_size, hwreg(HW_REG_GPR_ALLOC,SQ_WAVE_GPR_ALLOC_VGPR_SIZE_SHIFT,SQ_WAVE_GPR_ALLOC_VGPR_SIZE_SIZE)		    //vpgr_size
+    s_add_u32	    s_save_alloc_size, s_save_alloc_size, 1
+    s_lshl_b32	    s_save_alloc_size, s_save_alloc_size, 2			    //Number of VGPRs = (vgpr_size + 1) * 4    (non-zero value)	  //FIXME for GFX, zero is possible
+    s_lshl_b32	    s_save_buf_rsrc2,  s_save_alloc_size, 8			    //NUM_RECORDS in bytes (64 threads*4)
+    if (SWIZZLE_EN)
+	s_add_u32	s_save_buf_rsrc2, s_save_buf_rsrc2, 0x0			    //FIXME need to use swizzle to enable bounds checking?
+    else
+	s_mov_b32	s_save_buf_rsrc2,  0x1000000				    //NUM_RECORDS in bytes
+    end
+
+
+    // VGPR Allocated in 4-GPR granularity
+
+if G8SR_VGPR_SR_IN_DWX4
+	// the const stride for DWx4 is 4*4 bytes
+	s_and_b32 s_save_buf_rsrc1, s_save_buf_rsrc1, 0x0000FFFF   // reset const stride to 0
+	s_or_b32  s_save_buf_rsrc1, s_save_buf_rsrc1, G8SR_SAVE_BUF_RSRC_WORD1_STRIDE_DWx4  // const stride to 4*4 bytes
+
+	s_mov_b32	  m0, 4	    // skip first 4 VGPRs
+	s_cmp_lt_u32	  m0, s_save_alloc_size
+	s_cbranch_scc0	  L_SAVE_VGPR_LOOP_END	    // no more vgprs
+
+	s_set_gpr_idx_on  m0, 0x1   // This will change M0
+	s_add_u32	  s_save_alloc_size, s_save_alloc_size, 0x1000	// because above inst change m0
+L_SAVE_VGPR_LOOP:
+	v_mov_b32	  v0, v0   // v0 = v[0+m0]
+	v_mov_b32	  v1, v1
+	v_mov_b32	  v2, v2
+	v_mov_b32	  v3, v3
+
+
+	buffer_store_dwordx4 v0, v0, s_save_buf_rsrc0, s_save_mem_offset slc:1 glc:1
+	s_add_u32	  m0, m0, 4
+	s_add_u32	  s_save_mem_offset, s_save_mem_offset, 256*4
+	s_cmp_lt_u32	  m0, s_save_alloc_size
+    s_cbranch_scc1  L_SAVE_VGPR_LOOP						    //VGPR save is complete?
+    s_set_gpr_idx_off
+L_SAVE_VGPR_LOOP_END:
+
+	s_and_b32 s_save_buf_rsrc1, s_save_buf_rsrc1, 0x0000FFFF   // reset const stride to 0
+	s_or_b32  s_save_buf_rsrc1, s_save_buf_rsrc1, S_SAVE_BUF_RSRC_WORD1_STRIDE  // reset const stride to 4 bytes
+else
+    // VGPR store using dw burst
+    s_mov_b32	      m0, 0x4	//VGPR initial index value =0
+    s_cmp_lt_u32      m0, s_save_alloc_size
+    s_cbranch_scc0    L_SAVE_VGPR_END
+
+
+    s_set_gpr_idx_on	m0, 0x1 //M0[7:0] = M0[7:0] and M0[15:12] = 0x1
+    s_add_u32	    s_save_alloc_size, s_save_alloc_size, 0x1000		    //add 0x1000 since we compare m0 against it later
+
+  L_SAVE_VGPR_LOOP:
+    v_mov_b32	    v0, v0		//v0 = v[0+m0]
+    v_mov_b32	    v1, v1		//v0 = v[0+m0]
+    v_mov_b32	    v2, v2		//v0 = v[0+m0]
+    v_mov_b32	    v3, v3		//v0 = v[0+m0]
+
+    if(USE_MTBUF_INSTEAD_OF_MUBUF)
+	tbuffer_store_format_x v0, v0, s_save_buf_rsrc0, s_save_mem_offset format:BUF_NUM_FORMAT_FLOAT format: BUF_DATA_FORMAT_32 slc:1 glc:1
+    else
+	buffer_store_dword v0, v0, s_save_buf_rsrc0, s_save_mem_offset slc:1 glc:1
+	buffer_store_dword v1, v0, s_save_buf_rsrc0, s_save_mem_offset slc:1 glc:1  offset:256
+	buffer_store_dword v2, v0, s_save_buf_rsrc0, s_save_mem_offset slc:1 glc:1  offset:256*2
+	buffer_store_dword v3, v0, s_save_buf_rsrc0, s_save_mem_offset slc:1 glc:1  offset:256*3
+    end
+
+    s_add_u32	    m0, m0, 4							    //next vgpr index
+    s_add_u32	    s_save_mem_offset, s_save_mem_offset, 256*4			    //every buffer_store_dword does 256 bytes
+    s_cmp_lt_u32    m0, s_save_alloc_size					    //scc = (m0 < s_save_alloc_size) ? 1 : 0
+    s_cbranch_scc1  L_SAVE_VGPR_LOOP						    //VGPR save is complete?
+    s_set_gpr_idx_off
+end
+
+L_SAVE_VGPR_END:
+
+
+
+
+
+
+    /*	   S_PGM_END_SAVED  */				    //FIXME  graphics ONLY
+    if ((EMU_RUN_HACK) && (!EMU_RUN_HACK_SAVE_NORMAL_EXIT))
+	s_and_b32 s_save_pc_hi, s_save_pc_hi, 0x0000ffff    //pc[47:32]
+	s_add_u32 s_save_pc_lo, s_save_pc_lo, 4		    //pc[31:0]+4
+	s_addc_u32 s_save_pc_hi, s_save_pc_hi, 0x0	    //carry bit over
+	s_rfe_b64 s_save_pc_lo				    //Return to the main shader program
+    else
+    end
+
+// Save Done timestamp
+if G8SR_DEBUG_TIMESTAMP
+	s_memrealtime	s_g8sr_ts_save_d
+	// SGPR SR memory offset : size(VGPR)
+	get_vgpr_size_bytes(s_save_mem_offset)
+	s_add_u32 s_save_mem_offset, s_save_mem_offset, G8SR_DEBUG_TS_SAVE_D_OFFSET
+	s_waitcnt lgkmcnt(0)	     //FIXME, will cause xnack??
+	// Need reset rsrc2??
+	s_mov_b32 m0, s_save_mem_offset
+	s_mov_b32 s_save_buf_rsrc2,  0x1000000					//NUM_RECORDS in bytes
+	s_buffer_store_dwordx2 s_g8sr_ts_save_d, s_save_buf_rsrc0, m0	    glc:1
+end
+
+
+    s_branch	L_END_PGM
+
+
+
+/**************************************************************************/
+/*			restore routine					  */
+/**************************************************************************/
+
+L_RESTORE:
+    /*	    Setup Resource Contants    */
+    if ((EMU_RUN_HACK) && (!EMU_RUN_HACK_RESTORE_NORMAL))
+	//calculate wd_addr using absolute thread id
+	v_readlane_b32 s_restore_tmp, v9, 0
+	s_lshr_b32 s_restore_tmp, s_restore_tmp, 6
+	s_mul_i32 s_restore_tmp, s_restore_tmp, WAVE_SPACE
+	s_add_i32 s_restore_spi_init_lo, s_restore_tmp, WG_BASE_ADDR_LO
+	s_mov_b32 s_restore_spi_init_hi, WG_BASE_ADDR_HI
+	s_and_b32 s_restore_spi_init_hi, s_restore_spi_init_hi, CTX_RESTORE_CONTROL
+    else
+    end
+
+if G8SR_DEBUG_TIMESTAMP
+	s_memrealtime	s_g8sr_ts_restore_s
+	s_waitcnt lgkmcnt(0)	     //FIXME, will cause xnack??
+	// tma_lo/hi are sgpr 110, 111, which will not used for 112 SGPR allocated case...
+	s_mov_b32 s_restore_pc_lo, s_g8sr_ts_restore_s[0]
+	s_mov_b32 s_restore_pc_hi, s_g8sr_ts_restore_s[1]   //backup ts to ttmp0/1, sicne exec will be finally restored..
+end
+
+
+
+    s_mov_b32	    s_restore_buf_rsrc0,    s_restore_spi_init_lo							    //base_addr_lo
+    s_and_b32	    s_restore_buf_rsrc1,    s_restore_spi_init_hi, 0x0000FFFF						    //base_addr_hi
+    s_or_b32	    s_restore_buf_rsrc1,    s_restore_buf_rsrc1,  S_RESTORE_BUF_RSRC_WORD1_STRIDE
+    s_mov_b32	    s_restore_buf_rsrc2,    0										    //NUM_RECORDS initial value = 0 (in bytes)
+    s_mov_b32	    s_restore_buf_rsrc3,    S_RESTORE_BUF_RSRC_WORD3_MISC
+    s_and_b32	    s_restore_tmp,	    s_restore_spi_init_hi, S_RESTORE_SPI_INIT_ATC_MASK
+    s_lshr_b32	    s_restore_tmp,	    s_restore_tmp, (S_RESTORE_SPI_INIT_ATC_SHIFT-SQ_BUF_RSRC_WORD1_ATC_SHIFT)	    //get ATC bit into position
+    s_or_b32	    s_restore_buf_rsrc3,    s_restore_buf_rsrc3,  s_restore_tmp						    //or ATC
+    s_and_b32	    s_restore_tmp,	    s_restore_spi_init_hi, S_RESTORE_SPI_INIT_MTYPE_MASK
+    s_lshr_b32	    s_restore_tmp,	    s_restore_tmp, (S_RESTORE_SPI_INIT_MTYPE_SHIFT-SQ_BUF_RSRC_WORD3_MTYPE_SHIFT)   //get MTYPE bits into position
+    s_or_b32	    s_restore_buf_rsrc3,    s_restore_buf_rsrc3,  s_restore_tmp						    //or MTYPE
+
+    /*	    global mem offset		*/
+//  s_mov_b32	    s_restore_mem_offset, 0x0				    //mem offset initial value = 0
+
+    /*	    the first wave in the threadgroup	 */
+    s_and_b32	    s_restore_tmp, s_restore_spi_init_hi, S_RESTORE_SPI_INIT_FIRST_WAVE_MASK
+    s_cbranch_scc0  L_RESTORE_VGPR
+
+    /*		restore LDS	*/
+    //////////////////////////////
+  L_RESTORE_LDS:
+
+    s_mov_b32	    exec_lo, 0xFFFFFFFF							    //need every thread from now on   //be consistent with SAVE although can be moved ahead
+    s_mov_b32	    exec_hi, 0xFFFFFFFF
+
+    s_getreg_b32    s_restore_alloc_size, hwreg(HW_REG_LDS_ALLOC,SQ_WAVE_LDS_ALLOC_LDS_SIZE_SHIFT,SQ_WAVE_LDS_ALLOC_LDS_SIZE_SIZE)		//lds_size
+    s_and_b32	    s_restore_alloc_size, s_restore_alloc_size, 0xFFFFFFFF		    //lds_size is zero?
+    s_cbranch_scc0  L_RESTORE_VGPR							    //no lds used? jump to L_RESTORE_VGPR
+    s_lshl_b32	    s_restore_alloc_size, s_restore_alloc_size, 6			    //LDS size in dwords = lds_size * 64dw
+    s_lshl_b32	    s_restore_alloc_size, s_restore_alloc_size, 2			    //LDS size in bytes
+    s_mov_b32	    s_restore_buf_rsrc2,    s_restore_alloc_size			    //NUM_RECORDS in bytes
+
+    // LDS at offset: size(VGPR)+SIZE(SGPR)+SIZE(HWREG)
+    //
+    get_vgpr_size_bytes(s_restore_mem_offset)
+    get_sgpr_size_bytes(s_restore_tmp)
+    s_add_u32  s_restore_mem_offset, s_restore_mem_offset, s_restore_tmp
+    s_add_u32  s_restore_mem_offset, s_restore_mem_offset, get_hwreg_size_bytes()	     //FIXME, Check if offset overflow???
+
+
+    if (SWIZZLE_EN)
+	s_add_u32	s_restore_buf_rsrc2, s_restore_buf_rsrc2, 0x0			    //FIXME need to use swizzle to enable bounds checking?
+    else
+	s_mov_b32	s_restore_buf_rsrc2,  0x1000000					    //NUM_RECORDS in bytes
+    end
+    s_mov_b32	    m0, 0x0								    //lds_offset initial value = 0
+
+  L_RESTORE_LDS_LOOP:
+    if (SAVE_LDS)
+	buffer_load_dword   v0, v0, s_restore_buf_rsrc0, s_restore_mem_offset lds:1		       // first 64DW
+	buffer_load_dword   v0, v0, s_restore_buf_rsrc0, s_restore_mem_offset lds:1 offset:256	       // second 64DW
+    end
+    s_add_u32	    m0, m0, 256*2						// 128 DW
+    s_add_u32	    s_restore_mem_offset, s_restore_mem_offset, 256*2		//mem offset increased by 128DW
+    s_cmp_lt_u32    m0, s_restore_alloc_size					//scc=(m0 < s_restore_alloc_size) ? 1 : 0
+    s_cbranch_scc1  L_RESTORE_LDS_LOOP							    //LDS restore is complete?
+
+
+    /*		restore VGPRs	    */
+    //////////////////////////////
+  L_RESTORE_VGPR:
+	// VGPR SR memory offset : 0
+    s_mov_b32	    s_restore_mem_offset, 0x0
+    s_mov_b32	    exec_lo, 0xFFFFFFFF							    //need every thread from now on   //be consistent with SAVE although can be moved ahead
+    s_mov_b32	    exec_hi, 0xFFFFFFFF
+
+    s_getreg_b32    s_restore_alloc_size, hwreg(HW_REG_GPR_ALLOC,SQ_WAVE_GPR_ALLOC_VGPR_SIZE_SHIFT,SQ_WAVE_GPR_ALLOC_VGPR_SIZE_SIZE)	//vpgr_size
+    s_add_u32	    s_restore_alloc_size, s_restore_alloc_size, 1
+    s_lshl_b32	    s_restore_alloc_size, s_restore_alloc_size, 2			    //Number of VGPRs = (vgpr_size + 1) * 4    (non-zero value)
+    s_lshl_b32	    s_restore_buf_rsrc2,  s_restore_alloc_size, 8			    //NUM_RECORDS in bytes (64 threads*4)
+    if (SWIZZLE_EN)
+	s_add_u32	s_restore_buf_rsrc2, s_restore_buf_rsrc2, 0x0			    //FIXME need to use swizzle to enable bounds checking?
+    else
+	s_mov_b32	s_restore_buf_rsrc2,  0x1000000					    //NUM_RECORDS in bytes
+    end
+
+if G8SR_VGPR_SR_IN_DWX4
+     get_vgpr_size_bytes(s_restore_mem_offset)
+     s_sub_u32	       s_restore_mem_offset, s_restore_mem_offset, 256*4
+
+     // the const stride for DWx4 is 4*4 bytes
+     s_and_b32 s_restore_buf_rsrc1, s_restore_buf_rsrc1, 0x0000FFFF   // reset const stride to 0
+     s_or_b32  s_restore_buf_rsrc1, s_restore_buf_rsrc1, G8SR_RESTORE_BUF_RSRC_WORD1_STRIDE_DWx4  // const stride to 4*4 bytes
+
+     s_mov_b32	       m0, s_restore_alloc_size
+     s_set_gpr_idx_on  m0, 0x8	  // Note.. This will change m0
+
+L_RESTORE_VGPR_LOOP:
+     buffer_load_dwordx4 v0, v0, s_restore_buf_rsrc0, s_restore_mem_offset slc:1 glc:1
+     s_waitcnt vmcnt(0)
+     s_sub_u32	       m0, m0, 4
+     v_mov_b32	       v0, v0	// v[0+m0] = v0
+     v_mov_b32	       v1, v1
+     v_mov_b32	       v2, v2
+     v_mov_b32	       v3, v3
+     s_sub_u32	       s_restore_mem_offset, s_restore_mem_offset, 256*4
+     s_cmp_eq_u32      m0, 0x8000
+     s_cbranch_scc0    L_RESTORE_VGPR_LOOP
+     s_set_gpr_idx_off
+
+     s_and_b32 s_restore_buf_rsrc1, s_restore_buf_rsrc1, 0x0000FFFF   // reset const stride to 0
+     s_or_b32  s_restore_buf_rsrc1, s_restore_buf_rsrc1, S_RESTORE_BUF_RSRC_WORD1_STRIDE  // const stride to 4*4 bytes
+
+else
+    // VGPR load using dw burst
+    s_mov_b32	    s_restore_mem_offset_save, s_restore_mem_offset	// restore start with v1, v0 will be the last
+    s_add_u32	    s_restore_mem_offset, s_restore_mem_offset, 256*4
+    s_mov_b32	    m0, 4				//VGPR initial index value = 1
+    s_set_gpr_idx_on  m0, 0x8			    //M0[7:0] = M0[7:0] and M0[15:12] = 0x8
+    s_add_u32	    s_restore_alloc_size, s_restore_alloc_size, 0x8000			    //add 0x8000 since we compare m0 against it later
+
+  L_RESTORE_VGPR_LOOP:
+    if(USE_MTBUF_INSTEAD_OF_MUBUF)
+	tbuffer_load_format_x v0, v0, s_restore_buf_rsrc0, s_restore_mem_offset format:BUF_NUM_FORMAT_FLOAT format: BUF_DATA_FORMAT_32 slc:1 glc:1
+    else
+	buffer_load_dword v0, v0, s_restore_buf_rsrc0, s_restore_mem_offset slc:1 glc:1
+	buffer_load_dword v1, v0, s_restore_buf_rsrc0, s_restore_mem_offset slc:1 glc:1 offset:256
+	buffer_load_dword v2, v0, s_restore_buf_rsrc0, s_restore_mem_offset slc:1 glc:1 offset:256*2
+	buffer_load_dword v3, v0, s_restore_buf_rsrc0, s_restore_mem_offset slc:1 glc:1 offset:256*3
+    end
+    s_waitcnt	    vmcnt(0)								    //ensure data ready
+    v_mov_b32	    v0, v0								    //v[0+m0] = v0
+    v_mov_b32	    v1, v1
+    v_mov_b32	    v2, v2
+    v_mov_b32	    v3, v3
+    s_add_u32	    m0, m0, 4								    //next vgpr index
+    s_add_u32	    s_restore_mem_offset, s_restore_mem_offset, 256*4				//every buffer_load_dword does 256 bytes
+    s_cmp_lt_u32    m0, s_restore_alloc_size						    //scc = (m0 < s_restore_alloc_size) ? 1 : 0
+    s_cbranch_scc1  L_RESTORE_VGPR_LOOP							    //VGPR restore (except v0) is complete?
+    s_set_gpr_idx_off
+											    /* VGPR restore on v0 */
+    if(USE_MTBUF_INSTEAD_OF_MUBUF)
+	tbuffer_load_format_x v0, v0, s_restore_buf_rsrc0, s_restore_mem_offset_save format:BUF_NUM_FORMAT_FLOAT format: BUF_DATA_FORMAT_32 slc:1 glc:1
+    else
+	buffer_load_dword v0, v0, s_restore_buf_rsrc0, s_restore_mem_offset_save    slc:1 glc:1
+	buffer_load_dword v1, v0, s_restore_buf_rsrc0, s_restore_mem_offset_save    slc:1 glc:1 offset:256
+	buffer_load_dword v2, v0, s_restore_buf_rsrc0, s_restore_mem_offset_save    slc:1 glc:1 offset:256*2
+	buffer_load_dword v3, v0, s_restore_buf_rsrc0, s_restore_mem_offset_save    slc:1 glc:1 offset:256*3
+    end
+
+end
+
+    /*		restore SGPRs	    */
+    //////////////////////////////
+
+    // SGPR SR memory offset : size(VGPR)
+    get_vgpr_size_bytes(s_restore_mem_offset)
+    get_sgpr_size_bytes(s_restore_tmp)
+    s_add_u32 s_restore_mem_offset, s_restore_mem_offset, s_restore_tmp
+    s_sub_u32 s_restore_mem_offset, s_restore_mem_offset, 16*4	   // restore SGPR from S[n] to S[0], by 16 sgprs group
+    // TODO, change RSRC word to rearrange memory layout for SGPRS
+
+    s_getreg_b32    s_restore_alloc_size, hwreg(HW_REG_GPR_ALLOC,SQ_WAVE_GPR_ALLOC_SGPR_SIZE_SHIFT,SQ_WAVE_GPR_ALLOC_SGPR_SIZE_SIZE)		    //spgr_size
+    s_add_u32	    s_restore_alloc_size, s_restore_alloc_size, 1
+    s_lshl_b32	    s_restore_alloc_size, s_restore_alloc_size, 4			    //Number of SGPRs = (sgpr_size + 1) * 16   (non-zero value)
+
+    if (SGPR_SAVE_USE_SQC)
+	s_lshl_b32	s_restore_buf_rsrc2,	s_restore_alloc_size, 2			    //NUM_RECORDS in bytes
+    else
+	s_lshl_b32	s_restore_buf_rsrc2,	s_restore_alloc_size, 8			    //NUM_RECORDS in bytes (64 threads)
+    end
+    if (SWIZZLE_EN)
+	s_add_u32	s_restore_buf_rsrc2, s_restore_buf_rsrc2, 0x0			    //FIXME need to use swizzle to enable bounds checking?
+    else
+	s_mov_b32	s_restore_buf_rsrc2,  0x1000000					    //NUM_RECORDS in bytes
+    end
+
+    s_mov_b32 m0, s_restore_alloc_size
+
+ L_RESTORE_SGPR_LOOP:
+    read_16sgpr_from_mem(s0, s_restore_buf_rsrc0, s_restore_mem_offset)	 //PV: further performance improvement can be made
+    s_waitcnt	    lgkmcnt(0)								    //ensure data ready
+
+    s_sub_u32 m0, m0, 16    // Restore from S[n] to S[0]
+    s_nop 0 // hazard SALU M0=> S_MOVREL
+
+    s_movreld_b64   s0, s0	//s[0+m0] = s0
+    s_movreld_b64   s2, s2
+    s_movreld_b64   s4, s4
+    s_movreld_b64   s6, s6
+    s_movreld_b64   s8, s8
+    s_movreld_b64   s10, s10
+    s_movreld_b64   s12, s12
+    s_movreld_b64   s14, s14
+
+    s_cmp_eq_u32    m0, 0		//scc = (m0 < s_restore_alloc_size) ? 1 : 0
+    s_cbranch_scc0  L_RESTORE_SGPR_LOOP		    //SGPR restore (except s0) is complete?
+
+    /*	    restore HW registers    */
+    //////////////////////////////
+  L_RESTORE_HWREG:
+
+
+if G8SR_DEBUG_TIMESTAMP
+      s_mov_b32 s_g8sr_ts_restore_s[0], s_restore_pc_lo
+      s_mov_b32 s_g8sr_ts_restore_s[1], s_restore_pc_hi
+end
+
+    // HWREG SR memory offset : size(VGPR)+size(SGPR)
+    get_vgpr_size_bytes(s_restore_mem_offset)
+    get_sgpr_size_bytes(s_restore_tmp)
+    s_add_u32 s_restore_mem_offset, s_restore_mem_offset, s_restore_tmp
+
+
+    s_mov_b32	    s_restore_buf_rsrc2, 0x4						    //NUM_RECORDS   in bytes
+    if (SWIZZLE_EN)
+	s_add_u32	s_restore_buf_rsrc2, s_restore_buf_rsrc2, 0x0			    //FIXME need to use swizzle to enable bounds checking?
+    else
+	s_mov_b32	s_restore_buf_rsrc2,  0x1000000					    //NUM_RECORDS in bytes
+    end
+
+    read_hwreg_from_mem(s_restore_m0, s_restore_buf_rsrc0, s_restore_mem_offset)		    //M0
+    read_hwreg_from_mem(s_restore_pc_lo, s_restore_buf_rsrc0, s_restore_mem_offset)		//PC
+    read_hwreg_from_mem(s_restore_pc_hi, s_restore_buf_rsrc0, s_restore_mem_offset)
+    read_hwreg_from_mem(s_restore_exec_lo, s_restore_buf_rsrc0, s_restore_mem_offset)		    //EXEC
+    read_hwreg_from_mem(s_restore_exec_hi, s_restore_buf_rsrc0, s_restore_mem_offset)
+    read_hwreg_from_mem(s_restore_status, s_restore_buf_rsrc0, s_restore_mem_offset)		    //STATUS
+    read_hwreg_from_mem(s_restore_trapsts, s_restore_buf_rsrc0, s_restore_mem_offset)		    //TRAPSTS
+    read_hwreg_from_mem(xnack_mask_lo, s_restore_buf_rsrc0, s_restore_mem_offset)		    //XNACK_MASK_LO
+    read_hwreg_from_mem(xnack_mask_hi, s_restore_buf_rsrc0, s_restore_mem_offset)		    //XNACK_MASK_HI
+    read_hwreg_from_mem(s_restore_mode, s_restore_buf_rsrc0, s_restore_mem_offset)		//MODE
+
+    s_waitcnt	    lgkmcnt(0)											    //from now on, it is safe to restore STATUS and IB_STS
+
+    s_and_b32 s_restore_pc_hi, s_restore_pc_hi, 0x0000ffff	//pc[47:32]	   //Do it here in order not to affect STATUS
+
+    //for normal save & restore, the saved PC points to the next inst to execute, no adjustment needs to be made, otherwise:
+    if ((EMU_RUN_HACK) && (!EMU_RUN_HACK_RESTORE_NORMAL))
+	s_add_u32 s_restore_pc_lo, s_restore_pc_lo, 8		 //pc[31:0]+8	  //two back-to-back s_trap are used (first for save and second for restore)
+	s_addc_u32  s_restore_pc_hi, s_restore_pc_hi, 0x0	 //carry bit over
+    end
+    if ((EMU_RUN_HACK) && (EMU_RUN_HACK_RESTORE_NORMAL))
+	s_add_u32 s_restore_pc_lo, s_restore_pc_lo, 4		 //pc[31:0]+4	  // save is hack through s_trap but restore is normal
+	s_addc_u32  s_restore_pc_hi, s_restore_pc_hi, 0x0	 //carry bit over
+    end
+
+    s_mov_b32	    m0,		s_restore_m0
+    s_mov_b32	    exec_lo,	s_restore_exec_lo
+    s_mov_b32	    exec_hi,	s_restore_exec_hi
+
+    s_and_b32	    s_restore_m0, SQ_WAVE_TRAPSTS_PRE_SAVECTX_MASK, s_restore_trapsts
+    s_setreg_b32    hwreg(HW_REG_TRAPSTS, SQ_WAVE_TRAPSTS_PRE_SAVECTX_SHIFT, SQ_WAVE_TRAPSTS_PRE_SAVECTX_SIZE), s_restore_m0
+    s_and_b32	    s_restore_m0, SQ_WAVE_TRAPSTS_POST_SAVECTX_MASK, s_restore_trapsts
+    s_lshr_b32	    s_restore_m0, s_restore_m0, SQ_WAVE_TRAPSTS_POST_SAVECTX_SHIFT
+    s_setreg_b32    hwreg(HW_REG_TRAPSTS, SQ_WAVE_TRAPSTS_POST_SAVECTX_SHIFT, SQ_WAVE_TRAPSTS_POST_SAVECTX_SIZE), s_restore_m0
+    //s_setreg_b32  hwreg(HW_REG_TRAPSTS),  s_restore_trapsts	   //don't overwrite SAVECTX bit as it may be set through external SAVECTX during restore
+    s_setreg_b32    hwreg(HW_REG_MODE),	    s_restore_mode
+
+    // Restore trap temporaries 6-11, 13-15 initialized by SPI debug dispatch logic
+    // ttmp SR memory offset : size(VGPR)+size(SGPR)+0x40
+    get_vgpr_size_bytes(s_restore_ttmps_lo)
+    get_sgpr_size_bytes(s_restore_ttmps_hi)
+    s_add_u32	    s_restore_ttmps_lo, s_restore_ttmps_lo, s_restore_ttmps_hi
+    s_add_u32	    s_restore_ttmps_lo, s_restore_ttmps_lo, s_restore_buf_rsrc0
+    s_addc_u32	    s_restore_ttmps_hi, s_restore_buf_rsrc1, 0x0
+    s_and_b32	    s_restore_ttmps_hi, s_restore_ttmps_hi, 0xFFFF
+    s_load_dwordx2  [ttmp6, ttmp7], [s_restore_ttmps_lo, s_restore_ttmps_hi], 0x40 glc:1
+    s_load_dwordx4  [ttmp8, ttmp9, ttmp10, ttmp11], [s_restore_ttmps_lo, s_restore_ttmps_hi], 0x48 glc:1
+    s_load_dword    ttmp13, [s_restore_ttmps_lo, s_restore_ttmps_hi], 0x58 glc:1
+    s_load_dwordx2  [ttmp14, ttmp15], [s_restore_ttmps_lo, s_restore_ttmps_hi], 0x5C glc:1
+    s_waitcnt	    lgkmcnt(0)
+
+    //reuse s_restore_m0 as a temp register
+    s_and_b32	    s_restore_m0, s_restore_pc_hi, S_SAVE_PC_HI_RCNT_MASK
+    s_lshr_b32	    s_restore_m0, s_restore_m0, S_SAVE_PC_HI_RCNT_SHIFT
+    s_lshl_b32	    s_restore_m0, s_restore_m0, SQ_WAVE_IB_STS_RCNT_SHIFT
+    s_mov_b32	    s_restore_tmp, 0x0										    //IB_STS is zero
+    s_or_b32	    s_restore_tmp, s_restore_tmp, s_restore_m0
+    s_and_b32	    s_restore_m0, s_restore_pc_hi, S_SAVE_PC_HI_FIRST_REPLAY_MASK
+    s_lshr_b32	    s_restore_m0, s_restore_m0, S_SAVE_PC_HI_FIRST_REPLAY_SHIFT
+    s_lshl_b32	    s_restore_m0, s_restore_m0, SQ_WAVE_IB_STS_FIRST_REPLAY_SHIFT
+    s_or_b32	    s_restore_tmp, s_restore_tmp, s_restore_m0
+    s_and_b32	    s_restore_m0, s_restore_status, SQ_WAVE_STATUS_INST_ATC_MASK
+    s_lshr_b32	    s_restore_m0, s_restore_m0, SQ_WAVE_STATUS_INST_ATC_SHIFT
+    s_setreg_b32    hwreg(HW_REG_IB_STS),   s_restore_tmp
+
+    s_and_b64	 exec, exec, exec  // Restore STATUS.EXECZ, not writable by s_setreg_b32
+    s_and_b64	 vcc, vcc, vcc	// Restore STATUS.VCCZ, not writable by s_setreg_b32
+    s_setreg_b32    hwreg(HW_REG_STATUS),   s_restore_status	 // SCC is included, which is changed by previous salu
+
+    s_barrier							//barrier to ensure the readiness of LDS before access attempts from any other wave in the same TG //FIXME not performance-optimal at this time
+
+if G8SR_DEBUG_TIMESTAMP
+    s_memrealtime s_g8sr_ts_restore_d
+    s_waitcnt lgkmcnt(0)
+end
+
+//  s_rfe_b64 s_restore_pc_lo					//Return to the main shader program and resume execution
+    s_rfe_restore_b64  s_restore_pc_lo, s_restore_m0		// s_restore_m0[0] is used to set STATUS.inst_atc
+
+
+/**************************************************************************/
+/*			the END						  */
+/**************************************************************************/
+L_END_PGM:
+    s_endpgm
+
+end
+
+
+/**************************************************************************/
+/*			the helper functions				  */
+/**************************************************************************/
+
+//Only for save hwreg to mem
+function write_hwreg_to_mem(s, s_rsrc, s_mem_offset)
+	s_mov_b32 exec_lo, m0			//assuming exec_lo is not needed anymore from this point on
+	s_mov_b32 m0, s_mem_offset
+	s_buffer_store_dword s, s_rsrc, m0	glc:1
+	ack_sqc_store_workaround()
+	s_add_u32	s_mem_offset, s_mem_offset, 4
+	s_mov_b32   m0, exec_lo
+end
+
+
+// HWREG are saved before SGPRs, so all HWREG could be use.
+function write_16sgpr_to_mem(s, s_rsrc, s_mem_offset)
+
+	s_buffer_store_dwordx4 s[0], s_rsrc, 0	glc:1
+	ack_sqc_store_workaround()
+	s_buffer_store_dwordx4 s[4], s_rsrc, 16	 glc:1
+	ack_sqc_store_workaround()
+	s_buffer_store_dwordx4 s[8], s_rsrc, 32	 glc:1
+	ack_sqc_store_workaround()
+	s_buffer_store_dwordx4 s[12], s_rsrc, 48 glc:1
+	ack_sqc_store_workaround()
+	s_add_u32	s_rsrc[0], s_rsrc[0], 4*16
+	s_addc_u32	s_rsrc[1], s_rsrc[1], 0x0	      // +scc
+end
+
+
+function read_hwreg_from_mem(s, s_rsrc, s_mem_offset)
+    s_buffer_load_dword s, s_rsrc, s_mem_offset	    glc:1
+    s_add_u32	    s_mem_offset, s_mem_offset, 4
+end
+
+function read_16sgpr_from_mem(s, s_rsrc, s_mem_offset)
+    s_buffer_load_dwordx16 s, s_rsrc, s_mem_offset	glc:1
+    s_sub_u32	    s_mem_offset, s_mem_offset, 4*16
+end
+
+
+
+function get_lds_size_bytes(s_lds_size_byte)
+    // SQ LDS granularity is 64DW, while PGM_RSRC2.lds_size is in granularity 128DW
+    s_getreg_b32   s_lds_size_byte, hwreg(HW_REG_LDS_ALLOC, SQ_WAVE_LDS_ALLOC_LDS_SIZE_SHIFT, SQ_WAVE_LDS_ALLOC_LDS_SIZE_SIZE)		// lds_size
+    s_lshl_b32	   s_lds_size_byte, s_lds_size_byte, 8			    //LDS size in dwords = lds_size * 64 *4Bytes    // granularity 64DW
+end
+
+function get_vgpr_size_bytes(s_vgpr_size_byte)
+    s_getreg_b32   s_vgpr_size_byte, hwreg(HW_REG_GPR_ALLOC,SQ_WAVE_GPR_ALLOC_VGPR_SIZE_SHIFT,SQ_WAVE_GPR_ALLOC_VGPR_SIZE_SIZE)	 //vpgr_size
+    s_add_u32	   s_vgpr_size_byte, s_vgpr_size_byte, 1
+    s_lshl_b32	   s_vgpr_size_byte, s_vgpr_size_byte, (2+8) //Number of VGPRs = (vgpr_size + 1) * 4 * 64 * 4	(non-zero value)   //FIXME for GFX, zero is possible
+end
+
+function get_sgpr_size_bytes(s_sgpr_size_byte)
+    s_getreg_b32   s_sgpr_size_byte, hwreg(HW_REG_GPR_ALLOC,SQ_WAVE_GPR_ALLOC_SGPR_SIZE_SHIFT,SQ_WAVE_GPR_ALLOC_SGPR_SIZE_SIZE)	 //spgr_size
+    s_add_u32	   s_sgpr_size_byte, s_sgpr_size_byte, 1
+    s_lshl_b32	   s_sgpr_size_byte, s_sgpr_size_byte, 6 //Number of SGPRs = (sgpr_size + 1) * 16 *4   (non-zero value)
+end
+
+function get_hwreg_size_bytes
+    return 128 //HWREG size 128 bytes
+end
+
+function ack_sqc_store_workaround
+    if ACK_SQC_STORE
+        s_waitcnt lgkmcnt(0)
+    end
+end
+
+
+#endif
+
+static const uint32_t cwsr_trap_gfx9_hex[] = {
+	0xbf820001, 0xbf820158,
+	0xb8f8f802, 0x89788678,
+	0xb8f1f803, 0x866eff71,
+	0x00000400, 0xbf850034,
+	0x866eff71, 0x00000800,
+	0xbf850003, 0x866eff71,
+	0x00000100, 0xbf840008,
+	0x866eff78, 0x00002000,
+	0xbf840001, 0xbf810000,
+	0x8778ff78, 0x00002000,
+	0x80ec886c, 0x82ed806d,
+	0xb8eef807, 0x866fff6e,
+	0x001f8000, 0x8e6f8b6f,
+	0x8977ff77, 0xfc000000,
+	0x87776f77, 0x896eff6e,
+	0x001f8000, 0xb96ef807,
+	0xb8f0f812, 0xb8f1f813,
+	0x8ef08870, 0xc0071bb8,
+	0x00000000, 0xbf8cc07f,
+	0xc0071c38, 0x00000008,
+	0xbf8cc07f, 0x86ee6e6e,
+	0xbf840001, 0xbe801d6e,
+	0xb8f1f803, 0x8671ff71,
+	0x000001ff, 0xbf850002,
+	0x806c846c, 0x826d806d,
+	0x866dff6d, 0x0000ffff,
+	0x8f6e8b77, 0x866eff6e,
+	0x001f8000, 0xb96ef807,
+	0x86fe7e7e, 0x86ea6a6a,
+	0xb978f802, 0xbe801f6c,
+	0x866dff6d, 0x0000ffff,
+	0xbef00080, 0xb9700283,
+	0xb8f02407, 0x8e709c70,
+	0x876d706d, 0xb8f003c7,
+	0x8e709b70, 0x876d706d,
+	0xb8f0f807, 0x8670ff70,
+	0x00007fff, 0xb970f807,
+	0xbeee007e, 0xbeef007f,
+	0xbefe0180, 0xbf900004,
+	0xbf8e0002, 0xbf88fffe,
+	0xb8f02a05, 0x80708170,
+	0x8e708a70, 0xb8f11605,
+	0x80718171, 0x8e718671,
+	0x80707170, 0x80707e70,
+	0x8271807f, 0x8671ff71,
+	0x0000ffff, 0xc0471cb8,
+	0x00000040, 0xbf8cc07f,
+	0xc04b1d38, 0x00000048,
+	0xbf8cc07f, 0xc0431e78,
+	0x00000058, 0xbf8cc07f,
+	0xc0471eb8, 0x0000005c,
+	0xbf8cc07f, 0xbef4007e,
+	0x8675ff7f, 0x0000ffff,
+	0x8775ff75, 0x00040000,
+	0xbef60080, 0xbef700ff,
+	0x00807fac, 0x8670ff7f,
+	0x08000000, 0x8f708370,
+	0x87777077, 0x8670ff7f,
+	0x70000000, 0x8f708170,
+	0x87777077, 0xbefb007c,
+	0xbefa0080, 0xb8fa2a05,
+	0x807a817a, 0x8e7a8a7a,
+	0xb8f01605, 0x80708170,
+	0x8e708670, 0x807a707a,
+	0xbef60084, 0xbef600ff,
+	0x01000000, 0xbefe007c,
+	0xbefc007a, 0xc0611efa,
+	0x0000007c, 0xbf8cc07f,
+	0x807a847a, 0xbefc007e,
+	0xbefe007c, 0xbefc007a,
+	0xc0611b3a, 0x0000007c,
+	0xbf8cc07f, 0x807a847a,
+	0xbefc007e, 0xbefe007c,
+	0xbefc007a, 0xc0611b7a,
+	0x0000007c, 0xbf8cc07f,
+	0x807a847a, 0xbefc007e,
+	0xbefe007c, 0xbefc007a,
+	0xc0611bba, 0x0000007c,
+	0xbf8cc07f, 0x807a847a,
+	0xbefc007e, 0xbefe007c,
+	0xbefc007a, 0xc0611bfa,
+	0x0000007c, 0xbf8cc07f,
+	0x807a847a, 0xbefc007e,
+	0xbefe007c, 0xbefc007a,
+	0xc0611e3a, 0x0000007c,
+	0xbf8cc07f, 0x807a847a,
+	0xbefc007e, 0xb8f1f803,
+	0xbefe007c, 0xbefc007a,
+	0xc0611c7a, 0x0000007c,
+	0xbf8cc07f, 0x807a847a,
+	0xbefc007e, 0xbefe007c,
+	0xbefc007a, 0xc0611a3a,
+	0x0000007c, 0xbf8cc07f,
+	0x807a847a, 0xbefc007e,
+	0xbefe007c, 0xbefc007a,
+	0xc0611a7a, 0x0000007c,
+	0xbf8cc07f, 0x807a847a,
+	0xbefc007e, 0xb8fbf801,
+	0xbefe007c, 0xbefc007a,
+	0xc0611efa, 0x0000007c,
+	0xbf8cc07f, 0x807a847a,
+	0xbefc007e, 0x8670ff7f,
+	0x04000000, 0xbeef0080,
+	0x876f6f70, 0xb8fa2a05,
+	0x807a817a, 0x8e7a8a7a,
+	0xb8f11605, 0x80718171,
+	0x8e718471, 0x8e768271,
+	0xbef600ff, 0x01000000,
+	0xbef20174, 0x80747a74,
+	0x82758075, 0xbefc0080,
+	0xbf800000, 0xbe802b00,
+	0xbe822b02, 0xbe842b04,
+	0xbe862b06, 0xbe882b08,
+	0xbe8a2b0a, 0xbe8c2b0c,
+	0xbe8e2b0e, 0xc06b003a,
+	0x00000000, 0xbf8cc07f,
+	0xc06b013a, 0x00000010,
+	0xbf8cc07f, 0xc06b023a,
+	0x00000020, 0xbf8cc07f,
+	0xc06b033a, 0x00000030,
+	0xbf8cc07f, 0x8074c074,
+	0x82758075, 0x807c907c,
+	0xbf0a717c, 0xbf85ffe7,
+	0xbef40172, 0xbefa0080,
+	0xbefe00c1, 0xbeff00c1,
+	0xbee80080, 0xbee90080,
+	0xbef600ff, 0x01000000,
+	0xe0724000, 0x7a1d0000,
+	0xe0724100, 0x7a1d0100,
+	0xe0724200, 0x7a1d0200,
+	0xe0724300, 0x7a1d0300,
+	0xbefe00c1, 0xbeff00c1,
+	0xb8f14306, 0x8671c171,
+	0xbf84002c, 0xbf8a0000,
+	0x8670ff6f, 0x04000000,
+	0xbf840028, 0x8e718671,
+	0x8e718271, 0xbef60071,
+	0xb8fa2a05, 0x807a817a,
+	0x8e7a8a7a, 0xb8f01605,
+	0x80708170, 0x8e708670,
+	0x807a707a, 0x807aff7a,
+	0x00000080, 0xbef600ff,
+	0x01000000, 0xbefc0080,
+	0xd28c0002, 0x000100c1,
+	0xd28d0003, 0x000204c1,
+	0xd1060002, 0x00011103,
+	0x7e0602ff, 0x00000200,
+	0xbefc00ff, 0x00010000,
+	0xbe800077, 0x8677ff77,
+	0xff7fffff, 0x8777ff77,
+	0x00058000, 0xd8ec0000,
+	0x00000002, 0xbf8cc07f,
+	0xe0765000, 0x7a1d0002,
+	0x68040702, 0xd0c9006a,
+	0x0000e302, 0xbf87fff7,
+	0xbef70000, 0xbefa00ff,
+	0x00000400, 0xbefe00c1,
+	0xbeff00c1, 0xb8f12a05,
+	0x80718171, 0x8e718271,
+	0x8e768871, 0xbef600ff,
+	0x01000000, 0xbefc0084,
+	0xbf0a717c, 0xbf840015,
+	0xbf11017c, 0x8071ff71,
+	0x00001000, 0x7e000300,
+	0x7e020301, 0x7e040302,
+	0x7e060303, 0xe0724000,
+	0x7a1d0000, 0xe0724100,
+	0x7a1d0100, 0xe0724200,
+	0x7a1d0200, 0xe0724300,
+	0x7a1d0300, 0x807c847c,
+	0x807aff7a, 0x00000400,
+	0xbf0a717c, 0xbf85ffef,
+	0xbf9c0000, 0xbf8200d9,
+	0xbef4007e, 0x8675ff7f,
+	0x0000ffff, 0x8775ff75,
+	0x00040000, 0xbef60080,
+	0xbef700ff, 0x00807fac,
+	0x866eff7f, 0x08000000,
+	0x8f6e836e, 0x87776e77,
+	0x866eff7f, 0x70000000,
+	0x8f6e816e, 0x87776e77,
+	0x866eff7f, 0x04000000,
+	0xbf84001e, 0xbefe00c1,
+	0xbeff00c1, 0xb8ef4306,
+	0x866fc16f, 0xbf840019,
+	0x8e6f866f, 0x8e6f826f,
+	0xbef6006f, 0xb8f82a05,
+	0x80788178, 0x8e788a78,
+	0xb8ee1605, 0x806e816e,
+	0x8e6e866e, 0x80786e78,
+	0x8078ff78, 0x00000080,
+	0xbef600ff, 0x01000000,
+	0xbefc0080, 0xe0510000,
+	0x781d0000, 0xe0510100,
+	0x781d0000, 0x807cff7c,
+	0x00000200, 0x8078ff78,
+	0x00000200, 0xbf0a6f7c,
+	0xbf85fff6, 0xbef80080,
+	0xbefe00c1, 0xbeff00c1,
+	0xb8ef2a05, 0x806f816f,
+	0x8e6f826f, 0x8e76886f,
+	0xbef600ff, 0x01000000,
+	0xbeee0078, 0x8078ff78,
+	0x00000400, 0xbefc0084,
+	0xbf11087c, 0x806fff6f,
+	0x00008000, 0xe0524000,
+	0x781d0000, 0xe0524100,
+	0x781d0100, 0xe0524200,
+	0x781d0200, 0xe0524300,
+	0x781d0300, 0xbf8c0f70,
+	0x7e000300, 0x7e020301,
+	0x7e040302, 0x7e060303,
+	0x807c847c, 0x8078ff78,
+	0x00000400, 0xbf0a6f7c,
+	0xbf85ffee, 0xbf9c0000,
+	0xe0524000, 0x6e1d0000,
+	0xe0524100, 0x6e1d0100,
+	0xe0524200, 0x6e1d0200,
+	0xe0524300, 0x6e1d0300,
+	0xb8f82a05, 0x80788178,
+	0x8e788a78, 0xb8ee1605,
+	0x806e816e, 0x8e6e866e,
+	0x80786e78, 0x80f8c078,
+	0xb8ef1605, 0x806f816f,
+	0x8e6f846f, 0x8e76826f,
+	0xbef600ff, 0x01000000,
+	0xbefc006f, 0xc031003a,
+	0x00000078, 0x80f8c078,
+	0xbf8cc07f, 0x80fc907c,
+	0xbf800000, 0xbe802d00,
+	0xbe822d02, 0xbe842d04,
+	0xbe862d06, 0xbe882d08,
+	0xbe8a2d0a, 0xbe8c2d0c,
+	0xbe8e2d0e, 0xbf06807c,
+	0xbf84fff0, 0xb8f82a05,
+	0x80788178, 0x8e788a78,
+	0xb8ee1605, 0x806e816e,
+	0x8e6e866e, 0x80786e78,
+	0xbef60084, 0xbef600ff,
+	0x01000000, 0xc0211bfa,
+	0x00000078, 0x80788478,
+	0xc0211b3a, 0x00000078,
+	0x80788478, 0xc0211b7a,
+	0x00000078, 0x80788478,
+	0xc0211eba, 0x00000078,
+	0x80788478, 0xc0211efa,
+	0x00000078, 0x80788478,
+	0xc0211c3a, 0x00000078,
+	0x80788478, 0xc0211c7a,
+	0x00000078, 0x80788478,
+	0xc0211a3a, 0x00000078,
+	0x80788478, 0xc0211a7a,
+	0x00000078, 0x80788478,
+	0xc0211cfa, 0x00000078,
+	0x80788478, 0xbf8cc07f,
+	0x866dff6d, 0x0000ffff,
+	0xbefc006f, 0xbefe007a,
+	0xbeff007b, 0x866f71ff,
+	0x000003ff, 0xb96f4803,
+	0x866f71ff, 0xfffff800,
+	0x8f6f8b6f, 0xb96fa2c3,
+	0xb973f801, 0xb8ee2a05,
+	0x806e816e, 0x8e6e8a6e,
+	0xb8ef1605, 0x806f816f,
+	0x8e6f866f, 0x806e6f6e,
+	0x806e746e, 0x826f8075,
+	0x866fff6f, 0x0000ffff,
+	0xc0071cb7, 0x00000040,
+	0xc00b1d37, 0x00000048,
+	0xc0031e77, 0x00000058,
+	0xc0071eb7, 0x0000005c,
+	0xbf8cc07f, 0x866fff6d,
+	0xf0000000, 0x8f6f9c6f,
+	0x8e6f906f, 0xbeee0080,
+	0x876e6f6e, 0x866fff6d,
+	0x08000000, 0x8f6f9b6f,
+	0x8e6f8f6f, 0x876e6f6e,
+	0x866fff70, 0x00800000,
+	0x8f6f976f, 0xb96ef807,
+	0x86fe7e7e, 0x86ea6a6a,
+	0xb970f802, 0xbf8a0000,
+	0x95806f6c, 0xbf810000,
+};
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index c368ce3..053f1d0 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -30,6 +30,7 @@
 #include "kfd_device_queue_manager.h"
 #include "kfd_pm4_headers_vi.h"
 #include "cwsr_trap_handler_gfx8.asm"
+#include "cwsr_trap_handler_gfx9.asm"
 #include "kfd_iommu.h"
 
 #define MQD_SIZE_ALIGNED 768
@@ -333,10 +334,16 @@ struct kfd_dev *kgd2kfd_probe(struct kgd_dev *kgd,
 static void kfd_cwsr_init(struct kfd_dev *kfd)
 {
 	if (cwsr_enable && kfd->device_info->supports_cwsr) {
-		BUILD_BUG_ON(sizeof(cwsr_trap_gfx8_hex) > PAGE_SIZE);
+		if (kfd->device_info->asic_family < CHIP_VEGA10) {
+			BUILD_BUG_ON(sizeof(cwsr_trap_gfx8_hex) > PAGE_SIZE);
+			kfd->cwsr_isa = cwsr_trap_gfx8_hex;
+			kfd->cwsr_isa_size = sizeof(cwsr_trap_gfx8_hex);
+		} else {
+			BUILD_BUG_ON(sizeof(cwsr_trap_gfx9_hex) > PAGE_SIZE);
+			kfd->cwsr_isa = cwsr_trap_gfx9_hex;
+			kfd->cwsr_isa_size = sizeof(cwsr_trap_gfx9_hex);
+		}
 
-		kfd->cwsr_isa = cwsr_trap_gfx8_hex;
-		kfd->cwsr_isa_size = sizeof(cwsr_trap_gfx8_hex);
 		kfd->cwsr_enabled = true;
 	}
 }
-- 
2.7.4


[-- Attachment #2: Type: text/plain, Size: 154 bytes --]

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 20/21] drm/amdkfd: Try to enable atomics for all GPUs
       [not found] ` <1523395998-31314-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (18 preceding siblings ...)
  2018-04-10 21:33   ` [PATCH 19/21] drm/amdkfd: Add GFXv9 CWSR trap handler Felix Kuehling
@ 2018-04-10 21:33   ` Felix Kuehling
  2018-04-10 21:33   ` [PATCH 21/21] drm/amdkfd: Add Vega10 topology and device info Felix Kuehling
                     ` (3 subsequent siblings)
  23 siblings, 0 replies; 38+ messages in thread
From: Felix Kuehling @ 2018-04-10 21:33 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: welu, Felix Kuehling

From: welu <Wei.Lu2@amd.com>

Report failure to enable atomics only on GPUs that require them.
This allows GPUs that don't require atomics to function, but can
benefit if they are available. This is the case for Vega10, which
doesn't use atomics for basic functioning of the MEC, AQL and HWS
microcode. So it can work without atomics. But shader programs can
still use atomic instructions on systems that support PCIe atomics.

Signed-off-by: welu <Wei.Lu2@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_device.c | 27 +++++++++++++--------------
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index 053f1d0..ea95f3b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -290,7 +290,7 @@ struct kfd_dev *kgd2kfd_probe(struct kgd_dev *kgd,
 	struct pci_dev *pdev, const struct kfd2kgd_calls *f2g)
 {
 	struct kfd_dev *kfd;
-
+	int ret;
 	const struct kfd_device_info *device_info =
 					lookup_device_info(pdev->device);
 
@@ -299,19 +299,18 @@ struct kfd_dev *kgd2kfd_probe(struct kgd_dev *kgd,
 		return NULL;
 	}
 
-	if (device_info->needs_pci_atomics) {
-		/* Allow BIF to recode atomics to PCIe 3.0
-		 * AtomicOps. 32 and 64-bit requests are possible and
-		 * must be supported.
-		 */
-		if (pci_enable_atomic_ops_to_root(pdev,
-				PCI_EXP_DEVCAP2_ATOMIC_COMP32 |
-				PCI_EXP_DEVCAP2_ATOMIC_COMP64) < 0) {
-			dev_info(kfd_device,
-				"skipped device %x:%x, PCI rejects atomics",
-				 pdev->vendor, pdev->device);
-			return NULL;
-		}
+	/* Allow BIF to recode atomics to PCIe 3.0 AtomicOps.
+	 * 32 and 64-bit requests are possible and must be
+	 * supported.
+	 */
+	ret = pci_enable_atomic_ops_to_root(pdev,
+			PCI_EXP_DEVCAP2_ATOMIC_COMP32 |
+			PCI_EXP_DEVCAP2_ATOMIC_COMP64);
+	if (device_info->needs_pci_atomics && ret  < 0) {
+		dev_info(kfd_device,
+			 "skipped device %x:%x, PCI rejects atomics\n",
+			 pdev->vendor, pdev->device);
+		return NULL;
 	}
 
 	kfd = kzalloc(sizeof(*kfd), GFP_KERNEL);
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 21/21] drm/amdkfd: Add Vega10 topology and device info
       [not found] ` <1523395998-31314-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (19 preceding siblings ...)
  2018-04-10 21:33   ` [PATCH 20/21] drm/amdkfd: Try to enable atomics for all GPUs Felix Kuehling
@ 2018-04-10 21:33   ` Felix Kuehling
  2018-04-10 21:58   ` [PATCH 00/21] GFXv9/Vega10 support for KFD Oded Gabbay
                     ` (2 subsequent siblings)
  23 siblings, 0 replies; 38+ messages in thread
From: Felix Kuehling @ 2018-04-10 21:33 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling

* Report 64-bit doorbells as HSA_CAP_DOORBELL_TYPE_2_0 in topology
* Report cache information in topology (duplicates GFXv8 info for now)
* Add device info for Vega10 support in KFD

Raven is not enabled at this time as it needs additional changes in
DQM to work with a single SDMA engine.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_crat.c     | 11 +++++++++
 drivers/gpu/drm/amd/amdkfd/kfd_device.c   | 37 +++++++++++++++++++++++++++++++
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c |  6 +++++
 drivers/gpu/drm/amd/amdkfd/kfd_topology.h |  1 +
 4 files changed, 55 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
index 4f126ef..296b3f2 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
@@ -132,6 +132,9 @@ static struct kfd_gpu_cache_info carrizo_cache_info[] = {
 #define fiji_cache_info  carrizo_cache_info
 #define polaris10_cache_info carrizo_cache_info
 #define polaris11_cache_info carrizo_cache_info
+/* TODO - check & update Vega10 cache details */
+#define vega10_cache_info carrizo_cache_info
+#define raven_cache_info carrizo_cache_info
 
 static void kfd_populated_cu_info_cpu(struct kfd_topology_device *dev,
 		struct crat_subtype_computeunit *cu)
@@ -603,6 +606,14 @@ static int kfd_fill_gpu_cache_info(struct kfd_dev *kdev,
 		pcache_info = polaris11_cache_info;
 		num_of_cache_types = ARRAY_SIZE(polaris11_cache_info);
 		break;
+	case CHIP_VEGA10:
+		pcache_info = vega10_cache_info;
+		num_of_cache_types = ARRAY_SIZE(vega10_cache_info);
+		break;
+	case CHIP_RAVEN:
+		pcache_info = raven_cache_info;
+		num_of_cache_types = ARRAY_SIZE(raven_cache_info);
+		break;
 	default:
 		return -EINVAL;
 	}
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index ea95f3b..fb4a72d 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -182,6 +182,34 @@ static const struct kfd_device_info polaris11_device_info = {
 	.needs_pci_atomics = true,
 };
 
+static const struct kfd_device_info vega10_device_info = {
+	.asic_family = CHIP_VEGA10,
+	.max_pasid_bits = 16,
+	.max_no_of_hqd  = 24,
+	.doorbell_size  = 8,
+	.ih_ring_entry_size = 8 * sizeof(uint32_t),
+	.event_interrupt_class = &event_interrupt_class_v9,
+	.num_of_watch_points = 4,
+	.mqd_size_aligned = MQD_SIZE_ALIGNED,
+	.supports_cwsr = true,
+	.needs_iommu_device = false,
+	.needs_pci_atomics = false,
+};
+
+static const struct kfd_device_info vega10_vf_device_info = {
+	.asic_family = CHIP_VEGA10,
+	.max_pasid_bits = 16,
+	.max_no_of_hqd  = 24,
+	.doorbell_size  = 8,
+	.ih_ring_entry_size = 8 * sizeof(uint32_t),
+	.event_interrupt_class = &event_interrupt_class_v9,
+	.num_of_watch_points = 4,
+	.mqd_size_aligned = MQD_SIZE_ALIGNED,
+	.supports_cwsr = true,
+	.needs_iommu_device = false,
+	.needs_pci_atomics = false,
+};
+
 
 struct kfd_deviceid {
 	unsigned short did;
@@ -261,6 +289,15 @@ static const struct kfd_deviceid supported_devices[] = {
 	{ 0x67EB, &polaris11_device_info },	/* Polaris11 */
 	{ 0x67EF, &polaris11_device_info },	/* Polaris11 */
 	{ 0x67FF, &polaris11_device_info },	/* Polaris11 */
+	{ 0x6860, &vega10_device_info },	/* Vega10 */
+	{ 0x6861, &vega10_device_info },	/* Vega10 */
+	{ 0x6862, &vega10_device_info },	/* Vega10 */
+	{ 0x6863, &vega10_device_info },	/* Vega10 */
+	{ 0x6864, &vega10_device_info },	/* Vega10 */
+	{ 0x6867, &vega10_device_info },	/* Vega10 */
+	{ 0x6868, &vega10_device_info },	/* Vega10 */
+	{ 0x686C, &vega10_vf_device_info },	/* Vega10  vf*/
+	{ 0x687F, &vega10_device_info },	/* Vega10 */
 };
 
 static int kfd_gtt_sa_init(struct kfd_dev *kfd, unsigned int buf_size,
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index ac28abc..bc95d4df 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -1239,6 +1239,12 @@ int kfd_topology_add_device(struct kfd_dev *gpu)
 			HSA_CAP_DOORBELL_TYPE_TOTALBITS_SHIFT) &
 			HSA_CAP_DOORBELL_TYPE_TOTALBITS_MASK);
 		break;
+	case CHIP_VEGA10:
+	case CHIP_RAVEN:
+		dev->node_props.capability |= ((HSA_CAP_DOORBELL_TYPE_2_0 <<
+			HSA_CAP_DOORBELL_TYPE_TOTALBITS_SHIFT) &
+			HSA_CAP_DOORBELL_TYPE_TOTALBITS_MASK);
+		break;
 	default:
 		WARN(1, "Unexpected ASIC family %u",
 		     dev->gpu->device_info->asic_family);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
index eb54cfc..7d9c3f9 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
@@ -45,6 +45,7 @@
 
 #define HSA_CAP_DOORBELL_TYPE_PRE_1_0		0x0
 #define HSA_CAP_DOORBELL_TYPE_1_0		0x1
+#define HSA_CAP_DOORBELL_TYPE_2_0		0x2
 #define HSA_CAP_AQL_QUEUE_DOUBLE_MAP		0x00004000
 
 struct kfd_node_properties {
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [PATCH 00/21] GFXv9/Vega10 support for KFD
       [not found] ` <1523395998-31314-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (20 preceding siblings ...)
  2018-04-10 21:33   ` [PATCH 21/21] drm/amdkfd: Add Vega10 topology and device info Felix Kuehling
@ 2018-04-10 21:58   ` Oded Gabbay
  2018-05-11 20:08   ` Oded Gabbay
  2018-05-14 14:27   ` Tom Stellard
  23 siblings, 0 replies; 38+ messages in thread
From: Oded Gabbay @ 2018-04-10 21:58 UTC (permalink / raw)
  To: Felix Kuehling; +Cc: amd-gfx list


[-- Attachment #1.1: Type: text/plain, Size: 4907 bytes --]

Hi Felix,
Just to let you know that I am currently on vacation and will be back home
only on 4/21 so all patch reviews from my side will be done after that
date.

Thanks,
Oded

On Tue, 10 Apr 2018, 17:33 Felix Kuehling <Felix.Kuehling-5C7GfCeVMHo@public.gmane.org> wrote:

> This patch series adds support for GFXv9 GPUs to KFD. In this series it
> enables support for Vega10. Raven support requires some extra work that
> will follow shortly, but Raven support is already included and I didn't
> go out of my way to keep it out.
>
> Felix Kuehling (19):
>   drm/amdgpu: Remove unused interface from kfd2kgd interface
>   drm/amd: Update GFXv9 SDMA MQD structure
>   drm/amdgpu: Add GFXv9 TLB invalidation packet definition
>   drm/amdgpu: Add GFXv9 kfd2kgd interface functions
>   drm/amdgpu: Add doorbell routing info to kgd2kfd_shared_resources
>   drm/amdkfd: Make doorbell size ASIC-dependent
>   drm/amdkfd: Implement doorbell allocation for SOC15
>   drm/amdkfd: Move packet writer functions into ASIC-specific file
>   drm/amdkfd: Add GFXv9 PM4 packet writer functions
>   drm/amdkfd: Add GFXv9 MQD manager
>   drm/amdkfd: Add GFXv9 device queue manager
>   drm/amdkfd: Add SOC15 interrupt processing support
>   drm/amdkfd: Fix goto usage
>   drm/amdkfd: Fix kernel queue rollback_packet
>   drm/amdkfd: Add 64-bit doorbell and wptr support to kernel queue
>   drm/amdkfd: Remove limit on number of GPUs (follow-up)
>   drm/amdkfd: Support flat memory apertures for GFXv9
>   drm/amdkfd: Add GFXv9 CWSR trap handler
>   drm/amdkfd: Add Vega10 topology and device info
>
> Harish Kasiviswanathan (1):
>   drm/amdkfd: Clean up KFD_MMAP_ offset handling
>
> welu (1):
>   drm/amdkfd: Try to enable atomics for all GPUs
>
>  MAINTAINERS                                        |    2 +
>  drivers/gpu/drm/amd/amdgpu/Makefile                |    3 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c         |   26 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h         |    1 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c  |   10 -
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c  |   10 -
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c  | 1043 ++++++++++++++
>  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c              |    1 +
>  drivers/gpu/drm/amd/amdgpu/soc15d.h                |    5 +
>  drivers/gpu/drm/amd/amdkfd/Makefile                |   10 +-
>  .../gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm  | 1495
> ++++++++++++++++++++
>  drivers/gpu/drm/amd/amdkfd/kfd_chardev.c           |   42 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_crat.c              |   11 +
>  drivers/gpu/drm/amd/amdkfd/kfd_device.c            |   89 +-
>  .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c  |  102 +-
>  .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.h  |    2 +
>  .../drm/amd/amdkfd/kfd_device_queue_manager_v9.c   |   84 ++
>  drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c          |   65 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_events.c            |    2 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c       |  119 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c    |   84 ++
>  drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c      |   39 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.h      |    7 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_cik.c  |    9 +
>  drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c   |  340 +++++
>  drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_vi.c   |  319 +++++
>  drivers/gpu/drm/amd/amdkfd/kfd_module.c            |    5 +
>  drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c       |    3 +
>  drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c    |  443 ++++++
>  drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c    |  385 +----
>  drivers/gpu/drm/amd/amdkfd/kfd_pm4_headers_ai.h    |  583 ++++++++
>  drivers/gpu/drm/amd/amdkfd/kfd_priv.h              |  106 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_process.c           |   40 +-
>  .../gpu/drm/amd/amdkfd/kfd_process_queue_manager.c |   12 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_topology.c          |    6 +
>  drivers/gpu/drm/amd/amdkfd/kfd_topology.h          |    1 +
>  drivers/gpu/drm/amd/amdkfd/soc15_int.h             |   47 +
>  drivers/gpu/drm/amd/include/kgd_kfd_interface.h    |   20 +-
>  drivers/gpu/drm/amd/include/v9_structs.h           |   48 +-
>  39 files changed, 5118 insertions(+), 501 deletions(-)
>  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
>  create mode 100644 drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm
>  create mode 100644
> drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager_v9.c
>  create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
>  create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c
>  create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
>  create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_pm4_headers_ai.h
>  create mode 100644 drivers/gpu/drm/amd/amdkfd/soc15_int.h
>
> --
> 2.7.4
>
>

[-- Attachment #1.2: Type: text/html, Size: 5828 bytes --]

[-- Attachment #2: Type: text/plain, Size: 154 bytes --]

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 16/21] drm/amdkfd: Add 64-bit doorbell and wptr support to kernel queue
       [not found]     ` <1523395998-31314-17-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
@ 2018-04-24 21:42       ` Felix Kuehling
       [not found]         ` <ba222e85-d524-d611-0efe-11850ba48ff3-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 38+ messages in thread
From: Felix Kuehling @ 2018-04-24 21:42 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w

[-- Attachment #1: Type: text/plain, Size: 9755 bytes --]

A minor update to this patch is attached. The rest of the series is
unchanged and rebased cleanly on 4.17-rc2 on my system.

Regards,
  Felix


On 2018-04-10 05:33 PM, Felix Kuehling wrote:
> Signed-off-by: Felix Kuehling <Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c         | 10 +++++++++
>  drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c     | 25 +++++++++++++++++------
>  drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.h     |  7 ++++++-
>  drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_cik.c |  9 ++++++++
>  drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c  |  9 ++++++++
>  drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_vi.c  |  9 ++++++++
>  drivers/gpu/drm/amd/amdkfd/kfd_priv.h             |  1 +
>  7 files changed, 63 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c b/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
> index 36c9269e..5d7cccc 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
> @@ -214,6 +214,16 @@ void write_kernel_doorbell(void __iomem *db, u32 value)
>  	}
>  }
>  
> +void write_kernel_doorbell64(void __iomem *db, u64 value)
> +{
> +	if (db) {
> +		WARN(((unsigned long)db & 7) != 0,
> +		     "Unaligned 64-bit doorbell");
> +		writeq(value, (u64 __iomem *)db);
> +		pr_debug("writing %llu to doorbell address 0x%p\n", value, db);+	}
> +}
> +
>  unsigned int kfd_doorbell_id_to_offset(struct kfd_dev *kfd,
>  					struct kfd_process *process,
>  					unsigned int doorbell_id)
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
> index 9f38161..476951d 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
> @@ -99,7 +99,7 @@ static bool initialize(struct kernel_queue *kq, struct kfd_dev *dev,
>  	kq->rptr_kernel = kq->rptr_mem->cpu_ptr;
>  	kq->rptr_gpu_addr = kq->rptr_mem->gpu_addr;
>  
> -	retval = kfd_gtt_sa_allocate(dev, sizeof(*kq->wptr_kernel),
> +	retval = kfd_gtt_sa_allocate(dev, dev->device_info->doorbell_size,
>  					&kq->wptr_mem);
>  
>  	if (retval != 0)
> @@ -208,6 +208,7 @@ static int acquire_packet_buffer(struct kernel_queue *kq,
>  	size_t available_size;
>  	size_t queue_size_dwords;
>  	uint32_t wptr, rptr;
> +	uint64_t wptr64;
>  	unsigned int *queue_address;
>  
>  	/* When rptr == wptr, the buffer is empty.
> @@ -216,7 +217,8 @@ static int acquire_packet_buffer(struct kernel_queue *kq,
>  	 * the opposite. So we can only use up to queue_size_dwords - 1 dwords.
>  	 */
>  	rptr = *kq->rptr_kernel;
> -	wptr = *kq->wptr_kernel;
> +	wptr = kq->pending_wptr;
> +	wptr64 = kq->pending_wptr64;
>  	queue_address = (unsigned int *)kq->pq_kernel_addr;
>  	queue_size_dwords = kq->queue->properties.queue_size / 4;
>  
> @@ -246,11 +248,13 @@ static int acquire_packet_buffer(struct kernel_queue *kq,
>  		while (wptr > 0) {
>  			queue_address[wptr] = kq->nop_packet;
>  			wptr = (wptr + 1) % queue_size_dwords;
> +			wptr64++;
>  		}
>  	}
>  
>  	*buffer_ptr = &queue_address[wptr];
>  	kq->pending_wptr = wptr + packet_size_in_dwords;
> +	kq->pending_wptr64 = wptr64 + packet_size_in_dwords;
>  
>  	return 0;
>  
> @@ -272,14 +276,18 @@ static void submit_packet(struct kernel_queue *kq)
>  	pr_debug("\n");
>  #endif
>  
> -	*kq->wptr_kernel = kq->pending_wptr;
> -	write_kernel_doorbell(kq->queue->properties.doorbell_ptr,
> -				kq->pending_wptr);
> +	kq->ops_asic_specific.submit_packet(kq);
>  }
>  
>  static void rollback_packet(struct kernel_queue *kq)
>  {
> -	kq->pending_wptr = *kq->wptr_kernel;
> +	if (kq->dev->device_info->doorbell_size == 8) {
> +		kq->pending_wptr64 = *kq->wptr64_kernel;
> +		kq->pending_wptr = *kq->wptr_kernel %
> +			(kq->queue->properties.queue_size / 4);
> +	} else {
> +		kq->pending_wptr = *kq->wptr_kernel;
> +	}
>  }
>  
>  struct kernel_queue *kernel_queue_init(struct kfd_dev *dev,
> @@ -310,6 +318,11 @@ struct kernel_queue *kernel_queue_init(struct kfd_dev *dev,
>  	case CHIP_HAWAII:
>  		kernel_queue_init_cik(&kq->ops_asic_specific);
>  		break;
> +
> +	case CHIP_VEGA10:
> +	case CHIP_RAVEN:
> +		kernel_queue_init_v9(&kq->ops_asic_specific);
> +		break;
>  	default:
>  		WARN(1, "Unexpected ASIC family %u",
>  		     dev->device_info->asic_family);
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.h b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.h
> index 5940531..97aff20 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.h
> @@ -72,6 +72,7 @@ struct kernel_queue {
>  	struct kfd_dev		*dev;
>  	struct mqd_manager	*mqd;
>  	struct queue		*queue;
> +	uint64_t		pending_wptr64;
>  	uint32_t		pending_wptr;
>  	unsigned int		nop_packet;
>  
> @@ -79,7 +80,10 @@ struct kernel_queue {
>  	uint32_t		*rptr_kernel;
>  	uint64_t		rptr_gpu_addr;
>  	struct kfd_mem_obj	*wptr_mem;
> -	uint32_t		*wptr_kernel;
> +	union {
> +		uint64_t	*wptr64_kernel;
> +		uint32_t	*wptr_kernel;
> +	};
>  	uint64_t		wptr_gpu_addr;
>  	struct kfd_mem_obj	*pq;
>  	uint64_t		pq_gpu_addr;
> @@ -97,5 +101,6 @@ struct kernel_queue {
>  
>  void kernel_queue_init_cik(struct kernel_queue_ops *ops);
>  void kernel_queue_init_vi(struct kernel_queue_ops *ops);
> +void kernel_queue_init_v9(struct kernel_queue_ops *ops);
>  
>  #endif /* KFD_KERNEL_QUEUE_H_ */
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_cik.c b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_cik.c
> index a90eb44..19e54ac 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_cik.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_cik.c
> @@ -26,11 +26,13 @@
>  static bool initialize_cik(struct kernel_queue *kq, struct kfd_dev *dev,
>  			enum kfd_queue_type type, unsigned int queue_size);
>  static void uninitialize_cik(struct kernel_queue *kq);
> +static void submit_packet_cik(struct kernel_queue *kq);
>  
>  void kernel_queue_init_cik(struct kernel_queue_ops *ops)
>  {
>  	ops->initialize = initialize_cik;
>  	ops->uninitialize = uninitialize_cik;
> +	ops->submit_packet = submit_packet_cik;
>  }
>  
>  static bool initialize_cik(struct kernel_queue *kq, struct kfd_dev *dev,
> @@ -42,3 +44,10 @@ static bool initialize_cik(struct kernel_queue *kq, struct kfd_dev *dev,
>  static void uninitialize_cik(struct kernel_queue *kq)
>  {
>  }
> +
> +static void submit_packet_cik(struct kernel_queue *kq)
> +{
> +	*kq->wptr_kernel = kq->pending_wptr;
> +	write_kernel_doorbell(kq->queue->properties.doorbell_ptr,
> +				kq->pending_wptr);
> +}
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c
> index ece7d59..684a3bf 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c
> @@ -29,11 +29,13 @@
>  static bool initialize_v9(struct kernel_queue *kq, struct kfd_dev *dev,
>  			enum kfd_queue_type type, unsigned int queue_size);
>  static void uninitialize_v9(struct kernel_queue *kq);
> +static void submit_packet_v9(struct kernel_queue *kq);
>  
>  void kernel_queue_init_v9(struct kernel_queue_ops *ops)
>  {
>  	ops->initialize = initialize_v9;
>  	ops->uninitialize = uninitialize_v9;
> +	ops->submit_packet = submit_packet_v9;
>  }
>  
>  static bool initialize_v9(struct kernel_queue *kq, struct kfd_dev *dev,
> @@ -58,6 +60,13 @@ static void uninitialize_v9(struct kernel_queue *kq)
>  	kfd_gtt_sa_free(kq->dev, kq->eop_mem);
>  }
>  
> +static void submit_packet_v9(struct kernel_queue *kq)
> +{
> +	*kq->wptr64_kernel = kq->pending_wptr64;
> +	write_kernel_doorbell64(kq->queue->properties.doorbell_ptr,
> +				kq->pending_wptr64);
> +}
> +
>  static int pm_map_process_v9(struct packet_manager *pm,
>  		uint32_t *buffer, struct qcm_process_device *qpd)
>  {
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_vi.c b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_vi.c
> index f9019ef..bf20c6d 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_vi.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_vi.c
> @@ -29,11 +29,13 @@
>  static bool initialize_vi(struct kernel_queue *kq, struct kfd_dev *dev,
>  			enum kfd_queue_type type, unsigned int queue_size);
>  static void uninitialize_vi(struct kernel_queue *kq);
> +static void submit_packet_vi(struct kernel_queue *kq);
>  
>  void kernel_queue_init_vi(struct kernel_queue_ops *ops)
>  {
>  	ops->initialize = initialize_vi;
>  	ops->uninitialize = uninitialize_vi;
> +	ops->submit_packet = submit_packet_vi;
>  }
>  
>  static bool initialize_vi(struct kernel_queue *kq, struct kfd_dev *dev,
> @@ -58,6 +60,13 @@ static void uninitialize_vi(struct kernel_queue *kq)
>  	kfd_gtt_sa_free(kq->dev, kq->eop_mem);
>  }
>  
> +static void submit_packet_vi(struct kernel_queue *kq)
> +{
> +	*kq->wptr_kernel = kq->pending_wptr;
> +	write_kernel_doorbell(kq->queue->properties.doorbell_ptr,
> +				kq->pending_wptr);
> +}
> +
>  unsigned int pm_build_pm4_header(unsigned int opcode, size_t packet_size)
>  {
>  	union PM4_MES_TYPE_3_HEADER header;
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> index 06b210b..10d5b54 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> @@ -769,6 +769,7 @@ void __iomem *kfd_get_kernel_doorbell(struct kfd_dev *kfd,
>  void kfd_release_kernel_doorbell(struct kfd_dev *kfd, u32 __iomem *db_addr);
>  u32 read_kernel_doorbell(u32 __iomem *db);
>  void write_kernel_doorbell(void __iomem *db, u32 value);
> +void write_kernel_doorbell64(void __iomem *db, u64 value);
>  unsigned int kfd_doorbell_id_to_offset(struct kfd_dev *kfd,
>  					struct kfd_process *process,
>  					unsigned int doorbell_id);


[-- Attachment #2: 0001-drm-amdkfd-Add-64-bit-doorbell-and-wptr-support-to-k.patch --]
[-- Type: text/x-patch, Size: 9381 bytes --]

>From 6ef689698ee1599a5c72e2fbfa3c1b6b5e532cd9 Mon Sep 17 00:00:00 2001
From: Felix Kuehling <Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
Date: Sun, 8 Apr 2018 22:03:51 -0400
Subject: [PATCH 1/1] drm/amdkfd: Add 64-bit doorbell and wptr support to
 kernel queue

v2: Removed redundant 0x before %p.

Signed-off-by: Felix Kuehling <Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
---
 drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c         | 10 +++++++++
 drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c     | 25 +++++++++++++++++------
 drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.h     |  7 ++++++-
 drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_cik.c |  9 ++++++++
 drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c  |  9 ++++++++
 drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_vi.c  |  9 ++++++++
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h             |  1 +
 7 files changed, 63 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c b/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
index 36c9269e..c3744d8 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
@@ -214,6 +214,16 @@ void write_kernel_doorbell(void __iomem *db, u32 value)
 	}
 }
 
+void write_kernel_doorbell64(void __iomem *db, u64 value)
+{
+	if (db) {
+		WARN(((unsigned long)db & 7) != 0,
+		     "Unaligned 64-bit doorbell");
+		writeq(value, (u64 __iomem *)db);
+		pr_debug("writing %llu to doorbell address %p\n", value, db);
+	}
+}
+
 unsigned int kfd_doorbell_id_to_offset(struct kfd_dev *kfd,
 					struct kfd_process *process,
 					unsigned int doorbell_id)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
index 9f38161..476951d 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
@@ -99,7 +99,7 @@ static bool initialize(struct kernel_queue *kq, struct kfd_dev *dev,
 	kq->rptr_kernel = kq->rptr_mem->cpu_ptr;
 	kq->rptr_gpu_addr = kq->rptr_mem->gpu_addr;
 
-	retval = kfd_gtt_sa_allocate(dev, sizeof(*kq->wptr_kernel),
+	retval = kfd_gtt_sa_allocate(dev, dev->device_info->doorbell_size,
 					&kq->wptr_mem);
 
 	if (retval != 0)
@@ -208,6 +208,7 @@ static int acquire_packet_buffer(struct kernel_queue *kq,
 	size_t available_size;
 	size_t queue_size_dwords;
 	uint32_t wptr, rptr;
+	uint64_t wptr64;
 	unsigned int *queue_address;
 
 	/* When rptr == wptr, the buffer is empty.
@@ -216,7 +217,8 @@ static int acquire_packet_buffer(struct kernel_queue *kq,
 	 * the opposite. So we can only use up to queue_size_dwords - 1 dwords.
 	 */
 	rptr = *kq->rptr_kernel;
-	wptr = *kq->wptr_kernel;
+	wptr = kq->pending_wptr;
+	wptr64 = kq->pending_wptr64;
 	queue_address = (unsigned int *)kq->pq_kernel_addr;
 	queue_size_dwords = kq->queue->properties.queue_size / 4;
 
@@ -246,11 +248,13 @@ static int acquire_packet_buffer(struct kernel_queue *kq,
 		while (wptr > 0) {
 			queue_address[wptr] = kq->nop_packet;
 			wptr = (wptr + 1) % queue_size_dwords;
+			wptr64++;
 		}
 	}
 
 	*buffer_ptr = &queue_address[wptr];
 	kq->pending_wptr = wptr + packet_size_in_dwords;
+	kq->pending_wptr64 = wptr64 + packet_size_in_dwords;
 
 	return 0;
 
@@ -272,14 +276,18 @@ static void submit_packet(struct kernel_queue *kq)
 	pr_debug("\n");
 #endif
 
-	*kq->wptr_kernel = kq->pending_wptr;
-	write_kernel_doorbell(kq->queue->properties.doorbell_ptr,
-				kq->pending_wptr);
+	kq->ops_asic_specific.submit_packet(kq);
 }
 
 static void rollback_packet(struct kernel_queue *kq)
 {
-	kq->pending_wptr = *kq->wptr_kernel;
+	if (kq->dev->device_info->doorbell_size == 8) {
+		kq->pending_wptr64 = *kq->wptr64_kernel;
+		kq->pending_wptr = *kq->wptr_kernel %
+			(kq->queue->properties.queue_size / 4);
+	} else {
+		kq->pending_wptr = *kq->wptr_kernel;
+	}
 }
 
 struct kernel_queue *kernel_queue_init(struct kfd_dev *dev,
@@ -310,6 +318,11 @@ struct kernel_queue *kernel_queue_init(struct kfd_dev *dev,
 	case CHIP_HAWAII:
 		kernel_queue_init_cik(&kq->ops_asic_specific);
 		break;
+
+	case CHIP_VEGA10:
+	case CHIP_RAVEN:
+		kernel_queue_init_v9(&kq->ops_asic_specific);
+		break;
 	default:
 		WARN(1, "Unexpected ASIC family %u",
 		     dev->device_info->asic_family);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.h b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.h
index 5940531..97aff20 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.h
@@ -72,6 +72,7 @@ struct kernel_queue {
 	struct kfd_dev		*dev;
 	struct mqd_manager	*mqd;
 	struct queue		*queue;
+	uint64_t		pending_wptr64;
 	uint32_t		pending_wptr;
 	unsigned int		nop_packet;
 
@@ -79,7 +80,10 @@ struct kernel_queue {
 	uint32_t		*rptr_kernel;
 	uint64_t		rptr_gpu_addr;
 	struct kfd_mem_obj	*wptr_mem;
-	uint32_t		*wptr_kernel;
+	union {
+		uint64_t	*wptr64_kernel;
+		uint32_t	*wptr_kernel;
+	};
 	uint64_t		wptr_gpu_addr;
 	struct kfd_mem_obj	*pq;
 	uint64_t		pq_gpu_addr;
@@ -97,5 +101,6 @@ struct kernel_queue {
 
 void kernel_queue_init_cik(struct kernel_queue_ops *ops);
 void kernel_queue_init_vi(struct kernel_queue_ops *ops);
+void kernel_queue_init_v9(struct kernel_queue_ops *ops);
 
 #endif /* KFD_KERNEL_QUEUE_H_ */
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_cik.c b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_cik.c
index a90eb44..19e54ac 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_cik.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_cik.c
@@ -26,11 +26,13 @@
 static bool initialize_cik(struct kernel_queue *kq, struct kfd_dev *dev,
 			enum kfd_queue_type type, unsigned int queue_size);
 static void uninitialize_cik(struct kernel_queue *kq);
+static void submit_packet_cik(struct kernel_queue *kq);
 
 void kernel_queue_init_cik(struct kernel_queue_ops *ops)
 {
 	ops->initialize = initialize_cik;
 	ops->uninitialize = uninitialize_cik;
+	ops->submit_packet = submit_packet_cik;
 }
 
 static bool initialize_cik(struct kernel_queue *kq, struct kfd_dev *dev,
@@ -42,3 +44,10 @@ static bool initialize_cik(struct kernel_queue *kq, struct kfd_dev *dev,
 static void uninitialize_cik(struct kernel_queue *kq)
 {
 }
+
+static void submit_packet_cik(struct kernel_queue *kq)
+{
+	*kq->wptr_kernel = kq->pending_wptr;
+	write_kernel_doorbell(kq->queue->properties.doorbell_ptr,
+				kq->pending_wptr);
+}
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c
index ece7d59..684a3bf 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c
@@ -29,11 +29,13 @@
 static bool initialize_v9(struct kernel_queue *kq, struct kfd_dev *dev,
 			enum kfd_queue_type type, unsigned int queue_size);
 static void uninitialize_v9(struct kernel_queue *kq);
+static void submit_packet_v9(struct kernel_queue *kq);
 
 void kernel_queue_init_v9(struct kernel_queue_ops *ops)
 {
 	ops->initialize = initialize_v9;
 	ops->uninitialize = uninitialize_v9;
+	ops->submit_packet = submit_packet_v9;
 }
 
 static bool initialize_v9(struct kernel_queue *kq, struct kfd_dev *dev,
@@ -58,6 +60,13 @@ static void uninitialize_v9(struct kernel_queue *kq)
 	kfd_gtt_sa_free(kq->dev, kq->eop_mem);
 }
 
+static void submit_packet_v9(struct kernel_queue *kq)
+{
+	*kq->wptr64_kernel = kq->pending_wptr64;
+	write_kernel_doorbell64(kq->queue->properties.doorbell_ptr,
+				kq->pending_wptr64);
+}
+
 static int pm_map_process_v9(struct packet_manager *pm,
 		uint32_t *buffer, struct qcm_process_device *qpd)
 {
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_vi.c b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_vi.c
index f9019ef..bf20c6d 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_vi.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_vi.c
@@ -29,11 +29,13 @@
 static bool initialize_vi(struct kernel_queue *kq, struct kfd_dev *dev,
 			enum kfd_queue_type type, unsigned int queue_size);
 static void uninitialize_vi(struct kernel_queue *kq);
+static void submit_packet_vi(struct kernel_queue *kq);
 
 void kernel_queue_init_vi(struct kernel_queue_ops *ops)
 {
 	ops->initialize = initialize_vi;
 	ops->uninitialize = uninitialize_vi;
+	ops->submit_packet = submit_packet_vi;
 }
 
 static bool initialize_vi(struct kernel_queue *kq, struct kfd_dev *dev,
@@ -58,6 +60,13 @@ static void uninitialize_vi(struct kernel_queue *kq)
 	kfd_gtt_sa_free(kq->dev, kq->eop_mem);
 }
 
+static void submit_packet_vi(struct kernel_queue *kq)
+{
+	*kq->wptr_kernel = kq->pending_wptr;
+	write_kernel_doorbell(kq->queue->properties.doorbell_ptr,
+				kq->pending_wptr);
+}
+
 unsigned int pm_build_pm4_header(unsigned int opcode, size_t packet_size)
 {
 	union PM4_MES_TYPE_3_HEADER header;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 06b210b..10d5b54 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -769,6 +769,7 @@ void __iomem *kfd_get_kernel_doorbell(struct kfd_dev *kfd,
 void kfd_release_kernel_doorbell(struct kfd_dev *kfd, u32 __iomem *db_addr);
 u32 read_kernel_doorbell(u32 __iomem *db);
 void write_kernel_doorbell(void __iomem *db, u32 value);
+void write_kernel_doorbell64(void __iomem *db, u64 value);
 unsigned int kfd_doorbell_id_to_offset(struct kfd_dev *kfd,
 					struct kfd_process *process,
 					unsigned int doorbell_id);
-- 
2.7.4


[-- Attachment #3: Type: text/plain, Size: 154 bytes --]

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [PATCH 07/21] drm/amdkfd: Clean up KFD_MMAP_ offset handling
       [not found]     ` <1523395998-31314-8-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
@ 2018-05-11  8:52       ` Oded Gabbay
       [not found]         ` <CAFCwf12Z9zSADyx9k1ps4o8-W72N_nZ-mSznZLo-vbMF=8veLA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 38+ messages in thread
From: Oded Gabbay @ 2018-05-11  8:52 UTC (permalink / raw)
  To: Felix Kuehling; +Cc: Harish Kasiviswanathan, amd-gfx list

On Wed, Apr 11, 2018 at 12:33 AM, Felix Kuehling <Felix.Kuehling@amd.com> wrote:
> From: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
>
> Use bit-rotate for better clarity and remove _MASK from the #defines as
> these represent mmap types.
>
> Centralize all the parsing of the mmap offset in kfd_mmap and add device
> parameter to doorbell and reserved_mem map functions.
>
> Encode gpu_id into upper bits of vm_pgoff. This frees up the lower bits
> for encoding the the doorbell ID on Vega10.
>
> Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  | 35 ++++++++++++++++++----------
>  drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c |  9 ++------
>  drivers/gpu/drm/amd/amdkfd/kfd_events.c   |  2 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_priv.h     | 38 ++++++++++++++++++++++++-------
>  drivers/gpu/drm/amd/amdkfd/kfd_process.c  |  8 +++----
>  5 files changed, 59 insertions(+), 33 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> index b5e5f0e..f6b35f4 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> @@ -292,7 +292,8 @@ static int kfd_ioctl_create_queue(struct file *filep, struct kfd_process *p,
>
>
>         /* Return gpu_id as doorbell offset for mmap usage */
> -       args->doorbell_offset = (KFD_MMAP_DOORBELL_MASK | args->gpu_id);
> +       args->doorbell_offset = KFD_MMAP_TYPE_DOORBELL;
> +       args->doorbell_offset |= KFD_MMAP_GPU_ID(args->gpu_id);
>         args->doorbell_offset <<= PAGE_SHIFT;
>
>         mutex_unlock(&p->mutex);
> @@ -1645,23 +1646,33 @@ static long kfd_ioctl(struct file *filep, unsigned int cmd, unsigned long arg)
>  static int kfd_mmap(struct file *filp, struct vm_area_struct *vma)
>  {
>         struct kfd_process *process;
> +       struct kfd_dev *dev = NULL;
> +       unsigned long vm_pgoff;
> +       unsigned int gpu_id;
>
>         process = kfd_get_process(current);
>         if (IS_ERR(process))
>                 return PTR_ERR(process);
>
> -       if ((vma->vm_pgoff & KFD_MMAP_DOORBELL_MASK) ==
> -                       KFD_MMAP_DOORBELL_MASK) {
> -               vma->vm_pgoff = vma->vm_pgoff ^ KFD_MMAP_DOORBELL_MASK;
> -               return kfd_doorbell_mmap(process, vma);
> -       } else if ((vma->vm_pgoff & KFD_MMAP_EVENTS_MASK) ==
> -                       KFD_MMAP_EVENTS_MASK) {
> -               vma->vm_pgoff = vma->vm_pgoff ^ KFD_MMAP_EVENTS_MASK;
> +       vm_pgoff = vma->vm_pgoff;
> +       vma->vm_pgoff = KFD_MMAP_OFFSET_VALUE_GET(vm_pgoff);
> +       gpu_id = KFD_MMAP_GPU_ID_GET(vm_pgoff);
> +       if (gpu_id)
> +               dev = kfd_device_by_id(gpu_id);
> +
> +       switch (vm_pgoff & KFD_MMAP_TYPE_MASK) {
> +       case KFD_MMAP_TYPE_DOORBELL:
> +               if (!dev)
> +                       return -ENODEV;
> +               return kfd_doorbell_mmap(dev, process, vma);
> +
> +       case KFD_MMAP_TYPE_EVENTS:
>                 return kfd_event_mmap(process, vma);
> -       } else if ((vma->vm_pgoff & KFD_MMAP_RESERVED_MEM_MASK) ==
> -                       KFD_MMAP_RESERVED_MEM_MASK) {
> -               vma->vm_pgoff = vma->vm_pgoff ^ KFD_MMAP_RESERVED_MEM_MASK;
> -               return kfd_reserved_mem_mmap(process, vma);
> +
> +       case KFD_MMAP_TYPE_RESERVED_MEM:
> +               if (!dev)
> +                       return -ENODEV;
> +               return kfd_reserved_mem_mmap(dev, process, vma);
>         }
>
>         return -EFAULT;
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c b/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
> index 4840314..efc59de 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
> @@ -126,15 +126,10 @@ void kfd_doorbell_fini(struct kfd_dev *kfd)
>                 iounmap(kfd->doorbell_kernel_ptr);
>  }
>
> -int kfd_doorbell_mmap(struct kfd_process *process, struct vm_area_struct *vma)
> +int kfd_doorbell_mmap(struct kfd_dev *dev, struct kfd_process *process,
> +                     struct vm_area_struct *vma)
>  {
>         phys_addr_t address;
> -       struct kfd_dev *dev;
> -
> -       /* Find kfd device according to gpu id */
> -       dev = kfd_device_by_id(vma->vm_pgoff);
> -       if (!dev)
> -               return -EINVAL;
>
>         /*
>          * For simplicitly we only allow mapping of the entire doorbell
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
> index 4890a90..bccf2f7 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
> @@ -345,7 +345,7 @@ int kfd_event_create(struct file *devkfd, struct kfd_process *p,
>         case KFD_EVENT_TYPE_DEBUG:
>                 ret = create_signal_event(devkfd, p, ev);
>                 if (!ret) {
> -                       *event_page_offset = KFD_MMAP_EVENTS_MASK;
> +                       *event_page_offset = KFD_MMAP_TYPE_EVENTS;
>                         *event_page_offset <<= PAGE_SHIFT;
>                         *event_slot_index = ev->event_id;
>                 }
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> index d9c0fe12..2d575c0 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> @@ -41,9 +41,33 @@
>
>  #define KFD_SYSFS_FILE_MODE 0444
>
> -#define KFD_MMAP_DOORBELL_MASK 0x8000000000000ull
> -#define KFD_MMAP_EVENTS_MASK 0x4000000000000ull
> -#define KFD_MMAP_RESERVED_MEM_MASK 0x2000000000000ull
> +/* GPU ID hash width in bits */
> +#define KFD_GPU_ID_HASH_WIDTH 16
> +
> +/* Use upper bits of mmap offset to store KFD driver specific information.
> + * BITS[63:62] - Encode MMAP type
> + * BITS[61:46] - Encode gpu_id. To identify to which GPU the offset belongs to
> + * BITS[45:0]  - MMAP offset value
> + *
> + * NOTE: struct vm_area_struct.vm_pgoff uses offset in pages. Hence, these
> + *  defines are w.r.t to PAGE_SIZE
> + */
> +#define KFD_MMAP_TYPE_SHIFT    (62 - PAGE_SHIFT)
> +#define KFD_MMAP_TYPE_MASK     (0x3ULL << KFD_MMAP_TYPE_SHIFT)
> +#define KFD_MMAP_TYPE_DOORBELL (0x3ULL << KFD_MMAP_TYPE_SHIFT)
> +#define KFD_MMAP_TYPE_EVENTS   (0x2ULL << KFD_MMAP_TYPE_SHIFT)
> +#define KFD_MMAP_TYPE_RESERVED_MEM     (0x1ULL << KFD_MMAP_TYPE_SHIFT)

Isn't this new definition breaks existing user-space library (kfd thunk) ?
If that is the case we have a problem here.

Oded


> +
> +#define KFD_MMAP_GPU_ID_SHIFT (46 - PAGE_SHIFT)
> +#define KFD_MMAP_GPU_ID_MASK (((1ULL << KFD_GPU_ID_HASH_WIDTH) - 1) \
> +                               << KFD_MMAP_GPU_ID_SHIFT)
> +#define KFD_MMAP_GPU_ID(gpu_id) ((((uint64_t)gpu_id) << KFD_MMAP_GPU_ID_SHIFT)\
> +                               & KFD_MMAP_GPU_ID_MASK)
> +#define KFD_MMAP_GPU_ID_GET(offset)    ((offset & KFD_MMAP_GPU_ID_MASK) \
> +                               >> KFD_MMAP_GPU_ID_SHIFT)
> +
> +#define KFD_MMAP_OFFSET_VALUE_MASK     (0x3FFFFFFFFFFFULL >> PAGE_SHIFT)
> +#define KFD_MMAP_OFFSET_VALUE_GET(offset) (offset & KFD_MMAP_OFFSET_VALUE_MASK)
>
>  /*
>   * When working with cp scheduler we should assign the HIQ manually or via
> @@ -55,9 +79,6 @@
>  #define KFD_CIK_HIQ_PIPE 4
>  #define KFD_CIK_HIQ_QUEUE 0
>
> -/* GPU ID hash width in bits */
> -#define KFD_GPU_ID_HASH_WIDTH 16
> -
>  /* Macro for allocating structures */
>  #define kfd_alloc_struct(ptr_to_struct)        \
>         ((typeof(ptr_to_struct)) kzalloc(sizeof(*ptr_to_struct), GFP_KERNEL))
> @@ -698,7 +719,7 @@ struct kfd_process_device *kfd_get_process_device_data(struct kfd_dev *dev,
>  struct kfd_process_device *kfd_create_process_device_data(struct kfd_dev *dev,
>                                                         struct kfd_process *p);
>
> -int kfd_reserved_mem_mmap(struct kfd_process *process,
> +int kfd_reserved_mem_mmap(struct kfd_dev *dev, struct kfd_process *process,
>                           struct vm_area_struct *vma);
>
>  /* KFD process API for creating and translating handles */
> @@ -728,7 +749,8 @@ void kfd_pasid_free(unsigned int pasid);
>  /* Doorbells */
>  int kfd_doorbell_init(struct kfd_dev *kfd);
>  void kfd_doorbell_fini(struct kfd_dev *kfd);
> -int kfd_doorbell_mmap(struct kfd_process *process, struct vm_area_struct *vma);
> +int kfd_doorbell_mmap(struct kfd_dev *dev, struct kfd_process *process,
> +                     struct vm_area_struct *vma);
>  void __iomem *kfd_get_kernel_doorbell(struct kfd_dev *kfd,
>                                         unsigned int *doorbell_off);
>  void kfd_release_kernel_doorbell(struct kfd_dev *kfd, u32 __iomem *db_addr);
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> index 2791e72..131fe2a 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> @@ -451,7 +451,8 @@ static int kfd_process_init_cwsr_apu(struct kfd_process *p, struct file *filep)
>                 if (!dev->cwsr_enabled || qpd->cwsr_kaddr || qpd->cwsr_base)
>                         continue;
>
> -               offset = (dev->id | KFD_MMAP_RESERVED_MEM_MASK) << PAGE_SHIFT;
> +               offset = (KFD_MMAP_TYPE_RESERVED_MEM | KFD_MMAP_GPU_ID(dev->id))
> +                       << PAGE_SHIFT;
>                 qpd->tba_addr = (int64_t)vm_mmap(filep, 0,
>                         KFD_CWSR_TBA_TMA_SIZE, PROT_READ | PROT_EXEC,
>                         MAP_SHARED, offset);
> @@ -989,15 +990,12 @@ int kfd_resume_all_processes(void)
>         return ret;
>  }
>
> -int kfd_reserved_mem_mmap(struct kfd_process *process,
> +int kfd_reserved_mem_mmap(struct kfd_dev *dev, struct kfd_process *process,
>                           struct vm_area_struct *vma)
>  {
> -       struct kfd_dev *dev = kfd_device_by_id(vma->vm_pgoff);
>         struct kfd_process_device *pdd;
>         struct qcm_process_device *qpd;
>
> -       if (!dev)
> -               return -EINVAL;
>         if ((vma->vm_end - vma->vm_start) != KFD_CWSR_TBA_TMA_SIZE) {
>                 pr_err("Incorrect CWSR mapping size.\n");
>                 return -EINVAL;
> --
> 2.7.4
>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 11/21] drm/amdkfd: Add GFXv9 MQD manager
       [not found]     ` <1523395998-31314-12-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
@ 2018-05-11  9:10       ` Oded Gabbay
       [not found]         ` <CAFCwf12Lj17xZTk43bD3YEM9Gc=poNN_G7w+JxLbPU=EqH7Y5g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 38+ messages in thread
From: Oded Gabbay @ 2018-05-11  9:10 UTC (permalink / raw)
  To: Felix Kuehling; +Cc: Jay Cornwall, amd-gfx list, John Bridgman

On Wed, Apr 11, 2018 at 12:33 AM, Felix Kuehling <Felix.Kuehling@amd.com> wrote:
> Signed-off-by: John Bridgman <john.bridgman@amd.com>
> Signed-off-by: Jay Cornwall <Jay.Cornwall@amd.com>
> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
> ---
>  drivers/gpu/drm/amd/amdkfd/Makefile             |   1 +
>  drivers/gpu/drm/amd/amdkfd/kfd_device.c         |   2 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c    |   3 +
>  drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c | 443 ++++++++++++++++++++++++
>  drivers/gpu/drm/amd/amdkfd/kfd_priv.h           |   3 +
>  5 files changed, 451 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/Makefile b/drivers/gpu/drm/amd/amdkfd/Makefile
> index 52b3c1b..094b591 100644
> --- a/drivers/gpu/drm/amd/amdkfd/Makefile
> +++ b/drivers/gpu/drm/amd/amdkfd/Makefile
> @@ -30,6 +30,7 @@ amdkfd-y      := kfd_module.o kfd_device.o kfd_chardev.o kfd_topology.o \
>                 kfd_pasid.o kfd_doorbell.o kfd_flat_memory.o \
>                 kfd_process.o kfd_queue.o kfd_mqd_manager.o \
>                 kfd_mqd_manager_cik.o kfd_mqd_manager_vi.o \
> +               kfd_mqd_manager_v9.o \
>                 kfd_kernel_queue.o kfd_kernel_queue_cik.o \
>                 kfd_kernel_queue_vi.o kfd_kernel_queue_v9.o \
>                 kfd_packet_manager.o kfd_process_queue_manager.o \
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> index f563acb..c368ce3 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> @@ -700,7 +700,7 @@ int kfd_gtt_sa_allocate(struct kfd_dev *kfd, unsigned int size,
>         if (size > kfd->gtt_sa_num_of_chunks * kfd->gtt_sa_chunk_size)
>                 return -ENOMEM;
>
> -       *mem_obj = kmalloc(sizeof(struct kfd_mem_obj), GFP_NOIO);
> +       *mem_obj = kzalloc(sizeof(struct kfd_mem_obj), GFP_NOIO);
This assumes the patch in the userptr patch-set is applied. I changed
it to GFP_KERNEL for now.

>         if ((*mem_obj) == NULL)
>                 return -ENOMEM;
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c
> index ee7061e..4b8eb50 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c
> @@ -38,6 +38,9 @@ struct mqd_manager *mqd_manager_init(enum KFD_MQD_TYPE type,
>         case CHIP_POLARIS10:
>         case CHIP_POLARIS11:
>                 return mqd_manager_init_vi_tonga(type, dev);
> +       case CHIP_VEGA10:
> +       case CHIP_RAVEN:
> +               return mqd_manager_init_v9(type, dev);
>         default:
>                 WARN(1, "Unexpected ASIC family %u",
>                      dev->device_info->asic_family);
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
> new file mode 100644
> index 0000000..684054f
> --- /dev/null
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
> @@ -0,0 +1,443 @@
> +/*
> + * Copyright 2016-2018 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + *
> + */
> +
> +#include <linux/printk.h>
> +#include <linux/slab.h>
> +#include <linux/uaccess.h>
> +#include "kfd_priv.h"
> +#include "kfd_mqd_manager.h"
> +#include "v9_structs.h"
> +#include "gc/gc_9_0_offset.h"
> +#include "gc/gc_9_0_sh_mask.h"
> +#include "sdma0/sdma0_4_0_sh_mask.h"
> +
> +static inline struct v9_mqd *get_mqd(void *mqd)
> +{
> +       return (struct v9_mqd *)mqd;
> +}
> +
> +static inline struct v9_sdma_mqd *get_sdma_mqd(void *mqd)
> +{
> +       return (struct v9_sdma_mqd *)mqd;
> +}
> +
> +static int init_mqd(struct mqd_manager *mm, void **mqd,
> +                       struct kfd_mem_obj **mqd_mem_obj, uint64_t *gart_addr,
> +                       struct queue_properties *q)
> +{
> +       int retval;
> +       uint64_t addr;
> +       struct v9_mqd *m;
> +       struct kfd_dev *kfd = mm->dev;
> +
> +       /* From V9,  for CWSR, the control stack is located on the next page
> +        * boundary after the mqd, we will use the gtt allocation function
> +        * instead of sub-allocation function.
> +        */
> +       if (kfd->cwsr_enabled && (q->type == KFD_QUEUE_TYPE_COMPUTE)) {
> +               *mqd_mem_obj = kzalloc(sizeof(struct kfd_mem_obj), GFP_NOIO);
Using GFP_NOIO directly is not recommended. Can we use the scope
functions instead ?

> +               if (!*mqd_mem_obj)
> +                       return -ENOMEM;
> +               retval = kfd->kfd2kgd->init_gtt_mem_allocation(kfd->kgd,
> +                       ALIGN(q->ctl_stack_size, PAGE_SIZE) +
> +                               ALIGN(sizeof(struct v9_mqd), PAGE_SIZE),
> +                       &((*mqd_mem_obj)->gtt_mem),
> +                       &((*mqd_mem_obj)->gpu_addr),
> +                       (void *)&((*mqd_mem_obj)->cpu_ptr));
> +       } else
> +               retval = kfd_gtt_sa_allocate(mm->dev, sizeof(struct v9_mqd),
> +                               mqd_mem_obj);
> +       if (retval != 0)
> +               return -ENOMEM;
> +
> +       m = (struct v9_mqd *) (*mqd_mem_obj)->cpu_ptr;
> +       addr = (*mqd_mem_obj)->gpu_addr;
> +
> +       memset(m, 0, sizeof(struct v9_mqd));
> +
> +       m->header = 0xC0310800;
> +       m->compute_pipelinestat_enable = 1;
> +       m->compute_static_thread_mgmt_se0 = 0xFFFFFFFF;
> +       m->compute_static_thread_mgmt_se1 = 0xFFFFFFFF;
> +       m->compute_static_thread_mgmt_se2 = 0xFFFFFFFF;
> +       m->compute_static_thread_mgmt_se3 = 0xFFFFFFFF;
> +
> +       m->cp_hqd_persistent_state = CP_HQD_PERSISTENT_STATE__PRELOAD_REQ_MASK |
> +                       0x53 << CP_HQD_PERSISTENT_STATE__PRELOAD_SIZE__SHIFT;
> +
> +       m->cp_mqd_control = 1 << CP_MQD_CONTROL__PRIV_STATE__SHIFT;
> +
> +       m->cp_mqd_base_addr_lo        = lower_32_bits(addr);
> +       m->cp_mqd_base_addr_hi        = upper_32_bits(addr);
> +
> +       m->cp_hqd_quantum = 1 << CP_HQD_QUANTUM__QUANTUM_EN__SHIFT |
> +                       1 << CP_HQD_QUANTUM__QUANTUM_SCALE__SHIFT |
> +                       10 << CP_HQD_QUANTUM__QUANTUM_DURATION__SHIFT;
> +
> +       m->cp_hqd_pipe_priority = 1;
> +       m->cp_hqd_queue_priority = 15;
> +
> +       if (q->format == KFD_QUEUE_FORMAT_AQL) {
> +               m->cp_hqd_aql_control =
> +                       1 << CP_HQD_AQL_CONTROL__CONTROL0__SHIFT;
> +       }
> +
> +       if (q->tba_addr) {
> +               m->compute_pgm_rsrc2 |=
> +                       (1 << COMPUTE_PGM_RSRC2__TRAP_PRESENT__SHIFT);
> +       }
> +
> +       if (mm->dev->cwsr_enabled && q->ctx_save_restore_area_address) {
> +               m->cp_hqd_persistent_state |=
> +                       (1 << CP_HQD_PERSISTENT_STATE__QSWITCH_MODE__SHIFT);
> +               m->cp_hqd_ctx_save_base_addr_lo =
> +                       lower_32_bits(q->ctx_save_restore_area_address);
> +               m->cp_hqd_ctx_save_base_addr_hi =
> +                       upper_32_bits(q->ctx_save_restore_area_address);
> +               m->cp_hqd_ctx_save_size = q->ctx_save_restore_area_size;
> +               m->cp_hqd_cntl_stack_size = q->ctl_stack_size;
> +               m->cp_hqd_cntl_stack_offset = q->ctl_stack_size;
> +               m->cp_hqd_wg_state_offset = q->ctl_stack_size;
> +       }
> +
> +       *mqd = m;
> +       if (gart_addr)
> +               *gart_addr = addr;
> +       retval = mm->update_mqd(mm, m, q);
> +
> +       return retval;
> +}
> +
> +static int load_mqd(struct mqd_manager *mm, void *mqd,
> +                       uint32_t pipe_id, uint32_t queue_id,
> +                       struct queue_properties *p, struct mm_struct *mms)
> +{
> +       /* AQL write pointer counts in 64B packets, PM4/CP counts in dwords. */
> +       uint32_t wptr_shift = (p->format == KFD_QUEUE_FORMAT_AQL ? 4 : 0);
> +
> +       return mm->dev->kfd2kgd->hqd_load(mm->dev->kgd, mqd, pipe_id, queue_id,
> +                                         (uint32_t __user *)p->write_ptr,
> +                                         wptr_shift, 0, mms);
> +}
> +
> +static int update_mqd(struct mqd_manager *mm, void *mqd,
> +                     struct queue_properties *q)
> +{
> +       struct v9_mqd *m;
> +
> +       m = get_mqd(mqd);
> +
> +       m->cp_hqd_pq_control = 5 << CP_HQD_PQ_CONTROL__RPTR_BLOCK_SIZE__SHIFT;
> +       m->cp_hqd_pq_control |= order_base_2(q->queue_size / 4) - 1;
> +       pr_debug("cp_hqd_pq_control 0x%x\n", m->cp_hqd_pq_control);
> +
> +       m->cp_hqd_pq_base_lo = lower_32_bits((uint64_t)q->queue_address >> 8);
> +       m->cp_hqd_pq_base_hi = upper_32_bits((uint64_t)q->queue_address >> 8);
> +
> +       m->cp_hqd_pq_rptr_report_addr_lo = lower_32_bits((uint64_t)q->read_ptr);
> +       m->cp_hqd_pq_rptr_report_addr_hi = upper_32_bits((uint64_t)q->read_ptr);
> +       m->cp_hqd_pq_wptr_poll_addr_lo = lower_32_bits((uint64_t)q->write_ptr);
> +       m->cp_hqd_pq_wptr_poll_addr_hi = upper_32_bits((uint64_t)q->write_ptr);
> +
> +       m->cp_hqd_pq_doorbell_control =
> +               q->doorbell_off <<
> +                       CP_HQD_PQ_DOORBELL_CONTROL__DOORBELL_OFFSET__SHIFT;
> +       pr_debug("cp_hqd_pq_doorbell_control 0x%x\n",
> +                       m->cp_hqd_pq_doorbell_control);
> +
> +       m->cp_hqd_ib_control =
> +               3 << CP_HQD_IB_CONTROL__MIN_IB_AVAIL_SIZE__SHIFT |
> +               1 << CP_HQD_IB_CONTROL__IB_EXE_DISABLE__SHIFT;
> +
> +       /*
> +        * HW does not clamp this field correctly. Maximum EOP queue size
> +        * is constrained by per-SE EOP done signal count, which is 8-bit.
> +        * Limit is 0xFF EOP entries (= 0x7F8 dwords). CP will not submit
> +        * more than (EOP entry count - 1) so a queue size of 0x800 dwords
> +        * is safe, giving a maximum field value of 0xA.
> +        */
> +       m->cp_hqd_eop_control = min(0xA,
> +               order_base_2(q->eop_ring_buffer_size / 4) - 1);
> +       m->cp_hqd_eop_base_addr_lo =
> +                       lower_32_bits(q->eop_ring_buffer_address >> 8);
> +       m->cp_hqd_eop_base_addr_hi =
> +                       upper_32_bits(q->eop_ring_buffer_address >> 8);
> +
> +       m->cp_hqd_iq_timer = 0;
> +
> +       m->cp_hqd_vmid = q->vmid;
> +
> +       if (q->format == KFD_QUEUE_FORMAT_AQL) {
> +               m->cp_hqd_pq_control |= CP_HQD_PQ_CONTROL__NO_UPDATE_RPTR_MASK |
> +                               2 << CP_HQD_PQ_CONTROL__SLOT_BASED_WPTR__SHIFT |
> +                               1 << CP_HQD_PQ_CONTROL__QUEUE_FULL_EN__SHIFT |
> +                               1 << CP_HQD_PQ_CONTROL__WPP_CLAMP_EN__SHIFT;
> +               m->cp_hqd_pq_doorbell_control |= 1 <<
> +                       CP_HQD_PQ_DOORBELL_CONTROL__DOORBELL_BIF_DROP__SHIFT;
> +       }
> +       if (mm->dev->cwsr_enabled && q->ctx_save_restore_area_address)
> +               m->cp_hqd_ctx_save_control = 0;
> +
> +       q->is_active = (q->queue_size > 0 &&
> +                       q->queue_address != 0 &&
> +                       q->queue_percent > 0 &&
> +                       !q->is_evicted);
> +
> +       return 0;
> +}
> +
> +
> +static int destroy_mqd(struct mqd_manager *mm, void *mqd,
> +                       enum kfd_preempt_type type,
> +                       unsigned int timeout, uint32_t pipe_id,
> +                       uint32_t queue_id)
> +{
> +       return mm->dev->kfd2kgd->hqd_destroy
> +               (mm->dev->kgd, mqd, type, timeout,
> +               pipe_id, queue_id);
> +}
> +
> +static void uninit_mqd(struct mqd_manager *mm, void *mqd,
> +                       struct kfd_mem_obj *mqd_mem_obj)
> +{
> +       struct kfd_dev *kfd = mm->dev;
> +
> +       if (mqd_mem_obj->gtt_mem) {
> +               kfd->kfd2kgd->free_gtt_mem(kfd->kgd, mqd_mem_obj->gtt_mem);
> +               kfree(mqd_mem_obj);
> +       } else {
> +               kfd_gtt_sa_free(mm->dev, mqd_mem_obj);
> +       }
> +}
> +
> +static bool is_occupied(struct mqd_manager *mm, void *mqd,
> +                       uint64_t queue_address, uint32_t pipe_id,
> +                       uint32_t queue_id)
> +{
> +       return mm->dev->kfd2kgd->hqd_is_occupied(
> +               mm->dev->kgd, queue_address,
> +               pipe_id, queue_id);
> +}
> +
> +static int init_mqd_hiq(struct mqd_manager *mm, void **mqd,
> +                       struct kfd_mem_obj **mqd_mem_obj, uint64_t *gart_addr,
> +                       struct queue_properties *q)
> +{
> +       struct v9_mqd *m;
> +       int retval = init_mqd(mm, mqd, mqd_mem_obj, gart_addr, q);
> +
> +       if (retval != 0)
> +               return retval;
> +
> +       m = get_mqd(*mqd);
> +
> +       m->cp_hqd_pq_control |= 1 << CP_HQD_PQ_CONTROL__PRIV_STATE__SHIFT |
> +                       1 << CP_HQD_PQ_CONTROL__KMD_QUEUE__SHIFT;
> +
> +       return retval;
> +}
> +
> +static int update_mqd_hiq(struct mqd_manager *mm, void *mqd,
> +                       struct queue_properties *q)
> +{
> +       struct v9_mqd *m;
> +       int retval = update_mqd(mm, mqd, q);
> +
> +       if (retval != 0)
> +               return retval;
> +
> +       /* TODO: what's the point? update_mqd already does this. */
> +       m = get_mqd(mqd);
> +       m->cp_hqd_vmid = q->vmid;
> +       return retval;
> +}
> +
> +static int init_mqd_sdma(struct mqd_manager *mm, void **mqd,
> +               struct kfd_mem_obj **mqd_mem_obj, uint64_t *gart_addr,
> +               struct queue_properties *q)
> +{
> +       int retval;
> +       struct v9_sdma_mqd *m;
> +
> +
> +       retval = kfd_gtt_sa_allocate(mm->dev,
> +                       sizeof(struct v9_sdma_mqd),
> +                       mqd_mem_obj);
> +
> +       if (retval != 0)
> +               return -ENOMEM;
> +
> +       m = (struct v9_sdma_mqd *) (*mqd_mem_obj)->cpu_ptr;
> +
> +       memset(m, 0, sizeof(struct v9_sdma_mqd));
> +
> +       *mqd = m;
> +       if (gart_addr)
> +               *gart_addr = (*mqd_mem_obj)->gpu_addr;
> +
> +       retval = mm->update_mqd(mm, m, q);
> +
> +       return retval;
> +}
> +
> +static void uninit_mqd_sdma(struct mqd_manager *mm, void *mqd,
> +               struct kfd_mem_obj *mqd_mem_obj)
> +{
> +       kfd_gtt_sa_free(mm->dev, mqd_mem_obj);
> +}
> +
> +static int load_mqd_sdma(struct mqd_manager *mm, void *mqd,
> +               uint32_t pipe_id, uint32_t queue_id,
> +               struct queue_properties *p, struct mm_struct *mms)
> +{
> +       return mm->dev->kfd2kgd->hqd_sdma_load(mm->dev->kgd, mqd,
> +                                              (uint32_t __user *)p->write_ptr,
> +                                              mms);
> +}
> +
> +#define SDMA_RLC_DUMMY_DEFAULT 0xf
> +
> +static int update_mqd_sdma(struct mqd_manager *mm, void *mqd,
> +               struct queue_properties *q)
> +{
> +       struct v9_sdma_mqd *m;
> +
> +       m = get_sdma_mqd(mqd);
> +       m->sdmax_rlcx_rb_cntl = order_base_2(q->queue_size / 4)
> +               << SDMA0_RLC0_RB_CNTL__RB_SIZE__SHIFT |
> +               q->vmid << SDMA0_RLC0_RB_CNTL__RB_VMID__SHIFT |
> +               1 << SDMA0_RLC0_RB_CNTL__RPTR_WRITEBACK_ENABLE__SHIFT |
> +               6 << SDMA0_RLC0_RB_CNTL__RPTR_WRITEBACK_TIMER__SHIFT;
> +
> +       m->sdmax_rlcx_rb_base = lower_32_bits(q->queue_address >> 8);
> +       m->sdmax_rlcx_rb_base_hi = upper_32_bits(q->queue_address >> 8);
> +       m->sdmax_rlcx_rb_rptr_addr_lo = lower_32_bits((uint64_t)q->read_ptr);
> +       m->sdmax_rlcx_rb_rptr_addr_hi = upper_32_bits((uint64_t)q->read_ptr);
> +       m->sdmax_rlcx_doorbell_offset =
> +               q->doorbell_off << SDMA0_RLC0_DOORBELL_OFFSET__OFFSET__SHIFT;
> +
> +       m->sdma_engine_id = q->sdma_engine_id;
> +       m->sdma_queue_id = q->sdma_queue_id;
> +       m->sdmax_rlcx_dummy_reg = SDMA_RLC_DUMMY_DEFAULT;
> +
> +       q->is_active = (q->queue_size > 0 &&
> +                       q->queue_address != 0 &&
> +                       q->queue_percent > 0 &&
> +                       !q->is_evicted);
> +
> +       return 0;
> +}
> +
> +/*
> + *  * preempt type here is ignored because there is only one way
> + *  * to preempt sdma queue
> + */
> +static int destroy_mqd_sdma(struct mqd_manager *mm, void *mqd,
> +               enum kfd_preempt_type type,
> +               unsigned int timeout, uint32_t pipe_id,
> +               uint32_t queue_id)
> +{
> +       return mm->dev->kfd2kgd->hqd_sdma_destroy(mm->dev->kgd, mqd, timeout);
> +}
> +
> +static bool is_occupied_sdma(struct mqd_manager *mm, void *mqd,
> +               uint64_t queue_address, uint32_t pipe_id,
> +               uint32_t queue_id)
> +{
> +       return mm->dev->kfd2kgd->hqd_sdma_is_occupied(mm->dev->kgd, mqd);
> +}
> +
> +#if defined(CONFIG_DEBUG_FS)
> +
> +static int debugfs_show_mqd(struct seq_file *m, void *data)
> +{
> +       seq_hex_dump(m, "    ", DUMP_PREFIX_OFFSET, 32, 4,
> +                    data, sizeof(struct v9_mqd), false);
> +       return 0;
> +}
> +
> +static int debugfs_show_mqd_sdma(struct seq_file *m, void *data)
> +{
> +       seq_hex_dump(m, "    ", DUMP_PREFIX_OFFSET, 32, 4,
> +                    data, sizeof(struct v9_sdma_mqd), false);
> +       return 0;
> +}
> +
> +#endif
> +
> +struct mqd_manager *mqd_manager_init_v9(enum KFD_MQD_TYPE type,
> +               struct kfd_dev *dev)
> +{
> +       struct mqd_manager *mqd;
> +
> +       if (WARN_ON(type >= KFD_MQD_TYPE_MAX))
> +               return NULL;
> +
> +       mqd = kzalloc(sizeof(*mqd), GFP_NOIO);
Using GFP_NOIO directly is not recommended. Can we use the scope
functions instead ?

> +       if (!mqd)
> +               return NULL;
> +
> +       mqd->dev = dev;
> +
> +       switch (type) {
> +       case KFD_MQD_TYPE_CP:
> +       case KFD_MQD_TYPE_COMPUTE:
> +               mqd->init_mqd = init_mqd;
> +               mqd->uninit_mqd = uninit_mqd;
> +               mqd->load_mqd = load_mqd;
> +               mqd->update_mqd = update_mqd;
> +               mqd->destroy_mqd = destroy_mqd;
> +               mqd->is_occupied = is_occupied;
> +#if defined(CONFIG_DEBUG_FS)
> +               mqd->debugfs_show_mqd = debugfs_show_mqd;
> +#endif
> +               break;
> +       case KFD_MQD_TYPE_HIQ:
> +               mqd->init_mqd = init_mqd_hiq;
> +               mqd->uninit_mqd = uninit_mqd;
> +               mqd->load_mqd = load_mqd;
> +               mqd->update_mqd = update_mqd_hiq;
> +               mqd->destroy_mqd = destroy_mqd;
> +               mqd->is_occupied = is_occupied;
> +#if defined(CONFIG_DEBUG_FS)
> +               mqd->debugfs_show_mqd = debugfs_show_mqd;
> +#endif
> +               break;
> +       case KFD_MQD_TYPE_SDMA:
> +               mqd->init_mqd = init_mqd_sdma;
> +               mqd->uninit_mqd = uninit_mqd_sdma;
> +               mqd->load_mqd = load_mqd_sdma;
> +               mqd->update_mqd = update_mqd_sdma;
> +               mqd->destroy_mqd = destroy_mqd_sdma;
> +               mqd->is_occupied = is_occupied_sdma;
> +#if defined(CONFIG_DEBUG_FS)
> +               mqd->debugfs_show_mqd = debugfs_show_mqd_sdma;
> +#endif
> +               break;
> +       default:
> +               kfree(mqd);
> +               return NULL;
> +       }
> +
> +       return mqd;
> +}
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> index b68299a..fac2882 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> @@ -197,6 +197,7 @@ struct kfd_mem_obj {
>         uint32_t range_end;
>         uint64_t gpu_addr;
>         uint32_t *cpu_ptr;
> +       void *gtt_mem;
>  };
>
>  struct kfd_vmid_info {
> @@ -822,6 +823,8 @@ struct mqd_manager *mqd_manager_init_vi(enum KFD_MQD_TYPE type,
>                 struct kfd_dev *dev);
>  struct mqd_manager *mqd_manager_init_vi_tonga(enum KFD_MQD_TYPE type,
>                 struct kfd_dev *dev);
> +struct mqd_manager *mqd_manager_init_v9(enum KFD_MQD_TYPE type,
> +               struct kfd_dev *dev);
>  struct device_queue_manager *device_queue_manager_init(struct kfd_dev *dev);
>  void device_queue_manager_uninit(struct device_queue_manager *dqm);
>  struct kernel_queue *kernel_queue_init(struct kfd_dev *dev,
> --
> 2.7.4
>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 07/21] drm/amdkfd: Clean up KFD_MMAP_ offset handling
       [not found]         ` <CAFCwf12Z9zSADyx9k1ps4o8-W72N_nZ-mSznZLo-vbMF=8veLA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2018-05-11 15:57           ` Felix Kuehling
       [not found]             ` <f7e5a2e1-8297-ab85-e190-4c0431e91080-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 38+ messages in thread
From: Felix Kuehling @ 2018-05-11 15:57 UTC (permalink / raw)
  To: Oded Gabbay; +Cc: Harish Kasiviswanathan, amd-gfx list

On 2018-05-11 04:52 AM, Oded Gabbay wrote:
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>> @@ -41,9 +41,33 @@
>>
>>  #define KFD_SYSFS_FILE_MODE 0444
>>
>> -#define KFD_MMAP_DOORBELL_MASK 0x8000000000000ull
>> -#define KFD_MMAP_EVENTS_MASK 0x4000000000000ull
>> -#define KFD_MMAP_RESERVED_MEM_MASK 0x2000000000000ull
>> +/* GPU ID hash width in bits */
>> +#define KFD_GPU_ID_HASH_WIDTH 16
>> +
>> +/* Use upper bits of mmap offset to store KFD driver specific information.
>> + * BITS[63:62] - Encode MMAP type
>> + * BITS[61:46] - Encode gpu_id. To identify to which GPU the offset belongs to
>> + * BITS[45:0]  - MMAP offset value
>> + *
>> + * NOTE: struct vm_area_struct.vm_pgoff uses offset in pages. Hence, these
>> + *  defines are w.r.t to PAGE_SIZE
>> + */
>> +#define KFD_MMAP_TYPE_SHIFT    (62 - PAGE_SHIFT)
>> +#define KFD_MMAP_TYPE_MASK     (0x3ULL << KFD_MMAP_TYPE_SHIFT)
>> +#define KFD_MMAP_TYPE_DOORBELL (0x3ULL << KFD_MMAP_TYPE_SHIFT)
>> +#define KFD_MMAP_TYPE_EVENTS   (0x2ULL << KFD_MMAP_TYPE_SHIFT)
>> +#define KFD_MMAP_TYPE_RESERVED_MEM     (0x1ULL << KFD_MMAP_TYPE_SHIFT)
> Isn't this new definition breaks existing user-space library (kfd thunk) ?
> If that is the case we have a problem here.

No, this does not break user mode, because user mode isn't aware of
these definitions at all. The mmap offset comes from kernel mode and is
opaque to user mode. User mode just passes it back down to kernel mode
in the offset parameter of mmap. So the kernel can change the encoding
of the mmap offset without breaking user mode.

Regards,
  Felix

>
> Oded
>
>

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 11/21] drm/amdkfd: Add GFXv9 MQD manager
       [not found]         ` <CAFCwf12Lj17xZTk43bD3YEM9Gc=poNN_G7w+JxLbPU=EqH7Y5g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2018-05-11 18:15           ` Felix Kuehling
       [not found]             ` <6c9f12ce-3b30-93bc-7a7b-499208861420-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 38+ messages in thread
From: Felix Kuehling @ 2018-05-11 18:15 UTC (permalink / raw)
  To: Oded Gabbay; +Cc: Jay Cornwall, amd-gfx list, John Bridgman

This patch series was meant to be applied after the userptr changes. I
haven't tested this without the userptr changes.

I think your main concern about userptr is the use of GFP_NOIO. I
remember considering memalloc_noio_save/restore when I worked on this
over a year ago. I found an old email thread I had with Christian about
this (subject: MMU notifier deadlock on kernel 4.9):

> memalloc_noio_save doesn't affect kmalloc directly. It sets
> current->flags, which is used deep inside the page allocator, after
> lockdep_trace_alloc(gfp) flags a lock as being used with IO enabled. The
> slob allocator looks at gfp_allowed_mask, but I'm not sure how safe it
> is to change that in a driver and it doesn't seem to be an exported
> symbol anyway.
Maybe this has changed in the mean time, but a year ago, using
memalloc_noio_save/restore may have prevented real deadlocks, but would
not shut up the lockdep warnings. Maybe this has changed in the mean
time. FWIW, I don't find lockdep_trace_alloc in the current kernel.

Regards,
  Felix


On 2018-05-11 05:10 AM, Oded Gabbay wrote:
> On Wed, Apr 11, 2018 at 12:33 AM, Felix Kuehling <Felix.Kuehling@amd.com> wrote:
>> Signed-off-by: John Bridgman <john.bridgman@amd.com>
>> Signed-off-by: Jay Cornwall <Jay.Cornwall@amd.com>
>> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
>> ---
>>  drivers/gpu/drm/amd/amdkfd/Makefile             |   1 +
>>  drivers/gpu/drm/amd/amdkfd/kfd_device.c         |   2 +-
>>  drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c    |   3 +
>>  drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c | 443 ++++++++++++++++++++++++
>>  drivers/gpu/drm/amd/amdkfd/kfd_priv.h           |   3 +
>>  5 files changed, 451 insertions(+), 1 deletion(-)
>>  create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
>>
>> diff --git a/drivers/gpu/drm/amd/amdkfd/Makefile b/drivers/gpu/drm/amd/amdkfd/Makefile
>> index 52b3c1b..094b591 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/Makefile
>> +++ b/drivers/gpu/drm/amd/amdkfd/Makefile
>> @@ -30,6 +30,7 @@ amdkfd-y      := kfd_module.o kfd_device.o kfd_chardev.o kfd_topology.o \
>>                 kfd_pasid.o kfd_doorbell.o kfd_flat_memory.o \
>>                 kfd_process.o kfd_queue.o kfd_mqd_manager.o \
>>                 kfd_mqd_manager_cik.o kfd_mqd_manager_vi.o \
>> +               kfd_mqd_manager_v9.o \
>>                 kfd_kernel_queue.o kfd_kernel_queue_cik.o \
>>                 kfd_kernel_queue_vi.o kfd_kernel_queue_v9.o \
>>                 kfd_packet_manager.o kfd_process_queue_manager.o \
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
>> index f563acb..c368ce3 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
>> @@ -700,7 +700,7 @@ int kfd_gtt_sa_allocate(struct kfd_dev *kfd, unsigned int size,
>>         if (size > kfd->gtt_sa_num_of_chunks * kfd->gtt_sa_chunk_size)
>>                 return -ENOMEM;
>>
>> -       *mem_obj = kmalloc(sizeof(struct kfd_mem_obj), GFP_NOIO);
>> +       *mem_obj = kzalloc(sizeof(struct kfd_mem_obj), GFP_NOIO);
> This assumes the patch in the userptr patch-set is applied. I changed
> it to GFP_KERNEL for now.
>
>>         if ((*mem_obj) == NULL)
>>                 return -ENOMEM;
>>
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c
>> index ee7061e..4b8eb50 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c
>> @@ -38,6 +38,9 @@ struct mqd_manager *mqd_manager_init(enum KFD_MQD_TYPE type,
>>         case CHIP_POLARIS10:
>>         case CHIP_POLARIS11:
>>                 return mqd_manager_init_vi_tonga(type, dev);
>> +       case CHIP_VEGA10:
>> +       case CHIP_RAVEN:
>> +               return mqd_manager_init_v9(type, dev);
>>         default:
>>                 WARN(1, "Unexpected ASIC family %u",
>>                      dev->device_info->asic_family);
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
>> new file mode 100644
>> index 0000000..684054f
>> --- /dev/null
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
>> @@ -0,0 +1,443 @@
>> +/*
>> + * Copyright 2016-2018 Advanced Micro Devices, Inc.
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a
>> + * copy of this software and associated documentation files (the "Software"),
>> + * to deal in the Software without restriction, including without limitation
>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>> + * and/or sell copies of the Software, and to permit persons to whom the
>> + * Software is furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
>> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
>> + * OTHER DEALINGS IN THE SOFTWARE.
>> + *
>> + */
>> +
>> +#include <linux/printk.h>
>> +#include <linux/slab.h>
>> +#include <linux/uaccess.h>
>> +#include "kfd_priv.h"
>> +#include "kfd_mqd_manager.h"
>> +#include "v9_structs.h"
>> +#include "gc/gc_9_0_offset.h"
>> +#include "gc/gc_9_0_sh_mask.h"
>> +#include "sdma0/sdma0_4_0_sh_mask.h"
>> +
>> +static inline struct v9_mqd *get_mqd(void *mqd)
>> +{
>> +       return (struct v9_mqd *)mqd;
>> +}
>> +
>> +static inline struct v9_sdma_mqd *get_sdma_mqd(void *mqd)
>> +{
>> +       return (struct v9_sdma_mqd *)mqd;
>> +}
>> +
>> +static int init_mqd(struct mqd_manager *mm, void **mqd,
>> +                       struct kfd_mem_obj **mqd_mem_obj, uint64_t *gart_addr,
>> +                       struct queue_properties *q)
>> +{
>> +       int retval;
>> +       uint64_t addr;
>> +       struct v9_mqd *m;
>> +       struct kfd_dev *kfd = mm->dev;
>> +
>> +       /* From V9,  for CWSR, the control stack is located on the next page
>> +        * boundary after the mqd, we will use the gtt allocation function
>> +        * instead of sub-allocation function.
>> +        */
>> +       if (kfd->cwsr_enabled && (q->type == KFD_QUEUE_TYPE_COMPUTE)) {
>> +               *mqd_mem_obj = kzalloc(sizeof(struct kfd_mem_obj), GFP_NOIO);
> Using GFP_NOIO directly is not recommended. Can we use the scope
> functions instead ?
>
>> +               if (!*mqd_mem_obj)
>> +                       return -ENOMEM;
>> +               retval = kfd->kfd2kgd->init_gtt_mem_allocation(kfd->kgd,
>> +                       ALIGN(q->ctl_stack_size, PAGE_SIZE) +
>> +                               ALIGN(sizeof(struct v9_mqd), PAGE_SIZE),
>> +                       &((*mqd_mem_obj)->gtt_mem),
>> +                       &((*mqd_mem_obj)->gpu_addr),
>> +                       (void *)&((*mqd_mem_obj)->cpu_ptr));
>> +       } else
>> +               retval = kfd_gtt_sa_allocate(mm->dev, sizeof(struct v9_mqd),
>> +                               mqd_mem_obj);
>> +       if (retval != 0)
>> +               return -ENOMEM;
>> +
>> +       m = (struct v9_mqd *) (*mqd_mem_obj)->cpu_ptr;
>> +       addr = (*mqd_mem_obj)->gpu_addr;
>> +
>> +       memset(m, 0, sizeof(struct v9_mqd));
>> +
>> +       m->header = 0xC0310800;
>> +       m->compute_pipelinestat_enable = 1;
>> +       m->compute_static_thread_mgmt_se0 = 0xFFFFFFFF;
>> +       m->compute_static_thread_mgmt_se1 = 0xFFFFFFFF;
>> +       m->compute_static_thread_mgmt_se2 = 0xFFFFFFFF;
>> +       m->compute_static_thread_mgmt_se3 = 0xFFFFFFFF;
>> +
>> +       m->cp_hqd_persistent_state = CP_HQD_PERSISTENT_STATE__PRELOAD_REQ_MASK |
>> +                       0x53 << CP_HQD_PERSISTENT_STATE__PRELOAD_SIZE__SHIFT;
>> +
>> +       m->cp_mqd_control = 1 << CP_MQD_CONTROL__PRIV_STATE__SHIFT;
>> +
>> +       m->cp_mqd_base_addr_lo        = lower_32_bits(addr);
>> +       m->cp_mqd_base_addr_hi        = upper_32_bits(addr);
>> +
>> +       m->cp_hqd_quantum = 1 << CP_HQD_QUANTUM__QUANTUM_EN__SHIFT |
>> +                       1 << CP_HQD_QUANTUM__QUANTUM_SCALE__SHIFT |
>> +                       10 << CP_HQD_QUANTUM__QUANTUM_DURATION__SHIFT;
>> +
>> +       m->cp_hqd_pipe_priority = 1;
>> +       m->cp_hqd_queue_priority = 15;
>> +
>> +       if (q->format == KFD_QUEUE_FORMAT_AQL) {
>> +               m->cp_hqd_aql_control =
>> +                       1 << CP_HQD_AQL_CONTROL__CONTROL0__SHIFT;
>> +       }
>> +
>> +       if (q->tba_addr) {
>> +               m->compute_pgm_rsrc2 |=
>> +                       (1 << COMPUTE_PGM_RSRC2__TRAP_PRESENT__SHIFT);
>> +       }
>> +
>> +       if (mm->dev->cwsr_enabled && q->ctx_save_restore_area_address) {
>> +               m->cp_hqd_persistent_state |=
>> +                       (1 << CP_HQD_PERSISTENT_STATE__QSWITCH_MODE__SHIFT);
>> +               m->cp_hqd_ctx_save_base_addr_lo =
>> +                       lower_32_bits(q->ctx_save_restore_area_address);
>> +               m->cp_hqd_ctx_save_base_addr_hi =
>> +                       upper_32_bits(q->ctx_save_restore_area_address);
>> +               m->cp_hqd_ctx_save_size = q->ctx_save_restore_area_size;
>> +               m->cp_hqd_cntl_stack_size = q->ctl_stack_size;
>> +               m->cp_hqd_cntl_stack_offset = q->ctl_stack_size;
>> +               m->cp_hqd_wg_state_offset = q->ctl_stack_size;
>> +       }
>> +
>> +       *mqd = m;
>> +       if (gart_addr)
>> +               *gart_addr = addr;
>> +       retval = mm->update_mqd(mm, m, q);
>> +
>> +       return retval;
>> +}
>> +
>> +static int load_mqd(struct mqd_manager *mm, void *mqd,
>> +                       uint32_t pipe_id, uint32_t queue_id,
>> +                       struct queue_properties *p, struct mm_struct *mms)
>> +{
>> +       /* AQL write pointer counts in 64B packets, PM4/CP counts in dwords. */
>> +       uint32_t wptr_shift = (p->format == KFD_QUEUE_FORMAT_AQL ? 4 : 0);
>> +
>> +       return mm->dev->kfd2kgd->hqd_load(mm->dev->kgd, mqd, pipe_id, queue_id,
>> +                                         (uint32_t __user *)p->write_ptr,
>> +                                         wptr_shift, 0, mms);
>> +}
>> +
>> +static int update_mqd(struct mqd_manager *mm, void *mqd,
>> +                     struct queue_properties *q)
>> +{
>> +       struct v9_mqd *m;
>> +
>> +       m = get_mqd(mqd);
>> +
>> +       m->cp_hqd_pq_control = 5 << CP_HQD_PQ_CONTROL__RPTR_BLOCK_SIZE__SHIFT;
>> +       m->cp_hqd_pq_control |= order_base_2(q->queue_size / 4) - 1;
>> +       pr_debug("cp_hqd_pq_control 0x%x\n", m->cp_hqd_pq_control);
>> +
>> +       m->cp_hqd_pq_base_lo = lower_32_bits((uint64_t)q->queue_address >> 8);
>> +       m->cp_hqd_pq_base_hi = upper_32_bits((uint64_t)q->queue_address >> 8);
>> +
>> +       m->cp_hqd_pq_rptr_report_addr_lo = lower_32_bits((uint64_t)q->read_ptr);
>> +       m->cp_hqd_pq_rptr_report_addr_hi = upper_32_bits((uint64_t)q->read_ptr);
>> +       m->cp_hqd_pq_wptr_poll_addr_lo = lower_32_bits((uint64_t)q->write_ptr);
>> +       m->cp_hqd_pq_wptr_poll_addr_hi = upper_32_bits((uint64_t)q->write_ptr);
>> +
>> +       m->cp_hqd_pq_doorbell_control =
>> +               q->doorbell_off <<
>> +                       CP_HQD_PQ_DOORBELL_CONTROL__DOORBELL_OFFSET__SHIFT;
>> +       pr_debug("cp_hqd_pq_doorbell_control 0x%x\n",
>> +                       m->cp_hqd_pq_doorbell_control);
>> +
>> +       m->cp_hqd_ib_control =
>> +               3 << CP_HQD_IB_CONTROL__MIN_IB_AVAIL_SIZE__SHIFT |
>> +               1 << CP_HQD_IB_CONTROL__IB_EXE_DISABLE__SHIFT;
>> +
>> +       /*
>> +        * HW does not clamp this field correctly. Maximum EOP queue size
>> +        * is constrained by per-SE EOP done signal count, which is 8-bit.
>> +        * Limit is 0xFF EOP entries (= 0x7F8 dwords). CP will not submit
>> +        * more than (EOP entry count - 1) so a queue size of 0x800 dwords
>> +        * is safe, giving a maximum field value of 0xA.
>> +        */
>> +       m->cp_hqd_eop_control = min(0xA,
>> +               order_base_2(q->eop_ring_buffer_size / 4) - 1);
>> +       m->cp_hqd_eop_base_addr_lo =
>> +                       lower_32_bits(q->eop_ring_buffer_address >> 8);
>> +       m->cp_hqd_eop_base_addr_hi =
>> +                       upper_32_bits(q->eop_ring_buffer_address >> 8);
>> +
>> +       m->cp_hqd_iq_timer = 0;
>> +
>> +       m->cp_hqd_vmid = q->vmid;
>> +
>> +       if (q->format == KFD_QUEUE_FORMAT_AQL) {
>> +               m->cp_hqd_pq_control |= CP_HQD_PQ_CONTROL__NO_UPDATE_RPTR_MASK |
>> +                               2 << CP_HQD_PQ_CONTROL__SLOT_BASED_WPTR__SHIFT |
>> +                               1 << CP_HQD_PQ_CONTROL__QUEUE_FULL_EN__SHIFT |
>> +                               1 << CP_HQD_PQ_CONTROL__WPP_CLAMP_EN__SHIFT;
>> +               m->cp_hqd_pq_doorbell_control |= 1 <<
>> +                       CP_HQD_PQ_DOORBELL_CONTROL__DOORBELL_BIF_DROP__SHIFT;
>> +       }
>> +       if (mm->dev->cwsr_enabled && q->ctx_save_restore_area_address)
>> +               m->cp_hqd_ctx_save_control = 0;
>> +
>> +       q->is_active = (q->queue_size > 0 &&
>> +                       q->queue_address != 0 &&
>> +                       q->queue_percent > 0 &&
>> +                       !q->is_evicted);
>> +
>> +       return 0;
>> +}
>> +
>> +
>> +static int destroy_mqd(struct mqd_manager *mm, void *mqd,
>> +                       enum kfd_preempt_type type,
>> +                       unsigned int timeout, uint32_t pipe_id,
>> +                       uint32_t queue_id)
>> +{
>> +       return mm->dev->kfd2kgd->hqd_destroy
>> +               (mm->dev->kgd, mqd, type, timeout,
>> +               pipe_id, queue_id);
>> +}
>> +
>> +static void uninit_mqd(struct mqd_manager *mm, void *mqd,
>> +                       struct kfd_mem_obj *mqd_mem_obj)
>> +{
>> +       struct kfd_dev *kfd = mm->dev;
>> +
>> +       if (mqd_mem_obj->gtt_mem) {
>> +               kfd->kfd2kgd->free_gtt_mem(kfd->kgd, mqd_mem_obj->gtt_mem);
>> +               kfree(mqd_mem_obj);
>> +       } else {
>> +               kfd_gtt_sa_free(mm->dev, mqd_mem_obj);
>> +       }
>> +}
>> +
>> +static bool is_occupied(struct mqd_manager *mm, void *mqd,
>> +                       uint64_t queue_address, uint32_t pipe_id,
>> +                       uint32_t queue_id)
>> +{
>> +       return mm->dev->kfd2kgd->hqd_is_occupied(
>> +               mm->dev->kgd, queue_address,
>> +               pipe_id, queue_id);
>> +}
>> +
>> +static int init_mqd_hiq(struct mqd_manager *mm, void **mqd,
>> +                       struct kfd_mem_obj **mqd_mem_obj, uint64_t *gart_addr,
>> +                       struct queue_properties *q)
>> +{
>> +       struct v9_mqd *m;
>> +       int retval = init_mqd(mm, mqd, mqd_mem_obj, gart_addr, q);
>> +
>> +       if (retval != 0)
>> +               return retval;
>> +
>> +       m = get_mqd(*mqd);
>> +
>> +       m->cp_hqd_pq_control |= 1 << CP_HQD_PQ_CONTROL__PRIV_STATE__SHIFT |
>> +                       1 << CP_HQD_PQ_CONTROL__KMD_QUEUE__SHIFT;
>> +
>> +       return retval;
>> +}
>> +
>> +static int update_mqd_hiq(struct mqd_manager *mm, void *mqd,
>> +                       struct queue_properties *q)
>> +{
>> +       struct v9_mqd *m;
>> +       int retval = update_mqd(mm, mqd, q);
>> +
>> +       if (retval != 0)
>> +               return retval;
>> +
>> +       /* TODO: what's the point? update_mqd already does this. */
>> +       m = get_mqd(mqd);
>> +       m->cp_hqd_vmid = q->vmid;
>> +       return retval;
>> +}
>> +
>> +static int init_mqd_sdma(struct mqd_manager *mm, void **mqd,
>> +               struct kfd_mem_obj **mqd_mem_obj, uint64_t *gart_addr,
>> +               struct queue_properties *q)
>> +{
>> +       int retval;
>> +       struct v9_sdma_mqd *m;
>> +
>> +
>> +       retval = kfd_gtt_sa_allocate(mm->dev,
>> +                       sizeof(struct v9_sdma_mqd),
>> +                       mqd_mem_obj);
>> +
>> +       if (retval != 0)
>> +               return -ENOMEM;
>> +
>> +       m = (struct v9_sdma_mqd *) (*mqd_mem_obj)->cpu_ptr;
>> +
>> +       memset(m, 0, sizeof(struct v9_sdma_mqd));
>> +
>> +       *mqd = m;
>> +       if (gart_addr)
>> +               *gart_addr = (*mqd_mem_obj)->gpu_addr;
>> +
>> +       retval = mm->update_mqd(mm, m, q);
>> +
>> +       return retval;
>> +}
>> +
>> +static void uninit_mqd_sdma(struct mqd_manager *mm, void *mqd,
>> +               struct kfd_mem_obj *mqd_mem_obj)
>> +{
>> +       kfd_gtt_sa_free(mm->dev, mqd_mem_obj);
>> +}
>> +
>> +static int load_mqd_sdma(struct mqd_manager *mm, void *mqd,
>> +               uint32_t pipe_id, uint32_t queue_id,
>> +               struct queue_properties *p, struct mm_struct *mms)
>> +{
>> +       return mm->dev->kfd2kgd->hqd_sdma_load(mm->dev->kgd, mqd,
>> +                                              (uint32_t __user *)p->write_ptr,
>> +                                              mms);
>> +}
>> +
>> +#define SDMA_RLC_DUMMY_DEFAULT 0xf
>> +
>> +static int update_mqd_sdma(struct mqd_manager *mm, void *mqd,
>> +               struct queue_properties *q)
>> +{
>> +       struct v9_sdma_mqd *m;
>> +
>> +       m = get_sdma_mqd(mqd);
>> +       m->sdmax_rlcx_rb_cntl = order_base_2(q->queue_size / 4)
>> +               << SDMA0_RLC0_RB_CNTL__RB_SIZE__SHIFT |
>> +               q->vmid << SDMA0_RLC0_RB_CNTL__RB_VMID__SHIFT |
>> +               1 << SDMA0_RLC0_RB_CNTL__RPTR_WRITEBACK_ENABLE__SHIFT |
>> +               6 << SDMA0_RLC0_RB_CNTL__RPTR_WRITEBACK_TIMER__SHIFT;
>> +
>> +       m->sdmax_rlcx_rb_base = lower_32_bits(q->queue_address >> 8);
>> +       m->sdmax_rlcx_rb_base_hi = upper_32_bits(q->queue_address >> 8);
>> +       m->sdmax_rlcx_rb_rptr_addr_lo = lower_32_bits((uint64_t)q->read_ptr);
>> +       m->sdmax_rlcx_rb_rptr_addr_hi = upper_32_bits((uint64_t)q->read_ptr);
>> +       m->sdmax_rlcx_doorbell_offset =
>> +               q->doorbell_off << SDMA0_RLC0_DOORBELL_OFFSET__OFFSET__SHIFT;
>> +
>> +       m->sdma_engine_id = q->sdma_engine_id;
>> +       m->sdma_queue_id = q->sdma_queue_id;
>> +       m->sdmax_rlcx_dummy_reg = SDMA_RLC_DUMMY_DEFAULT;
>> +
>> +       q->is_active = (q->queue_size > 0 &&
>> +                       q->queue_address != 0 &&
>> +                       q->queue_percent > 0 &&
>> +                       !q->is_evicted);
>> +
>> +       return 0;
>> +}
>> +
>> +/*
>> + *  * preempt type here is ignored because there is only one way
>> + *  * to preempt sdma queue
>> + */
>> +static int destroy_mqd_sdma(struct mqd_manager *mm, void *mqd,
>> +               enum kfd_preempt_type type,
>> +               unsigned int timeout, uint32_t pipe_id,
>> +               uint32_t queue_id)
>> +{
>> +       return mm->dev->kfd2kgd->hqd_sdma_destroy(mm->dev->kgd, mqd, timeout);
>> +}
>> +
>> +static bool is_occupied_sdma(struct mqd_manager *mm, void *mqd,
>> +               uint64_t queue_address, uint32_t pipe_id,
>> +               uint32_t queue_id)
>> +{
>> +       return mm->dev->kfd2kgd->hqd_sdma_is_occupied(mm->dev->kgd, mqd);
>> +}
>> +
>> +#if defined(CONFIG_DEBUG_FS)
>> +
>> +static int debugfs_show_mqd(struct seq_file *m, void *data)
>> +{
>> +       seq_hex_dump(m, "    ", DUMP_PREFIX_OFFSET, 32, 4,
>> +                    data, sizeof(struct v9_mqd), false);
>> +       return 0;
>> +}
>> +
>> +static int debugfs_show_mqd_sdma(struct seq_file *m, void *data)
>> +{
>> +       seq_hex_dump(m, "    ", DUMP_PREFIX_OFFSET, 32, 4,
>> +                    data, sizeof(struct v9_sdma_mqd), false);
>> +       return 0;
>> +}
>> +
>> +#endif
>> +
>> +struct mqd_manager *mqd_manager_init_v9(enum KFD_MQD_TYPE type,
>> +               struct kfd_dev *dev)
>> +{
>> +       struct mqd_manager *mqd;
>> +
>> +       if (WARN_ON(type >= KFD_MQD_TYPE_MAX))
>> +               return NULL;
>> +
>> +       mqd = kzalloc(sizeof(*mqd), GFP_NOIO);
> Using GFP_NOIO directly is not recommended. Can we use the scope
> functions instead ?
>
>> +       if (!mqd)
>> +               return NULL;
>> +
>> +       mqd->dev = dev;
>> +
>> +       switch (type) {
>> +       case KFD_MQD_TYPE_CP:
>> +       case KFD_MQD_TYPE_COMPUTE:
>> +               mqd->init_mqd = init_mqd;
>> +               mqd->uninit_mqd = uninit_mqd;
>> +               mqd->load_mqd = load_mqd;
>> +               mqd->update_mqd = update_mqd;
>> +               mqd->destroy_mqd = destroy_mqd;
>> +               mqd->is_occupied = is_occupied;
>> +#if defined(CONFIG_DEBUG_FS)
>> +               mqd->debugfs_show_mqd = debugfs_show_mqd;
>> +#endif
>> +               break;
>> +       case KFD_MQD_TYPE_HIQ:
>> +               mqd->init_mqd = init_mqd_hiq;
>> +               mqd->uninit_mqd = uninit_mqd;
>> +               mqd->load_mqd = load_mqd;
>> +               mqd->update_mqd = update_mqd_hiq;
>> +               mqd->destroy_mqd = destroy_mqd;
>> +               mqd->is_occupied = is_occupied;
>> +#if defined(CONFIG_DEBUG_FS)
>> +               mqd->debugfs_show_mqd = debugfs_show_mqd;
>> +#endif
>> +               break;
>> +       case KFD_MQD_TYPE_SDMA:
>> +               mqd->init_mqd = init_mqd_sdma;
>> +               mqd->uninit_mqd = uninit_mqd_sdma;
>> +               mqd->load_mqd = load_mqd_sdma;
>> +               mqd->update_mqd = update_mqd_sdma;
>> +               mqd->destroy_mqd = destroy_mqd_sdma;
>> +               mqd->is_occupied = is_occupied_sdma;
>> +#if defined(CONFIG_DEBUG_FS)
>> +               mqd->debugfs_show_mqd = debugfs_show_mqd_sdma;
>> +#endif
>> +               break;
>> +       default:
>> +               kfree(mqd);
>> +               return NULL;
>> +       }
>> +
>> +       return mqd;
>> +}
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>> index b68299a..fac2882 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>> @@ -197,6 +197,7 @@ struct kfd_mem_obj {
>>         uint32_t range_end;
>>         uint64_t gpu_addr;
>>         uint32_t *cpu_ptr;
>> +       void *gtt_mem;
>>  };
>>
>>  struct kfd_vmid_info {
>> @@ -822,6 +823,8 @@ struct mqd_manager *mqd_manager_init_vi(enum KFD_MQD_TYPE type,
>>                 struct kfd_dev *dev);
>>  struct mqd_manager *mqd_manager_init_vi_tonga(enum KFD_MQD_TYPE type,
>>                 struct kfd_dev *dev);
>> +struct mqd_manager *mqd_manager_init_v9(enum KFD_MQD_TYPE type,
>> +               struct kfd_dev *dev);
>>  struct device_queue_manager *device_queue_manager_init(struct kfd_dev *dev);
>>  void device_queue_manager_uninit(struct device_queue_manager *dqm);
>>  struct kernel_queue *kernel_queue_init(struct kfd_dev *dev,
>> --
>> 2.7.4
>>

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 07/21] drm/amdkfd: Clean up KFD_MMAP_ offset handling
       [not found]             ` <f7e5a2e1-8297-ab85-e190-4c0431e91080-5C7GfCeVMHo@public.gmane.org>
@ 2018-05-11 18:57               ` Oded Gabbay
  0 siblings, 0 replies; 38+ messages in thread
From: Oded Gabbay @ 2018-05-11 18:57 UTC (permalink / raw)
  To: Felix Kuehling; +Cc: Harish Kasiviswanathan, amd-gfx list

On Fri, May 11, 2018 at 6:57 PM, Felix Kuehling <felix.kuehling@amd.com> wrote:
> On 2018-05-11 04:52 AM, Oded Gabbay wrote:
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>>> @@ -41,9 +41,33 @@
>>>
>>>  #define KFD_SYSFS_FILE_MODE 0444
>>>
>>> -#define KFD_MMAP_DOORBELL_MASK 0x8000000000000ull
>>> -#define KFD_MMAP_EVENTS_MASK 0x4000000000000ull
>>> -#define KFD_MMAP_RESERVED_MEM_MASK 0x2000000000000ull
>>> +/* GPU ID hash width in bits */
>>> +#define KFD_GPU_ID_HASH_WIDTH 16
>>> +
>>> +/* Use upper bits of mmap offset to store KFD driver specific information.
>>> + * BITS[63:62] - Encode MMAP type
>>> + * BITS[61:46] - Encode gpu_id. To identify to which GPU the offset belongs to
>>> + * BITS[45:0]  - MMAP offset value
>>> + *
>>> + * NOTE: struct vm_area_struct.vm_pgoff uses offset in pages. Hence, these
>>> + *  defines are w.r.t to PAGE_SIZE
>>> + */
>>> +#define KFD_MMAP_TYPE_SHIFT    (62 - PAGE_SHIFT)
>>> +#define KFD_MMAP_TYPE_MASK     (0x3ULL << KFD_MMAP_TYPE_SHIFT)
>>> +#define KFD_MMAP_TYPE_DOORBELL (0x3ULL << KFD_MMAP_TYPE_SHIFT)
>>> +#define KFD_MMAP_TYPE_EVENTS   (0x2ULL << KFD_MMAP_TYPE_SHIFT)
>>> +#define KFD_MMAP_TYPE_RESERVED_MEM     (0x1ULL << KFD_MMAP_TYPE_SHIFT)
>> Isn't this new definition breaks existing user-space library (kfd thunk) ?
>> If that is the case we have a problem here.
>
> No, this does not break user mode, because user mode isn't aware of
> these definitions at all. The mmap offset comes from kernel mode and is
> opaque to user mode. User mode just passes it back down to kernel mode
> in the offset parameter of mmap. So the kernel can change the encoding
> of the mmap offset without breaking user mode.
>
Oops, you are correct of course, I forgot. That's what happens after 4
years of not seeing that particular code :)

Oded

> Regards,
>   Felix
>
>>
>> Oded
>>
>>
>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 11/21] drm/amdkfd: Add GFXv9 MQD manager
       [not found]             ` <6c9f12ce-3b30-93bc-7a7b-499208861420-5C7GfCeVMHo@public.gmane.org>
@ 2018-05-11 19:06               ` Oded Gabbay
       [not found]                 ` <CAFCwf10UquT1mycsc9HGveaX7rgwSJhez+e9F-N6E=GMFH=-GQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 38+ messages in thread
From: Oded Gabbay @ 2018-05-11 19:06 UTC (permalink / raw)
  To: Felix Kuehling; +Cc: Jay Cornwall, amd-gfx list, John Bridgman

On Fri, May 11, 2018 at 9:15 PM, Felix Kuehling <felix.kuehling@amd.com> wrote:
> This patch series was meant to be applied after the userptr changes. I
> haven't tested this without the userptr changes.
>
> I think your main concern about userptr is the use of GFP_NOIO. I
> remember considering memalloc_noio_save/restore when I worked on this
> over a year ago. I found an old email thread I had with Christian about
> this (subject: MMU notifier deadlock on kernel 4.9):
>
>> memalloc_noio_save doesn't affect kmalloc directly. It sets
>> current->flags, which is used deep inside the page allocator, after
>> lockdep_trace_alloc(gfp) flags a lock as being used with IO enabled. The
>> slob allocator looks at gfp_allowed_mask, but I'm not sure how safe it
>> is to change that in a driver and it doesn't seem to be an exported
>> symbol anyway.
> Maybe this has changed in the mean time, but a year ago, using
> memalloc_noio_save/restore may have prevented real deadlocks, but would
> not shut up the lockdep warnings. Maybe this has changed in the mean
> time. FWIW, I don't find lockdep_trace_alloc in the current kernel.
>
> Regards,
>   Felix

I'm not familiar with this API, but from reading about it, it seems a
more robust solution then to change the GFP flags directly in each
kmalloc from the reasons I mentioned in the original email I sent.
Having said that, if no one else objects and we say we will look at
moving to that API in the future, I don't object to taking your
patch-set as is now.

Oded

>
>
> On 2018-05-11 05:10 AM, Oded Gabbay wrote:
>> On Wed, Apr 11, 2018 at 12:33 AM, Felix Kuehling <Felix.Kuehling@amd.com> wrote:
>>> Signed-off-by: John Bridgman <john.bridgman@amd.com>
>>> Signed-off-by: Jay Cornwall <Jay.Cornwall@amd.com>
>>> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
>>> ---
>>>  drivers/gpu/drm/amd/amdkfd/Makefile             |   1 +
>>>  drivers/gpu/drm/amd/amdkfd/kfd_device.c         |   2 +-
>>>  drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c    |   3 +
>>>  drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c | 443 ++++++++++++++++++++++++
>>>  drivers/gpu/drm/amd/amdkfd/kfd_priv.h           |   3 +
>>>  5 files changed, 451 insertions(+), 1 deletion(-)
>>>  create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/Makefile b/drivers/gpu/drm/amd/amdkfd/Makefile
>>> index 52b3c1b..094b591 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/Makefile
>>> +++ b/drivers/gpu/drm/amd/amdkfd/Makefile
>>> @@ -30,6 +30,7 @@ amdkfd-y      := kfd_module.o kfd_device.o kfd_chardev.o kfd_topology.o \
>>>                 kfd_pasid.o kfd_doorbell.o kfd_flat_memory.o \
>>>                 kfd_process.o kfd_queue.o kfd_mqd_manager.o \
>>>                 kfd_mqd_manager_cik.o kfd_mqd_manager_vi.o \
>>> +               kfd_mqd_manager_v9.o \
>>>                 kfd_kernel_queue.o kfd_kernel_queue_cik.o \
>>>                 kfd_kernel_queue_vi.o kfd_kernel_queue_v9.o \
>>>                 kfd_packet_manager.o kfd_process_queue_manager.o \
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
>>> index f563acb..c368ce3 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
>>> @@ -700,7 +700,7 @@ int kfd_gtt_sa_allocate(struct kfd_dev *kfd, unsigned int size,
>>>         if (size > kfd->gtt_sa_num_of_chunks * kfd->gtt_sa_chunk_size)
>>>                 return -ENOMEM;
>>>
>>> -       *mem_obj = kmalloc(sizeof(struct kfd_mem_obj), GFP_NOIO);
>>> +       *mem_obj = kzalloc(sizeof(struct kfd_mem_obj), GFP_NOIO);
>> This assumes the patch in the userptr patch-set is applied. I changed
>> it to GFP_KERNEL for now.
>>
>>>         if ((*mem_obj) == NULL)
>>>                 return -ENOMEM;
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c
>>> index ee7061e..4b8eb50 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c
>>> @@ -38,6 +38,9 @@ struct mqd_manager *mqd_manager_init(enum KFD_MQD_TYPE type,
>>>         case CHIP_POLARIS10:
>>>         case CHIP_POLARIS11:
>>>                 return mqd_manager_init_vi_tonga(type, dev);
>>> +       case CHIP_VEGA10:
>>> +       case CHIP_RAVEN:
>>> +               return mqd_manager_init_v9(type, dev);
>>>         default:
>>>                 WARN(1, "Unexpected ASIC family %u",
>>>                      dev->device_info->asic_family);
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
>>> new file mode 100644
>>> index 0000000..684054f
>>> --- /dev/null
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
>>> @@ -0,0 +1,443 @@
>>> +/*
>>> + * Copyright 2016-2018 Advanced Micro Devices, Inc.
>>> + *
>>> + * Permission is hereby granted, free of charge, to any person obtaining a
>>> + * copy of this software and associated documentation files (the "Software"),
>>> + * to deal in the Software without restriction, including without limitation
>>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>>> + * and/or sell copies of the Software, and to permit persons to whom the
>>> + * Software is furnished to do so, subject to the following conditions:
>>> + *
>>> + * The above copyright notice and this permission notice shall be included in
>>> + * all copies or substantial portions of the Software.
>>> + *
>>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
>>> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
>>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
>>> + * OTHER DEALINGS IN THE SOFTWARE.
>>> + *
>>> + */
>>> +
>>> +#include <linux/printk.h>
>>> +#include <linux/slab.h>
>>> +#include <linux/uaccess.h>
>>> +#include "kfd_priv.h"
>>> +#include "kfd_mqd_manager.h"
>>> +#include "v9_structs.h"
>>> +#include "gc/gc_9_0_offset.h"
>>> +#include "gc/gc_9_0_sh_mask.h"
>>> +#include "sdma0/sdma0_4_0_sh_mask.h"
>>> +
>>> +static inline struct v9_mqd *get_mqd(void *mqd)
>>> +{
>>> +       return (struct v9_mqd *)mqd;
>>> +}
>>> +
>>> +static inline struct v9_sdma_mqd *get_sdma_mqd(void *mqd)
>>> +{
>>> +       return (struct v9_sdma_mqd *)mqd;
>>> +}
>>> +
>>> +static int init_mqd(struct mqd_manager *mm, void **mqd,
>>> +                       struct kfd_mem_obj **mqd_mem_obj, uint64_t *gart_addr,
>>> +                       struct queue_properties *q)
>>> +{
>>> +       int retval;
>>> +       uint64_t addr;
>>> +       struct v9_mqd *m;
>>> +       struct kfd_dev *kfd = mm->dev;
>>> +
>>> +       /* From V9,  for CWSR, the control stack is located on the next page
>>> +        * boundary after the mqd, we will use the gtt allocation function
>>> +        * instead of sub-allocation function.
>>> +        */
>>> +       if (kfd->cwsr_enabled && (q->type == KFD_QUEUE_TYPE_COMPUTE)) {
>>> +               *mqd_mem_obj = kzalloc(sizeof(struct kfd_mem_obj), GFP_NOIO);
>> Using GFP_NOIO directly is not recommended. Can we use the scope
>> functions instead ?
>>
>>> +               if (!*mqd_mem_obj)
>>> +                       return -ENOMEM;
>>> +               retval = kfd->kfd2kgd->init_gtt_mem_allocation(kfd->kgd,
>>> +                       ALIGN(q->ctl_stack_size, PAGE_SIZE) +
>>> +                               ALIGN(sizeof(struct v9_mqd), PAGE_SIZE),
>>> +                       &((*mqd_mem_obj)->gtt_mem),
>>> +                       &((*mqd_mem_obj)->gpu_addr),
>>> +                       (void *)&((*mqd_mem_obj)->cpu_ptr));
>>> +       } else
>>> +               retval = kfd_gtt_sa_allocate(mm->dev, sizeof(struct v9_mqd),
>>> +                               mqd_mem_obj);
>>> +       if (retval != 0)
>>> +               return -ENOMEM;
>>> +
>>> +       m = (struct v9_mqd *) (*mqd_mem_obj)->cpu_ptr;
>>> +       addr = (*mqd_mem_obj)->gpu_addr;
>>> +
>>> +       memset(m, 0, sizeof(struct v9_mqd));
>>> +
>>> +       m->header = 0xC0310800;
>>> +       m->compute_pipelinestat_enable = 1;
>>> +       m->compute_static_thread_mgmt_se0 = 0xFFFFFFFF;
>>> +       m->compute_static_thread_mgmt_se1 = 0xFFFFFFFF;
>>> +       m->compute_static_thread_mgmt_se2 = 0xFFFFFFFF;
>>> +       m->compute_static_thread_mgmt_se3 = 0xFFFFFFFF;
>>> +
>>> +       m->cp_hqd_persistent_state = CP_HQD_PERSISTENT_STATE__PRELOAD_REQ_MASK |
>>> +                       0x53 << CP_HQD_PERSISTENT_STATE__PRELOAD_SIZE__SHIFT;
>>> +
>>> +       m->cp_mqd_control = 1 << CP_MQD_CONTROL__PRIV_STATE__SHIFT;
>>> +
>>> +       m->cp_mqd_base_addr_lo        = lower_32_bits(addr);
>>> +       m->cp_mqd_base_addr_hi        = upper_32_bits(addr);
>>> +
>>> +       m->cp_hqd_quantum = 1 << CP_HQD_QUANTUM__QUANTUM_EN__SHIFT |
>>> +                       1 << CP_HQD_QUANTUM__QUANTUM_SCALE__SHIFT |
>>> +                       10 << CP_HQD_QUANTUM__QUANTUM_DURATION__SHIFT;
>>> +
>>> +       m->cp_hqd_pipe_priority = 1;
>>> +       m->cp_hqd_queue_priority = 15;
>>> +
>>> +       if (q->format == KFD_QUEUE_FORMAT_AQL) {
>>> +               m->cp_hqd_aql_control =
>>> +                       1 << CP_HQD_AQL_CONTROL__CONTROL0__SHIFT;
>>> +       }
>>> +
>>> +       if (q->tba_addr) {
>>> +               m->compute_pgm_rsrc2 |=
>>> +                       (1 << COMPUTE_PGM_RSRC2__TRAP_PRESENT__SHIFT);
>>> +       }
>>> +
>>> +       if (mm->dev->cwsr_enabled && q->ctx_save_restore_area_address) {
>>> +               m->cp_hqd_persistent_state |=
>>> +                       (1 << CP_HQD_PERSISTENT_STATE__QSWITCH_MODE__SHIFT);
>>> +               m->cp_hqd_ctx_save_base_addr_lo =
>>> +                       lower_32_bits(q->ctx_save_restore_area_address);
>>> +               m->cp_hqd_ctx_save_base_addr_hi =
>>> +                       upper_32_bits(q->ctx_save_restore_area_address);
>>> +               m->cp_hqd_ctx_save_size = q->ctx_save_restore_area_size;
>>> +               m->cp_hqd_cntl_stack_size = q->ctl_stack_size;
>>> +               m->cp_hqd_cntl_stack_offset = q->ctl_stack_size;
>>> +               m->cp_hqd_wg_state_offset = q->ctl_stack_size;
>>> +       }
>>> +
>>> +       *mqd = m;
>>> +       if (gart_addr)
>>> +               *gart_addr = addr;
>>> +       retval = mm->update_mqd(mm, m, q);
>>> +
>>> +       return retval;
>>> +}
>>> +
>>> +static int load_mqd(struct mqd_manager *mm, void *mqd,
>>> +                       uint32_t pipe_id, uint32_t queue_id,
>>> +                       struct queue_properties *p, struct mm_struct *mms)
>>> +{
>>> +       /* AQL write pointer counts in 64B packets, PM4/CP counts in dwords. */
>>> +       uint32_t wptr_shift = (p->format == KFD_QUEUE_FORMAT_AQL ? 4 : 0);
>>> +
>>> +       return mm->dev->kfd2kgd->hqd_load(mm->dev->kgd, mqd, pipe_id, queue_id,
>>> +                                         (uint32_t __user *)p->write_ptr,
>>> +                                         wptr_shift, 0, mms);
>>> +}
>>> +
>>> +static int update_mqd(struct mqd_manager *mm, void *mqd,
>>> +                     struct queue_properties *q)
>>> +{
>>> +       struct v9_mqd *m;
>>> +
>>> +       m = get_mqd(mqd);
>>> +
>>> +       m->cp_hqd_pq_control = 5 << CP_HQD_PQ_CONTROL__RPTR_BLOCK_SIZE__SHIFT;
>>> +       m->cp_hqd_pq_control |= order_base_2(q->queue_size / 4) - 1;
>>> +       pr_debug("cp_hqd_pq_control 0x%x\n", m->cp_hqd_pq_control);
>>> +
>>> +       m->cp_hqd_pq_base_lo = lower_32_bits((uint64_t)q->queue_address >> 8);
>>> +       m->cp_hqd_pq_base_hi = upper_32_bits((uint64_t)q->queue_address >> 8);
>>> +
>>> +       m->cp_hqd_pq_rptr_report_addr_lo = lower_32_bits((uint64_t)q->read_ptr);
>>> +       m->cp_hqd_pq_rptr_report_addr_hi = upper_32_bits((uint64_t)q->read_ptr);
>>> +       m->cp_hqd_pq_wptr_poll_addr_lo = lower_32_bits((uint64_t)q->write_ptr);
>>> +       m->cp_hqd_pq_wptr_poll_addr_hi = upper_32_bits((uint64_t)q->write_ptr);
>>> +
>>> +       m->cp_hqd_pq_doorbell_control =
>>> +               q->doorbell_off <<
>>> +                       CP_HQD_PQ_DOORBELL_CONTROL__DOORBELL_OFFSET__SHIFT;
>>> +       pr_debug("cp_hqd_pq_doorbell_control 0x%x\n",
>>> +                       m->cp_hqd_pq_doorbell_control);
>>> +
>>> +       m->cp_hqd_ib_control =
>>> +               3 << CP_HQD_IB_CONTROL__MIN_IB_AVAIL_SIZE__SHIFT |
>>> +               1 << CP_HQD_IB_CONTROL__IB_EXE_DISABLE__SHIFT;
>>> +
>>> +       /*
>>> +        * HW does not clamp this field correctly. Maximum EOP queue size
>>> +        * is constrained by per-SE EOP done signal count, which is 8-bit.
>>> +        * Limit is 0xFF EOP entries (= 0x7F8 dwords). CP will not submit
>>> +        * more than (EOP entry count - 1) so a queue size of 0x800 dwords
>>> +        * is safe, giving a maximum field value of 0xA.
>>> +        */
>>> +       m->cp_hqd_eop_control = min(0xA,
>>> +               order_base_2(q->eop_ring_buffer_size / 4) - 1);
>>> +       m->cp_hqd_eop_base_addr_lo =
>>> +                       lower_32_bits(q->eop_ring_buffer_address >> 8);
>>> +       m->cp_hqd_eop_base_addr_hi =
>>> +                       upper_32_bits(q->eop_ring_buffer_address >> 8);
>>> +
>>> +       m->cp_hqd_iq_timer = 0;
>>> +
>>> +       m->cp_hqd_vmid = q->vmid;
>>> +
>>> +       if (q->format == KFD_QUEUE_FORMAT_AQL) {
>>> +               m->cp_hqd_pq_control |= CP_HQD_PQ_CONTROL__NO_UPDATE_RPTR_MASK |
>>> +                               2 << CP_HQD_PQ_CONTROL__SLOT_BASED_WPTR__SHIFT |
>>> +                               1 << CP_HQD_PQ_CONTROL__QUEUE_FULL_EN__SHIFT |
>>> +                               1 << CP_HQD_PQ_CONTROL__WPP_CLAMP_EN__SHIFT;
>>> +               m->cp_hqd_pq_doorbell_control |= 1 <<
>>> +                       CP_HQD_PQ_DOORBELL_CONTROL__DOORBELL_BIF_DROP__SHIFT;
>>> +       }
>>> +       if (mm->dev->cwsr_enabled && q->ctx_save_restore_area_address)
>>> +               m->cp_hqd_ctx_save_control = 0;
>>> +
>>> +       q->is_active = (q->queue_size > 0 &&
>>> +                       q->queue_address != 0 &&
>>> +                       q->queue_percent > 0 &&
>>> +                       !q->is_evicted);
>>> +
>>> +       return 0;
>>> +}
>>> +
>>> +
>>> +static int destroy_mqd(struct mqd_manager *mm, void *mqd,
>>> +                       enum kfd_preempt_type type,
>>> +                       unsigned int timeout, uint32_t pipe_id,
>>> +                       uint32_t queue_id)
>>> +{
>>> +       return mm->dev->kfd2kgd->hqd_destroy
>>> +               (mm->dev->kgd, mqd, type, timeout,
>>> +               pipe_id, queue_id);
>>> +}
>>> +
>>> +static void uninit_mqd(struct mqd_manager *mm, void *mqd,
>>> +                       struct kfd_mem_obj *mqd_mem_obj)
>>> +{
>>> +       struct kfd_dev *kfd = mm->dev;
>>> +
>>> +       if (mqd_mem_obj->gtt_mem) {
>>> +               kfd->kfd2kgd->free_gtt_mem(kfd->kgd, mqd_mem_obj->gtt_mem);
>>> +               kfree(mqd_mem_obj);
>>> +       } else {
>>> +               kfd_gtt_sa_free(mm->dev, mqd_mem_obj);
>>> +       }
>>> +}
>>> +
>>> +static bool is_occupied(struct mqd_manager *mm, void *mqd,
>>> +                       uint64_t queue_address, uint32_t pipe_id,
>>> +                       uint32_t queue_id)
>>> +{
>>> +       return mm->dev->kfd2kgd->hqd_is_occupied(
>>> +               mm->dev->kgd, queue_address,
>>> +               pipe_id, queue_id);
>>> +}
>>> +
>>> +static int init_mqd_hiq(struct mqd_manager *mm, void **mqd,
>>> +                       struct kfd_mem_obj **mqd_mem_obj, uint64_t *gart_addr,
>>> +                       struct queue_properties *q)
>>> +{
>>> +       struct v9_mqd *m;
>>> +       int retval = init_mqd(mm, mqd, mqd_mem_obj, gart_addr, q);
>>> +
>>> +       if (retval != 0)
>>> +               return retval;
>>> +
>>> +       m = get_mqd(*mqd);
>>> +
>>> +       m->cp_hqd_pq_control |= 1 << CP_HQD_PQ_CONTROL__PRIV_STATE__SHIFT |
>>> +                       1 << CP_HQD_PQ_CONTROL__KMD_QUEUE__SHIFT;
>>> +
>>> +       return retval;
>>> +}
>>> +
>>> +static int update_mqd_hiq(struct mqd_manager *mm, void *mqd,
>>> +                       struct queue_properties *q)
>>> +{
>>> +       struct v9_mqd *m;
>>> +       int retval = update_mqd(mm, mqd, q);
>>> +
>>> +       if (retval != 0)
>>> +               return retval;
>>> +
>>> +       /* TODO: what's the point? update_mqd already does this. */
>>> +       m = get_mqd(mqd);
>>> +       m->cp_hqd_vmid = q->vmid;
>>> +       return retval;
>>> +}
>>> +
>>> +static int init_mqd_sdma(struct mqd_manager *mm, void **mqd,
>>> +               struct kfd_mem_obj **mqd_mem_obj, uint64_t *gart_addr,
>>> +               struct queue_properties *q)
>>> +{
>>> +       int retval;
>>> +       struct v9_sdma_mqd *m;
>>> +
>>> +
>>> +       retval = kfd_gtt_sa_allocate(mm->dev,
>>> +                       sizeof(struct v9_sdma_mqd),
>>> +                       mqd_mem_obj);
>>> +
>>> +       if (retval != 0)
>>> +               return -ENOMEM;
>>> +
>>> +       m = (struct v9_sdma_mqd *) (*mqd_mem_obj)->cpu_ptr;
>>> +
>>> +       memset(m, 0, sizeof(struct v9_sdma_mqd));
>>> +
>>> +       *mqd = m;
>>> +       if (gart_addr)
>>> +               *gart_addr = (*mqd_mem_obj)->gpu_addr;
>>> +
>>> +       retval = mm->update_mqd(mm, m, q);
>>> +
>>> +       return retval;
>>> +}
>>> +
>>> +static void uninit_mqd_sdma(struct mqd_manager *mm, void *mqd,
>>> +               struct kfd_mem_obj *mqd_mem_obj)
>>> +{
>>> +       kfd_gtt_sa_free(mm->dev, mqd_mem_obj);
>>> +}
>>> +
>>> +static int load_mqd_sdma(struct mqd_manager *mm, void *mqd,
>>> +               uint32_t pipe_id, uint32_t queue_id,
>>> +               struct queue_properties *p, struct mm_struct *mms)
>>> +{
>>> +       return mm->dev->kfd2kgd->hqd_sdma_load(mm->dev->kgd, mqd,
>>> +                                              (uint32_t __user *)p->write_ptr,
>>> +                                              mms);
>>> +}
>>> +
>>> +#define SDMA_RLC_DUMMY_DEFAULT 0xf
>>> +
>>> +static int update_mqd_sdma(struct mqd_manager *mm, void *mqd,
>>> +               struct queue_properties *q)
>>> +{
>>> +       struct v9_sdma_mqd *m;
>>> +
>>> +       m = get_sdma_mqd(mqd);
>>> +       m->sdmax_rlcx_rb_cntl = order_base_2(q->queue_size / 4)
>>> +               << SDMA0_RLC0_RB_CNTL__RB_SIZE__SHIFT |
>>> +               q->vmid << SDMA0_RLC0_RB_CNTL__RB_VMID__SHIFT |
>>> +               1 << SDMA0_RLC0_RB_CNTL__RPTR_WRITEBACK_ENABLE__SHIFT |
>>> +               6 << SDMA0_RLC0_RB_CNTL__RPTR_WRITEBACK_TIMER__SHIFT;
>>> +
>>> +       m->sdmax_rlcx_rb_base = lower_32_bits(q->queue_address >> 8);
>>> +       m->sdmax_rlcx_rb_base_hi = upper_32_bits(q->queue_address >> 8);
>>> +       m->sdmax_rlcx_rb_rptr_addr_lo = lower_32_bits((uint64_t)q->read_ptr);
>>> +       m->sdmax_rlcx_rb_rptr_addr_hi = upper_32_bits((uint64_t)q->read_ptr);
>>> +       m->sdmax_rlcx_doorbell_offset =
>>> +               q->doorbell_off << SDMA0_RLC0_DOORBELL_OFFSET__OFFSET__SHIFT;
>>> +
>>> +       m->sdma_engine_id = q->sdma_engine_id;
>>> +       m->sdma_queue_id = q->sdma_queue_id;
>>> +       m->sdmax_rlcx_dummy_reg = SDMA_RLC_DUMMY_DEFAULT;
>>> +
>>> +       q->is_active = (q->queue_size > 0 &&
>>> +                       q->queue_address != 0 &&
>>> +                       q->queue_percent > 0 &&
>>> +                       !q->is_evicted);
>>> +
>>> +       return 0;
>>> +}
>>> +
>>> +/*
>>> + *  * preempt type here is ignored because there is only one way
>>> + *  * to preempt sdma queue
>>> + */
>>> +static int destroy_mqd_sdma(struct mqd_manager *mm, void *mqd,
>>> +               enum kfd_preempt_type type,
>>> +               unsigned int timeout, uint32_t pipe_id,
>>> +               uint32_t queue_id)
>>> +{
>>> +       return mm->dev->kfd2kgd->hqd_sdma_destroy(mm->dev->kgd, mqd, timeout);
>>> +}
>>> +
>>> +static bool is_occupied_sdma(struct mqd_manager *mm, void *mqd,
>>> +               uint64_t queue_address, uint32_t pipe_id,
>>> +               uint32_t queue_id)
>>> +{
>>> +       return mm->dev->kfd2kgd->hqd_sdma_is_occupied(mm->dev->kgd, mqd);
>>> +}
>>> +
>>> +#if defined(CONFIG_DEBUG_FS)
>>> +
>>> +static int debugfs_show_mqd(struct seq_file *m, void *data)
>>> +{
>>> +       seq_hex_dump(m, "    ", DUMP_PREFIX_OFFSET, 32, 4,
>>> +                    data, sizeof(struct v9_mqd), false);
>>> +       return 0;
>>> +}
>>> +
>>> +static int debugfs_show_mqd_sdma(struct seq_file *m, void *data)
>>> +{
>>> +       seq_hex_dump(m, "    ", DUMP_PREFIX_OFFSET, 32, 4,
>>> +                    data, sizeof(struct v9_sdma_mqd), false);
>>> +       return 0;
>>> +}
>>> +
>>> +#endif
>>> +
>>> +struct mqd_manager *mqd_manager_init_v9(enum KFD_MQD_TYPE type,
>>> +               struct kfd_dev *dev)
>>> +{
>>> +       struct mqd_manager *mqd;
>>> +
>>> +       if (WARN_ON(type >= KFD_MQD_TYPE_MAX))
>>> +               return NULL;
>>> +
>>> +       mqd = kzalloc(sizeof(*mqd), GFP_NOIO);
>> Using GFP_NOIO directly is not recommended. Can we use the scope
>> functions instead ?
>>
>>> +       if (!mqd)
>>> +               return NULL;
>>> +
>>> +       mqd->dev = dev;
>>> +
>>> +       switch (type) {
>>> +       case KFD_MQD_TYPE_CP:
>>> +       case KFD_MQD_TYPE_COMPUTE:
>>> +               mqd->init_mqd = init_mqd;
>>> +               mqd->uninit_mqd = uninit_mqd;
>>> +               mqd->load_mqd = load_mqd;
>>> +               mqd->update_mqd = update_mqd;
>>> +               mqd->destroy_mqd = destroy_mqd;
>>> +               mqd->is_occupied = is_occupied;
>>> +#if defined(CONFIG_DEBUG_FS)
>>> +               mqd->debugfs_show_mqd = debugfs_show_mqd;
>>> +#endif
>>> +               break;
>>> +       case KFD_MQD_TYPE_HIQ:
>>> +               mqd->init_mqd = init_mqd_hiq;
>>> +               mqd->uninit_mqd = uninit_mqd;
>>> +               mqd->load_mqd = load_mqd;
>>> +               mqd->update_mqd = update_mqd_hiq;
>>> +               mqd->destroy_mqd = destroy_mqd;
>>> +               mqd->is_occupied = is_occupied;
>>> +#if defined(CONFIG_DEBUG_FS)
>>> +               mqd->debugfs_show_mqd = debugfs_show_mqd;
>>> +#endif
>>> +               break;
>>> +       case KFD_MQD_TYPE_SDMA:
>>> +               mqd->init_mqd = init_mqd_sdma;
>>> +               mqd->uninit_mqd = uninit_mqd_sdma;
>>> +               mqd->load_mqd = load_mqd_sdma;
>>> +               mqd->update_mqd = update_mqd_sdma;
>>> +               mqd->destroy_mqd = destroy_mqd_sdma;
>>> +               mqd->is_occupied = is_occupied_sdma;
>>> +#if defined(CONFIG_DEBUG_FS)
>>> +               mqd->debugfs_show_mqd = debugfs_show_mqd_sdma;
>>> +#endif
>>> +               break;
>>> +       default:
>>> +               kfree(mqd);
>>> +               return NULL;
>>> +       }
>>> +
>>> +       return mqd;
>>> +}
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>>> index b68299a..fac2882 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>>> @@ -197,6 +197,7 @@ struct kfd_mem_obj {
>>>         uint32_t range_end;
>>>         uint64_t gpu_addr;
>>>         uint32_t *cpu_ptr;
>>> +       void *gtt_mem;
>>>  };
>>>
>>>  struct kfd_vmid_info {
>>> @@ -822,6 +823,8 @@ struct mqd_manager *mqd_manager_init_vi(enum KFD_MQD_TYPE type,
>>>                 struct kfd_dev *dev);
>>>  struct mqd_manager *mqd_manager_init_vi_tonga(enum KFD_MQD_TYPE type,
>>>                 struct kfd_dev *dev);
>>> +struct mqd_manager *mqd_manager_init_v9(enum KFD_MQD_TYPE type,
>>> +               struct kfd_dev *dev);
>>>  struct device_queue_manager *device_queue_manager_init(struct kfd_dev *dev);
>>>  void device_queue_manager_uninit(struct device_queue_manager *dqm);
>>>  struct kernel_queue *kernel_queue_init(struct kfd_dev *dev,
>>> --
>>> 2.7.4
>>>
>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 16/21] drm/amdkfd: Add 64-bit doorbell and wptr support to kernel queue
       [not found]         ` <ba222e85-d524-d611-0efe-11850ba48ff3-5C7GfCeVMHo@public.gmane.org>
@ 2018-05-11 20:07           ` Oded Gabbay
  0 siblings, 0 replies; 38+ messages in thread
From: Oded Gabbay @ 2018-05-11 20:07 UTC (permalink / raw)
  To: Felix Kuehling; +Cc: amd-gfx list

applied this patch instead of original, thanks.

On Wed, Apr 25, 2018 at 12:42 AM, Felix Kuehling <felix.kuehling@amd.com> wrote:
> A minor update to this patch is attached. The rest of the series is
> unchanged and rebased cleanly on 4.17-rc2 on my system.
>
> Regards,
>   Felix
>
>
> On 2018-04-10 05:33 PM, Felix Kuehling wrote:
>> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
>> ---
>>  drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c         | 10 +++++++++
>>  drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c     | 25 +++++++++++++++++------
>>  drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.h     |  7 ++++++-
>>  drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_cik.c |  9 ++++++++
>>  drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c  |  9 ++++++++
>>  drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_vi.c  |  9 ++++++++
>>  drivers/gpu/drm/amd/amdkfd/kfd_priv.h             |  1 +
>>  7 files changed, 63 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c b/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
>> index 36c9269e..5d7cccc 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c
>> @@ -214,6 +214,16 @@ void write_kernel_doorbell(void __iomem *db, u32 value)
>>       }
>>  }
>>
>> +void write_kernel_doorbell64(void __iomem *db, u64 value)
>> +{
>> +     if (db) {
>> +             WARN(((unsigned long)db & 7) != 0,
>> +                  "Unaligned 64-bit doorbell");
>> +             writeq(value, (u64 __iomem *)db);
>> +             pr_debug("writing %llu to doorbell address 0x%p\n", value, db);+        }
>> +}
>> +
>>  unsigned int kfd_doorbell_id_to_offset(struct kfd_dev *kfd,
>>                                       struct kfd_process *process,
>>                                       unsigned int doorbell_id)
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
>> index 9f38161..476951d 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c
>> @@ -99,7 +99,7 @@ static bool initialize(struct kernel_queue *kq, struct kfd_dev *dev,
>>       kq->rptr_kernel = kq->rptr_mem->cpu_ptr;
>>       kq->rptr_gpu_addr = kq->rptr_mem->gpu_addr;
>>
>> -     retval = kfd_gtt_sa_allocate(dev, sizeof(*kq->wptr_kernel),
>> +     retval = kfd_gtt_sa_allocate(dev, dev->device_info->doorbell_size,
>>                                       &kq->wptr_mem);
>>
>>       if (retval != 0)
>> @@ -208,6 +208,7 @@ static int acquire_packet_buffer(struct kernel_queue *kq,
>>       size_t available_size;
>>       size_t queue_size_dwords;
>>       uint32_t wptr, rptr;
>> +     uint64_t wptr64;
>>       unsigned int *queue_address;
>>
>>       /* When rptr == wptr, the buffer is empty.
>> @@ -216,7 +217,8 @@ static int acquire_packet_buffer(struct kernel_queue *kq,
>>        * the opposite. So we can only use up to queue_size_dwords - 1 dwords.
>>        */
>>       rptr = *kq->rptr_kernel;
>> -     wptr = *kq->wptr_kernel;
>> +     wptr = kq->pending_wptr;
>> +     wptr64 = kq->pending_wptr64;
>>       queue_address = (unsigned int *)kq->pq_kernel_addr;
>>       queue_size_dwords = kq->queue->properties.queue_size / 4;
>>
>> @@ -246,11 +248,13 @@ static int acquire_packet_buffer(struct kernel_queue *kq,
>>               while (wptr > 0) {
>>                       queue_address[wptr] = kq->nop_packet;
>>                       wptr = (wptr + 1) % queue_size_dwords;
>> +                     wptr64++;
>>               }
>>       }
>>
>>       *buffer_ptr = &queue_address[wptr];
>>       kq->pending_wptr = wptr + packet_size_in_dwords;
>> +     kq->pending_wptr64 = wptr64 + packet_size_in_dwords;
>>
>>       return 0;
>>
>> @@ -272,14 +276,18 @@ static void submit_packet(struct kernel_queue *kq)
>>       pr_debug("\n");
>>  #endif
>>
>> -     *kq->wptr_kernel = kq->pending_wptr;
>> -     write_kernel_doorbell(kq->queue->properties.doorbell_ptr,
>> -                             kq->pending_wptr);
>> +     kq->ops_asic_specific.submit_packet(kq);
>>  }
>>
>>  static void rollback_packet(struct kernel_queue *kq)
>>  {
>> -     kq->pending_wptr = *kq->wptr_kernel;
>> +     if (kq->dev->device_info->doorbell_size == 8) {
>> +             kq->pending_wptr64 = *kq->wptr64_kernel;
>> +             kq->pending_wptr = *kq->wptr_kernel %
>> +                     (kq->queue->properties.queue_size / 4);
>> +     } else {
>> +             kq->pending_wptr = *kq->wptr_kernel;
>> +     }
>>  }
>>
>>  struct kernel_queue *kernel_queue_init(struct kfd_dev *dev,
>> @@ -310,6 +318,11 @@ struct kernel_queue *kernel_queue_init(struct kfd_dev *dev,
>>       case CHIP_HAWAII:
>>               kernel_queue_init_cik(&kq->ops_asic_specific);
>>               break;
>> +
>> +     case CHIP_VEGA10:
>> +     case CHIP_RAVEN:
>> +             kernel_queue_init_v9(&kq->ops_asic_specific);
>> +             break;
>>       default:
>>               WARN(1, "Unexpected ASIC family %u",
>>                    dev->device_info->asic_family);
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.h b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.h
>> index 5940531..97aff20 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.h
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.h
>> @@ -72,6 +72,7 @@ struct kernel_queue {
>>       struct kfd_dev          *dev;
>>       struct mqd_manager      *mqd;
>>       struct queue            *queue;
>> +     uint64_t                pending_wptr64;
>>       uint32_t                pending_wptr;
>>       unsigned int            nop_packet;
>>
>> @@ -79,7 +80,10 @@ struct kernel_queue {
>>       uint32_t                *rptr_kernel;
>>       uint64_t                rptr_gpu_addr;
>>       struct kfd_mem_obj      *wptr_mem;
>> -     uint32_t                *wptr_kernel;
>> +     union {
>> +             uint64_t        *wptr64_kernel;
>> +             uint32_t        *wptr_kernel;
>> +     };
>>       uint64_t                wptr_gpu_addr;
>>       struct kfd_mem_obj      *pq;
>>       uint64_t                pq_gpu_addr;
>> @@ -97,5 +101,6 @@ struct kernel_queue {
>>
>>  void kernel_queue_init_cik(struct kernel_queue_ops *ops);
>>  void kernel_queue_init_vi(struct kernel_queue_ops *ops);
>> +void kernel_queue_init_v9(struct kernel_queue_ops *ops);
>>
>>  #endif /* KFD_KERNEL_QUEUE_H_ */
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_cik.c b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_cik.c
>> index a90eb44..19e54ac 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_cik.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_cik.c
>> @@ -26,11 +26,13 @@
>>  static bool initialize_cik(struct kernel_queue *kq, struct kfd_dev *dev,
>>                       enum kfd_queue_type type, unsigned int queue_size);
>>  static void uninitialize_cik(struct kernel_queue *kq);
>> +static void submit_packet_cik(struct kernel_queue *kq);
>>
>>  void kernel_queue_init_cik(struct kernel_queue_ops *ops)
>>  {
>>       ops->initialize = initialize_cik;
>>       ops->uninitialize = uninitialize_cik;
>> +     ops->submit_packet = submit_packet_cik;
>>  }
>>
>>  static bool initialize_cik(struct kernel_queue *kq, struct kfd_dev *dev,
>> @@ -42,3 +44,10 @@ static bool initialize_cik(struct kernel_queue *kq, struct kfd_dev *dev,
>>  static void uninitialize_cik(struct kernel_queue *kq)
>>  {
>>  }
>> +
>> +static void submit_packet_cik(struct kernel_queue *kq)
>> +{
>> +     *kq->wptr_kernel = kq->pending_wptr;
>> +     write_kernel_doorbell(kq->queue->properties.doorbell_ptr,
>> +                             kq->pending_wptr);
>> +}
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c
>> index ece7d59..684a3bf 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c
>> @@ -29,11 +29,13 @@
>>  static bool initialize_v9(struct kernel_queue *kq, struct kfd_dev *dev,
>>                       enum kfd_queue_type type, unsigned int queue_size);
>>  static void uninitialize_v9(struct kernel_queue *kq);
>> +static void submit_packet_v9(struct kernel_queue *kq);
>>
>>  void kernel_queue_init_v9(struct kernel_queue_ops *ops)
>>  {
>>       ops->initialize = initialize_v9;
>>       ops->uninitialize = uninitialize_v9;
>> +     ops->submit_packet = submit_packet_v9;
>>  }
>>
>>  static bool initialize_v9(struct kernel_queue *kq, struct kfd_dev *dev,
>> @@ -58,6 +60,13 @@ static void uninitialize_v9(struct kernel_queue *kq)
>>       kfd_gtt_sa_free(kq->dev, kq->eop_mem);
>>  }
>>
>> +static void submit_packet_v9(struct kernel_queue *kq)
>> +{
>> +     *kq->wptr64_kernel = kq->pending_wptr64;
>> +     write_kernel_doorbell64(kq->queue->properties.doorbell_ptr,
>> +                             kq->pending_wptr64);
>> +}
>> +
>>  static int pm_map_process_v9(struct packet_manager *pm,
>>               uint32_t *buffer, struct qcm_process_device *qpd)
>>  {
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_vi.c b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_vi.c
>> index f9019ef..bf20c6d 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_vi.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_vi.c
>> @@ -29,11 +29,13 @@
>>  static bool initialize_vi(struct kernel_queue *kq, struct kfd_dev *dev,
>>                       enum kfd_queue_type type, unsigned int queue_size);
>>  static void uninitialize_vi(struct kernel_queue *kq);
>> +static void submit_packet_vi(struct kernel_queue *kq);
>>
>>  void kernel_queue_init_vi(struct kernel_queue_ops *ops)
>>  {
>>       ops->initialize = initialize_vi;
>>       ops->uninitialize = uninitialize_vi;
>> +     ops->submit_packet = submit_packet_vi;
>>  }
>>
>>  static bool initialize_vi(struct kernel_queue *kq, struct kfd_dev *dev,
>> @@ -58,6 +60,13 @@ static void uninitialize_vi(struct kernel_queue *kq)
>>       kfd_gtt_sa_free(kq->dev, kq->eop_mem);
>>  }
>>
>> +static void submit_packet_vi(struct kernel_queue *kq)
>> +{
>> +     *kq->wptr_kernel = kq->pending_wptr;
>> +     write_kernel_doorbell(kq->queue->properties.doorbell_ptr,
>> +                             kq->pending_wptr);
>> +}
>> +
>>  unsigned int pm_build_pm4_header(unsigned int opcode, size_t packet_size)
>>  {
>>       union PM4_MES_TYPE_3_HEADER header;
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>> index 06b210b..10d5b54 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>> @@ -769,6 +769,7 @@ void __iomem *kfd_get_kernel_doorbell(struct kfd_dev *kfd,
>>  void kfd_release_kernel_doorbell(struct kfd_dev *kfd, u32 __iomem *db_addr);
>>  u32 read_kernel_doorbell(u32 __iomem *db);
>>  void write_kernel_doorbell(void __iomem *db, u32 value);
>> +void write_kernel_doorbell64(void __iomem *db, u64 value);
>>  unsigned int kfd_doorbell_id_to_offset(struct kfd_dev *kfd,
>>                                       struct kfd_process *process,
>>                                       unsigned int doorbell_id);
>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 00/21] GFXv9/Vega10 support for KFD
       [not found] ` <1523395998-31314-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (21 preceding siblings ...)
  2018-04-10 21:58   ` [PATCH 00/21] GFXv9/Vega10 support for KFD Oded Gabbay
@ 2018-05-11 20:08   ` Oded Gabbay
  2018-05-14 14:27   ` Tom Stellard
  23 siblings, 0 replies; 38+ messages in thread
From: Oded Gabbay @ 2018-05-11 20:08 UTC (permalink / raw)
  To: Felix Kuehling; +Cc: amd-gfx list

On Wed, Apr 11, 2018 at 12:32 AM, Felix Kuehling <Felix.Kuehling@amd.com> wrote:
> This patch series adds support for GFXv9 GPUs to KFD. In this series it
> enables support for Vega10. Raven support requires some extra work that
> will follow shortly, but Raven support is already included and I didn't
> go out of my way to keep it out.
>
> Felix Kuehling (19):
>   drm/amdgpu: Remove unused interface from kfd2kgd interface
>   drm/amd: Update GFXv9 SDMA MQD structure
>   drm/amdgpu: Add GFXv9 TLB invalidation packet definition
>   drm/amdgpu: Add GFXv9 kfd2kgd interface functions
>   drm/amdgpu: Add doorbell routing info to kgd2kfd_shared_resources
>   drm/amdkfd: Make doorbell size ASIC-dependent
>   drm/amdkfd: Implement doorbell allocation for SOC15
>   drm/amdkfd: Move packet writer functions into ASIC-specific file
>   drm/amdkfd: Add GFXv9 PM4 packet writer functions
>   drm/amdkfd: Add GFXv9 MQD manager
>   drm/amdkfd: Add GFXv9 device queue manager
>   drm/amdkfd: Add SOC15 interrupt processing support
>   drm/amdkfd: Fix goto usage
>   drm/amdkfd: Fix kernel queue rollback_packet
>   drm/amdkfd: Add 64-bit doorbell and wptr support to kernel queue
>   drm/amdkfd: Remove limit on number of GPUs (follow-up)
>   drm/amdkfd: Support flat memory apertures for GFXv9
>   drm/amdkfd: Add GFXv9 CWSR trap handler
>   drm/amdkfd: Add Vega10 topology and device info
>
> Harish Kasiviswanathan (1):
>   drm/amdkfd: Clean up KFD_MMAP_ offset handling
>
> welu (1):
>   drm/amdkfd: Try to enable atomics for all GPUs
>
>  MAINTAINERS                                        |    2 +
>  drivers/gpu/drm/amd/amdgpu/Makefile                |    3 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c         |   26 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h         |    1 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c  |   10 -
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c  |   10 -
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c  | 1043 ++++++++++++++
>  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c              |    1 +
>  drivers/gpu/drm/amd/amdgpu/soc15d.h                |    5 +
>  drivers/gpu/drm/amd/amdkfd/Makefile                |   10 +-
>  .../gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm  | 1495 ++++++++++++++++++++
>  drivers/gpu/drm/amd/amdkfd/kfd_chardev.c           |   42 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_crat.c              |   11 +
>  drivers/gpu/drm/amd/amdkfd/kfd_device.c            |   89 +-
>  .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c  |  102 +-
>  .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.h  |    2 +
>  .../drm/amd/amdkfd/kfd_device_queue_manager_v9.c   |   84 ++
>  drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c          |   65 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_events.c            |    2 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c       |  119 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c    |   84 ++
>  drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c      |   39 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.h      |    7 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_cik.c  |    9 +
>  drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c   |  340 +++++
>  drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_vi.c   |  319 +++++
>  drivers/gpu/drm/amd/amdkfd/kfd_module.c            |    5 +
>  drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c       |    3 +
>  drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c    |  443 ++++++
>  drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c    |  385 +----
>  drivers/gpu/drm/amd/amdkfd/kfd_pm4_headers_ai.h    |  583 ++++++++
>  drivers/gpu/drm/amd/amdkfd/kfd_priv.h              |  106 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_process.c           |   40 +-
>  .../gpu/drm/amd/amdkfd/kfd_process_queue_manager.c |   12 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_topology.c          |    6 +
>  drivers/gpu/drm/amd/amdkfd/kfd_topology.h          |    1 +
>  drivers/gpu/drm/amd/amdkfd/soc15_int.h             |   47 +
>  drivers/gpu/drm/amd/include/kgd_kfd_interface.h    |   20 +-
>  drivers/gpu/drm/amd/include/v9_structs.h           |   48 +-
>  39 files changed, 5118 insertions(+), 501 deletions(-)
>  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
>  create mode 100644 drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm
>  create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager_v9.c
>  create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
>  create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c
>  create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
>  create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_pm4_headers_ai.h
>  create mode 100644 drivers/gpu/drm/amd/amdkfd/soc15_int.h
>
> --
> 2.7.4
>

Series is:
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 11/21] drm/amdkfd: Add GFXv9 MQD manager
       [not found]                 ` <CAFCwf10UquT1mycsc9HGveaX7rgwSJhez+e9F-N6E=GMFH=-GQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2018-05-11 21:50                   ` Felix Kuehling
  0 siblings, 0 replies; 38+ messages in thread
From: Felix Kuehling @ 2018-05-11 21:50 UTC (permalink / raw)
  To: Oded Gabbay; +Cc: Jay Cornwall, amd-gfx list, John Bridgman

On 2018-05-11 03:06 PM, Oded Gabbay wrote:
> On Fri, May 11, 2018 at 9:15 PM, Felix Kuehling <felix.kuehling@amd.com> wrote:
>> This patch series was meant to be applied after the userptr changes. I
>> haven't tested this without the userptr changes.
>>
>> I think your main concern about userptr is the use of GFP_NOIO. I
>> remember considering memalloc_noio_save/restore when I worked on this
>> over a year ago. I found an old email thread I had with Christian about
>> this (subject: MMU notifier deadlock on kernel 4.9):
>>
>>> memalloc_noio_save doesn't affect kmalloc directly. It sets
>>> current->flags, which is used deep inside the page allocator, after
>>> lockdep_trace_alloc(gfp) flags a lock as being used with IO enabled. The
>>> slob allocator looks at gfp_allowed_mask, but I'm not sure how safe it
>>> is to change that in a driver and it doesn't seem to be an exported
>>> symbol anyway.
>> Maybe this has changed in the mean time, but a year ago, using
>> memalloc_noio_save/restore may have prevented real deadlocks, but would
>> not shut up the lockdep warnings. Maybe this has changed in the mean
>> time. FWIW, I don't find lockdep_trace_alloc in the current kernel.
>>
>> Regards,
>>   Felix
> I'm not familiar with this API, but from reading about it, it seems a
> more robust solution then to change the GFP flags directly in each
> kmalloc from the reasons I mentioned in the original email I sent.

I agree.

> Having said that, if no one else objects and we say we will look at
> moving to that API in the future, I don't object to taking your
> patch-set as is now.

Yeah. I'll look into this.

Regards,
  Felix

>
> Oded
>
>>
>> On 2018-05-11 05:10 AM, Oded Gabbay wrote:
>>> On Wed, Apr 11, 2018 at 12:33 AM, Felix Kuehling <Felix.Kuehling@amd.com> wrote:
>>>> Signed-off-by: John Bridgman <john.bridgman@amd.com>
>>>> Signed-off-by: Jay Cornwall <Jay.Cornwall@amd.com>
>>>> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
>>>> ---
>>>>  drivers/gpu/drm/amd/amdkfd/Makefile             |   1 +
>>>>  drivers/gpu/drm/amd/amdkfd/kfd_device.c         |   2 +-
>>>>  drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c    |   3 +
>>>>  drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c | 443 ++++++++++++++++++++++++
>>>>  drivers/gpu/drm/amd/amdkfd/kfd_priv.h           |   3 +
>>>>  5 files changed, 451 insertions(+), 1 deletion(-)
>>>>  create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/Makefile b/drivers/gpu/drm/amd/amdkfd/Makefile
>>>> index 52b3c1b..094b591 100644
>>>> --- a/drivers/gpu/drm/amd/amdkfd/Makefile
>>>> +++ b/drivers/gpu/drm/amd/amdkfd/Makefile
>>>> @@ -30,6 +30,7 @@ amdkfd-y      := kfd_module.o kfd_device.o kfd_chardev.o kfd_topology.o \
>>>>                 kfd_pasid.o kfd_doorbell.o kfd_flat_memory.o \
>>>>                 kfd_process.o kfd_queue.o kfd_mqd_manager.o \
>>>>                 kfd_mqd_manager_cik.o kfd_mqd_manager_vi.o \
>>>> +               kfd_mqd_manager_v9.o \
>>>>                 kfd_kernel_queue.o kfd_kernel_queue_cik.o \
>>>>                 kfd_kernel_queue_vi.o kfd_kernel_queue_v9.o \
>>>>                 kfd_packet_manager.o kfd_process_queue_manager.o \
>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
>>>> index f563acb..c368ce3 100644
>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
>>>> @@ -700,7 +700,7 @@ int kfd_gtt_sa_allocate(struct kfd_dev *kfd, unsigned int size,
>>>>         if (size > kfd->gtt_sa_num_of_chunks * kfd->gtt_sa_chunk_size)
>>>>                 return -ENOMEM;
>>>>
>>>> -       *mem_obj = kmalloc(sizeof(struct kfd_mem_obj), GFP_NOIO);
>>>> +       *mem_obj = kzalloc(sizeof(struct kfd_mem_obj), GFP_NOIO);
>>> This assumes the patch in the userptr patch-set is applied. I changed
>>> it to GFP_KERNEL for now.
>>>
>>>>         if ((*mem_obj) == NULL)
>>>>                 return -ENOMEM;
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c
>>>> index ee7061e..4b8eb50 100644
>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c
>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c
>>>> @@ -38,6 +38,9 @@ struct mqd_manager *mqd_manager_init(enum KFD_MQD_TYPE type,
>>>>         case CHIP_POLARIS10:
>>>>         case CHIP_POLARIS11:
>>>>                 return mqd_manager_init_vi_tonga(type, dev);
>>>> +       case CHIP_VEGA10:
>>>> +       case CHIP_RAVEN:
>>>> +               return mqd_manager_init_v9(type, dev);
>>>>         default:
>>>>                 WARN(1, "Unexpected ASIC family %u",
>>>>                      dev->device_info->asic_family);
>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
>>>> new file mode 100644
>>>> index 0000000..684054f
>>>> --- /dev/null
>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
>>>> @@ -0,0 +1,443 @@
>>>> +/*
>>>> + * Copyright 2016-2018 Advanced Micro Devices, Inc.
>>>> + *
>>>> + * Permission is hereby granted, free of charge, to any person obtaining a
>>>> + * copy of this software and associated documentation files (the "Software"),
>>>> + * to deal in the Software without restriction, including without limitation
>>>> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
>>>> + * and/or sell copies of the Software, and to permit persons to whom the
>>>> + * Software is furnished to do so, subject to the following conditions:
>>>> + *
>>>> + * The above copyright notice and this permission notice shall be included in
>>>> + * all copies or substantial portions of the Software.
>>>> + *
>>>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>>>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>>>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
>>>> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
>>>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
>>>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
>>>> + * OTHER DEALINGS IN THE SOFTWARE.
>>>> + *
>>>> + */
>>>> +
>>>> +#include <linux/printk.h>
>>>> +#include <linux/slab.h>
>>>> +#include <linux/uaccess.h>
>>>> +#include "kfd_priv.h"
>>>> +#include "kfd_mqd_manager.h"
>>>> +#include "v9_structs.h"
>>>> +#include "gc/gc_9_0_offset.h"
>>>> +#include "gc/gc_9_0_sh_mask.h"
>>>> +#include "sdma0/sdma0_4_0_sh_mask.h"
>>>> +
>>>> +static inline struct v9_mqd *get_mqd(void *mqd)
>>>> +{
>>>> +       return (struct v9_mqd *)mqd;
>>>> +}
>>>> +
>>>> +static inline struct v9_sdma_mqd *get_sdma_mqd(void *mqd)
>>>> +{
>>>> +       return (struct v9_sdma_mqd *)mqd;
>>>> +}
>>>> +
>>>> +static int init_mqd(struct mqd_manager *mm, void **mqd,
>>>> +                       struct kfd_mem_obj **mqd_mem_obj, uint64_t *gart_addr,
>>>> +                       struct queue_properties *q)
>>>> +{
>>>> +       int retval;
>>>> +       uint64_t addr;
>>>> +       struct v9_mqd *m;
>>>> +       struct kfd_dev *kfd = mm->dev;
>>>> +
>>>> +       /* From V9,  for CWSR, the control stack is located on the next page
>>>> +        * boundary after the mqd, we will use the gtt allocation function
>>>> +        * instead of sub-allocation function.
>>>> +        */
>>>> +       if (kfd->cwsr_enabled && (q->type == KFD_QUEUE_TYPE_COMPUTE)) {
>>>> +               *mqd_mem_obj = kzalloc(sizeof(struct kfd_mem_obj), GFP_NOIO);
>>> Using GFP_NOIO directly is not recommended. Can we use the scope
>>> functions instead ?
>>>
>>>> +               if (!*mqd_mem_obj)
>>>> +                       return -ENOMEM;
>>>> +               retval = kfd->kfd2kgd->init_gtt_mem_allocation(kfd->kgd,
>>>> +                       ALIGN(q->ctl_stack_size, PAGE_SIZE) +
>>>> +                               ALIGN(sizeof(struct v9_mqd), PAGE_SIZE),
>>>> +                       &((*mqd_mem_obj)->gtt_mem),
>>>> +                       &((*mqd_mem_obj)->gpu_addr),
>>>> +                       (void *)&((*mqd_mem_obj)->cpu_ptr));
>>>> +       } else
>>>> +               retval = kfd_gtt_sa_allocate(mm->dev, sizeof(struct v9_mqd),
>>>> +                               mqd_mem_obj);
>>>> +       if (retval != 0)
>>>> +               return -ENOMEM;
>>>> +
>>>> +       m = (struct v9_mqd *) (*mqd_mem_obj)->cpu_ptr;
>>>> +       addr = (*mqd_mem_obj)->gpu_addr;
>>>> +
>>>> +       memset(m, 0, sizeof(struct v9_mqd));
>>>> +
>>>> +       m->header = 0xC0310800;
>>>> +       m->compute_pipelinestat_enable = 1;
>>>> +       m->compute_static_thread_mgmt_se0 = 0xFFFFFFFF;
>>>> +       m->compute_static_thread_mgmt_se1 = 0xFFFFFFFF;
>>>> +       m->compute_static_thread_mgmt_se2 = 0xFFFFFFFF;
>>>> +       m->compute_static_thread_mgmt_se3 = 0xFFFFFFFF;
>>>> +
>>>> +       m->cp_hqd_persistent_state = CP_HQD_PERSISTENT_STATE__PRELOAD_REQ_MASK |
>>>> +                       0x53 << CP_HQD_PERSISTENT_STATE__PRELOAD_SIZE__SHIFT;
>>>> +
>>>> +       m->cp_mqd_control = 1 << CP_MQD_CONTROL__PRIV_STATE__SHIFT;
>>>> +
>>>> +       m->cp_mqd_base_addr_lo        = lower_32_bits(addr);
>>>> +       m->cp_mqd_base_addr_hi        = upper_32_bits(addr);
>>>> +
>>>> +       m->cp_hqd_quantum = 1 << CP_HQD_QUANTUM__QUANTUM_EN__SHIFT |
>>>> +                       1 << CP_HQD_QUANTUM__QUANTUM_SCALE__SHIFT |
>>>> +                       10 << CP_HQD_QUANTUM__QUANTUM_DURATION__SHIFT;
>>>> +
>>>> +       m->cp_hqd_pipe_priority = 1;
>>>> +       m->cp_hqd_queue_priority = 15;
>>>> +
>>>> +       if (q->format == KFD_QUEUE_FORMAT_AQL) {
>>>> +               m->cp_hqd_aql_control =
>>>> +                       1 << CP_HQD_AQL_CONTROL__CONTROL0__SHIFT;
>>>> +       }
>>>> +
>>>> +       if (q->tba_addr) {
>>>> +               m->compute_pgm_rsrc2 |=
>>>> +                       (1 << COMPUTE_PGM_RSRC2__TRAP_PRESENT__SHIFT);
>>>> +       }
>>>> +
>>>> +       if (mm->dev->cwsr_enabled && q->ctx_save_restore_area_address) {
>>>> +               m->cp_hqd_persistent_state |=
>>>> +                       (1 << CP_HQD_PERSISTENT_STATE__QSWITCH_MODE__SHIFT);
>>>> +               m->cp_hqd_ctx_save_base_addr_lo =
>>>> +                       lower_32_bits(q->ctx_save_restore_area_address);
>>>> +               m->cp_hqd_ctx_save_base_addr_hi =
>>>> +                       upper_32_bits(q->ctx_save_restore_area_address);
>>>> +               m->cp_hqd_ctx_save_size = q->ctx_save_restore_area_size;
>>>> +               m->cp_hqd_cntl_stack_size = q->ctl_stack_size;
>>>> +               m->cp_hqd_cntl_stack_offset = q->ctl_stack_size;
>>>> +               m->cp_hqd_wg_state_offset = q->ctl_stack_size;
>>>> +       }
>>>> +
>>>> +       *mqd = m;
>>>> +       if (gart_addr)
>>>> +               *gart_addr = addr;
>>>> +       retval = mm->update_mqd(mm, m, q);
>>>> +
>>>> +       return retval;
>>>> +}
>>>> +
>>>> +static int load_mqd(struct mqd_manager *mm, void *mqd,
>>>> +                       uint32_t pipe_id, uint32_t queue_id,
>>>> +                       struct queue_properties *p, struct mm_struct *mms)
>>>> +{
>>>> +       /* AQL write pointer counts in 64B packets, PM4/CP counts in dwords. */
>>>> +       uint32_t wptr_shift = (p->format == KFD_QUEUE_FORMAT_AQL ? 4 : 0);
>>>> +
>>>> +       return mm->dev->kfd2kgd->hqd_load(mm->dev->kgd, mqd, pipe_id, queue_id,
>>>> +                                         (uint32_t __user *)p->write_ptr,
>>>> +                                         wptr_shift, 0, mms);
>>>> +}
>>>> +
>>>> +static int update_mqd(struct mqd_manager *mm, void *mqd,
>>>> +                     struct queue_properties *q)
>>>> +{
>>>> +       struct v9_mqd *m;
>>>> +
>>>> +       m = get_mqd(mqd);
>>>> +
>>>> +       m->cp_hqd_pq_control = 5 << CP_HQD_PQ_CONTROL__RPTR_BLOCK_SIZE__SHIFT;
>>>> +       m->cp_hqd_pq_control |= order_base_2(q->queue_size / 4) - 1;
>>>> +       pr_debug("cp_hqd_pq_control 0x%x\n", m->cp_hqd_pq_control);
>>>> +
>>>> +       m->cp_hqd_pq_base_lo = lower_32_bits((uint64_t)q->queue_address >> 8);
>>>> +       m->cp_hqd_pq_base_hi = upper_32_bits((uint64_t)q->queue_address >> 8);
>>>> +
>>>> +       m->cp_hqd_pq_rptr_report_addr_lo = lower_32_bits((uint64_t)q->read_ptr);
>>>> +       m->cp_hqd_pq_rptr_report_addr_hi = upper_32_bits((uint64_t)q->read_ptr);
>>>> +       m->cp_hqd_pq_wptr_poll_addr_lo = lower_32_bits((uint64_t)q->write_ptr);
>>>> +       m->cp_hqd_pq_wptr_poll_addr_hi = upper_32_bits((uint64_t)q->write_ptr);
>>>> +
>>>> +       m->cp_hqd_pq_doorbell_control =
>>>> +               q->doorbell_off <<
>>>> +                       CP_HQD_PQ_DOORBELL_CONTROL__DOORBELL_OFFSET__SHIFT;
>>>> +       pr_debug("cp_hqd_pq_doorbell_control 0x%x\n",
>>>> +                       m->cp_hqd_pq_doorbell_control);
>>>> +
>>>> +       m->cp_hqd_ib_control =
>>>> +               3 << CP_HQD_IB_CONTROL__MIN_IB_AVAIL_SIZE__SHIFT |
>>>> +               1 << CP_HQD_IB_CONTROL__IB_EXE_DISABLE__SHIFT;
>>>> +
>>>> +       /*
>>>> +        * HW does not clamp this field correctly. Maximum EOP queue size
>>>> +        * is constrained by per-SE EOP done signal count, which is 8-bit.
>>>> +        * Limit is 0xFF EOP entries (= 0x7F8 dwords). CP will not submit
>>>> +        * more than (EOP entry count - 1) so a queue size of 0x800 dwords
>>>> +        * is safe, giving a maximum field value of 0xA.
>>>> +        */
>>>> +       m->cp_hqd_eop_control = min(0xA,
>>>> +               order_base_2(q->eop_ring_buffer_size / 4) - 1);
>>>> +       m->cp_hqd_eop_base_addr_lo =
>>>> +                       lower_32_bits(q->eop_ring_buffer_address >> 8);
>>>> +       m->cp_hqd_eop_base_addr_hi =
>>>> +                       upper_32_bits(q->eop_ring_buffer_address >> 8);
>>>> +
>>>> +       m->cp_hqd_iq_timer = 0;
>>>> +
>>>> +       m->cp_hqd_vmid = q->vmid;
>>>> +
>>>> +       if (q->format == KFD_QUEUE_FORMAT_AQL) {
>>>> +               m->cp_hqd_pq_control |= CP_HQD_PQ_CONTROL__NO_UPDATE_RPTR_MASK |
>>>> +                               2 << CP_HQD_PQ_CONTROL__SLOT_BASED_WPTR__SHIFT |
>>>> +                               1 << CP_HQD_PQ_CONTROL__QUEUE_FULL_EN__SHIFT |
>>>> +                               1 << CP_HQD_PQ_CONTROL__WPP_CLAMP_EN__SHIFT;
>>>> +               m->cp_hqd_pq_doorbell_control |= 1 <<
>>>> +                       CP_HQD_PQ_DOORBELL_CONTROL__DOORBELL_BIF_DROP__SHIFT;
>>>> +       }
>>>> +       if (mm->dev->cwsr_enabled && q->ctx_save_restore_area_address)
>>>> +               m->cp_hqd_ctx_save_control = 0;
>>>> +
>>>> +       q->is_active = (q->queue_size > 0 &&
>>>> +                       q->queue_address != 0 &&
>>>> +                       q->queue_percent > 0 &&
>>>> +                       !q->is_evicted);
>>>> +
>>>> +       return 0;
>>>> +}
>>>> +
>>>> +
>>>> +static int destroy_mqd(struct mqd_manager *mm, void *mqd,
>>>> +                       enum kfd_preempt_type type,
>>>> +                       unsigned int timeout, uint32_t pipe_id,
>>>> +                       uint32_t queue_id)
>>>> +{
>>>> +       return mm->dev->kfd2kgd->hqd_destroy
>>>> +               (mm->dev->kgd, mqd, type, timeout,
>>>> +               pipe_id, queue_id);
>>>> +}
>>>> +
>>>> +static void uninit_mqd(struct mqd_manager *mm, void *mqd,
>>>> +                       struct kfd_mem_obj *mqd_mem_obj)
>>>> +{
>>>> +       struct kfd_dev *kfd = mm->dev;
>>>> +
>>>> +       if (mqd_mem_obj->gtt_mem) {
>>>> +               kfd->kfd2kgd->free_gtt_mem(kfd->kgd, mqd_mem_obj->gtt_mem);
>>>> +               kfree(mqd_mem_obj);
>>>> +       } else {
>>>> +               kfd_gtt_sa_free(mm->dev, mqd_mem_obj);
>>>> +       }
>>>> +}
>>>> +
>>>> +static bool is_occupied(struct mqd_manager *mm, void *mqd,
>>>> +                       uint64_t queue_address, uint32_t pipe_id,
>>>> +                       uint32_t queue_id)
>>>> +{
>>>> +       return mm->dev->kfd2kgd->hqd_is_occupied(
>>>> +               mm->dev->kgd, queue_address,
>>>> +               pipe_id, queue_id);
>>>> +}
>>>> +
>>>> +static int init_mqd_hiq(struct mqd_manager *mm, void **mqd,
>>>> +                       struct kfd_mem_obj **mqd_mem_obj, uint64_t *gart_addr,
>>>> +                       struct queue_properties *q)
>>>> +{
>>>> +       struct v9_mqd *m;
>>>> +       int retval = init_mqd(mm, mqd, mqd_mem_obj, gart_addr, q);
>>>> +
>>>> +       if (retval != 0)
>>>> +               return retval;
>>>> +
>>>> +       m = get_mqd(*mqd);
>>>> +
>>>> +       m->cp_hqd_pq_control |= 1 << CP_HQD_PQ_CONTROL__PRIV_STATE__SHIFT |
>>>> +                       1 << CP_HQD_PQ_CONTROL__KMD_QUEUE__SHIFT;
>>>> +
>>>> +       return retval;
>>>> +}
>>>> +
>>>> +static int update_mqd_hiq(struct mqd_manager *mm, void *mqd,
>>>> +                       struct queue_properties *q)
>>>> +{
>>>> +       struct v9_mqd *m;
>>>> +       int retval = update_mqd(mm, mqd, q);
>>>> +
>>>> +       if (retval != 0)
>>>> +               return retval;
>>>> +
>>>> +       /* TODO: what's the point? update_mqd already does this. */
>>>> +       m = get_mqd(mqd);
>>>> +       m->cp_hqd_vmid = q->vmid;
>>>> +       return retval;
>>>> +}
>>>> +
>>>> +static int init_mqd_sdma(struct mqd_manager *mm, void **mqd,
>>>> +               struct kfd_mem_obj **mqd_mem_obj, uint64_t *gart_addr,
>>>> +               struct queue_properties *q)
>>>> +{
>>>> +       int retval;
>>>> +       struct v9_sdma_mqd *m;
>>>> +
>>>> +
>>>> +       retval = kfd_gtt_sa_allocate(mm->dev,
>>>> +                       sizeof(struct v9_sdma_mqd),
>>>> +                       mqd_mem_obj);
>>>> +
>>>> +       if (retval != 0)
>>>> +               return -ENOMEM;
>>>> +
>>>> +       m = (struct v9_sdma_mqd *) (*mqd_mem_obj)->cpu_ptr;
>>>> +
>>>> +       memset(m, 0, sizeof(struct v9_sdma_mqd));
>>>> +
>>>> +       *mqd = m;
>>>> +       if (gart_addr)
>>>> +               *gart_addr = (*mqd_mem_obj)->gpu_addr;
>>>> +
>>>> +       retval = mm->update_mqd(mm, m, q);
>>>> +
>>>> +       return retval;
>>>> +}
>>>> +
>>>> +static void uninit_mqd_sdma(struct mqd_manager *mm, void *mqd,
>>>> +               struct kfd_mem_obj *mqd_mem_obj)
>>>> +{
>>>> +       kfd_gtt_sa_free(mm->dev, mqd_mem_obj);
>>>> +}
>>>> +
>>>> +static int load_mqd_sdma(struct mqd_manager *mm, void *mqd,
>>>> +               uint32_t pipe_id, uint32_t queue_id,
>>>> +               struct queue_properties *p, struct mm_struct *mms)
>>>> +{
>>>> +       return mm->dev->kfd2kgd->hqd_sdma_load(mm->dev->kgd, mqd,
>>>> +                                              (uint32_t __user *)p->write_ptr,
>>>> +                                              mms);
>>>> +}
>>>> +
>>>> +#define SDMA_RLC_DUMMY_DEFAULT 0xf
>>>> +
>>>> +static int update_mqd_sdma(struct mqd_manager *mm, void *mqd,
>>>> +               struct queue_properties *q)
>>>> +{
>>>> +       struct v9_sdma_mqd *m;
>>>> +
>>>> +       m = get_sdma_mqd(mqd);
>>>> +       m->sdmax_rlcx_rb_cntl = order_base_2(q->queue_size / 4)
>>>> +               << SDMA0_RLC0_RB_CNTL__RB_SIZE__SHIFT |
>>>> +               q->vmid << SDMA0_RLC0_RB_CNTL__RB_VMID__SHIFT |
>>>> +               1 << SDMA0_RLC0_RB_CNTL__RPTR_WRITEBACK_ENABLE__SHIFT |
>>>> +               6 << SDMA0_RLC0_RB_CNTL__RPTR_WRITEBACK_TIMER__SHIFT;
>>>> +
>>>> +       m->sdmax_rlcx_rb_base = lower_32_bits(q->queue_address >> 8);
>>>> +       m->sdmax_rlcx_rb_base_hi = upper_32_bits(q->queue_address >> 8);
>>>> +       m->sdmax_rlcx_rb_rptr_addr_lo = lower_32_bits((uint64_t)q->read_ptr);
>>>> +       m->sdmax_rlcx_rb_rptr_addr_hi = upper_32_bits((uint64_t)q->read_ptr);
>>>> +       m->sdmax_rlcx_doorbell_offset =
>>>> +               q->doorbell_off << SDMA0_RLC0_DOORBELL_OFFSET__OFFSET__SHIFT;
>>>> +
>>>> +       m->sdma_engine_id = q->sdma_engine_id;
>>>> +       m->sdma_queue_id = q->sdma_queue_id;
>>>> +       m->sdmax_rlcx_dummy_reg = SDMA_RLC_DUMMY_DEFAULT;
>>>> +
>>>> +       q->is_active = (q->queue_size > 0 &&
>>>> +                       q->queue_address != 0 &&
>>>> +                       q->queue_percent > 0 &&
>>>> +                       !q->is_evicted);
>>>> +
>>>> +       return 0;
>>>> +}
>>>> +
>>>> +/*
>>>> + *  * preempt type here is ignored because there is only one way
>>>> + *  * to preempt sdma queue
>>>> + */
>>>> +static int destroy_mqd_sdma(struct mqd_manager *mm, void *mqd,
>>>> +               enum kfd_preempt_type type,
>>>> +               unsigned int timeout, uint32_t pipe_id,
>>>> +               uint32_t queue_id)
>>>> +{
>>>> +       return mm->dev->kfd2kgd->hqd_sdma_destroy(mm->dev->kgd, mqd, timeout);
>>>> +}
>>>> +
>>>> +static bool is_occupied_sdma(struct mqd_manager *mm, void *mqd,
>>>> +               uint64_t queue_address, uint32_t pipe_id,
>>>> +               uint32_t queue_id)
>>>> +{
>>>> +       return mm->dev->kfd2kgd->hqd_sdma_is_occupied(mm->dev->kgd, mqd);
>>>> +}
>>>> +
>>>> +#if defined(CONFIG_DEBUG_FS)
>>>> +
>>>> +static int debugfs_show_mqd(struct seq_file *m, void *data)
>>>> +{
>>>> +       seq_hex_dump(m, "    ", DUMP_PREFIX_OFFSET, 32, 4,
>>>> +                    data, sizeof(struct v9_mqd), false);
>>>> +       return 0;
>>>> +}
>>>> +
>>>> +static int debugfs_show_mqd_sdma(struct seq_file *m, void *data)
>>>> +{
>>>> +       seq_hex_dump(m, "    ", DUMP_PREFIX_OFFSET, 32, 4,
>>>> +                    data, sizeof(struct v9_sdma_mqd), false);
>>>> +       return 0;
>>>> +}
>>>> +
>>>> +#endif
>>>> +
>>>> +struct mqd_manager *mqd_manager_init_v9(enum KFD_MQD_TYPE type,
>>>> +               struct kfd_dev *dev)
>>>> +{
>>>> +       struct mqd_manager *mqd;
>>>> +
>>>> +       if (WARN_ON(type >= KFD_MQD_TYPE_MAX))
>>>> +               return NULL;
>>>> +
>>>> +       mqd = kzalloc(sizeof(*mqd), GFP_NOIO);
>>> Using GFP_NOIO directly is not recommended. Can we use the scope
>>> functions instead ?
>>>
>>>> +       if (!mqd)
>>>> +               return NULL;
>>>> +
>>>> +       mqd->dev = dev;
>>>> +
>>>> +       switch (type) {
>>>> +       case KFD_MQD_TYPE_CP:
>>>> +       case KFD_MQD_TYPE_COMPUTE:
>>>> +               mqd->init_mqd = init_mqd;
>>>> +               mqd->uninit_mqd = uninit_mqd;
>>>> +               mqd->load_mqd = load_mqd;
>>>> +               mqd->update_mqd = update_mqd;
>>>> +               mqd->destroy_mqd = destroy_mqd;
>>>> +               mqd->is_occupied = is_occupied;
>>>> +#if defined(CONFIG_DEBUG_FS)
>>>> +               mqd->debugfs_show_mqd = debugfs_show_mqd;
>>>> +#endif
>>>> +               break;
>>>> +       case KFD_MQD_TYPE_HIQ:
>>>> +               mqd->init_mqd = init_mqd_hiq;
>>>> +               mqd->uninit_mqd = uninit_mqd;
>>>> +               mqd->load_mqd = load_mqd;
>>>> +               mqd->update_mqd = update_mqd_hiq;
>>>> +               mqd->destroy_mqd = destroy_mqd;
>>>> +               mqd->is_occupied = is_occupied;
>>>> +#if defined(CONFIG_DEBUG_FS)
>>>> +               mqd->debugfs_show_mqd = debugfs_show_mqd;
>>>> +#endif
>>>> +               break;
>>>> +       case KFD_MQD_TYPE_SDMA:
>>>> +               mqd->init_mqd = init_mqd_sdma;
>>>> +               mqd->uninit_mqd = uninit_mqd_sdma;
>>>> +               mqd->load_mqd = load_mqd_sdma;
>>>> +               mqd->update_mqd = update_mqd_sdma;
>>>> +               mqd->destroy_mqd = destroy_mqd_sdma;
>>>> +               mqd->is_occupied = is_occupied_sdma;
>>>> +#if defined(CONFIG_DEBUG_FS)
>>>> +               mqd->debugfs_show_mqd = debugfs_show_mqd_sdma;
>>>> +#endif
>>>> +               break;
>>>> +       default:
>>>> +               kfree(mqd);
>>>> +               return NULL;
>>>> +       }
>>>> +
>>>> +       return mqd;
>>>> +}
>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>>>> index b68299a..fac2882 100644
>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>>>> @@ -197,6 +197,7 @@ struct kfd_mem_obj {
>>>>         uint32_t range_end;
>>>>         uint64_t gpu_addr;
>>>>         uint32_t *cpu_ptr;
>>>> +       void *gtt_mem;
>>>>  };
>>>>
>>>>  struct kfd_vmid_info {
>>>> @@ -822,6 +823,8 @@ struct mqd_manager *mqd_manager_init_vi(enum KFD_MQD_TYPE type,
>>>>                 struct kfd_dev *dev);
>>>>  struct mqd_manager *mqd_manager_init_vi_tonga(enum KFD_MQD_TYPE type,
>>>>                 struct kfd_dev *dev);
>>>> +struct mqd_manager *mqd_manager_init_v9(enum KFD_MQD_TYPE type,
>>>> +               struct kfd_dev *dev);
>>>>  struct device_queue_manager *device_queue_manager_init(struct kfd_dev *dev);
>>>>  void device_queue_manager_uninit(struct device_queue_manager *dqm);
>>>>  struct kernel_queue *kernel_queue_init(struct kfd_dev *dev,
>>>> --
>>>> 2.7.4
>>>>

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 00/21] GFXv9/Vega10 support for KFD
       [not found] ` <1523395998-31314-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (22 preceding siblings ...)
  2018-05-11 20:08   ` Oded Gabbay
@ 2018-05-14 14:27   ` Tom Stellard
       [not found]     ` <4b467be5-8c51-c08c-c116-6041a1838aff-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  23 siblings, 1 reply; 38+ messages in thread
From: Tom Stellard @ 2018-05-14 14:27 UTC (permalink / raw)
  To: Felix Kuehling, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w

On 04/10/2018 02:32 PM, Felix Kuehling wrote:
> This patch series adds support for GFXv9 GPUs to KFD. In this series it
> enables support for Vega10. Raven support requires some extra work that
> will follow shortly, but Raven support is already included and I didn't
> go out of my way to keep it out.
> 

Hi Felix,

Can I use the thunk from the ROCm 1.8.0 release to test these patches,
or do I need a custom build?

Thanks,
Tom

> Felix Kuehling (19):
>   drm/amdgpu: Remove unused interface from kfd2kgd interface
>   drm/amd: Update GFXv9 SDMA MQD structure
>   drm/amdgpu: Add GFXv9 TLB invalidation packet definition
>   drm/amdgpu: Add GFXv9 kfd2kgd interface functions
>   drm/amdgpu: Add doorbell routing info to kgd2kfd_shared_resources
>   drm/amdkfd: Make doorbell size ASIC-dependent
>   drm/amdkfd: Implement doorbell allocation for SOC15
>   drm/amdkfd: Move packet writer functions into ASIC-specific file
>   drm/amdkfd: Add GFXv9 PM4 packet writer functions
>   drm/amdkfd: Add GFXv9 MQD manager
>   drm/amdkfd: Add GFXv9 device queue manager
>   drm/amdkfd: Add SOC15 interrupt processing support
>   drm/amdkfd: Fix goto usage
>   drm/amdkfd: Fix kernel queue rollback_packet
>   drm/amdkfd: Add 64-bit doorbell and wptr support to kernel queue
>   drm/amdkfd: Remove limit on number of GPUs (follow-up)
>   drm/amdkfd: Support flat memory apertures for GFXv9
>   drm/amdkfd: Add GFXv9 CWSR trap handler
>   drm/amdkfd: Add Vega10 topology and device info
> 
> Harish Kasiviswanathan (1):
>   drm/amdkfd: Clean up KFD_MMAP_ offset handling
> 
> welu (1):
>   drm/amdkfd: Try to enable atomics for all GPUs
> 
>  MAINTAINERS                                        |    2 +
>  drivers/gpu/drm/amd/amdgpu/Makefile                |    3 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c         |   26 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h         |    1 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c  |   10 -
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c  |   10 -
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c  | 1043 ++++++++++++++
>  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c              |    1 +
>  drivers/gpu/drm/amd/amdgpu/soc15d.h                |    5 +
>  drivers/gpu/drm/amd/amdkfd/Makefile                |   10 +-
>  .../gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm  | 1495 ++++++++++++++++++++
>  drivers/gpu/drm/amd/amdkfd/kfd_chardev.c           |   42 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_crat.c              |   11 +
>  drivers/gpu/drm/amd/amdkfd/kfd_device.c            |   89 +-
>  .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c  |  102 +-
>  .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.h  |    2 +
>  .../drm/amd/amdkfd/kfd_device_queue_manager_v9.c   |   84 ++
>  drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c          |   65 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_events.c            |    2 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c       |  119 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c    |   84 ++
>  drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c      |   39 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.h      |    7 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_cik.c  |    9 +
>  drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c   |  340 +++++
>  drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_vi.c   |  319 +++++
>  drivers/gpu/drm/amd/amdkfd/kfd_module.c            |    5 +
>  drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c       |    3 +
>  drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c    |  443 ++++++
>  drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c    |  385 +----
>  drivers/gpu/drm/amd/amdkfd/kfd_pm4_headers_ai.h    |  583 ++++++++
>  drivers/gpu/drm/amd/amdkfd/kfd_priv.h              |  106 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_process.c           |   40 +-
>  .../gpu/drm/amd/amdkfd/kfd_process_queue_manager.c |   12 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_topology.c          |    6 +
>  drivers/gpu/drm/amd/amdkfd/kfd_topology.h          |    1 +
>  drivers/gpu/drm/amd/amdkfd/soc15_int.h             |   47 +
>  drivers/gpu/drm/amd/include/kgd_kfd_interface.h    |   20 +-
>  drivers/gpu/drm/amd/include/v9_structs.h           |   48 +-
>  39 files changed, 5118 insertions(+), 501 deletions(-)
>  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
>  create mode 100644 drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm
>  create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager_v9.c
>  create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
>  create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c
>  create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
>  create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_pm4_headers_ai.h
>  create mode 100644 drivers/gpu/drm/amd/amdkfd/soc15_int.h
> 

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 00/21] GFXv9/Vega10 support for KFD
       [not found]     ` <4b467be5-8c51-c08c-c116-6041a1838aff-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2018-05-14 15:34       ` Felix Kuehling
  0 siblings, 0 replies; 38+ messages in thread
From: Felix Kuehling @ 2018-05-14 15:34 UTC (permalink / raw)
  To: tstellar-H+wXaHxf7aLQT0dZR+AlfA,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w

Hi Tom,

The ROCm 1.8 Thunk isn't compatible with the upstream ioctls yet. I'll
work on getting that aligned for ROCm 1.9.

Regards,
  Felix


On 2018-05-14 10:27 AM, Tom Stellard wrote:
> On 04/10/2018 02:32 PM, Felix Kuehling wrote:
>> This patch series adds support for GFXv9 GPUs to KFD. In this series it
>> enables support for Vega10. Raven support requires some extra work that
>> will follow shortly, but Raven support is already included and I didn't
>> go out of my way to keep it out.
>>
> Hi Felix,
>
> Can I use the thunk from the ROCm 1.8.0 release to test these patches,
> or do I need a custom build?
>
> Thanks,
> Tom
>
>> Felix Kuehling (19):
>>   drm/amdgpu: Remove unused interface from kfd2kgd interface
>>   drm/amd: Update GFXv9 SDMA MQD structure
>>   drm/amdgpu: Add GFXv9 TLB invalidation packet definition
>>   drm/amdgpu: Add GFXv9 kfd2kgd interface functions
>>   drm/amdgpu: Add doorbell routing info to kgd2kfd_shared_resources
>>   drm/amdkfd: Make doorbell size ASIC-dependent
>>   drm/amdkfd: Implement doorbell allocation for SOC15
>>   drm/amdkfd: Move packet writer functions into ASIC-specific file
>>   drm/amdkfd: Add GFXv9 PM4 packet writer functions
>>   drm/amdkfd: Add GFXv9 MQD manager
>>   drm/amdkfd: Add GFXv9 device queue manager
>>   drm/amdkfd: Add SOC15 interrupt processing support
>>   drm/amdkfd: Fix goto usage
>>   drm/amdkfd: Fix kernel queue rollback_packet
>>   drm/amdkfd: Add 64-bit doorbell and wptr support to kernel queue
>>   drm/amdkfd: Remove limit on number of GPUs (follow-up)
>>   drm/amdkfd: Support flat memory apertures for GFXv9
>>   drm/amdkfd: Add GFXv9 CWSR trap handler
>>   drm/amdkfd: Add Vega10 topology and device info
>>
>> Harish Kasiviswanathan (1):
>>   drm/amdkfd: Clean up KFD_MMAP_ offset handling
>>
>> welu (1):
>>   drm/amdkfd: Try to enable atomics for all GPUs
>>
>>  MAINTAINERS                                        |    2 +
>>  drivers/gpu/drm/amd/amdgpu/Makefile                |    3 +-
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c         |   26 +
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h         |    1 +
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c  |   10 -
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c  |   10 -
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c  | 1043 ++++++++++++++
>>  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c              |    1 +
>>  drivers/gpu/drm/amd/amdgpu/soc15d.h                |    5 +
>>  drivers/gpu/drm/amd/amdkfd/Makefile                |   10 +-
>>  .../gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm  | 1495 ++++++++++++++++++++
>>  drivers/gpu/drm/amd/amdkfd/kfd_chardev.c           |   42 +-
>>  drivers/gpu/drm/amd/amdkfd/kfd_crat.c              |   11 +
>>  drivers/gpu/drm/amd/amdkfd/kfd_device.c            |   89 +-
>>  .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c  |  102 +-
>>  .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.h  |    2 +
>>  .../drm/amd/amdkfd/kfd_device_queue_manager_v9.c   |   84 ++
>>  drivers/gpu/drm/amd/amdkfd/kfd_doorbell.c          |   65 +-
>>  drivers/gpu/drm/amd/amdkfd/kfd_events.c            |    2 +-
>>  drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c       |  119 +-
>>  drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c    |   84 ++
>>  drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c      |   39 +-
>>  drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.h      |    7 +-
>>  drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_cik.c  |    9 +
>>  drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c   |  340 +++++
>>  drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_vi.c   |  319 +++++
>>  drivers/gpu/drm/amd/amdkfd/kfd_module.c            |    5 +
>>  drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager.c       |    3 +
>>  drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c    |  443 ++++++
>>  drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c    |  385 +----
>>  drivers/gpu/drm/amd/amdkfd/kfd_pm4_headers_ai.h    |  583 ++++++++
>>  drivers/gpu/drm/amd/amdkfd/kfd_priv.h              |  106 +-
>>  drivers/gpu/drm/amd/amdkfd/kfd_process.c           |   40 +-
>>  .../gpu/drm/amd/amdkfd/kfd_process_queue_manager.c |   12 +-
>>  drivers/gpu/drm/amd/amdkfd/kfd_topology.c          |    6 +
>>  drivers/gpu/drm/amd/amdkfd/kfd_topology.h          |    1 +
>>  drivers/gpu/drm/amd/amdkfd/soc15_int.h             |   47 +
>>  drivers/gpu/drm/amd/include/kgd_kfd_interface.h    |   20 +-
>>  drivers/gpu/drm/amd/include/v9_structs.h           |   48 +-
>>  39 files changed, 5118 insertions(+), 501 deletions(-)
>>  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
>>  create mode 100644 drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm
>>  create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager_v9.c
>>  create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
>>  create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue_v9.c
>>  create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
>>  create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_pm4_headers_ai.h
>>  create mode 100644 drivers/gpu/drm/amd/amdkfd/soc15_int.h
>>

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 04/21] drm/amdgpu: Add GFXv9 kfd2kgd interface functions
       [not found]     ` <1523395998-31314-5-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
@ 2018-05-15  9:41       ` Dave Airlie
       [not found]         ` <CAPM=9txHAqFw5NqjXE3HjxpZh0p4SDVS0B7wMP6PcMFotvOj6w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 38+ messages in thread
From: Dave Airlie @ 2018-05-15  9:41 UTC (permalink / raw)
  To: Felix Kuehling
  Cc: Oded Gabbay, Yong Zhao, Jay Cornwall, John Bridgman,
	amd-gfx mailing list, Shaoyun Liu

> +static int kgd_hqd_load(struct kgd_dev *kgd, void *mqd, uint32_t pipe_id,
> +                       uint32_t queue_id, uint32_t __user *wptr,
> +                       uint32_t wptr_shift, uint32_t wptr_mask,
> +                       struct mm_struct *mm)
> +{
> +       struct amdgpu_device *adev = get_amdgpu_device(kgd);
> +       struct v9_mqd *m;
> +       uint32_t *mqd_hqd;
> +       uint32_t reg, hqd_base, data;
> +
> +       m = get_mqd(mqd);
> +
> +       acquire_queue(kgd, pipe_id, queue_id);
> +
> +       /* HIQ is set during driver init period with vmid set to 0*/
> +       if (m->cp_hqd_vmid == 0) {
> +               uint32_t value, mec, pipe;
> +
> +               mec = (pipe_id / adev->gfx.mec.num_pipe_per_mec) + 1;
> +               pipe = (pipe_id % adev->gfx.mec.num_pipe_per_mec);
> +
> +               pr_debug("kfd: set HIQ, mec:%d, pipe:%d, queue:%d.\n",
> +                       mec, pipe, queue_id);
> +               value = RREG32(SOC15_REG_OFFSET(GC, 0, mmRLC_CP_SCHEDULERS));
> +               value = REG_SET_FIELD(value, RLC_CP_SCHEDULERS, scheduler1,
> +                       ((mec << 5) | (pipe << 3) | queue_id | 0x80));
> +               WREG32(SOC15_REG_OFFSET(GC, 0, mmRLC_CP_SCHEDULERS), value);
> +       }
> +
> +       /* HQD registers extend from CP_MQD_BASE_ADDR to CP_HQD_EOP_WPTR_MEM. */
> +       mqd_hqd = &m->cp_mqd_base_addr_lo;
> +       hqd_base = SOC15_REG_OFFSET(GC, 0, mmCP_MQD_BASE_ADDR);
> +
> +       for (reg = hqd_base;
> +            reg <= SOC15_REG_OFFSET(GC, 0, mmCP_HQD_PQ_WPTR_HI); reg++)
> +               WREG32(reg, mqd_hqd[reg - hqd_base]);
> +
> +
> +       /* Activate doorbell logic before triggering WPTR poll. */
> +       data = REG_SET_FIELD(m->cp_hqd_pq_doorbell_control,
> +                            CP_HQD_PQ_DOORBELL_CONTROL, DOORBELL_EN, 1);
> +       WREG32(SOC15_REG_OFFSET(GC, 0, mmCP_HQD_PQ_DOORBELL_CONTROL), data);
> +
> +       if (wptr) {
> +               /* Don't read wptr with get_user because the user
> +                * context may not be accessible (if this function
> +                * runs in a work queue). Instead trigger a one-shot
> +                * polling read from memory in the CP. This assumes
> +                * that wptr is GPU-accessible in the queue's VMID via
> +                * ATC or SVM. WPTR==RPTR before starting the poll so
> +                * the CP starts fetching new commands from the right
> +                * place.
> +                *
> +                * Guessing a 64-bit WPTR from a 32-bit RPTR is a bit
> +                * tricky. Assume that the queue didn't overflow. The
> +                * number of valid bits in the 32-bit RPTR depends on
> +                * the queue size. The remaining bits are taken from
> +                * the saved 64-bit WPTR. If the WPTR wrapped, add the
> +                * queue size.
> +                */
> +               uint32_t queue_size =
> +                       2 << REG_GET_FIELD(m->cp_hqd_pq_control,
> +                                          CP_HQD_PQ_CONTROL, QUEUE_SIZE);
> +               uint64_t guessed_wptr = m->cp_hqd_pq_rptr & (queue_size - 1);
> +
> +               if ((m->cp_hqd_pq_wptr_lo & (queue_size - 1)) < guessed_wptr)
> +                       guessed_wptr += queue_size;
> +               guessed_wptr += m->cp_hqd_pq_wptr_lo & ~(queue_size - 1);
> +               guessed_wptr += (uint64_t)m->cp_hqd_pq_wptr_hi << 32;
> +
> +               WREG32(SOC15_REG_OFFSET(GC, 0, mmCP_HQD_PQ_WPTR_LO),
> +                      lower_32_bits(guessed_wptr));
> +               WREG32(SOC15_REG_OFFSET(GC, 0, mmCP_HQD_PQ_WPTR_HI),
> +                      upper_32_bits(guessed_wptr));
> +               WREG32(SOC15_REG_OFFSET(GC, 0, mmCP_HQD_PQ_WPTR_POLL_ADDR),
> +                      lower_32_bits((uint64_t)wptr));
> +               WREG32(SOC15_REG_OFFSET(GC, 0, mmCP_HQD_PQ_WPTR_POLL_ADDR_HI),
> +                      upper_32_bits((uint64_t)wptr));

 CC [M]  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.o
In file included from
/home/airlied/devel/kernel/dim/src/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c:30:0:
/home/airlied/devel/kernel/dim/src/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c:
In function ‘kgd_hqd_load’:
/home/airlied/devel/kernel/dim/src/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c:473:24:
warning: cast from pointer to integer of different size
[-Wpointer-to-int-cast]
          lower_32_bits((uint64_t)wptr));
                        ^
/home/airlied/devel/kernel/dim/src/drivers/gpu/drm/amd/amdgpu/amdgpu.h:1666:53:
note: in definition of macro ‘WREG32’
 #define WREG32(reg, v) amdgpu_mm_wreg(adev, (reg), (v), 0)
                                                     ^
/home/airlied/devel/kernel/dim/src/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c:473:10:
note: in expansion of macro ‘lower_32_bits’
          lower_32_bits((uint64_t)wptr));
          ^~~~~~~~~~~~~
/home/airlied/devel/kernel/dim/src/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c:475:24:
warning: cast from pointer to integer of different size
[-Wpointer-to-int-cast]
          upper_32_bits((uint64_t)wptr));
                        ^
/home/airlied/devel/kernel/dim/src/drivers/gpu/drm/amd/amdgpu/amdgpu.h:1666:53:
note: in definition of macro ‘WREG32’
 #define WREG32(reg, v) amdgpu_mm_wreg(adev, (reg), (v), 0)
                                                     ^
/home/airlied/devel/kernel/dim/src/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c:475:10:
note: in expansion of macro ‘upper_32_bits’
          upper_32_bits((uint64_t)wptr));
          ^~~~~~~~~~~~~

On a 32-bit build.

Wow that function and the comment make me feel happy nothing will ever
break here.

Dave.
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 04/21] drm/amdgpu: Add GFXv9 kfd2kgd interface functions
       [not found]         ` <CAPM=9txHAqFw5NqjXE3HjxpZh0p4SDVS0B7wMP6PcMFotvOj6w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2018-05-15 22:08           ` Felix Kuehling
       [not found]             ` <b5ca1829-dc53-6a07-5966-50d805cdb8d9-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 38+ messages in thread
From: Felix Kuehling @ 2018-05-15 22:08 UTC (permalink / raw)
  To: Dave Airlie
  Cc: Oded Gabbay, Yong Zhao, Jay Cornwall, John Bridgman,
	amd-gfx mailing list, Shaoyun Liu

On 2018-05-15 05:41 AM, Dave Airlie wrote:
>> +static int kgd_hqd_load(struct kgd_dev *kgd, void *mqd, uint32_t pipe_id,
>> +                       uint32_t queue_id, uint32_t __user *wptr,
>> +                       uint32_t wptr_shift, uint32_t wptr_mask,
>> +                       struct mm_struct *mm)
>> +{
>> +       struct amdgpu_device *adev = get_amdgpu_device(kgd);
>> +       struct v9_mqd *m;
>> +       uint32_t *mqd_hqd;
>> +       uint32_t reg, hqd_base, data;
>> +
>> +       m = get_mqd(mqd);
>> +
>> +       acquire_queue(kgd, pipe_id, queue_id);
>> +
>> +       /* HIQ is set during driver init period with vmid set to 0*/
>> +       if (m->cp_hqd_vmid == 0) {
>> +               uint32_t value, mec, pipe;
>> +
>> +               mec = (pipe_id / adev->gfx.mec.num_pipe_per_mec) + 1;
>> +               pipe = (pipe_id % adev->gfx.mec.num_pipe_per_mec);
>> +
>> +               pr_debug("kfd: set HIQ, mec:%d, pipe:%d, queue:%d.\n",
>> +                       mec, pipe, queue_id);
>> +               value = RREG32(SOC15_REG_OFFSET(GC, 0, mmRLC_CP_SCHEDULERS));
>> +               value = REG_SET_FIELD(value, RLC_CP_SCHEDULERS, scheduler1,
>> +                       ((mec << 5) | (pipe << 3) | queue_id | 0x80));
>> +               WREG32(SOC15_REG_OFFSET(GC, 0, mmRLC_CP_SCHEDULERS), value);
>> +       }
>> +
>> +       /* HQD registers extend from CP_MQD_BASE_ADDR to CP_HQD_EOP_WPTR_MEM. */
>> +       mqd_hqd = &m->cp_mqd_base_addr_lo;
>> +       hqd_base = SOC15_REG_OFFSET(GC, 0, mmCP_MQD_BASE_ADDR);
>> +
>> +       for (reg = hqd_base;
>> +            reg <= SOC15_REG_OFFSET(GC, 0, mmCP_HQD_PQ_WPTR_HI); reg++)
>> +               WREG32(reg, mqd_hqd[reg - hqd_base]);
>> +
>> +
>> +       /* Activate doorbell logic before triggering WPTR poll. */
>> +       data = REG_SET_FIELD(m->cp_hqd_pq_doorbell_control,
>> +                            CP_HQD_PQ_DOORBELL_CONTROL, DOORBELL_EN, 1);
>> +       WREG32(SOC15_REG_OFFSET(GC, 0, mmCP_HQD_PQ_DOORBELL_CONTROL), data);
>> +
>> +       if (wptr) {
>> +               /* Don't read wptr with get_user because the user
>> +                * context may not be accessible (if this function
>> +                * runs in a work queue). Instead trigger a one-shot
>> +                * polling read from memory in the CP. This assumes
>> +                * that wptr is GPU-accessible in the queue's VMID via
>> +                * ATC or SVM. WPTR==RPTR before starting the poll so
>> +                * the CP starts fetching new commands from the right
>> +                * place.
>> +                *
>> +                * Guessing a 64-bit WPTR from a 32-bit RPTR is a bit
>> +                * tricky. Assume that the queue didn't overflow. The
>> +                * number of valid bits in the 32-bit RPTR depends on
>> +                * the queue size. The remaining bits are taken from
>> +                * the saved 64-bit WPTR. If the WPTR wrapped, add the
>> +                * queue size.
>> +                */
>> +               uint32_t queue_size =
>> +                       2 << REG_GET_FIELD(m->cp_hqd_pq_control,
>> +                                          CP_HQD_PQ_CONTROL, QUEUE_SIZE);
>> +               uint64_t guessed_wptr = m->cp_hqd_pq_rptr & (queue_size - 1);
>> +
>> +               if ((m->cp_hqd_pq_wptr_lo & (queue_size - 1)) < guessed_wptr)
>> +                       guessed_wptr += queue_size;
>> +               guessed_wptr += m->cp_hqd_pq_wptr_lo & ~(queue_size - 1);
>> +               guessed_wptr += (uint64_t)m->cp_hqd_pq_wptr_hi << 32;
>> +
>> +               WREG32(SOC15_REG_OFFSET(GC, 0, mmCP_HQD_PQ_WPTR_LO),
>> +                      lower_32_bits(guessed_wptr));
>> +               WREG32(SOC15_REG_OFFSET(GC, 0, mmCP_HQD_PQ_WPTR_HI),
>> +                      upper_32_bits(guessed_wptr));
>> +               WREG32(SOC15_REG_OFFSET(GC, 0, mmCP_HQD_PQ_WPTR_POLL_ADDR),
>> +                      lower_32_bits((uint64_t)wptr));
>> +               WREG32(SOC15_REG_OFFSET(GC, 0, mmCP_HQD_PQ_WPTR_POLL_ADDR_HI),
>> +                      upper_32_bits((uint64_t)wptr));
>  CC [M]  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.o
> In file included from
> /home/airlied/devel/kernel/dim/src/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c:30:0:
> /home/airlied/devel/kernel/dim/src/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c:
> In function ‘kgd_hqd_load’:
> /home/airlied/devel/kernel/dim/src/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c:473:24:
> warning: cast from pointer to integer of different size
> [-Wpointer-to-int-cast]
>           lower_32_bits((uint64_t)wptr));
>                         ^
> /home/airlied/devel/kernel/dim/src/drivers/gpu/drm/amd/amdgpu/amdgpu.h:1666:53:
> note: in definition of macro ‘WREG32’
>  #define WREG32(reg, v) amdgpu_mm_wreg(adev, (reg), (v), 0)
>                                                      ^
> /home/airlied/devel/kernel/dim/src/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c:473:10:
> note: in expansion of macro ‘lower_32_bits’
>           lower_32_bits((uint64_t)wptr));
>           ^~~~~~~~~~~~~
> /home/airlied/devel/kernel/dim/src/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c:475:24:
> warning: cast from pointer to integer of different size
> [-Wpointer-to-int-cast]
>           upper_32_bits((uint64_t)wptr));
>                         ^
> /home/airlied/devel/kernel/dim/src/drivers/gpu/drm/amd/amdgpu/amdgpu.h:1666:53:
> note: in definition of macro ‘WREG32’
>  #define WREG32(reg, v) amdgpu_mm_wreg(adev, (reg), (v), 0)
>                                                      ^
> /home/airlied/devel/kernel/dim/src/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c:475:10:
> note: in expansion of macro ‘upper_32_bits’
>           upper_32_bits((uint64_t)wptr));
>           ^~~~~~~~~~~~~
>
> On a 32-bit build.

Hmm, I guess we should cast to uintptr_t instead.

That said, I should fix the build system to not compile the
amdgpu_amdkfd_* files when CONFIG_HSA_AMD is not enabled. That would
prevent this type of 32-bit or architecture-specific build problems I've
been seeing from KFD-changes.

>
> Wow that function and the comment make me feel happy nothing will ever
> break here.

Exactly. The comment is there because this will never break and no one
will ever need to read it. :P The assumptions are sane, though. If they
were violated, either the GPU doesn't have access to the queue or the
queue overflowed and the owner of the queue would be in trouble anyway.

Regards,
  Felix


>
> Dave.

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 04/21] drm/amdgpu: Add GFXv9 kfd2kgd interface functions
       [not found]             ` <b5ca1829-dc53-6a07-5966-50d805cdb8d9-5C7GfCeVMHo@public.gmane.org>
@ 2018-05-16 10:09               ` Oded Gabbay
  0 siblings, 0 replies; 38+ messages in thread
From: Oded Gabbay @ 2018-05-16 10:09 UTC (permalink / raw)
  To: Felix Kuehling
  Cc: Yong Zhao, Jay Cornwall, John Bridgman, amd-gfx mailing list,
	Dave Airlie, Shaoyun Liu

On Wed, May 16, 2018 at 1:08 AM, Felix Kuehling <felix.kuehling@amd.com> wrote:
> On 2018-05-15 05:41 AM, Dave Airlie wrote:
>>> +static int kgd_hqd_load(struct kgd_dev *kgd, void *mqd, uint32_t pipe_id,
>>> +                       uint32_t queue_id, uint32_t __user *wptr,
>>> +                       uint32_t wptr_shift, uint32_t wptr_mask,
>>> +                       struct mm_struct *mm)
>>> +{
>>> +       struct amdgpu_device *adev = get_amdgpu_device(kgd);
>>> +       struct v9_mqd *m;
>>> +       uint32_t *mqd_hqd;
>>> +       uint32_t reg, hqd_base, data;
>>> +
>>> +       m = get_mqd(mqd);
>>> +
>>> +       acquire_queue(kgd, pipe_id, queue_id);
>>> +
>>> +       /* HIQ is set during driver init period with vmid set to 0*/
>>> +       if (m->cp_hqd_vmid == 0) {
>>> +               uint32_t value, mec, pipe;
>>> +
>>> +               mec = (pipe_id / adev->gfx.mec.num_pipe_per_mec) + 1;
>>> +               pipe = (pipe_id % adev->gfx.mec.num_pipe_per_mec);
>>> +
>>> +               pr_debug("kfd: set HIQ, mec:%d, pipe:%d, queue:%d.\n",
>>> +                       mec, pipe, queue_id);
>>> +               value = RREG32(SOC15_REG_OFFSET(GC, 0, mmRLC_CP_SCHEDULERS));
>>> +               value = REG_SET_FIELD(value, RLC_CP_SCHEDULERS, scheduler1,
>>> +                       ((mec << 5) | (pipe << 3) | queue_id | 0x80));
>>> +               WREG32(SOC15_REG_OFFSET(GC, 0, mmRLC_CP_SCHEDULERS), value);
>>> +       }
>>> +
>>> +       /* HQD registers extend from CP_MQD_BASE_ADDR to CP_HQD_EOP_WPTR_MEM. */
>>> +       mqd_hqd = &m->cp_mqd_base_addr_lo;
>>> +       hqd_base = SOC15_REG_OFFSET(GC, 0, mmCP_MQD_BASE_ADDR);
>>> +
>>> +       for (reg = hqd_base;
>>> +            reg <= SOC15_REG_OFFSET(GC, 0, mmCP_HQD_PQ_WPTR_HI); reg++)
>>> +               WREG32(reg, mqd_hqd[reg - hqd_base]);
>>> +
>>> +
>>> +       /* Activate doorbell logic before triggering WPTR poll. */
>>> +       data = REG_SET_FIELD(m->cp_hqd_pq_doorbell_control,
>>> +                            CP_HQD_PQ_DOORBELL_CONTROL, DOORBELL_EN, 1);
>>> +       WREG32(SOC15_REG_OFFSET(GC, 0, mmCP_HQD_PQ_DOORBELL_CONTROL), data);
>>> +
>>> +       if (wptr) {
>>> +               /* Don't read wptr with get_user because the user
>>> +                * context may not be accessible (if this function
>>> +                * runs in a work queue). Instead trigger a one-shot
>>> +                * polling read from memory in the CP. This assumes
>>> +                * that wptr is GPU-accessible in the queue's VMID via
>>> +                * ATC or SVM. WPTR==RPTR before starting the poll so
>>> +                * the CP starts fetching new commands from the right
>>> +                * place.
>>> +                *
>>> +                * Guessing a 64-bit WPTR from a 32-bit RPTR is a bit
>>> +                * tricky. Assume that the queue didn't overflow. The
>>> +                * number of valid bits in the 32-bit RPTR depends on
>>> +                * the queue size. The remaining bits are taken from
>>> +                * the saved 64-bit WPTR. If the WPTR wrapped, add the
>>> +                * queue size.
>>> +                */
>>> +               uint32_t queue_size =
>>> +                       2 << REG_GET_FIELD(m->cp_hqd_pq_control,
>>> +                                          CP_HQD_PQ_CONTROL, QUEUE_SIZE);
>>> +               uint64_t guessed_wptr = m->cp_hqd_pq_rptr & (queue_size - 1);
>>> +
>>> +               if ((m->cp_hqd_pq_wptr_lo & (queue_size - 1)) < guessed_wptr)
>>> +                       guessed_wptr += queue_size;
>>> +               guessed_wptr += m->cp_hqd_pq_wptr_lo & ~(queue_size - 1);
>>> +               guessed_wptr += (uint64_t)m->cp_hqd_pq_wptr_hi << 32;
>>> +
>>> +               WREG32(SOC15_REG_OFFSET(GC, 0, mmCP_HQD_PQ_WPTR_LO),
>>> +                      lower_32_bits(guessed_wptr));
>>> +               WREG32(SOC15_REG_OFFSET(GC, 0, mmCP_HQD_PQ_WPTR_HI),
>>> +                      upper_32_bits(guessed_wptr));
>>> +               WREG32(SOC15_REG_OFFSET(GC, 0, mmCP_HQD_PQ_WPTR_POLL_ADDR),
>>> +                      lower_32_bits((uint64_t)wptr));
>>> +               WREG32(SOC15_REG_OFFSET(GC, 0, mmCP_HQD_PQ_WPTR_POLL_ADDR_HI),
>>> +                      upper_32_bits((uint64_t)wptr));
>>  CC [M]  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.o
>> In file included from
>> /home/airlied/devel/kernel/dim/src/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c:30:0:
>> /home/airlied/devel/kernel/dim/src/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c:
>> In function ‘kgd_hqd_load’:
>> /home/airlied/devel/kernel/dim/src/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c:473:24:
>> warning: cast from pointer to integer of different size
>> [-Wpointer-to-int-cast]
>>           lower_32_bits((uint64_t)wptr));
>>                         ^
>> /home/airlied/devel/kernel/dim/src/drivers/gpu/drm/amd/amdgpu/amdgpu.h:1666:53:
>> note: in definition of macro ‘WREG32’
>>  #define WREG32(reg, v) amdgpu_mm_wreg(adev, (reg), (v), 0)
>>                                                      ^
>> /home/airlied/devel/kernel/dim/src/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c:473:10:
>> note: in expansion of macro ‘lower_32_bits’
>>           lower_32_bits((uint64_t)wptr));
>>           ^~~~~~~~~~~~~
>> /home/airlied/devel/kernel/dim/src/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c:475:24:
>> warning: cast from pointer to integer of different size
>> [-Wpointer-to-int-cast]
>>           upper_32_bits((uint64_t)wptr));
>>                         ^
>> /home/airlied/devel/kernel/dim/src/drivers/gpu/drm/amd/amdgpu/amdgpu.h:1666:53:
>> note: in definition of macro ‘WREG32’
>>  #define WREG32(reg, v) amdgpu_mm_wreg(adev, (reg), (v), 0)
>>                                                      ^
>> /home/airlied/devel/kernel/dim/src/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c:475:10:
>> note: in expansion of macro ‘upper_32_bits’
>>           upper_32_bits((uint64_t)wptr));
>>           ^~~~~~~~~~~~~
>>
>> On a 32-bit build.
>
> Hmm, I guess we should cast to uintptr_t instead.
>
> That said, I should fix the build system to not compile the
> amdgpu_amdkfd_* files when CONFIG_HSA_AMD is not enabled. That would
> prevent this type of 32-bit or architecture-specific build problems I've
> been seeing from KFD-changes.
>
I sent you a patch to review on doing that.
I checked it on 32 and 64-bit builds.

Thanks,
Oded

>>
>> Wow that function and the comment make me feel happy nothing will ever
>> break here.
>
> Exactly. The comment is there because this will never break and no one
> will ever need to read it. :P The assumptions are sane, though. If they
> were violated, either the GPU doesn't have access to the queue or the
> queue overflowed and the owner of the queue would be in trouble anyway.
>
> Regards,
>   Felix
>
>
>>
>> Dave.
>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2018-05-16 10:09 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-04-10 21:32 [PATCH 00/21] GFXv9/Vega10 support for KFD Felix Kuehling
     [not found] ` <1523395998-31314-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
2018-04-10 21:32   ` [PATCH 01/21] drm/amdgpu: Remove unused interface from kfd2kgd interface Felix Kuehling
2018-04-10 21:32   ` [PATCH 02/21] drm/amd: Update GFXv9 SDMA MQD structure Felix Kuehling
2018-04-10 21:33   ` [PATCH 03/21] drm/amdgpu: Add GFXv9 TLB invalidation packet definition Felix Kuehling
2018-04-10 21:33   ` [PATCH 04/21] drm/amdgpu: Add GFXv9 kfd2kgd interface functions Felix Kuehling
     [not found]     ` <1523395998-31314-5-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
2018-05-15  9:41       ` Dave Airlie
     [not found]         ` <CAPM=9txHAqFw5NqjXE3HjxpZh0p4SDVS0B7wMP6PcMFotvOj6w-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2018-05-15 22:08           ` Felix Kuehling
     [not found]             ` <b5ca1829-dc53-6a07-5966-50d805cdb8d9-5C7GfCeVMHo@public.gmane.org>
2018-05-16 10:09               ` Oded Gabbay
2018-04-10 21:33   ` [PATCH 05/21] drm/amdgpu: Add doorbell routing info to kgd2kfd_shared_resources Felix Kuehling
2018-04-10 21:33   ` [PATCH 06/21] drm/amdkfd: Make doorbell size ASIC-dependent Felix Kuehling
2018-04-10 21:33   ` [PATCH 07/21] drm/amdkfd: Clean up KFD_MMAP_ offset handling Felix Kuehling
     [not found]     ` <1523395998-31314-8-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
2018-05-11  8:52       ` Oded Gabbay
     [not found]         ` <CAFCwf12Z9zSADyx9k1ps4o8-W72N_nZ-mSznZLo-vbMF=8veLA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2018-05-11 15:57           ` Felix Kuehling
     [not found]             ` <f7e5a2e1-8297-ab85-e190-4c0431e91080-5C7GfCeVMHo@public.gmane.org>
2018-05-11 18:57               ` Oded Gabbay
2018-04-10 21:33   ` [PATCH 08/21] drm/amdkfd: Implement doorbell allocation for SOC15 Felix Kuehling
2018-04-10 21:33   ` [PATCH 09/21] drm/amdkfd: Move packet writer functions into ASIC-specific file Felix Kuehling
2018-04-10 21:33   ` [PATCH 10/21] drm/amdkfd: Add GFXv9 PM4 packet writer functions Felix Kuehling
2018-04-10 21:33   ` [PATCH 11/21] drm/amdkfd: Add GFXv9 MQD manager Felix Kuehling
     [not found]     ` <1523395998-31314-12-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
2018-05-11  9:10       ` Oded Gabbay
     [not found]         ` <CAFCwf12Lj17xZTk43bD3YEM9Gc=poNN_G7w+JxLbPU=EqH7Y5g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2018-05-11 18:15           ` Felix Kuehling
     [not found]             ` <6c9f12ce-3b30-93bc-7a7b-499208861420-5C7GfCeVMHo@public.gmane.org>
2018-05-11 19:06               ` Oded Gabbay
     [not found]                 ` <CAFCwf10UquT1mycsc9HGveaX7rgwSJhez+e9F-N6E=GMFH=-GQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2018-05-11 21:50                   ` Felix Kuehling
2018-04-10 21:33   ` [PATCH 12/21] drm/amdkfd: Add GFXv9 device queue manager Felix Kuehling
2018-04-10 21:33   ` [PATCH 13/21] drm/amdkfd: Add SOC15 interrupt processing support Felix Kuehling
2018-04-10 21:33   ` [PATCH 14/21] drm/amdkfd: Fix goto usage Felix Kuehling
2018-04-10 21:33   ` [PATCH 15/21] drm/amdkfd: Fix kernel queue rollback_packet Felix Kuehling
2018-04-10 21:33   ` [PATCH 16/21] drm/amdkfd: Add 64-bit doorbell and wptr support to kernel queue Felix Kuehling
     [not found]     ` <1523395998-31314-17-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
2018-04-24 21:42       ` Felix Kuehling
     [not found]         ` <ba222e85-d524-d611-0efe-11850ba48ff3-5C7GfCeVMHo@public.gmane.org>
2018-05-11 20:07           ` Oded Gabbay
2018-04-10 21:33   ` [PATCH 17/21] drm/amdkfd: Remove limit on number of GPUs (follow-up) Felix Kuehling
2018-04-10 21:33   ` [PATCH 18/21] drm/amdkfd: Support flat memory apertures for GFXv9 Felix Kuehling
2018-04-10 21:33   ` [PATCH 19/21] drm/amdkfd: Add GFXv9 CWSR trap handler Felix Kuehling
2018-04-10 21:33   ` [PATCH 20/21] drm/amdkfd: Try to enable atomics for all GPUs Felix Kuehling
2018-04-10 21:33   ` [PATCH 21/21] drm/amdkfd: Add Vega10 topology and device info Felix Kuehling
2018-04-10 21:58   ` [PATCH 00/21] GFXv9/Vega10 support for KFD Oded Gabbay
2018-05-11 20:08   ` Oded Gabbay
2018-05-14 14:27   ` Tom Stellard
     [not found]     ` <4b467be5-8c51-c08c-c116-6041a1838aff-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2018-05-14 15:34       ` Felix Kuehling

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.