* [PATCH 00/25] Add KFD GPUVM support for dGPUs v2
@ 2018-02-07  1:32 Felix Kuehling
       [not found] ` <1517967174-21709-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: Felix Kuehling @ 2018-02-07  1:32 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling

Changes since v1:
* Rebased
* Several fixes
* Conditional IOMMU change carried over from previous patch series and updated

AMDGPU:
Patches 1-5 are minor cleanups and fixes
Patches 6-10 add and implement KFD->KGD interfaces for GPUVM

AMDKFD:
Patch 11: Make IOMMU conditional (carried over from previous patch series)
Patches 12-13: small fixes
Patches 14-25 add all the GPUVM memory management functionality

Felix Kuehling (22):
  drm/amdgpu: remove useless BUG_ONs
  drm/amdgpu: Fix header file dependencies
  drm/amdgpu: Fix wrong mask in get_atc_vmid_pasid_mapping_pasid
  drm/amdgpu: Remove unused kfd2kgd interface
  drm/amdgpu: Add KFD eviction fence
  drm/amdgpu: Update kgd2kfd_shared_resources for dGPU support
  drm/amdgpu: add amdgpu_sync_clone
  drm/amdgpu: Add GPUVM memory management functions for KFD
  drm/amdgpu: Add submit IB function for KFD
  drm/amdkfd: Centralize IOMMUv2 code and make it conditional
  drm/amdkfd: Use per-device sched_policy
  drm/amdkfd: Add GPUVM virtual address space to PDD
  drm/amdkfd: Implement KFD process eviction/restore
  uapi: Fix type used in ioctl parameter structures
  drm/amdkfd: Remove limit on number of GPUs
  drm/amdkfd: Aperture setup for dGPUs
  drm/amdkfd: Add per-process IDR for buffer handles
  drm/amdkfd: Allocate CWSR trap handler memory for dGPUs
  drm/amdkfd: Add TC flush on VMID deallocation for Hawaii
  drm/amdkfd: Add ioctls for GPUVM memory management
  drm/amdkfd: Kmap event page for dGPUs
  drm/amdkfd: Add module option for testing large-BAR functionality

Harish Kasiviswanathan (1):
  drm/amdkfd: Remove unaligned memory access

Oak Zeng (1):
  drm/amdkfd: Populate DRM render device minor

Yong Zhao (1):
  drm/amdgpu: Replace kgd_mem with amdgpu_bo for kernel pinned gtt mem

 drivers/gpu/drm/amd/amdgpu/Makefile                |    2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c         |  128 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h         |  112 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c   |  179 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c  |   80 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c  |   82 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c   | 1501 ++++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c         |    4 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.h         |    2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h           |    6 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c           |   56 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_sync.h           |    1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c            |   25 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h             |    1 +
 drivers/gpu/drm/amd/amdkfd/Kconfig                 |    2 +-
 drivers/gpu/drm/amd/amdkfd/Makefile                |    4 +
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c           |  484 +++++++
 drivers/gpu/drm/amd/amdkfd/kfd_crat.c              |   17 +-
 drivers/gpu/drm/amd/amdkfd/kfd_device.c            |  192 ++-
 .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c  |  288 +++-
 .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.h  |    9 +
 drivers/gpu/drm/amd/amdkfd/kfd_events.c            |   34 +-
 drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c       |   59 +-
 drivers/gpu/drm/amd/amdkfd/kfd_iommu.c             |  356 +++++
 drivers/gpu/drm/amd/amdkfd/kfd_iommu.h             |   78 +
 drivers/gpu/drm/amd/amdkfd/kfd_module.c            |    7 +
 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c   |    9 +-
 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c    |    6 +-
 drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c    |   37 +
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h              |   91 +-
 drivers/gpu/drm/amd/amdkfd/kfd_process.c           |  599 ++++++--
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c          |   20 +-
 drivers/gpu/drm/amd/amdkfd/kfd_topology.h          |    7 +-
 drivers/gpu/drm/amd/include/kgd_kfd_interface.h    |  101 +-
 include/uapi/linux/kfd_ioctl.h                     |   87 +-
 35 files changed, 4280 insertions(+), 386 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_iommu.c
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_iommu.h

-- 
2.7.4

* [PATCH 01/25] drm/amdgpu: remove useless BUG_ONs
       [not found] ` <1517967174-21709-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
@ 2018-02-07  1:32   ` Felix Kuehling
  2018-02-07  1:32   ` [PATCH 02/25] drm/amdgpu: Replace kgd_mem with amdgpu_bo for kernel pinned gtt mem Felix Kuehling
                     ` (23 subsequent siblings)
  24 siblings, 0 replies; 71+ messages in thread
From: Felix Kuehling @ 2018-02-07  1:32 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling

Dereferencing NULL pointers will cause a BUG anyway. No need to do
an explicit check.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c        | 6 ------
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c | 2 --
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c | 2 --
 3 files changed, 10 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index 3abed1e..1f620b8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -212,10 +212,6 @@ int alloc_gtt_mem(struct kgd_dev *kgd, size_t size,
 	struct kgd_mem **mem = (struct kgd_mem **) mem_obj;
 	int r;
 
-	BUG_ON(kgd == NULL);
-	BUG_ON(gpu_addr == NULL);
-	BUG_ON(cpu_ptr == NULL);
-
 	*mem = kmalloc(sizeof(struct kgd_mem), GFP_KERNEL);
 	if ((*mem) == NULL)
 		return -ENOMEM;
@@ -270,8 +266,6 @@ void free_gtt_mem(struct kgd_dev *kgd, void *mem_obj)
 {
 	struct kgd_mem *mem = (struct kgd_mem *) mem_obj;
 
-	BUG_ON(mem == NULL);
-
 	amdgpu_bo_reserve(mem->bo, true);
 	amdgpu_bo_kunmap(mem->bo);
 	amdgpu_bo_unpin(mem->bo);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
index a9e6aea..74fcb8b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
@@ -812,8 +812,6 @@ static uint16_t get_fw_version(struct kgd_dev *kgd, enum kgd_engine_type type)
 	struct amdgpu_device *adev = (struct amdgpu_device *) kgd;
 	const union amdgpu_firmware_header *hdr;
 
-	BUG_ON(kgd == NULL);
-
 	switch (type) {
 	case KGD_ENGINE_PFP:
 		hdr = (const union amdgpu_firmware_header *)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
index b127259..c70c8e1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
@@ -775,8 +775,6 @@ static uint16_t get_fw_version(struct kgd_dev *kgd, enum kgd_engine_type type)
 	struct amdgpu_device *adev = (struct amdgpu_device *) kgd;
 	const union amdgpu_firmware_header *hdr;
 
-	BUG_ON(kgd == NULL);
-
 	switch (type) {
 	case KGD_ENGINE_PFP:
 		hdr = (const union amdgpu_firmware_header *)
-- 
2.7.4

* [PATCH 02/25] drm/amdgpu: Replace kgd_mem with amdgpu_bo for kernel pinned gtt mem
       [not found] ` <1517967174-21709-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
  2018-02-07  1:32   ` [PATCH 01/25] drm/amdgpu: remove useless BUG_ONs Felix Kuehling
@ 2018-02-07  1:32   ` Felix Kuehling
  2018-02-07  1:32   ` [PATCH 03/25] drm/amdgpu: Fix header file dependencies Felix Kuehling
                     ` (22 subsequent siblings)
  24 siblings, 0 replies; 71+ messages in thread
From: Felix Kuehling @ 2018-02-07  1:32 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Yong Zhao, Felix Kuehling

From: Yong Zhao <yong.zhao@amd.com>

The extra fields in struct kgd_mem aren't actually needed. This struct
will be used for GPUVM allocations later.

Signed-off-by: Yong Zhao <yong.zhao@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 48 ++++++++++++++----------------
 1 file changed, 23 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index 1f620b8..c9f204d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -209,16 +209,13 @@ int alloc_gtt_mem(struct kgd_dev *kgd, size_t size,
 			void **cpu_ptr)
 {
 	struct amdgpu_device *adev = (struct amdgpu_device *)kgd;
-	struct kgd_mem **mem = (struct kgd_mem **) mem_obj;
+	struct amdgpu_bo *bo = NULL;
 	int r;
-
-	*mem = kmalloc(sizeof(struct kgd_mem), GFP_KERNEL);
-	if ((*mem) == NULL)
-		return -ENOMEM;
+	uint64_t gpu_addr_tmp = 0;
+	void *cpu_ptr_tmp = NULL;
 
 	r = amdgpu_bo_create(adev, size, PAGE_SIZE, true, AMDGPU_GEM_DOMAIN_GTT,
-			     AMDGPU_GEM_CREATE_CPU_GTT_USWC, NULL, NULL, 0,
-			     &(*mem)->bo);
+			AMDGPU_GEM_CREATE_CPU_GTT_USWC, NULL, NULL, 0, &bo);
 	if (r) {
 		dev_err(adev->dev,
 			"failed to allocate BO for amdkfd (%d)\n", r);
@@ -226,52 +223,53 @@ int alloc_gtt_mem(struct kgd_dev *kgd, size_t size,
 	}
 
 	/* map the buffer */
-	r = amdgpu_bo_reserve((*mem)->bo, true);
+	r = amdgpu_bo_reserve(bo, true);
 	if (r) {
 		dev_err(adev->dev, "(%d) failed to reserve bo for amdkfd\n", r);
 		goto allocate_mem_reserve_bo_failed;
 	}
 
-	r = amdgpu_bo_pin((*mem)->bo, AMDGPU_GEM_DOMAIN_GTT,
-				&(*mem)->gpu_addr);
+	r = amdgpu_bo_pin(bo, AMDGPU_GEM_DOMAIN_GTT,
+				&gpu_addr_tmp);
 	if (r) {
 		dev_err(adev->dev, "(%d) failed to pin bo for amdkfd\n", r);
 		goto allocate_mem_pin_bo_failed;
 	}
-	*gpu_addr = (*mem)->gpu_addr;
 
-	r = amdgpu_bo_kmap((*mem)->bo, &(*mem)->cpu_ptr);
+	r = amdgpu_bo_kmap(bo, &cpu_ptr_tmp);
 	if (r) {
 		dev_err(adev->dev,
 			"(%d) failed to map bo to kernel for amdkfd\n", r);
 		goto allocate_mem_kmap_bo_failed;
 	}
-	*cpu_ptr = (*mem)->cpu_ptr;
 
-	amdgpu_bo_unreserve((*mem)->bo);
+	*mem_obj = bo;
+	*gpu_addr = gpu_addr_tmp;
+	*cpu_ptr = cpu_ptr_tmp;
+
+	amdgpu_bo_unreserve(bo);
 
 	return 0;
 
 allocate_mem_kmap_bo_failed:
-	amdgpu_bo_unpin((*mem)->bo);
+	amdgpu_bo_unpin(bo);
 allocate_mem_pin_bo_failed:
-	amdgpu_bo_unreserve((*mem)->bo);
+	amdgpu_bo_unreserve(bo);
 allocate_mem_reserve_bo_failed:
-	amdgpu_bo_unref(&(*mem)->bo);
+	amdgpu_bo_unref(&bo);
 
 	return r;
 }
 
 void free_gtt_mem(struct kgd_dev *kgd, void *mem_obj)
 {
-	struct kgd_mem *mem = (struct kgd_mem *) mem_obj;
-
-	amdgpu_bo_reserve(mem->bo, true);
-	amdgpu_bo_kunmap(mem->bo);
-	amdgpu_bo_unpin(mem->bo);
-	amdgpu_bo_unreserve(mem->bo);
-	amdgpu_bo_unref(&(mem->bo));
-	kfree(mem);
+	struct amdgpu_bo *bo = (struct amdgpu_bo *) mem_obj;
+
+	amdgpu_bo_reserve(bo, true);
+	amdgpu_bo_kunmap(bo);
+	amdgpu_bo_unpin(bo);
+	amdgpu_bo_unreserve(bo);
+	amdgpu_bo_unref(&(bo));
 }
 
 void get_local_mem_info(struct kgd_dev *kgd,
-- 
2.7.4

* [PATCH 03/25] drm/amdgpu: Fix header file dependencies
       [not found] ` <1517967174-21709-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
  2018-02-07  1:32   ` [PATCH 01/25] drm/amdgpu: remove useless BUG_ONs Felix Kuehling
  2018-02-07  1:32   ` [PATCH 02/25] drm/amdgpu: Replace kgd_mem with amdgpu_bo for kernel pinned gtt mem Felix Kuehling
@ 2018-02-07  1:32   ` Felix Kuehling
  2018-02-07  1:32   ` [PATCH 04/25] drm/amdgpu: Fix wrong mask in get_atc_vmid_pasid_mapping_pasid Felix Kuehling
                     ` (21 subsequent siblings)
  24 siblings, 0 replies; 71+ messages in thread
From: Felix Kuehling @ 2018-02-07  1:32 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h   | 1 +
 2 files changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
index 102dad3..65d5a4e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -26,6 +26,7 @@
 
 #include <drm/amdgpu_drm.h>
 #include <drm/gpu_scheduler.h>
+#include <drm/drm_print.h>
 
 /* max number of rings */
 #define AMDGPU_MAX_RINGS		18
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index 21a80f1..13c367a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -28,6 +28,7 @@
 #include <linux/kfifo.h>
 #include <linux/rbtree.h>
 #include <drm/gpu_scheduler.h>
+#include <drm/drm_file.h>
 
 #include "amdgpu_sync.h"
 #include "amdgpu_ring.h"
-- 
2.7.4

* [PATCH 04/25] drm/amdgpu: Fix wrong mask in get_atc_vmid_pasid_mapping_pasid
       [not found] ` <1517967174-21709-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (2 preceding siblings ...)
  2018-02-07  1:32   ` [PATCH 03/25] drm/amdgpu: Fix header file dependencies Felix Kuehling
@ 2018-02-07  1:32   ` Felix Kuehling
  2018-02-07  1:32   ` [PATCH 05/25] drm/amdgpu: Remove unused kfd2kgd interface Felix Kuehling
                     ` (20 subsequent siblings)
  24 siblings, 0 replies; 71+ messages in thread
From: Felix Kuehling @ 2018-02-07  1:32 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c | 2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
index 74fcb8b..b8be7b96 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
@@ -787,7 +787,7 @@ static uint16_t get_atc_vmid_pasid_mapping_pasid(struct kgd_dev *kgd,
 	struct amdgpu_device *adev = (struct amdgpu_device *) kgd;
 
 	reg = RREG32(mmATC_VMID0_PASID_MAPPING + vmid);
-	return reg & ATC_VMID0_PASID_MAPPING__VALID_MASK;
+	return reg & ATC_VMID0_PASID_MAPPING__PASID_MASK;
 }
 
 static void write_vmid_invalidate_request(struct kgd_dev *kgd, uint8_t vmid)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
index c70c8e1..744c05b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
@@ -704,7 +704,7 @@ static uint16_t get_atc_vmid_pasid_mapping_pasid(struct kgd_dev *kgd,
 	struct amdgpu_device *adev = (struct amdgpu_device *) kgd;
 
 	reg = RREG32(mmATC_VMID0_PASID_MAPPING + vmid);
-	return reg & ATC_VMID0_PASID_MAPPING__VALID_MASK;
+	return reg & ATC_VMID0_PASID_MAPPING__PASID_MASK;
 }
 
 static void write_vmid_invalidate_request(struct kgd_dev *kgd, uint8_t vmid)
-- 
2.7.4

* [PATCH 05/25] drm/amdgpu: Remove unused kfd2kgd interface
       [not found] ` <1517967174-21709-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (3 preceding siblings ...)
  2018-02-07  1:32   ` [PATCH 04/25] drm/amdgpu: Fix wrong mask in get_atc_vmid_pasid_mapping_pasid Felix Kuehling
@ 2018-02-07  1:32   ` Felix Kuehling
       [not found]     ` <1517967174-21709-6-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
  2018-02-07  1:32   ` [PATCH 06/25] drm/amdgpu: Add KFD eviction fence Felix Kuehling
                     ` (19 subsequent siblings)
  24 siblings, 1 reply; 71+ messages in thread
From: Felix Kuehling @ 2018-02-07  1:32 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c |  9 ---------
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c | 10 ----------
 drivers/gpu/drm/amd/include/kgd_kfd_interface.h   |  2 --
 3 files changed, 21 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
index b8be7b96..1362181 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
@@ -139,7 +139,6 @@ static uint32_t kgd_address_watch_get_offset(struct kgd_dev *kgd,
 static bool get_atc_vmid_pasid_mapping_valid(struct kgd_dev *kgd, uint8_t vmid);
 static uint16_t get_atc_vmid_pasid_mapping_pasid(struct kgd_dev *kgd,
 							uint8_t vmid);
-static void write_vmid_invalidate_request(struct kgd_dev *kgd, uint8_t vmid);
 
 static uint16_t get_fw_version(struct kgd_dev *kgd, enum kgd_engine_type type);
 static void set_scratch_backing_va(struct kgd_dev *kgd,
@@ -196,7 +195,6 @@ static const struct kfd2kgd_calls kfd2kgd = {
 	.address_watch_get_offset = kgd_address_watch_get_offset,
 	.get_atc_vmid_pasid_mapping_pasid = get_atc_vmid_pasid_mapping_pasid,
 	.get_atc_vmid_pasid_mapping_valid = get_atc_vmid_pasid_mapping_valid,
-	.write_vmid_invalidate_request = write_vmid_invalidate_request,
 	.get_fw_version = get_fw_version,
 	.set_scratch_backing_va = set_scratch_backing_va,
 	.get_tile_config = get_tile_config,
@@ -790,13 +788,6 @@ static uint16_t get_atc_vmid_pasid_mapping_pasid(struct kgd_dev *kgd,
 	return reg & ATC_VMID0_PASID_MAPPING__PASID_MASK;
 }
 
-static void write_vmid_invalidate_request(struct kgd_dev *kgd, uint8_t vmid)
-{
-	struct amdgpu_device *adev = (struct amdgpu_device *) kgd;
-
-	WREG32(mmVM_INVALIDATE_REQUEST, 1 << vmid);
-}
-
 static void set_scratch_backing_va(struct kgd_dev *kgd,
 					uint64_t va, uint32_t vmid)
 {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
index 744c05b..5130eac 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
@@ -81,7 +81,6 @@ static int kgd_hqd_destroy(struct kgd_dev *kgd, void *mqd,
 				uint32_t queue_id);
 static int kgd_hqd_sdma_destroy(struct kgd_dev *kgd, void *mqd,
 				unsigned int utimeout);
-static void write_vmid_invalidate_request(struct kgd_dev *kgd, uint8_t vmid);
 static int kgd_address_watch_disable(struct kgd_dev *kgd);
 static int kgd_address_watch_execute(struct kgd_dev *kgd,
 					unsigned int watch_point_id,
@@ -99,7 +98,6 @@ static bool get_atc_vmid_pasid_mapping_valid(struct kgd_dev *kgd,
 		uint8_t vmid);
 static uint16_t get_atc_vmid_pasid_mapping_pasid(struct kgd_dev *kgd,
 		uint8_t vmid);
-static void write_vmid_invalidate_request(struct kgd_dev *kgd, uint8_t vmid);
 static uint16_t get_fw_version(struct kgd_dev *kgd, enum kgd_engine_type type);
 static void set_scratch_backing_va(struct kgd_dev *kgd,
 					uint64_t va, uint32_t vmid);
@@ -157,7 +155,6 @@ static const struct kfd2kgd_calls kfd2kgd = {
 			get_atc_vmid_pasid_mapping_pasid,
 	.get_atc_vmid_pasid_mapping_valid =
 			get_atc_vmid_pasid_mapping_valid,
-	.write_vmid_invalidate_request = write_vmid_invalidate_request,
 	.get_fw_version = get_fw_version,
 	.set_scratch_backing_va = set_scratch_backing_va,
 	.get_tile_config = get_tile_config,
@@ -707,13 +704,6 @@ static uint16_t get_atc_vmid_pasid_mapping_pasid(struct kgd_dev *kgd,
 	return reg & ATC_VMID0_PASID_MAPPING__PASID_MASK;
 }
 
-static void write_vmid_invalidate_request(struct kgd_dev *kgd, uint8_t vmid)
-{
-	struct amdgpu_device *adev = (struct amdgpu_device *) kgd;
-
-	WREG32(mmVM_INVALIDATE_REQUEST, 1 << vmid);
-}
-
 static int kgd_address_watch_disable(struct kgd_dev *kgd)
 {
 	return 0;
diff --git a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
index a6752bd..94eab548 100644
--- a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
+++ b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
@@ -258,8 +258,6 @@ struct kfd2kgd_calls {
 	uint16_t (*get_atc_vmid_pasid_mapping_pasid)(
 					struct kgd_dev *kgd,
 					uint8_t vmid);
-	void (*write_vmid_invalidate_request)(struct kgd_dev *kgd,
-					uint8_t vmid);
 
 	uint16_t (*get_fw_version)(struct kgd_dev *kgd,
 				enum kgd_engine_type type);
-- 
2.7.4

* [PATCH 06/25] drm/amdgpu: Add KFD eviction fence
       [not found] ` <1517967174-21709-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (4 preceding siblings ...)
  2018-02-07  1:32   ` [PATCH 05/25] drm/amdgpu: Remove unused kfd2kgd interface Felix Kuehling
@ 2018-02-07  1:32   ` Felix Kuehling
       [not found]     ` <1517967174-21709-7-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
  2018-02-07  1:32   ` [PATCH 07/25] drm/amdgpu: Update kgd2kfd_shared_resources for dGPU support Felix Kuehling
                     ` (18 subsequent siblings)
  24 siblings, 1 reply; 71+ messages in thread
From: Felix Kuehling @ 2018-02-07  1:32 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling, Harish Kasiviswanathan

This fence is used by KFD to keep memory resident while user mode
queues are enabled. Trying to evict memory will trigger the
enable_signaling callback, which starts a KFD eviction, which
involves preempting user mode queues before signaling the fence.
There is one such fence per process.
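
(Not part of the patch: a minimal usage sketch. The call site is an
assumption; only amdgpu_amdkfd_fence_create() comes from this patch.)

	struct amdgpu_amdkfd_fence *ef;

	/* one fence per process, tied to the process' mm_struct */
	ef = amdgpu_amdkfd_fence_create(dma_fence_context_alloc(1),
					current->mm);
	if (!ef)
		return -ENOMEM;

	/* The fence is later added to the reservation object of every BO
	 * of the process, so a TTM eviction attempt ends up in
	 * enable_signaling() and schedules the KFD eviction work.
	 */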

v2:
* Grab a reference to mm_struct
* Dereference fence after NULL check
* Simplify fence release, no need to signal without anyone waiting
* Added signed-off-by Harish, who is the original author of this code

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/Makefile              |   1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h       |  15 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c | 179 +++++++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h         |   5 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c         |  21 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c          |  18 +++
 drivers/gpu/drm/amd/include/kgd_kfd_interface.h  |   6 +
 7 files changed, 241 insertions(+), 4 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c

diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile
index d6e5b72..43dc3f9 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -130,6 +130,7 @@ amdgpu-y += \
 # add amdkfd interfaces
 amdgpu-y += \
 	 amdgpu_amdkfd.o \
+	 amdgpu_amdkfd_fence.o \
 	 amdgpu_amdkfd_gfx_v8.o
 
 # add cgs
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index 2a519f9..492c7af 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -29,6 +29,8 @@
 #include <linux/mmu_context.h>
 #include <kgd_kfd_interface.h>
 
+extern const struct kgd2kfd_calls *kgd2kfd;
+
 struct amdgpu_device;
 
 struct kgd_mem {
@@ -37,6 +39,19 @@ struct kgd_mem {
 	void *cpu_ptr;
 };
 
+/* KFD Memory Eviction */
+struct amdgpu_amdkfd_fence {
+	struct dma_fence base;
+	struct mm_struct *mm;
+	spinlock_t lock;
+	char timeline_name[TASK_COMM_LEN];
+};
+
+struct amdgpu_amdkfd_fence *amdgpu_amdkfd_fence_create(u64 context,
+						       struct mm_struct *mm);
+bool amd_kfd_fence_check_mm(struct dma_fence *f, struct mm_struct *mm);
+struct amdgpu_amdkfd_fence *to_amdgpu_amdkfd_fence(struct dma_fence *f);
+
 int amdgpu_amdkfd_init(void);
 void amdgpu_amdkfd_fini(void);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
new file mode 100644
index 0000000..cf2f1e9
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
@@ -0,0 +1,179 @@
+/*
+ * Copyright 2016-2018 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#include <linux/dma-fence.h>
+#include <linux/spinlock.h>
+#include <linux/atomic.h>
+#include <linux/stacktrace.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/sched/mm.h>
+#include "amdgpu_amdkfd.h"
+
+const struct dma_fence_ops amd_kfd_fence_ops;
+static atomic_t fence_seq = ATOMIC_INIT(0);
+
+/* Eviction Fence
+ * Fence helper functions to deal with KFD memory eviction.
+ * Big Idea - Since KFD submissions are done by user queues, a BO cannot be
+ *  evicted unless all the user queues for that process are evicted.
+ *
+ * All the BOs in a process share an eviction fence. When process X wants
+ * to map VRAM memory but TTM can't find enough space, TTM will attempt to
+ * evict BOs from its LRU list. TTM checks if the BO is valuable to evict
+ * by calling ttm_bo_driver->eviction_valuable().
+ *
+ * ttm_bo_driver->eviction_valuable() - will return false if the BO belongs
+ *  to process X. Otherwise, it will return true to indicate BO can be
+ *  evicted by TTM.
+ *
+ * If ttm_bo_driver->eviction_valuable returns true, then TTM will continue
+ * the eviction process for that BO by calling ttm_bo_evict --> amdgpu_bo_move
+ * --> amdgpu_copy_buffer(). This sets up a job in the GPU scheduler.
+ *
+ * GPU Scheduler (amd_sched_main) - sets up a cb (fence_add_callback) to
+ *  notify when the BO is free to move. fence_add_callback --> enable_signaling
+ *  --> amdgpu_amdkfd_fence.enable_signaling
+ *
+ * amdgpu_amdkfd_fence.enable_signaling - Start a work item that will quiesce
+ * user queues and signal fence. The work item will also start another delayed
+ * work item to restore BOs
+ */
+
+struct amdgpu_amdkfd_fence *amdgpu_amdkfd_fence_create(u64 context,
+						       struct mm_struct *mm)
+{
+	struct amdgpu_amdkfd_fence *fence = NULL;
+
+	fence = kzalloc(sizeof(*fence), GFP_KERNEL);
+	if (fence == NULL)
+		return NULL;
+
+	/* This reference gets released in amd_kfd_fence_release */
+	mmgrab(mm);
+	fence->mm = mm;
+	get_task_comm(fence->timeline_name, current);
+	spin_lock_init(&fence->lock);
+
+	dma_fence_init(&fence->base, &amd_kfd_fence_ops, &fence->lock,
+		   context, atomic_inc_return(&fence_seq));
+
+	return fence;
+}
+
+struct amdgpu_amdkfd_fence *to_amdgpu_amdkfd_fence(struct dma_fence *f)
+{
+	struct amdgpu_amdkfd_fence *fence;
+
+	if (!f)
+		return NULL;
+
+	fence = container_of(f, struct amdgpu_amdkfd_fence, base);
+	if (fence && f->ops == &amd_kfd_fence_ops)
+		return fence;
+
+	return NULL;
+}
+
+static const char *amd_kfd_fence_get_driver_name(struct dma_fence *f)
+{
+	return "amdgpu_amdkfd_fence";
+}
+
+static const char *amd_kfd_fence_get_timeline_name(struct dma_fence *f)
+{
+	struct amdgpu_amdkfd_fence *fence = to_amdgpu_amdkfd_fence(f);
+
+	return fence->timeline_name;
+}
+
+/**
+ * amd_kfd_fence_enable_signaling - This gets called when TTM wants to evict
+ *  a KFD BO and schedules a job to move the BO.
+ *  If the fence is already signaled, return true.
+ *  If the fence is not signaled, schedule a KFD process eviction work item.
+ */
+static bool amd_kfd_fence_enable_signaling(struct dma_fence *f)
+{
+	struct amdgpu_amdkfd_fence *fence = to_amdgpu_amdkfd_fence(f);
+
+	if (!fence)
+		return false;
+
+	if (dma_fence_is_signaled(f))
+		return true;
+
+	if (!kgd2kfd->schedule_evict_and_restore_process(fence->mm, f))
+		return true;
+
+	return false;
+}
+
+/**
+ * amd_kfd_fence_release - callback that fence can be freed
+ *
+ * @fence: fence
+ *
+ * This function is called when the reference count becomes zero.
+ * Drops the mm_struct reference and RCU schedules freeing up the fence.
+ */
+static void amd_kfd_fence_release(struct dma_fence *f)
+{
+	struct amdgpu_amdkfd_fence *fence = to_amdgpu_amdkfd_fence(f);
+
+	/* Unconditionally signal the fence. The process is getting
+	 * terminated.
+	 */
+	if (WARN_ON(!fence))
+		return; /* Not an amdgpu_amdkfd_fence */
+
+	mmdrop(fence->mm);
+	kfree_rcu(f, rcu);
+}
+
+/**
+ * amd_kfd_fence_check_mm - Check whether @mm is the same as that of fence @f.
+ *  Returns true if they match, false otherwise.
+ *
+ * @f: [IN] fence
+ * @mm: [IN] mm that needs to be verified
+ */
+bool amd_kfd_fence_check_mm(struct dma_fence *f, struct mm_struct *mm)
+{
+	struct amdgpu_amdkfd_fence *fence = to_amdgpu_amdkfd_fence(f);
+
+	if (!fence)
+		return false;
+	else if (fence->mm == mm)
+		return true;
+
+	return false;
+}
+
+const struct dma_fence_ops amd_kfd_fence_ops = {
+	.get_driver_name = amd_kfd_fence_get_driver_name,
+	.get_timeline_name = amd_kfd_fence_get_timeline_name,
+	.enable_signaling = amd_kfd_fence_enable_signaling,
+	.signaled = NULL,
+	.wait = dma_fence_default_wait,
+	.release = amd_kfd_fence_release,
+};
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
index 65d5a4e..ca00dd2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -36,8 +36,9 @@
 #define AMDGPU_MAX_UVD_ENC_RINGS	2
 
 /* some special values for the owner field */
-#define AMDGPU_FENCE_OWNER_UNDEFINED	((void*)0ul)
-#define AMDGPU_FENCE_OWNER_VM		((void*)1ul)
+#define AMDGPU_FENCE_OWNER_UNDEFINED	((void *)0ul)
+#define AMDGPU_FENCE_OWNER_VM		((void *)1ul)
+#define AMDGPU_FENCE_OWNER_KFD		((void *)2ul)
 
 #define AMDGPU_FENCE_FLAG_64BIT         (1 << 0)
 #define AMDGPU_FENCE_FLAG_INT           (1 << 1)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
index df65c66..b8d3b87 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
@@ -31,6 +31,7 @@
 #include <drm/drmP.h>
 #include "amdgpu.h"
 #include "amdgpu_trace.h"
+#include "amdgpu_amdkfd.h"
 
 struct amdgpu_sync_entry {
 	struct hlist_node	node;
@@ -85,11 +86,20 @@ static bool amdgpu_sync_same_dev(struct amdgpu_device *adev,
  */
 static void *amdgpu_sync_get_owner(struct dma_fence *f)
 {
-	struct drm_sched_fence *s_fence = to_drm_sched_fence(f);
+	struct drm_sched_fence *s_fence;
+	struct amdgpu_amdkfd_fence *kfd_fence;
+
+	if (!f)
+		return AMDGPU_FENCE_OWNER_UNDEFINED;
 
+	s_fence = to_drm_sched_fence(f);
 	if (s_fence)
 		return s_fence->owner;
 
+	kfd_fence = to_amdgpu_amdkfd_fence(f);
+	if (kfd_fence)
+		return AMDGPU_FENCE_OWNER_KFD;
+
 	return AMDGPU_FENCE_OWNER_UNDEFINED;
 }
 
@@ -204,11 +214,18 @@ int amdgpu_sync_resv(struct amdgpu_device *adev,
 	for (i = 0; i < flist->shared_count; ++i) {
 		f = rcu_dereference_protected(flist->shared[i],
 					      reservation_object_held(resv));
+		/* We only want to trigger KFD eviction fences on
+		 * evict or move jobs. Skip KFD fences otherwise.
+		 */
+		fence_owner = amdgpu_sync_get_owner(f);
+		if (fence_owner == AMDGPU_FENCE_OWNER_KFD &&
+		    owner != AMDGPU_FENCE_OWNER_UNDEFINED)
+			continue;
+
 		if (amdgpu_sync_same_dev(adev, f)) {
 			/* VM updates are only interesting
 			 * for other VM updates and moves.
 			 */
-			fence_owner = amdgpu_sync_get_owner(f);
 			if ((owner != AMDGPU_FENCE_OWNER_UNDEFINED) &&
 			    (fence_owner != AMDGPU_FENCE_OWNER_UNDEFINED) &&
 			    ((owner == AMDGPU_FENCE_OWNER_VM) !=
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index e4bb435..c3f33d3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -46,6 +46,7 @@
 #include "amdgpu.h"
 #include "amdgpu_object.h"
 #include "amdgpu_trace.h"
+#include "amdgpu_amdkfd.h"
 #include "bif/bif_4_1_d.h"
 
 #define DRM_FILE_PAGE_OFFSET (0x100000000ULL >> PAGE_SHIFT)
@@ -1170,6 +1171,23 @@ static bool amdgpu_ttm_bo_eviction_valuable(struct ttm_buffer_object *bo,
 {
 	unsigned long num_pages = bo->mem.num_pages;
 	struct drm_mm_node *node = bo->mem.mm_node;
+	struct reservation_object_list *flist;
+	struct dma_fence *f;
+	int i;
+
+	/* If bo is a KFD BO, check if the bo belongs to the current process.
+	 * If true, then return false as any KFD process needs all its BOs to
+	 * be resident to run successfully
+	 */
+	flist = reservation_object_get_list(bo->resv);
+	if (flist) {
+		for (i = 0; i < flist->shared_count; ++i) {
+			f = rcu_dereference_protected(flist->shared[i],
+				reservation_object_held(bo->resv));
+			if (amd_kfd_fence_check_mm(f, current->mm))
+				return false;
+		}
+	}
 
 	switch (bo->mem.mem_type) {
 	case TTM_PL_TT:
diff --git a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
index 94eab548..9e35249 100644
--- a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
+++ b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
@@ -30,6 +30,7 @@
 
 #include <linux/types.h>
 #include <linux/bitmap.h>
+#include <linux/dma-fence.h>
 
 struct pci_dev;
 
@@ -286,6 +287,9 @@ struct kfd2kgd_calls {
  *
  * @resume: Notifies amdkfd about a resume action done to a kgd device
  *
+ * @schedule_evict_and_restore_process: Schedules work queue that will prepare
+ * for safe eviction of KFD BOs that belong to the specified process.
+ *
  * This structure contains function callback pointers so the kgd driver
  * will notify to the amdkfd about certain status changes.
  *
@@ -300,6 +304,8 @@ struct kgd2kfd_calls {
 	void (*interrupt)(struct kfd_dev *kfd, const void *ih_ring_entry);
 	void (*suspend)(struct kfd_dev *kfd);
 	int (*resume)(struct kfd_dev *kfd);
+	int (*schedule_evict_and_restore_process)(struct mm_struct *mm,
+			struct dma_fence *fence);
 };
 
 int kgd2kfd_init(unsigned interface_version,
-- 
2.7.4

* [PATCH 07/25] drm/amdgpu: Update kgd2kfd_shared_resources for dGPU support
       [not found] ` <1517967174-21709-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (5 preceding siblings ...)
  2018-02-07  1:32   ` [PATCH 06/25] drm/amdgpu: Add KFD eviction fence Felix Kuehling
@ 2018-02-07  1:32   ` Felix Kuehling
       [not found]     ` <1517967174-21709-8-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
  2018-02-07  1:32   ` [PATCH 08/25] drm/amdgpu: add amdgpu_sync_clone Felix Kuehling
                     ` (17 subsequent siblings)
  24 siblings, 1 reply; 71+ messages in thread
From: Felix Kuehling @ 2018-02-07  1:32 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling

Add GPUVM size and DRM render node. Also add function to query the
VMID mask to avoid hard-coding it in multiple places later.
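
(Illustration only, not from this patch: with the new helper a caller can
test a VMID instead of repeating the 0xFF00 mask, e.g. in a fault handler.)

	if (amdgpu_amdkfd_is_kfd_vmid(adev, vmid))
		return;	/* VMID is managed by amdkfd, not amdgpu */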

v2:
* Cut off GPUVM size at the VA hole

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c      | 20 ++++++++++++++++++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h      |  2 ++
 drivers/gpu/drm/amd/include/kgd_kfd_interface.h |  6 ++++++
 3 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index c9f204d..25c2aed 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -30,6 +30,8 @@
 const struct kgd2kfd_calls *kgd2kfd;
 bool (*kgd2kfd_init_p)(unsigned int, const struct kgd2kfd_calls**);
 
+static const unsigned int compute_vmid_bitmap = 0xFF00;
+
 int amdgpu_amdkfd_init(void)
 {
 	int ret;
@@ -137,9 +139,13 @@ void amdgpu_amdkfd_device_init(struct amdgpu_device *adev)
 	int last_valid_bit;
 	if (adev->kfd) {
 		struct kgd2kfd_shared_resources gpu_resources = {
-			.compute_vmid_bitmap = 0xFF00,
+			.compute_vmid_bitmap = compute_vmid_bitmap,
 			.num_pipe_per_mec = adev->gfx.mec.num_pipe_per_mec,
-			.num_queue_per_pipe = adev->gfx.mec.num_queue_per_pipe
+			.num_queue_per_pipe = adev->gfx.mec.num_queue_per_pipe,
+			.gpuvm_size = min(adev->vm_manager.max_pfn
+					  << AMDGPU_GPU_PAGE_SHIFT,
+					  AMDGPU_VA_HOLE_START),
+			.drm_render_minor = adev->ddev->render->index
 		};
 
 		/* this is going to have a few of the MSBs set that we need to
@@ -351,3 +357,13 @@ uint64_t amdgpu_amdkfd_get_vram_usage(struct kgd_dev *kgd)
 
 	return amdgpu_vram_mgr_usage(&adev->mman.bdev.man[TTM_PL_VRAM]);
 }
+
+bool amdgpu_amdkfd_is_kfd_vmid(struct amdgpu_device *adev, u32 vmid)
+{
+	if (adev->kfd) {
+		if ((1 << vmid) & compute_vmid_bitmap)
+			return true;
+	}
+
+	return false;
+}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index 492c7af..9bed9fc 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -66,6 +66,8 @@ void amdgpu_amdkfd_device_fini(struct amdgpu_device *adev);
 struct kfd2kgd_calls *amdgpu_amdkfd_gfx_7_get_functions(void);
 struct kfd2kgd_calls *amdgpu_amdkfd_gfx_8_0_get_functions(void);
 
+bool amdgpu_amdkfd_is_kfd_vmid(struct amdgpu_device *adev, u32 vmid);
+
 /* Shared API */
 int alloc_gtt_mem(struct kgd_dev *kgd, size_t size,
 			void **mem_obj, uint64_t *gpu_addr,
diff --git a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
index 9e35249..36c706a 100644
--- a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
+++ b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
@@ -108,6 +108,12 @@ struct kgd2kfd_shared_resources {
 
 	/* Number of bytes at start of aperture reserved for KGD. */
 	size_t doorbell_start_offset;
+
+	/* GPUVM address space size in bytes */
+	uint64_t gpuvm_size;
+
+	/* Minor device number of the render node */
+	int drm_render_minor;
 };
 
 struct tile_config {
-- 
2.7.4

* [PATCH 08/25] drm/amdgpu: add amdgpu_sync_clone
       [not found] ` <1517967174-21709-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (6 preceding siblings ...)
  2018-02-07  1:32   ` [PATCH 07/25] drm/amdgpu: Update kgd2kfd_shared_resources for dGPU support Felix Kuehling
@ 2018-02-07  1:32   ` Felix Kuehling
       [not found]     ` <1517967174-21709-9-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
  2018-02-07  1:32   ` [PATCH 09/25] drm/amdgpu: Add GPUVM memory management functions for KFD Felix Kuehling
                     ` (16 subsequent siblings)
  24 siblings, 1 reply; 71+ messages in thread
From: Felix Kuehling @ 2018-02-07  1:32 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling

Cloning a sync object is useful for waiting for a sync object
without locking the original structure indefinitely, blocking
other threads.
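
(Sketch, not part of the patch: the intended clone-then-wait pattern.
Locking of the source object is shown only as comments and is assumed to
be handled by the caller.)

	struct amdgpu_sync clone;
	int r;

	amdgpu_sync_create(&clone);

	/* lock protecting 'source' is assumed to be held here */
	r = amdgpu_sync_clone(source, &clone);
	/* the lock can be dropped now; the wait happens on the clone */

	if (!r)
		r = amdgpu_sync_wait(&clone, false);

	amdgpu_sync_free(&clone);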

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c | 35 ++++++++++++++++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_sync.h |  1 +
 2 files changed, 36 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
index b8d3b87..2d6f5ec 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
@@ -322,6 +322,41 @@ struct dma_fence *amdgpu_sync_get_fence(struct amdgpu_sync *sync, bool *explicit
 	return NULL;
 }
 
+/**
+ * amdgpu_sync_clone - clone a sync object
+ *
+ * @source: sync object to clone
+ * @clone: pointer to destination sync object
+ *
+ * Adds references to all unsignaled fences in @source to @clone. Also
+ * removes signaled fences from @source while at it.
+ */
+int amdgpu_sync_clone(struct amdgpu_sync *source, struct amdgpu_sync *clone)
+{
+	struct amdgpu_sync_entry *e;
+	struct hlist_node *tmp;
+	struct dma_fence *f;
+	int i, r;
+
+	hash_for_each_safe(source->fences, i, tmp, e, node) {
+		f = e->fence;
+		if (!dma_fence_is_signaled(f)) {
+			r = amdgpu_sync_fence(NULL, clone, f, e->explicit);
+			if (r)
+				return r;
+		} else {
+			hash_del(&e->node);
+			dma_fence_put(f);
+			kmem_cache_free(amdgpu_sync_slab, e);
+		}
+	}
+
+	dma_fence_put(clone->last_vm_update);
+	clone->last_vm_update = dma_fence_get(source->last_vm_update);
+
+	return 0;
+}
+
 int amdgpu_sync_wait(struct amdgpu_sync *sync, bool intr)
 {
 	struct amdgpu_sync_entry *e;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.h
index 7aba38d..10cf23a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.h
@@ -50,6 +50,7 @@ int amdgpu_sync_resv(struct amdgpu_device *adev,
 struct dma_fence *amdgpu_sync_peek_fence(struct amdgpu_sync *sync,
 				     struct amdgpu_ring *ring);
 struct dma_fence *amdgpu_sync_get_fence(struct amdgpu_sync *sync, bool *explicit);
+int amdgpu_sync_clone(struct amdgpu_sync *source, struct amdgpu_sync *clone);
 int amdgpu_sync_wait(struct amdgpu_sync *sync, bool intr);
 void amdgpu_sync_free(struct amdgpu_sync *sync);
 int amdgpu_sync_init(void);
-- 
2.7.4

* [PATCH 09/25] drm/amdgpu: Add GPUVM memory management functions for KFD
       [not found] ` <1517967174-21709-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (7 preceding siblings ...)
  2018-02-07  1:32   ` [PATCH 08/25] drm/amdgpu: add amdgpu_sync_clone Felix Kuehling
@ 2018-02-07  1:32   ` Felix Kuehling
       [not found]     ` <1517967174-21709-10-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
  2018-02-07  1:32   ` [PATCH 10/25] drm/amdgpu: Add submit IB function " Felix Kuehling
                     ` (15 subsequent siblings)
  24 siblings, 1 reply; 71+ messages in thread
From: Felix Kuehling @ 2018-02-07  1:32 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling

v2:
* Removed unused flags from struct kgd_mem
* Updated some comments
* Added a check to unmap_memory_from_gpu whether BO was mapped

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/Makefile               |    1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c        |    1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h        |   91 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c |   66 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c |   67 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 1501 +++++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c        |    4 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.h        |    2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c           |    7 +
 drivers/gpu/drm/amd/include/kgd_kfd_interface.h   |   77 ++
 10 files changed, 1813 insertions(+), 4 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c

diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile
index 43dc3f9..180b2a8 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -131,6 +131,7 @@ amdgpu-y += \
 amdgpu-y += \
 	 amdgpu_amdkfd.o \
 	 amdgpu_amdkfd_fence.o \
+	 amdgpu_amdkfd_gpuvm.o \
 	 amdgpu_amdkfd_gfx_v8.o
 
 # add cgs
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index 25c2aed..01fb142 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -58,6 +58,7 @@ int amdgpu_amdkfd_init(void)
 #else
 	ret = -ENOENT;
 #endif
+	amdgpu_amdkfd_gpuvm_init_mem_limits();
 
 	return ret;
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index 9bed9fc..87fb4e6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -28,15 +28,41 @@
 #include <linux/types.h>
 #include <linux/mmu_context.h>
 #include <kgd_kfd_interface.h>
+#include <drm/ttm/ttm_execbuf_util.h>
+#include "amdgpu_sync.h"
+#include "amdgpu_vm.h"
 
 extern const struct kgd2kfd_calls *kgd2kfd;
 
 struct amdgpu_device;
 
+struct kfd_bo_va_list {
+	struct list_head bo_list;
+	struct amdgpu_bo_va *bo_va;
+	void *kgd_dev;
+	bool is_mapped;
+	uint64_t va;
+	uint64_t pte_flags;
+};
+
 struct kgd_mem {
+	struct mutex lock;
 	struct amdgpu_bo *bo;
-	uint64_t gpu_addr;
-	void *cpu_ptr;
+	struct list_head bo_va_list;
+	/* protected by amdkfd_process_info.lock */
+	struct ttm_validate_buffer validate_list;
+	struct ttm_validate_buffer resv_list;
+	uint32_t domain;
+	unsigned int mapped_to_gpu_memory;
+	uint64_t va;
+
+	uint32_t mapping_flags;
+
+	struct amdkfd_process_info *process_info;
+
+	struct amdgpu_sync sync;
+
+	bool aql_queue;
 };
 
 /* KFD Memory Eviction */
@@ -52,6 +78,41 @@ struct amdgpu_amdkfd_fence *amdgpu_amdkfd_fence_create(u64 context,
 bool amd_kfd_fence_check_mm(struct dma_fence *f, struct mm_struct *mm);
 struct amdgpu_amdkfd_fence *to_amdgpu_amdkfd_fence(struct dma_fence *f);
 
+struct amdkfd_process_info {
+	/* List head of all VMs that belong to a KFD process */
+	struct list_head vm_list_head;
+	/* List head for all KFD BOs that belong to a KFD process. */
+	struct list_head kfd_bo_list;
+	/* Lock to protect kfd_bo_list */
+	struct mutex lock;
+
+	/* Number of VMs */
+	unsigned int n_vms;
+	/* Eviction Fence */
+	struct amdgpu_amdkfd_fence *eviction_fence;
+};
+
+/* struct amdkfd_vm -
+ * For Memory Eviction KGD requires a mechanism to keep track of all KFD BOs
+ * belonging to a KFD process. All the VMs belonging to the same process point
+ * to the same amdkfd_process_info.
+ */
+struct amdkfd_vm {
+	/* Keep base as the first parameter for pointer compatibility between
+	 * amdkfd_vm and amdgpu_vm.
+	 */
+	struct amdgpu_vm base;
+
+	/* List node in amdkfd_process_info.vm_list_head*/
+	struct list_head vm_list_node;
+
+	struct amdgpu_device *adev;
+	/* Points to the KFD process VM info*/
+	struct amdkfd_process_info *process_info;
+
+	uint64_t pd_phys_addr;
+};
+
 int amdgpu_amdkfd_init(void);
 void amdgpu_amdkfd_fini(void);
 
@@ -96,4 +157,30 @@ uint64_t amdgpu_amdkfd_get_vram_usage(struct kgd_dev *kgd);
 		valid;							\
 	})
 
+/* GPUVM API */
+int amdgpu_amdkfd_gpuvm_create_process_vm(struct kgd_dev *kgd, void **vm,
+					  void **process_info,
+					  struct dma_fence **ef);
+void amdgpu_amdkfd_gpuvm_destroy_process_vm(struct kgd_dev *kgd, void *vm);
+uint32_t amdgpu_amdkfd_gpuvm_get_process_page_dir(void *vm);
+int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
+		struct kgd_dev *kgd, uint64_t va, uint64_t size,
+		void *vm, struct kgd_mem **mem,
+		uint64_t *offset, uint32_t flags);
+int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
+		struct kgd_dev *kgd, struct kgd_mem *mem);
+int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
+		struct kgd_dev *kgd, struct kgd_mem *mem, void *vm);
+int amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu(
+		struct kgd_dev *kgd, struct kgd_mem *mem, void *vm);
+int amdgpu_amdkfd_gpuvm_sync_memory(
+		struct kgd_dev *kgd, struct kgd_mem *mem, bool intr);
+int amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel(struct kgd_dev *kgd,
+		struct kgd_mem *mem, void **kptr, uint64_t *size);
+int amdgpu_amdkfd_gpuvm_restore_process_bos(void *process_info,
+					    struct dma_fence **ef);
+
+void amdgpu_amdkfd_gpuvm_init_mem_limits(void);
+void amdgpu_amdkfd_unreserve_system_memory_limit(struct amdgpu_bo *bo);
+
 #endif /* AMDGPU_AMDKFD_H_INCLUDED */
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
index 1362181..65783d1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
@@ -143,6 +143,10 @@ static uint16_t get_atc_vmid_pasid_mapping_pasid(struct kgd_dev *kgd,
 static uint16_t get_fw_version(struct kgd_dev *kgd, enum kgd_engine_type type);
 static void set_scratch_backing_va(struct kgd_dev *kgd,
 					uint64_t va, uint32_t vmid);
+static void set_vm_context_page_table_base(struct kgd_dev *kgd, uint32_t vmid,
+		uint32_t page_table_base);
+static int invalidate_tlbs(struct kgd_dev *kgd, uint16_t pasid);
+static int invalidate_tlbs_vmid(struct kgd_dev *kgd, uint16_t vmid);
 
 /* Because of REG_GET_FIELD() being used, we put this function in the
  * asic specific file.
@@ -199,7 +203,20 @@ static const struct kfd2kgd_calls kfd2kgd = {
 	.set_scratch_backing_va = set_scratch_backing_va,
 	.get_tile_config = get_tile_config,
 	.get_cu_info = get_cu_info,
-	.get_vram_usage = amdgpu_amdkfd_get_vram_usage
+	.get_vram_usage = amdgpu_amdkfd_get_vram_usage,
+	.create_process_vm = amdgpu_amdkfd_gpuvm_create_process_vm,
+	.destroy_process_vm = amdgpu_amdkfd_gpuvm_destroy_process_vm,
+	.get_process_page_dir = amdgpu_amdkfd_gpuvm_get_process_page_dir,
+	.set_vm_context_page_table_base = set_vm_context_page_table_base,
+	.alloc_memory_of_gpu = amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu,
+	.free_memory_of_gpu = amdgpu_amdkfd_gpuvm_free_memory_of_gpu,
+	.map_memory_to_gpu = amdgpu_amdkfd_gpuvm_map_memory_to_gpu,
+	.unmap_memory_to_gpu = amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu,
+	.sync_memory = amdgpu_amdkfd_gpuvm_sync_memory,
+	.map_gtt_bo_to_kernel = amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel,
+	.restore_process_bos = amdgpu_amdkfd_gpuvm_restore_process_bos,
+	.invalidate_tlbs = invalidate_tlbs,
+	.invalidate_tlbs_vmid = invalidate_tlbs_vmid,
 };
 
 struct kfd2kgd_calls *amdgpu_amdkfd_gfx_7_get_functions(void)
@@ -855,3 +872,50 @@ static uint16_t get_fw_version(struct kgd_dev *kgd, enum kgd_engine_type type)
 	return hdr->common.ucode_version;
 }
 
+static void set_vm_context_page_table_base(struct kgd_dev *kgd, uint32_t vmid,
+			uint32_t page_table_base)
+{
+	struct amdgpu_device *adev = get_amdgpu_device(kgd);
+
+	if (!amdgpu_amdkfd_is_kfd_vmid(adev, vmid)) {
+		pr_err("trying to set page table base for wrong VMID\n");
+		return;
+	}
+	WREG32(mmVM_CONTEXT8_PAGE_TABLE_BASE_ADDR + vmid - 8, page_table_base);
+}
+
+static int invalidate_tlbs(struct kgd_dev *kgd, uint16_t pasid)
+{
+	struct amdgpu_device *adev = (struct amdgpu_device *) kgd;
+	int vmid;
+	unsigned int tmp;
+
+	for (vmid = 0; vmid < 16; vmid++) {
+		if (!amdgpu_amdkfd_is_kfd_vmid(adev, vmid))
+			continue;
+
+		tmp = RREG32(mmATC_VMID0_PASID_MAPPING + vmid);
+		if ((tmp & ATC_VMID0_PASID_MAPPING__VALID_MASK) &&
+			(tmp & ATC_VMID0_PASID_MAPPING__PASID_MASK) == pasid) {
+			WREG32(mmVM_INVALIDATE_REQUEST, 1 << vmid);
+			RREG32(mmVM_INVALIDATE_RESPONSE);
+			break;
+		}
+	}
+
+	return 0;
+}
+
+static int invalidate_tlbs_vmid(struct kgd_dev *kgd, uint16_t vmid)
+{
+	struct amdgpu_device *adev = (struct amdgpu_device *) kgd;
+
+	if (!amdgpu_amdkfd_is_kfd_vmid(adev, vmid)) {
+		pr_err("non kfd vmid\n");
+		return 0;
+	}
+
+	WREG32(mmVM_INVALIDATE_REQUEST, 1 << vmid);
+	RREG32(mmVM_INVALIDATE_RESPONSE);
+	return 0;
+}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
index 5130eac..1b5bf13 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
@@ -101,6 +101,10 @@ static uint16_t get_atc_vmid_pasid_mapping_pasid(struct kgd_dev *kgd,
 static uint16_t get_fw_version(struct kgd_dev *kgd, enum kgd_engine_type type);
 static void set_scratch_backing_va(struct kgd_dev *kgd,
 					uint64_t va, uint32_t vmid);
+static void set_vm_context_page_table_base(struct kgd_dev *kgd, uint32_t vmid,
+		uint32_t page_table_base);
+static int invalidate_tlbs(struct kgd_dev *kgd, uint16_t pasid);
+static int invalidate_tlbs_vmid(struct kgd_dev *kgd, uint16_t vmid);
 
 /* Because of REG_GET_FIELD() being used, we put this function in the
  * asic specific file.
@@ -159,7 +163,20 @@ static const struct kfd2kgd_calls kfd2kgd = {
 	.set_scratch_backing_va = set_scratch_backing_va,
 	.get_tile_config = get_tile_config,
 	.get_cu_info = get_cu_info,
-	.get_vram_usage = amdgpu_amdkfd_get_vram_usage
+	.get_vram_usage = amdgpu_amdkfd_get_vram_usage,
+	.create_process_vm = amdgpu_amdkfd_gpuvm_create_process_vm,
+	.destroy_process_vm = amdgpu_amdkfd_gpuvm_destroy_process_vm,
+	.get_process_page_dir = amdgpu_amdkfd_gpuvm_get_process_page_dir,
+	.set_vm_context_page_table_base = set_vm_context_page_table_base,
+	.alloc_memory_of_gpu = amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu,
+	.free_memory_of_gpu = amdgpu_amdkfd_gpuvm_free_memory_of_gpu,
+	.map_memory_to_gpu = amdgpu_amdkfd_gpuvm_map_memory_to_gpu,
+	.unmap_memory_to_gpu = amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu,
+	.sync_memory = amdgpu_amdkfd_gpuvm_sync_memory,
+	.map_gtt_bo_to_kernel = amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel,
+	.restore_process_bos = amdgpu_amdkfd_gpuvm_restore_process_bos,
+	.invalidate_tlbs = invalidate_tlbs,
+	.invalidate_tlbs_vmid = invalidate_tlbs_vmid,
 };
 
 struct kfd2kgd_calls *amdgpu_amdkfd_gfx_8_0_get_functions(void)
@@ -816,3 +833,51 @@ static uint16_t get_fw_version(struct kgd_dev *kgd, enum kgd_engine_type type)
 	/* Only 12 bit in use*/
 	return hdr->common.ucode_version;
 }
+
+static void set_vm_context_page_table_base(struct kgd_dev *kgd, uint32_t vmid,
+		uint32_t page_table_base)
+{
+	struct amdgpu_device *adev = get_amdgpu_device(kgd);
+
+	if (!amdgpu_amdkfd_is_kfd_vmid(adev, vmid)) {
+		pr_err("trying to set page table base for wrong VMID\n");
+		return;
+	}
+	WREG32(mmVM_CONTEXT8_PAGE_TABLE_BASE_ADDR + vmid - 8, page_table_base);
+}
+
+static int invalidate_tlbs(struct kgd_dev *kgd, uint16_t pasid)
+{
+	struct amdgpu_device *adev = (struct amdgpu_device *) kgd;
+	int vmid;
+	unsigned int tmp;
+
+	for (vmid = 0; vmid < 16; vmid++) {
+		if (!amdgpu_amdkfd_is_kfd_vmid(adev, vmid))
+			continue;
+
+		tmp = RREG32(mmATC_VMID0_PASID_MAPPING + vmid);
+		if ((tmp & ATC_VMID0_PASID_MAPPING__VALID_MASK) &&
+			(tmp & ATC_VMID0_PASID_MAPPING__PASID_MASK) == pasid) {
+			WREG32(mmVM_INVALIDATE_REQUEST, 1 << vmid);
+			RREG32(mmVM_INVALIDATE_RESPONSE);
+			break;
+		}
+	}
+
+	return 0;
+}
+
+static int invalidate_tlbs_vmid(struct kgd_dev *kgd, uint16_t vmid)
+{
+	struct amdgpu_device *adev = (struct amdgpu_device *) kgd;
+
+	if (!amdgpu_amdkfd_is_kfd_vmid(adev, vmid)) {
+		pr_err("non kfd vmid %d\n", vmid);
+		return -EINVAL;
+	}
+
+	WREG32(mmVM_INVALIDATE_REQUEST, 1 << vmid);
+	RREG32(mmVM_INVALIDATE_RESPONSE);
+	return 0;
+}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
new file mode 100644
index 0000000..9703fd0
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -0,0 +1,1501 @@
+/*
+ * Copyright 2014-2018 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#define pr_fmt(fmt) "kfd2kgd: " fmt
+
+#include <linux/list.h>
+#include <drm/drmP.h>
+#include "amdgpu_object.h"
+#include "amdgpu_vm.h"
+#include "amdgpu_amdkfd.h"
+
+/* Special VM and GART address alignment needed for VI pre-Fiji due to
+ * a HW bug.
+ */
+#define VI_BO_SIZE_ALIGN (0x8000)
+
+/* Impose limit on how much memory KFD can use */
+static struct {
+	uint64_t max_system_mem_limit;
+	int64_t system_mem_used;
+	spinlock_t mem_limit_lock;
+} kfd_mem_limit;
+
+/* Struct used for amdgpu_amdkfd_bo_validate */
+struct amdgpu_vm_parser {
+	uint32_t        domain;
+	bool            wait;
+};
+
+static const char * const domain_bit_to_string[] = {
+		"CPU",
+		"GTT",
+		"VRAM",
+		"GDS",
+		"GWS",
+		"OA"
+};
+
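+/* ffs() gives the 1-based index of the lowest set domain bit, so
+ * ffs(domain)-1 indexes the table above (CPU=0x1, GTT=0x2, ..., OA=0x20).
+ */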
+#define domain_string(domain) domain_bit_to_string[ffs(domain)-1]
+
+
+
+static inline struct amdgpu_device *get_amdgpu_device(struct kgd_dev *kgd)
+{
+	return (struct amdgpu_device *)kgd;
+}
+
+static bool check_if_add_bo_to_vm(struct amdgpu_vm *avm,
+		struct kgd_mem *mem)
+{
+	struct kfd_bo_va_list *entry;
+
+	list_for_each_entry(entry, &mem->bo_va_list, bo_list)
+		if (entry->bo_va->base.vm == avm)
+			return false;
+
+	return true;
+}
+
+/* Set memory usage limits. Currently the limit is:
+ *  System (kernel) memory - 3/8th of system RAM
+ */
+void amdgpu_amdkfd_gpuvm_init_mem_limits(void)
+{
+	struct sysinfo si;
+	uint64_t mem;
+
+	si_meminfo(&si);
+	mem = si.totalram - si.totalhigh;
+	mem *= si.mem_unit;
+
+	spin_lock_init(&kfd_mem_limit.mem_limit_lock);
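+	/* (mem >> 1) - (mem >> 3) = 1/2 - 1/8 = 3/8 of system memory */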
+	kfd_mem_limit.max_system_mem_limit = (mem >> 1) - (mem >> 3);
+	pr_debug("Kernel memory limit %lluM\n",
+		(kfd_mem_limit.max_system_mem_limit >> 20));
+}
+
+static int amdgpu_amdkfd_reserve_system_mem_limit(struct amdgpu_device *adev,
+					      uint64_t size, u32 domain)
+{
+	size_t acc_size;
+	int ret = 0;
+
+	acc_size = ttm_bo_dma_acc_size(&adev->mman.bdev, size,
+				       sizeof(struct amdgpu_bo));
+
+	spin_lock(&kfd_mem_limit.mem_limit_lock);
+	if (domain == AMDGPU_GEM_DOMAIN_GTT) {
+		if (kfd_mem_limit.system_mem_used + (acc_size + size) >
+			kfd_mem_limit.max_system_mem_limit) {
+			ret = -ENOMEM;
+			goto err_no_mem;
+		}
+		kfd_mem_limit.system_mem_used += (acc_size + size);
+	}
+err_no_mem:
+	spin_unlock(&kfd_mem_limit.mem_limit_lock);
+	return ret;
+}
+
+static void unreserve_system_mem_limit(struct amdgpu_device *adev,
+				       uint64_t size, u32 domain)
+{
+	size_t acc_size;
+
+	acc_size = ttm_bo_dma_acc_size(&adev->mman.bdev, size,
+				       sizeof(struct amdgpu_bo));
+
+	spin_lock(&kfd_mem_limit.mem_limit_lock);
+	if (domain == AMDGPU_GEM_DOMAIN_GTT)
+		kfd_mem_limit.system_mem_used -= (acc_size + size);
+	WARN_ONCE(kfd_mem_limit.system_mem_used < 0,
+		  "kfd system memory accounting unbalanced");
+
+	spin_unlock(&kfd_mem_limit.mem_limit_lock);
+}
+
+void amdgpu_amdkfd_unreserve_system_memory_limit(struct amdgpu_bo *bo)
+{
+	spin_lock(&kfd_mem_limit.mem_limit_lock);
+
+	if (bo->preferred_domains == AMDGPU_GEM_DOMAIN_GTT) {
+		kfd_mem_limit.system_mem_used -=
+			(bo->tbo.acc_size + amdgpu_bo_size(bo));
+	}
+	WARN_ONCE(kfd_mem_limit.system_mem_used < 0,
+		  "kfd system memory accounting unbalanced");
+
+	spin_unlock(&kfd_mem_limit.mem_limit_lock);
+}
+
+
+/* amdgpu_amdkfd_remove_eviction_fence - Removes eviction fence(s) from BO's
+ *  reservation object.
+ *
+ * @bo: [IN] Remove eviction fence(s) from this BO
+ * @ef: [IN] If ef is specified, then this eviction fence is removed if it
+ *  is present in the shared list.
+ * @ef_list: [OUT] Returns list of eviction fences. These fences are removed
+ *  from BO's reservation object shared list.
+ * @ef_count: [OUT] Number of fences in ef_list.
+ *
+ * NOTE: If called with ef_list, then amdgpu_amdkfd_add_eviction_fence must be
+ *  called to restore the eviction fences and to avoid a memory leak. This is
+ *  useful for shared BOs.
+ * NOTE: Must be called with BO reserved i.e. bo->tbo.resv->lock held.
+ */
+static int amdgpu_amdkfd_remove_eviction_fence(struct amdgpu_bo *bo,
+					struct amdgpu_amdkfd_fence *ef,
+					struct amdgpu_amdkfd_fence ***ef_list,
+					unsigned int *ef_count)
+{
+	struct reservation_object_list *fobj;
+	struct reservation_object *resv;
+	unsigned int i = 0, j = 0, k = 0, shared_count;
+	unsigned int count = 0;
+	struct amdgpu_amdkfd_fence **fence_list;
+
+	if (!ef && !ef_list)
+		return -EINVAL;
+
+	if (ef_list) {
+		*ef_list = NULL;
+		*ef_count = 0;
+	}
+
+	resv = bo->tbo.resv;
+	fobj = reservation_object_get_list(resv);
+
+	if (!fobj)
+		return 0;
+
+	preempt_disable();
+	write_seqcount_begin(&resv->seq);
+
+	/* Go through all the shared fences in the reservation object. If
+	 * ef is specified and it exists in the list, remove it and reduce the
+	 * count. If ef is not specified, then get the count of eviction fences
+	 * present.
+	 */
+	shared_count = fobj->shared_count;
+	for (i = 0; i < shared_count; ++i) {
+		struct dma_fence *f;
+
+		f = rcu_dereference_protected(fobj->shared[i],
+					      reservation_object_held(resv));
+
+		if (ef) {
+			if (f->context == ef->base.context) {
+				dma_fence_put(f);
+				fobj->shared_count--;
+			} else
+				RCU_INIT_POINTER(fobj->shared[j++], f);
+
+		} else if (to_amdgpu_amdkfd_fence(f))
+			count++;
+	}
+	write_seqcount_end(&resv->seq);
+	preempt_enable();
+
+	if (ef || !count)
+		return 0;
+
+	/* Allocate memory for 'count' eviction fence pointers. Fill the
+	 * ef_list array and ef_count
+	 */
+	fence_list = kcalloc(count, sizeof(struct amdgpu_amdkfd_fence *),
+			     GFP_KERNEL);
+	if (!fence_list)
+		return -ENOMEM;
+
+	preempt_disable();
+	write_seqcount_begin(&resv->seq);
+
+	j = 0;
+	for (i = 0; i < shared_count; ++i) {
+		struct dma_fence *f;
+		struct amdgpu_amdkfd_fence *efence;
+
+		f = rcu_dereference_protected(fobj->shared[i],
+			reservation_object_held(resv));
+
+		efence = to_amdgpu_amdkfd_fence(f);
+		if (efence) {
+			fence_list[k++] = efence;
+			fobj->shared_count--;
+		} else
+			RCU_INIT_POINTER(fobj->shared[j++], f);
+	}
+
+	write_seqcount_end(&resv->seq);
+	preempt_enable();
+
+	*ef_list = fence_list;
+	*ef_count = k;
+
+	return 0;
+}
+
+/* amdgpu_amdkfd_add_eviction_fence - Adds eviction fence(s) back into BO's
+ *  reservation object.
+ *
+ * @bo: [IN] Add eviction fences to this BO
+ * @ef_list: [IN] List of eviction fences to be added
+ * @ef_count: [IN] Number of fences in ef_list.
+ *
+ * NOTE: Must call amdgpu_amdkfd_remove_eviction_fence before calling this
+ *  function.
+ */
+static void amdgpu_amdkfd_add_eviction_fence(struct amdgpu_bo *bo,
+				struct amdgpu_amdkfd_fence **ef_list,
+				unsigned int ef_count)
+{
+	int i;
+
+	if (!ef_list || !ef_count)
+		return;
+
+	for (i = 0; i < ef_count; i++) {
+		amdgpu_bo_fence(bo, &ef_list[i]->base, true);
+		/* Re-adding the fence takes an additional reference. Drop that
+		 * reference.
+		 */
+		dma_fence_put(&ef_list[i]->base);
+	}
+
+	kfree(ef_list);
+}
+
+static int amdgpu_amdkfd_bo_validate(struct amdgpu_bo *bo, uint32_t domain,
+				     bool wait)
+{
+	struct ttm_operation_ctx ctx = { false, false };
+	int ret;
+
+	if (WARN(amdgpu_ttm_tt_get_usermm(bo->tbo.ttm),
+		 "Called with userptr BO"))
+		return -EINVAL;
+
+	amdgpu_ttm_placement_from_domain(bo, domain);
+
+	ret = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
+	if (ret)
+		goto validate_fail;
+	if (wait) {
+		struct amdgpu_amdkfd_fence **ef_list;
+		unsigned int ef_count;
+
+		ret = amdgpu_amdkfd_remove_eviction_fence(bo, NULL, &ef_list,
+							  &ef_count);
+		if (ret)
+			goto validate_fail;
+
+		ttm_bo_wait(&bo->tbo, false, false);
+		amdgpu_amdkfd_add_eviction_fence(bo, ef_list, ef_count);
+	}
+
+validate_fail:
+	return ret;
+}
+
+static int amdgpu_amdkfd_validate(void *param, struct amdgpu_bo *bo)
+{
+	struct amdgpu_vm_parser *p = param;
+
+	return amdgpu_amdkfd_bo_validate(bo, p->domain, p->wait);
+}
+
+/* vm_validate_pt_pd_bos - Validate page table and directory BOs
+ *
+ * Page directories are not updated here because huge page handling
+ * during page table updates can invalidate page directory entries
+ * again. Page directories are only updated after updating page
+ * tables.
+ */
+static int vm_validate_pt_pd_bos(struct amdkfd_vm *vm)
+{
+	struct amdgpu_bo *pd = vm->base.root.base.bo;
+	struct amdgpu_device *adev = amdgpu_ttm_adev(pd->tbo.bdev);
+	struct amdgpu_vm_parser param;
+	uint64_t addr, flags = AMDGPU_PTE_VALID;
+	int ret;
+
+	param.domain = AMDGPU_GEM_DOMAIN_VRAM;
+	param.wait = false;
+
+	ret = amdgpu_vm_validate_pt_bos(adev, &vm->base, amdgpu_amdkfd_validate,
+					&param);
+	if (ret) {
+		pr_err("amdgpu: failed to validate PT BOs\n");
+		return ret;
+	}
+
+	ret = amdgpu_amdkfd_validate(&param, pd);
+	if (ret) {
+		pr_err("amdgpu: failed to validate PD\n");
+		return ret;
+	}
+
+	addr = amdgpu_bo_gpu_offset(vm->base.root.base.bo);
+	amdgpu_gart_get_vm_pde(adev, -1, &addr, &flags);
+	vm->pd_phys_addr = addr;
+
+	if (vm->base.use_cpu_for_update) {
+		ret = amdgpu_bo_kmap(pd, NULL);
+		if (ret) {
+			pr_err("amdgpu: failed to kmap PD, ret=%d\n", ret);
+			return ret;
+		}
+	}
+
+	return 0;
+}
+
+static int sync_vm_fence(struct amdgpu_device *adev, struct amdgpu_sync *sync,
+			 struct dma_fence *f)
+{
+	int ret = amdgpu_sync_fence(adev, sync, f, false);
+
+	/* Sync objects can't handle multiple GPUs (contexts) updating
+	 * sync->last_vm_update. Fortunately we don't need it for
+	 * KFD's purposes, so we can just drop that fence.
+	 */
+	if (sync->last_vm_update) {
+		dma_fence_put(sync->last_vm_update);
+		sync->last_vm_update = NULL;
+	}
+
+	return ret;
+}
+
+static int vm_update_pds(struct amdgpu_vm *vm, struct amdgpu_sync *sync)
+{
+	struct amdgpu_bo *pd = vm->root.base.bo;
+	struct amdgpu_device *adev = amdgpu_ttm_adev(pd->tbo.bdev);
+	int ret;
+
+	ret = amdgpu_vm_update_directories(adev, vm);
+	if (ret)
+		return ret;
+
+	return sync_vm_fence(adev, sync, vm->last_update);
+}
+
+/* add_bo_to_vm - Add a BO to a VM
+ *
+ * Everything that needs to be done only once when a BO is first added
+ * to a VM. It can later be mapped and unmapped many times without
+ * repeating these steps.
+ *
+ * 1. Allocate and initialize BO VA entry data structure
+ * 2. Add BO to the VM
+ * 3. Determine ASIC-specific PTE flags
+ * 4. Alloc page tables and directories if needed
+ * 4a.  Validate new page tables and directories
+ */
+static int add_bo_to_vm(struct amdgpu_device *adev, struct kgd_mem *mem,
+		struct amdgpu_vm *avm, bool is_aql,
+		struct kfd_bo_va_list **p_bo_va_entry)
+{
+	int ret;
+	struct kfd_bo_va_list *bo_va_entry;
+	struct amdkfd_vm *kvm = container_of(avm,
+					     struct amdkfd_vm, base);
+	struct amdgpu_bo *pd = avm->root.base.bo;
+	struct amdgpu_bo *bo = mem->bo;
+	uint64_t va = mem->va;
+	struct list_head *list_bo_va = &mem->bo_va_list;
+	unsigned long bo_size = bo->tbo.mem.size;
+
+	if (!va) {
+		pr_err("Invalid VA when adding BO to VM\n");
+		return -EINVAL;
+	}
+
+	if (is_aql)
+		va += bo_size;
+
+	bo_va_entry = kzalloc(sizeof(*bo_va_entry), GFP_KERNEL);
+	if (!bo_va_entry)
+		return -ENOMEM;
+
+	pr_debug("\t add VA 0x%llx - 0x%llx to vm %p\n", va,
+			va + bo_size, avm);
+
+	/* Add BO to VM internal data structures*/
+	bo_va_entry->bo_va = amdgpu_vm_bo_add(adev, avm, bo);
+	if (!bo_va_entry->bo_va) {
+		ret = -EINVAL;
+		pr_err("Failed to add BO object to VM. ret == %d\n",
+				ret);
+		goto err_vmadd;
+	}
+
+	bo_va_entry->va = va;
+	bo_va_entry->pte_flags = amdgpu_vm_get_pte_flags(adev,
+							 mem->mapping_flags);
+	bo_va_entry->kgd_dev = (void *)adev;
+	list_add(&bo_va_entry->bo_list, list_bo_va);
+
+	if (p_bo_va_entry)
+		*p_bo_va_entry = bo_va_entry;
+
+	/* Allocate new page tables if needed and validate
+	 * them. Clearing new page tables and validating them needs to wait
+	 * on move fences. We don't want that to trigger the eviction
+	 * fence, so remove it temporarily.
+	 */
+	amdgpu_amdkfd_remove_eviction_fence(pd,
+					kvm->process_info->eviction_fence,
+					NULL, NULL);
+
+	ret = amdgpu_vm_alloc_pts(adev, avm, va, amdgpu_bo_size(bo));
+	if (ret) {
+		pr_err("Failed to allocate pts, err=%d\n", ret);
+		goto err_alloc_pts;
+	}
+
+	ret = vm_validate_pt_pd_bos(kvm);
+	if (ret) {
+		pr_err("validate_pt_pd_bos() failed\n");
+		goto err_alloc_pts;
+	}
+
+	/* Add the eviction fence back */
+	amdgpu_bo_fence(pd, &kvm->process_info->eviction_fence->base, true);
+
+	return 0;
+
+err_alloc_pts:
+	amdgpu_bo_fence(pd, &kvm->process_info->eviction_fence->base, true);
+	amdgpu_vm_bo_rmv(adev, bo_va_entry->bo_va);
+	list_del(&bo_va_entry->bo_list);
+err_vmadd:
+	kfree(bo_va_entry);
+	return ret;
+}
+
+static void remove_bo_from_vm(struct amdgpu_device *adev,
+		struct kfd_bo_va_list *entry, unsigned long size)
+{
+	pr_debug("\t remove VA 0x%llx - 0x%llx in entry %p\n",
+			entry->va,
+			entry->va + size, entry);
+	amdgpu_vm_bo_rmv(adev, entry->bo_va);
+	list_del(&entry->bo_list);
+	kfree(entry);
+}
+
+static void add_kgd_mem_to_kfd_bo_list(struct kgd_mem *mem,
+				struct amdkfd_process_info *process_info)
+{
+	struct ttm_validate_buffer *entry = &mem->validate_list;
+	struct amdgpu_bo *bo = mem->bo;
+
+	INIT_LIST_HEAD(&entry->head);
+	entry->shared = true;
+	entry->bo = &bo->tbo;
+	mutex_lock(&process_info->lock);
+	list_add_tail(&entry->head, &process_info->kfd_bo_list);
+	mutex_unlock(&process_info->lock);
+}
+
+/* Reserving a BO and its page table BOs must happen atomically to
+ * avoid deadlocks. Some operations update multiple VMs at once. Track
+ * all the reservation info in a context structure. Optionally a sync
+ * object can track VM updates.
+ */
+struct bo_vm_reservation_context {
+	struct amdgpu_bo_list_entry kfd_bo; /* BO list entry for the KFD BO */
+	unsigned int n_vms;		    /* Number of VMs reserved	    */
+	struct amdgpu_bo_list_entry *vm_pd; /* Array of VM BO list entries  */
+	struct ww_acquire_ctx ticket;	    /* Reservation ticket	    */
+	struct list_head list, duplicates;  /* BO lists			    */
+	struct amdgpu_sync *sync;	    /* Pointer to sync object	    */
+	bool reserved;			    /* Whether BOs are reserved	    */
+};
+
+enum bo_vm_match {
+	BO_VM_NOT_MAPPED = 0,	/* Match VMs where a BO is not mapped */
+	BO_VM_MAPPED,		/* Match VMs where a BO is mapped     */
+	BO_VM_ALL,		/* Match all VMs a BO was added to    */
+};
+
+/**
+ * reserve_bo_and_vm - reserve a BO and a VM unconditionally.
+ * @mem: KFD BO structure.
+ * @vm: the VM to reserve.
+ * @ctx: the struct that will be used in unreserve_bo_and_vms().
+ */
+static int reserve_bo_and_vm(struct kgd_mem *mem,
+			      struct amdgpu_vm *vm,
+			      struct bo_vm_reservation_context *ctx)
+{
+	struct amdgpu_bo *bo = mem->bo;
+	int ret;
+
+	WARN_ON(!vm);
+
+	ctx->reserved = false;
+	ctx->n_vms = 1;
+	ctx->sync = &mem->sync;
+
+	INIT_LIST_HEAD(&ctx->list);
+	INIT_LIST_HEAD(&ctx->duplicates);
+
+	ctx->vm_pd = kcalloc(ctx->n_vms, sizeof(*ctx->vm_pd), GFP_KERNEL);
+	if (!ctx->vm_pd)
+		return -ENOMEM;
+
+	ctx->kfd_bo.robj = bo;
+	ctx->kfd_bo.priority = 0;
+	ctx->kfd_bo.tv.bo = &bo->tbo;
+	ctx->kfd_bo.tv.shared = true;
+	ctx->kfd_bo.user_pages = NULL;
+	list_add(&ctx->kfd_bo.tv.head, &ctx->list);
+
+	amdgpu_vm_get_pd_bo(vm, &ctx->list, &ctx->vm_pd[0]);
+
+	ret = ttm_eu_reserve_buffers(&ctx->ticket, &ctx->list,
+				     false, &ctx->duplicates);
+	if (!ret)
+		ctx->reserved = true;
+	else {
+		pr_err("Failed to reserve buffers in ttm\n");
+		kfree(ctx->vm_pd);
+		ctx->vm_pd = NULL;
+	}
+
+	return ret;
+}
+
+/**
+ * reserve_bo_and_cond_vms - reserve a BO and some VMs conditionally
+ * @mem: KFD BO structure.
+ * @vm: the VM to reserve. If NULL, all VMs associated with the BO
+ * are used. Otherwise, only the given VM is used.
+ * @map_type: the mapping status that will be used to filter the VMs.
+ * @ctx: the struct that will be used in unreserve_bo_and_vms().
+ *
+ * Returns 0 for success, negative for failure.
+ */
+static int reserve_bo_and_cond_vms(struct kgd_mem *mem,
+				struct amdgpu_vm *vm, enum bo_vm_match map_type,
+				struct bo_vm_reservation_context *ctx)
+{
+	struct amdgpu_bo *bo = mem->bo;
+	struct kfd_bo_va_list *entry;
+	unsigned int i;
+	int ret;
+
+	ctx->reserved = false;
+	ctx->n_vms = 0;
+	ctx->vm_pd = NULL;
+	ctx->sync = &mem->sync;
+
+	INIT_LIST_HEAD(&ctx->list);
+	INIT_LIST_HEAD(&ctx->duplicates);
+
+	list_for_each_entry(entry, &mem->bo_va_list, bo_list) {
+		if ((vm && vm != entry->bo_va->base.vm) ||
+			(entry->is_mapped != map_type
+			&& map_type != BO_VM_ALL))
+			continue;
+
+		ctx->n_vms++;
+	}
+
+	if (ctx->n_vms != 0) {
+		ctx->vm_pd = kcalloc(ctx->n_vms, sizeof(*ctx->vm_pd),
+				     GFP_KERNEL);
+		if (!ctx->vm_pd)
+			return -ENOMEM;
+	}
+
+	ctx->kfd_bo.robj = bo;
+	ctx->kfd_bo.priority = 0;
+	ctx->kfd_bo.tv.bo = &bo->tbo;
+	ctx->kfd_bo.tv.shared = true;
+	ctx->kfd_bo.user_pages = NULL;
+	list_add(&ctx->kfd_bo.tv.head, &ctx->list);
+
+	i = 0;
+	list_for_each_entry(entry, &mem->bo_va_list, bo_list) {
+		if ((vm && vm != entry->bo_va->base.vm) ||
+			(entry->is_mapped != map_type
+			&& map_type != BO_VM_ALL))
+			continue;
+
+		amdgpu_vm_get_pd_bo(entry->bo_va->base.vm, &ctx->list,
+				&ctx->vm_pd[i]);
+		i++;
+	}
+
+	ret = ttm_eu_reserve_buffers(&ctx->ticket, &ctx->list,
+				     false, &ctx->duplicates);
+	if (!ret)
+		ctx->reserved = true;
+	else
+		pr_err("Failed to reserve buffers in ttm.\n");
+
+	if (ret) {
+		kfree(ctx->vm_pd);
+		ctx->vm_pd = NULL;
+	}
+
+	return ret;
+}
+
+/**
+ * unreserve_bo_and_vms - Unreserve BO and VMs from a reservation context
+ * @ctx: Reservation context to unreserve
+ * @wait: Optionally wait for a sync object representing pending VM updates
+ * @intr: Whether the wait is interruptible
+ *
+ * Also frees any resources allocated in
+ * reserve_bo_and_(cond_)vm(s). Returns the status from
+ * amdgpu_sync_wait.
+ */
+static int unreserve_bo_and_vms(struct bo_vm_reservation_context *ctx,
+				 bool wait, bool intr)
+{
+	int ret = 0;
+
+	if (wait)
+		ret = amdgpu_sync_wait(ctx->sync, intr);
+
+	if (ctx->reserved)
+		ttm_eu_backoff_reservation(&ctx->ticket, &ctx->list);
+	kfree(ctx->vm_pd);
+
+	ctx->sync = NULL;
+
+	ctx->reserved = false;
+	ctx->vm_pd = NULL;
+
+	return ret;
+}
+
+static int unmap_bo_from_gpuvm(struct amdgpu_device *adev,
+				struct kfd_bo_va_list *entry,
+				struct amdgpu_sync *sync)
+{
+	struct amdgpu_bo_va *bo_va = entry->bo_va;
+	struct amdgpu_vm *vm = bo_va->base.vm;
+	struct amdkfd_vm *kvm = container_of(vm, struct amdkfd_vm, base);
+	struct amdgpu_bo *pd = vm->root.base.bo;
+
+	/* Remove eviction fence from PD (and thereby from PTs too as
+	 * they share the resv. object). Otherwise during PT update
+	 * job (see amdgpu_vm_bo_update_mapping), eviction fence would
+	 * get added to job->sync object and job execution would
+	 * trigger the eviction fence.
+	 */
+	amdgpu_amdkfd_remove_eviction_fence(pd,
+					    kvm->process_info->eviction_fence,
+					    NULL, NULL);
+	amdgpu_vm_bo_unmap(adev, bo_va, entry->va);
+
+	amdgpu_vm_clear_freed(adev, vm, &bo_va->last_pt_update);
+
+	/* Add the eviction fence back */
+	amdgpu_bo_fence(pd, &kvm->process_info->eviction_fence->base, true);
+
+	sync_vm_fence(adev, sync, bo_va->last_pt_update);
+
+	return 0;
+}
+
+static int update_gpuvm_pte(struct amdgpu_device *adev,
+		struct kfd_bo_va_list *entry,
+		struct amdgpu_sync *sync)
+{
+	int ret;
+	struct amdgpu_vm *vm;
+	struct amdgpu_bo_va *bo_va;
+	struct amdgpu_bo *bo;
+
+	bo_va = entry->bo_va;
+	vm = bo_va->base.vm;
+	bo = bo_va->base.bo;
+
+	/* Update the page tables  */
+	ret = amdgpu_vm_bo_update(adev, bo_va, false);
+	if (ret) {
+		pr_err("amdgpu_vm_bo_update failed\n");
+		return ret;
+	}
+
+	return sync_vm_fence(adev, sync, bo_va->last_pt_update);
+}
+
+static int map_bo_to_gpuvm(struct amdgpu_device *adev,
+		struct kfd_bo_va_list *entry, struct amdgpu_sync *sync)
+{
+	int ret;
+
+	/* Set virtual address for the allocation */
+	ret = amdgpu_vm_bo_map(adev, entry->bo_va, entry->va, 0,
+			       amdgpu_bo_size(entry->bo_va->base.bo),
+			       entry->pte_flags);
+	if (ret) {
+		pr_err("Failed to map VA 0x%llx in vm. ret %d\n",
+				entry->va, ret);
+		return ret;
+	}
+
+	ret = update_gpuvm_pte(adev, entry, sync);
+	if (ret) {
+		pr_err("update_gpuvm_pte() failed\n");
+		goto update_gpuvm_pte_failed;
+	}
+
+	return 0;
+
+update_gpuvm_pte_failed:
+	unmap_bo_from_gpuvm(adev, entry, sync);
+	return ret;
+}
+
+static int process_validate_vms(struct amdkfd_process_info *process_info)
+{
+	struct amdkfd_vm *peer_vm;
+	int ret;
+
+	list_for_each_entry(peer_vm, &process_info->vm_list_head,
+			    vm_list_node) {
+		ret = vm_validate_pt_pd_bos(peer_vm);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
+static int process_update_pds(struct amdkfd_process_info *process_info,
+			      struct amdgpu_sync *sync)
+{
+	struct amdkfd_vm *peer_vm;
+	int ret;
+
+	list_for_each_entry(peer_vm, &process_info->vm_list_head,
+			    vm_list_node) {
+		ret = vm_update_pds(&peer_vm->base, sync);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
+int amdgpu_amdkfd_gpuvm_create_process_vm(struct kgd_dev *kgd, void **vm,
+					  void **process_info,
+					  struct dma_fence **ef)
+{
+	int ret;
+	struct amdkfd_vm *new_vm;
+	struct amdkfd_process_info *info;
+	struct amdgpu_device *adev = get_amdgpu_device(kgd);
+
+	new_vm = kzalloc(sizeof(*new_vm), GFP_KERNEL);
+	if (!new_vm)
+		return -ENOMEM;
+
+	/* Initialize the VM context, allocate the page directory and zero it */
+	ret = amdgpu_vm_init(adev, &new_vm->base, AMDGPU_VM_CONTEXT_COMPUTE, 0);
+	if (ret) {
+		pr_err("Failed init vm ret %d\n", ret);
+		goto vm_init_fail;
+	}
+	new_vm->adev = adev;
+
+	if (!*process_info) {
+		info = kzalloc(sizeof(*info), GFP_KERNEL);
+		if (!info) {
+			ret = -ENOMEM;
+			goto alloc_process_info_fail;
+		}
+
+		mutex_init(&info->lock);
+		INIT_LIST_HEAD(&info->vm_list_head);
+		INIT_LIST_HEAD(&info->kfd_bo_list);
+
+		info->eviction_fence =
+			amdgpu_amdkfd_fence_create(dma_fence_context_alloc(1),
+						   current->mm);
+		if (!info->eviction_fence) {
+			pr_err("Failed to create eviction fence\n");
+			goto create_evict_fence_fail;
+		}
+
+		*process_info = info;
+		*ef = dma_fence_get(&info->eviction_fence->base);
+	}
+
+	new_vm->process_info = *process_info;
+
+	mutex_lock(&new_vm->process_info->lock);
+	list_add_tail(&new_vm->vm_list_node,
+			&(new_vm->process_info->vm_list_head));
+	new_vm->process_info->n_vms++;
+	mutex_unlock(&new_vm->process_info->lock);
+
+	*vm = (void *) new_vm;
+
+	pr_debug("Created process vm %p\n", *vm);
+
+	return ret;
+
+create_evict_fence_fail:
+	kfree(info);
+alloc_process_info_fail:
+	amdgpu_vm_fini(adev, &new_vm->base);
+vm_init_fail:
+	kfree(new_vm);
+	return ret;
+
+}
+
+void amdgpu_amdkfd_gpuvm_destroy_process_vm(struct kgd_dev *kgd, void *vm)
+{
+	struct amdgpu_device *adev = get_amdgpu_device(kgd);
+	struct amdkfd_vm *kfd_vm = (struct amdkfd_vm *) vm;
+	struct amdgpu_vm *avm = &kfd_vm->base;
+	struct amdgpu_bo *pd;
+	struct amdkfd_process_info *process_info;
+
+	if (WARN_ON(!kgd || !vm))
+		return;
+
+	pr_debug("Destroying process vm %p\n", vm);
+	/* Release eviction fence from PD */
+	pd = avm->root.base.bo;
+	amdgpu_bo_reserve(pd, false);
+	amdgpu_bo_fence(pd, NULL, false);
+	amdgpu_bo_unreserve(pd);
+
+	process_info = kfd_vm->process_info;
+
+	mutex_lock(&process_info->lock);
+	process_info->n_vms--;
+	list_del(&kfd_vm->vm_list_node);
+	mutex_unlock(&process_info->lock);
+
+	/* Release per-process resources */
+	if (!process_info->n_vms) {
+		WARN_ON(!list_empty(&process_info->kfd_bo_list));
+
+		dma_fence_put(&process_info->eviction_fence->base);
+		kfree(process_info);
+	}
+
+	/* Release the VM context */
+	amdgpu_vm_fini(adev, avm);
+	kfree(vm);
+}
+
+uint32_t amdgpu_amdkfd_gpuvm_get_process_page_dir(void *vm)
+{
+	struct amdkfd_vm *avm = (struct amdkfd_vm *)vm;
+
+	return avm->pd_phys_addr >> AMDGPU_GPU_PAGE_SHIFT;
+}
+
+int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
+		struct kgd_dev *kgd, uint64_t va, uint64_t size,
+		void *vm, struct kgd_mem **mem,
+		uint64_t *offset, uint32_t flags)
+{
+	struct amdgpu_device *adev = get_amdgpu_device(kgd);
+	struct amdkfd_vm *kfd_vm = (struct amdkfd_vm *)vm;
+	struct amdgpu_bo *bo;
+	int byte_align;
+	u32 alloc_domain;
+	u64 alloc_flags;
+	uint32_t mapping_flags;
+	int ret;
+
+	/*
+	 * Check on which domain to allocate BO
+	 */
+	if (flags & ALLOC_MEM_FLAGS_VRAM) {
+		alloc_domain = AMDGPU_GEM_DOMAIN_VRAM;
+		alloc_flags = AMDGPU_GEM_CREATE_VRAM_CLEARED;
+		alloc_flags |= (flags & ALLOC_MEM_FLAGS_PUBLIC) ?
+			AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED :
+			AMDGPU_GEM_CREATE_NO_CPU_ACCESS;
+	} else if (flags & ALLOC_MEM_FLAGS_GTT) {
+		alloc_domain = AMDGPU_GEM_DOMAIN_GTT;
+		alloc_flags = 0;
+	} else {
+		return -EINVAL;
+	}
+
+	*mem = kzalloc(sizeof(struct kgd_mem), GFP_KERNEL);
+	if (!*mem)
+		return -ENOMEM;
+	INIT_LIST_HEAD(&(*mem)->bo_va_list);
+	mutex_init(&(*mem)->lock);
+	(*mem)->aql_queue     = !!(flags & ALLOC_MEM_FLAGS_AQL_QUEUE_MEM);
+
+	/* Workaround for AQL queue wraparound bug. Map the same
+	 * memory twice. That means we only actually allocate half
+	 * the memory.
+	 */
+	if ((*mem)->aql_queue)
+		size = size >> 1;
+
+	/* Workaround for TLB bug on older VI chips */
+	byte_align = (adev->family == AMDGPU_FAMILY_VI &&
+			adev->asic_type != CHIP_FIJI &&
+			adev->asic_type != CHIP_POLARIS10 &&
+			adev->asic_type != CHIP_POLARIS11) ?
+			VI_BO_SIZE_ALIGN : 1;
+
+	mapping_flags = AMDGPU_VM_PAGE_READABLE;
+	if (flags & ALLOC_MEM_FLAGS_WRITABLE)
+		mapping_flags |= AMDGPU_VM_PAGE_WRITEABLE;
+	if (flags & ALLOC_MEM_FLAGS_EXECUTABLE)
+		mapping_flags |= AMDGPU_VM_PAGE_EXECUTABLE;
+	if (flags & ALLOC_MEM_FLAGS_COHERENT)
+		mapping_flags |= AMDGPU_VM_MTYPE_UC;
+	else
+		mapping_flags |= AMDGPU_VM_MTYPE_NC;
+	(*mem)->mapping_flags = mapping_flags;
+
+	amdgpu_sync_create(&(*mem)->sync);
+
+	ret = amdgpu_amdkfd_reserve_system_mem_limit(adev, size, alloc_domain);
+	if (ret) {
+		pr_debug("Insufficient system memory\n");
+		goto err_bo_create;
+	}
+
+	pr_debug("\tcreate BO VA 0x%llx size 0x%llx domain %s\n",
+			va, size, domain_string(alloc_domain));
+
+	ret = amdgpu_bo_create(adev, size, byte_align, false,
+				alloc_domain, alloc_flags, NULL, NULL, 0, &bo);
+	if (ret) {
+		pr_debug("Failed to create BO on domain %s. ret %d\n",
+				domain_string(alloc_domain), ret);
+		unreserve_system_mem_limit(adev, size, alloc_domain);
+		goto err_bo_create;
+	}
+	bo->kfd_bo = *mem;
+	(*mem)->bo = bo;
+
+	(*mem)->va = va;
+	(*mem)->domain = alloc_domain;
+	(*mem)->mapped_to_gpu_memory = 0;
+	(*mem)->process_info = kfd_vm->process_info;
+	add_kgd_mem_to_kfd_bo_list(*mem, kfd_vm->process_info);
+
+	if (offset)
+		*offset = amdgpu_bo_mmap_offset(bo);
+
+	return 0;
+
+err_bo_create:
+	kfree(*mem);
+	return ret;
+}
+
+int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
+		struct kgd_dev *kgd, struct kgd_mem *mem)
+{
+	struct amdkfd_process_info *process_info = mem->process_info;
+	unsigned long bo_size = mem->bo->tbo.mem.size;
+	struct kfd_bo_va_list *entry, *tmp;
+	struct bo_vm_reservation_context ctx;
+	struct ttm_validate_buffer *bo_list_entry;
+	int ret;
+
+	mutex_lock(&mem->lock);
+
+	if (mem->mapped_to_gpu_memory > 0) {
+		pr_debug("BO VA 0x%llx size 0x%lx is still mapped.\n",
+				mem->va, bo_size);
+		mutex_unlock(&mem->lock);
+		return -EBUSY;
+	}
+
+	mutex_unlock(&mem->lock);
+	/* lock is not needed after this, since mem is unused and will
+	 * be freed anyway
+	 */
+
+	/* Make sure restore workers don't access the BO any more */
+	bo_list_entry = &mem->validate_list;
+	mutex_lock(&process_info->lock);
+	list_del(&bo_list_entry->head);
+	mutex_unlock(&process_info->lock);
+
+	ret = reserve_bo_and_cond_vms(mem, NULL, BO_VM_ALL, &ctx);
+	if (unlikely(ret))
+		return ret;
+
+	/* The eviction fence should be removed by the last unmap.
+	 * TODO: Log an error condition if the bo still has the eviction fence
+	 * attached
+	 */
+	amdgpu_amdkfd_remove_eviction_fence(mem->bo,
+					process_info->eviction_fence,
+					NULL, NULL);
+	pr_debug("Release VA 0x%llx - 0x%llx\n", mem->va,
+		mem->va + bo_size * (1 + mem->aql_queue));
+
+	/* Remove from VM internal data structures */
+	list_for_each_entry_safe(entry, tmp, &mem->bo_va_list, bo_list)
+		remove_bo_from_vm((struct amdgpu_device *)entry->kgd_dev,
+				entry, bo_size);
+
+	ret = unreserve_bo_and_vms(&ctx, false, false);
+
+	/* Free the sync object */
+	amdgpu_sync_free(&mem->sync);
+
+	/* Free the BO*/
+	amdgpu_bo_unref(&mem->bo);
+	kfree(mem);
+
+	return ret;
+}
+
+int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
+		struct kgd_dev *kgd, struct kgd_mem *mem, void *vm)
+{
+	struct amdgpu_device *adev = get_amdgpu_device(kgd);
+	struct amdkfd_vm *kfd_vm = (struct amdkfd_vm *)vm;
+	int ret;
+	struct amdgpu_bo *bo;
+	uint32_t domain;
+	struct kfd_bo_va_list *entry;
+	struct bo_vm_reservation_context ctx;
+	struct kfd_bo_va_list *bo_va_entry = NULL;
+	struct kfd_bo_va_list *bo_va_entry_aql = NULL;
+	unsigned long bo_size;
+
+	/* Make sure restore is not running concurrently.
+	 */
+	mutex_lock(&mem->process_info->lock);
+
+	mutex_lock(&mem->lock);
+
+	bo = mem->bo;
+
+	if (!bo) {
+		pr_err("Invalid BO when mapping memory to GPU\n");
+		ret = -EINVAL;
+		goto out;
+	}
+
+	domain = mem->domain;
+	bo_size = bo->tbo.mem.size;
+
+	pr_debug("Map VA 0x%llx - 0x%llx to vm %p domain %s\n",
+			mem->va,
+			mem->va + bo_size * (1 + mem->aql_queue),
+			vm, domain_string(domain));
+
+	ret = reserve_bo_and_vm(mem, vm, &ctx);
+	if (unlikely(ret))
+		goto out;
+
+	if (check_if_add_bo_to_vm((struct amdgpu_vm *)vm, mem)) {
+		ret = add_bo_to_vm(adev, mem, (struct amdgpu_vm *)vm, false,
+				&bo_va_entry);
+		if (ret)
+			goto add_bo_to_vm_failed;
+		if (mem->aql_queue) {
+			ret = add_bo_to_vm(adev, mem, (struct amdgpu_vm *)vm,
+					true, &bo_va_entry_aql);
+			if (ret)
+				goto add_bo_to_vm_failed_aql;
+		}
+	} else {
+		ret = vm_validate_pt_pd_bos((struct amdkfd_vm *)vm);
+		if (unlikely(ret))
+			goto add_bo_to_vm_failed;
+	}
+
+	if (mem->mapped_to_gpu_memory == 0) {
+		/* Validate BO only once. The eviction fence gets added to BO
+		 * the first time it is mapped. Validate will wait for all
+		 * background evictions to complete.
+		 */
+		ret = amdgpu_amdkfd_bo_validate(bo, domain, true);
+		if (ret) {
+			pr_debug("Validate failed\n");
+			goto map_bo_to_gpuvm_failed;
+		}
+	}
+
+	list_for_each_entry(entry, &mem->bo_va_list, bo_list) {
+		if (entry->bo_va->base.vm == vm && !entry->is_mapped) {
+			pr_debug("\t map VA 0x%llx - 0x%llx in entry %p\n",
+					entry->va, entry->va + bo_size,
+					entry);
+
+			ret = map_bo_to_gpuvm(adev, entry, ctx.sync);
+			if (ret) {
+				pr_err("Failed to map BO to gpuvm\n");
+				goto map_bo_to_gpuvm_failed;
+			}
+
+			ret = vm_update_pds(vm, ctx.sync);
+			if (ret) {
+				pr_err("Failed to update page directories\n");
+				goto map_bo_to_gpuvm_failed;
+			}
+
+			entry->is_mapped = true;
+			mem->mapped_to_gpu_memory++;
+			pr_debug("\t INC mapping count %d\n",
+					mem->mapped_to_gpu_memory);
+		}
+	}
+
+	if (!amdgpu_ttm_tt_get_usermm(bo->tbo.ttm) && !bo->pin_count)
+		amdgpu_bo_fence(bo,
+				&kfd_vm->process_info->eviction_fence->base,
+				true);
+	ret = unreserve_bo_and_vms(&ctx, false, false);
+
+	goto out;
+
+map_bo_to_gpuvm_failed:
+	if (bo_va_entry_aql)
+		remove_bo_from_vm(adev, bo_va_entry_aql, bo_size);
+add_bo_to_vm_failed_aql:
+	if (bo_va_entry)
+		remove_bo_from_vm(adev, bo_va_entry, bo_size);
+add_bo_to_vm_failed:
+	unreserve_bo_and_vms(&ctx, false, false);
+
+out:
+	mutex_unlock(&mem->process_info->lock);
+	mutex_unlock(&mem->lock);
+	return ret;
+}
+
+int amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu(
+		struct kgd_dev *kgd, struct kgd_mem *mem, void *vm)
+{
+	struct amdgpu_device *adev = get_amdgpu_device(kgd);
+	struct amdkfd_process_info *process_info =
+		((struct amdkfd_vm *)vm)->process_info;
+	unsigned long bo_size = mem->bo->tbo.mem.size;
+	struct kfd_bo_va_list *entry;
+	struct bo_vm_reservation_context ctx;
+	int ret;
+
+	mutex_lock(&mem->lock);
+
+	ret = reserve_bo_and_cond_vms(mem, vm, BO_VM_MAPPED, &ctx);
+	if (unlikely(ret))
+		goto out;
+	/* If no VMs were reserved, it means the BO wasn't actually mapped */
+	if (ctx.n_vms == 0) {
+		ret = -EINVAL;
+		goto unreserve_out;
+	}
+
+	ret = vm_validate_pt_pd_bos((struct amdkfd_vm *)vm);
+	if (unlikely(ret))
+		goto unreserve_out;
+
+	pr_debug("Unmap VA 0x%llx - 0x%llx from vm %p\n",
+		mem->va,
+		mem->va + bo_size * (1 + mem->aql_queue),
+		vm);
+
+	list_for_each_entry(entry, &mem->bo_va_list, bo_list) {
+		if (entry->bo_va->base.vm == vm && entry->is_mapped) {
+			pr_debug("\t unmap VA 0x%llx - 0x%llx from entry %p\n",
+					entry->va,
+					entry->va + bo_size,
+					entry);
+
+			ret = unmap_bo_from_gpuvm(adev, entry, ctx.sync);
+			if (ret == 0) {
+				entry->is_mapped = false;
+			} else {
+				pr_err("failed to unmap VA 0x%llx\n",
+						mem->va);
+				goto unreserve_out;
+			}
+
+			mem->mapped_to_gpu_memory--;
+			pr_debug("\t DEC mapping count %d\n",
+					mem->mapped_to_gpu_memory);
+		}
+	}
+
+	/* If BO is unmapped from all VMs, unfence it. It can be evicted if
+	 * required.
+	 */
+	if (mem->mapped_to_gpu_memory == 0 &&
+	    !amdgpu_ttm_tt_get_usermm(mem->bo->tbo.ttm) && !mem->bo->pin_count)
+		amdgpu_amdkfd_remove_eviction_fence(mem->bo,
+						process_info->eviction_fence,
+						    NULL, NULL);
+
+unreserve_out:
+	unreserve_bo_and_vms(&ctx, false, false);
+out:
+	mutex_unlock(&mem->lock);
+	return ret;
+}
+
+int amdgpu_amdkfd_gpuvm_sync_memory(
+		struct kgd_dev *kgd, struct kgd_mem *mem, bool intr)
+{
+	struct amdgpu_sync sync;
+	int ret;
+
+	amdgpu_sync_create(&sync);
+
+	mutex_lock(&mem->lock);
+	amdgpu_sync_clone(&mem->sync, &sync);
+	mutex_unlock(&mem->lock);
+
+	ret = amdgpu_sync_wait(&sync, intr);
+	amdgpu_sync_free(&sync);
+	return ret;
+}
+
+int amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel(struct kgd_dev *kgd,
+		struct kgd_mem *mem, void **kptr, uint64_t *size)
+{
+	int ret;
+	struct amdgpu_bo *bo = mem->bo;
+
+	if (amdgpu_ttm_tt_get_usermm(bo->tbo.ttm)) {
+		pr_err("userptr can't be mapped to kernel\n");
+		return -EINVAL;
+	}
+
+	/* Delete kgd_mem from kfd_bo_list to avoid re-validating
+	 * this BO when it is restored after an eviction.
+	 */
+	mutex_lock(&mem->process_info->lock);
+
+	ret = amdgpu_bo_reserve(bo, true);
+	if (ret) {
+		pr_err("Failed to reserve bo. ret %d\n", ret);
+		goto bo_reserve_failed;
+	}
+
+	ret = amdgpu_bo_pin(bo, AMDGPU_GEM_DOMAIN_GTT, NULL);
+	if (ret) {
+		pr_err("Failed to pin bo. ret %d\n", ret);
+		goto pin_failed;
+	}
+
+	ret = amdgpu_bo_kmap(bo, kptr);
+	if (ret) {
+		pr_err("Failed to map bo to kernel. ret %d\n", ret);
+		goto kmap_failed;
+	}
+
+	amdgpu_amdkfd_remove_eviction_fence(
+		bo, mem->process_info->eviction_fence, NULL, NULL);
+	list_del_init(&mem->validate_list.head);
+
+	if (size)
+		*size = amdgpu_bo_size(bo);
+
+	amdgpu_bo_unreserve(bo);
+
+	mutex_unlock(&mem->process_info->lock);
+	return 0;
+
+kmap_failed:
+	amdgpu_bo_unpin(bo);
+pin_failed:
+	amdgpu_bo_unreserve(bo);
+bo_reserve_failed:
+	mutex_unlock(&mem->process_info->lock);
+
+	return ret;
+}
+
+/** amdgpu_amdkfd_gpuvm_restore_process_bos - Restore all BOs for the given
+ *   KFD process identified by process_info
+ *
+ * @process_info: amdkfd_process_info of the KFD process
+ *
+ * After memory eviction, the restore thread calls this function. The function
+ * should be called when the process is still valid. BO restore involves:
+ *
+ * 1.  Release old eviction fence and create new one
+ * 2.  Get two copies of PD BO list from all the VMs. Keep one copy as pd_list.
+ * 3.  Use the second PD list and kfd_bo_list to create a list (ctx.list) of
+ *     BOs that need to be reserved.
+ * 4.  Reserve all the BOs
+ * 5.  Validate PD and PT BOs.
+ * 6.  Validate all KFD BOs using kfd_bo_list, map them and add a new fence
+ * 7.  Add fence to all PD and PT BOs.
+ * 8.  Unreserve all BOs
+ */
+int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence **ef)
+{
+	struct amdgpu_bo_list_entry *pd_bo_list;
+	struct amdkfd_process_info *process_info = info;
+	struct amdkfd_vm *peer_vm;
+	struct kgd_mem *mem;
+	struct bo_vm_reservation_context ctx;
+	struct amdgpu_amdkfd_fence *new_fence;
+	int ret = 0, i;
+	struct list_head duplicate_save;
+	struct amdgpu_sync sync_obj;
+
+	INIT_LIST_HEAD(&duplicate_save);
+	INIT_LIST_HEAD(&ctx.list);
+	INIT_LIST_HEAD(&ctx.duplicates);
+
+	pd_bo_list = kcalloc(process_info->n_vms,
+			     sizeof(struct amdgpu_bo_list_entry),
+			     GFP_KERNEL);
+	if (!pd_bo_list)
+		return -ENOMEM;
+
+	i = 0;
+	mutex_lock(&process_info->lock);
+	list_for_each_entry(peer_vm, &process_info->vm_list_head,
+			vm_list_node)
+		amdgpu_vm_get_pd_bo(&peer_vm->base, &ctx.list,
+				    &pd_bo_list[i++]);
+
+	/* Reserve all BOs and page tables/directory. Add all BOs from
+	 * kfd_bo_list to ctx.list
+	 */
+	list_for_each_entry(mem, &process_info->kfd_bo_list,
+			    validate_list.head) {
+
+		list_add_tail(&mem->resv_list.head, &ctx.list);
+		mem->resv_list.bo = mem->validate_list.bo;
+		mem->resv_list.shared = mem->validate_list.shared;
+	}
+
+	ret = ttm_eu_reserve_buffers(&ctx.ticket, &ctx.list,
+				     false, &duplicate_save);
+	if (ret) {
+		pr_debug("Memory eviction: TTM Reserve Failed. Try again\n");
+		goto ttm_reserve_fail;
+	}
+
+	amdgpu_sync_create(&sync_obj);
+
+	/* Validate PDs and PTs */
+	ret = process_validate_vms(process_info);
+	if (ret)
+		goto validate_map_fail;
+
+	/* Wait for PD/PTs validate to finish */
+	/* FIXME: I think this isn't needed */
+	list_for_each_entry(peer_vm, &process_info->vm_list_head,
+			    vm_list_node) {
+		struct amdgpu_bo *bo = peer_vm->base.root.base.bo;
+
+		ttm_bo_wait(&bo->tbo, false, false);
+	}
+
+	/* Validate BOs and map them to GPUVM (update VM page tables). */
+	list_for_each_entry(mem, &process_info->kfd_bo_list,
+			    validate_list.head) {
+
+		struct amdgpu_bo *bo = mem->bo;
+		uint32_t domain = mem->domain;
+		struct kfd_bo_va_list *bo_va_entry;
+
+		ret = amdgpu_amdkfd_bo_validate(bo, domain, false);
+		if (ret) {
+			pr_debug("Memory eviction: Validate BOs failed. Try again\n");
+			goto validate_map_fail;
+		}
+
+		list_for_each_entry(bo_va_entry, &mem->bo_va_list,
+				    bo_list) {
+			ret = update_gpuvm_pte((struct amdgpu_device *)
+					      bo_va_entry->kgd_dev,
+					      bo_va_entry,
+					      &sync_obj);
+			if (ret) {
+				pr_debug("Memory eviction: update PTE failed. Try again\n");
+				goto validate_map_fail;
+			}
+		}
+	}
+
+	/* Update page directories */
+	ret = process_update_pds(process_info, &sync_obj);
+	if (ret) {
+		pr_debug("Memory eviction: update PDs failed. Try again\n");
+		goto validate_map_fail;
+	}
+
+	amdgpu_sync_wait(&sync_obj, false);
+
+	/* Release the old eviction fence and create a new one, because a fence
+	 * only goes from unsignaled to signaled and cannot be reused.
+	 * Use the context and mm from the old fence.
+	 */
+	new_fence = amdgpu_amdkfd_fence_create(
+				process_info->eviction_fence->base.context,
+				process_info->eviction_fence->mm);
+	if (!new_fence) {
+		pr_err("Failed to create eviction fence\n");
+		ret = -ENOMEM;
+		goto validate_map_fail;
+	}
+	dma_fence_put(&process_info->eviction_fence->base);
+	process_info->eviction_fence = new_fence;
+	*ef = dma_fence_get(&new_fence->base);
+
+	/* Wait for validate to finish and attach new eviction fence */
+	list_for_each_entry(mem, &process_info->kfd_bo_list,
+		validate_list.head)
+		ttm_bo_wait(&mem->bo->tbo, false, false);
+	list_for_each_entry(mem, &process_info->kfd_bo_list,
+		validate_list.head)
+		amdgpu_bo_fence(mem->bo,
+			&process_info->eviction_fence->base, true);
+
+	/* Attach eviction fence to PD / PT BOs */
+	list_for_each_entry(peer_vm, &process_info->vm_list_head,
+			    vm_list_node) {
+		struct amdgpu_bo *bo = peer_vm->base.root.base.bo;
+
+		amdgpu_bo_fence(bo, &process_info->eviction_fence->base, true);
+	}
+
+validate_map_fail:
+	ttm_eu_backoff_reservation(&ctx.ticket, &ctx.list);
+	amdgpu_sync_free(&sync_obj);
+ttm_reserve_fail:
+	mutex_unlock(&process_info->lock);
+	kfree(pd_bo_list);
+	return ret;
+}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 5c4c3e0..f608ecf 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -36,6 +36,7 @@
 #include <drm/drm_cache.h>
 #include "amdgpu.h"
 #include "amdgpu_trace.h"
+#include "amdgpu_amdkfd.h"
 
 static bool amdgpu_need_backup(struct amdgpu_device *adev)
 {
@@ -54,6 +55,9 @@ static void amdgpu_ttm_bo_destroy(struct ttm_buffer_object *tbo)
 	struct amdgpu_device *adev = amdgpu_ttm_adev(tbo->bdev);
 	struct amdgpu_bo *bo = ttm_to_amdgpu_bo(tbo);
 
+	if (bo->kfd_bo)
+		amdgpu_amdkfd_unreserve_system_memory_limit(bo);
+
 	amdgpu_bo_kunmap(bo);
 
 	drm_gem_object_release(&bo->gem_base);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
index 33615e2..ba5330a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
@@ -92,6 +92,8 @@ struct amdgpu_bo {
 		struct list_head	mn_list;
 		struct list_head	shadow_list;
 	};
+
+	struct kgd_mem                  *kfd_bo;
 };
 
 static inline struct amdgpu_bo *ttm_to_amdgpu_bo(struct ttm_buffer_object *tbo)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index c3f33d3..76ee968 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -261,6 +261,13 @@ static int amdgpu_verify_access(struct ttm_buffer_object *bo, struct file *filp)
 {
 	struct amdgpu_bo *abo = ttm_to_amdgpu_bo(bo);
 
+	/*
+	 * Don't verify access for KFD BOs. They don't have a GEM
+	 * object associated with them.
+	 */
+	if (abo->kfd_bo)
+		return 0;
+
 	if (amdgpu_ttm_tt_get_usermm(bo->ttm))
 		return -EPERM;
 	return drm_vma_node_verify_access(&abo->gem_base.vma_node,
diff --git a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
index 36c706a..5984fec 100644
--- a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
+++ b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
@@ -127,6 +127,25 @@ struct tile_config {
 	uint32_t num_ranks;
 };
 
+
+/*
+ * Allocation flag domains
+ */
+#define ALLOC_MEM_FLAGS_VRAM		(1 << 0)
+#define ALLOC_MEM_FLAGS_GTT		(1 << 1)
+#define ALLOC_MEM_FLAGS_USERPTR		(1 << 2) /* TODO */
+#define ALLOC_MEM_FLAGS_DOORBELL	(1 << 3) /* TODO */
+
+/*
+ * Allocation flags attributes/access options.
+ */
+#define ALLOC_MEM_FLAGS_WRITABLE	(1 << 31)
+#define ALLOC_MEM_FLAGS_EXECUTABLE	(1 << 30)
+#define ALLOC_MEM_FLAGS_PUBLIC		(1 << 29)
+#define ALLOC_MEM_FLAGS_NO_SUBSTITUTE	(1 << 28) /* TODO */
+#define ALLOC_MEM_FLAGS_AQL_QUEUE_MEM	(1 << 27)
+#define ALLOC_MEM_FLAGS_COHERENT	(1 << 26) /* For GFXv9 or later */
+
 /**
  * struct kfd2kgd_calls
  *
@@ -186,6 +205,41 @@ struct tile_config {
  *
  * @get_vram_usage: Returns current VRAM usage
  *
+ * @create_process_vm: Create a VM address space for a given process and GPU
+ *
+ * @destroy_process_vm: Destroy a VM
+ *
+ * @get_process_page_dir: Get physical address of a VM page directory
+ *
+ * @set_vm_context_page_table_base: Program page table base for a VMID
+ *
+ * @alloc_memory_of_gpu: Allocate GPUVM memory
+ *
+ * @free_memory_of_gpu: Free GPUVM memory
+ *
+ * @map_memory_to_gpu: Map GPUVM memory into a specific VM address
+ * space. Allocates and updates page tables and page directories as
+ * needed. This function may return before all page table updates have
+ * completed. This allows multiple map operations (on multiple GPUs)
+ * to happen concurrently. Use sync_memory to synchronize with all
+ * pending updates.
+ *
+ * @unmap_memory_to_gpu: Unmap GPUVM memory from a specific VM address space
+ *
+ * @sync_memory: Wait for pending page table updates to complete
+ *
+ * @map_gtt_bo_to_kernel: Map a GTT BO for kernel access
+ * Pins the BO, maps it to kernel address space. Such BOs are never evicted.
+ * The kernel virtual address remains valid until the BO is freed.
+ *
+ * @restore_process_bos: Restore all BOs that belong to the
+ * process. This is intended for restoring memory mappings after a TTM
+ * eviction.
+ *
+ * @invalidate_tlbs: Invalidate TLBs for a specific PASID
+ *
+ * @invalidate_tlbs_vmid: Invalidate TLBs for a specific VMID
+ *
  * This structure contains function pointers to services that the kgd driver
  * provides to amdkfd driver.
  *
@@ -275,6 +329,29 @@ struct kfd2kgd_calls {
 	void (*get_cu_info)(struct kgd_dev *kgd,
 			struct kfd_cu_info *cu_info);
 	uint64_t (*get_vram_usage)(struct kgd_dev *kgd);
+
+	int (*create_process_vm)(struct kgd_dev *kgd, void **vm,
+			void **process_info, struct dma_fence **ef);
+	void (*destroy_process_vm)(struct kgd_dev *kgd, void *vm);
+	uint32_t (*get_process_page_dir)(void *vm);
+	void (*set_vm_context_page_table_base)(struct kgd_dev *kgd,
+			uint32_t vmid, uint32_t page_table_base);
+	int (*alloc_memory_of_gpu)(struct kgd_dev *kgd, uint64_t va,
+			uint64_t size, void *vm,
+			struct kgd_mem **mem, uint64_t *offset,
+			uint32_t flags);
+	int (*free_memory_of_gpu)(struct kgd_dev *kgd, struct kgd_mem *mem);
+	int (*map_memory_to_gpu)(struct kgd_dev *kgd, struct kgd_mem *mem,
+			void *vm);
+	int (*unmap_memory_to_gpu)(struct kgd_dev *kgd, struct kgd_mem *mem,
+			void *vm);
+	int (*sync_memory)(struct kgd_dev *kgd, struct kgd_mem *mem, bool intr);
+	int (*map_gtt_bo_to_kernel)(struct kgd_dev *kgd, struct kgd_mem *mem,
+			void **kptr, uint64_t *size);
+	int (*restore_process_bos)(void *process_info, struct dma_fence **ef);
+
+	int (*invalidate_tlbs)(struct kgd_dev *kgd, uint16_t pasid);
+	int (*invalidate_tlbs_vmid)(struct kgd_dev *kgd, uint16_t vmid);
 };
 
 /**
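
As a rough illustration of how the amdkfd side is expected to drive this
interface (a sketch only; the dev->kfd2kgd pointer, the error handling and
the pdd->vm field are assumptions here, the real callers are added in the
later amdkfd patches of this series):

	struct kgd_mem *mem;
	uint64_t mmap_offset;
	int r;

	/* Allocate a writable GTT BO in the process GPUVM address space */
	r = dev->kfd2kgd->alloc_memory_of_gpu(dev->kgd, va, size, pdd->vm,
			&mem, &mmap_offset,
			ALLOC_MEM_FLAGS_GTT | ALLOC_MEM_FLAGS_WRITABLE);
	if (r)
		return r;

	/* Map it on this GPU; page table updates may still be pending ... */
	r = dev->kfd2kgd->map_memory_to_gpu(dev->kgd, mem, pdd->vm);
	if (r)
		goto free;

	/* ... so wait for them before letting user mode use the mapping */
	r = dev->kfd2kgd->sync_memory(dev->kgd, mem, true);
	if (r)
		goto unmap;
	return 0;

unmap:
	dev->kfd2kgd->unmap_memory_to_gpu(dev->kgd, mem, pdd->vm);
free:
	dev->kfd2kgd->free_memory_of_gpu(dev->kgd, mem);
	return r;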
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH 10/25] drm/amdgpu: Add submit IB function for KFD
       [not found] ` <1517967174-21709-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (8 preceding siblings ...)
  2018-02-07  1:32   ` [PATCH 09/25] drm/amdgpu: Add GPUVM memory management functions for KFD Felix Kuehling
@ 2018-02-07  1:32   ` Felix Kuehling
  2018-02-07  1:32   ` [PATCH 11/25] drm/amdkfd: Centralize IOMMUv2 code and make it conditional Felix Kuehling
                     ` (14 subsequent siblings)
  24 siblings, 0 replies; 71+ messages in thread
From: Felix Kuehling @ 2018-02-07  1:32 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling

This can be used for flushing caches when not using the HWS.
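
For illustration, a caller in amdkfd might use it roughly like this (a sketch
under assumptions: the flush packets, their GPU address and the vmid are
prepared by the caller and are ASIC-specific; this is not code from this
patch):

	/* ib_cpu points to PM4 packets already written to a GTT BO that is
	 * GPU-accessible at ib_gpu_addr; ib_len_dw is the packet size in dwords.
	 */
	int r = amdgpu_amdkfd_submit_ib(dev->kgd, KGD_ENGINE_MEC1, vmid,
					ib_gpu_addr, ib_cpu, ib_len_dw);
	if (r)
		pr_err("Cache flush IB submission failed: %d\n", r);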

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c        | 55 +++++++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h        |  4 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c |  1 +
 drivers/gpu/drm/amd/include/kgd_kfd_interface.h   |  8 ++++
 5 files changed, 69 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index 01fb142..010b558 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -359,6 +359,61 @@ uint64_t amdgpu_amdkfd_get_vram_usage(struct kgd_dev *kgd)
 	return amdgpu_vram_mgr_usage(&adev->mman.bdev.man[TTM_PL_VRAM]);
 }
 
+int amdgpu_amdkfd_submit_ib(struct kgd_dev *kgd, enum kgd_engine_type engine,
+				uint32_t vmid, uint64_t gpu_addr,
+				uint32_t *ib_cmd, uint32_t ib_len)
+{
+	struct amdgpu_device *adev = (struct amdgpu_device *)kgd;
+	struct amdgpu_job *job;
+	struct amdgpu_ib *ib;
+	struct amdgpu_ring *ring;
+	struct dma_fence *f = NULL;
+	int ret;
+
+	switch (engine) {
+	case KGD_ENGINE_MEC1:
+		ring = &adev->gfx.compute_ring[0];
+		break;
+	case KGD_ENGINE_SDMA1:
+		ring = &adev->sdma.instance[0].ring;
+		break;
+	case KGD_ENGINE_SDMA2:
+		ring = &adev->sdma.instance[1].ring;
+		break;
+	default:
+		pr_err("Invalid engine in IB submission: %d\n", engine);
+		ret = -EINVAL;
+		goto err;
+	}
+
+	ret = amdgpu_job_alloc(adev, 1, &job, NULL);
+	if (ret)
+		goto err;
+
+	ib = &job->ibs[0];
+	memset(ib, 0, sizeof(struct amdgpu_ib));
+
+	ib->gpu_addr = gpu_addr;
+	ib->ptr = ib_cmd;
+	ib->length_dw = ib_len;
+	/* This works for NO_HWS. TODO: need to handle without knowing VMID */
+	job->vmid = vmid;
+
+	ret = amdgpu_ib_schedule(ring, 1, ib, job, &f);
+	if (ret) {
+		DRM_ERROR("amdgpu: failed to schedule IB.\n");
+		goto err_ib_sched;
+	}
+
+	ret = dma_fence_wait(f, false);
+
+err_ib_sched:
+	dma_fence_put(f);
+	amdgpu_job_free(job);
+err:
+	return ret;
+}
+
 bool amdgpu_amdkfd_is_kfd_vmid(struct amdgpu_device *adev, u32 vmid)
 {
 	if (adev->kfd) {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index 87fb4e6..25e9460 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -124,6 +124,10 @@ void amdgpu_amdkfd_device_probe(struct amdgpu_device *adev);
 void amdgpu_amdkfd_device_init(struct amdgpu_device *adev);
 void amdgpu_amdkfd_device_fini(struct amdgpu_device *adev);
 
+int amdgpu_amdkfd_submit_ib(struct kgd_dev *kgd, enum kgd_engine_type engine,
+				uint32_t vmid, uint64_t gpu_addr,
+				uint32_t *ib_cmd, uint32_t ib_len);
+
 struct kfd2kgd_calls *amdgpu_amdkfd_gfx_7_get_functions(void);
 struct kfd2kgd_calls *amdgpu_amdkfd_gfx_8_0_get_functions(void);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
index 65783d1..7485c37 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
@@ -217,6 +217,7 @@ static const struct kfd2kgd_calls kfd2kgd = {
 	.restore_process_bos = amdgpu_amdkfd_gpuvm_restore_process_bos,
 	.invalidate_tlbs = invalidate_tlbs,
 	.invalidate_tlbs_vmid = invalidate_tlbs_vmid,
+	.submit_ib = amdgpu_amdkfd_submit_ib,
 };
 
 struct kfd2kgd_calls *amdgpu_amdkfd_gfx_7_get_functions(void)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
index 1b5bf13..7be4534 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
@@ -177,6 +177,7 @@ static const struct kfd2kgd_calls kfd2kgd = {
 	.restore_process_bos = amdgpu_amdkfd_gpuvm_restore_process_bos,
 	.invalidate_tlbs = invalidate_tlbs,
 	.invalidate_tlbs_vmid = invalidate_tlbs_vmid,
+	.submit_ib = amdgpu_amdkfd_submit_ib,
 };
 
 struct kfd2kgd_calls *amdgpu_amdkfd_gfx_8_0_get_functions(void)
diff --git a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
index 5984fec..b7146e2 100644
--- a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
+++ b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
@@ -240,6 +240,10 @@ struct tile_config {
  *
  * @invalidate_tlbs_vmid: Invalidate TLBs for a specific VMID
  *
+ * @submit_ib: Submits an IB to the specified engine by inserting the
+ * IB into the corresponding ring (ring type). The IB is executed with the
+ * specified VMID in a user mode context.
+ *
  * This structure contains function pointers to services that the kgd driver
  * provides to amdkfd driver.
  *
@@ -352,6 +356,10 @@ struct kfd2kgd_calls {
 
 	int (*invalidate_tlbs)(struct kgd_dev *kgd, uint16_t pasid);
 	int (*invalidate_tlbs_vmid)(struct kgd_dev *kgd, uint16_t vmid);
+
+	int (*submit_ib)(struct kgd_dev *kgd, enum kgd_engine_type engine,
+			uint32_t vmid, uint64_t gpu_addr,
+			uint32_t *ib_cmd, uint32_t ib_len);
 };
 
 /**
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH 11/25] drm/amdkfd: Centralize IOMMUv2 code and make it conditional
       [not found] ` <1517967174-21709-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (9 preceding siblings ...)
  2018-02-07  1:32   ` [PATCH 10/25] drm/amdgpu: Add submit IB function " Felix Kuehling
@ 2018-02-07  1:32   ` Felix Kuehling
       [not found]     ` <1517967174-21709-12-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
  2018-02-07  1:32   ` [PATCH 12/25] drm/amdkfd: Use per-device sched_policy Felix Kuehling
                     ` (13 subsequent siblings)
  24 siblings, 1 reply; 71+ messages in thread
From: Felix Kuehling @ 2018-02-07  1:32 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling

dGPUs work without IOMMUv2. Make IOMMUv2 initialization dependent on
ASIC information. Also allow building KFD without IOMMUv2 support.
This is still useful for dGPUs and prepares for enabling KFD on
architectures that don't support AMD IOMMUv2.

v2:
* Centralize IOMMUv2 code to avoid #ifdefs in too many places
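
The idea is that kfd_iommu.h exposes the IOMMUv2 interface either as real
functions or as inline stubs, so callers need no #ifdefs. A minimal sketch of
that pattern (abbreviated; the exact contents of kfd_iommu.h are in the full
patch below and may differ from this illustration):

	#if defined(CONFIG_AMD_IOMMU_V2_MODULE) || defined(CONFIG_AMD_IOMMU_V2)
	#define KFD_SUPPORT_IOMMU_V2

	int kfd_iommu_check_device(struct kfd_dev *kfd);
	#else
	/* Stub used when the kernel is built without AMD IOMMUv2 support */
	static inline int kfd_iommu_check_device(struct kfd_dev *kfd)
	{
		return -ENODEV;
	}
	#endif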

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/Kconfig        |   2 +-
 drivers/gpu/drm/amd/amdkfd/Makefile       |   4 +
 drivers/gpu/drm/amd/amdkfd/kfd_crat.c     |  14 +-
 drivers/gpu/drm/amd/amdkfd/kfd_device.c   | 127 +++--------
 drivers/gpu/drm/amd/amdkfd/kfd_events.c   |   3 +
 drivers/gpu/drm/amd/amdkfd/kfd_iommu.c    | 356 ++++++++++++++++++++++++++++++
 drivers/gpu/drm/amd/amdkfd/kfd_iommu.h    |  78 +++++++
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h     |  14 +-
 drivers/gpu/drm/amd/amdkfd/kfd_process.c  | 138 +-----------
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c |  16 +-
 drivers/gpu/drm/amd/amdkfd/kfd_topology.h |   6 +-
 11 files changed, 493 insertions(+), 265 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_iommu.c
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_iommu.h

diff --git a/drivers/gpu/drm/amd/amdkfd/Kconfig b/drivers/gpu/drm/amd/amdkfd/Kconfig
index bc5a294..5bbeb95 100644
--- a/drivers/gpu/drm/amd/amdkfd/Kconfig
+++ b/drivers/gpu/drm/amd/amdkfd/Kconfig
@@ -4,6 +4,6 @@
 
 config HSA_AMD
 	tristate "HSA kernel driver for AMD GPU devices"
-	depends on DRM_AMDGPU && AMD_IOMMU_V2 && X86_64
+	depends on DRM_AMDGPU && X86_64
 	help
 	  Enable this if you want to use HSA features on AMD GPU devices.
diff --git a/drivers/gpu/drm/amd/amdkfd/Makefile b/drivers/gpu/drm/amd/amdkfd/Makefile
index a317e76..0d02422 100644
--- a/drivers/gpu/drm/amd/amdkfd/Makefile
+++ b/drivers/gpu/drm/amd/amdkfd/Makefile
@@ -37,6 +37,10 @@ amdkfd-y	:= kfd_module.o kfd_device.o kfd_chardev.o kfd_topology.o \
 		kfd_interrupt.o kfd_events.o cik_event_interrupt.o \
 		kfd_dbgdev.o kfd_dbgmgr.o kfd_crat.o
 
+ifneq ($(CONFIG_AMD_IOMMU_V2),)
+amdkfd-y += kfd_iommu.o
+endif
+
 amdkfd-$(CONFIG_DEBUG_FS) += kfd_debugfs.o
 
 obj-$(CONFIG_HSA_AMD)	+= amdkfd.o
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
index 2bc2816..7493f47 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
@@ -22,10 +22,10 @@
 
 #include <linux/pci.h>
 #include <linux/acpi.h>
-#include <linux/amd-iommu.h>
 #include "kfd_crat.h"
 #include "kfd_priv.h"
 #include "kfd_topology.h"
+#include "kfd_iommu.h"
 
 /* GPU Processor ID base for dGPUs for which VCRAT needs to be created.
  * GPU processor ID are expressed with Bit[31]=1.
@@ -1037,15 +1037,11 @@ static int kfd_create_vcrat_image_gpu(void *pcrat_image,
 	struct crat_subtype_generic *sub_type_hdr;
 	struct crat_subtype_computeunit *cu;
 	struct kfd_cu_info cu_info;
-	struct amd_iommu_device_info iommu_info;
 	int avail_size = *size;
 	uint32_t total_num_of_cu;
 	int num_of_cache_entries = 0;
 	int cache_mem_filled = 0;
 	int ret = 0;
-	const u32 required_iommu_flags = AMD_IOMMU_DEVICE_FLAG_ATS_SUP |
-					 AMD_IOMMU_DEVICE_FLAG_PRI_SUP |
-					 AMD_IOMMU_DEVICE_FLAG_PASID_SUP;
 	struct kfd_local_mem_info local_mem_info;
 
 	if (!pcrat_image || avail_size < VCRAT_SIZE_FOR_GPU)
@@ -1106,12 +1102,8 @@ static int kfd_create_vcrat_image_gpu(void *pcrat_image,
 	/* Check if this node supports IOMMU. During parsing this flag will
 	 * translate to HSA_CAP_ATS_PRESENT
 	 */
-	iommu_info.flags = 0;
-	if (amd_iommu_device_info(kdev->pdev, &iommu_info) == 0) {
-		if ((iommu_info.flags & required_iommu_flags) ==
-				required_iommu_flags)
-			cu->hsa_capability |= CRAT_CU_FLAGS_IOMMU_PRESENT;
-	}
+	if (!kfd_iommu_check_device(kdev))
+		cu->hsa_capability |= CRAT_CU_FLAGS_IOMMU_PRESENT;
 
 	crat_table->length += sub_type_hdr->length;
 	crat_table->total_entries++;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index 83d6f41..4ac2d61 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -20,7 +20,9 @@
  * OTHER DEALINGS IN THE SOFTWARE.
  */
 
+#if defined(CONFIG_AMD_IOMMU_V2_MODULE) || defined(CONFIG_AMD_IOMMU_V2)
 #include <linux/amd-iommu.h>
+#endif
 #include <linux/bsearch.h>
 #include <linux/pci.h>
 #include <linux/slab.h>
@@ -28,9 +30,11 @@
 #include "kfd_device_queue_manager.h"
 #include "kfd_pm4_headers_vi.h"
 #include "cwsr_trap_handler_gfx8.asm"
+#include "kfd_iommu.h"
 
 #define MQD_SIZE_ALIGNED 768
 
+#ifdef KFD_SUPPORT_IOMMU_V2
 static const struct kfd_device_info kaveri_device_info = {
 	.asic_family = CHIP_KAVERI,
 	.max_pasid_bits = 16,
@@ -41,6 +45,7 @@ static const struct kfd_device_info kaveri_device_info = {
 	.num_of_watch_points = 4,
 	.mqd_size_aligned = MQD_SIZE_ALIGNED,
 	.supports_cwsr = false,
+	.needs_iommu_device = true,
 	.needs_pci_atomics = false,
 };
 
@@ -54,8 +59,10 @@ static const struct kfd_device_info carrizo_device_info = {
 	.num_of_watch_points = 4,
 	.mqd_size_aligned = MQD_SIZE_ALIGNED,
 	.supports_cwsr = true,
+	.needs_iommu_device = true,
 	.needs_pci_atomics = false,
 };
+#endif
 
 static const struct kfd_device_info hawaii_device_info = {
 	.asic_family = CHIP_HAWAII,
@@ -67,6 +74,7 @@ static const struct kfd_device_info hawaii_device_info = {
 	.num_of_watch_points = 4,
 	.mqd_size_aligned = MQD_SIZE_ALIGNED,
 	.supports_cwsr = false,
+	.needs_iommu_device = false,
 	.needs_pci_atomics = false,
 };
 
@@ -79,6 +87,7 @@ static const struct kfd_device_info tonga_device_info = {
 	.num_of_watch_points = 4,
 	.mqd_size_aligned = MQD_SIZE_ALIGNED,
 	.supports_cwsr = false,
+	.needs_iommu_device = false,
 	.needs_pci_atomics = true,
 };
 
@@ -91,6 +100,7 @@ static const struct kfd_device_info tonga_vf_device_info = {
 	.num_of_watch_points = 4,
 	.mqd_size_aligned = MQD_SIZE_ALIGNED,
 	.supports_cwsr = false,
+	.needs_iommu_device = false,
 	.needs_pci_atomics = false,
 };
 
@@ -103,6 +113,7 @@ static const struct kfd_device_info fiji_device_info = {
 	.num_of_watch_points = 4,
 	.mqd_size_aligned = MQD_SIZE_ALIGNED,
 	.supports_cwsr = true,
+	.needs_iommu_device = false,
 	.needs_pci_atomics = true,
 };
 
@@ -115,6 +126,7 @@ static const struct kfd_device_info fiji_vf_device_info = {
 	.num_of_watch_points = 4,
 	.mqd_size_aligned = MQD_SIZE_ALIGNED,
 	.supports_cwsr = true,
+	.needs_iommu_device = false,
 	.needs_pci_atomics = false,
 };
 
@@ -128,6 +140,7 @@ static const struct kfd_device_info polaris10_device_info = {
 	.num_of_watch_points = 4,
 	.mqd_size_aligned = MQD_SIZE_ALIGNED,
 	.supports_cwsr = true,
+	.needs_iommu_device = false,
 	.needs_pci_atomics = true,
 };
 
@@ -140,6 +153,7 @@ static const struct kfd_device_info polaris10_vf_device_info = {
 	.num_of_watch_points = 4,
 	.mqd_size_aligned = MQD_SIZE_ALIGNED,
 	.supports_cwsr = true,
+	.needs_iommu_device = false,
 	.needs_pci_atomics = false,
 };
 
@@ -152,6 +166,7 @@ static const struct kfd_device_info polaris11_device_info = {
 	.num_of_watch_points = 4,
 	.mqd_size_aligned = MQD_SIZE_ALIGNED,
 	.supports_cwsr = true,
+	.needs_iommu_device = false,
 	.needs_pci_atomics = true,
 };
 
@@ -162,6 +177,7 @@ struct kfd_deviceid {
 };
 
 static const struct kfd_deviceid supported_devices[] = {
+#ifdef KFD_SUPPORT_IOMMU_V2
 	{ 0x1304, &kaveri_device_info },	/* Kaveri */
 	{ 0x1305, &kaveri_device_info },	/* Kaveri */
 	{ 0x1306, &kaveri_device_info },	/* Kaveri */
@@ -189,6 +205,7 @@ static const struct kfd_deviceid supported_devices[] = {
 	{ 0x9875, &carrizo_device_info },	/* Carrizo */
 	{ 0x9876, &carrizo_device_info },	/* Carrizo */
 	{ 0x9877, &carrizo_device_info },	/* Carrizo */
+#endif
 	{ 0x67A0, &hawaii_device_info },	/* Hawaii */
 	{ 0x67A1, &hawaii_device_info },	/* Hawaii */
 	{ 0x67A2, &hawaii_device_info },	/* Hawaii */
@@ -302,77 +319,6 @@ struct kfd_dev *kgd2kfd_probe(struct kgd_dev *kgd,
 	return kfd;
 }
 
-static bool device_iommu_pasid_init(struct kfd_dev *kfd)
-{
-	const u32 required_iommu_flags = AMD_IOMMU_DEVICE_FLAG_ATS_SUP |
-					AMD_IOMMU_DEVICE_FLAG_PRI_SUP |
-					AMD_IOMMU_DEVICE_FLAG_PASID_SUP;
-
-	struct amd_iommu_device_info iommu_info;
-	unsigned int pasid_limit;
-	int err;
-
-	err = amd_iommu_device_info(kfd->pdev, &iommu_info);
-	if (err < 0) {
-		dev_err(kfd_device,
-			"error getting iommu info. is the iommu enabled?\n");
-		return false;
-	}
-
-	if ((iommu_info.flags & required_iommu_flags) != required_iommu_flags) {
-		dev_err(kfd_device, "error required iommu flags ats %i, pri %i, pasid %i\n",
-		       (iommu_info.flags & AMD_IOMMU_DEVICE_FLAG_ATS_SUP) != 0,
-		       (iommu_info.flags & AMD_IOMMU_DEVICE_FLAG_PRI_SUP) != 0,
-		       (iommu_info.flags & AMD_IOMMU_DEVICE_FLAG_PASID_SUP)
-									!= 0);
-		return false;
-	}
-
-	pasid_limit = min_t(unsigned int,
-			(unsigned int)(1 << kfd->device_info->max_pasid_bits),
-			iommu_info.max_pasids);
-
-	if (!kfd_set_pasid_limit(pasid_limit)) {
-		dev_err(kfd_device, "error setting pasid limit\n");
-		return false;
-	}
-
-	return true;
-}
-
-static void iommu_pasid_shutdown_callback(struct pci_dev *pdev, int pasid)
-{
-	struct kfd_dev *dev = kfd_device_by_pci_dev(pdev);
-
-	if (dev)
-		kfd_process_iommu_unbind_callback(dev, pasid);
-}
-
-/*
- * This function called by IOMMU driver on PPR failure
- */
-static int iommu_invalid_ppr_cb(struct pci_dev *pdev, int pasid,
-		unsigned long address, u16 flags)
-{
-	struct kfd_dev *dev;
-
-	dev_warn(kfd_device,
-			"Invalid PPR device %x:%x.%x pasid %d address 0x%lX flags 0x%X",
-			PCI_BUS_NUM(pdev->devfn),
-			PCI_SLOT(pdev->devfn),
-			PCI_FUNC(pdev->devfn),
-			pasid,
-			address,
-			flags);
-
-	dev = kfd_device_by_pci_dev(pdev);
-	if (!WARN_ON(!dev))
-		kfd_signal_iommu_event(dev, pasid, address,
-			flags & PPR_FAULT_WRITE, flags & PPR_FAULT_EXEC);
-
-	return AMD_IOMMU_INV_PRI_RSP_INVALID;
-}
-
 static void kfd_cwsr_init(struct kfd_dev *kfd)
 {
 	if (cwsr_enable && kfd->device_info->supports_cwsr) {
@@ -462,11 +408,9 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,
 		goto device_queue_manager_error;
 	}
 
-	if (!device_iommu_pasid_init(kfd)) {
-		dev_err(kfd_device,
-			"Error initializing iommuv2 for device %x:%x\n",
-			kfd->pdev->vendor, kfd->pdev->device);
-		goto device_iommu_pasid_error;
+	if (kfd_iommu_device_init(kfd)) {
+		dev_err(kfd_device, "Error initializing iommuv2\n");
+		goto device_iommu_error;
 	}
 
 	kfd_cwsr_init(kfd);
@@ -486,7 +430,7 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,
 	goto out;
 
 kfd_resume_error:
-device_iommu_pasid_error:
+device_iommu_error:
 	device_queue_manager_uninit(kfd->dqm);
 device_queue_manager_error:
 	kfd_interrupt_exit(kfd);
@@ -527,11 +471,7 @@ void kgd2kfd_suspend(struct kfd_dev *kfd)
 
 	kfd->dqm->ops.stop(kfd->dqm);
 
-	kfd_unbind_processes_from_device(kfd);
-
-	amd_iommu_set_invalidate_ctx_cb(kfd->pdev, NULL);
-	amd_iommu_set_invalid_ppr_cb(kfd->pdev, NULL);
-	amd_iommu_free_device(kfd->pdev);
+	kfd_iommu_suspend(kfd);
 }
 
 int kgd2kfd_resume(struct kfd_dev *kfd)
@@ -546,19 +486,14 @@ int kgd2kfd_resume(struct kfd_dev *kfd)
 static int kfd_resume(struct kfd_dev *kfd)
 {
 	int err = 0;
-	unsigned int pasid_limit = kfd_get_pasid_limit();
-
-	err = amd_iommu_init_device(kfd->pdev, pasid_limit);
-	if (err)
-		return -ENXIO;
-	amd_iommu_set_invalidate_ctx_cb(kfd->pdev,
-					iommu_pasid_shutdown_callback);
-	amd_iommu_set_invalid_ppr_cb(kfd->pdev,
-				     iommu_invalid_ppr_cb);
 
-	err = kfd_bind_processes_to_device(kfd);
-	if (err)
-		goto processes_bind_error;
+	err = kfd_iommu_resume(kfd);
+	if (err) {
+		dev_err(kfd_device,
+			"Failed to resume IOMMU for device %x:%x\n",
+			kfd->pdev->vendor, kfd->pdev->device);
+		return err;
+	}
 
 	err = kfd->dqm->ops.start(kfd->dqm);
 	if (err) {
@@ -571,9 +506,7 @@ static int kfd_resume(struct kfd_dev *kfd)
 	return err;
 
 dqm_start_error:
-processes_bind_error:
-	amd_iommu_free_device(kfd->pdev);
-
+	kfd_iommu_suspend(kfd);
 	return err;
 }
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
index 93aae5c..6fb9c0d 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
@@ -30,6 +30,7 @@
 #include <linux/memory.h>
 #include "kfd_priv.h"
 #include "kfd_events.h"
+#include "kfd_iommu.h"
 #include <linux/device.h>
 
 /*
@@ -837,6 +838,7 @@ static void lookup_events_by_type_and_signal(struct kfd_process *p,
 	}
 }
 
+#ifdef KFD_SUPPORT_IOMMU_V2
 void kfd_signal_iommu_event(struct kfd_dev *dev, unsigned int pasid,
 		unsigned long address, bool is_write_requested,
 		bool is_execute_requested)
@@ -905,6 +907,7 @@ void kfd_signal_iommu_event(struct kfd_dev *dev, unsigned int pasid,
 	mutex_unlock(&p->event_mutex);
 	kfd_unref_process(p);
 }
+#endif /* KFD_SUPPORT_IOMMU_V2 */
 
 void kfd_signal_hw_exception_event(unsigned int pasid)
 {
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_iommu.c b/drivers/gpu/drm/amd/amdkfd/kfd_iommu.c
new file mode 100644
index 0000000..81dee34
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_iommu.c
@@ -0,0 +1,356 @@
+/*
+ * Copyright 2018 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#include <linux/printk.h>
+#include <linux/device.h>
+#include <linux/slab.h>
+#include <linux/pci.h>
+#include <linux/amd-iommu.h>
+#include "kfd_priv.h"
+#include "kfd_dbgmgr.h"
+#include "kfd_topology.h"
+#include "kfd_iommu.h"
+
+static const u32 required_iommu_flags = AMD_IOMMU_DEVICE_FLAG_ATS_SUP |
+					AMD_IOMMU_DEVICE_FLAG_PRI_SUP |
+					AMD_IOMMU_DEVICE_FLAG_PASID_SUP;
+
+/** kfd_iommu_check_device - Check whether IOMMU is available for device
+ */
+int kfd_iommu_check_device(struct kfd_dev *kfd)
+{
+	struct amd_iommu_device_info iommu_info;
+	int err;
+
+	if (!kfd->device_info->needs_iommu_device)
+		return -ENODEV;
+
+	iommu_info.flags = 0;
+	err = amd_iommu_device_info(kfd->pdev, &iommu_info);
+	if (err)
+		return err;
+
+	if ((iommu_info.flags & required_iommu_flags) != required_iommu_flags)
+		return -ENODEV;
+
+	return 0;
+}
+
+/** kfd_iommu_device_init - Initialize IOMMU for device
+ */
+int kfd_iommu_device_init(struct kfd_dev *kfd)
+{
+	struct amd_iommu_device_info iommu_info;
+	unsigned int pasid_limit;
+	int err;
+
+	if (!kfd->device_info->needs_iommu_device)
+		return 0;
+
+	iommu_info.flags = 0;
+	err = amd_iommu_device_info(kfd->pdev, &iommu_info);
+	if (err < 0) {
+		dev_err(kfd_device,
+			"error getting iommu info. is the iommu enabled?\n");
+		return -ENODEV;
+	}
+
+	if ((iommu_info.flags & required_iommu_flags) != required_iommu_flags) {
+		dev_err(kfd_device, "error required iommu flags ats %i, pri %i, pasid %i\n",
+		       (iommu_info.flags & AMD_IOMMU_DEVICE_FLAG_ATS_SUP) != 0,
+		       (iommu_info.flags & AMD_IOMMU_DEVICE_FLAG_PRI_SUP) != 0,
+		       (iommu_info.flags & AMD_IOMMU_DEVICE_FLAG_PASID_SUP)
+									!= 0);
+		return -ENODEV;
+	}
+
+	pasid_limit = min_t(unsigned int,
+			(unsigned int)(1 << kfd->device_info->max_pasid_bits),
+			iommu_info.max_pasids);
+
+	if (!kfd_set_pasid_limit(pasid_limit)) {
+		dev_err(kfd_device, "error setting pasid limit\n");
+		return -EBUSY;
+	}
+
+	return 0;
+}
+
+/** kfd_iommu_bind_process_to_device - Have the IOMMU bind a process
+ *
+ * Binds the given process to the given device using its PASID. This
+ * enables IOMMUv2 address translation for the process on the device.
+ *
+ * This function assumes that the process mutex is held.
+ */
+int kfd_iommu_bind_process_to_device(struct kfd_process_device *pdd)
+{
+	struct kfd_dev *dev = pdd->dev;
+	struct kfd_process *p = pdd->process;
+	int err;
+
+	if (!dev->device_info->needs_iommu_device || pdd->bound == PDD_BOUND)
+		return 0;
+
+	if (unlikely(pdd->bound == PDD_BOUND_SUSPENDED)) {
+		pr_err("Binding PDD_BOUND_SUSPENDED pdd is unexpected!\n");
+		return -EINVAL;
+	}
+
+	err = amd_iommu_bind_pasid(dev->pdev, p->pasid, p->lead_thread);
+	if (!err)
+		pdd->bound = PDD_BOUND;
+
+	return err;
+}
+
+/** kfd_iommu_unbind_process - Unbind process from all devices
+ *
+ * This removes all IOMMU device bindings of the process. To be used
+ * before process termination.
+ */
+void kfd_iommu_unbind_process(struct kfd_process *p)
+{
+	struct kfd_process_device *pdd;
+
+	list_for_each_entry(pdd, &p->per_device_data, per_device_list)
+		if (pdd->bound == PDD_BOUND)
+			amd_iommu_unbind_pasid(pdd->dev->pdev, p->pasid);
+}
+
+/* Callback for process shutdown invoked by the IOMMU driver */
+static void iommu_pasid_shutdown_callback(struct pci_dev *pdev, int pasid)
+{
+	struct kfd_dev *dev = kfd_device_by_pci_dev(pdev);
+	struct kfd_process *p;
+	struct kfd_process_device *pdd;
+
+	if (!dev)
+		return;
+
+	/*
+	 * Look for the process that matches the pasid. If there is no such
+	 * process, we either released it in amdkfd's own notifier, or there
+	 * is a bug. Unfortunately, there is no way to tell...
+	 */
+	p = kfd_lookup_process_by_pasid(pasid);
+	if (!p)
+		return;
+
+	pr_debug("Unbinding process %d from IOMMU\n", pasid);
+
+	mutex_lock(kfd_get_dbgmgr_mutex());
+
+	if (dev->dbgmgr && dev->dbgmgr->pasid == p->pasid) {
+		if (!kfd_dbgmgr_unregister(dev->dbgmgr, p)) {
+			kfd_dbgmgr_destroy(dev->dbgmgr);
+			dev->dbgmgr = NULL;
+		}
+	}
+
+	mutex_unlock(kfd_get_dbgmgr_mutex());
+
+	mutex_lock(&p->mutex);
+
+	pdd = kfd_get_process_device_data(dev, p);
+	if (pdd)
+		/* For GPU relying on IOMMU, we need to dequeue here
+		 * when PASID is still bound.
+		 */
+		kfd_process_dequeue_from_device(pdd);
+
+	mutex_unlock(&p->mutex);
+
+	kfd_unref_process(p);
+}
+
+/* This function called by IOMMU driver on PPR failure */
+static int iommu_invalid_ppr_cb(struct pci_dev *pdev, int pasid,
+		unsigned long address, u16 flags)
+{
+	struct kfd_dev *dev;
+
+	dev_warn(kfd_device,
+			"Invalid PPR device %x:%x.%x pasid %d address 0x%lX flags 0x%X",
+			PCI_BUS_NUM(pdev->devfn),
+			PCI_SLOT(pdev->devfn),
+			PCI_FUNC(pdev->devfn),
+			pasid,
+			address,
+			flags);
+
+	dev = kfd_device_by_pci_dev(pdev);
+	if (!WARN_ON(!dev))
+		kfd_signal_iommu_event(dev, pasid, address,
+			flags & PPR_FAULT_WRITE, flags & PPR_FAULT_EXEC);
+
+	return AMD_IOMMU_INV_PRI_RSP_INVALID;
+}
+
+/*
+ * Bind processes to the device that have been temporarily unbound
+ * (PDD_BOUND_SUSPENDED) in kfd_unbind_processes_from_device.
+ */
+static int kfd_bind_processes_to_device(struct kfd_dev *kfd)
+{
+	struct kfd_process_device *pdd;
+	struct kfd_process *p;
+	unsigned int temp;
+	int err = 0;
+
+	int idx = srcu_read_lock(&kfd_processes_srcu);
+
+	hash_for_each_rcu(kfd_processes_table, temp, p, kfd_processes) {
+		mutex_lock(&p->mutex);
+		pdd = kfd_get_process_device_data(kfd, p);
+
+		if (WARN_ON(!pdd) || pdd->bound != PDD_BOUND_SUSPENDED) {
+			mutex_unlock(&p->mutex);
+			continue;
+		}
+
+		err = amd_iommu_bind_pasid(kfd->pdev, p->pasid,
+				p->lead_thread);
+		if (err < 0) {
+			pr_err("Unexpected pasid %d binding failure\n",
+					p->pasid);
+			mutex_unlock(&p->mutex);
+			break;
+		}
+
+		pdd->bound = PDD_BOUND;
+		mutex_unlock(&p->mutex);
+	}
+
+	srcu_read_unlock(&kfd_processes_srcu, idx);
+
+	return err;
+}
+
+/*
+ * Mark currently bound processes as PDD_BOUND_SUSPENDED. These
+ * processes will be restored to PDD_BOUND state in
+ * kfd_bind_processes_to_device.
+ */
+static void kfd_unbind_processes_from_device(struct kfd_dev *kfd)
+{
+	struct kfd_process_device *pdd;
+	struct kfd_process *p;
+	unsigned int temp;
+
+	int idx = srcu_read_lock(&kfd_processes_srcu);
+
+	hash_for_each_rcu(kfd_processes_table, temp, p, kfd_processes) {
+		mutex_lock(&p->mutex);
+		pdd = kfd_get_process_device_data(kfd, p);
+
+		if (WARN_ON(!pdd)) {
+			mutex_unlock(&p->mutex);
+			continue;
+		}
+
+		if (pdd->bound == PDD_BOUND)
+			pdd->bound = PDD_BOUND_SUSPENDED;
+		mutex_unlock(&p->mutex);
+	}
+
+	srcu_read_unlock(&kfd_processes_srcu, idx);
+}
+
+/** kfd_iommu_suspend - Prepare IOMMU for suspend
+ *
+ * This unbinds processes from the device and disables the IOMMU for
+ * the device.
+ */
+void kfd_iommu_suspend(struct kfd_dev *kfd)
+{
+	if (!kfd->device_info->needs_iommu_device)
+		return;
+
+	kfd_unbind_processes_from_device(kfd);
+
+	amd_iommu_set_invalidate_ctx_cb(kfd->pdev, NULL);
+	amd_iommu_set_invalid_ppr_cb(kfd->pdev, NULL);
+	amd_iommu_free_device(kfd->pdev);
+}
+
+/** kfd_iommu_resume - Restore IOMMU after resume
+ *
+ * This reinitializes the IOMMU for the device and re-binds previously
+ * suspended processes to the device.
+ */
+int kfd_iommu_resume(struct kfd_dev *kfd)
+{
+	unsigned int pasid_limit;
+	int err;
+
+	if (!kfd->device_info->needs_iommu_device)
+		return 0;
+
+	pasid_limit = kfd_get_pasid_limit();
+
+	err = amd_iommu_init_device(kfd->pdev, pasid_limit);
+	if (err)
+		return -ENXIO;
+
+	amd_iommu_set_invalidate_ctx_cb(kfd->pdev,
+					iommu_pasid_shutdown_callback);
+	amd_iommu_set_invalid_ppr_cb(kfd->pdev,
+				     iommu_invalid_ppr_cb);
+
+	err = kfd_bind_processes_to_device(kfd);
+	if (err) {
+		amd_iommu_set_invalidate_ctx_cb(kfd->pdev, NULL);
+		amd_iommu_set_invalid_ppr_cb(kfd->pdev, NULL);
+		amd_iommu_free_device(kfd->pdev);
+		return err;
+	}
+
+	return 0;
+}
+
+extern bool amd_iommu_pc_supported(void);
+extern u8 amd_iommu_pc_get_max_banks(u16 devid);
+extern u8 amd_iommu_pc_get_max_counters(u16 devid);
+
+/** kfd_iommu_add_perf_counters - Add IOMMU performance counters to topology
+ */
+int kfd_iommu_add_perf_counters(struct kfd_topology_device *kdev)
+{
+	struct kfd_perf_properties *props;
+
+	if (!(kdev->node_props.capability & HSA_CAP_ATS_PRESENT))
+		return 0;
+
+	if (!amd_iommu_pc_supported())
+		return 0;
+
+	props = kfd_alloc_struct(props);
+	if (!props)
+		return -ENOMEM;
+	strcpy(props->block_name, "iommu");
+	props->max_concurrent = amd_iommu_pc_get_max_banks(0) *
+		amd_iommu_pc_get_max_counters(0); /* assume one iommu */
+	list_add_tail(&props->list, &kdev->perf_props);
+
+	return 0;
+}
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_iommu.h b/drivers/gpu/drm/amd/amdkfd/kfd_iommu.h
new file mode 100644
index 0000000..dd23d9f
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_iommu.h
@@ -0,0 +1,78 @@
+/*
+ * Copyright 2018 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#ifndef __KFD_IOMMU_H__
+#define __KFD_IOMMU_H__
+
+#if defined(CONFIG_AMD_IOMMU_V2_MODULE) || defined(CONFIG_AMD_IOMMU_V2)
+
+#define KFD_SUPPORT_IOMMU_V2
+
+int kfd_iommu_check_device(struct kfd_dev *kfd);
+int kfd_iommu_device_init(struct kfd_dev *kfd);
+
+int kfd_iommu_bind_process_to_device(struct kfd_process_device *pdd);
+void kfd_iommu_unbind_process(struct kfd_process *p);
+
+void kfd_iommu_suspend(struct kfd_dev *kfd);
+int kfd_iommu_resume(struct kfd_dev *kfd);
+
+int kfd_iommu_add_perf_counters(struct kfd_topology_device *kdev);
+
+#else
+
+static inline int kfd_iommu_check_device(struct kfd_dev *kfd)
+{
+	return -ENODEV;
+}
+static inline int kfd_iommu_device_init(struct kfd_dev *kfd)
+{
+	return 0;
+}
+
+static inline int kfd_iommu_bind_process_to_device(
+	struct kfd_process_device *pdd)
+{
+	return 0;
+}
+static inline void kfd_iommu_unbind_process(struct kfd_process *p)
+{
+	/* empty */
+}
+
+static inline void kfd_iommu_suspend(struct kfd_dev *kfd)
+{
+	/* empty */
+}
+static inline int kfd_iommu_resume(struct kfd_dev *kfd)
+{
+	return 0;
+}
+
+static inline int kfd_iommu_add_perf_counters(struct kfd_topology_device *kdev)
+{
+	return 0;
+}
+
+#endif /* defined(CONFIG_AMD_IOMMU_V2) */
+
+#endif /* __KFD_IOMMU_H__ */
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 594f853..f12eb5d 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -158,6 +158,7 @@ struct kfd_device_info {
 	uint8_t num_of_watch_points;
 	uint16_t mqd_size_aligned;
 	bool supports_cwsr;
+	bool needs_iommu_device;
 	bool needs_pci_atomics;
 };
 
@@ -517,15 +518,15 @@ struct kfd_process_device {
 	uint64_t scratch_base;
 	uint64_t scratch_limit;
 
-	/* Is this process/pasid bound to this device? (amd_iommu_bind_pasid) */
-	enum kfd_pdd_bound bound;
-
 	/* Flag used to tell the pdd has dequeued from the dqm.
 	 * This is used to prevent dev->dqm->ops.process_termination() from
 	 * being called twice when it is already called in IOMMU callback
 	 * function.
 	 */
 	bool already_dequeued;
+
+	/* Is this process/pasid bound to this device? (amd_iommu_bind_pasid) */
+	enum kfd_pdd_bound bound;
 };
 
 #define qpd_to_pdd(x) container_of(x, struct kfd_process_device, qpd)
@@ -590,6 +591,10 @@ struct kfd_process {
 	bool signal_event_limit_reached;
 };
 
+#define KFD_PROCESS_TABLE_SIZE 5 /* bits: 32 entries */
+extern DECLARE_HASHTABLE(kfd_processes_table, KFD_PROCESS_TABLE_SIZE);
+extern struct srcu_struct kfd_processes_srcu;
+
 /**
  * Ioctl function type.
  *
@@ -617,9 +622,6 @@ void kfd_unref_process(struct kfd_process *p);
 
 struct kfd_process_device *kfd_bind_process_to_device(struct kfd_dev *dev,
 						struct kfd_process *p);
-int kfd_bind_processes_to_device(struct kfd_dev *dev);
-void kfd_unbind_processes_from_device(struct kfd_dev *dev);
-void kfd_process_iommu_unbind_callback(struct kfd_dev *dev, unsigned int pasid);
 struct kfd_process_device *kfd_get_process_device_data(struct kfd_dev *dev,
 							struct kfd_process *p);
 struct kfd_process_device *kfd_create_process_device_data(struct kfd_dev *dev,
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index 4ff5f0f..e9aee76 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -35,16 +35,16 @@ struct mm_struct;
 
 #include "kfd_priv.h"
 #include "kfd_dbgmgr.h"
+#include "kfd_iommu.h"
 
 /*
  * List of struct kfd_process (field kfd_process).
  * Unique/indexed by mm_struct*
  */
-#define KFD_PROCESS_TABLE_SIZE 5 /* bits: 32 entries */
-static DEFINE_HASHTABLE(kfd_processes_table, KFD_PROCESS_TABLE_SIZE);
+DEFINE_HASHTABLE(kfd_processes_table, KFD_PROCESS_TABLE_SIZE);
 static DEFINE_MUTEX(kfd_processes_mutex);
 
-DEFINE_STATIC_SRCU(kfd_processes_srcu);
+DEFINE_SRCU(kfd_processes_srcu);
 
 static struct workqueue_struct *kfd_process_wq;
 
@@ -173,14 +173,8 @@ static void kfd_process_wq_release(struct work_struct *work)
 {
 	struct kfd_process *p = container_of(work, struct kfd_process,
 					     release_work);
-	struct kfd_process_device *pdd;
 
-	pr_debug("Releasing process (pasid %d) in workqueue\n", p->pasid);
-
-	list_for_each_entry(pdd, &p->per_device_data, per_device_list) {
-		if (pdd->bound == PDD_BOUND)
-			amd_iommu_unbind_pasid(pdd->dev->pdev, p->pasid);
-	}
+	kfd_iommu_unbind_process(p);
 
 	kfd_process_destroy_pdds(p);
 
@@ -429,133 +423,13 @@ struct kfd_process_device *kfd_bind_process_to_device(struct kfd_dev *dev,
 		return ERR_PTR(-ENOMEM);
 	}
 
-	if (pdd->bound == PDD_BOUND) {
-		return pdd;
-	} else if (unlikely(pdd->bound == PDD_BOUND_SUSPENDED)) {
-		pr_err("Binding PDD_BOUND_SUSPENDED pdd is unexpected!\n");
-		return ERR_PTR(-EINVAL);
-	}
-
-	err = amd_iommu_bind_pasid(dev->pdev, p->pasid, p->lead_thread);
-	if (err < 0)
+	err = kfd_iommu_bind_process_to_device(pdd);
+	if (err)
 		return ERR_PTR(err);
 
-	pdd->bound = PDD_BOUND;
-
 	return pdd;
 }
 
-/*
- * Bind processes do the device that have been temporarily unbound
- * (PDD_BOUND_SUSPENDED) in kfd_unbind_processes_from_device.
- */
-int kfd_bind_processes_to_device(struct kfd_dev *dev)
-{
-	struct kfd_process_device *pdd;
-	struct kfd_process *p;
-	unsigned int temp;
-	int err = 0;
-
-	int idx = srcu_read_lock(&kfd_processes_srcu);
-
-	hash_for_each_rcu(kfd_processes_table, temp, p, kfd_processes) {
-		mutex_lock(&p->mutex);
-		pdd = kfd_get_process_device_data(dev, p);
-
-		if (WARN_ON(!pdd) || pdd->bound != PDD_BOUND_SUSPENDED) {
-			mutex_unlock(&p->mutex);
-			continue;
-		}
-
-		err = amd_iommu_bind_pasid(dev->pdev, p->pasid,
-				p->lead_thread);
-		if (err < 0) {
-			pr_err("Unexpected pasid %d binding failure\n",
-					p->pasid);
-			mutex_unlock(&p->mutex);
-			break;
-		}
-
-		pdd->bound = PDD_BOUND;
-		mutex_unlock(&p->mutex);
-	}
-
-	srcu_read_unlock(&kfd_processes_srcu, idx);
-
-	return err;
-}
-
-/*
- * Mark currently bound processes as PDD_BOUND_SUSPENDED. These
- * processes will be restored to PDD_BOUND state in
- * kfd_bind_processes_to_device.
- */
-void kfd_unbind_processes_from_device(struct kfd_dev *dev)
-{
-	struct kfd_process_device *pdd;
-	struct kfd_process *p;
-	unsigned int temp;
-
-	int idx = srcu_read_lock(&kfd_processes_srcu);
-
-	hash_for_each_rcu(kfd_processes_table, temp, p, kfd_processes) {
-		mutex_lock(&p->mutex);
-		pdd = kfd_get_process_device_data(dev, p);
-
-		if (WARN_ON(!pdd)) {
-			mutex_unlock(&p->mutex);
-			continue;
-		}
-
-		if (pdd->bound == PDD_BOUND)
-			pdd->bound = PDD_BOUND_SUSPENDED;
-		mutex_unlock(&p->mutex);
-	}
-
-	srcu_read_unlock(&kfd_processes_srcu, idx);
-}
-
-void kfd_process_iommu_unbind_callback(struct kfd_dev *dev, unsigned int pasid)
-{
-	struct kfd_process *p;
-	struct kfd_process_device *pdd;
-
-	/*
-	 * Look for the process that matches the pasid. If there is no such
-	 * process, we either released it in amdkfd's own notifier, or there
-	 * is a bug. Unfortunately, there is no way to tell...
-	 */
-	p = kfd_lookup_process_by_pasid(pasid);
-	if (!p)
-		return;
-
-	pr_debug("Unbinding process %d from IOMMU\n", pasid);
-
-	mutex_lock(kfd_get_dbgmgr_mutex());
-
-	if (dev->dbgmgr && dev->dbgmgr->pasid == p->pasid) {
-		if (!kfd_dbgmgr_unregister(dev->dbgmgr, p)) {
-			kfd_dbgmgr_destroy(dev->dbgmgr);
-			dev->dbgmgr = NULL;
-		}
-	}
-
-	mutex_unlock(kfd_get_dbgmgr_mutex());
-
-	mutex_lock(&p->mutex);
-
-	pdd = kfd_get_process_device_data(dev, p);
-	if (pdd)
-		/* For GPU relying on IOMMU, we need to dequeue here
-		 * when PASID is still bound.
-		 */
-		kfd_process_dequeue_from_device(pdd);
-
-	mutex_unlock(&p->mutex);
-
-	kfd_unref_process(p);
-}
-
 struct kfd_process_device *kfd_get_first_process_device_data(
 						struct kfd_process *p)
 {
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index 7783250..2506155 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -35,6 +35,7 @@
 #include "kfd_crat.h"
 #include "kfd_topology.h"
 #include "kfd_device_queue_manager.h"
+#include "kfd_iommu.h"
 
 /* topology_device_list - Master list of all topology devices */
 static struct list_head topology_device_list;
@@ -875,19 +876,8 @@ static void find_system_memory(const struct dmi_header *dm,
  */
 static int kfd_add_perf_to_topology(struct kfd_topology_device *kdev)
 {
-	struct kfd_perf_properties *props;
-
-	if (amd_iommu_pc_supported()) {
-		props = kfd_alloc_struct(props);
-		if (!props)
-			return -ENOMEM;
-		strcpy(props->block_name, "iommu");
-		props->max_concurrent = amd_iommu_pc_get_max_banks(0) *
-			amd_iommu_pc_get_max_counters(0); /* assume one iommu */
-		list_add_tail(&props->list, &kdev->perf_props);
-	}
-
-	return 0;
+	/* These are the only counters supported so far */
+	return kfd_iommu_add_perf_counters(kdev);
 }
 
 /* kfd_add_non_crat_information - Add information that is not currently
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
index 53fca1f..c0be2be 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
@@ -25,7 +25,7 @@
 
 #include <linux/types.h>
 #include <linux/list.h>
-#include "kfd_priv.h"
+#include "kfd_crat.h"
 
 #define KFD_TOPOLOGY_PUBLIC_NAME_SIZE 128
 
@@ -183,8 +183,4 @@ struct kfd_topology_device *kfd_create_topology_device(
 		struct list_head *device_list);
 void kfd_release_topology_device_list(struct list_head *device_list);
 
-extern bool amd_iommu_pc_supported(void);
-extern u8 amd_iommu_pc_get_max_banks(u16 devid);
-extern u8 amd_iommu_pc_get_max_counters(u16 devid);
-
 #endif /* __KFD_TOPOLOGY_H__ */
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH 12/25] drm/amdkfd: Use per-device sched_policy
       [not found] ` <1517967174-21709-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (10 preceding siblings ...)
  2018-02-07  1:32   ` [PATCH 11/25] drm/amdkfd: Centralize IOMMUv2 code and make it conditional Felix Kuehling
@ 2018-02-07  1:32   ` Felix Kuehling
       [not found]     ` <1517967174-21709-13-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
  2018-02-07  1:32   ` [PATCH 13/25] drm/amdkfd: Remove unaligned memory access Felix Kuehling
                     ` (12 subsequent siblings)
  24 siblings, 1 reply; 71+ messages in thread
From: Felix Kuehling @ 2018-02-07  1:32 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling

This was missed in a previous commit that made the scheduler policy
a per-device setting.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index dca6257..47d493e 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -1296,7 +1296,7 @@ struct device_queue_manager *device_queue_manager_init(struct kfd_dev *dev)
 		dqm->ops.process_termination = process_termination_nocpsch;
 		break;
 	default:
-		pr_err("Invalid scheduling policy %d\n", sched_policy);
+		pr_err("Invalid scheduling policy %d\n", dqm->sched_policy);
 		goto out_free;
 	}
 
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH 13/25] drm/amdkfd: Remove unaligned memory access
       [not found] ` <1517967174-21709-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (11 preceding siblings ...)
  2018-02-07  1:32   ` [PATCH 12/25] drm/amdkfd: Use per-device sched_policy Felix Kuehling
@ 2018-02-07  1:32   ` Felix Kuehling
       [not found]     ` <1517967174-21709-14-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
  2018-02-07  1:32   ` [PATCH 14/25] drm/amdkfd: Populate DRM render device minor Felix Kuehling
                     ` (11 subsequent siblings)
  24 siblings, 1 reply; 71+ messages in thread
From: Felix Kuehling @ 2018-02-07  1:32 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling, Harish Kasiviswanathan

From: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>

Unaligned atomic operations can cause problems on some CPU
architectures. Use simpler bitmask operations instead. Atomic bit
manipulations are not necessary since dqm->lock is held during these
operations.

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c  | 25 ++++++++--------------
 1 file changed, 9 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index 47d493e..1a28dc2 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -118,9 +118,8 @@ static int allocate_vmid(struct device_queue_manager *dqm,
 	if (dqm->vmid_bitmap == 0)
 		return -ENOMEM;
 
-	bit = find_first_bit((unsigned long *)&dqm->vmid_bitmap,
-				dqm->dev->vm_info.vmid_num_kfd);
-	clear_bit(bit, (unsigned long *)&dqm->vmid_bitmap);
+	bit = ffs(dqm->vmid_bitmap) - 1;
+	dqm->vmid_bitmap &= ~(1 << bit);
 
 	allocated_vmid = bit + dqm->dev->vm_info.first_vmid_kfd;
 	pr_debug("vmid allocation %d\n", allocated_vmid);
@@ -142,7 +141,7 @@ static void deallocate_vmid(struct device_queue_manager *dqm,
 	/* Release the vmid mapping */
 	set_pasid_vmid_mapping(dqm, 0, qpd->vmid);
 
-	set_bit(bit, (unsigned long *)&dqm->vmid_bitmap);
+	dqm->vmid_bitmap |= (1 << bit);
 	qpd->vmid = 0;
 	q->properties.vmid = 0;
 }
@@ -223,12 +222,8 @@ static int allocate_hqd(struct device_queue_manager *dqm, struct queue *q)
 			continue;
 
 		if (dqm->allocated_queues[pipe] != 0) {
-			bit = find_first_bit(
-				(unsigned long *)&dqm->allocated_queues[pipe],
-				get_queues_per_pipe(dqm));
-
-			clear_bit(bit,
-				(unsigned long *)&dqm->allocated_queues[pipe]);
+			bit = ffs(dqm->allocated_queues[pipe]) - 1;
+			dqm->allocated_queues[pipe] &= ~(1 << bit);
 			q->pipe = pipe;
 			q->queue = bit;
 			set = true;
@@ -249,7 +244,7 @@ static int allocate_hqd(struct device_queue_manager *dqm, struct queue *q)
 static inline void deallocate_hqd(struct device_queue_manager *dqm,
 				struct queue *q)
 {
-	set_bit(q->queue, (unsigned long *)&dqm->allocated_queues[q->pipe]);
+	dqm->allocated_queues[q->pipe] |= (1 << q->queue);
 }
 
 static int create_compute_queue_nocpsch(struct device_queue_manager *dqm,
@@ -589,10 +584,8 @@ static int allocate_sdma_queue(struct device_queue_manager *dqm,
 	if (dqm->sdma_bitmap == 0)
 		return -ENOMEM;
 
-	bit = find_first_bit((unsigned long *)&dqm->sdma_bitmap,
-				CIK_SDMA_QUEUES);
-
-	clear_bit(bit, (unsigned long *)&dqm->sdma_bitmap);
+	bit = ffs(dqm->sdma_bitmap) - 1;
+	dqm->sdma_bitmap &= ~(1 << bit);
 	*sdma_queue_id = bit;
 
 	return 0;
@@ -603,7 +596,7 @@ static void deallocate_sdma_queue(struct device_queue_manager *dqm,
 {
 	if (sdma_queue_id >= CIK_SDMA_QUEUES)
 		return;
-	set_bit(sdma_queue_id, (unsigned long *)&dqm->sdma_bitmap);
+	dqm->sdma_bitmap |= (1 << sdma_queue_id);
 }
 
 static int create_sdma_queue_nocpsch(struct device_queue_manager *dqm,
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH 14/25] drm/amdkfd: Populate DRM render device minor
       [not found] ` <1517967174-21709-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (12 preceding siblings ...)
  2018-02-07  1:32   ` [PATCH 13/25] drm/amdkfd: Remove unaligned memory access Felix Kuehling
@ 2018-02-07  1:32   ` Felix Kuehling
       [not found]     ` <1517967174-21709-15-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
  2018-02-07  1:32   ` [PATCH 15/25] drm/amdkfd: Add GPUVM virtual address space to PDD Felix Kuehling
                     ` (10 subsequent siblings)
  24 siblings, 1 reply; 71+ messages in thread
From: Felix Kuehling @ 2018-02-07  1:32 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Oak Zeng, Felix Kuehling

From: Oak Zeng <Oak.Zeng@amd.com>

Populate DRM render device minor in kfd topology
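
For context, the new property is what lets user space tie a KFD topology
node to its DRM render node. A hedged user-space sketch follows; the sysfs
path and parsing are simplified assumptions, not taken from this patch:

#include <stdio.h>
#include <string.h>
#include <fcntl.h>

/* Read drm_render_minor for one KFD topology node and open the matching
 * /dev/dri/renderD* device. Returns an fd or -1. Illustration only.
 */
static int example_open_render_node(int kfd_node)
{
	char path[128], name[64];
	long value, minor = -1;
	FILE *f;

	snprintf(path, sizeof(path),
		 "/sys/class/kfd/kfd/topology/nodes/%d/properties", kfd_node);
	f = fopen(path, "r");
	if (!f)
		return -1;
	while (fscanf(f, "%63s %ld", name, &value) == 2)
		if (!strcmp(name, "drm_render_minor"))
			minor = value;
	fclose(f);

	if (minor < 0)
		return -1;	/* property missing: not a GPU node */
	snprintf(path, sizeof(path), "/dev/dri/renderD%ld", minor);
	return open(path, O_RDWR);
}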

Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 4 ++++
 drivers/gpu/drm/amd/amdkfd/kfd_topology.h | 1 +
 2 files changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index 2506155..ac28abc 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -441,6 +441,8 @@ static ssize_t node_show(struct kobject *kobj, struct attribute *attr,
 			dev->node_props.device_id);
 	sysfs_show_32bit_prop(buffer, "location_id",
 			dev->node_props.location_id);
+	sysfs_show_32bit_prop(buffer, "drm_render_minor",
+			dev->node_props.drm_render_minor);
 
 	if (dev->gpu) {
 		log_max_watch_addr =
@@ -1214,6 +1216,8 @@ int kfd_topology_add_device(struct kfd_dev *gpu)
 		dev->gpu->kfd2kgd->get_max_engine_clock_in_mhz(dev->gpu->kgd);
 	dev->node_props.max_engine_clk_ccompute =
 		cpufreq_quick_get_max(0) / 1000;
+	dev->node_props.drm_render_minor =
+		gpu->shared_resources.drm_render_minor;
 
 	kfd_fill_mem_clk_max_info(dev);
 	kfd_fill_iolink_non_crat_info(dev);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
index c0be2be..eb54cfc 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
@@ -71,6 +71,7 @@ struct kfd_node_properties {
 	uint32_t location_id;
 	uint32_t max_engine_clk_fcompute;
 	uint32_t max_engine_clk_ccompute;
+	int32_t  drm_render_minor;
 	uint16_t marketing_name[KFD_TOPOLOGY_PUBLIC_NAME_SIZE];
 };
 
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH 15/25] drm/amdkfd: Add GPUVM virtual address space to PDD
       [not found] ` <1517967174-21709-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (13 preceding siblings ...)
  2018-02-07  1:32   ` [PATCH 14/25] drm/amdkfd: Populate DRM render device minor Felix Kuehling
@ 2018-02-07  1:32   ` Felix Kuehling
       [not found]     ` <1517967174-21709-16-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
  2018-02-07  1:32   ` [PATCH 16/25] drm/amdkfd: Implement KFD process eviction/restore Felix Kuehling
                     ` (9 subsequent siblings)
  24 siblings, 1 reply; 71+ messages in thread
From: Felix Kuehling @ 2018-02-07  1:32 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling

Create/destroy the GPUVM context during PDD creation/destruction.
Get VM page table base and program it during process registration
(HWS) or VMID allocation (non-HWS).

v2:
* Used dev instead of pdd->dev in kfd_flush_tlb
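
To make the TLB flush rule easier to follow, here is a stand-alone sketch
(stub functions and names invented for the example): without the hardware
scheduler a process only owns a VMID once its first queue exists, so
invalidation is per VMID; with the hardware scheduler it is per PASID:

#include <stdio.h>

/* Stand-ins for the kfd2kgd invalidate_tlbs callbacks */
static void stub_invalidate_tlbs_vmid(unsigned int vmid)
{
	printf("invalidate TLBs for vmid %u\n", vmid);
}

static void stub_invalidate_tlbs_pasid(unsigned int pasid)
{
	printf("invalidate TLBs for pasid %u\n", pasid);
}

/* no_hws != 0 models KFD_SCHED_POLICY_NO_HWS */
static void example_flush_tlb(int no_hws, unsigned int vmid, unsigned int pasid)
{
	if (no_hws) {
		if (vmid)	/* nothing to flush before a VMID is assigned */
			stub_invalidate_tlbs_vmid(vmid);
	} else {
		stub_invalidate_tlbs_pasid(pasid);
	}
}

int main(void)
{
	example_flush_tlb(1, 8, 0x42);	/* non-HWS: per-VMID flush */
	example_flush_tlb(0, 0, 0x42);	/* HWS: per-PASID flush */
	return 0;
}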

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c  | 20 +++++++++++++
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h              | 13 +++++++++
 drivers/gpu/drm/amd/amdkfd/kfd_process.c           | 33 ++++++++++++++++++++++
 3 files changed, 66 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index 1a28dc2..b7d0639 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -129,6 +129,15 @@ static int allocate_vmid(struct device_queue_manager *dqm,
 	set_pasid_vmid_mapping(dqm, q->process->pasid, q->properties.vmid);
 	program_sh_mem_settings(dqm, qpd);
 
+	/* qpd->page_table_base is set earlier when register_process()
+	 * is called, i.e. when the first queue is created.
+	 */
+	dqm->dev->kfd2kgd->set_vm_context_page_table_base(dqm->dev->kgd,
+			qpd->vmid,
+			qpd->page_table_base);
+	/* invalidate the VM context after pasid and vmid mapping is set up */
+	kfd_flush_tlb(qpd_to_pdd(qpd));
+
 	return 0;
 }
 
@@ -138,6 +147,8 @@ static void deallocate_vmid(struct device_queue_manager *dqm,
 {
 	int bit = qpd->vmid - dqm->dev->vm_info.first_vmid_kfd;
 
+	kfd_flush_tlb(qpd_to_pdd(qpd));
+
 	/* Release the vmid mapping */
 	set_pasid_vmid_mapping(dqm, 0, qpd->vmid);
 
@@ -450,6 +461,8 @@ static int register_process(struct device_queue_manager *dqm,
 					struct qcm_process_device *qpd)
 {
 	struct device_process_node *n;
+	struct kfd_process_device *pdd;
+	uint32_t pd_base;
 	int retval;
 
 	n = kzalloc(sizeof(*n), GFP_KERNEL);
@@ -458,9 +471,16 @@ static int register_process(struct device_queue_manager *dqm,
 
 	n->qpd = qpd;
 
+	pdd = qpd_to_pdd(qpd);
+	/* Retrieve PD base */
+	pd_base = dqm->dev->kfd2kgd->get_process_page_dir(pdd->vm);
+
 	mutex_lock(&dqm->lock);
 	list_add(&n->list, &dqm->queues);
 
+	/* Update PD Base in QPD */
+	qpd->page_table_base = pd_base;
+
 	retval = dqm->asic_ops.update_qpd(dqm, qpd);
 
 	dqm->processes_count++;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index f12eb5d..56c2e36 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -518,6 +518,9 @@ struct kfd_process_device {
 	uint64_t scratch_base;
 	uint64_t scratch_limit;
 
+	/* VM context for GPUVM allocations */
+	void *vm;
+
 	/* Flag used to tell the pdd has dequeued from the dqm.
 	 * This is used to prevent dev->dqm->ops.process_termination() from
 	 * being called twice when it is already called in IOMMU callback
@@ -589,6 +592,14 @@ struct kfd_process {
 	size_t signal_mapped_size;
 	size_t signal_event_count;
 	bool signal_event_limit_reached;
+
+	/* Information used for memory eviction */
+	void *kgd_process_info;
+	/* Eviction fence that is attached to all the BOs of this process. The
+	 * fence will be triggered during eviction and new one will be created
+	 * during restore
+	 */
+	struct dma_fence *ef;
 };
 
 #define KFD_PROCESS_TABLE_SIZE 5 /* bits: 32 entries */
@@ -802,6 +813,8 @@ int kfd_event_create(struct file *devkfd, struct kfd_process *p,
 		     uint64_t *event_page_offset, uint32_t *event_slot_index);
 int kfd_event_destroy(struct kfd_process *p, uint32_t event_id);
 
+void kfd_flush_tlb(struct kfd_process_device *pdd);
+
 int dbgdev_wave_reset_wavefronts(struct kfd_dev *dev, struct kfd_process *p);
 
 /* Debugfs */
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index e9aee76..cf4fa25 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -34,6 +34,7 @@
 struct mm_struct;
 
 #include "kfd_priv.h"
+#include "kfd_device_queue_manager.h"
 #include "kfd_dbgmgr.h"
 #include "kfd_iommu.h"
 
@@ -154,6 +155,10 @@ static void kfd_process_destroy_pdds(struct kfd_process *p)
 		pr_debug("Releasing pdd (topology id %d) for process (pasid %d)\n",
 				pdd->dev->id, p->pasid);
 
+		if (pdd->vm)
+			pdd->dev->kfd2kgd->destroy_process_vm(
+				pdd->dev->kgd, pdd->vm);
+
 		list_del(&pdd->per_device_list);
 
 		if (pdd->qpd.cwsr_kaddr)
@@ -177,6 +182,7 @@ static void kfd_process_wq_release(struct work_struct *work)
 	kfd_iommu_unbind_process(p);
 
 	kfd_process_destroy_pdds(p);
+	dma_fence_put(p->ef);
 
 	kfd_event_free_process(p);
 
@@ -401,7 +407,18 @@ struct kfd_process_device *kfd_create_process_device_data(struct kfd_dev *dev,
 	pdd->already_dequeued = false;
 	list_add(&pdd->per_device_list, &p->per_device_data);
 
+	/* Create the GPUVM context for this specific device */
+	if (dev->kfd2kgd->create_process_vm(dev->kgd, &pdd->vm,
+					    &p->kgd_process_info, &p->ef)) {
+		pr_err("Failed to create process VM object\n");
+		goto err_create_pdd;
+	}
 	return pdd;
+
+err_create_pdd:
+	list_del(&pdd->per_device_list);
+	kfree(pdd);
+	return NULL;
 }
 
 /*
@@ -507,6 +524,22 @@ int kfd_reserved_mem_mmap(struct kfd_process *process,
 			       KFD_CWSR_TBA_TMA_SIZE, vma->vm_page_prot);
 }
 
+void kfd_flush_tlb(struct kfd_process_device *pdd)
+{
+	struct kfd_dev *dev = pdd->dev;
+	const struct kfd2kgd_calls *f2g = dev->kfd2kgd;
+
+	if (dev->dqm->sched_policy == KFD_SCHED_POLICY_NO_HWS) {
+		/* Nothing to flush until a VMID is assigned, which
+		 * only happens when the first queue is created.
+		 */
+		if (pdd->qpd.vmid)
+			f2g->invalidate_tlbs_vmid(dev->kgd, pdd->qpd.vmid);
+	} else {
+		f2g->invalidate_tlbs(dev->kgd, pdd->process->pasid);
+	}
+}
+
 #if defined(CONFIG_DEBUG_FS)
 
 int kfd_debugfs_mqds_by_process(struct seq_file *m, void *data)
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH 16/25] drm/amdkfd: Implement KFD process eviction/restore
       [not found] ` <1517967174-21709-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (14 preceding siblings ...)
  2018-02-07  1:32   ` [PATCH 15/25] drm/amdkfd: Add GPUVM virtual address space to PDD Felix Kuehling
@ 2018-02-07  1:32   ` Felix Kuehling
       [not found]     ` <1517967174-21709-17-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
  2018-02-07  1:32   ` [PATCH 17/25] uapi: Fix type used in ioctl parameter structures Felix Kuehling
                     ` (8 subsequent siblings)
  24 siblings, 1 reply; 71+ messages in thread
From: Felix Kuehling @ 2018-02-07  1:32 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling, Harish Kasiviswanathan

When the TTM memory manager in KGD evicts BOs, all user mode queues
potentially accessing these BOs must be evicted temporarily. Once
user mode queues are evicted, the eviction fence is signaled,
allowing the migration of the BO to proceed.

A delayed worker is scheduled to restore all the BOs belonging to
the evicted process and restart its queues.

During suspend/resume of the GPU we also evict all processes to allow
KGD to save BOs in system memory, since VRAM will be lost.

v2:
* Account for eviction when updating of q->is_active in MQD manager
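
One detail worth a worked example is the anti-starvation delay applied
before re-evicting a process: a process restored less than
PROCESS_ACTIVE_TIME_MS ago only becomes eligible again once the remainder
of that window has elapsed. A minimal sketch in plain milliseconds (the
real code uses jiffies, and the window length below is an assumed example
value):

#include <stdio.h>

#define EXAMPLE_ACTIVE_TIME_MS 10	/* assumed value for illustration */

static unsigned long example_evict_delay_ms(unsigned long now_ms,
					    unsigned long last_restore_ms)
{
	unsigned long active = now_ms - last_restore_ms;

	/* Let the process run for the full window before evicting again */
	return active < EXAMPLE_ACTIVE_TIME_MS ?
	       EXAMPLE_ACTIVE_TIME_MS - active : 0;
}

int main(void)
{
	/* Restored 3 ms ago: delay the eviction worker by the remaining 7 ms */
	printf("delay = %lu ms\n", example_evict_delay_ms(103, 100));
	/* Restored 25 ms ago: the window has passed, evict immediately */
	printf("delay = %lu ms\n", example_evict_delay_ms(125, 100));
	return 0;
}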

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_device.c            |  65 +++++-
 .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c  | 219 ++++++++++++++++++++-
 .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.h  |   9 +
 drivers/gpu/drm/amd/amdkfd/kfd_module.c            |   2 +
 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c   |   9 +-
 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c    |   6 +-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h              |  32 ++-
 drivers/gpu/drm/amd/amdkfd/kfd_process.c           | 213 ++++++++++++++++++++
 8 files changed, 547 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index 4ac2d61..334669996 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -33,6 +33,7 @@
 #include "kfd_iommu.h"
 
 #define MQD_SIZE_ALIGNED 768
+static atomic_t kfd_device_suspended = ATOMIC_INIT(0);
 
 #ifdef KFD_SUPPORT_IOMMU_V2
 static const struct kfd_device_info kaveri_device_info = {
@@ -469,6 +470,10 @@ void kgd2kfd_suspend(struct kfd_dev *kfd)
 	if (!kfd->init_complete)
 		return;
 
+	/* For first KFD device suspend all the KFD processes */
+	if (atomic_inc_return(&kfd_device_suspended) == 1)
+		kfd_suspend_all_processes();
+
 	kfd->dqm->ops.stop(kfd->dqm);
 
 	kfd_iommu_suspend(kfd);
@@ -476,11 +481,21 @@ void kgd2kfd_suspend(struct kfd_dev *kfd)
 
 int kgd2kfd_resume(struct kfd_dev *kfd)
 {
+	int ret, count;
+
 	if (!kfd->init_complete)
 		return 0;
 
-	return kfd_resume(kfd);
+	ret = kfd_resume(kfd);
+	if (ret)
+		return ret;
+
+	count = atomic_dec_return(&kfd_device_suspended);
+	WARN_ONCE(count < 0, "KFD suspend / resume ref. error");
+	if (count == 0)
+		ret = kfd_resume_all_processes();
 
+	return ret;
 }
 
 static int kfd_resume(struct kfd_dev *kfd)
@@ -526,6 +541,54 @@ void kgd2kfd_interrupt(struct kfd_dev *kfd, const void *ih_ring_entry)
 	spin_unlock(&kfd->interrupt_lock);
 }
 
+/** kgd2kfd_schedule_evict_and_restore_process - Schedules work queue that will
+ *   prepare for safe eviction of KFD BOs that belong to the specified
+ *   process.
+ *
+ * @mm: mm_struct that identifies the specified KFD process
+ * @fence: eviction fence attached to KFD process BOs
+ *
+ */
+int kgd2kfd_schedule_evict_and_restore_process(struct mm_struct *mm,
+					       struct dma_fence *fence)
+{
+	struct kfd_process *p;
+	unsigned long active_time;
+	unsigned long delay_jiffies = msecs_to_jiffies(PROCESS_ACTIVE_TIME_MS);
+
+	if (!fence)
+		return -EINVAL;
+
+	if (dma_fence_is_signaled(fence))
+		return 0;
+
+	p = kfd_lookup_process_by_mm(mm);
+	if (!p)
+		return -ENODEV;
+
+	if (fence->seqno == p->last_eviction_seqno)
+		goto out;
+
+	p->last_eviction_seqno = fence->seqno;
+
+	/* Avoid KFD process starvation. Wait for at least
+	 * PROCESS_ACTIVE_TIME_MS before evicting the process again
+	 */
+	active_time = get_jiffies_64() - p->last_restore_timestamp;
+	if (delay_jiffies > active_time)
+		delay_jiffies -= active_time;
+	else
+		delay_jiffies = 0;
+
+	/* During process initialization eviction_work.dwork is initialized
+	 * to kfd_evict_bo_worker
+	 */
+	schedule_delayed_work(&p->eviction_work, delay_jiffies);
+out:
+	kfd_unref_process(p);
+	return 0;
+}
+
 static int kfd_gtt_sa_init(struct kfd_dev *kfd, unsigned int buf_size,
 				unsigned int chunk_size)
 {
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index b7d0639..b3b6dab 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -21,10 +21,11 @@
  *
  */
 
+#include <linux/ratelimit.h>
+#include <linux/printk.h>
 #include <linux/slab.h>
 #include <linux/list.h>
 #include <linux/types.h>
-#include <linux/printk.h>
 #include <linux/bitops.h>
 #include <linux/sched.h>
 #include "kfd_priv.h"
@@ -180,6 +181,14 @@ static int create_queue_nocpsch(struct device_queue_manager *dqm,
 			goto out_unlock;
 	}
 	q->properties.vmid = qpd->vmid;
+	/*
+	 * Eviction state logic: we only mark active queues as evicted
+	 * to avoid the overhead of restoring inactive queues later
+	 */
+	if (qpd->evicted)
+		q->properties.is_evicted = (q->properties.queue_size > 0 &&
+					    q->properties.queue_percent > 0 &&
+					    q->properties.queue_address != 0);
 
 	q->properties.tba_addr = qpd->tba_addr;
 	q->properties.tma_addr = qpd->tma_addr;
@@ -377,15 +386,29 @@ static int update_queue(struct device_queue_manager *dqm, struct queue *q)
 {
 	int retval;
 	struct mqd_manager *mqd;
+	struct kfd_process_device *pdd;
 	bool prev_active = false;
 
 	mutex_lock(&dqm->lock);
+	pdd = kfd_get_process_device_data(q->device, q->process);
+	if (!pdd) {
+		retval = -ENODEV;
+		goto out_unlock;
+	}
 	mqd = dqm->ops.get_mqd_manager(dqm,
 			get_mqd_type_from_queue_type(q->properties.type));
 	if (!mqd) {
 		retval = -ENOMEM;
 		goto out_unlock;
 	}
+	/*
+	 * Eviction state logic: we only mark active queues as evicted
+	 * to avoid the overhead of restoring inactive queues later
+	 */
+	if (pdd->qpd.evicted)
+		q->properties.is_evicted = (q->properties.queue_size > 0 &&
+					    q->properties.queue_percent > 0 &&
+					    q->properties.queue_address != 0);
 
 	/* Save previous activity state for counters */
 	prev_active = q->properties.is_active;
@@ -457,6 +480,187 @@ static struct mqd_manager *get_mqd_manager(
 	return mqd;
 }
 
+static int evict_process_queues_nocpsch(struct device_queue_manager *dqm,
+					struct qcm_process_device *qpd)
+{
+	struct queue *q;
+	struct mqd_manager *mqd;
+	struct kfd_process_device *pdd;
+	int retval = 0;
+
+	mutex_lock(&dqm->lock);
+	if (qpd->evicted++ > 0) /* already evicted, do nothing */
+		goto out;
+
+	pdd = qpd_to_pdd(qpd);
+	pr_info_ratelimited("Evicting PASID %u queues\n",
+			    pdd->process->pasid);
+
+	/* unactivate all active queues on the qpd */
+	list_for_each_entry(q, &qpd->queues_list, list) {
+		if (!q->properties.is_active)
+			continue;
+		mqd = dqm->ops.get_mqd_manager(dqm,
+			get_mqd_type_from_queue_type(q->properties.type));
+		if (!mqd) { /* should not be here */
+			pr_err("Cannot evict queue, mqd mgr is NULL\n");
+			retval = -ENOMEM;
+			goto out;
+		}
+		q->properties.is_evicted = true;
+		q->properties.is_active = false;
+		retval = mqd->destroy_mqd(mqd, q->mqd,
+				KFD_PREEMPT_TYPE_WAVEFRONT_DRAIN,
+				KFD_UNMAP_LATENCY_MS, q->pipe, q->queue);
+		if (retval)
+			goto out;
+		dqm->queue_count--;
+	}
+
+out:
+	mutex_unlock(&dqm->lock);
+	return retval;
+}
+
+static int evict_process_queues_cpsch(struct device_queue_manager *dqm,
+				      struct qcm_process_device *qpd)
+{
+	struct queue *q;
+	struct kfd_process_device *pdd;
+	int retval = 0;
+
+	mutex_lock(&dqm->lock);
+	if (qpd->evicted++ > 0) /* already evicted, do nothing */
+		goto out;
+
+	pdd = qpd_to_pdd(qpd);
+	pr_info_ratelimited("Evicting PASID %u queues\n",
+			    pdd->process->pasid);
+
+	/* unactivate all active queues on the qpd */
+	list_for_each_entry(q, &qpd->queues_list, list) {
+		if (!q->properties.is_active)
+			continue;
+		q->properties.is_evicted = true;
+		q->properties.is_active = false;
+		dqm->queue_count--;
+	}
+	retval = execute_queues_cpsch(dqm,
+				qpd->is_debug ?
+				KFD_UNMAP_QUEUES_FILTER_ALL_QUEUES :
+				KFD_UNMAP_QUEUES_FILTER_DYNAMIC_QUEUES, 0);
+
+out:
+	mutex_unlock(&dqm->lock);
+	return retval;
+}
+
+static int restore_process_queues_nocpsch(struct device_queue_manager *dqm,
+					  struct qcm_process_device *qpd)
+{
+	struct queue *q;
+	struct mqd_manager *mqd;
+	struct kfd_process_device *pdd;
+	uint32_t pd_base;
+	int retval = 0;
+
+	pdd = qpd_to_pdd(qpd);
+	/* Retrieve PD base */
+	pd_base = dqm->dev->kfd2kgd->get_process_page_dir(pdd->vm);
+
+	mutex_lock(&dqm->lock);
+	if (WARN_ON_ONCE(!qpd->evicted)) /* already restored, do nothing */
+		goto out;
+	if (qpd->evicted > 1) { /* ref count still > 0, decrement & quit */
+		qpd->evicted--;
+		goto out;
+	}
+
+	pr_info_ratelimited("Restoring PASID %u queues\n",
+			    pdd->process->pasid);
+
+	/* Update PD Base in QPD */
+	qpd->page_table_base = pd_base;
+	pr_debug("Updated PD address to 0x%08x\n", pd_base);
+
+	if (!list_empty(&qpd->queues_list)) {
+		dqm->dev->kfd2kgd->set_vm_context_page_table_base(
+				dqm->dev->kgd,
+				qpd->vmid,
+				qpd->page_table_base);
+		kfd_flush_tlb(pdd);
+	}
+
+	/* Re-activate all evicted queues on the qpd */
+	list_for_each_entry(q, &qpd->queues_list, list) {
+		if (!q->properties.is_evicted)
+			continue;
+		mqd = dqm->ops.get_mqd_manager(dqm,
+			get_mqd_type_from_queue_type(q->properties.type));
+		if (!mqd) { /* should not be here */
+			pr_err("Cannot restore queue, mqd mgr is NULL\n");
+			retval = -ENOMEM;
+			goto out;
+		}
+		q->properties.is_evicted = false;
+		q->properties.is_active = true;
+		retval = mqd->load_mqd(mqd, q->mqd, q->pipe,
+				       q->queue, &q->properties,
+				       q->process->mm);
+		if (retval)
+			goto out;
+		dqm->queue_count++;
+	}
+	qpd->evicted = 0;
+out:
+	mutex_unlock(&dqm->lock);
+	return retval;
+}
+
+static int restore_process_queues_cpsch(struct device_queue_manager *dqm,
+					struct qcm_process_device *qpd)
+{
+	struct queue *q;
+	struct kfd_process_device *pdd;
+	uint32_t pd_base;
+	int retval = 0;
+
+	pdd = qpd_to_pdd(qpd);
+	/* Retrieve PD base */
+	pd_base = dqm->dev->kfd2kgd->get_process_page_dir(pdd->vm);
+
+	mutex_lock(&dqm->lock);
+	if (WARN_ON_ONCE(!qpd->evicted)) /* already restored, do nothing */
+		goto out;
+	if (qpd->evicted > 1) { /* ref count still > 0, decrement & quit */
+		qpd->evicted--;
+		goto out;
+	}
+
+	pr_info_ratelimited("Restoring PASID %u queues\n",
+			    pdd->process->pasid);
+
+	/* Update PD Base in QPD */
+	qpd->page_table_base = pd_base;
+	pr_debug("Updated PD address to 0x%08x\n", pd_base);
+
+	/* Re-activate all evicted queues on the qpd */
+	list_for_each_entry(q, &qpd->queues_list, list) {
+		if (!q->properties.is_evicted)
+			continue;
+		q->properties.is_evicted = false;
+		q->properties.is_active = true;
+		dqm->queue_count++;
+	}
+	retval = execute_queues_cpsch(dqm,
+				KFD_UNMAP_QUEUES_FILTER_DYNAMIC_QUEUES, 0);
+	if (!retval)
+		qpd->evicted = 0;
+out:
+	mutex_unlock(&dqm->lock);
+	return retval;
+}
+
 static int register_process(struct device_queue_manager *dqm,
 					struct qcm_process_device *qpd)
 {
@@ -853,6 +1057,14 @@ static int create_queue_cpsch(struct device_queue_manager *dqm, struct queue *q,
 		retval = -ENOMEM;
 		goto out;
 	}
+	/*
+	 * Eviction state logic: we only mark active queues as evicted
+	 * to avoid the overhead of restoring inactive queues later
+	 */
+	if (qpd->evicted)
+		q->properties.is_evicted = (q->properties.queue_size > 0 &&
+					    q->properties.queue_percent > 0 &&
+					    q->properties.queue_address != 0);
 
 	dqm->asic_ops.init_sdma_vm(dqm, q, qpd);
 
@@ -1291,6 +1503,8 @@ struct device_queue_manager *device_queue_manager_init(struct kfd_dev *dev)
 		dqm->ops.set_cache_memory_policy = set_cache_memory_policy;
 		dqm->ops.set_trap_handler = set_trap_handler;
 		dqm->ops.process_termination = process_termination_cpsch;
+		dqm->ops.evict_process_queues = evict_process_queues_cpsch;
+		dqm->ops.restore_process_queues = restore_process_queues_cpsch;
 		break;
 	case KFD_SCHED_POLICY_NO_HWS:
 		/* initialize dqm for no cp scheduling */
@@ -1307,6 +1521,9 @@ struct device_queue_manager *device_queue_manager_init(struct kfd_dev *dev)
 		dqm->ops.set_cache_memory_policy = set_cache_memory_policy;
 		dqm->ops.set_trap_handler = set_trap_handler;
 		dqm->ops.process_termination = process_termination_nocpsch;
+		dqm->ops.evict_process_queues = evict_process_queues_nocpsch;
+		dqm->ops.restore_process_queues =
+			restore_process_queues_nocpsch;
 		break;
 	default:
 		pr_err("Invalid scheduling policy %d\n", dqm->sched_policy);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
index 68be0aa..412beff 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
@@ -79,6 +79,10 @@ struct device_process_node {
  *
  * @process_termination: Clears all process queues belongs to that device.
  *
+ * @evict_process_queues: Evict all active queues of a process
+ *
+ * @restore_process_queues: Restore all evicted queues of a process
+ *
  */
 
 struct device_queue_manager_ops {
@@ -129,6 +133,11 @@ struct device_queue_manager_ops {
 
 	int (*process_termination)(struct device_queue_manager *dqm,
 			struct qcm_process_device *qpd);
+
+	int (*evict_process_queues)(struct device_queue_manager *dqm,
+				    struct qcm_process_device *qpd);
+	int (*restore_process_queues)(struct device_queue_manager *dqm,
+				      struct qcm_process_device *qpd);
 };
 
 struct device_queue_manager_asic_ops {
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_module.c b/drivers/gpu/drm/amd/amdkfd/kfd_module.c
index 3ac72be..65574c6 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_module.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_module.c
@@ -43,6 +43,8 @@ static const struct kgd2kfd_calls kgd2kfd = {
 	.interrupt	= kgd2kfd_interrupt,
 	.suspend	= kgd2kfd_suspend,
 	.resume		= kgd2kfd_resume,
+	.schedule_evict_and_restore_process =
+			  kgd2kfd_schedule_evict_and_restore_process,
 };
 
 int sched_policy = KFD_SCHED_POLICY_HWS;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c
index fbe3f83..c00c325 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c
@@ -202,7 +202,8 @@ static int __update_mqd(struct mqd_manager *mm, void *mqd,
 
 	q->is_active = (q->queue_size > 0 &&
 			q->queue_address != 0 &&
-			q->queue_percent > 0);
+			q->queue_percent > 0 &&
+			!q->is_evicted);
 
 	return 0;
 }
@@ -245,7 +246,8 @@ static int update_mqd_sdma(struct mqd_manager *mm, void *mqd,
 
 	q->is_active = (q->queue_size > 0 &&
 			q->queue_address != 0 &&
-			q->queue_percent > 0);
+			q->queue_percent > 0 &&
+			!q->is_evicted);
 
 	return 0;
 }
@@ -377,7 +379,8 @@ static int update_mqd_hiq(struct mqd_manager *mm, void *mqd,
 
 	q->is_active = (q->queue_size > 0 &&
 			q->queue_address != 0 &&
-			q->queue_percent > 0);
+			q->queue_percent > 0 &&
+			!q->is_evicted);
 
 	return 0;
 }
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c
index 58221c1..89e4242 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c
@@ -198,7 +198,8 @@ static int __update_mqd(struct mqd_manager *mm, void *mqd,
 
 	q->is_active = (q->queue_size > 0 &&
 			q->queue_address != 0 &&
-			q->queue_percent > 0);
+			q->queue_percent > 0 &&
+			!q->is_evicted);
 
 	return 0;
 }
@@ -342,7 +343,8 @@ static int update_mqd_sdma(struct mqd_manager *mm, void *mqd,
 
 	q->is_active = (q->queue_size > 0 &&
 			q->queue_address != 0 &&
-			q->queue_percent > 0);
+			q->queue_percent > 0 &&
+			!q->is_evicted);
 
 	return 0;
 }
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 56c2e36..cac7aa2 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -335,7 +335,11 @@ enum kfd_queue_format {
  * @is_interop: Defines if this is a interop queue. Interop queue means that
  * the queue can access both graphics and compute resources.
  *
- * @is_active: Defines if the queue is active or not.
+ * @is_evicted: Defines if the queue is evicted. Only active queues
+ * are evicted, rendering them inactive.
+ *
+ * @is_active: Defines if the queue is active or not. @is_active and
+ * @is_evicted are protected by the DQM lock.
  *
  * @vmid: If the scheduling mode is no cp scheduling the field defines the vmid
  * of the queue.
@@ -357,6 +361,7 @@ struct queue_properties {
 	uint32_t __iomem *doorbell_ptr;
 	uint32_t doorbell_off;
 	bool is_interop;
+	bool is_evicted;
 	bool is_active;
 	/* Not relevant for user mode queues in cp scheduling */
 	unsigned int vmid;
@@ -460,6 +465,7 @@ struct qcm_process_device {
 	unsigned int queue_count;
 	unsigned int vmid;
 	bool is_debug;
+	unsigned int evicted; /* eviction counter, 0=active */
 
 	/* This flag tells if we should reset all wavefronts on
 	 * process termination
@@ -486,6 +492,17 @@ struct qcm_process_device {
 	uint64_t tma_addr;
 };
 
+/* KFD Memory Eviction */
+
+/* Approx. wait time before attempting to restore evicted BOs */
+#define PROCESS_RESTORE_TIME_MS 100
+/* Approx. back off time if restore fails due to lack of memory */
+#define PROCESS_BACK_OFF_TIME_MS 100
+/* Approx. time before evicting the process again */
+#define PROCESS_ACTIVE_TIME_MS 10
+
+int kgd2kfd_schedule_evict_and_restore_process(struct mm_struct *mm,
+					       struct dma_fence *fence);
 
 enum kfd_pdd_bound {
 	PDD_UNBOUND = 0,
@@ -600,6 +617,16 @@ struct kfd_process {
 	 * during restore
 	 */
 	struct dma_fence *ef;
+
+	/* Work items for evicting and restoring BOs */
+	struct delayed_work eviction_work;
+	struct delayed_work restore_work;
+	/* seqno of the last scheduled eviction */
+	unsigned int last_eviction_seqno;
+	/* Approx. the last timestamp (in jiffies) when the process was
+	 * restored after an eviction
+	 */
+	unsigned long last_restore_timestamp;
 };
 
 #define KFD_PROCESS_TABLE_SIZE 5 /* bits: 32 entries */
@@ -629,7 +656,10 @@ void kfd_process_destroy_wq(void);
 struct kfd_process *kfd_create_process(struct file *filep);
 struct kfd_process *kfd_get_process(const struct task_struct *);
 struct kfd_process *kfd_lookup_process_by_pasid(unsigned int pasid);
+struct kfd_process *kfd_lookup_process_by_mm(const struct mm_struct *mm);
 void kfd_unref_process(struct kfd_process *p);
+void kfd_suspend_all_processes(void);
+int kfd_resume_all_processes(void);
 
 struct kfd_process_device *kfd_bind_process_to_device(struct kfd_dev *dev,
 						struct kfd_process *p);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index cf4fa25..18b2b86 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -55,6 +55,9 @@ static struct kfd_process *create_process(const struct task_struct *thread,
 					struct file *filep);
 static int kfd_process_init_cwsr(struct kfd_process *p, struct file *filep);
 
+static void evict_process_worker(struct work_struct *work);
+static void restore_process_worker(struct work_struct *work);
+
 
 void kfd_process_create_wq(void)
 {
@@ -230,6 +233,9 @@ static void kfd_process_notifier_release(struct mmu_notifier *mn,
 	mutex_unlock(&kfd_processes_mutex);
 	synchronize_srcu(&kfd_processes_srcu);
 
+	cancel_delayed_work_sync(&p->eviction_work);
+	cancel_delayed_work_sync(&p->restore_work);
+
 	mutex_lock(&p->mutex);
 
 	/* Iterate over all process device data structures and if the
@@ -351,6 +357,10 @@ static struct kfd_process *create_process(const struct task_struct *thread,
 	if (err != 0)
 		goto err_init_apertures;
 
+	INIT_DELAYED_WORK(&process->eviction_work, evict_process_worker);
+	INIT_DELAYED_WORK(&process->restore_work, restore_process_worker);
+	process->last_restore_timestamp = get_jiffies_64();
+
 	err = kfd_process_init_cwsr(process, filep);
 	if (err)
 		goto err_init_cwsr;
@@ -402,6 +412,7 @@ struct kfd_process_device *kfd_create_process_device_data(struct kfd_dev *dev,
 	INIT_LIST_HEAD(&pdd->qpd.priv_queue_list);
 	pdd->qpd.dqm = dev->dqm;
 	pdd->qpd.pqm = &p->pqm;
+	pdd->qpd.evicted = 0;
 	pdd->process = p;
 	pdd->bound = PDD_UNBOUND;
 	pdd->already_dequeued = false;
@@ -490,6 +501,208 @@ struct kfd_process *kfd_lookup_process_by_pasid(unsigned int pasid)
 	return ret_p;
 }
 
+/* This increments the process->ref counter. */
+struct kfd_process *kfd_lookup_process_by_mm(const struct mm_struct *mm)
+{
+	struct kfd_process *p;
+
+	int idx = srcu_read_lock(&kfd_processes_srcu);
+
+	p = find_process_by_mm(mm);
+	if (p)
+		kref_get(&p->ref);
+
+	srcu_read_unlock(&kfd_processes_srcu, idx);
+
+	return p;
+}
+
+/* process_evict_queues - Evict all user queues of a process
+ *
+ * Eviction is reference-counted per process-device. This means multiple
+ * evictions from different sources can be nested safely.
+ */
+static int process_evict_queues(struct kfd_process *p)
+{
+	struct kfd_process_device *pdd;
+	int r = 0;
+	unsigned int n_evicted = 0;
+
+	list_for_each_entry(pdd, &p->per_device_data, per_device_list) {
+		r = pdd->dev->dqm->ops.evict_process_queues(pdd->dev->dqm,
+							    &pdd->qpd);
+		if (r) {
+			pr_err("Failed to evict process queues\n");
+			goto fail;
+		}
+		n_evicted++;
+	}
+
+	return r;
+
+fail:
+	/* To keep state consistent, roll back partial eviction by
+	 * restoring queues
+	 */
+	list_for_each_entry(pdd, &p->per_device_data, per_device_list) {
+		if (n_evicted == 0)
+			break;
+		if (pdd->dev->dqm->ops.restore_process_queues(pdd->dev->dqm,
+							      &pdd->qpd))
+			pr_err("Failed to restore queues\n");
+
+		n_evicted--;
+	}
+
+	return r;
+}
+
+/* process_restore_queues - Restore all user queues of a process */
+static int process_restore_queues(struct kfd_process *p)
+{
+	struct kfd_process_device *pdd;
+	int r, ret = 0;
+
+	list_for_each_entry(pdd, &p->per_device_data, per_device_list) {
+		r = pdd->dev->dqm->ops.restore_process_queues(pdd->dev->dqm,
+							      &pdd->qpd);
+		if (r) {
+			pr_err("Failed to restore process queues\n");
+			if (!ret)
+				ret = r;
+		}
+	}
+
+	return ret;
+}
+
+static void evict_process_worker(struct work_struct *work)
+{
+	int ret;
+	struct kfd_process *p;
+	struct delayed_work *dwork;
+
+	dwork = to_delayed_work(work);
+
+	/* Process termination destroys this worker thread. So during the
+	 * lifetime of this thread, kfd_process p will be valid
+	 */
+	p = container_of(dwork, struct kfd_process, eviction_work);
+	WARN_ONCE(p->last_eviction_seqno != p->ef->seqno,
+		  "Eviction fence mismatch\n");
+
+	/* A narrow window of overlap between the restore and evict work
+	 * items is possible. Once amdgpu_amdkfd_gpuvm_restore_process_bos
+	 * unreserves KFD BOs, it is possible to be evicted again. But
+	 * restore still has a few more steps to finish. So let's wait for
+	 * any previous restore work to complete.
+	 */
+	flush_delayed_work(&p->restore_work);
+
+	pr_debug("Started evicting pasid %d\n", p->pasid);
+	ret = process_evict_queues(p);
+	if (!ret) {
+		dma_fence_signal(p->ef);
+		dma_fence_put(p->ef);
+		p->ef = NULL;
+		schedule_delayed_work(&p->restore_work,
+				msecs_to_jiffies(PROCESS_RESTORE_TIME_MS));
+
+		pr_debug("Finished evicting pasid %d\n", p->pasid);
+	} else
+		pr_err("Failed to evict queues of pasid %d\n", p->pasid);
+}
+
+static void restore_process_worker(struct work_struct *work)
+{
+	struct delayed_work *dwork;
+	struct kfd_process *p;
+	struct kfd_process_device *pdd;
+	int ret = 0;
+
+	dwork = to_delayed_work(work);
+
+	/* Process termination destroys this worker thread. So during the
+	 * lifetime of this thread, kfd_process p will be valid
+	 */
+	p = container_of(dwork, struct kfd_process, restore_work);
+
+	/* Call restore_process_bos on the first KGD device. This function
+	 * takes care of restoring the whole process including other devices.
+	 * Restore can fail if enough memory is not available. If so,
+	 * reschedule again.
+	 */
+	pdd = list_first_entry(&p->per_device_data,
+			       struct kfd_process_device,
+			       per_device_list);
+
+	pr_debug("Started restoring pasid %d\n", p->pasid);
+
+	/* Set last_restore_timestamp before attempting the restore.
+	 * Otherwise it would have to be set by KGD (restore_process_bos)
+	 * before the KFD BOs are unreserved. If not, the process could be
+	 * evicted again before the timestamp is set.
+	 * If the restore fails, the timestamp is set again on the next
+	 * attempt. This means the minimum GPU quantum is
+	 * PROCESS_ACTIVE_TIME_MS minus the time it takes to execute the
+	 * following two functions.
+	 */
+
+	p->last_restore_timestamp = get_jiffies_64();
+	ret = pdd->dev->kfd2kgd->restore_process_bos(p->kgd_process_info,
+						     &p->ef);
+	if (ret) {
+		pr_debug("Failed to restore BOs of pasid %d, retry after %d ms\n",
+			 p->pasid, PROCESS_BACK_OFF_TIME_MS);
+		ret = schedule_delayed_work(&p->restore_work,
+				msecs_to_jiffies(PROCESS_BACK_OFF_TIME_MS));
+		WARN(!ret, "reschedule restore work failed\n");
+		return;
+	}
+
+	ret = process_restore_queues(p);
+	if (!ret)
+		pr_debug("Finished restoring pasid %d\n", p->pasid);
+	else
+		pr_err("Failed to restore queues of pasid %d\n", p->pasid);
+}
+
+void kfd_suspend_all_processes(void)
+{
+	struct kfd_process *p;
+	unsigned int temp;
+	int idx = srcu_read_lock(&kfd_processes_srcu);
+
+	hash_for_each_rcu(kfd_processes_table, temp, p, kfd_processes) {
+		cancel_delayed_work_sync(&p->eviction_work);
+		cancel_delayed_work_sync(&p->restore_work);
+
+		if (process_evict_queues(p))
+			pr_err("Failed to suspend process %d\n", p->pasid);
+		dma_fence_signal(p->ef);
+		dma_fence_put(p->ef);
+		p->ef = NULL;
+	}
+	srcu_read_unlock(&kfd_processes_srcu, idx);
+}
+
+int kfd_resume_all_processes(void)
+{
+	struct kfd_process *p;
+	unsigned int temp;
+	int ret = 0, idx = srcu_read_lock(&kfd_processes_srcu);
+
+	hash_for_each_rcu(kfd_processes_table, temp, p, kfd_processes) {
+		if (!schedule_delayed_work(&p->restore_work, 0)) {
+			pr_err("Restore process %d failed during resume\n",
+			       p->pasid);
+			ret = -EFAULT;
+		}
+	}
+	srcu_read_unlock(&kfd_processes_srcu, idx);
+	return ret;
+}
+
 int kfd_reserved_mem_mmap(struct kfd_process *process,
 			  struct vm_area_struct *vma)
 {
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH 17/25] uapi: Fix type used in ioctl parameter structures
       [not found] ` <1517967174-21709-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (15 preceding siblings ...)
  2018-02-07  1:32   ` [PATCH 16/25] drm/amdkfd: Implement KFD process eviction/restore Felix Kuehling
@ 2018-02-07  1:32   ` Felix Kuehling
       [not found]     ` <1517967174-21709-18-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
  2018-02-07  1:32   ` [PATCH 18/25] drm/amdkfd: Remove limit on number of GPUs Felix Kuehling
                     ` (7 subsequent siblings)
  24 siblings, 1 reply; 71+ messages in thread
From: Felix Kuehling @ 2018-02-07  1:32 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling

Use __u32 and __u64 instead of POSIX types that may not be defined
in user mode builds.
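
For illustration, a minimal uapi-style structure written this way only needs
<linux/types.h>, which defines __u32/__u64 for both kernel and user mode
builds (the struct name below is made up, not part of this patch):

    #include <linux/types.h>

    struct example_ioctl_args {
            __u64 buffer_ptr;       /* to KFD */
            __u32 buffer_size;      /* to KFD */
            __u32 pad;              /* keep size/alignment identical on 32/64 bit */
    };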

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 include/uapi/linux/kfd_ioctl.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
index f4cab5b..111d73b 100644
--- a/include/uapi/linux/kfd_ioctl.h
+++ b/include/uapi/linux/kfd_ioctl.h
@@ -263,10 +263,10 @@ struct kfd_ioctl_get_tile_config_args {
 };
 
 struct kfd_ioctl_set_trap_handler_args {
-	uint64_t tba_addr;		/* to KFD */
-	uint64_t tma_addr;		/* to KFD */
-	uint32_t gpu_id;		/* to KFD */
-	uint32_t pad;
+	__u64 tba_addr;		/* to KFD */
+	__u64 tma_addr;		/* to KFD */
+	__u32 gpu_id;		/* to KFD */
+	__u32 pad;
 };
 
 #define AMDKFD_IOCTL_BASE 'K'
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH 18/25] drm/amdkfd: Remove limit on number of GPUs
       [not found] ` <1517967174-21709-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (16 preceding siblings ...)
  2018-02-07  1:32   ` [PATCH 17/25] uapi: Fix type used in ioctl parameter structures Felix Kuehling
@ 2018-02-07  1:32   ` Felix Kuehling
  2018-02-07  1:32   ` [PATCH 19/25] drm/amdkfd: Aperture setup for dGPUs Felix Kuehling
                     ` (6 subsequent siblings)
  24 siblings, 0 replies; 71+ messages in thread
From: Felix Kuehling @ 2018-02-07  1:32 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling

Currently the number of GPUs is limited by aperture placement options
available on GFX7 and GFX8 hardware. This limitation is not necessary.
Scratch and LDS represent per-work-item and per-work-group storage
respectively. Different work-items and work-groups use the same virtual
address to access their own data. Work running on different GPUs is by
definition in different work-groups (different dispatches, in fact).
That means the same virtual addresses can be used for these apertures
on different GPUs.

Add a new AMDKFD_IOC_GET_PROCESS_APERTURES_NEW ioctl that removes the
artificial limitation on the number of GPUs that can be supported. The
new ioctl lets user mode query the number of GPUs first, so it can allocate
enough memory for all GPUs to be reported.

This deprecates AMDKFD_IOC_GET_PROCESS_APERTURES.
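
A rough user mode sketch of the resulting two-call pattern (hypothetical
helper with error handling trimmed; only the ioctl number and the argument
structs come from this patch):

    #include <stdint.h>
    #include <stdlib.h>
    #include <sys/ioctl.h>
    #include <linux/kfd_ioctl.h>

    static int get_apertures(int kfd_fd,
                             struct kfd_process_device_apertures **pa,
                             uint32_t *n_nodes)
    {
            struct kfd_ioctl_get_process_apertures_new_args args = {0};

            /* First call with num_of_nodes == 0 only returns the node count */
            if (ioctl(kfd_fd, AMDKFD_IOC_GET_PROCESS_APERTURES_NEW, &args))
                    return -1;

            *pa = calloc(args.num_of_nodes, sizeof(**pa));
            if (!*pa)
                    return -1;

            /* Second call fills in up to num_of_nodes entries */
            args.kfd_process_device_apertures_ptr = (uintptr_t)*pa;
            if (ioctl(kfd_fd, AMDKFD_IOC_GET_PROCESS_APERTURES_NEW, &args)) {
                    free(*pa);
                    return -1;
            }
            *n_nodes = args.num_of_nodes;
            return 0;
    }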

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c     | 94 ++++++++++++++++++++++++++++
 drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c | 22 +++----
 include/uapi/linux/kfd_ioctl.h               | 27 +++++++-
 3 files changed, 128 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 6fe2496..7d40094 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -825,6 +825,97 @@ static int kfd_ioctl_get_process_apertures(struct file *filp,
 	return 0;
 }
 
+static int kfd_ioctl_get_process_apertures_new(struct file *filp,
+				struct kfd_process *p, void *data)
+{
+	struct kfd_ioctl_get_process_apertures_new_args *args = data;
+	struct kfd_process_device_apertures *pa;
+	struct kfd_process_device *pdd;
+	uint32_t nodes = 0;
+	int ret;
+
+	dev_dbg(kfd_device, "get apertures for PASID %d", p->pasid);
+
+	if (args->num_of_nodes == 0) {
+		/* Return number of nodes, so that user space can allocate
+		 * sufficient memory
+		 */
+		mutex_lock(&p->mutex);
+
+		if (!kfd_has_process_device_data(p))
+			goto out_unlock;
+
+		/* Run over all pdd of the process */
+		pdd = kfd_get_first_process_device_data(p);
+		do {
+			args->num_of_nodes++;
+			pdd = kfd_get_next_process_device_data(p, pdd);
+		} while (pdd);
+
+		goto out_unlock;
+	}
+
+	/* Fill in process-aperture information for all available
+	 * nodes, but not more than args->num_of_nodes as that is
+	 * the amount of memory allocated by user
+	 */
+	pa = kzalloc((sizeof(struct kfd_process_device_apertures) *
+				args->num_of_nodes), GFP_KERNEL);
+	if (!pa)
+		return -ENOMEM;
+
+	mutex_lock(&p->mutex);
+
+	if (!kfd_has_process_device_data(p)) {
+		args->num_of_nodes = 0;
+		kfree(pa);
+		goto out_unlock;
+	}
+
+	/* Run over all pdd of the process */
+	pdd = kfd_get_first_process_device_data(p);
+	do {
+		pa[nodes].gpu_id = pdd->dev->id;
+		pa[nodes].lds_base = pdd->lds_base;
+		pa[nodes].lds_limit = pdd->lds_limit;
+		pa[nodes].gpuvm_base = pdd->gpuvm_base;
+		pa[nodes].gpuvm_limit = pdd->gpuvm_limit;
+		pa[nodes].scratch_base = pdd->scratch_base;
+		pa[nodes].scratch_limit = pdd->scratch_limit;
+
+		dev_dbg(kfd_device,
+			"gpu id %u\n", pdd->dev->id);
+		dev_dbg(kfd_device,
+			"lds_base %llX\n", pdd->lds_base);
+		dev_dbg(kfd_device,
+			"lds_limit %llX\n", pdd->lds_limit);
+		dev_dbg(kfd_device,
+			"gpuvm_base %llX\n", pdd->gpuvm_base);
+		dev_dbg(kfd_device,
+			"gpuvm_limit %llX\n", pdd->gpuvm_limit);
+		dev_dbg(kfd_device,
+			"scratch_base %llX\n", pdd->scratch_base);
+		dev_dbg(kfd_device,
+			"scratch_limit %llX\n", pdd->scratch_limit);
+		nodes++;
+
+		pdd = kfd_get_next_process_device_data(p, pdd);
+	} while (pdd && (nodes < args->num_of_nodes));
+	mutex_unlock(&p->mutex);
+
+	args->num_of_nodes = nodes;
+	ret = copy_to_user(
+			(void __user *)args->kfd_process_device_apertures_ptr,
+			pa,
+			(nodes * sizeof(struct kfd_process_device_apertures)));
+	kfree(pa);
+	return ret ? -EFAULT : 0;
+
+out_unlock:
+	mutex_unlock(&p->mutex);
+	return 0;
+}
+
 static int kfd_ioctl_create_event(struct file *filp, struct kfd_process *p,
 					void *data)
 {
@@ -1017,6 +1108,9 @@ static const struct amdkfd_ioctl_desc amdkfd_ioctls[] = {
 
 	AMDKFD_IOCTL_DEF(AMDKFD_IOC_SET_TRAP_HANDLER,
 			kfd_ioctl_set_trap_handler, 0),
+
+	AMDKFD_IOCTL_DEF(AMDKFD_IOC_GET_PROCESS_APERTURES_NEW,
+			kfd_ioctl_get_process_apertures_new, 0),
 };
 
 #define AMDKFD_CORE_IOCTL_COUNT	ARRAY_SIZE(amdkfd_ioctls)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c b/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
index 7377513..a06b010 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
@@ -282,14 +282,14 @@
 	(((uint64_t)(base) & \
 		0xFFFFFF0000000000UL) | 0xFFFFFFFFFFL)
 
-#define MAKE_SCRATCH_APP_BASE(gpu_num) \
-	(((uint64_t)(gpu_num) << 61) + 0x100000000L)
+#define MAKE_SCRATCH_APP_BASE() \
+	(((uint64_t)(0x1UL) << 61) + 0x100000000L)
 
 #define MAKE_SCRATCH_APP_LIMIT(base) \
 	(((uint64_t)base & 0xFFFFFFFF00000000UL) | 0xFFFFFFFF)
 
-#define MAKE_LDS_APP_BASE(gpu_num) \
-	(((uint64_t)(gpu_num) << 61) + 0x0)
+#define MAKE_LDS_APP_BASE() \
+	(((uint64_t)(0x1UL) << 61) + 0x0)
 #define MAKE_LDS_APP_LIMIT(base) \
 	(((uint64_t)(base) & 0xFFFFFFFF00000000UL) | 0xFFFFFFFF)
 
@@ -314,7 +314,7 @@ int kfd_init_apertures(struct kfd_process *process)
 			return -1;
 		}
 		/*
-		 * For 64 bit process aperture will be statically reserved in
+		 * For 64 bit processes, apertures will be statically reserved in
 		 * the x86_64 non canonical process address space
 		 * amdkfd doesn't currently support apertures for 32 bit process
 		 */
@@ -323,12 +323,11 @@ int kfd_init_apertures(struct kfd_process *process)
 			pdd->gpuvm_base = pdd->gpuvm_limit = 0;
 			pdd->scratch_base = pdd->scratch_limit = 0;
 		} else {
-			/*
-			 * node id couldn't be 0 - the three MSB bits of
-			 * aperture shoudn't be 0
+			/* The same LDS and scratch apertures can be used
+			 * on all GPUs. This allows supporting more dGPUs
+			 * than there are aperture placement options.
 			 */
-			pdd->lds_base = MAKE_LDS_APP_BASE(id + 1);
-
+			pdd->lds_base = MAKE_LDS_APP_BASE();
 			pdd->lds_limit = MAKE_LDS_APP_LIMIT(pdd->lds_base);
 
 			pdd->gpuvm_base = MAKE_GPUVM_APP_BASE(id + 1);
@@ -336,8 +335,7 @@ int kfd_init_apertures(struct kfd_process *process)
 			pdd->gpuvm_limit =
 					MAKE_GPUVM_APP_LIMIT(pdd->gpuvm_base);
 
-			pdd->scratch_base = MAKE_SCRATCH_APP_BASE(id + 1);
-
+			pdd->scratch_base = MAKE_SCRATCH_APP_BASE();
 			pdd->scratch_limit =
 				MAKE_SCRATCH_APP_LIMIT(pdd->scratch_base);
 		}
diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
index 111d73b..5201437 100644
--- a/include/uapi/linux/kfd_ioctl.h
+++ b/include/uapi/linux/kfd_ioctl.h
@@ -107,8 +107,6 @@ struct kfd_ioctl_get_clock_counters_args {
 	__u32 pad;
 };
 
-#define NUM_OF_SUPPORTED_GPUS 7
-
 struct kfd_process_device_apertures {
 	__u64 lds_base;		/* from KFD */
 	__u64 lds_limit;		/* from KFD */
@@ -120,6 +118,12 @@ struct kfd_process_device_apertures {
 	__u32 pad;
 };
 
+/*
+ * AMDKFD_IOC_GET_PROCESS_APERTURES is deprecated. Use
+ * AMDKFD_IOC_GET_PROCESS_APERTURES_NEW instead, which supports an
+ * unlimited number of GPUs.
+ */
+#define NUM_OF_SUPPORTED_GPUS 7
 struct kfd_ioctl_get_process_apertures_args {
 	struct kfd_process_device_apertures
 			process_apertures[NUM_OF_SUPPORTED_GPUS];/* from KFD */
@@ -129,6 +133,19 @@ struct kfd_ioctl_get_process_apertures_args {
 	__u32 pad;
 };
 
+struct kfd_ioctl_get_process_apertures_new_args {
+	/* User allocated. Pointer to struct kfd_process_device_apertures
+	 * filled in by Kernel
+	 */
+	__u64 kfd_process_device_apertures_ptr;
+	/* to KFD - indicates amount of memory present in
+	 *  kfd_process_device_apertures_ptr
+	 * from KFD - Number of entries filled by KFD.
+	 */
+	__u32 num_of_nodes;
+	__u32 pad;
+};
+
 #define MAX_ALLOWED_NUM_POINTS    100
 #define MAX_ALLOWED_AW_BUFF_SIZE 4096
 #define MAX_ALLOWED_WAC_BUFF_SIZE  128
@@ -332,7 +349,11 @@ struct kfd_ioctl_set_trap_handler_args {
 #define AMDKFD_IOC_SET_TRAP_HANDLER		\
 		AMDKFD_IOW(0x13, struct kfd_ioctl_set_trap_handler_args)
 
+#define AMDKFD_IOC_GET_PROCESS_APERTURES_NEW	\
+		AMDKFD_IOWR(0x14,		\
+			struct kfd_ioctl_get_process_apertures_new_args)
+
 #define AMDKFD_COMMAND_START		0x01
-#define AMDKFD_COMMAND_END		0x14
+#define AMDKFD_COMMAND_END		0x15
 
 #endif
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH 19/25] drm/amdkfd: Aperture setup for dGPUs
       [not found] ` <1517967174-21709-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (17 preceding siblings ...)
  2018-02-07  1:32   ` [PATCH 18/25] drm/amdkfd: Remove limit on number of GPUs Felix Kuehling
@ 2018-02-07  1:32   ` Felix Kuehling
  2018-02-07  1:32   ` [PATCH 20/25] drm/amdkfd: Add per-process IDR for buffer handles Felix Kuehling
                     ` (5 subsequent siblings)
  24 siblings, 0 replies; 71+ messages in thread
From: Felix Kuehling @ 2018-02-07  1:32 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling

Set up the GPUVM aperture for SVM (shared virtual memory), which allows
sharing part of the virtual address space between GPUs and CPUs.

Accurately report the GPUVM aperture size supported by KGD.

The low part of the GPUVM aperture is reserved for kernel use. This is
for kernel-allocated buffers that are only accessed on the GPU:
- CWSR trap handler
- IB for submitting commands in user-mode context from kernel mode
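
As a sketch of the resulting low-address layout on dGPUs (see the macros
added below), with PAGE_SIZE and the CWSR trap handler size assumed to be
4 KiB and two pages respectively; the kernel takes the real values from its
own headers:

    #include <stdio.h>

    #define EX_PAGE_SIZE          0x1000ULL              /* assumed 4 KiB pages */
    #define EX_CWSR_TBA_TMA_SIZE  (2 * EX_PAGE_SIZE)     /* assumed two pages */

    #define EX_SVM_USER_BASE      0x1000000ULL           /* 16 MiB, per this patch */
    #define EX_SVM_CWSR_BASE      (EX_SVM_USER_BASE - EX_CWSR_TBA_TMA_SIZE)
    #define EX_SVM_IB_BASE        (EX_SVM_CWSR_BASE - EX_PAGE_SIZE)

    int main(void)
    {
            printf("kernel IB page:  0x%llx\n", EX_SVM_IB_BASE);   /* 0xffd000 */
            printf("CWSR TBA/TMA:    0x%llx\n", EX_SVM_CWSR_BASE); /* 0xffe000 */
            printf("user SVM start:  0x%llx\n", EX_SVM_USER_BASE); /* 0x1000000 */
            return 0;
    }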

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c | 37 ++++++++++++++++++++++------
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h        |  4 +++
 2 files changed, 33 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c b/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
index a06b010..66852de 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
@@ -278,9 +278,8 @@
 #define MAKE_GPUVM_APP_BASE(gpu_num) \
 	(((uint64_t)(gpu_num) << 61) + 0x1000000000000L)
 
-#define MAKE_GPUVM_APP_LIMIT(base) \
-	(((uint64_t)(base) & \
-		0xFFFFFF0000000000UL) | 0xFFFFFFFFFFL)
+#define MAKE_GPUVM_APP_LIMIT(base, size) \
+	(((uint64_t)(base) & 0xFFFFFF0000000000UL) + (size) - 1)
 
 #define MAKE_SCRATCH_APP_BASE() \
 	(((uint64_t)(0x1UL) << 61) + 0x100000000L)
@@ -293,6 +292,14 @@
 #define MAKE_LDS_APP_LIMIT(base) \
 	(((uint64_t)(base) & 0xFFFFFFFF00000000UL) | 0xFFFFFFFF)
 
+/* User mode manages most of the SVM aperture address space. The low
+ * 16MB are reserved for kernel use (CWSR trap handler and kernel IB
+ * for now).
+ */
+#define SVM_USER_BASE 0x1000000ull
+#define SVM_CWSR_BASE (SVM_USER_BASE - KFD_CWSR_TBA_TMA_SIZE)
+#define SVM_IB_BASE   (SVM_CWSR_BASE - PAGE_SIZE)
+
 int kfd_init_apertures(struct kfd_process *process)
 {
 	uint8_t id  = 0;
@@ -330,14 +337,28 @@ int kfd_init_apertures(struct kfd_process *process)
 			pdd->lds_base = MAKE_LDS_APP_BASE();
 			pdd->lds_limit = MAKE_LDS_APP_LIMIT(pdd->lds_base);
 
-			pdd->gpuvm_base = MAKE_GPUVM_APP_BASE(id + 1);
-
-			pdd->gpuvm_limit =
-					MAKE_GPUVM_APP_LIMIT(pdd->gpuvm_base);
-
 			pdd->scratch_base = MAKE_SCRATCH_APP_BASE();
 			pdd->scratch_limit =
 				MAKE_SCRATCH_APP_LIMIT(pdd->scratch_base);
+
+			if (dev->device_info->needs_iommu_device) {
+				/* APUs: GPUVM aperture in
+				 * non-canonical address space
+				 */
+				pdd->gpuvm_base = MAKE_GPUVM_APP_BASE(id + 1);
+				pdd->gpuvm_limit = MAKE_GPUVM_APP_LIMIT(
+					pdd->gpuvm_base,
+					dev->shared_resources.gpuvm_size);
+			} else {
+				/* dGPUs: SVM aperture starting at 0
+				 * with small reserved space for kernel
+				 */
+				pdd->gpuvm_base = SVM_USER_BASE;
+				pdd->gpuvm_limit =
+					dev->shared_resources.gpuvm_size - 1;
+				pdd->qpd.cwsr_base = SVM_CWSR_BASE;
+				pdd->qpd.ib_base = SVM_IB_BASE;
+			}
 		}
 
 		dev_dbg(kfd_device, "node id %u\n", id);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index cac7aa2..a0748ae 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -488,8 +488,12 @@ struct qcm_process_device {
 
 	/* CWSR memory */
 	void *cwsr_kaddr;
+	uint64_t cwsr_base;
 	uint64_t tba_addr;
 	uint64_t tma_addr;
+
+	/* IB memory */
+	uint64_t ib_base;
 };
 
 /* KFD Memory Eviction */
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH 20/25] drm/amdkfd: Add per-process IDR for buffer handles
       [not found] ` <1517967174-21709-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (18 preceding siblings ...)
  2018-02-07  1:32   ` [PATCH 19/25] drm/amdkfd: Aperture setup for dGPUs Felix Kuehling
@ 2018-02-07  1:32   ` Felix Kuehling
  2018-02-07  1:32   ` [PATCH 21/25] drm/amdkfd: Allocate CWSR trap handler memory for dGPUs Felix Kuehling
                     ` (4 subsequent siblings)
  24 siblings, 0 replies; 71+ messages in thread
From: Felix Kuehling @ 2018-02-07  1:32 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling

Also used for cleaning up on process termination.
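
A minimal sketch of the termination-time clean-up this enables (kfree()
stands in here for the real unmap/free path through the kfd2kgd interface;
the function name is made up):

    #include <linux/idr.h>
    #include <linux/slab.h>

    /* Walk every outstanding handle, release the backing object, drop the
     * handle, then tear down the IDR itself.
     */
    static void example_release_all(struct idr *alloc_idr)
    {
            void *mem;
            int id;

            idr_for_each_entry(alloc_idr, mem, id) {
                    kfree(mem);          /* placeholder for the driver's free path */
                    idr_remove(alloc_idr, id);
            }
            idr_destroy(alloc_idr);
    }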

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h    | 11 ++++++
 drivers/gpu/drm/amd/amdkfd/kfd_process.c | 66 ++++++++++++++++++++++++++++++++
 2 files changed, 77 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index a0748ae..4c8c08a1 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -542,6 +542,9 @@ struct kfd_process_device {
 	/* VM context for GPUVM allocations */
 	void *vm;
 
+	/* GPUVM allocations storage */
+	struct idr alloc_idr;
+
 	/* Flag used to tell the pdd has dequeued from the dqm.
 	 * This is used to prevent dev->dqm->ops.process_termination() from
 	 * being called twice when it is already called in IOMMU callback
@@ -675,6 +678,14 @@ struct kfd_process_device *kfd_create_process_device_data(struct kfd_dev *dev,
 int kfd_reserved_mem_mmap(struct kfd_process *process,
 			  struct vm_area_struct *vma);
 
+/* KFD process API for creating and translating handles */
+int kfd_process_device_create_obj_handle(struct kfd_process_device *pdd,
+					void *mem);
+void *kfd_process_device_translate_handle(struct kfd_process_device *p,
+					int handle);
+void kfd_process_device_remove_obj_handle(struct kfd_process_device *pdd,
+					int handle);
+
 /* Process device data iterator */
 struct kfd_process_device *kfd_get_first_process_device_data(
 							struct kfd_process *p);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index 18b2b86..85f1a9e 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -149,6 +149,32 @@ void kfd_unref_process(struct kfd_process *p)
 	kref_put(&p->ref, kfd_process_ref_release);
 }
 
+static void kfd_process_free_outstanding_kfd_bos(struct kfd_process *p)
+{
+	struct kfd_process_device *pdd, *peer_pdd;
+	void *mem;
+	int id;
+
+	list_for_each_entry(pdd, &p->per_device_data, per_device_list) {
+		/*
+		 * Remove all handles from idr and release appropriate
+		 * local memory object
+		 */
+		idr_for_each_entry(&pdd->alloc_idr, mem, id) {
+			list_for_each_entry(peer_pdd, &p->per_device_data,
+					per_device_list) {
+				peer_pdd->dev->kfd2kgd->unmap_memory_to_gpu(
+						peer_pdd->dev->kgd,
+						mem, peer_pdd->vm);
+			}
+
+			pdd->dev->kfd2kgd->free_memory_of_gpu(
+					pdd->dev->kgd, mem);
+			kfd_process_device_remove_obj_handle(pdd, id);
+		}
+	}
+}
+
 static void kfd_process_destroy_pdds(struct kfd_process *p)
 {
 	struct kfd_process_device *pdd, *temp;
@@ -168,6 +194,8 @@ static void kfd_process_destroy_pdds(struct kfd_process *p)
 			free_pages((unsigned long)pdd->qpd.cwsr_kaddr,
 				get_order(KFD_CWSR_TBA_TMA_SIZE));
 
+		idr_destroy(&pdd->alloc_idr);
+
 		kfree(pdd);
 	}
 }
@@ -184,6 +212,8 @@ static void kfd_process_wq_release(struct work_struct *work)
 
 	kfd_iommu_unbind_process(p);
 
+	kfd_process_free_outstanding_kfd_bos(p);
+
 	kfd_process_destroy_pdds(p);
 	dma_fence_put(p->ef);
 
@@ -368,6 +398,7 @@ static struct kfd_process *create_process(const struct task_struct *thread,
 	return process;
 
 err_init_cwsr:
+	kfd_process_free_outstanding_kfd_bos(process);
 	kfd_process_destroy_pdds(process);
 err_init_apertures:
 	pqm_uninit(&process->pqm);
@@ -418,6 +449,9 @@ struct kfd_process_device *kfd_create_process_device_data(struct kfd_dev *dev,
 	pdd->already_dequeued = false;
 	list_add(&pdd->per_device_list, &p->per_device_data);
 
+	/* Init idr used for memory handle translation */
+	idr_init(&pdd->alloc_idr);
+
 	/* Create the GPUVM context for this specific device */
 	if (dev->kfd2kgd->create_process_vm(dev->kgd, &pdd->vm,
 					    &p->kgd_process_info, &p->ef)) {
@@ -427,6 +461,7 @@ struct kfd_process_device *kfd_create_process_device_data(struct kfd_dev *dev,
 	return pdd;
 
 err_create_pdd:
+	idr_destroy(&pdd->alloc_idr);
 	list_del(&pdd->per_device_list);
 	kfree(pdd);
 	return NULL;
@@ -480,6 +515,37 @@ bool kfd_has_process_device_data(struct kfd_process *p)
 	return !(list_empty(&p->per_device_data));
 }
 
+/* Create specific handle mapped to mem from process local memory idr
+ * Assumes that the process lock is held.
+ */
+int kfd_process_device_create_obj_handle(struct kfd_process_device *pdd,
+					void *mem)
+{
+	return idr_alloc(&pdd->alloc_idr, mem, 0, 0, GFP_KERNEL);
+}
+
+/* Translate specific handle from process local memory idr
+ * Assumes that the process lock is held.
+ */
+void *kfd_process_device_translate_handle(struct kfd_process_device *pdd,
+					int handle)
+{
+	if (handle < 0)
+		return NULL;
+
+	return idr_find(&pdd->alloc_idr, handle);
+}
+
+/* Remove specific handle from process local memory idr
+ * Assumes that the process lock is held.
+ */
+void kfd_process_device_remove_obj_handle(struct kfd_process_device *pdd,
+					int handle)
+{
+	if (handle >= 0)
+		idr_remove(&pdd->alloc_idr, handle);
+}
+
 /* This increments the process->ref counter. */
 struct kfd_process *kfd_lookup_process_by_pasid(unsigned int pasid)
 {
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH 21/25] drm/amdkfd: Allocate CWSR trap handler memory for dGPUs
       [not found] ` <1517967174-21709-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (19 preceding siblings ...)
  2018-02-07  1:32   ` [PATCH 20/25] drm/amdkfd: Add per-process IDR for buffer handles Felix Kuehling
@ 2018-02-07  1:32   ` Felix Kuehling
  2018-02-07  1:32   ` [PATCH 22/25] drm/amdkfd: Add TC flush on VMID deallocation for Hawaii Felix Kuehling
                     ` (3 subsequent siblings)
  24 siblings, 0 replies; 71+ messages in thread
From: Felix Kuehling @ 2018-02-07  1:32 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling

Add helpers for allocating GPUVM memory in kernel mode and use them
to allocate memory for the CWSR trap handler.

v2:
* Use dev instead of pdd->dev in kfd_process_free_gpuvm

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_process.c | 125 +++++++++++++++++++++++++++----
 1 file changed, 112 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index 85f1a9e..486e2ee 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -73,6 +73,84 @@ void kfd_process_destroy_wq(void)
 	}
 }
 
+static void kfd_process_free_gpuvm(struct kgd_mem *mem,
+			struct kfd_process_device *pdd)
+{
+	struct kfd_dev *dev = pdd->dev;
+
+	dev->kfd2kgd->unmap_memory_to_gpu(dev->kgd, mem, pdd->vm);
+	dev->kfd2kgd->free_memory_of_gpu(dev->kgd, mem);
+}
+
+/* kfd_process_alloc_gpuvm - Allocate GPU VM for the KFD process
+ *	This function should be only called right after the process
+ *	is created and when kfd_processes_mutex is still being held
+ *	to avoid concurrency. Because of that exclusiveness, we do
+ *	not need to take p->mutex.
+ */
+static int kfd_process_alloc_gpuvm(struct kfd_process *p,
+		struct kfd_dev *kdev, uint64_t gpu_va, uint32_t size,
+		void **kptr, struct kfd_process_device *pdd, uint32_t flags)
+{
+	int err;
+	void *mem = NULL;
+	int handle;
+
+	err = kdev->kfd2kgd->alloc_memory_of_gpu(kdev->kgd, gpu_va, size,
+				pdd->vm,
+				(struct kgd_mem **)&mem, NULL, flags);
+	if (err)
+		goto err_alloc_mem;
+
+	err = kdev->kfd2kgd->map_memory_to_gpu(
+				kdev->kgd, (struct kgd_mem *)mem, pdd->vm);
+	if (err)
+		goto err_map_mem;
+
+	err = kdev->kfd2kgd->sync_memory(kdev->kgd, (struct kgd_mem *) mem,
+				true);
+	if (err) {
+		pr_debug("Sync memory failed, wait interrupted by user signal\n");
+		goto sync_memory_failed;
+	}
+
+	/* Create an obj handle so kfd_process_device_remove_obj_handle
+	 * will take care of the bo removal when the process finishes.
+	 * We do not need to take p->mutex, because the process is just
+	 * created and the ioctls have not had the chance to run.
+	 */
+	handle = kfd_process_device_create_obj_handle(pdd, mem);
+
+	if (handle < 0) {
+		err = handle;
+		goto free_gpuvm;
+	}
+
+	if (kptr) {
+		err = kdev->kfd2kgd->map_gtt_bo_to_kernel(kdev->kgd,
+				(struct kgd_mem *)mem, kptr, NULL);
+		if (err) {
+			pr_debug("Map GTT BO to kernel failed\n");
+			goto free_obj_handle;
+		}
+	}
+
+	return err;
+
+free_obj_handle:
+	kfd_process_device_remove_obj_handle(pdd, handle);
+free_gpuvm:
+sync_memory_failed:
+	kfd_process_free_gpuvm(mem, pdd);
+	return err;
+
+err_map_mem:
+	kdev->kfd2kgd->free_memory_of_gpu(kdev->kgd, mem);
+err_alloc_mem:
+	*kptr = NULL;
+	return err;
+}
+
 struct kfd_process *kfd_create_process(struct file *filep)
 {
 	struct kfd_process *process;
@@ -190,7 +268,7 @@ static void kfd_process_destroy_pdds(struct kfd_process *p)
 
 		list_del(&pdd->per_device_list);
 
-		if (pdd->qpd.cwsr_kaddr)
+		if (pdd->qpd.cwsr_kaddr && !pdd->qpd.cwsr_base)
 			free_pages((unsigned long)pdd->qpd.cwsr_kaddr,
 				get_order(KFD_CWSR_TBA_TMA_SIZE));
 
@@ -307,24 +385,45 @@ static int kfd_process_init_cwsr(struct kfd_process *p, struct file *filep)
 	struct kfd_process_device *pdd = NULL;
 	struct kfd_dev *dev = NULL;
 	struct qcm_process_device *qpd = NULL;
+	void *kaddr;
+	const uint32_t flags = ALLOC_MEM_FLAGS_GTT |
+		ALLOC_MEM_FLAGS_NO_SUBSTITUTE | ALLOC_MEM_FLAGS_EXECUTABLE;
+	int ret;
 
 	list_for_each_entry(pdd, &p->per_device_data, per_device_list) {
 		dev = pdd->dev;
 		qpd = &pdd->qpd;
 		if (!dev->cwsr_enabled || qpd->cwsr_kaddr)
 			continue;
-		offset = (dev->id | KFD_MMAP_RESERVED_MEM_MASK) << PAGE_SHIFT;
-		qpd->tba_addr = (int64_t)vm_mmap(filep, 0,
-			KFD_CWSR_TBA_TMA_SIZE, PROT_READ | PROT_EXEC,
-			MAP_SHARED, offset);
-
-		if (IS_ERR_VALUE(qpd->tba_addr)) {
-			int err = qpd->tba_addr;
-
-			pr_err("Failure to set tba address. error %d.\n", err);
-			qpd->tba_addr = 0;
-			qpd->cwsr_kaddr = NULL;
-			return err;
+		if (qpd->cwsr_base) {
+			/* cwsr_base is only set for dGPU */
+			ret = kfd_process_alloc_gpuvm(p, dev, qpd->cwsr_base,
+				KFD_CWSR_TBA_TMA_SIZE, &kaddr, pdd, flags);
+			if (!ret) {
+				qpd->cwsr_kaddr = kaddr;
+				qpd->tba_addr = qpd->cwsr_base;
+			} else
+				/* In case of error, the kfd_bos for some pdds
+				 * which are already allocated successfully
+				 * will be freed by the upper level function,
+				 * i.e. create_process().
+				 */
+				return ret;
+		} else {
+			offset = (dev->id |
+				KFD_MMAP_RESERVED_MEM_MASK) << PAGE_SHIFT;
+			qpd->tba_addr = (int64_t)vm_mmap(filep, 0,
+				KFD_CWSR_TBA_TMA_SIZE, PROT_READ | PROT_EXEC,
+				MAP_SHARED, offset);
+
+			if (IS_ERR_VALUE(qpd->tba_addr)) {
+				ret = qpd->tba_addr;
+				pr_err("Failure to set tba address. error %d.\n",
+				       ret);
+				qpd->tba_addr = 0;
+				qpd->cwsr_kaddr = NULL;
+				return ret;
+			}
 		}
 
 		memcpy(qpd->cwsr_kaddr, dev->cwsr_isa, dev->cwsr_isa_size);
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH 22/25] drm/amdkfd: Add TC flush on VMID deallocation for Hawaii
       [not found] ` <1517967174-21709-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (20 preceding siblings ...)
  2018-02-07  1:32   ` [PATCH 21/25] drm/amdkfd: Allocate CWSR trap handler memory for dGPUs Felix Kuehling
@ 2018-02-07  1:32   ` Felix Kuehling
  2018-02-07  1:32   ` [PATCH 23/25] drm/amdkfd: Add ioctls for GPUVM memory management Felix Kuehling
                     ` (2 subsequent siblings)
  24 siblings, 0 replies; 71+ messages in thread
From: Felix Kuehling @ 2018-02-07  1:32 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Amber Lin, Felix Kuehling

On GFX7 the CP does not perform a TC flush when queues are unmapped.
To prevent TC evictions from accessing an invalid VMID, flush the TC
explicitly before releasing a VMID.
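
A condensed sketch of the ordering this enforces in the no-HWS VMID release
path (comment form only; the function names match the code added below):

    /*
     * deallocate_vmid() on Hawaii (GFX7), per this patch:
     *
     *   1. The process's queues have already been destroyed/unmapped.
     *   2. flush_texture_cache_nocpsch() builds a RELEASE_MEM packet with
     *      pm_create_release_mem() in the per-process kernel IB and submits
     *      it through kfd2kgd->submit_ib() while the VMID is still valid.
     *   3. kfd_flush_tlb() flushes the process's TLB entries.
     *   4. The VMID mapping is released back to the pool.
     *
     * Releasing the VMID before step 2 would let pending TC evictions land
     * in whatever address space that VMID is assigned to next.
     */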

v2:
* Fix unnecessary list_for_each_entry_safe

Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c  | 22 +++++++++-
 drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c    | 37 ++++++++++++++++
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h              |  3 ++
 drivers/gpu/drm/amd/amdkfd/kfd_process.c           | 50 ++++++++++++++++++++++
 4 files changed, 111 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index b3b6dab..c18e048 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -142,12 +142,31 @@ static int allocate_vmid(struct device_queue_manager *dqm,
 	return 0;
 }
 
+static int flush_texture_cache_nocpsch(struct kfd_dev *kdev,
+				struct qcm_process_device *qpd)
+{
+	uint32_t len;
+
+	if (!qpd->ib_kaddr)
+		return -ENOMEM;
+
+	len = pm_create_release_mem(qpd->ib_base, (uint32_t *)qpd->ib_kaddr);
+
+	return kdev->kfd2kgd->submit_ib(kdev->kgd, KGD_ENGINE_MEC1, qpd->vmid,
+				qpd->ib_base, (uint32_t *)qpd->ib_kaddr, len);
+}
+
 static void deallocate_vmid(struct device_queue_manager *dqm,
 				struct qcm_process_device *qpd,
 				struct queue *q)
 {
 	int bit = qpd->vmid - dqm->dev->vm_info.first_vmid_kfd;
 
+	/* On GFX v7, CP doesn't flush TC at dequeue */
+	if (q->device->device_info->asic_family == CHIP_HAWAII)
+		if (flush_texture_cache_nocpsch(q->device, qpd))
+			pr_err("Failed to flush TC\n");
+
 	kfd_flush_tlb(qpd_to_pdd(qpd));
 
 	/* Release the vmid mapping */
@@ -792,11 +811,12 @@ static void uninitialize(struct device_queue_manager *dqm)
 static int start_nocpsch(struct device_queue_manager *dqm)
 {
 	init_interrupts(dqm);
-	return 0;
+	return pm_init(&dqm->packets, dqm);
 }
 
 static int stop_nocpsch(struct device_queue_manager *dqm)
 {
+	pm_uninit(&dqm->packets);
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
index 0ecbd1f..7614375 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
@@ -356,6 +356,43 @@ static int pm_create_runlist_ib(struct packet_manager *pm,
 	return retval;
 }
 
+/* pm_create_release_mem - Create a RELEASE_MEM packet and return the size
+ *     of this packet
+ *     @gpu_addr - GPU address of the packet. It's a virtual address.
+ *     @buffer - buffer to fill up with the packet. It's a CPU kernel pointer
+ *     Return - length of the packet in dwords
+ */
+uint32_t pm_create_release_mem(uint64_t gpu_addr, uint32_t *buffer)
+{
+	struct pm4_mec_release_mem *packet;
+
+	WARN_ON(!buffer);
+
+	packet = (struct pm4_mec_release_mem *)buffer;
+	memset(buffer, 0, sizeof(*packet));
+
+	packet->header.u32All = build_pm4_header(IT_RELEASE_MEM,
+						 sizeof(*packet));
+
+	packet->bitfields2.event_type = CACHE_FLUSH_AND_INV_TS_EVENT;
+	packet->bitfields2.event_index = event_index___release_mem__end_of_pipe;
+	packet->bitfields2.tcl1_action_ena = 1;
+	packet->bitfields2.tc_action_ena = 1;
+	packet->bitfields2.cache_policy = cache_policy___release_mem__lru;
+	packet->bitfields2.atc = 0;
+
+	packet->bitfields3.data_sel = data_sel___release_mem__send_32_bit_low;
+	packet->bitfields3.int_sel =
+		int_sel___release_mem__send_interrupt_after_write_confirm;
+
+	packet->bitfields4.address_lo_32b = (gpu_addr & 0xffffffff) >> 2;
+	packet->address_hi = upper_32_bits(gpu_addr);
+
+	packet->data_lo = 0;
+
+	return sizeof(*packet) / sizeof(unsigned int);
+}
+
 int pm_init(struct packet_manager *pm, struct device_queue_manager *dqm)
 {
 	pm->dqm = dqm;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 4c8c08a1..c79624b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -494,6 +494,7 @@ struct qcm_process_device {
 
 	/* IB memory */
 	uint64_t ib_base;
+	void *ib_kaddr;
 };
 
 /* KFD Memory Eviction */
@@ -831,6 +832,8 @@ int pm_send_unmap_queue(struct packet_manager *pm, enum kfd_queue_type type,
 
 void pm_release_ib(struct packet_manager *pm);
 
+uint32_t pm_create_release_mem(uint64_t gpu_addr, uint32_t *buffer);
+
 uint64_t kfd_get_number_elems(struct kfd_dev *kfd);
 
 /* Events */
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index 486e2ee..0684c49 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -151,6 +151,52 @@ static int kfd_process_alloc_gpuvm(struct kfd_process *p,
 	return err;
 }
 
+/* kfd_process_reserve_ib_mem - Reserve memory inside the process for IB usage
+ *	The reserved memory is used by KFD to submit IBs to AMDGPU from kernel mode.
+ *	If the memory is reserved successfully, ib_kaddr will have
+ *	the CPU/kernel address. Check ib_kaddr before accessing the
+ *	memory.
+ */
+static int kfd_process_reserve_ib_mem(struct kfd_process *p)
+{
+	int ret = 0;
+	struct kfd_process_device *pdd = NULL;
+	struct kfd_dev *kdev = NULL;
+	struct qcm_process_device *qpd = NULL;
+	void *kaddr;
+	uint32_t flags = ALLOC_MEM_FLAGS_GTT |
+			 ALLOC_MEM_FLAGS_NO_SUBSTITUTE |
+			 ALLOC_MEM_FLAGS_WRITABLE |
+			 ALLOC_MEM_FLAGS_EXECUTABLE;
+
+	list_for_each_entry(pdd, &p->per_device_data, per_device_list) {
+		kdev = pdd->dev;
+		qpd = &pdd->qpd;
+		if (qpd->ib_kaddr)
+			continue;
+
+		if (qpd->ib_base) { /* is dGPU */
+			ret = kfd_process_alloc_gpuvm(p, kdev,
+				qpd->ib_base, PAGE_SIZE,
+				&kaddr, pdd, flags);
+			if (!ret)
+				qpd->ib_kaddr = kaddr;
+			else
+				/* In case of error, the kfd_bos for some pdds
+				 * which are already allocated successfully
+				 * will be freed by the upper level function,
+				 * i.e. create_process().
+				 */
+				return ret;
+		} else {
+			/* FIXME: Support APU */
+			continue;
+		}
+	}
+
+	return 0;
+}
+
 struct kfd_process *kfd_create_process(struct file *filep)
 {
 	struct kfd_process *process;
@@ -490,6 +536,9 @@ static struct kfd_process *create_process(const struct task_struct *thread,
 	INIT_DELAYED_WORK(&process->restore_work, restore_process_worker);
 	process->last_restore_timestamp = get_jiffies_64();
 
+	err = kfd_process_reserve_ib_mem(process);
+	if (err)
+		goto err_reserve_ib_mem;
 	err = kfd_process_init_cwsr(process, filep);
 	if (err)
 		goto err_init_cwsr;
@@ -497,6 +546,7 @@ static struct kfd_process *create_process(const struct task_struct *thread,
 	return process;
 
 err_init_cwsr:
+err_reserve_ib_mem:
 	kfd_process_free_outstanding_kfd_bos(process);
 	kfd_process_destroy_pdds(process);
 err_init_apertures:
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH 23/25] drm/amdkfd: Add ioctls for GPUVM memory management
       [not found] ` <1517967174-21709-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (21 preceding siblings ...)
  2018-02-07  1:32   ` [PATCH 22/25] drm/amdkfd: Add TC flush on VMID deallocation for Hawaii Felix Kuehling
@ 2018-02-07  1:32   ` Felix Kuehling
  2018-02-07  1:32   ` [PATCH 24/25] drm/amdkfd: Kmap event page for dGPUs Felix Kuehling
  2018-02-07  1:32   ` [PATCH 25/25] drm/amdkfd: Add module option for testing large-BAR functionality Felix Kuehling
  24 siblings, 0 replies; 71+ messages in thread
From: Felix Kuehling @ 2018-02-07  1:32 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling

v2:
* Fix error handling after kfd_bind_process_to_device in
  kfd_ioctl_map_memory_to_gpu

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c        | 329 ++++++++++++++++++++++++
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h           |   8 +
 drivers/gpu/drm/amd/include/kgd_kfd_interface.h |   2 +
 include/uapi/linux/kfd_ioctl.h                  |  54 +++-
 4 files changed, 392 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 7d40094..2f480c5 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1046,6 +1046,323 @@ static int kfd_ioctl_get_tile_config(struct file *filep,
 	return 0;
 }
 
+bool kfd_dev_is_large_bar(struct kfd_dev *dev)
+{
+	struct kfd_local_mem_info mem_info;
+
+	if (dev->device_info->needs_iommu_device)
+		return false;
+
+	dev->kfd2kgd->get_local_mem_info(dev->kgd, &mem_info);
+	if (mem_info.local_mem_size_private == 0 &&
+			mem_info.local_mem_size_public > 0)
+		return true;
+	return false;
+}
+
+static int kfd_ioctl_alloc_memory_of_gpu(struct file *filep,
+					struct kfd_process *p, void *data)
+{
+	struct kfd_ioctl_alloc_memory_of_gpu_args *args = data;
+	struct kfd_process_device *pdd;
+	void *mem;
+	struct kfd_dev *dev;
+	int idr_handle;
+	long err;
+	uint64_t offset = args->mmap_offset;
+	uint32_t flags = args->flags;
+
+	if (args->size == 0)
+		return -EINVAL;
+
+	dev = kfd_device_by_id(args->gpu_id);
+	if (!dev)
+		return -EINVAL;
+
+	if ((flags & KFD_IOC_ALLOC_MEM_FLAGS_PUBLIC) &&
+		(flags & KFD_IOC_ALLOC_MEM_FLAGS_VRAM) &&
+		!kfd_dev_is_large_bar(dev)) {
+		pr_err("Alloc host visible vram on small bar is not allowed\n");
+		return -EINVAL;
+	}
+
+	mutex_lock(&p->mutex);
+
+	pdd = kfd_bind_process_to_device(dev, p);
+	if (IS_ERR(pdd)) {
+		err = PTR_ERR(pdd);
+		goto err_unlock;
+	}
+
+	err = dev->kfd2kgd->alloc_memory_of_gpu(
+		dev->kgd, args->va_addr, args->size,
+		pdd->vm, (struct kgd_mem **) &mem, &offset,
+		flags);
+
+	if (err)
+		goto err_unlock;
+
+	idr_handle = kfd_process_device_create_obj_handle(pdd, mem);
+	if (idr_handle < 0) {
+		err = -EFAULT;
+		goto err_free;
+	}
+
+	mutex_unlock(&p->mutex);
+
+	args->handle = MAKE_HANDLE(args->gpu_id, idr_handle);
+	args->mmap_offset = offset;
+
+	return 0;
+
+err_free:
+	dev->kfd2kgd->free_memory_of_gpu(dev->kgd, (struct kgd_mem *)mem);
+err_unlock:
+	mutex_unlock(&p->mutex);
+	return err;
+}
+
+static int kfd_ioctl_free_memory_of_gpu(struct file *filep,
+					struct kfd_process *p, void *data)
+{
+	struct kfd_ioctl_free_memory_of_gpu_args *args = data;
+	struct kfd_process_device *pdd;
+	void *mem;
+	struct kfd_dev *dev;
+	int ret;
+
+	dev = kfd_device_by_id(GET_GPU_ID(args->handle));
+	if (!dev)
+		return -EINVAL;
+
+	mutex_lock(&p->mutex);
+
+	pdd = kfd_get_process_device_data(dev, p);
+	if (!pdd) {
+		pr_err("Process device data doesn't exist\n");
+		ret = -EINVAL;
+		goto err_unlock;
+	}
+
+	mem = kfd_process_device_translate_handle(
+		pdd, GET_IDR_HANDLE(args->handle));
+	if (!mem) {
+		ret = -EINVAL;
+		goto err_unlock;
+	}
+
+	ret = dev->kfd2kgd->free_memory_of_gpu(dev->kgd, (struct kgd_mem *)mem);
+
+	/* If freeing the buffer failed, leave the handle in place for
+	 * clean-up during process tear-down.
+	 */
+	if (!ret)
+		kfd_process_device_remove_obj_handle(
+			pdd, GET_IDR_HANDLE(args->handle));
+
+err_unlock:
+	mutex_unlock(&p->mutex);
+	return ret;
+}
+
+static int kfd_ioctl_map_memory_to_gpu(struct file *filep,
+					struct kfd_process *p, void *data)
+{
+	struct kfd_ioctl_map_memory_to_gpu_args *args = data;
+	struct kfd_process_device *pdd, *peer_pdd;
+	void *mem;
+	struct kfd_dev *dev, *peer;
+	long err = 0;
+	int i, num_dev = 0;
+	uint32_t *devices_arr = NULL;
+
+	dev = kfd_device_by_id(GET_GPU_ID(args->handle));
+	if (!dev)
+		return -EINVAL;
+
+	if (!args->device_ids_array_size) {
+		pr_debug("Device IDs array empty\n");
+		return -EINVAL;
+	}
+	if (args->device_ids_array_size & 3) {
+		pr_debug("Misaligned device IDs array size %u\n",
+			 args->device_ids_array_size);
+		return -EINVAL;
+	}
+
+	devices_arr = kmalloc(args->device_ids_array_size, GFP_KERNEL);
+	if (!devices_arr)
+		return -ENOMEM;
+
+	err = copy_from_user(devices_arr,
+			     (void __user *)args->device_ids_array_ptr,
+			     args->device_ids_array_size);
+	if (err != 0) {
+		err = -EFAULT;
+		goto copy_from_user_failed;
+	}
+
+	mutex_lock(&p->mutex);
+
+	pdd = kfd_bind_process_to_device(dev, p);
+	if (IS_ERR(pdd)) {
+		err = PTR_ERR(pdd);
+		goto bind_process_to_device_failed;
+	}
+
+	mem = kfd_process_device_translate_handle(pdd,
+						GET_IDR_HANDLE(args->handle));
+	if (!mem) {
+		err = -ENOMEM;
+		goto get_mem_obj_from_handle_failed;
+	}
+
+	num_dev = args->device_ids_array_size / sizeof(uint32_t);
+	for (i = 0 ; i < num_dev; i++) {
+		peer = kfd_device_by_id(devices_arr[i]);
+		if (!peer) {
+			pr_debug("Getting device by id failed for 0x%x\n",
+				 devices_arr[i]);
+			err = -EINVAL;
+			goto get_mem_obj_from_handle_failed;
+		}
+
+		peer_pdd = kfd_bind_process_to_device(peer, p);
+		if (IS_ERR(peer_pdd)) {
+			err = PTR_ERR(peer_pdd);
+			goto get_mem_obj_from_handle_failed;
+		}
+		err = peer->kfd2kgd->map_memory_to_gpu(
+			peer->kgd, (struct kgd_mem *)mem, peer_pdd->vm);
+		if (err) {
+			pr_err("Failed to map to gpu %d/%d\n",
+			       i, num_dev);
+			goto map_memory_to_gpu_failed;
+		}
+	}
+
+	mutex_unlock(&p->mutex);
+
+	err = dev->kfd2kgd->sync_memory(dev->kgd, (struct kgd_mem *) mem, true);
+	if (err) {
+		pr_debug("Sync memory failed, wait interrupted by user signal\n");
+		goto sync_memory_failed;
+	}
+
+	/* Flush TLBs after waiting for the page table updates to complete */
+	for (i = 0; i < num_dev; i++) {
+		peer = kfd_device_by_id(devices_arr[i]);
+		if (WARN_ON_ONCE(!peer))
+			continue;
+		peer_pdd = kfd_get_process_device_data(peer, p);
+		if (WARN_ON_ONCE(!peer_pdd))
+			continue;
+		kfd_flush_tlb(peer_pdd);
+	}
+
+	kfree(devices_arr);
+
+	return err;
+
+bind_process_to_device_failed:
+get_mem_obj_from_handle_failed:
+map_memory_to_gpu_failed:
+	mutex_unlock(&p->mutex);
+copy_from_user_failed:
+sync_memory_failed:
+	kfree(devices_arr);
+
+	return err;
+}
+
+static int kfd_ioctl_unmap_memory_from_gpu(struct file *filep,
+					struct kfd_process *p, void *data)
+{
+	struct kfd_ioctl_unmap_memory_from_gpu_args *args = data;
+	struct kfd_process_device *pdd, *peer_pdd;
+	void *mem;
+	struct kfd_dev *dev, *peer;
+	long err = 0;
+	uint32_t *devices_arr = NULL, num_dev, i;
+
+	dev = kfd_device_by_id(GET_GPU_ID(args->handle));
+	if (!dev)
+		return -EINVAL;
+
+	if (!args->device_ids_array_size) {
+		pr_debug("Device IDs array empty\n");
+		return -EINVAL;
+	}
+	if (args->device_ids_array_size & 3) {
+		pr_debug("Misaligned device IDs array size %u\n",
+			 args->device_ids_array_size);
+		return -EINVAL;
+	}
+
+	devices_arr = kmalloc(args->device_ids_array_size, GFP_KERNEL);
+	if (!devices_arr)
+		return -ENOMEM;
+
+	err = copy_from_user(devices_arr,
+			     (void __user *)args->device_ids_array_ptr,
+			     args->device_ids_array_size);
+	if (err != 0) {
+		err = -EFAULT;
+		goto copy_from_user_failed;
+	}
+
+	mutex_lock(&p->mutex);
+
+	pdd = kfd_get_process_device_data(dev, p);
+	if (!pdd) {
+		pr_debug("Process device data doesn't exist\n");
+		err = -ENODEV;
+		goto bind_process_to_device_failed;
+	}
+
+	mem = kfd_process_device_translate_handle(pdd,
+						GET_IDR_HANDLE(args->handle));
+	if (!mem) {
+		err = -ENOMEM;
+		goto get_mem_obj_from_handle_failed;
+	}
+
+	num_dev = args->device_ids_array_size / sizeof(uint32_t);
+	for (i = 0 ; i < num_dev; i++) {
+		peer = kfd_device_by_id(devices_arr[i]);
+		if (!peer) {
+			err = -EINVAL;
+			goto get_mem_obj_from_handle_failed;
+		}
+
+		peer_pdd = kfd_get_process_device_data(peer, p);
+		if (!peer_pdd) {
+			err = -ENODEV;
+			goto get_mem_obj_from_handle_failed;
+		}
+		err = dev->kfd2kgd->unmap_memory_to_gpu(
+			peer->kgd, (struct kgd_mem *)mem, peer_pdd->vm);
+		if (err) {
+			pr_err("Failed to unmap from gpu %d/%d\n",
+			       i, num_dev);
+			goto unmap_memory_from_gpu_failed;
+		}
+	}
+	kfree(devices_arr);
+
+	mutex_unlock(&p->mutex);
+
+	return 0;
+
+bind_process_to_device_failed:
+get_mem_obj_from_handle_failed:
+unmap_memory_from_gpu_failed:
+	mutex_unlock(&p->mutex);
+copy_from_user_failed:
+	kfree(devices_arr);
+	return err;
+}
+
 #define AMDKFD_IOCTL_DEF(ioctl, _func, _flags) \
 	[_IOC_NR(ioctl)] = {.cmd = ioctl, .func = _func, .flags = _flags, \
 			    .cmd_drv = 0, .name = #ioctl}
@@ -1111,6 +1428,18 @@ static const struct amdkfd_ioctl_desc amdkfd_ioctls[] = {
 
 	AMDKFD_IOCTL_DEF(AMDKFD_IOC_GET_PROCESS_APERTURES_NEW,
 			kfd_ioctl_get_process_apertures_new, 0),
+
+	AMDKFD_IOCTL_DEF(AMDKFD_IOC_ALLOC_MEMORY_OF_GPU,
+			kfd_ioctl_alloc_memory_of_gpu, 0),
+
+	AMDKFD_IOCTL_DEF(AMDKFD_IOC_FREE_MEMORY_OF_GPU,
+			kfd_ioctl_free_memory_of_gpu, 0),
+
+	AMDKFD_IOCTL_DEF(AMDKFD_IOC_MAP_MEMORY_TO_GPU,
+			kfd_ioctl_map_memory_to_gpu, 0),
+
+	AMDKFD_IOCTL_DEF(AMDKFD_IOC_UNMAP_MEMORY_FROM_GPU,
+			kfd_ioctl_unmap_memory_from_gpu, 0),
 };
 
 #define AMDKFD_CORE_IOCTL_COUNT	ARRAY_SIZE(amdkfd_ioctls)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index c79624b..449822b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -509,6 +509,14 @@ struct qcm_process_device {
 int kgd2kfd_schedule_evict_and_restore_process(struct mm_struct *mm,
 					       struct dma_fence *fence);
 
+/* 8 byte handle containing GPU ID in the most significant 4 bytes and
+ * idr_handle in the least significant 4 bytes
+ */
+#define MAKE_HANDLE(gpu_id, idr_handle) \
+	(((uint64_t)(gpu_id) << 32) + idr_handle)
+#define GET_GPU_ID(handle) (handle >> 32)
+#define GET_IDR_HANDLE(handle) (handle & 0xFFFFFFFF)
+
 enum kfd_pdd_bound {
 	PDD_UNBOUND = 0,
 	PDD_BOUND,
diff --git a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
index b7146e2..9e4d392 100644
--- a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
+++ b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
@@ -130,6 +130,7 @@ struct tile_config {
 
 /*
  * Allocation flag domains
+ * NOTE: This must match the corresponding definitions in kfd_ioctl.h.
  */
 #define ALLOC_MEM_FLAGS_VRAM		(1 << 0)
 #define ALLOC_MEM_FLAGS_GTT		(1 << 1)
@@ -138,6 +139,7 @@ struct tile_config {
 
 /*
  * Allocation flags attributes/access options.
+ * NOTE: This must match the corresponding definitions in kfd_ioctl.h.
  */
 #define ALLOC_MEM_FLAGS_WRITABLE	(1 << 31)
 #define ALLOC_MEM_FLAGS_EXECUTABLE	(1 << 30)
diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
index 5201437..e2ba6bf 100644
--- a/include/uapi/linux/kfd_ioctl.h
+++ b/include/uapi/linux/kfd_ioctl.h
@@ -286,6 +286,46 @@ struct kfd_ioctl_set_trap_handler_args {
 	__u32 pad;
 };
 
+/* Allocation flags: memory types */
+#define KFD_IOC_ALLOC_MEM_FLAGS_VRAM		(1 << 0)
+#define KFD_IOC_ALLOC_MEM_FLAGS_GTT		(1 << 1)
+#define KFD_IOC_ALLOC_MEM_FLAGS_USERPTR		(1 << 2)
+#define KFD_IOC_ALLOC_MEM_FLAGS_DOORBELL	(1 << 3)
+/* Allocation flags: attributes/access options */
+#define KFD_IOC_ALLOC_MEM_FLAGS_WRITABLE	(1 << 31)
+#define KFD_IOC_ALLOC_MEM_FLAGS_EXECUTABLE	(1 << 30)
+#define KFD_IOC_ALLOC_MEM_FLAGS_PUBLIC		(1 << 29)
+#define KFD_IOC_ALLOC_MEM_FLAGS_NO_SUBSTITUTE	(1 << 28)
+#define KFD_IOC_ALLOC_MEM_FLAGS_AQL_QUEUE_MEM	(1 << 27)
+#define KFD_IOC_ALLOC_MEM_FLAGS_COHERENT	(1 << 26)
+
+struct kfd_ioctl_alloc_memory_of_gpu_args {
+	__u64 va_addr;		/* to KFD */
+	__u64 size;		/* to KFD */
+	__u64 handle;		/* from KFD */
+	__u64 mmap_offset;	/* to KFD (userptr), from KFD (mmap offset) */
+	__u32 gpu_id;		/* to KFD */
+	__u32 flags;
+};
+
+struct kfd_ioctl_free_memory_of_gpu_args {
+	__u64 handle;		/* to KFD */
+};
+
+struct kfd_ioctl_map_memory_to_gpu_args {
+	__u64 handle;			/* to KFD */
+	__u64 device_ids_array_ptr;	/* to KFD */
+	__u32 device_ids_array_size;	/* to KFD */
+	__u32 pad;
+};
+
+struct kfd_ioctl_unmap_memory_from_gpu_args {
+	__u64 handle;			/* to KFD */
+	__u64 device_ids_array_ptr;	/* to KFD */
+	__u32 device_ids_array_size;	/* to KFD */
+	__u32 pad;
+};
+
 #define AMDKFD_IOCTL_BASE 'K'
 #define AMDKFD_IO(nr)			_IO(AMDKFD_IOCTL_BASE, nr)
 #define AMDKFD_IOR(nr, type)		_IOR(AMDKFD_IOCTL_BASE, nr, type)
@@ -353,7 +393,19 @@ struct kfd_ioctl_set_trap_handler_args {
 		AMDKFD_IOWR(0x14,		\
 			struct kfd_ioctl_get_process_apertures_new_args)
 
+#define AMDKFD_IOC_ALLOC_MEMORY_OF_GPU		\
+		AMDKFD_IOWR(0x15, struct kfd_ioctl_alloc_memory_of_gpu_args)
+
+#define AMDKFD_IOC_FREE_MEMORY_OF_GPU		\
+		AMDKFD_IOWR(0x16, struct kfd_ioctl_free_memory_of_gpu_args)
+
+#define AMDKFD_IOC_MAP_MEMORY_TO_GPU		\
+		AMDKFD_IOWR(0x17, struct kfd_ioctl_map_memory_to_gpu_args)
+
+#define AMDKFD_IOC_UNMAP_MEMORY_FROM_GPU	\
+		AMDKFD_IOWR(0x18, struct kfd_ioctl_unmap_memory_from_gpu_args)
+
 #define AMDKFD_COMMAND_START		0x01
-#define AMDKFD_COMMAND_END		0x15
+#define AMDKFD_COMMAND_END		0x19
 
 #endif
-- 
2.7.4
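
For context, the sketch below shows how user space might drive these four
new ioctls end-to-end: allocate a BO at a GPU virtual address, map it on a
set of GPUs, then unmap and free it. The struct, flag and ioctl names come
from the kfd_ioctl.h additions above; the file descriptor, GPU ID, VA and
error handling are hypothetical placeholders (the real logic lives in the
Thunk, not in this patch).

    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <linux/kfd_ioctl.h>

    /* Hypothetical helper: kfd_fd is an open /dev/kfd, gpu_id comes from
     * the topology, va/size from the caller's virtual address manager.
     */
    static int alloc_map_unmap_free(int kfd_fd, uint32_t gpu_id,
                                    uint64_t va, uint64_t size)
    {
            struct kfd_ioctl_alloc_memory_of_gpu_args alloc_args = {
                    .va_addr = va,
                    .size = size,
                    .gpu_id = gpu_id,
                    .flags = KFD_IOC_ALLOC_MEM_FLAGS_VRAM |
                             KFD_IOC_ALLOC_MEM_FLAGS_WRITABLE,
            };
            struct kfd_ioctl_map_memory_to_gpu_args map_args = {0};
            struct kfd_ioctl_unmap_memory_from_gpu_args unmap_args = {0};
            struct kfd_ioctl_free_memory_of_gpu_args free_args = {0};
            uint32_t devices[1] = { gpu_id };
            int r;

            r = ioctl(kfd_fd, AMDKFD_IOC_ALLOC_MEMORY_OF_GPU, &alloc_args);
            if (r)
                    return r;

            /* alloc_args.handle encodes the GPU ID in the upper 32 bits and
             * the per-process IDR handle in the lower 32 bits (MAKE_HANDLE).
             */
            map_args.handle = alloc_args.handle;
            map_args.device_ids_array_ptr = (uint64_t)(uintptr_t)devices;
            map_args.device_ids_array_size = sizeof(devices);
            r = ioctl(kfd_fd, AMDKFD_IOC_MAP_MEMORY_TO_GPU, &map_args);
            if (r)
                    goto free_bo;

            /* ... hand va to GPU queues and do work ... */

            unmap_args.handle = alloc_args.handle;
            unmap_args.device_ids_array_ptr = (uint64_t)(uintptr_t)devices;
            unmap_args.device_ids_array_size = sizeof(devices);
            r = ioctl(kfd_fd, AMDKFD_IOC_UNMAP_MEMORY_FROM_GPU, &unmap_args);

    free_bo:
            free_args.handle = alloc_args.handle;
            ioctl(kfd_fd, AMDKFD_IOC_FREE_MEMORY_OF_GPU, &free_args);
            return r;
    }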

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH 24/25] drm/amdkfd: Kmap event page for dGPUs
       [not found] ` <1517967174-21709-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (22 preceding siblings ...)
  2018-02-07  1:32   ` [PATCH 23/25] drm/amdkfd: Add ioctls for GPUVM memory management Felix Kuehling
@ 2018-02-07  1:32   ` Felix Kuehling
  2018-02-07  1:32   ` [PATCH 25/25] drm/amdkfd: Add module option for testing large-BAR functionality Felix Kuehling
  24 siblings, 0 replies; 71+ messages in thread
From: Felix Kuehling @ 2018-02-07  1:32 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling

The events page must be accessible in user mode by the GPU and CPU
as well as in kernel mode by the CPU. On dGPUs, user mode virtual
addresses are managed by the Thunk's GPU memory allocation code.
Therefore we can't allocate the memory in kernel mode like we do
on APUs, but KFD still needs to map the memory for kernel access.
To facilitate this, the Thunk provides the buffer handle of the
events page to KFD when creating the first event.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 56 ++++++++++++++++++++++++++++++++
 drivers/gpu/drm/amd/amdkfd/kfd_events.c  | 31 ++++++++++++++++--
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h    |  2 ++
 3 files changed, 87 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 2f480c5..ec48010 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -922,6 +922,58 @@ static int kfd_ioctl_create_event(struct file *filp, struct kfd_process *p,
 	struct kfd_ioctl_create_event_args *args = data;
 	int err;
 
+	/* For dGPUs the event page is allocated in user mode. The
+	 * handle is passed to KFD with the first call to this IOCTL
+	 * through the event_page_offset field.
+	 */
+	if (args->event_page_offset) {
+		struct kfd_dev *kfd;
+		struct kfd_process_device *pdd;
+		void *mem, *kern_addr;
+		uint64_t size;
+
+		if (p->signal_page) {
+			pr_err("Event page is already set\n");
+			return -EINVAL;
+		}
+
+		kfd = kfd_device_by_id(GET_GPU_ID(args->event_page_offset));
+		if (!kfd) {
+			pr_err("Getting device by id failed in %s\n", __func__);
+			return -EINVAL;
+		}
+
+		mutex_lock(&p->mutex);
+		pdd = kfd_bind_process_to_device(kfd, p);
+		if (IS_ERR(pdd)) {
+			err = PTR_ERR(pdd);
+			goto out_unlock;
+		}
+
+		mem = kfd_process_device_translate_handle(pdd,
+				GET_IDR_HANDLE(args->event_page_offset));
+		if (!mem) {
+			pr_err("Can't find BO, offset is 0x%llx\n",
+			       args->event_page_offset);
+			err = -EINVAL;
+			goto out_unlock;
+		}
+		mutex_unlock(&p->mutex);
+
+		err = kfd->kfd2kgd->map_gtt_bo_to_kernel(kfd->kgd,
+						mem, &kern_addr, &size);
+		if (err) {
+			pr_err("Failed to map event page to kernel\n");
+			return err;
+		}
+
+		err = kfd_event_page_set(p, kern_addr, size);
+		if (err) {
+			pr_err("Failed to set event page\n");
+			return err;
+		}
+	}
+
 	err = kfd_event_create(filp, p, args->event_type,
 				args->auto_reset != 0, args->node_id,
 				&args->event_id, &args->event_trigger_data,
@@ -929,6 +981,10 @@ static int kfd_ioctl_create_event(struct file *filp, struct kfd_process *p,
 				&args->event_slot_index);
 
 	return err;
+
+out_unlock:
+	mutex_unlock(&p->mutex);
+	return err;
 }
 
 static int kfd_ioctl_destroy_event(struct file *filp, struct kfd_process *p,
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
index 6fb9c0d..4890a90 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
@@ -52,6 +52,7 @@ struct kfd_event_waiter {
 struct kfd_signal_page {
 	uint64_t *kernel_address;
 	uint64_t __user *user_address;
+	bool need_to_free_pages;
 };
 
 
@@ -79,6 +80,7 @@ static struct kfd_signal_page *allocate_signal_page(struct kfd_process *p)
 	       KFD_SIGNAL_EVENT_LIMIT * 8);
 
 	page->kernel_address = backing_store;
+	page->need_to_free_pages = true;
 	pr_debug("Allocated new event signal page at %p, for process %p\n",
 			page, p);
 
@@ -269,8 +271,9 @@ static void shutdown_signal_page(struct kfd_process *p)
 	struct kfd_signal_page *page = p->signal_page;
 
 	if (page) {
-		free_pages((unsigned long)page->kernel_address,
-				get_order(KFD_SIGNAL_EVENT_LIMIT * 8));
+		if (page->need_to_free_pages)
+			free_pages((unsigned long)page->kernel_address,
+				   get_order(KFD_SIGNAL_EVENT_LIMIT * 8));
 		kfree(page);
 	}
 }
@@ -292,6 +295,30 @@ static bool event_can_be_cpu_signaled(const struct kfd_event *ev)
 	return ev->type == KFD_EVENT_TYPE_SIGNAL;
 }
 
+int kfd_event_page_set(struct kfd_process *p, void *kernel_address,
+		       uint64_t size)
+{
+	struct kfd_signal_page *page;
+
+	if (p->signal_page)
+		return -EBUSY;
+
+	page = kzalloc(sizeof(*page), GFP_KERNEL);
+	if (!page)
+		return -ENOMEM;
+
+	/* Initialize all events to unsignaled */
+	memset(kernel_address, (uint8_t) UNSIGNALED_EVENT_SLOT,
+	       KFD_SIGNAL_EVENT_LIMIT * 8);
+
+	page->kernel_address = kernel_address;
+
+	p->signal_page = page;
+	p->signal_mapped_size = size;
+
+	return 0;
+}
+
 int kfd_event_create(struct file *devkfd, struct kfd_process *p,
 		     uint32_t event_type, bool auto_reset, uint32_t node_id,
 		     uint32_t *event_id, uint32_t *event_trigger_data,
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 449822b..afe381f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -863,6 +863,8 @@ void kfd_signal_iommu_event(struct kfd_dev *dev,
 void kfd_signal_hw_exception_event(unsigned int pasid);
 int kfd_set_event(struct kfd_process *p, uint32_t event_id);
 int kfd_reset_event(struct kfd_process *p, uint32_t event_id);
+int kfd_event_page_set(struct kfd_process *p, void *kernel_address,
+		       uint64_t size);
 int kfd_event_create(struct file *devkfd, struct kfd_process *p,
 		     uint32_t event_type, bool auto_reset, uint32_t node_id,
 		     uint32_t *event_id, uint32_t *event_trigger_data,
-- 
2.7.4
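
A rough user-space sketch of the flow this patch expects (hypothetical; the
actual implementation is in the Thunk): the event page is allocated through
the GPUVM ioctls from patch 23, and the 64-bit buffer handle returned by
AMDKFD_IOC_ALLOC_MEMORY_OF_GPU is passed in event_page_offset on the first
AMDKFD_IOC_CREATE_EVENT call, which is what kfd_ioctl_create_event() above
decodes with GET_GPU_ID/GET_IDR_HANDLE.

    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <linux/kfd_ioctl.h>

    /* Hypothetical helper: create the first signal event on a dGPU.
     * event_page_handle is the handle of the user-mode event page BO.
     */
    static int create_first_signal_event(int kfd_fd, uint64_t event_page_handle,
                                         uint32_t *event_id, uint32_t *slot)
    {
            struct kfd_ioctl_create_event_args args = {0};
            int r;

            args.event_type = KFD_IOC_EVENT_SIGNAL;
            args.auto_reset = 1;
            /* Only needed once per process; later calls leave this at 0. */
            args.event_page_offset = event_page_handle;

            r = ioctl(kfd_fd, AMDKFD_IOC_CREATE_EVENT, &args);
            if (r)
                    return r;

            *event_id = args.event_id;
            *slot = args.event_slot_index;
            return 0;
    }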

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 71+ messages in thread

* [PATCH 25/25] drm/amdkfd: Add module option for testing large-BAR functionality
       [not found] ` <1517967174-21709-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (23 preceding siblings ...)
  2018-02-07  1:32   ` [PATCH 24/25] drm/amdkfd: Kmap event page for dGPUs Felix Kuehling
@ 2018-02-07  1:32   ` Felix Kuehling
  24 siblings, 0 replies; 71+ messages in thread
From: Felix Kuehling @ 2018-02-07  1:32 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling

Simulate a large-BAR system by exporting only CPU-visible VRAM. This
limits the amount of available VRAM to the size of the BAR, but
enables CPU access to VRAM.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 5 +++++
 drivers/gpu/drm/amd/amdkfd/kfd_crat.c    | 3 +++
 drivers/gpu/drm/amd/amdkfd/kfd_module.c  | 5 +++++
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h    | 6 ++++++
 4 files changed, 19 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index ec48010..7c79144 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1106,6 +1106,11 @@ bool kfd_dev_is_large_bar(struct kfd_dev *dev)
 {
 	struct kfd_local_mem_info mem_info;
 
+	if (debug_largebar) {
+		pr_debug("Simulate large-bar allocation on non large-bar machine\n");
+		return true;
+	}
+
 	if (dev->device_info->needs_iommu_device)
 		return false;
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
index 7493f47..3c6c4cdd 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
@@ -1117,6 +1117,9 @@ static int kfd_create_vcrat_image_gpu(void *pcrat_image,
 	sub_type_hdr = (typeof(sub_type_hdr))((char *)sub_type_hdr +
 			sub_type_hdr->length);
 
+	if (debug_largebar)
+		local_mem_info.local_mem_size_private = 0;
+
 	if (local_mem_info.local_mem_size_private == 0)
 		ret = kfd_fill_gpu_memory_affinity(&avail_size,
 				kdev, HSA_MEM_HEAP_TYPE_FB_PUBLIC,
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_module.c b/drivers/gpu/drm/amd/amdkfd/kfd_module.c
index 65574c6..b0acb06 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_module.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_module.c
@@ -71,6 +71,11 @@ module_param(send_sigterm, int, 0444);
 MODULE_PARM_DESC(send_sigterm,
 	"Send sigterm to HSA process on unhandled exception (0 = disable, 1 = enable)");
 
+int debug_largebar;
+module_param(debug_largebar, int, 0444);
+MODULE_PARM_DESC(debug_largebar,
+	"Debug large-bar flag used to simulate large-bar capability on non-large bar machine (0 = disable, 1 = enable)");
+
 int ignore_crat;
 module_param(ignore_crat, int, 0444);
 MODULE_PARM_DESC(ignore_crat,
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index afe381f..63e86926 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -105,6 +105,12 @@ extern int cwsr_enable;
 extern int send_sigterm;
 
 /*
+ * Debug module parameter used to simulate a large-BAR machine on machines
+ * that don't have a large BAR.
+ */
+extern int debug_largebar;
+
+/*
  * Ignore CRAT table during KFD initialization, can be used to work around
  * broken CRAT tables on some AMD systems
  */
-- 
2.7.4
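
When debug_largebar is set (or on a real large-BAR board),
kfd_dev_is_large_bar() returns true, so the host-visible VRAM allocation
that patch 23 otherwise rejects becomes available. A minimal sketch of such
an allocation, reusing the uapi flags from patch 23 (hypothetical wrapper,
not Thunk code):

    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <linux/kfd_ioctl.h>

    /* Host-visible (PUBLIC) VRAM is only allowed when the BAR covers all
     * of VRAM, or when the debug_largebar option simulates that condition.
     */
    static int alloc_host_visible_vram(int kfd_fd, uint32_t gpu_id,
                                       uint64_t va, uint64_t size,
                                       uint64_t *handle, uint64_t *mmap_off)
    {
            struct kfd_ioctl_alloc_memory_of_gpu_args args = {
                    .va_addr = va,
                    .size = size,
                    .gpu_id = gpu_id,
                    .flags = KFD_IOC_ALLOC_MEM_FLAGS_VRAM |
                             KFD_IOC_ALLOC_MEM_FLAGS_PUBLIC |
                             KFD_IOC_ALLOC_MEM_FLAGS_WRITABLE,
            };
            int r = ioctl(kfd_fd, AMDKFD_IOC_ALLOC_MEMORY_OF_GPU, &args);

            if (r)
                    return r;       /* rejected on small-BAR systems, see patch 23 */
            *handle = args.handle;
            *mmap_off = args.mmap_offset;   /* offset for a later CPU mapping of the BO */
            return 0;
    }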

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 71+ messages in thread

* Re: [PATCH 11/25] drm/amdkfd: Centralize IOMMUv2 code and make it conditional
       [not found]     ` <1517967174-21709-12-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
@ 2018-02-07 11:20       ` Christian König
       [not found]         ` <281bede7-0ae6-c7d1-3d3a-a3e0497244c1-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: Christian König @ 2018-02-07 11:20 UTC (permalink / raw)
  To: Felix Kuehling, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w

Am 07.02.2018 um 02:32 schrieb Felix Kuehling:
> dGPUs work without IOMMUv2. Make IOMMUv2 initialization dependent on
> ASIC information. Also allow building KFD without IOMMUv2 support.
> This is still useful for dGPUs and prepares for enabling KFD on
> architectures that don't support AMD IOMMUv2.
>
> v2:
> * Centralize IOMMUv2 code to avoid #ifdefs in too many places
>
> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
> ---
>   drivers/gpu/drm/amd/amdkfd/Kconfig        |   2 +-
>   drivers/gpu/drm/amd/amdkfd/Makefile       |   4 +
>   drivers/gpu/drm/amd/amdkfd/kfd_crat.c     |  14 +-
>   drivers/gpu/drm/amd/amdkfd/kfd_device.c   | 127 +++--------
>   drivers/gpu/drm/amd/amdkfd/kfd_events.c   |   3 +
>   drivers/gpu/drm/amd/amdkfd/kfd_iommu.c    | 356 ++++++++++++++++++++++++++++++
>   drivers/gpu/drm/amd/amdkfd/kfd_iommu.h    |  78 +++++++
>   drivers/gpu/drm/amd/amdkfd/kfd_priv.h     |  14 +-
>   drivers/gpu/drm/amd/amdkfd/kfd_process.c  | 138 +-----------
>   drivers/gpu/drm/amd/amdkfd/kfd_topology.c |  16 +-
>   drivers/gpu/drm/amd/amdkfd/kfd_topology.h |   6 +-
>   11 files changed, 493 insertions(+), 265 deletions(-)
>   create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_iommu.c
>   create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_iommu.h
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/Kconfig b/drivers/gpu/drm/amd/amdkfd/Kconfig
> index bc5a294..5bbeb95 100644
> --- a/drivers/gpu/drm/amd/amdkfd/Kconfig
> +++ b/drivers/gpu/drm/amd/amdkfd/Kconfig
> @@ -4,6 +4,6 @@
>   
>   config HSA_AMD
>   	tristate "HSA kernel driver for AMD GPU devices"
> -	depends on DRM_AMDGPU && AMD_IOMMU_V2 && X86_64
> +	depends on DRM_AMDGPU && X86_64

You still need a weak dependency on AMD_IOMMU_V2 here; in other words,
add "imply AMD_IOMMU_V2".

This prevents illegal combinations like linking amdkfd into the kernel 
while amd_iommu_v2 is a module.

But it should still allow completely disabling amd_iommu_v2 and compiling
amdkfd without support for it.

Christian.
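
A minimal sketch of the Kconfig entry with the weak dependency being
suggested here (assuming "imply" semantics as documented in
kconfig-language.txt; the help text is taken from the existing entry):

    config HSA_AMD
    	tristate "HSA kernel driver for AMD GPU devices"
    	depends on DRM_AMDGPU && X86_64
    	imply AMD_IOMMU_V2
    	help
    	  Enable this if you want to use HSA features on AMD GPU devices.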

>   	help
>   	  Enable this if you want to use HSA features on AMD GPU devices.
> diff --git a/drivers/gpu/drm/amd/amdkfd/Makefile b/drivers/gpu/drm/amd/amdkfd/Makefile
> index a317e76..0d02422 100644
> --- a/drivers/gpu/drm/amd/amdkfd/Makefile
> +++ b/drivers/gpu/drm/amd/amdkfd/Makefile
> @@ -37,6 +37,10 @@ amdkfd-y	:= kfd_module.o kfd_device.o kfd_chardev.o kfd_topology.o \
>   		kfd_interrupt.o kfd_events.o cik_event_interrupt.o \
>   		kfd_dbgdev.o kfd_dbgmgr.o kfd_crat.o
>   
> +ifneq ($(CONFIG_AMD_IOMMU_V2),)
> +amdkfd-y += kfd_iommu.o
> +endif
> +
>   amdkfd-$(CONFIG_DEBUG_FS) += kfd_debugfs.o
>   
>   obj-$(CONFIG_HSA_AMD)	+= amdkfd.o
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> index 2bc2816..7493f47 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> @@ -22,10 +22,10 @@
>   
>   #include <linux/pci.h>
>   #include <linux/acpi.h>
> -#include <linux/amd-iommu.h>
>   #include "kfd_crat.h"
>   #include "kfd_priv.h"
>   #include "kfd_topology.h"
> +#include "kfd_iommu.h"
>   
>   /* GPU Processor ID base for dGPUs for which VCRAT needs to be created.
>    * GPU processor ID are expressed with Bit[31]=1.
> @@ -1037,15 +1037,11 @@ static int kfd_create_vcrat_image_gpu(void *pcrat_image,
>   	struct crat_subtype_generic *sub_type_hdr;
>   	struct crat_subtype_computeunit *cu;
>   	struct kfd_cu_info cu_info;
> -	struct amd_iommu_device_info iommu_info;
>   	int avail_size = *size;
>   	uint32_t total_num_of_cu;
>   	int num_of_cache_entries = 0;
>   	int cache_mem_filled = 0;
>   	int ret = 0;
> -	const u32 required_iommu_flags = AMD_IOMMU_DEVICE_FLAG_ATS_SUP |
> -					 AMD_IOMMU_DEVICE_FLAG_PRI_SUP |
> -					 AMD_IOMMU_DEVICE_FLAG_PASID_SUP;
>   	struct kfd_local_mem_info local_mem_info;
>   
>   	if (!pcrat_image || avail_size < VCRAT_SIZE_FOR_GPU)
> @@ -1106,12 +1102,8 @@ static int kfd_create_vcrat_image_gpu(void *pcrat_image,
>   	/* Check if this node supports IOMMU. During parsing this flag will
>   	 * translate to HSA_CAP_ATS_PRESENT
>   	 */
> -	iommu_info.flags = 0;
> -	if (amd_iommu_device_info(kdev->pdev, &iommu_info) == 0) {
> -		if ((iommu_info.flags & required_iommu_flags) ==
> -				required_iommu_flags)
> -			cu->hsa_capability |= CRAT_CU_FLAGS_IOMMU_PRESENT;
> -	}
> +	if (!kfd_iommu_check_device(kdev))
> +		cu->hsa_capability |= CRAT_CU_FLAGS_IOMMU_PRESENT;
>   
>   	crat_table->length += sub_type_hdr->length;
>   	crat_table->total_entries++;
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> index 83d6f41..4ac2d61 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> @@ -20,7 +20,9 @@
>    * OTHER DEALINGS IN THE SOFTWARE.
>    */
>   
> +#if defined(CONFIG_AMD_IOMMU_V2_MODULE) || defined(CONFIG_AMD_IOMMU_V2)
>   #include <linux/amd-iommu.h>
> +#endif
>   #include <linux/bsearch.h>
>   #include <linux/pci.h>
>   #include <linux/slab.h>
> @@ -28,9 +30,11 @@
>   #include "kfd_device_queue_manager.h"
>   #include "kfd_pm4_headers_vi.h"
>   #include "cwsr_trap_handler_gfx8.asm"
> +#include "kfd_iommu.h"
>   
>   #define MQD_SIZE_ALIGNED 768
>   
> +#ifdef KFD_SUPPORT_IOMMU_V2
>   static const struct kfd_device_info kaveri_device_info = {
>   	.asic_family = CHIP_KAVERI,
>   	.max_pasid_bits = 16,
> @@ -41,6 +45,7 @@ static const struct kfd_device_info kaveri_device_info = {
>   	.num_of_watch_points = 4,
>   	.mqd_size_aligned = MQD_SIZE_ALIGNED,
>   	.supports_cwsr = false,
> +	.needs_iommu_device = true,
>   	.needs_pci_atomics = false,
>   };
>   
> @@ -54,8 +59,10 @@ static const struct kfd_device_info carrizo_device_info = {
>   	.num_of_watch_points = 4,
>   	.mqd_size_aligned = MQD_SIZE_ALIGNED,
>   	.supports_cwsr = true,
> +	.needs_iommu_device = true,
>   	.needs_pci_atomics = false,
>   };
> +#endif
>   
>   static const struct kfd_device_info hawaii_device_info = {
>   	.asic_family = CHIP_HAWAII,
> @@ -67,6 +74,7 @@ static const struct kfd_device_info hawaii_device_info = {
>   	.num_of_watch_points = 4,
>   	.mqd_size_aligned = MQD_SIZE_ALIGNED,
>   	.supports_cwsr = false,
> +	.needs_iommu_device = false,
>   	.needs_pci_atomics = false,
>   };
>   
> @@ -79,6 +87,7 @@ static const struct kfd_device_info tonga_device_info = {
>   	.num_of_watch_points = 4,
>   	.mqd_size_aligned = MQD_SIZE_ALIGNED,
>   	.supports_cwsr = false,
> +	.needs_iommu_device = false,
>   	.needs_pci_atomics = true,
>   };
>   
> @@ -91,6 +100,7 @@ static const struct kfd_device_info tonga_vf_device_info = {
>   	.num_of_watch_points = 4,
>   	.mqd_size_aligned = MQD_SIZE_ALIGNED,
>   	.supports_cwsr = false,
> +	.needs_iommu_device = false,
>   	.needs_pci_atomics = false,
>   };
>   
> @@ -103,6 +113,7 @@ static const struct kfd_device_info fiji_device_info = {
>   	.num_of_watch_points = 4,
>   	.mqd_size_aligned = MQD_SIZE_ALIGNED,
>   	.supports_cwsr = true,
> +	.needs_iommu_device = false,
>   	.needs_pci_atomics = true,
>   };
>   
> @@ -115,6 +126,7 @@ static const struct kfd_device_info fiji_vf_device_info = {
>   	.num_of_watch_points = 4,
>   	.mqd_size_aligned = MQD_SIZE_ALIGNED,
>   	.supports_cwsr = true,
> +	.needs_iommu_device = false,
>   	.needs_pci_atomics = false,
>   };
>   
> @@ -128,6 +140,7 @@ static const struct kfd_device_info polaris10_device_info = {
>   	.num_of_watch_points = 4,
>   	.mqd_size_aligned = MQD_SIZE_ALIGNED,
>   	.supports_cwsr = true,
> +	.needs_iommu_device = false,
>   	.needs_pci_atomics = true,
>   };
>   
> @@ -140,6 +153,7 @@ static const struct kfd_device_info polaris10_vf_device_info = {
>   	.num_of_watch_points = 4,
>   	.mqd_size_aligned = MQD_SIZE_ALIGNED,
>   	.supports_cwsr = true,
> +	.needs_iommu_device = false,
>   	.needs_pci_atomics = false,
>   };
>   
> @@ -152,6 +166,7 @@ static const struct kfd_device_info polaris11_device_info = {
>   	.num_of_watch_points = 4,
>   	.mqd_size_aligned = MQD_SIZE_ALIGNED,
>   	.supports_cwsr = true,
> +	.needs_iommu_device = false,
>   	.needs_pci_atomics = true,
>   };
>   
> @@ -162,6 +177,7 @@ struct kfd_deviceid {
>   };
>   
>   static const struct kfd_deviceid supported_devices[] = {
> +#ifdef KFD_SUPPORT_IOMMU_V2
>   	{ 0x1304, &kaveri_device_info },	/* Kaveri */
>   	{ 0x1305, &kaveri_device_info },	/* Kaveri */
>   	{ 0x1306, &kaveri_device_info },	/* Kaveri */
> @@ -189,6 +205,7 @@ static const struct kfd_deviceid supported_devices[] = {
>   	{ 0x9875, &carrizo_device_info },	/* Carrizo */
>   	{ 0x9876, &carrizo_device_info },	/* Carrizo */
>   	{ 0x9877, &carrizo_device_info },	/* Carrizo */
> +#endif
>   	{ 0x67A0, &hawaii_device_info },	/* Hawaii */
>   	{ 0x67A1, &hawaii_device_info },	/* Hawaii */
>   	{ 0x67A2, &hawaii_device_info },	/* Hawaii */
> @@ -302,77 +319,6 @@ struct kfd_dev *kgd2kfd_probe(struct kgd_dev *kgd,
>   	return kfd;
>   }
>   
> -static bool device_iommu_pasid_init(struct kfd_dev *kfd)
> -{
> -	const u32 required_iommu_flags = AMD_IOMMU_DEVICE_FLAG_ATS_SUP |
> -					AMD_IOMMU_DEVICE_FLAG_PRI_SUP |
> -					AMD_IOMMU_DEVICE_FLAG_PASID_SUP;
> -
> -	struct amd_iommu_device_info iommu_info;
> -	unsigned int pasid_limit;
> -	int err;
> -
> -	err = amd_iommu_device_info(kfd->pdev, &iommu_info);
> -	if (err < 0) {
> -		dev_err(kfd_device,
> -			"error getting iommu info. is the iommu enabled?\n");
> -		return false;
> -	}
> -
> -	if ((iommu_info.flags & required_iommu_flags) != required_iommu_flags) {
> -		dev_err(kfd_device, "error required iommu flags ats %i, pri %i, pasid %i\n",
> -		       (iommu_info.flags & AMD_IOMMU_DEVICE_FLAG_ATS_SUP) != 0,
> -		       (iommu_info.flags & AMD_IOMMU_DEVICE_FLAG_PRI_SUP) != 0,
> -		       (iommu_info.flags & AMD_IOMMU_DEVICE_FLAG_PASID_SUP)
> -									!= 0);
> -		return false;
> -	}
> -
> -	pasid_limit = min_t(unsigned int,
> -			(unsigned int)(1 << kfd->device_info->max_pasid_bits),
> -			iommu_info.max_pasids);
> -
> -	if (!kfd_set_pasid_limit(pasid_limit)) {
> -		dev_err(kfd_device, "error setting pasid limit\n");
> -		return false;
> -	}
> -
> -	return true;
> -}
> -
> -static void iommu_pasid_shutdown_callback(struct pci_dev *pdev, int pasid)
> -{
> -	struct kfd_dev *dev = kfd_device_by_pci_dev(pdev);
> -
> -	if (dev)
> -		kfd_process_iommu_unbind_callback(dev, pasid);
> -}
> -
> -/*
> - * This function called by IOMMU driver on PPR failure
> - */
> -static int iommu_invalid_ppr_cb(struct pci_dev *pdev, int pasid,
> -		unsigned long address, u16 flags)
> -{
> -	struct kfd_dev *dev;
> -
> -	dev_warn(kfd_device,
> -			"Invalid PPR device %x:%x.%x pasid %d address 0x%lX flags 0x%X",
> -			PCI_BUS_NUM(pdev->devfn),
> -			PCI_SLOT(pdev->devfn),
> -			PCI_FUNC(pdev->devfn),
> -			pasid,
> -			address,
> -			flags);
> -
> -	dev = kfd_device_by_pci_dev(pdev);
> -	if (!WARN_ON(!dev))
> -		kfd_signal_iommu_event(dev, pasid, address,
> -			flags & PPR_FAULT_WRITE, flags & PPR_FAULT_EXEC);
> -
> -	return AMD_IOMMU_INV_PRI_RSP_INVALID;
> -}
> -
>   static void kfd_cwsr_init(struct kfd_dev *kfd)
>   {
>   	if (cwsr_enable && kfd->device_info->supports_cwsr) {
> @@ -462,11 +408,9 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,
>   		goto device_queue_manager_error;
>   	}
>   
> -	if (!device_iommu_pasid_init(kfd)) {
> -		dev_err(kfd_device,
> -			"Error initializing iommuv2 for device %x:%x\n",
> -			kfd->pdev->vendor, kfd->pdev->device);
> -		goto device_iommu_pasid_error;
> +	if (kfd_iommu_device_init(kfd)) {
> +		dev_err(kfd_device, "Error initializing iommuv2\n");
> +		goto device_iommu_error;
>   	}
>   
>   	kfd_cwsr_init(kfd);
> @@ -486,7 +430,7 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,
>   	goto out;
>   
>   kfd_resume_error:
> -device_iommu_pasid_error:
> +device_iommu_error:
>   	device_queue_manager_uninit(kfd->dqm);
>   device_queue_manager_error:
>   	kfd_interrupt_exit(kfd);
> @@ -527,11 +471,7 @@ void kgd2kfd_suspend(struct kfd_dev *kfd)
>   
>   	kfd->dqm->ops.stop(kfd->dqm);
>   
> -	kfd_unbind_processes_from_device(kfd);
> -
> -	amd_iommu_set_invalidate_ctx_cb(kfd->pdev, NULL);
> -	amd_iommu_set_invalid_ppr_cb(kfd->pdev, NULL);
> -	amd_iommu_free_device(kfd->pdev);
> +	kfd_iommu_suspend(kfd);
>   }
>   
>   int kgd2kfd_resume(struct kfd_dev *kfd)
> @@ -546,19 +486,14 @@ int kgd2kfd_resume(struct kfd_dev *kfd)
>   static int kfd_resume(struct kfd_dev *kfd)
>   {
>   	int err = 0;
> -	unsigned int pasid_limit = kfd_get_pasid_limit();
> -
> -	err = amd_iommu_init_device(kfd->pdev, pasid_limit);
> -	if (err)
> -		return -ENXIO;
> -	amd_iommu_set_invalidate_ctx_cb(kfd->pdev,
> -					iommu_pasid_shutdown_callback);
> -	amd_iommu_set_invalid_ppr_cb(kfd->pdev,
> -				     iommu_invalid_ppr_cb);
>   
> -	err = kfd_bind_processes_to_device(kfd);
> -	if (err)
> -		goto processes_bind_error;
> +	err = kfd_iommu_resume(kfd);
> +	if (err) {
> +		dev_err(kfd_device,
> +			"Failed to resume IOMMU for device %x:%x\n",
> +			kfd->pdev->vendor, kfd->pdev->device);
> +		return err;
> +	}
>   
>   	err = kfd->dqm->ops.start(kfd->dqm);
>   	if (err) {
> @@ -571,9 +506,7 @@ static int kfd_resume(struct kfd_dev *kfd)
>   	return err;
>   
>   dqm_start_error:
> -processes_bind_error:
> -	amd_iommu_free_device(kfd->pdev);
> -
> +	kfd_iommu_suspend(kfd);
>   	return err;
>   }
>   
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
> index 93aae5c..6fb9c0d 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
> @@ -30,6 +30,7 @@
>   #include <linux/memory.h>
>   #include "kfd_priv.h"
>   #include "kfd_events.h"
> +#include "kfd_iommu.h"
>   #include <linux/device.h>
>   
>   /*
> @@ -837,6 +838,7 @@ static void lookup_events_by_type_and_signal(struct kfd_process *p,
>   	}
>   }
>   
> +#ifdef KFD_SUPPORT_IOMMU_V2
>   void kfd_signal_iommu_event(struct kfd_dev *dev, unsigned int pasid,
>   		unsigned long address, bool is_write_requested,
>   		bool is_execute_requested)
> @@ -905,6 +907,7 @@ void kfd_signal_iommu_event(struct kfd_dev *dev, unsigned int pasid,
>   	mutex_unlock(&p->event_mutex);
>   	kfd_unref_process(p);
>   }
> +#endif /* KFD_SUPPORT_IOMMU_V2 */
>   
>   void kfd_signal_hw_exception_event(unsigned int pasid)
>   {
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_iommu.c b/drivers/gpu/drm/amd/amdkfd/kfd_iommu.c
> new file mode 100644
> index 0000000..81dee34
> --- /dev/null
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_iommu.c
> @@ -0,0 +1,356 @@
> +/*
> + * Copyright 2018 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + */
> +
> +#include <linux/printk.h>
> +#include <linux/device.h>
> +#include <linux/slab.h>
> +#include <linux/pci.h>
> +#include <linux/amd-iommu.h>
> +#include "kfd_priv.h"
> +#include "kfd_dbgmgr.h"
> +#include "kfd_topology.h"
> +#include "kfd_iommu.h"
> +
> +static const u32 required_iommu_flags = AMD_IOMMU_DEVICE_FLAG_ATS_SUP |
> +					AMD_IOMMU_DEVICE_FLAG_PRI_SUP |
> +					AMD_IOMMU_DEVICE_FLAG_PASID_SUP;
> +
> +/** kfd_iommu_check_device - Check whether IOMMU is available for device
> + */
> +int kfd_iommu_check_device(struct kfd_dev *kfd)
> +{
> +	struct amd_iommu_device_info iommu_info;
> +	int err;
> +
> +	if (!kfd->device_info->needs_iommu_device)
> +		return -ENODEV;
> +
> +	iommu_info.flags = 0;
> +	err = amd_iommu_device_info(kfd->pdev, &iommu_info);
> +	if (err)
> +		return err;
> +
> +	if ((iommu_info.flags & required_iommu_flags) != required_iommu_flags)
> +		return -ENODEV;
> +
> +	return 0;
> +}
> +
> +/** kfd_iommu_device_init - Initialize IOMMU for device
> + */
> +int kfd_iommu_device_init(struct kfd_dev *kfd)
> +{
> +	struct amd_iommu_device_info iommu_info;
> +	unsigned int pasid_limit;
> +	int err;
> +
> +	if (!kfd->device_info->needs_iommu_device)
> +		return 0;
> +
> +	iommu_info.flags = 0;
> +	err = amd_iommu_device_info(kfd->pdev, &iommu_info);
> +	if (err < 0) {
> +		dev_err(kfd_device,
> +			"error getting iommu info. is the iommu enabled?\n");
> +		return -ENODEV;
> +	}
> +
> +	if ((iommu_info.flags & required_iommu_flags) != required_iommu_flags) {
> +		dev_err(kfd_device, "error required iommu flags ats %i, pri %i, pasid %i\n",
> +		       (iommu_info.flags & AMD_IOMMU_DEVICE_FLAG_ATS_SUP) != 0,
> +		       (iommu_info.flags & AMD_IOMMU_DEVICE_FLAG_PRI_SUP) != 0,
> +		       (iommu_info.flags & AMD_IOMMU_DEVICE_FLAG_PASID_SUP)
> +									!= 0);
> +		return -ENODEV;
> +	}
> +
> +	pasid_limit = min_t(unsigned int,
> +			(unsigned int)(1 << kfd->device_info->max_pasid_bits),
> +			iommu_info.max_pasids);
> +
> +	if (!kfd_set_pasid_limit(pasid_limit)) {
> +		dev_err(kfd_device, "error setting pasid limit\n");
> +		return -EBUSY;
> +	}
> +
> +	return 0;
> +}
> +
> +/** kfd_iommu_bind_process_to_device - Have the IOMMU bind a process
> + *
> + * Binds the given process to the given device using its PASID. This
> + * enables IOMMUv2 address translation for the process on the device.
> + *
> + * This function assumes that the process mutex is held.
> + */
> +int kfd_iommu_bind_process_to_device(struct kfd_process_device *pdd)
> +{
> +	struct kfd_dev *dev = pdd->dev;
> +	struct kfd_process *p = pdd->process;
> +	int err;
> +
> +	if (!dev->device_info->needs_iommu_device || pdd->bound == PDD_BOUND)
> +		return 0;
> +
> +	if (unlikely(pdd->bound == PDD_BOUND_SUSPENDED)) {
> +		pr_err("Binding PDD_BOUND_SUSPENDED pdd is unexpected!\n");
> +		return -EINVAL;
> +	}
> +
> +	err = amd_iommu_bind_pasid(dev->pdev, p->pasid, p->lead_thread);
> +	if (!err)
> +		pdd->bound = PDD_BOUND;
> +
> +	return err;
> +}
> +
> +/** kfd_iommu_unbind_process - Unbind process from all devices
> + *
> + * This removes all IOMMU device bindings of the process. To be used
> + * before process termination.
> + */
> +void kfd_iommu_unbind_process(struct kfd_process *p)
> +{
> +	struct kfd_process_device *pdd;
> +
> +	list_for_each_entry(pdd, &p->per_device_data, per_device_list)
> +		if (pdd->bound == PDD_BOUND)
> +			amd_iommu_unbind_pasid(pdd->dev->pdev, p->pasid);
> +}
> +
> +/* Callback for process shutdown invoked by the IOMMU driver */
> +static void iommu_pasid_shutdown_callback(struct pci_dev *pdev, int pasid)
> +{
> +	struct kfd_dev *dev = kfd_device_by_pci_dev(pdev);
> +	struct kfd_process *p;
> +	struct kfd_process_device *pdd;
> +
> +	if (!dev)
> +		return;
> +
> +	/*
> +	 * Look for the process that matches the pasid. If there is no such
> +	 * process, we either released it in amdkfd's own notifier, or there
> +	 * is a bug. Unfortunately, there is no way to tell...
> +	 */
> +	p = kfd_lookup_process_by_pasid(pasid);
> +	if (!p)
> +		return;
> +
> +	pr_debug("Unbinding process %d from IOMMU\n", pasid);
> +
> +	mutex_lock(kfd_get_dbgmgr_mutex());
> +
> +	if (dev->dbgmgr && dev->dbgmgr->pasid == p->pasid) {
> +		if (!kfd_dbgmgr_unregister(dev->dbgmgr, p)) {
> +			kfd_dbgmgr_destroy(dev->dbgmgr);
> +			dev->dbgmgr = NULL;
> +		}
> +	}
> +
> +	mutex_unlock(kfd_get_dbgmgr_mutex());
> +
> +	mutex_lock(&p->mutex);
> +
> +	pdd = kfd_get_process_device_data(dev, p);
> +	if (pdd)
> +		/* For GPU relying on IOMMU, we need to dequeue here
> +		 * when PASID is still bound.
> +		 */
> +		kfd_process_dequeue_from_device(pdd);
> +
> +	mutex_unlock(&p->mutex);
> +
> +	kfd_unref_process(p);
> +}
> +
> +/* This function called by IOMMU driver on PPR failure */
> +static int iommu_invalid_ppr_cb(struct pci_dev *pdev, int pasid,
> +		unsigned long address, u16 flags)
> +{
> +	struct kfd_dev *dev;
> +
> +	dev_warn(kfd_device,
> +			"Invalid PPR device %x:%x.%x pasid %d address 0x%lX flags 0x%X",
> +			PCI_BUS_NUM(pdev->devfn),
> +			PCI_SLOT(pdev->devfn),
> +			PCI_FUNC(pdev->devfn),
> +			pasid,
> +			address,
> +			flags);
> +
> +	dev = kfd_device_by_pci_dev(pdev);
> +	if (!WARN_ON(!dev))
> +		kfd_signal_iommu_event(dev, pasid, address,
> +			flags & PPR_FAULT_WRITE, flags & PPR_FAULT_EXEC);
> +
> +	return AMD_IOMMU_INV_PRI_RSP_INVALID;
> +}
> +
> +/*
> + * Bind processes to the device that have been temporarily unbound
> + * (PDD_BOUND_SUSPENDED) in kfd_unbind_processes_from_device.
> + */
> +static int kfd_bind_processes_to_device(struct kfd_dev *kfd)
> +{
> +	struct kfd_process_device *pdd;
> +	struct kfd_process *p;
> +	unsigned int temp;
> +	int err = 0;
> +
> +	int idx = srcu_read_lock(&kfd_processes_srcu);
> +
> +	hash_for_each_rcu(kfd_processes_table, temp, p, kfd_processes) {
> +		mutex_lock(&p->mutex);
> +		pdd = kfd_get_process_device_data(kfd, p);
> +
> +		if (WARN_ON(!pdd) || pdd->bound != PDD_BOUND_SUSPENDED) {
> +			mutex_unlock(&p->mutex);
> +			continue;
> +		}
> +
> +		err = amd_iommu_bind_pasid(kfd->pdev, p->pasid,
> +				p->lead_thread);
> +		if (err < 0) {
> +			pr_err("Unexpected pasid %d binding failure\n",
> +					p->pasid);
> +			mutex_unlock(&p->mutex);
> +			break;
> +		}
> +
> +		pdd->bound = PDD_BOUND;
> +		mutex_unlock(&p->mutex);
> +	}
> +
> +	srcu_read_unlock(&kfd_processes_srcu, idx);
> +
> +	return err;
> +}
> +
> +/*
> + * Mark currently bound processes as PDD_BOUND_SUSPENDED. These
> + * processes will be restored to PDD_BOUND state in
> + * kfd_bind_processes_to_device.
> + */
> +static void kfd_unbind_processes_from_device(struct kfd_dev *kfd)
> +{
> +	struct kfd_process_device *pdd;
> +	struct kfd_process *p;
> +	unsigned int temp;
> +
> +	int idx = srcu_read_lock(&kfd_processes_srcu);
> +
> +	hash_for_each_rcu(kfd_processes_table, temp, p, kfd_processes) {
> +		mutex_lock(&p->mutex);
> +		pdd = kfd_get_process_device_data(kfd, p);
> +
> +		if (WARN_ON(!pdd)) {
> +			mutex_unlock(&p->mutex);
> +			continue;
> +		}
> +
> +		if (pdd->bound == PDD_BOUND)
> +			pdd->bound = PDD_BOUND_SUSPENDED;
> +		mutex_unlock(&p->mutex);
> +	}
> +
> +	srcu_read_unlock(&kfd_processes_srcu, idx);
> +}
> +
> +/** kfd_iommu_suspend - Prepare IOMMU for suspend
> + *
> + * This unbinds processes from the device and disables the IOMMU for
> + * the device.
> + */
> +void kfd_iommu_suspend(struct kfd_dev *kfd)
> +{
> +	if (!kfd->device_info->needs_iommu_device)
> +		return;
> +
> +	kfd_unbind_processes_from_device(kfd);
> +
> +	amd_iommu_set_invalidate_ctx_cb(kfd->pdev, NULL);
> +	amd_iommu_set_invalid_ppr_cb(kfd->pdev, NULL);
> +	amd_iommu_free_device(kfd->pdev);
> +}
> +
> +/** kfd_iommu_resume - Restore IOMMU after resume
> + *
> + * This reinitializes the IOMMU for the device and re-binds previously
> + * suspended processes to the device.
> + */
> +int kfd_iommu_resume(struct kfd_dev *kfd)
> +{
> +	unsigned int pasid_limit;
> +	int err;
> +
> +	if (!kfd->device_info->needs_iommu_device)
> +		return 0;
> +
> +	pasid_limit = kfd_get_pasid_limit();
> +
> +	err = amd_iommu_init_device(kfd->pdev, pasid_limit);
> +	if (err)
> +		return -ENXIO;
> +
> +	amd_iommu_set_invalidate_ctx_cb(kfd->pdev,
> +					iommu_pasid_shutdown_callback);
> +	amd_iommu_set_invalid_ppr_cb(kfd->pdev,
> +				     iommu_invalid_ppr_cb);
> +
> +	err = kfd_bind_processes_to_device(kfd);
> +	if (err) {
> +		amd_iommu_set_invalidate_ctx_cb(kfd->pdev, NULL);
> +		amd_iommu_set_invalid_ppr_cb(kfd->pdev, NULL);
> +		amd_iommu_free_device(kfd->pdev);
> +		return err;
> +	}
> +
> +	return 0;
> +}
> +
> +extern bool amd_iommu_pc_supported(void);
> +extern u8 amd_iommu_pc_get_max_banks(u16 devid);
> +extern u8 amd_iommu_pc_get_max_counters(u16 devid);
> +
> +/** kfd_iommu_add_perf_counters - Add IOMMU performance counters to topology
> + */
> +int kfd_iommu_add_perf_counters(struct kfd_topology_device *kdev)
> +{
> +	struct kfd_perf_properties *props;
> +
> +	if (!(kdev->node_props.capability & HSA_CAP_ATS_PRESENT))
> +		return 0;
> +
> +	if (!amd_iommu_pc_supported())
> +		return 0;
> +
> +	props = kfd_alloc_struct(props);
> +	if (!props)
> +		return -ENOMEM;
> +	strcpy(props->block_name, "iommu");
> +	props->max_concurrent = amd_iommu_pc_get_max_banks(0) *
> +		amd_iommu_pc_get_max_counters(0); /* assume one iommu */
> +	list_add_tail(&props->list, &kdev->perf_props);
> +
> +	return 0;
> +}
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_iommu.h b/drivers/gpu/drm/amd/amdkfd/kfd_iommu.h
> new file mode 100644
> index 0000000..dd23d9f
> --- /dev/null
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_iommu.h
> @@ -0,0 +1,78 @@
> +/*
> + * Copyright 2018 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + */
> +
> +#ifndef __KFD_IOMMU_H__
> +#define __KFD_IOMMU_H__
> +
> +#if defined(CONFIG_AMD_IOMMU_V2_MODULE) || defined(CONFIG_AMD_IOMMU_V2)
> +
> +#define KFD_SUPPORT_IOMMU_V2
> +
> +int kfd_iommu_check_device(struct kfd_dev *kfd);
> +int kfd_iommu_device_init(struct kfd_dev *kfd);
> +
> +int kfd_iommu_bind_process_to_device(struct kfd_process_device *pdd);
> +void kfd_iommu_unbind_process(struct kfd_process *p);
> +
> +void kfd_iommu_suspend(struct kfd_dev *kfd);
> +int kfd_iommu_resume(struct kfd_dev *kfd);
> +
> +int kfd_iommu_add_perf_counters(struct kfd_topology_device *kdev);
> +
> +#else
> +
> +static inline int kfd_iommu_check_device(struct kfd_dev *kfd)
> +{
> +	return -ENODEV;
> +}
> +static inline int kfd_iommu_device_init(struct kfd_dev *kfd)
> +{
> +	return 0;
> +}
> +
> +static inline int kfd_iommu_bind_process_to_device(
> +	struct kfd_process_device *pdd)
> +{
> +	return 0;
> +}
> +static inline void kfd_iommu_unbind_process(struct kfd_process *p)
> +{
> +	/* empty */
> +}
> +
> +static inline void kfd_iommu_suspend(struct kfd_dev *kfd)
> +{
> +	/* empty */
> +}
> +static inline int kfd_iommu_resume(struct kfd_dev *kfd)
> +{
> +	return 0;
> +}
> +
> +static inline int kfd_iommu_add_perf_counters(struct kfd_topology_device *kdev)
> +{
> +	return 0;
> +}
> +
> +#endif /* defined(CONFIG_AMD_IOMMU_V2) */
> +
> +#endif /* __KFD_IOMMU_H__ */
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> index 594f853..f12eb5d 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> @@ -158,6 +158,7 @@ struct kfd_device_info {
>   	uint8_t num_of_watch_points;
>   	uint16_t mqd_size_aligned;
>   	bool supports_cwsr;
> +	bool needs_iommu_device;
>   	bool needs_pci_atomics;
>   };
>   
> @@ -517,15 +518,15 @@ struct kfd_process_device {
>   	uint64_t scratch_base;
>   	uint64_t scratch_limit;
>   
> -	/* Is this process/pasid bound to this device? (amd_iommu_bind_pasid) */
> -	enum kfd_pdd_bound bound;
> -
>   	/* Flag used to tell the pdd has dequeued from the dqm.
>   	 * This is used to prevent dev->dqm->ops.process_termination() from
>   	 * being called twice when it is already called in IOMMU callback
>   	 * function.
>   	 */
>   	bool already_dequeued;
> +
> +	/* Is this process/pasid bound to this device? (amd_iommu_bind_pasid) */
> +	enum kfd_pdd_bound bound;
>   };
>   
>   #define qpd_to_pdd(x) container_of(x, struct kfd_process_device, qpd)
> @@ -590,6 +591,10 @@ struct kfd_process {
>   	bool signal_event_limit_reached;
>   };
>   
> +#define KFD_PROCESS_TABLE_SIZE 5 /* bits: 32 entries */
> +extern DECLARE_HASHTABLE(kfd_processes_table, KFD_PROCESS_TABLE_SIZE);
> +extern struct srcu_struct kfd_processes_srcu;
> +
>   /**
>    * Ioctl function type.
>    *
> @@ -617,9 +622,6 @@ void kfd_unref_process(struct kfd_process *p);
>   
>   struct kfd_process_device *kfd_bind_process_to_device(struct kfd_dev *dev,
>   						struct kfd_process *p);
> -int kfd_bind_processes_to_device(struct kfd_dev *dev);
> -void kfd_unbind_processes_from_device(struct kfd_dev *dev);
> -void kfd_process_iommu_unbind_callback(struct kfd_dev *dev, unsigned int pasid);
>   struct kfd_process_device *kfd_get_process_device_data(struct kfd_dev *dev,
>   							struct kfd_process *p);
>   struct kfd_process_device *kfd_create_process_device_data(struct kfd_dev *dev,
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> index 4ff5f0f..e9aee76 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> @@ -35,16 +35,16 @@ struct mm_struct;
>   
>   #include "kfd_priv.h"
>   #include "kfd_dbgmgr.h"
> +#include "kfd_iommu.h"
>   
>   /*
>    * List of struct kfd_process (field kfd_process).
>    * Unique/indexed by mm_struct*
>    */
> -#define KFD_PROCESS_TABLE_SIZE 5 /* bits: 32 entries */
> -static DEFINE_HASHTABLE(kfd_processes_table, KFD_PROCESS_TABLE_SIZE);
> +DEFINE_HASHTABLE(kfd_processes_table, KFD_PROCESS_TABLE_SIZE);
>   static DEFINE_MUTEX(kfd_processes_mutex);
>   
> -DEFINE_STATIC_SRCU(kfd_processes_srcu);
> +DEFINE_SRCU(kfd_processes_srcu);
>   
>   static struct workqueue_struct *kfd_process_wq;
>   
> @@ -173,14 +173,8 @@ static void kfd_process_wq_release(struct work_struct *work)
>   {
>   	struct kfd_process *p = container_of(work, struct kfd_process,
>   					     release_work);
> -	struct kfd_process_device *pdd;
>   
> -	pr_debug("Releasing process (pasid %d) in workqueue\n", p->pasid);
> -
> -	list_for_each_entry(pdd, &p->per_device_data, per_device_list) {
> -		if (pdd->bound == PDD_BOUND)
> -			amd_iommu_unbind_pasid(pdd->dev->pdev, p->pasid);
> -	}
> +	kfd_iommu_unbind_process(p);
>   
>   	kfd_process_destroy_pdds(p);
>   
> @@ -429,133 +423,13 @@ struct kfd_process_device *kfd_bind_process_to_device(struct kfd_dev *dev,
>   		return ERR_PTR(-ENOMEM);
>   	}
>   
> -	if (pdd->bound == PDD_BOUND) {
> -		return pdd;
> -	} else if (unlikely(pdd->bound == PDD_BOUND_SUSPENDED)) {
> -		pr_err("Binding PDD_BOUND_SUSPENDED pdd is unexpected!\n");
> -		return ERR_PTR(-EINVAL);
> -	}
> -
> -	err = amd_iommu_bind_pasid(dev->pdev, p->pasid, p->lead_thread);
> -	if (err < 0)
> +	err = kfd_iommu_bind_process_to_device(pdd);
> +	if (err)
>   		return ERR_PTR(err);
>   
> -	pdd->bound = PDD_BOUND;
> -
>   	return pdd;
>   }
>   
> -/*
> - * Bind processes do the device that have been temporarily unbound
> - * (PDD_BOUND_SUSPENDED) in kfd_unbind_processes_from_device.
> - */
> -int kfd_bind_processes_to_device(struct kfd_dev *dev)
> -{
> -	struct kfd_process_device *pdd;
> -	struct kfd_process *p;
> -	unsigned int temp;
> -	int err = 0;
> -
> -	int idx = srcu_read_lock(&kfd_processes_srcu);
> -
> -	hash_for_each_rcu(kfd_processes_table, temp, p, kfd_processes) {
> -		mutex_lock(&p->mutex);
> -		pdd = kfd_get_process_device_data(dev, p);
> -
> -		if (WARN_ON(!pdd) || pdd->bound != PDD_BOUND_SUSPENDED) {
> -			mutex_unlock(&p->mutex);
> -			continue;
> -		}
> -
> -		err = amd_iommu_bind_pasid(dev->pdev, p->pasid,
> -				p->lead_thread);
> -		if (err < 0) {
> -			pr_err("Unexpected pasid %d binding failure\n",
> -					p->pasid);
> -			mutex_unlock(&p->mutex);
> -			break;
> -		}
> -
> -		pdd->bound = PDD_BOUND;
> -		mutex_unlock(&p->mutex);
> -	}
> -
> -	srcu_read_unlock(&kfd_processes_srcu, idx);
> -
> -	return err;
> -}
> -
> -/*
> - * Mark currently bound processes as PDD_BOUND_SUSPENDED. These
> - * processes will be restored to PDD_BOUND state in
> - * kfd_bind_processes_to_device.
> - */
> -void kfd_unbind_processes_from_device(struct kfd_dev *dev)
> -{
> -	struct kfd_process_device *pdd;
> -	struct kfd_process *p;
> -	unsigned int temp;
> -
> -	int idx = srcu_read_lock(&kfd_processes_srcu);
> -
> -	hash_for_each_rcu(kfd_processes_table, temp, p, kfd_processes) {
> -		mutex_lock(&p->mutex);
> -		pdd = kfd_get_process_device_data(dev, p);
> -
> -		if (WARN_ON(!pdd)) {
> -			mutex_unlock(&p->mutex);
> -			continue;
> -		}
> -
> -		if (pdd->bound == PDD_BOUND)
> -			pdd->bound = PDD_BOUND_SUSPENDED;
> -		mutex_unlock(&p->mutex);
> -	}
> -
> -	srcu_read_unlock(&kfd_processes_srcu, idx);
> -}
> -
> -void kfd_process_iommu_unbind_callback(struct kfd_dev *dev, unsigned int pasid)
> -{
> -	struct kfd_process *p;
> -	struct kfd_process_device *pdd;
> -
> -	/*
> -	 * Look for the process that matches the pasid. If there is no such
> -	 * process, we either released it in amdkfd's own notifier, or there
> -	 * is a bug. Unfortunately, there is no way to tell...
> -	 */
> -	p = kfd_lookup_process_by_pasid(pasid);
> -	if (!p)
> -		return;
> -
> -	pr_debug("Unbinding process %d from IOMMU\n", pasid);
> -
> -	mutex_lock(kfd_get_dbgmgr_mutex());
> -
> -	if (dev->dbgmgr && dev->dbgmgr->pasid == p->pasid) {
> -		if (!kfd_dbgmgr_unregister(dev->dbgmgr, p)) {
> -			kfd_dbgmgr_destroy(dev->dbgmgr);
> -			dev->dbgmgr = NULL;
> -		}
> -	}
> -
> -	mutex_unlock(kfd_get_dbgmgr_mutex());
> -
> -	mutex_lock(&p->mutex);
> -
> -	pdd = kfd_get_process_device_data(dev, p);
> -	if (pdd)
> -		/* For GPU relying on IOMMU, we need to dequeue here
> -		 * when PASID is still bound.
> -		 */
> -		kfd_process_dequeue_from_device(pdd);
> -
> -	mutex_unlock(&p->mutex);
> -
> -	kfd_unref_process(p);
> -}
> -
>   struct kfd_process_device *kfd_get_first_process_device_data(
>   						struct kfd_process *p)
>   {
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> index 7783250..2506155 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> @@ -35,6 +35,7 @@
>   #include "kfd_crat.h"
>   #include "kfd_topology.h"
>   #include "kfd_device_queue_manager.h"
> +#include "kfd_iommu.h"
>   
>   /* topology_device_list - Master list of all topology devices */
>   static struct list_head topology_device_list;
> @@ -875,19 +876,8 @@ static void find_system_memory(const struct dmi_header *dm,
>    */
>   static int kfd_add_perf_to_topology(struct kfd_topology_device *kdev)
>   {
> -	struct kfd_perf_properties *props;
> -
> -	if (amd_iommu_pc_supported()) {
> -		props = kfd_alloc_struct(props);
> -		if (!props)
> -			return -ENOMEM;
> -		strcpy(props->block_name, "iommu");
> -		props->max_concurrent = amd_iommu_pc_get_max_banks(0) *
> -			amd_iommu_pc_get_max_counters(0); /* assume one iommu */
> -		list_add_tail(&props->list, &kdev->perf_props);
> -	}
> -
> -	return 0;
> +	/* These are the only counters supported so far */
> +	return kfd_iommu_add_perf_counters(kdev);
>   }
>   
>   /* kfd_add_non_crat_information - Add information that is not currently
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
> index 53fca1f..c0be2be 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
> @@ -25,7 +25,7 @@
>   
>   #include <linux/types.h>
>   #include <linux/list.h>
> -#include "kfd_priv.h"
> +#include "kfd_crat.h"
>   
>   #define KFD_TOPOLOGY_PUBLIC_NAME_SIZE 128
>   
> @@ -183,8 +183,4 @@ struct kfd_topology_device *kfd_create_topology_device(
>   		struct list_head *device_list);
>   void kfd_release_topology_device_list(struct list_head *device_list);
>   
> -extern bool amd_iommu_pc_supported(void);
> -extern u8 amd_iommu_pc_get_max_banks(u16 devid);
> -extern u8 amd_iommu_pc_get_max_counters(u16 devid);
> -
>   #endif /* __KFD_TOPOLOGY_H__ */


* Re: [PATCH 11/25] drm/amdkfd: Centralize IOMMUv2 code and make it conditional
       [not found]         ` <281bede7-0ae6-c7d1-3d3a-a3e0497244c1-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2018-02-07 20:51           ` Felix Kuehling
       [not found]             ` <bfe03de8-63fb-efb4-94e5-5eaf4628bfc1-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: Felix Kuehling @ 2018-02-07 20:51 UTC (permalink / raw)
  To: christian.koenig-5C7GfCeVMHo,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w

[-- Attachment #1: Type: text/plain, Size: 44069 bytes --]

On 2018-02-07 06:20 AM, Christian König wrote:
> Am 07.02.2018 um 02:32 schrieb Felix Kuehling:
>> dGPUs work without IOMMUv2. Make IOMMUv2 initialization dependent on
>> ASIC information. Also allow building KFD without IOMMUv2 support.
>> This is still useful for dGPUs and prepares for enabling KFD on
>> architectures that don't support AMD IOMMUv2.
>>
>> v2:
>> * Centralize IOMMUv2 code to avoid #ifdefs in too many places
>>
>> Signed-off-by: Felix Kuehling <Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
>> ---
>>   drivers/gpu/drm/amd/amdkfd/Kconfig        |   2 +-
>>   drivers/gpu/drm/amd/amdkfd/Makefile       |   4 +
>>   drivers/gpu/drm/amd/amdkfd/kfd_crat.c     |  14 +-
>>   drivers/gpu/drm/amd/amdkfd/kfd_device.c   | 127 +++--------
>>   drivers/gpu/drm/amd/amdkfd/kfd_events.c   |   3 +
>>   drivers/gpu/drm/amd/amdkfd/kfd_iommu.c    | 356
>> ++++++++++++++++++++++++++++++
>>   drivers/gpu/drm/amd/amdkfd/kfd_iommu.h    |  78 +++++++
>>   drivers/gpu/drm/amd/amdkfd/kfd_priv.h     |  14 +-
>>   drivers/gpu/drm/amd/amdkfd/kfd_process.c  | 138 +-----------
>>   drivers/gpu/drm/amd/amdkfd/kfd_topology.c |  16 +-
>>   drivers/gpu/drm/amd/amdkfd/kfd_topology.h |   6 +-
>>   11 files changed, 493 insertions(+), 265 deletions(-)
>>   create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_iommu.c
>>   create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_iommu.h
>>
>> diff --git a/drivers/gpu/drm/amd/amdkfd/Kconfig
>> b/drivers/gpu/drm/amd/amdkfd/Kconfig
>> index bc5a294..5bbeb95 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/Kconfig
>> +++ b/drivers/gpu/drm/amd/amdkfd/Kconfig
>> @@ -4,6 +4,6 @@
>>     config HSA_AMD
>>       tristate "HSA kernel driver for AMD GPU devices"
>> -    depends on DRM_AMDGPU && AMD_IOMMU_V2 && X86_64
>> +    depends on DRM_AMDGPU && X86_64
>
> You still need a weak dependency on AMD_IOMMU_V2 here; in other words,
> add "imply AMD_IOMMU_V2".
>
> This prevents illegal combinations like linking amdkfd into the kernel
> while amd_iommu_v2 is a module.
>
> But it should still allow completely disabling amd_iommu_v2 and
> compiling amdkfd without support for it.

Thanks, that's good to know. An updated patch is attached (to avoid
resending the whole series).
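
For reference, here is a minimal sketch of the Kconfig stanza with the
weak dependency you describe. It is only an illustration of the intent;
the attached patch is the authoritative change, and the comments below
restate your description rather than the full Kconfig semantics of
"imply":

# Sketch only - the attached v3 patch is the real change
config HSA_AMD
	tristate "HSA kernel driver for AMD GPU devices"
	depends on DRM_AMDGPU && X86_64
	imply AMD_IOMMU_V2
	help
	  Enable this if you want to use HSA features on AMD GPU devices.

# With "imply" instead of a hard dependency (per the description above):
#   AMD_IOMMU_V2=n  -> amdkfd still builds, without IOMMUv2 support
#   AMD_IOMMU_V2=m  -> HSA_AMD must not be built into the kernel (=y)
#   AMD_IOMMU_V2=y  -> HSA_AMD can be built-in or a module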

Regards,
  Felix

>
> Christian.
>
>>       help
>>         Enable this if you want to use HSA features on AMD GPU devices.
>> diff --git a/drivers/gpu/drm/amd/amdkfd/Makefile
>> b/drivers/gpu/drm/amd/amdkfd/Makefile
>> index a317e76..0d02422 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/Makefile
>> +++ b/drivers/gpu/drm/amd/amdkfd/Makefile
>> @@ -37,6 +37,10 @@ amdkfd-y    := kfd_module.o kfd_device.o
>> kfd_chardev.o kfd_topology.o \
>>           kfd_interrupt.o kfd_events.o cik_event_interrupt.o \
>>           kfd_dbgdev.o kfd_dbgmgr.o kfd_crat.o
>>   +ifneq ($(CONFIG_AMD_IOMMU_V2),)
>> +amdkfd-y += kfd_iommu.o
>> +endif
>> +
>>   amdkfd-$(CONFIG_DEBUG_FS) += kfd_debugfs.o
>>     obj-$(CONFIG_HSA_AMD)    += amdkfd.o
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
>> b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
>> index 2bc2816..7493f47 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
>> @@ -22,10 +22,10 @@
>>     #include <linux/pci.h>
>>   #include <linux/acpi.h>
>> -#include <linux/amd-iommu.h>
>>   #include "kfd_crat.h"
>>   #include "kfd_priv.h"
>>   #include "kfd_topology.h"
>> +#include "kfd_iommu.h"
>>     /* GPU Processor ID base for dGPUs for which VCRAT needs to be
>> created.
>>    * GPU processor ID are expressed with Bit[31]=1.
>> @@ -1037,15 +1037,11 @@ static int kfd_create_vcrat_image_gpu(void
>> *pcrat_image,
>>       struct crat_subtype_generic *sub_type_hdr;
>>       struct crat_subtype_computeunit *cu;
>>       struct kfd_cu_info cu_info;
>> -    struct amd_iommu_device_info iommu_info;
>>       int avail_size = *size;
>>       uint32_t total_num_of_cu;
>>       int num_of_cache_entries = 0;
>>       int cache_mem_filled = 0;
>>       int ret = 0;
>> -    const u32 required_iommu_flags = AMD_IOMMU_DEVICE_FLAG_ATS_SUP |
>> -                     AMD_IOMMU_DEVICE_FLAG_PRI_SUP |
>> -                     AMD_IOMMU_DEVICE_FLAG_PASID_SUP;
>>       struct kfd_local_mem_info local_mem_info;
>>         if (!pcrat_image || avail_size < VCRAT_SIZE_FOR_GPU)
>> @@ -1106,12 +1102,8 @@ static int kfd_create_vcrat_image_gpu(void
>> *pcrat_image,
>>       /* Check if this node supports IOMMU. During parsing this flag
>> will
>>        * translate to HSA_CAP_ATS_PRESENT
>>        */
>> -    iommu_info.flags = 0;
>> -    if (amd_iommu_device_info(kdev->pdev, &iommu_info) == 0) {
>> -        if ((iommu_info.flags & required_iommu_flags) ==
>> -                required_iommu_flags)
>> -            cu->hsa_capability |= CRAT_CU_FLAGS_IOMMU_PRESENT;
>> -    }
>> +    if (!kfd_iommu_check_device(kdev))
>> +        cu->hsa_capability |= CRAT_CU_FLAGS_IOMMU_PRESENT;
>>         crat_table->length += sub_type_hdr->length;
>>       crat_table->total_entries++;
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
>> b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
>> index 83d6f41..4ac2d61 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
>> @@ -20,7 +20,9 @@
>>    * OTHER DEALINGS IN THE SOFTWARE.
>>    */
>>   +#if defined(CONFIG_AMD_IOMMU_V2_MODULE) ||
>> defined(CONFIG_AMD_IOMMU_V2)
>>   #include <linux/amd-iommu.h>
>> +#endif
>>   #include <linux/bsearch.h>
>>   #include <linux/pci.h>
>>   #include <linux/slab.h>
>> @@ -28,9 +30,11 @@
>>   #include "kfd_device_queue_manager.h"
>>   #include "kfd_pm4_headers_vi.h"
>>   #include "cwsr_trap_handler_gfx8.asm"
>> +#include "kfd_iommu.h"
>>     #define MQD_SIZE_ALIGNED 768
>>   +#ifdef KFD_SUPPORT_IOMMU_V2
>>   static const struct kfd_device_info kaveri_device_info = {
>>       .asic_family = CHIP_KAVERI,
>>       .max_pasid_bits = 16,
>> @@ -41,6 +45,7 @@ static const struct kfd_device_info
>> kaveri_device_info = {
>>       .num_of_watch_points = 4,
>>       .mqd_size_aligned = MQD_SIZE_ALIGNED,
>>       .supports_cwsr = false,
>> +    .needs_iommu_device = true,
>>       .needs_pci_atomics = false,
>>   };
>>   @@ -54,8 +59,10 @@ static const struct kfd_device_info
>> carrizo_device_info = {
>>       .num_of_watch_points = 4,
>>       .mqd_size_aligned = MQD_SIZE_ALIGNED,
>>       .supports_cwsr = true,
>> +    .needs_iommu_device = true,
>>       .needs_pci_atomics = false,
>>   };
>> +#endif
>>     static const struct kfd_device_info hawaii_device_info = {
>>       .asic_family = CHIP_HAWAII,
>> @@ -67,6 +74,7 @@ static const struct kfd_device_info
>> hawaii_device_info = {
>>       .num_of_watch_points = 4,
>>       .mqd_size_aligned = MQD_SIZE_ALIGNED,
>>       .supports_cwsr = false,
>> +    .needs_iommu_device = false,
>>       .needs_pci_atomics = false,
>>   };
>>   @@ -79,6 +87,7 @@ static const struct kfd_device_info
>> tonga_device_info = {
>>       .num_of_watch_points = 4,
>>       .mqd_size_aligned = MQD_SIZE_ALIGNED,
>>       .supports_cwsr = false,
>> +    .needs_iommu_device = false,
>>       .needs_pci_atomics = true,
>>   };
>>   @@ -91,6 +100,7 @@ static const struct kfd_device_info
>> tonga_vf_device_info = {
>>       .num_of_watch_points = 4,
>>       .mqd_size_aligned = MQD_SIZE_ALIGNED,
>>       .supports_cwsr = false,
>> +    .needs_iommu_device = false,
>>       .needs_pci_atomics = false,
>>   };
>>   @@ -103,6 +113,7 @@ static const struct kfd_device_info
>> fiji_device_info = {
>>       .num_of_watch_points = 4,
>>       .mqd_size_aligned = MQD_SIZE_ALIGNED,
>>       .supports_cwsr = true,
>> +    .needs_iommu_device = false,
>>       .needs_pci_atomics = true,
>>   };
>>   @@ -115,6 +126,7 @@ static const struct kfd_device_info
>> fiji_vf_device_info = {
>>       .num_of_watch_points = 4,
>>       .mqd_size_aligned = MQD_SIZE_ALIGNED,
>>       .supports_cwsr = true,
>> +    .needs_iommu_device = false,
>>       .needs_pci_atomics = false,
>>   };
>>   @@ -128,6 +140,7 @@ static const struct kfd_device_info
>> polaris10_device_info = {
>>       .num_of_watch_points = 4,
>>       .mqd_size_aligned = MQD_SIZE_ALIGNED,
>>       .supports_cwsr = true,
>> +    .needs_iommu_device = false,
>>       .needs_pci_atomics = true,
>>   };
>>   @@ -140,6 +153,7 @@ static const struct kfd_device_info
>> polaris10_vf_device_info = {
>>       .num_of_watch_points = 4,
>>       .mqd_size_aligned = MQD_SIZE_ALIGNED,
>>       .supports_cwsr = true,
>> +    .needs_iommu_device = false,
>>       .needs_pci_atomics = false,
>>   };
>>   @@ -152,6 +166,7 @@ static const struct kfd_device_info
>> polaris11_device_info = {
>>       .num_of_watch_points = 4,
>>       .mqd_size_aligned = MQD_SIZE_ALIGNED,
>>       .supports_cwsr = true,
>> +    .needs_iommu_device = false,
>>       .needs_pci_atomics = true,
>>   };
>>   @@ -162,6 +177,7 @@ struct kfd_deviceid {
>>   };
>>     static const struct kfd_deviceid supported_devices[] = {
>> +#ifdef KFD_SUPPORT_IOMMU_V2
>>       { 0x1304, &kaveri_device_info },    /* Kaveri */
>>       { 0x1305, &kaveri_device_info },    /* Kaveri */
>>       { 0x1306, &kaveri_device_info },    /* Kaveri */
>> @@ -189,6 +205,7 @@ static const struct kfd_deviceid
>> supported_devices[] = {
>>       { 0x9875, &carrizo_device_info },    /* Carrizo */
>>       { 0x9876, &carrizo_device_info },    /* Carrizo */
>>       { 0x9877, &carrizo_device_info },    /* Carrizo */
>> +#endif
>>       { 0x67A0, &hawaii_device_info },    /* Hawaii */
>>       { 0x67A1, &hawaii_device_info },    /* Hawaii */
>>       { 0x67A2, &hawaii_device_info },    /* Hawaii */
>> @@ -302,77 +319,6 @@ struct kfd_dev *kgd2kfd_probe(struct kgd_dev *kgd,
>>       return kfd;
>>   }
>>   -static bool device_iommu_pasid_init(struct kfd_dev *kfd)
>> -{
>> -    const u32 required_iommu_flags = AMD_IOMMU_DEVICE_FLAG_ATS_SUP |
>> -                    AMD_IOMMU_DEVICE_FLAG_PRI_SUP |
>> -                    AMD_IOMMU_DEVICE_FLAG_PASID_SUP;
>> -
>> -    struct amd_iommu_device_info iommu_info;
>> -    unsigned int pasid_limit;
>> -    int err;
>> -
>> -    err = amd_iommu_device_info(kfd->pdev, &iommu_info);
>> -    if (err < 0) {
>> -        dev_err(kfd_device,
>> -            "error getting iommu info. is the iommu enabled?\n");
>> -        return false;
>> -    }
>> -
>> -    if ((iommu_info.flags & required_iommu_flags) !=
>> required_iommu_flags) {
>> -        dev_err(kfd_device, "error required iommu flags ats %i, pri
>> %i, pasid %i\n",
>> -               (iommu_info.flags & AMD_IOMMU_DEVICE_FLAG_ATS_SUP) != 0,
>> -               (iommu_info.flags & AMD_IOMMU_DEVICE_FLAG_PRI_SUP) != 0,
>> -               (iommu_info.flags & AMD_IOMMU_DEVICE_FLAG_PASID_SUP)
>> -                                    != 0);
>> -        return false;
>> -    }
>> -
>> -    pasid_limit = min_t(unsigned int,
>> -            (unsigned int)(1 << kfd->device_info->max_pasid_bits),
>> -            iommu_info.max_pasids);
>> -
>> -    if (!kfd_set_pasid_limit(pasid_limit)) {
>> -        dev_err(kfd_device, "error setting pasid limit\n");
>> -        return false;
>> -    }
>> -
>> -    return true;
>> -}
>> -
>> -static void iommu_pasid_shutdown_callback(struct pci_dev *pdev, int
>> pasid)
>> -{
>> -    struct kfd_dev *dev = kfd_device_by_pci_dev(pdev);
>> -
>> -    if (dev)
>> -        kfd_process_iommu_unbind_callback(dev, pasid);
>> -}
>> -
>> -/*
>> - * This function called by IOMMU driver on PPR failure
>> - */
>> -static int iommu_invalid_ppr_cb(struct pci_dev *pdev, int pasid,
>> -        unsigned long address, u16 flags)
>> -{
>> -    struct kfd_dev *dev;
>> -
>> -    dev_warn(kfd_device,
>> -            "Invalid PPR device %x:%x.%x pasid %d address 0x%lX
>> flags 0x%X",
>> -            PCI_BUS_NUM(pdev->devfn),
>> -            PCI_SLOT(pdev->devfn),
>> -            PCI_FUNC(pdev->devfn),
>> -            pasid,
>> -            address,
>> -            flags);
>> -
>> -    dev = kfd_device_by_pci_dev(pdev);
>> -    if (!WARN_ON(!dev))
>> -        kfd_signal_iommu_event(dev, pasid, address,
>> -            flags & PPR_FAULT_WRITE, flags & PPR_FAULT_EXEC);
>> -
>> -    return AMD_IOMMU_INV_PRI_RSP_INVALID;
>> -}
>> -
>>   static void kfd_cwsr_init(struct kfd_dev *kfd)
>>   {
>>       if (cwsr_enable && kfd->device_info->supports_cwsr) {
>> @@ -462,11 +408,9 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,
>>           goto device_queue_manager_error;
>>       }
>>   -    if (!device_iommu_pasid_init(kfd)) {
>> -        dev_err(kfd_device,
>> -            "Error initializing iommuv2 for device %x:%x\n",
>> -            kfd->pdev->vendor, kfd->pdev->device);
>> -        goto device_iommu_pasid_error;
>> +    if (kfd_iommu_device_init(kfd)) {
>> +        dev_err(kfd_device, "Error initializing iommuv2\n");
>> +        goto device_iommu_error;
>>       }
>>         kfd_cwsr_init(kfd);
>> @@ -486,7 +430,7 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,
>>       goto out;
>>     kfd_resume_error:
>> -device_iommu_pasid_error:
>> +device_iommu_error:
>>       device_queue_manager_uninit(kfd->dqm);
>>   device_queue_manager_error:
>>       kfd_interrupt_exit(kfd);
>> @@ -527,11 +471,7 @@ void kgd2kfd_suspend(struct kfd_dev *kfd)
>>         kfd->dqm->ops.stop(kfd->dqm);
>>   -    kfd_unbind_processes_from_device(kfd);
>> -
>> -    amd_iommu_set_invalidate_ctx_cb(kfd->pdev, NULL);
>> -    amd_iommu_set_invalid_ppr_cb(kfd->pdev, NULL);
>> -    amd_iommu_free_device(kfd->pdev);
>> +    kfd_iommu_suspend(kfd);
>>   }
>>     int kgd2kfd_resume(struct kfd_dev *kfd)
>> @@ -546,19 +486,14 @@ int kgd2kfd_resume(struct kfd_dev *kfd)
>>   static int kfd_resume(struct kfd_dev *kfd)
>>   {
>>       int err = 0;
>> -    unsigned int pasid_limit = kfd_get_pasid_limit();
>> -
>> -    err = amd_iommu_init_device(kfd->pdev, pasid_limit);
>> -    if (err)
>> -        return -ENXIO;
>> -    amd_iommu_set_invalidate_ctx_cb(kfd->pdev,
>> -                    iommu_pasid_shutdown_callback);
>> -    amd_iommu_set_invalid_ppr_cb(kfd->pdev,
>> -                     iommu_invalid_ppr_cb);
>>   -    err = kfd_bind_processes_to_device(kfd);
>> -    if (err)
>> -        goto processes_bind_error;
>> +    err = kfd_iommu_resume(kfd);
>> +    if (err) {
>> +        dev_err(kfd_device,
>> +            "Failed to resume IOMMU for device %x:%x\n",
>> +            kfd->pdev->vendor, kfd->pdev->device);
>> +        return err;
>> +    }
>>         err = kfd->dqm->ops.start(kfd->dqm);
>>       if (err) {
>> @@ -571,9 +506,7 @@ static int kfd_resume(struct kfd_dev *kfd)
>>       return err;
>>     dqm_start_error:
>> -processes_bind_error:
>> -    amd_iommu_free_device(kfd->pdev);
>> -
>> +    kfd_iommu_suspend(kfd);
>>       return err;
>>   }
>>   diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
>> b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
>> index 93aae5c..6fb9c0d 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
>> @@ -30,6 +30,7 @@
>>   #include <linux/memory.h>
>>   #include "kfd_priv.h"
>>   #include "kfd_events.h"
>> +#include "kfd_iommu.h"
>>   #include <linux/device.h>
>>     /*
>> @@ -837,6 +838,7 @@ static void
>> lookup_events_by_type_and_signal(struct kfd_process *p,
>>       }
>>   }
>>   +#ifdef KFD_SUPPORT_IOMMU_V2
>>   void kfd_signal_iommu_event(struct kfd_dev *dev, unsigned int pasid,
>>           unsigned long address, bool is_write_requested,
>>           bool is_execute_requested)
>> @@ -905,6 +907,7 @@ void kfd_signal_iommu_event(struct kfd_dev *dev,
>> unsigned int pasid,
>>       mutex_unlock(&p->event_mutex);
>>       kfd_unref_process(p);
>>   }
>> +#endif /* KFD_SUPPORT_IOMMU_V2 */
>>     void kfd_signal_hw_exception_event(unsigned int pasid)
>>   {
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_iommu.c
>> b/drivers/gpu/drm/amd/amdkfd/kfd_iommu.c
>> new file mode 100644
>> index 0000000..81dee34
>> --- /dev/null
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_iommu.c
>> @@ -0,0 +1,356 @@
>> +/*
>> + * Copyright 2018 Advanced Micro Devices, Inc.
>> + *
>> + * Permission is hereby granted, free of charge, to any person
>> obtaining a
>> + * copy of this software and associated documentation files (the
>> "Software"),
>> + * to deal in the Software without restriction, including without
>> limitation
>> + * the rights to use, copy, modify, merge, publish, distribute,
>> sublicense,
>> + * and/or sell copies of the Software, and to permit persons to whom
>> the
>> + * Software is furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be
>> included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
>> EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
>> MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO
>> EVENT SHALL
>> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM,
>> DAMAGES OR
>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
>> OTHERWISE,
>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
>> USE OR
>> + * OTHER DEALINGS IN THE SOFTWARE.
>> + */
>> +
>> +#include <linux/printk.h>
>> +#include <linux/device.h>
>> +#include <linux/slab.h>
>> +#include <linux/pci.h>
>> +#include <linux/amd-iommu.h>
>> +#include "kfd_priv.h"
>> +#include "kfd_dbgmgr.h"
>> +#include "kfd_topology.h"
>> +#include "kfd_iommu.h"
>> +
>> +static const u32 required_iommu_flags = AMD_IOMMU_DEVICE_FLAG_ATS_SUP |
>> +                    AMD_IOMMU_DEVICE_FLAG_PRI_SUP |
>> +                    AMD_IOMMU_DEVICE_FLAG_PASID_SUP;
>> +
>> +/** kfd_iommu_check_device - Check whether IOMMU is available for
>> device
>> + */
>> +int kfd_iommu_check_device(struct kfd_dev *kfd)
>> +{
>> +    struct amd_iommu_device_info iommu_info;
>> +    int err;
>> +
>> +    if (!kfd->device_info->needs_iommu_device)
>> +        return -ENODEV;
>> +
>> +    iommu_info.flags = 0;
>> +    err = amd_iommu_device_info(kfd->pdev, &iommu_info);
>> +    if (err)
>> +        return err;
>> +
>> +    if ((iommu_info.flags & required_iommu_flags) !=
>> required_iommu_flags)
>> +        return -ENODEV;
>> +
>> +    return 0;
>> +}
>> +
>> +/** kfd_iommu_device_init - Initialize IOMMU for device
>> + */
>> +int kfd_iommu_device_init(struct kfd_dev *kfd)
>> +{
>> +    struct amd_iommu_device_info iommu_info;
>> +    unsigned int pasid_limit;
>> +    int err;
>> +
>> +    if (!kfd->device_info->needs_iommu_device)
>> +        return 0;
>> +
>> +    iommu_info.flags = 0;
>> +    err = amd_iommu_device_info(kfd->pdev, &iommu_info);
>> +    if (err < 0) {
>> +        dev_err(kfd_device,
>> +            "error getting iommu info. is the iommu enabled?\n");
>> +        return -ENODEV;
>> +    }
>> +
>> +    if ((iommu_info.flags & required_iommu_flags) !=
>> required_iommu_flags) {
>> +        dev_err(kfd_device, "error required iommu flags ats %i, pri
>> %i, pasid %i\n",
>> +               (iommu_info.flags & AMD_IOMMU_DEVICE_FLAG_ATS_SUP) != 0,
>> +               (iommu_info.flags & AMD_IOMMU_DEVICE_FLAG_PRI_SUP) != 0,
>> +               (iommu_info.flags & AMD_IOMMU_DEVICE_FLAG_PASID_SUP)
>> +                                    != 0);
>> +        return -ENODEV;
>> +    }
>> +
>> +    pasid_limit = min_t(unsigned int,
>> +            (unsigned int)(1 << kfd->device_info->max_pasid_bits),
>> +            iommu_info.max_pasids);
>> +
>> +    if (!kfd_set_pasid_limit(pasid_limit)) {
>> +        dev_err(kfd_device, "error setting pasid limit\n");
>> +        return -EBUSY;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +/** kfd_iommu_bind_process_to_device - Have the IOMMU bind a process
>> + *
>> + * Binds the given process to the given device using its PASID. This
>> + * enables IOMMUv2 address translation for the process on the device.
>> + *
>> + * This function assumes that the process mutex is held.
>> + */
>> +int kfd_iommu_bind_process_to_device(struct kfd_process_device *pdd)
>> +{
>> +    struct kfd_dev *dev = pdd->dev;
>> +    struct kfd_process *p = pdd->process;
>> +    int err;
>> +
>> +    if (!dev->device_info->needs_iommu_device || pdd->bound ==
>> PDD_BOUND)
>> +        return 0;
>> +
>> +    if (unlikely(pdd->bound == PDD_BOUND_SUSPENDED)) {
>> +        pr_err("Binding PDD_BOUND_SUSPENDED pdd is unexpected!\n");
>> +        return -EINVAL;
>> +    }
>> +
>> +    err = amd_iommu_bind_pasid(dev->pdev, p->pasid, p->lead_thread);
>> +    if (!err)
>> +        pdd->bound = PDD_BOUND;
>> +
>> +    return err;
>> +}
>> +
>> +/** kfd_iommu_unbind_process - Unbind process from all devices
>> + *
>> + * This removes all IOMMU device bindings of the process. To be used
>> + * before process termination.
>> + */
>> +void kfd_iommu_unbind_process(struct kfd_process *p)
>> +{
>> +    struct kfd_process_device *pdd;
>> +
>> +    list_for_each_entry(pdd, &p->per_device_data, per_device_list)
>> +        if (pdd->bound == PDD_BOUND)
>> +            amd_iommu_unbind_pasid(pdd->dev->pdev, p->pasid);
>> +}
>> +
>> +/* Callback for process shutdown invoked by the IOMMU driver */
>> +static void iommu_pasid_shutdown_callback(struct pci_dev *pdev, int
>> pasid)
>> +{
>> +    struct kfd_dev *dev = kfd_device_by_pci_dev(pdev);
>> +    struct kfd_process *p;
>> +    struct kfd_process_device *pdd;
>> +
>> +    if (!dev)
>> +        return;
>> +
>> +    /*
>> +     * Look for the process that matches the pasid. If there is no such
>> +     * process, we either released it in amdkfd's own notifier, or
>> there
>> +     * is a bug. Unfortunately, there is no way to tell...
>> +     */
>> +    p = kfd_lookup_process_by_pasid(pasid);
>> +    if (!p)
>> +        return;
>> +
>> +    pr_debug("Unbinding process %d from IOMMU\n", pasid);
>> +
>> +    mutex_lock(kfd_get_dbgmgr_mutex());
>> +
>> +    if (dev->dbgmgr && dev->dbgmgr->pasid == p->pasid) {
>> +        if (!kfd_dbgmgr_unregister(dev->dbgmgr, p)) {
>> +            kfd_dbgmgr_destroy(dev->dbgmgr);
>> +            dev->dbgmgr = NULL;
>> +        }
>> +    }
>> +
>> +    mutex_unlock(kfd_get_dbgmgr_mutex());
>> +
>> +    mutex_lock(&p->mutex);
>> +
>> +    pdd = kfd_get_process_device_data(dev, p);
>> +    if (pdd)
>> +        /* For GPU relying on IOMMU, we need to dequeue here
>> +         * when PASID is still bound.
>> +         */
>> +        kfd_process_dequeue_from_device(pdd);
>> +
>> +    mutex_unlock(&p->mutex);
>> +
>> +    kfd_unref_process(p);
>> +}
>> +
>> +/* This function called by IOMMU driver on PPR failure */
>> +static int iommu_invalid_ppr_cb(struct pci_dev *pdev, int pasid,
>> +        unsigned long address, u16 flags)
>> +{
>> +    struct kfd_dev *dev;
>> +
>> +    dev_warn(kfd_device,
>> +            "Invalid PPR device %x:%x.%x pasid %d address 0x%lX
>> flags 0x%X",
>> +            PCI_BUS_NUM(pdev->devfn),
>> +            PCI_SLOT(pdev->devfn),
>> +            PCI_FUNC(pdev->devfn),
>> +            pasid,
>> +            address,
>> +            flags);
>> +
>> +    dev = kfd_device_by_pci_dev(pdev);
>> +    if (!WARN_ON(!dev))
>> +        kfd_signal_iommu_event(dev, pasid, address,
>> +            flags & PPR_FAULT_WRITE, flags & PPR_FAULT_EXEC);
>> +
>> +    return AMD_IOMMU_INV_PRI_RSP_INVALID;
>> +}
>> +
>> +/*
>> + * Bind processes do the device that have been temporarily unbound
>> + * (PDD_BOUND_SUSPENDED) in kfd_unbind_processes_from_device.
>> + */
>> +static int kfd_bind_processes_to_device(struct kfd_dev *kfd)
>> +{
>> +    struct kfd_process_device *pdd;
>> +    struct kfd_process *p;
>> +    unsigned int temp;
>> +    int err = 0;
>> +
>> +    int idx = srcu_read_lock(&kfd_processes_srcu);
>> +
>> +    hash_for_each_rcu(kfd_processes_table, temp, p, kfd_processes) {
>> +        mutex_lock(&p->mutex);
>> +        pdd = kfd_get_process_device_data(kfd, p);
>> +
>> +        if (WARN_ON(!pdd) || pdd->bound != PDD_BOUND_SUSPENDED) {
>> +            mutex_unlock(&p->mutex);
>> +            continue;
>> +        }
>> +
>> +        err = amd_iommu_bind_pasid(kfd->pdev, p->pasid,
>> +                p->lead_thread);
>> +        if (err < 0) {
>> +            pr_err("Unexpected pasid %d binding failure\n",
>> +                    p->pasid);
>> +            mutex_unlock(&p->mutex);
>> +            break;
>> +        }
>> +
>> +        pdd->bound = PDD_BOUND;
>> +        mutex_unlock(&p->mutex);
>> +    }
>> +
>> +    srcu_read_unlock(&kfd_processes_srcu, idx);
>> +
>> +    return err;
>> +}
>> +
>> +/*
>> + * Mark currently bound processes as PDD_BOUND_SUSPENDED. These
>> + * processes will be restored to PDD_BOUND state in
>> + * kfd_bind_processes_to_device.
>> + */
>> +static void kfd_unbind_processes_from_device(struct kfd_dev *kfd)
>> +{
>> +    struct kfd_process_device *pdd;
>> +    struct kfd_process *p;
>> +    unsigned int temp;
>> +
>> +    int idx = srcu_read_lock(&kfd_processes_srcu);
>> +
>> +    hash_for_each_rcu(kfd_processes_table, temp, p, kfd_processes) {
>> +        mutex_lock(&p->mutex);
>> +        pdd = kfd_get_process_device_data(kfd, p);
>> +
>> +        if (WARN_ON(!pdd)) {
>> +            mutex_unlock(&p->mutex);
>> +            continue;
>> +        }
>> +
>> +        if (pdd->bound == PDD_BOUND)
>> +            pdd->bound = PDD_BOUND_SUSPENDED;
>> +        mutex_unlock(&p->mutex);
>> +    }
>> +
>> +    srcu_read_unlock(&kfd_processes_srcu, idx);
>> +}
>> +
>> +/** kfd_iommu_suspend - Prepare IOMMU for suspend
>> + *
>> + * This unbinds processes from the device and disables the IOMMU for
>> + * the device.
>> + */
>> +void kfd_iommu_suspend(struct kfd_dev *kfd)
>> +{
>> +    if (!kfd->device_info->needs_iommu_device)
>> +        return;
>> +
>> +    kfd_unbind_processes_from_device(kfd);
>> +
>> +    amd_iommu_set_invalidate_ctx_cb(kfd->pdev, NULL);
>> +    amd_iommu_set_invalid_ppr_cb(kfd->pdev, NULL);
>> +    amd_iommu_free_device(kfd->pdev);
>> +}
>> +
>> +/** kfd_iommu_resume - Restore IOMMU after resume
>> + *
>> + * This reinitializes the IOMMU for the device and re-binds previously
>> + * suspended processes to the device.
>> + */
>> +int kfd_iommu_resume(struct kfd_dev *kfd)
>> +{
>> +    unsigned int pasid_limit;
>> +    int err;
>> +
>> +    if (!kfd->device_info->needs_iommu_device)
>> +        return 0;
>> +
>> +    pasid_limit = kfd_get_pasid_limit();
>> +
>> +    err = amd_iommu_init_device(kfd->pdev, pasid_limit);
>> +    if (err)
>> +        return -ENXIO;
>> +
>> +    amd_iommu_set_invalidate_ctx_cb(kfd->pdev,
>> +                    iommu_pasid_shutdown_callback);
>> +    amd_iommu_set_invalid_ppr_cb(kfd->pdev,
>> +                     iommu_invalid_ppr_cb);
>> +
>> +    err = kfd_bind_processes_to_device(kfd);
>> +    if (err) {
>> +        amd_iommu_set_invalidate_ctx_cb(kfd->pdev, NULL);
>> +        amd_iommu_set_invalid_ppr_cb(kfd->pdev, NULL);
>> +        amd_iommu_free_device(kfd->pdev);
>> +        return err;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +extern bool amd_iommu_pc_supported(void);
>> +extern u8 amd_iommu_pc_get_max_banks(u16 devid);
>> +extern u8 amd_iommu_pc_get_max_counters(u16 devid);
>> +
>> +/** kfd_iommu_add_perf_counters - Add IOMMU performance counters to
>> topology
>> + */
>> +int kfd_iommu_add_perf_counters(struct kfd_topology_device *kdev)
>> +{
>> +    struct kfd_perf_properties *props;
>> +
>> +    if (!(kdev->node_props.capability & HSA_CAP_ATS_PRESENT))
>> +        return 0;
>> +
>> +    if (!amd_iommu_pc_supported())
>> +        return 0;
>> +
>> +    props = kfd_alloc_struct(props);
>> +    if (!props)
>> +        return -ENOMEM;
>> +    strcpy(props->block_name, "iommu");
>> +    props->max_concurrent = amd_iommu_pc_get_max_banks(0) *
>> +        amd_iommu_pc_get_max_counters(0); /* assume one iommu */
>> +    list_add_tail(&props->list, &kdev->perf_props);
>> +
>> +    return 0;
>> +}
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_iommu.h
>> b/drivers/gpu/drm/amd/amdkfd/kfd_iommu.h
>> new file mode 100644
>> index 0000000..dd23d9f
>> --- /dev/null
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_iommu.h
>> @@ -0,0 +1,78 @@
>> +/*
>> + * Copyright 2018 Advanced Micro Devices, Inc.
>> + *
>> + * Permission is hereby granted, free of charge, to any person
>> obtaining a
>> + * copy of this software and associated documentation files (the
>> "Software"),
>> + * to deal in the Software without restriction, including without
>> limitation
>> + * the rights to use, copy, modify, merge, publish, distribute,
>> sublicense,
>> + * and/or sell copies of the Software, and to permit persons to whom
>> the
>> + * Software is furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be
>> included in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
>> EXPRESS OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
>> MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO
>> EVENT SHALL
>> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM,
>> DAMAGES OR
>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
>> OTHERWISE,
>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
>> USE OR
>> + * OTHER DEALINGS IN THE SOFTWARE.
>> + */
>> +
>> +#ifndef __KFD_IOMMU_H__
>> +#define __KFD_IOMMU_H__
>> +
>> +#if defined(CONFIG_AMD_IOMMU_V2_MODULE) || defined(CONFIG_AMD_IOMMU_V2)
>> +
>> +#define KFD_SUPPORT_IOMMU_V2
>> +
>> +int kfd_iommu_check_device(struct kfd_dev *kfd);
>> +int kfd_iommu_device_init(struct kfd_dev *kfd);
>> +
>> +int kfd_iommu_bind_process_to_device(struct kfd_process_device *pdd);
>> +void kfd_iommu_unbind_process(struct kfd_process *p);
>> +
>> +void kfd_iommu_suspend(struct kfd_dev *kfd);
>> +int kfd_iommu_resume(struct kfd_dev *kfd);
>> +
>> +int kfd_iommu_add_perf_counters(struct kfd_topology_device *kdev);
>> +
>> +#else
>> +
>> +static inline int kfd_iommu_check_device(struct kfd_dev *kfd)
>> +{
>> +    return -ENODEV;
>> +}
>> +static inline int kfd_iommu_device_init(struct kfd_dev *kfd)
>> +{
>> +    return 0;
>> +}
>> +
>> +static inline int kfd_iommu_bind_process_to_device(
>> +    struct kfd_process_device *pdd)
>> +{
>> +    return 0;
>> +}
>> +static inline void kfd_iommu_unbind_process(struct kfd_process *p)
>> +{
>> +    /* empty */
>> +}
>> +
>> +static inline void kfd_iommu_suspend(struct kfd_dev *kfd)
>> +{
>> +    /* empty */
>> +}
>> +static inline int kfd_iommu_resume(struct kfd_dev *kfd)
>> +{
>> +    return 0;
>> +}
>> +
>> +static inline int kfd_iommu_add_perf_counters(struct
>> kfd_topology_device *kdev)
>> +{
>> +    return 0;
>> +}
>> +
>> +#endif /* defined(CONFIG_AMD_IOMMU_V2) */
>> +
>> +#endif /* __KFD_IOMMU_H__ */
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>> b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>> index 594f853..f12eb5d 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>> @@ -158,6 +158,7 @@ struct kfd_device_info {
>>       uint8_t num_of_watch_points;
>>       uint16_t mqd_size_aligned;
>>       bool supports_cwsr;
>> +    bool needs_iommu_device;
>>       bool needs_pci_atomics;
>>   };
>>   @@ -517,15 +518,15 @@ struct kfd_process_device {
>>       uint64_t scratch_base;
>>       uint64_t scratch_limit;
>>   -    /* Is this process/pasid bound to this device?
>> (amd_iommu_bind_pasid) */
>> -    enum kfd_pdd_bound bound;
>> -
>>       /* Flag used to tell the pdd has dequeued from the dqm.
>>        * This is used to prevent dev->dqm->ops.process_termination()
>> from
>>        * being called twice when it is already called in IOMMU callback
>>        * function.
>>        */
>>       bool already_dequeued;
>> +
>> +    /* Is this process/pasid bound to this device?
>> (amd_iommu_bind_pasid) */
>> +    enum kfd_pdd_bound bound;
>>   };
>>     #define qpd_to_pdd(x) container_of(x, struct kfd_process_device,
>> qpd)
>> @@ -590,6 +591,10 @@ struct kfd_process {
>>       bool signal_event_limit_reached;
>>   };
>>   +#define KFD_PROCESS_TABLE_SIZE 5 /* bits: 32 entries */
>> +extern DECLARE_HASHTABLE(kfd_processes_table, KFD_PROCESS_TABLE_SIZE);
>> +extern struct srcu_struct kfd_processes_srcu;
>> +
>>   /**
>>    * Ioctl function type.
>>    *
>> @@ -617,9 +622,6 @@ void kfd_unref_process(struct kfd_process *p);
>>     struct kfd_process_device *kfd_bind_process_to_device(struct
>> kfd_dev *dev,
>>                           struct kfd_process *p);
>> -int kfd_bind_processes_to_device(struct kfd_dev *dev);
>> -void kfd_unbind_processes_from_device(struct kfd_dev *dev);
>> -void kfd_process_iommu_unbind_callback(struct kfd_dev *dev, unsigned
>> int pasid);
>>   struct kfd_process_device *kfd_get_process_device_data(struct
>> kfd_dev *dev,
>>                               struct kfd_process *p);
>>   struct kfd_process_device *kfd_create_process_device_data(struct
>> kfd_dev *dev,
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>> b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>> index 4ff5f0f..e9aee76 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>> @@ -35,16 +35,16 @@ struct mm_struct;
>>     #include "kfd_priv.h"
>>   #include "kfd_dbgmgr.h"
>> +#include "kfd_iommu.h"
>>     /*
>>    * List of struct kfd_process (field kfd_process).
>>    * Unique/indexed by mm_struct*
>>    */
>> -#define KFD_PROCESS_TABLE_SIZE 5 /* bits: 32 entries */
>> -static DEFINE_HASHTABLE(kfd_processes_table, KFD_PROCESS_TABLE_SIZE);
>> +DEFINE_HASHTABLE(kfd_processes_table, KFD_PROCESS_TABLE_SIZE);
>>   static DEFINE_MUTEX(kfd_processes_mutex);
>>   -DEFINE_STATIC_SRCU(kfd_processes_srcu);
>> +DEFINE_SRCU(kfd_processes_srcu);
>>     static struct workqueue_struct *kfd_process_wq;
>>   @@ -173,14 +173,8 @@ static void kfd_process_wq_release(struct
>> work_struct *work)
>>   {
>>       struct kfd_process *p = container_of(work, struct kfd_process,
>>                            release_work);
>> -    struct kfd_process_device *pdd;
>>   -    pr_debug("Releasing process (pasid %d) in workqueue\n",
>> p->pasid);
>> -
>> -    list_for_each_entry(pdd, &p->per_device_data, per_device_list) {
>> -        if (pdd->bound == PDD_BOUND)
>> -            amd_iommu_unbind_pasid(pdd->dev->pdev, p->pasid);
>> -    }
>> +    kfd_iommu_unbind_process(p);
>>         kfd_process_destroy_pdds(p);
>>   @@ -429,133 +423,13 @@ struct kfd_process_device
>> *kfd_bind_process_to_device(struct kfd_dev *dev,
>>           return ERR_PTR(-ENOMEM);
>>       }
>>   -    if (pdd->bound == PDD_BOUND) {
>> -        return pdd;
>> -    } else if (unlikely(pdd->bound == PDD_BOUND_SUSPENDED)) {
>> -        pr_err("Binding PDD_BOUND_SUSPENDED pdd is unexpected!\n");
>> -        return ERR_PTR(-EINVAL);
>> -    }
>> -
>> -    err = amd_iommu_bind_pasid(dev->pdev, p->pasid, p->lead_thread);
>> -    if (err < 0)
>> +    err = kfd_iommu_bind_process_to_device(pdd);
>> +    if (err)
>>           return ERR_PTR(err);
>>   -    pdd->bound = PDD_BOUND;
>> -
>>       return pdd;
>>   }
>>   -/*
>> - * Bind processes do the device that have been temporarily unbound
>> - * (PDD_BOUND_SUSPENDED) in kfd_unbind_processes_from_device.
>> - */
>> -int kfd_bind_processes_to_device(struct kfd_dev *dev)
>> -{
>> -    struct kfd_process_device *pdd;
>> -    struct kfd_process *p;
>> -    unsigned int temp;
>> -    int err = 0;
>> -
>> -    int idx = srcu_read_lock(&kfd_processes_srcu);
>> -
>> -    hash_for_each_rcu(kfd_processes_table, temp, p, kfd_processes) {
>> -        mutex_lock(&p->mutex);
>> -        pdd = kfd_get_process_device_data(dev, p);
>> -
>> -        if (WARN_ON(!pdd) || pdd->bound != PDD_BOUND_SUSPENDED) {
>> -            mutex_unlock(&p->mutex);
>> -            continue;
>> -        }
>> -
>> -        err = amd_iommu_bind_pasid(dev->pdev, p->pasid,
>> -                p->lead_thread);
>> -        if (err < 0) {
>> -            pr_err("Unexpected pasid %d binding failure\n",
>> -                    p->pasid);
>> -            mutex_unlock(&p->mutex);
>> -            break;
>> -        }
>> -
>> -        pdd->bound = PDD_BOUND;
>> -        mutex_unlock(&p->mutex);
>> -    }
>> -
>> -    srcu_read_unlock(&kfd_processes_srcu, idx);
>> -
>> -    return err;
>> -}
>> -
>> -/*
>> - * Mark currently bound processes as PDD_BOUND_SUSPENDED. These
>> - * processes will be restored to PDD_BOUND state in
>> - * kfd_bind_processes_to_device.
>> - */
>> -void kfd_unbind_processes_from_device(struct kfd_dev *dev)
>> -{
>> -    struct kfd_process_device *pdd;
>> -    struct kfd_process *p;
>> -    unsigned int temp;
>> -
>> -    int idx = srcu_read_lock(&kfd_processes_srcu);
>> -
>> -    hash_for_each_rcu(kfd_processes_table, temp, p, kfd_processes) {
>> -        mutex_lock(&p->mutex);
>> -        pdd = kfd_get_process_device_data(dev, p);
>> -
>> -        if (WARN_ON(!pdd)) {
>> -            mutex_unlock(&p->mutex);
>> -            continue;
>> -        }
>> -
>> -        if (pdd->bound == PDD_BOUND)
>> -            pdd->bound = PDD_BOUND_SUSPENDED;
>> -        mutex_unlock(&p->mutex);
>> -    }
>> -
>> -    srcu_read_unlock(&kfd_processes_srcu, idx);
>> -}
>> -
>> -void kfd_process_iommu_unbind_callback(struct kfd_dev *dev, unsigned
>> int pasid)
>> -{
>> -    struct kfd_process *p;
>> -    struct kfd_process_device *pdd;
>> -
>> -    /*
>> -     * Look for the process that matches the pasid. If there is no such
>> -     * process, we either released it in amdkfd's own notifier, or
>> there
>> -     * is a bug. Unfortunately, there is no way to tell...
>> -     */
>> -    p = kfd_lookup_process_by_pasid(pasid);
>> -    if (!p)
>> -        return;
>> -
>> -    pr_debug("Unbinding process %d from IOMMU\n", pasid);
>> -
>> -    mutex_lock(kfd_get_dbgmgr_mutex());
>> -
>> -    if (dev->dbgmgr && dev->dbgmgr->pasid == p->pasid) {
>> -        if (!kfd_dbgmgr_unregister(dev->dbgmgr, p)) {
>> -            kfd_dbgmgr_destroy(dev->dbgmgr);
>> -            dev->dbgmgr = NULL;
>> -        }
>> -    }
>> -
>> -    mutex_unlock(kfd_get_dbgmgr_mutex());
>> -
>> -    mutex_lock(&p->mutex);
>> -
>> -    pdd = kfd_get_process_device_data(dev, p);
>> -    if (pdd)
>> -        /* For GPU relying on IOMMU, we need to dequeue here
>> -         * when PASID is still bound.
>> -         */
>> -        kfd_process_dequeue_from_device(pdd);
>> -
>> -    mutex_unlock(&p->mutex);
>> -
>> -    kfd_unref_process(p);
>> -}
>> -
>>   struct kfd_process_device *kfd_get_first_process_device_data(
>>                           struct kfd_process *p)
>>   {
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>> b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>> index 7783250..2506155 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>> @@ -35,6 +35,7 @@
>>   #include "kfd_crat.h"
>>   #include "kfd_topology.h"
>>   #include "kfd_device_queue_manager.h"
>> +#include "kfd_iommu.h"
>>     /* topology_device_list - Master list of all topology devices */
>>   static struct list_head topology_device_list;
>> @@ -875,19 +876,8 @@ static void find_system_memory(const struct
>> dmi_header *dm,
>>    */
>>   static int kfd_add_perf_to_topology(struct kfd_topology_device *kdev)
>>   {
>> -    struct kfd_perf_properties *props;
>> -
>> -    if (amd_iommu_pc_supported()) {
>> -        props = kfd_alloc_struct(props);
>> -        if (!props)
>> -            return -ENOMEM;
>> -        strcpy(props->block_name, "iommu");
>> -        props->max_concurrent = amd_iommu_pc_get_max_banks(0) *
>> -            amd_iommu_pc_get_max_counters(0); /* assume one iommu */
>> -        list_add_tail(&props->list, &kdev->perf_props);
>> -    }
>> -
>> -    return 0;
>> +    /* These are the only counters supported so far */
>> +    return kfd_iommu_add_perf_counters(kdev);
>>   }
>>     /* kfd_add_non_crat_information - Add information that is not
>> currently
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>> b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>> index 53fca1f..c0be2be 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>> @@ -25,7 +25,7 @@
>>     #include <linux/types.h>
>>   #include <linux/list.h>
>> -#include "kfd_priv.h"
>> +#include "kfd_crat.h"
>>     #define KFD_TOPOLOGY_PUBLIC_NAME_SIZE 128
>>   @@ -183,8 +183,4 @@ struct kfd_topology_device
>> *kfd_create_topology_device(
>>           struct list_head *device_list);
>>   void kfd_release_topology_device_list(struct list_head *device_list);
>>   -extern bool amd_iommu_pc_supported(void);
>> -extern u8 amd_iommu_pc_get_max_banks(u16 devid);
>> -extern u8 amd_iommu_pc_get_max_counters(u16 devid);
>> -
>>   #endif /* __KFD_TOPOLOGY_H__ */
>


[-- Attachment #2: 0011-drm-amdkfd-Centralize-IOMMUv2-code-and-make-it-condi.patch --]
[-- Type: text/x-patch, Size: 34288 bytes --]

From a4a3e51819fe3d587850784ee8c436f9101f76df Mon Sep 17 00:00:00 2001
From: Felix Kuehling <Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
Date: Fri, 8 Dec 2017 19:22:12 -0500
Subject: [PATCH 11/25] drm/amdkfd: Centralize IOMMUv2 code and make it
 conditional

dGPUs work without IOMMUv2. Make IOMMUv2 initialization dependent on
ASIC information. Also allow building KFD without IOMMUv2 support.
This is still useful for dGPUs and prepares for enabling KFD on
architectures that don't support AMD IOMMUv2.

v2:
* Centralize IOMMUv2 code to avoid #ifdefs in too many places

v3:
* Imply AMD_IOMMU_V2 in Kconfig

Signed-off-by: Felix Kuehling <Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
---
 drivers/gpu/drm/amd/amdkfd/Kconfig        |   3 +-
 drivers/gpu/drm/amd/amdkfd/Makefile       |   4 +
 drivers/gpu/drm/amd/amdkfd/kfd_crat.c     |  14 +-
 drivers/gpu/drm/amd/amdkfd/kfd_device.c   | 127 +++--------
 drivers/gpu/drm/amd/amdkfd/kfd_events.c   |   3 +
 drivers/gpu/drm/amd/amdkfd/kfd_iommu.c    | 356 ++++++++++++++++++++++++++++++
 drivers/gpu/drm/amd/amdkfd/kfd_iommu.h    |  78 +++++++
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h     |  14 +-
 drivers/gpu/drm/amd/amdkfd/kfd_process.c  | 138 +-----------
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c |  16 +-
 drivers/gpu/drm/amd/amdkfd/kfd_topology.h |   6 +-
 11 files changed, 494 insertions(+), 265 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_iommu.c
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_iommu.h

diff --git a/drivers/gpu/drm/amd/amdkfd/Kconfig b/drivers/gpu/drm/amd/amdkfd/Kconfig
index bc5a294..ed2f06c 100644
--- a/drivers/gpu/drm/amd/amdkfd/Kconfig
+++ b/drivers/gpu/drm/amd/amdkfd/Kconfig
@@ -4,6 +4,7 @@
 
 config HSA_AMD
 	tristate "HSA kernel driver for AMD GPU devices"
-	depends on DRM_AMDGPU && AMD_IOMMU_V2 && X86_64
+	depends on DRM_AMDGPU && X86_64
+	imply AMD_IOMMU_V2
 	help
 	  Enable this if you want to use HSA features on AMD GPU devices.
diff --git a/drivers/gpu/drm/amd/amdkfd/Makefile b/drivers/gpu/drm/amd/amdkfd/Makefile
index a317e76..0d02422 100644
--- a/drivers/gpu/drm/amd/amdkfd/Makefile
+++ b/drivers/gpu/drm/amd/amdkfd/Makefile
@@ -37,6 +37,10 @@ amdkfd-y	:= kfd_module.o kfd_device.o kfd_chardev.o kfd_topology.o \
 		kfd_interrupt.o kfd_events.o cik_event_interrupt.o \
 		kfd_dbgdev.o kfd_dbgmgr.o kfd_crat.o
 
+ifneq ($(CONFIG_AMD_IOMMU_V2),)
+amdkfd-y += kfd_iommu.o
+endif
+
 amdkfd-$(CONFIG_DEBUG_FS) += kfd_debugfs.o
 
 obj-$(CONFIG_HSA_AMD)	+= amdkfd.o
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
index 2bc2816..7493f47 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
@@ -22,10 +22,10 @@
 
 #include <linux/pci.h>
 #include <linux/acpi.h>
-#include <linux/amd-iommu.h>
 #include "kfd_crat.h"
 #include "kfd_priv.h"
 #include "kfd_topology.h"
+#include "kfd_iommu.h"
 
 /* GPU Processor ID base for dGPUs for which VCRAT needs to be created.
  * GPU processor ID are expressed with Bit[31]=1.
@@ -1037,15 +1037,11 @@ static int kfd_create_vcrat_image_gpu(void *pcrat_image,
 	struct crat_subtype_generic *sub_type_hdr;
 	struct crat_subtype_computeunit *cu;
 	struct kfd_cu_info cu_info;
-	struct amd_iommu_device_info iommu_info;
 	int avail_size = *size;
 	uint32_t total_num_of_cu;
 	int num_of_cache_entries = 0;
 	int cache_mem_filled = 0;
 	int ret = 0;
-	const u32 required_iommu_flags = AMD_IOMMU_DEVICE_FLAG_ATS_SUP |
-					 AMD_IOMMU_DEVICE_FLAG_PRI_SUP |
-					 AMD_IOMMU_DEVICE_FLAG_PASID_SUP;
 	struct kfd_local_mem_info local_mem_info;
 
 	if (!pcrat_image || avail_size < VCRAT_SIZE_FOR_GPU)
@@ -1106,12 +1102,8 @@ static int kfd_create_vcrat_image_gpu(void *pcrat_image,
 	/* Check if this node supports IOMMU. During parsing this flag will
 	 * translate to HSA_CAP_ATS_PRESENT
 	 */
-	iommu_info.flags = 0;
-	if (amd_iommu_device_info(kdev->pdev, &iommu_info) == 0) {
-		if ((iommu_info.flags & required_iommu_flags) ==
-				required_iommu_flags)
-			cu->hsa_capability |= CRAT_CU_FLAGS_IOMMU_PRESENT;
-	}
+	if (!kfd_iommu_check_device(kdev))
+		cu->hsa_capability |= CRAT_CU_FLAGS_IOMMU_PRESENT;
 
 	crat_table->length += sub_type_hdr->length;
 	crat_table->total_entries++;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index 83d6f41..4ac2d61 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -20,7 +20,9 @@
  * OTHER DEALINGS IN THE SOFTWARE.
  */
 
+#if defined(CONFIG_AMD_IOMMU_V2_MODULE) || defined(CONFIG_AMD_IOMMU_V2)
 #include <linux/amd-iommu.h>
+#endif
 #include <linux/bsearch.h>
 #include <linux/pci.h>
 #include <linux/slab.h>
@@ -28,9 +30,11 @@
 #include "kfd_device_queue_manager.h"
 #include "kfd_pm4_headers_vi.h"
 #include "cwsr_trap_handler_gfx8.asm"
+#include "kfd_iommu.h"
 
 #define MQD_SIZE_ALIGNED 768
 
+#ifdef KFD_SUPPORT_IOMMU_V2
 static const struct kfd_device_info kaveri_device_info = {
 	.asic_family = CHIP_KAVERI,
 	.max_pasid_bits = 16,
@@ -41,6 +45,7 @@ static const struct kfd_device_info kaveri_device_info = {
 	.num_of_watch_points = 4,
 	.mqd_size_aligned = MQD_SIZE_ALIGNED,
 	.supports_cwsr = false,
+	.needs_iommu_device = true,
 	.needs_pci_atomics = false,
 };
 
@@ -54,8 +59,10 @@ static const struct kfd_device_info carrizo_device_info = {
 	.num_of_watch_points = 4,
 	.mqd_size_aligned = MQD_SIZE_ALIGNED,
 	.supports_cwsr = true,
+	.needs_iommu_device = true,
 	.needs_pci_atomics = false,
 };
+#endif
 
 static const struct kfd_device_info hawaii_device_info = {
 	.asic_family = CHIP_HAWAII,
@@ -67,6 +74,7 @@ static const struct kfd_device_info hawaii_device_info = {
 	.num_of_watch_points = 4,
 	.mqd_size_aligned = MQD_SIZE_ALIGNED,
 	.supports_cwsr = false,
+	.needs_iommu_device = false,
 	.needs_pci_atomics = false,
 };
 
@@ -79,6 +87,7 @@ static const struct kfd_device_info tonga_device_info = {
 	.num_of_watch_points = 4,
 	.mqd_size_aligned = MQD_SIZE_ALIGNED,
 	.supports_cwsr = false,
+	.needs_iommu_device = false,
 	.needs_pci_atomics = true,
 };
 
@@ -91,6 +100,7 @@ static const struct kfd_device_info tonga_vf_device_info = {
 	.num_of_watch_points = 4,
 	.mqd_size_aligned = MQD_SIZE_ALIGNED,
 	.supports_cwsr = false,
+	.needs_iommu_device = false,
 	.needs_pci_atomics = false,
 };
 
@@ -103,6 +113,7 @@ static const struct kfd_device_info fiji_device_info = {
 	.num_of_watch_points = 4,
 	.mqd_size_aligned = MQD_SIZE_ALIGNED,
 	.supports_cwsr = true,
+	.needs_iommu_device = false,
 	.needs_pci_atomics = true,
 };
 
@@ -115,6 +126,7 @@ static const struct kfd_device_info fiji_vf_device_info = {
 	.num_of_watch_points = 4,
 	.mqd_size_aligned = MQD_SIZE_ALIGNED,
 	.supports_cwsr = true,
+	.needs_iommu_device = false,
 	.needs_pci_atomics = false,
 };
 
@@ -128,6 +140,7 @@ static const struct kfd_device_info polaris10_device_info = {
 	.num_of_watch_points = 4,
 	.mqd_size_aligned = MQD_SIZE_ALIGNED,
 	.supports_cwsr = true,
+	.needs_iommu_device = false,
 	.needs_pci_atomics = true,
 };
 
@@ -140,6 +153,7 @@ static const struct kfd_device_info polaris10_vf_device_info = {
 	.num_of_watch_points = 4,
 	.mqd_size_aligned = MQD_SIZE_ALIGNED,
 	.supports_cwsr = true,
+	.needs_iommu_device = false,
 	.needs_pci_atomics = false,
 };
 
@@ -152,6 +166,7 @@ static const struct kfd_device_info polaris11_device_info = {
 	.num_of_watch_points = 4,
 	.mqd_size_aligned = MQD_SIZE_ALIGNED,
 	.supports_cwsr = true,
+	.needs_iommu_device = false,
 	.needs_pci_atomics = true,
 };
 
@@ -162,6 +177,7 @@ struct kfd_deviceid {
 };
 
 static const struct kfd_deviceid supported_devices[] = {
+#ifdef KFD_SUPPORT_IOMMU_V2
 	{ 0x1304, &kaveri_device_info },	/* Kaveri */
 	{ 0x1305, &kaveri_device_info },	/* Kaveri */
 	{ 0x1306, &kaveri_device_info },	/* Kaveri */
@@ -189,6 +205,7 @@ static const struct kfd_deviceid supported_devices[] = {
 	{ 0x9875, &carrizo_device_info },	/* Carrizo */
 	{ 0x9876, &carrizo_device_info },	/* Carrizo */
 	{ 0x9877, &carrizo_device_info },	/* Carrizo */
+#endif
 	{ 0x67A0, &hawaii_device_info },	/* Hawaii */
 	{ 0x67A1, &hawaii_device_info },	/* Hawaii */
 	{ 0x67A2, &hawaii_device_info },	/* Hawaii */
@@ -302,77 +319,6 @@ struct kfd_dev *kgd2kfd_probe(struct kgd_dev *kgd,
 	return kfd;
 }
 
-static bool device_iommu_pasid_init(struct kfd_dev *kfd)
-{
-	const u32 required_iommu_flags = AMD_IOMMU_DEVICE_FLAG_ATS_SUP |
-					AMD_IOMMU_DEVICE_FLAG_PRI_SUP |
-					AMD_IOMMU_DEVICE_FLAG_PASID_SUP;
-
-	struct amd_iommu_device_info iommu_info;
-	unsigned int pasid_limit;
-	int err;
-
-	err = amd_iommu_device_info(kfd->pdev, &iommu_info);
-	if (err < 0) {
-		dev_err(kfd_device,
-			"error getting iommu info. is the iommu enabled?\n");
-		return false;
-	}
-
-	if ((iommu_info.flags & required_iommu_flags) != required_iommu_flags) {
-		dev_err(kfd_device, "error required iommu flags ats %i, pri %i, pasid %i\n",
-		       (iommu_info.flags & AMD_IOMMU_DEVICE_FLAG_ATS_SUP) != 0,
-		       (iommu_info.flags & AMD_IOMMU_DEVICE_FLAG_PRI_SUP) != 0,
-		       (iommu_info.flags & AMD_IOMMU_DEVICE_FLAG_PASID_SUP)
-									!= 0);
-		return false;
-	}
-
-	pasid_limit = min_t(unsigned int,
-			(unsigned int)(1 << kfd->device_info->max_pasid_bits),
-			iommu_info.max_pasids);
-
-	if (!kfd_set_pasid_limit(pasid_limit)) {
-		dev_err(kfd_device, "error setting pasid limit\n");
-		return false;
-	}
-
-	return true;
-}
-
-static void iommu_pasid_shutdown_callback(struct pci_dev *pdev, int pasid)
-{
-	struct kfd_dev *dev = kfd_device_by_pci_dev(pdev);
-
-	if (dev)
-		kfd_process_iommu_unbind_callback(dev, pasid);
-}
-
-/*
- * This function called by IOMMU driver on PPR failure
- */
-static int iommu_invalid_ppr_cb(struct pci_dev *pdev, int pasid,
-		unsigned long address, u16 flags)
-{
-	struct kfd_dev *dev;
-
-	dev_warn(kfd_device,
-			"Invalid PPR device %x:%x.%x pasid %d address 0x%lX flags 0x%X",
-			PCI_BUS_NUM(pdev->devfn),
-			PCI_SLOT(pdev->devfn),
-			PCI_FUNC(pdev->devfn),
-			pasid,
-			address,
-			flags);
-
-	dev = kfd_device_by_pci_dev(pdev);
-	if (!WARN_ON(!dev))
-		kfd_signal_iommu_event(dev, pasid, address,
-			flags & PPR_FAULT_WRITE, flags & PPR_FAULT_EXEC);
-
-	return AMD_IOMMU_INV_PRI_RSP_INVALID;
-}
-
 static void kfd_cwsr_init(struct kfd_dev *kfd)
 {
 	if (cwsr_enable && kfd->device_info->supports_cwsr) {
@@ -462,11 +408,9 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,
 		goto device_queue_manager_error;
 	}
 
-	if (!device_iommu_pasid_init(kfd)) {
-		dev_err(kfd_device,
-			"Error initializing iommuv2 for device %x:%x\n",
-			kfd->pdev->vendor, kfd->pdev->device);
-		goto device_iommu_pasid_error;
+	if (kfd_iommu_device_init(kfd)) {
+		dev_err(kfd_device, "Error initializing iommuv2\n");
+		goto device_iommu_error;
 	}
 
 	kfd_cwsr_init(kfd);
@@ -486,7 +430,7 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,
 	goto out;
 
 kfd_resume_error:
-device_iommu_pasid_error:
+device_iommu_error:
 	device_queue_manager_uninit(kfd->dqm);
 device_queue_manager_error:
 	kfd_interrupt_exit(kfd);
@@ -527,11 +471,7 @@ void kgd2kfd_suspend(struct kfd_dev *kfd)
 
 	kfd->dqm->ops.stop(kfd->dqm);
 
-	kfd_unbind_processes_from_device(kfd);
-
-	amd_iommu_set_invalidate_ctx_cb(kfd->pdev, NULL);
-	amd_iommu_set_invalid_ppr_cb(kfd->pdev, NULL);
-	amd_iommu_free_device(kfd->pdev);
+	kfd_iommu_suspend(kfd);
 }
 
 int kgd2kfd_resume(struct kfd_dev *kfd)
@@ -546,19 +486,14 @@ int kgd2kfd_resume(struct kfd_dev *kfd)
 static int kfd_resume(struct kfd_dev *kfd)
 {
 	int err = 0;
-	unsigned int pasid_limit = kfd_get_pasid_limit();
-
-	err = amd_iommu_init_device(kfd->pdev, pasid_limit);
-	if (err)
-		return -ENXIO;
-	amd_iommu_set_invalidate_ctx_cb(kfd->pdev,
-					iommu_pasid_shutdown_callback);
-	amd_iommu_set_invalid_ppr_cb(kfd->pdev,
-				     iommu_invalid_ppr_cb);
 
-	err = kfd_bind_processes_to_device(kfd);
-	if (err)
-		goto processes_bind_error;
+	err = kfd_iommu_resume(kfd);
+	if (err) {
+		dev_err(kfd_device,
+			"Failed to resume IOMMU for device %x:%x\n",
+			kfd->pdev->vendor, kfd->pdev->device);
+		return err;
+	}
 
 	err = kfd->dqm->ops.start(kfd->dqm);
 	if (err) {
@@ -571,9 +506,7 @@ static int kfd_resume(struct kfd_dev *kfd)
 	return err;
 
 dqm_start_error:
-processes_bind_error:
-	amd_iommu_free_device(kfd->pdev);
-
+	kfd_iommu_suspend(kfd);
 	return err;
 }
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
index 93aae5c..6fb9c0d 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
@@ -30,6 +30,7 @@
 #include <linux/memory.h>
 #include "kfd_priv.h"
 #include "kfd_events.h"
+#include "kfd_iommu.h"
 #include <linux/device.h>
 
 /*
@@ -837,6 +838,7 @@ static void lookup_events_by_type_and_signal(struct kfd_process *p,
 	}
 }
 
+#ifdef KFD_SUPPORT_IOMMU_V2
 void kfd_signal_iommu_event(struct kfd_dev *dev, unsigned int pasid,
 		unsigned long address, bool is_write_requested,
 		bool is_execute_requested)
@@ -905,6 +907,7 @@ void kfd_signal_iommu_event(struct kfd_dev *dev, unsigned int pasid,
 	mutex_unlock(&p->event_mutex);
 	kfd_unref_process(p);
 }
+#endif /* KFD_SUPPORT_IOMMU_V2 */
 
 void kfd_signal_hw_exception_event(unsigned int pasid)
 {
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_iommu.c b/drivers/gpu/drm/amd/amdkfd/kfd_iommu.c
new file mode 100644
index 0000000..81dee34
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_iommu.c
@@ -0,0 +1,356 @@
+/*
+ * Copyright 2018 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#include <linux/printk.h>
+#include <linux/device.h>
+#include <linux/slab.h>
+#include <linux/pci.h>
+#include <linux/amd-iommu.h>
+#include "kfd_priv.h"
+#include "kfd_dbgmgr.h"
+#include "kfd_topology.h"
+#include "kfd_iommu.h"
+
+static const u32 required_iommu_flags = AMD_IOMMU_DEVICE_FLAG_ATS_SUP |
+					AMD_IOMMU_DEVICE_FLAG_PRI_SUP |
+					AMD_IOMMU_DEVICE_FLAG_PASID_SUP;
+
+/** kfd_iommu_check_device - Check whether IOMMU is available for device
+ */
+int kfd_iommu_check_device(struct kfd_dev *kfd)
+{
+	struct amd_iommu_device_info iommu_info;
+	int err;
+
+	if (!kfd->device_info->needs_iommu_device)
+		return -ENODEV;
+
+	iommu_info.flags = 0;
+	err = amd_iommu_device_info(kfd->pdev, &iommu_info);
+	if (err)
+		return err;
+
+	if ((iommu_info.flags & required_iommu_flags) != required_iommu_flags)
+		return -ENODEV;
+
+	return 0;
+}
+
+/** kfd_iommu_device_init - Initialize IOMMU for device
+ */
+int kfd_iommu_device_init(struct kfd_dev *kfd)
+{
+	struct amd_iommu_device_info iommu_info;
+	unsigned int pasid_limit;
+	int err;
+
+	if (!kfd->device_info->needs_iommu_device)
+		return 0;
+
+	iommu_info.flags = 0;
+	err = amd_iommu_device_info(kfd->pdev, &iommu_info);
+	if (err < 0) {
+		dev_err(kfd_device,
+			"error getting iommu info. is the iommu enabled?\n");
+		return -ENODEV;
+	}
+
+	if ((iommu_info.flags & required_iommu_flags) != required_iommu_flags) {
+		dev_err(kfd_device, "error required iommu flags ats %i, pri %i, pasid %i\n",
+		       (iommu_info.flags & AMD_IOMMU_DEVICE_FLAG_ATS_SUP) != 0,
+		       (iommu_info.flags & AMD_IOMMU_DEVICE_FLAG_PRI_SUP) != 0,
+		       (iommu_info.flags & AMD_IOMMU_DEVICE_FLAG_PASID_SUP)
+									!= 0);
+		return -ENODEV;
+	}
+
+	pasid_limit = min_t(unsigned int,
+			(unsigned int)(1 << kfd->device_info->max_pasid_bits),
+			iommu_info.max_pasids);
+
+	if (!kfd_set_pasid_limit(pasid_limit)) {
+		dev_err(kfd_device, "error setting pasid limit\n");
+		return -EBUSY;
+	}
+
+	return 0;
+}
+
+/** kfd_iommu_bind_process_to_device - Have the IOMMU bind a process
+ *
+ * Binds the given process to the given device using its PASID. This
+ * enables IOMMUv2 address translation for the process on the device.
+ *
+ * This function assumes that the process mutex is held.
+ */
+int kfd_iommu_bind_process_to_device(struct kfd_process_device *pdd)
+{
+	struct kfd_dev *dev = pdd->dev;
+	struct kfd_process *p = pdd->process;
+	int err;
+
+	if (!dev->device_info->needs_iommu_device || pdd->bound == PDD_BOUND)
+		return 0;
+
+	if (unlikely(pdd->bound == PDD_BOUND_SUSPENDED)) {
+		pr_err("Binding PDD_BOUND_SUSPENDED pdd is unexpected!\n");
+		return -EINVAL;
+	}
+
+	err = amd_iommu_bind_pasid(dev->pdev, p->pasid, p->lead_thread);
+	if (!err)
+		pdd->bound = PDD_BOUND;
+
+	return err;
+}
+
+/** kfd_iommu_unbind_process - Unbind process from all devices
+ *
+ * This removes all IOMMU device bindings of the process. To be used
+ * before process termination.
+ */
+void kfd_iommu_unbind_process(struct kfd_process *p)
+{
+	struct kfd_process_device *pdd;
+
+	list_for_each_entry(pdd, &p->per_device_data, per_device_list)
+		if (pdd->bound == PDD_BOUND)
+			amd_iommu_unbind_pasid(pdd->dev->pdev, p->pasid);
+}
+
+/* Callback for process shutdown invoked by the IOMMU driver */
+static void iommu_pasid_shutdown_callback(struct pci_dev *pdev, int pasid)
+{
+	struct kfd_dev *dev = kfd_device_by_pci_dev(pdev);
+	struct kfd_process *p;
+	struct kfd_process_device *pdd;
+
+	if (!dev)
+		return;
+
+	/*
+	 * Look for the process that matches the pasid. If there is no such
+	 * process, we either released it in amdkfd's own notifier, or there
+	 * is a bug. Unfortunately, there is no way to tell...
+	 */
+	p = kfd_lookup_process_by_pasid(pasid);
+	if (!p)
+		return;
+
+	pr_debug("Unbinding process %d from IOMMU\n", pasid);
+
+	mutex_lock(kfd_get_dbgmgr_mutex());
+
+	if (dev->dbgmgr && dev->dbgmgr->pasid == p->pasid) {
+		if (!kfd_dbgmgr_unregister(dev->dbgmgr, p)) {
+			kfd_dbgmgr_destroy(dev->dbgmgr);
+			dev->dbgmgr = NULL;
+		}
+	}
+
+	mutex_unlock(kfd_get_dbgmgr_mutex());
+
+	mutex_lock(&p->mutex);
+
+	pdd = kfd_get_process_device_data(dev, p);
+	if (pdd)
+		/* For GPUs relying on the IOMMU, we need to dequeue here
+		 * while the PASID is still bound.
+		 */
+		kfd_process_dequeue_from_device(pdd);
+
+	mutex_unlock(&p->mutex);
+
+	kfd_unref_process(p);
+}
+
+/* This function is called by the IOMMU driver on PPR failure */
+static int iommu_invalid_ppr_cb(struct pci_dev *pdev, int pasid,
+		unsigned long address, u16 flags)
+{
+	struct kfd_dev *dev;
+
+	dev_warn(kfd_device,
+			"Invalid PPR device %x:%x.%x pasid %d address 0x%lX flags 0x%X",
+			PCI_BUS_NUM(pdev->devfn),
+			PCI_SLOT(pdev->devfn),
+			PCI_FUNC(pdev->devfn),
+			pasid,
+			address,
+			flags);
+
+	dev = kfd_device_by_pci_dev(pdev);
+	if (!WARN_ON(!dev))
+		kfd_signal_iommu_event(dev, pasid, address,
+			flags & PPR_FAULT_WRITE, flags & PPR_FAULT_EXEC);
+
+	return AMD_IOMMU_INV_PRI_RSP_INVALID;
+}
+
+/*
+ * Bind processes to the device that have been temporarily unbound
+ * (PDD_BOUND_SUSPENDED) in kfd_unbind_processes_from_device.
+ */
+static int kfd_bind_processes_to_device(struct kfd_dev *kfd)
+{
+	struct kfd_process_device *pdd;
+	struct kfd_process *p;
+	unsigned int temp;
+	int err = 0;
+
+	int idx = srcu_read_lock(&kfd_processes_srcu);
+
+	hash_for_each_rcu(kfd_processes_table, temp, p, kfd_processes) {
+		mutex_lock(&p->mutex);
+		pdd = kfd_get_process_device_data(kfd, p);
+
+		if (WARN_ON(!pdd) || pdd->bound != PDD_BOUND_SUSPENDED) {
+			mutex_unlock(&p->mutex);
+			continue;
+		}
+
+		err = amd_iommu_bind_pasid(kfd->pdev, p->pasid,
+				p->lead_thread);
+		if (err < 0) {
+			pr_err("Unexpected pasid %d binding failure\n",
+					p->pasid);
+			mutex_unlock(&p->mutex);
+			break;
+		}
+
+		pdd->bound = PDD_BOUND;
+		mutex_unlock(&p->mutex);
+	}
+
+	srcu_read_unlock(&kfd_processes_srcu, idx);
+
+	return err;
+}
+
+/*
+ * Mark currently bound processes as PDD_BOUND_SUSPENDED. These
+ * processes will be restored to PDD_BOUND state in
+ * kfd_bind_processes_to_device.
+ */
+static void kfd_unbind_processes_from_device(struct kfd_dev *kfd)
+{
+	struct kfd_process_device *pdd;
+	struct kfd_process *p;
+	unsigned int temp;
+
+	int idx = srcu_read_lock(&kfd_processes_srcu);
+
+	hash_for_each_rcu(kfd_processes_table, temp, p, kfd_processes) {
+		mutex_lock(&p->mutex);
+		pdd = kfd_get_process_device_data(kfd, p);
+
+		if (WARN_ON(!pdd)) {
+			mutex_unlock(&p->mutex);
+			continue;
+		}
+
+		if (pdd->bound == PDD_BOUND)
+			pdd->bound = PDD_BOUND_SUSPENDED;
+		mutex_unlock(&p->mutex);
+	}
+
+	srcu_read_unlock(&kfd_processes_srcu, idx);
+}
+
+/** kfd_iommu_suspend - Prepare IOMMU for suspend
+ *
+ * This unbinds processes from the device and disables the IOMMU for
+ * the device.
+ */
+void kfd_iommu_suspend(struct kfd_dev *kfd)
+{
+	if (!kfd->device_info->needs_iommu_device)
+		return;
+
+	kfd_unbind_processes_from_device(kfd);
+
+	amd_iommu_set_invalidate_ctx_cb(kfd->pdev, NULL);
+	amd_iommu_set_invalid_ppr_cb(kfd->pdev, NULL);
+	amd_iommu_free_device(kfd->pdev);
+}
+
+/** kfd_iommu_resume - Restore IOMMU after resume
+ *
+ * This reinitializes the IOMMU for the device and re-binds previously
+ * suspended processes to the device.
+ */
+int kfd_iommu_resume(struct kfd_dev *kfd)
+{
+	unsigned int pasid_limit;
+	int err;
+
+	if (!kfd->device_info->needs_iommu_device)
+		return 0;
+
+	pasid_limit = kfd_get_pasid_limit();
+
+	err = amd_iommu_init_device(kfd->pdev, pasid_limit);
+	if (err)
+		return -ENXIO;
+
+	amd_iommu_set_invalidate_ctx_cb(kfd->pdev,
+					iommu_pasid_shutdown_callback);
+	amd_iommu_set_invalid_ppr_cb(kfd->pdev,
+				     iommu_invalid_ppr_cb);
+
+	err = kfd_bind_processes_to_device(kfd);
+	if (err) {
+		amd_iommu_set_invalidate_ctx_cb(kfd->pdev, NULL);
+		amd_iommu_set_invalid_ppr_cb(kfd->pdev, NULL);
+		amd_iommu_free_device(kfd->pdev);
+		return err;
+	}
+
+	return 0;
+}
+
+extern bool amd_iommu_pc_supported(void);
+extern u8 amd_iommu_pc_get_max_banks(u16 devid);
+extern u8 amd_iommu_pc_get_max_counters(u16 devid);
+
+/** kfd_iommu_add_perf_counters - Add IOMMU performance counters to topology
+ */
+int kfd_iommu_add_perf_counters(struct kfd_topology_device *kdev)
+{
+	struct kfd_perf_properties *props;
+
+	if (!(kdev->node_props.capability & HSA_CAP_ATS_PRESENT))
+		return 0;
+
+	if (!amd_iommu_pc_supported())
+		return 0;
+
+	props = kfd_alloc_struct(props);
+	if (!props)
+		return -ENOMEM;
+	strcpy(props->block_name, "iommu");
+	props->max_concurrent = amd_iommu_pc_get_max_banks(0) *
+		amd_iommu_pc_get_max_counters(0); /* assume one iommu */
+	list_add_tail(&props->list, &kdev->perf_props);
+
+	return 0;
+}
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_iommu.h b/drivers/gpu/drm/amd/amdkfd/kfd_iommu.h
new file mode 100644
index 0000000..dd23d9f
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_iommu.h
@@ -0,0 +1,78 @@
+/*
+ * Copyright 2018 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#ifndef __KFD_IOMMU_H__
+#define __KFD_IOMMU_H__
+
+#if defined(CONFIG_AMD_IOMMU_V2_MODULE) || defined(CONFIG_AMD_IOMMU_V2)
+
+#define KFD_SUPPORT_IOMMU_V2
+
+int kfd_iommu_check_device(struct kfd_dev *kfd);
+int kfd_iommu_device_init(struct kfd_dev *kfd);
+
+int kfd_iommu_bind_process_to_device(struct kfd_process_device *pdd);
+void kfd_iommu_unbind_process(struct kfd_process *p);
+
+void kfd_iommu_suspend(struct kfd_dev *kfd);
+int kfd_iommu_resume(struct kfd_dev *kfd);
+
+int kfd_iommu_add_perf_counters(struct kfd_topology_device *kdev);
+
+#else
+
+static inline int kfd_iommu_check_device(struct kfd_dev *kfd)
+{
+	return -ENODEV;
+}
+static inline int kfd_iommu_device_init(struct kfd_dev *kfd)
+{
+	return 0;
+}
+
+static inline int kfd_iommu_bind_process_to_device(
+	struct kfd_process_device *pdd)
+{
+	return 0;
+}
+static inline void kfd_iommu_unbind_process(struct kfd_process *p)
+{
+	/* empty */
+}
+
+static inline void kfd_iommu_suspend(struct kfd_dev *kfd)
+{
+	/* empty */
+}
+static inline int kfd_iommu_resume(struct kfd_dev *kfd)
+{
+	return 0;
+}
+
+static inline int kfd_iommu_add_perf_counters(struct kfd_topology_device *kdev)
+{
+	return 0;
+}
+
+#endif /* defined(CONFIG_AMD_IOMMU_V2_MODULE) || defined(CONFIG_AMD_IOMMU_V2) */
+
+#endif /* __KFD_IOMMU_H__ */
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 594f853..f12eb5d 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -158,6 +158,7 @@ struct kfd_device_info {
 	uint8_t num_of_watch_points;
 	uint16_t mqd_size_aligned;
 	bool supports_cwsr;
+	bool needs_iommu_device;
 	bool needs_pci_atomics;
 };
 
@@ -517,15 +518,15 @@ struct kfd_process_device {
 	uint64_t scratch_base;
 	uint64_t scratch_limit;
 
-	/* Is this process/pasid bound to this device? (amd_iommu_bind_pasid) */
-	enum kfd_pdd_bound bound;
-
 	/* Flag used to tell the pdd has dequeued from the dqm.
 	 * This is used to prevent dev->dqm->ops.process_termination() from
 	 * being called twice when it is already called in IOMMU callback
 	 * function.
 	 */
 	bool already_dequeued;
+
+	/* Is this process/pasid bound to this device? (amd_iommu_bind_pasid) */
+	enum kfd_pdd_bound bound;
 };
 
 #define qpd_to_pdd(x) container_of(x, struct kfd_process_device, qpd)
@@ -590,6 +591,10 @@ struct kfd_process {
 	bool signal_event_limit_reached;
 };
 
+#define KFD_PROCESS_TABLE_SIZE 5 /* bits: 32 entries */
+extern DECLARE_HASHTABLE(kfd_processes_table, KFD_PROCESS_TABLE_SIZE);
+extern struct srcu_struct kfd_processes_srcu;
+
 /**
  * Ioctl function type.
  *
@@ -617,9 +622,6 @@ void kfd_unref_process(struct kfd_process *p);
 
 struct kfd_process_device *kfd_bind_process_to_device(struct kfd_dev *dev,
 						struct kfd_process *p);
-int kfd_bind_processes_to_device(struct kfd_dev *dev);
-void kfd_unbind_processes_from_device(struct kfd_dev *dev);
-void kfd_process_iommu_unbind_callback(struct kfd_dev *dev, unsigned int pasid);
 struct kfd_process_device *kfd_get_process_device_data(struct kfd_dev *dev,
 							struct kfd_process *p);
 struct kfd_process_device *kfd_create_process_device_data(struct kfd_dev *dev,
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index 4ff5f0f..e9aee76 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -35,16 +35,16 @@ struct mm_struct;
 
 #include "kfd_priv.h"
 #include "kfd_dbgmgr.h"
+#include "kfd_iommu.h"
 
 /*
  * List of struct kfd_process (field kfd_process).
  * Unique/indexed by mm_struct*
  */
-#define KFD_PROCESS_TABLE_SIZE 5 /* bits: 32 entries */
-static DEFINE_HASHTABLE(kfd_processes_table, KFD_PROCESS_TABLE_SIZE);
+DEFINE_HASHTABLE(kfd_processes_table, KFD_PROCESS_TABLE_SIZE);
 static DEFINE_MUTEX(kfd_processes_mutex);
 
-DEFINE_STATIC_SRCU(kfd_processes_srcu);
+DEFINE_SRCU(kfd_processes_srcu);
 
 static struct workqueue_struct *kfd_process_wq;
 
@@ -173,14 +173,8 @@ static void kfd_process_wq_release(struct work_struct *work)
 {
 	struct kfd_process *p = container_of(work, struct kfd_process,
 					     release_work);
-	struct kfd_process_device *pdd;
 
-	pr_debug("Releasing process (pasid %d) in workqueue\n", p->pasid);
-
-	list_for_each_entry(pdd, &p->per_device_data, per_device_list) {
-		if (pdd->bound == PDD_BOUND)
-			amd_iommu_unbind_pasid(pdd->dev->pdev, p->pasid);
-	}
+	kfd_iommu_unbind_process(p);
 
 	kfd_process_destroy_pdds(p);
 
@@ -429,133 +423,13 @@ struct kfd_process_device *kfd_bind_process_to_device(struct kfd_dev *dev,
 		return ERR_PTR(-ENOMEM);
 	}
 
-	if (pdd->bound == PDD_BOUND) {
-		return pdd;
-	} else if (unlikely(pdd->bound == PDD_BOUND_SUSPENDED)) {
-		pr_err("Binding PDD_BOUND_SUSPENDED pdd is unexpected!\n");
-		return ERR_PTR(-EINVAL);
-	}
-
-	err = amd_iommu_bind_pasid(dev->pdev, p->pasid, p->lead_thread);
-	if (err < 0)
+	err = kfd_iommu_bind_process_to_device(pdd);
+	if (err)
 		return ERR_PTR(err);
 
-	pdd->bound = PDD_BOUND;
-
 	return pdd;
 }
 
-/*
- * Bind processes do the device that have been temporarily unbound
- * (PDD_BOUND_SUSPENDED) in kfd_unbind_processes_from_device.
- */
-int kfd_bind_processes_to_device(struct kfd_dev *dev)
-{
-	struct kfd_process_device *pdd;
-	struct kfd_process *p;
-	unsigned int temp;
-	int err = 0;
-
-	int idx = srcu_read_lock(&kfd_processes_srcu);
-
-	hash_for_each_rcu(kfd_processes_table, temp, p, kfd_processes) {
-		mutex_lock(&p->mutex);
-		pdd = kfd_get_process_device_data(dev, p);
-
-		if (WARN_ON(!pdd) || pdd->bound != PDD_BOUND_SUSPENDED) {
-			mutex_unlock(&p->mutex);
-			continue;
-		}
-
-		err = amd_iommu_bind_pasid(dev->pdev, p->pasid,
-				p->lead_thread);
-		if (err < 0) {
-			pr_err("Unexpected pasid %d binding failure\n",
-					p->pasid);
-			mutex_unlock(&p->mutex);
-			break;
-		}
-
-		pdd->bound = PDD_BOUND;
-		mutex_unlock(&p->mutex);
-	}
-
-	srcu_read_unlock(&kfd_processes_srcu, idx);
-
-	return err;
-}
-
-/*
- * Mark currently bound processes as PDD_BOUND_SUSPENDED. These
- * processes will be restored to PDD_BOUND state in
- * kfd_bind_processes_to_device.
- */
-void kfd_unbind_processes_from_device(struct kfd_dev *dev)
-{
-	struct kfd_process_device *pdd;
-	struct kfd_process *p;
-	unsigned int temp;
-
-	int idx = srcu_read_lock(&kfd_processes_srcu);
-
-	hash_for_each_rcu(kfd_processes_table, temp, p, kfd_processes) {
-		mutex_lock(&p->mutex);
-		pdd = kfd_get_process_device_data(dev, p);
-
-		if (WARN_ON(!pdd)) {
-			mutex_unlock(&p->mutex);
-			continue;
-		}
-
-		if (pdd->bound == PDD_BOUND)
-			pdd->bound = PDD_BOUND_SUSPENDED;
-		mutex_unlock(&p->mutex);
-	}
-
-	srcu_read_unlock(&kfd_processes_srcu, idx);
-}
-
-void kfd_process_iommu_unbind_callback(struct kfd_dev *dev, unsigned int pasid)
-{
-	struct kfd_process *p;
-	struct kfd_process_device *pdd;
-
-	/*
-	 * Look for the process that matches the pasid. If there is no such
-	 * process, we either released it in amdkfd's own notifier, or there
-	 * is a bug. Unfortunately, there is no way to tell...
-	 */
-	p = kfd_lookup_process_by_pasid(pasid);
-	if (!p)
-		return;
-
-	pr_debug("Unbinding process %d from IOMMU\n", pasid);
-
-	mutex_lock(kfd_get_dbgmgr_mutex());
-
-	if (dev->dbgmgr && dev->dbgmgr->pasid == p->pasid) {
-		if (!kfd_dbgmgr_unregister(dev->dbgmgr, p)) {
-			kfd_dbgmgr_destroy(dev->dbgmgr);
-			dev->dbgmgr = NULL;
-		}
-	}
-
-	mutex_unlock(kfd_get_dbgmgr_mutex());
-
-	mutex_lock(&p->mutex);
-
-	pdd = kfd_get_process_device_data(dev, p);
-	if (pdd)
-		/* For GPU relying on IOMMU, we need to dequeue here
-		 * when PASID is still bound.
-		 */
-		kfd_process_dequeue_from_device(pdd);
-
-	mutex_unlock(&p->mutex);
-
-	kfd_unref_process(p);
-}
-
 struct kfd_process_device *kfd_get_first_process_device_data(
 						struct kfd_process *p)
 {
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index 7783250..2506155 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -35,6 +35,7 @@
 #include "kfd_crat.h"
 #include "kfd_topology.h"
 #include "kfd_device_queue_manager.h"
+#include "kfd_iommu.h"
 
 /* topology_device_list - Master list of all topology devices */
 static struct list_head topology_device_list;
@@ -875,19 +876,8 @@ static void find_system_memory(const struct dmi_header *dm,
  */
 static int kfd_add_perf_to_topology(struct kfd_topology_device *kdev)
 {
-	struct kfd_perf_properties *props;
-
-	if (amd_iommu_pc_supported()) {
-		props = kfd_alloc_struct(props);
-		if (!props)
-			return -ENOMEM;
-		strcpy(props->block_name, "iommu");
-		props->max_concurrent = amd_iommu_pc_get_max_banks(0) *
-			amd_iommu_pc_get_max_counters(0); /* assume one iommu */
-		list_add_tail(&props->list, &kdev->perf_props);
-	}
-
-	return 0;
+	/* These are the only counters supported so far */
+	return kfd_iommu_add_perf_counters(kdev);
 }
 
 /* kfd_add_non_crat_information - Add information that is not currently
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
index 53fca1f..c0be2be 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
@@ -25,7 +25,7 @@
 
 #include <linux/types.h>
 #include <linux/list.h>
-#include "kfd_priv.h"
+#include "kfd_crat.h"
 
 #define KFD_TOPOLOGY_PUBLIC_NAME_SIZE 128
 
@@ -183,8 +183,4 @@ struct kfd_topology_device *kfd_create_topology_device(
 		struct list_head *device_list);
 void kfd_release_topology_device_list(struct list_head *device_list);
 
-extern bool amd_iommu_pc_supported(void);
-extern u8 amd_iommu_pc_get_max_banks(u16 devid);
-extern u8 amd_iommu_pc_get_max_counters(u16 devid);
-
 #endif /* __KFD_TOPOLOGY_H__ */
-- 
2.7.4


[-- Attachment #3: Type: text/plain, Size: 154 bytes --]

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 71+ messages in thread

* Re: [PATCH 11/25] drm/amdkfd: Centralize IOMMUv2 code and make it conditional
       [not found]             ` <bfe03de8-63fb-efb4-94e5-5eaf4628bfc1-5C7GfCeVMHo@public.gmane.org>
@ 2018-02-08  8:16               ` Christian König
       [not found]                 ` <50866577-97a4-2786-18af-ddb60a435aea-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: Christian König @ 2018-02-08  8:16 UTC (permalink / raw)
  To: Felix Kuehling, christian.koenig-5C7GfCeVMHo,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w


[-- Attachment #1.1: Type: text/plain, Size: 45850 bytes --]

On 2018-02-07 09:51 PM, Felix Kuehling wrote:
> On 2018-02-07 06:20 AM, Christian König wrote:
>> On 2018-02-07 02:32 AM, Felix Kuehling wrote:
>>> dGPUs work without IOMMUv2. Make IOMMUv2 initialization dependent on
>>> ASIC information. Also allow building KFD without IOMMUv2 support.
>>> This is still useful for dGPUs and prepares for enabling KFD on
>>> architectures that don't support AMD IOMMUv2.
>>>
>>> v2:
>>> * Centralize IOMMUv2 code to avoid #ifdefs in too many places
>>>
>>> Signed-off-by: Felix Kuehling <Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
>>> ---
>>>    drivers/gpu/drm/amd/amdkfd/Kconfig        |   2 +-
>>>    drivers/gpu/drm/amd/amdkfd/Makefile       |   4 +
>>>    drivers/gpu/drm/amd/amdkfd/kfd_crat.c     |  14 +-
>>>    drivers/gpu/drm/amd/amdkfd/kfd_device.c   | 127 +++--------
>>>    drivers/gpu/drm/amd/amdkfd/kfd_events.c   |   3 +
>>>    drivers/gpu/drm/amd/amdkfd/kfd_iommu.c    | 356
>>> ++++++++++++++++++++++++++++++
>>>    drivers/gpu/drm/amd/amdkfd/kfd_iommu.h    |  78 +++++++
>>>    drivers/gpu/drm/amd/amdkfd/kfd_priv.h     |  14 +-
>>>    drivers/gpu/drm/amd/amdkfd/kfd_process.c  | 138 +-----------
>>>    drivers/gpu/drm/amd/amdkfd/kfd_topology.c |  16 +-
>>>    drivers/gpu/drm/amd/amdkfd/kfd_topology.h |   6 +-
>>>    11 files changed, 493 insertions(+), 265 deletions(-)
>>>    create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_iommu.c
>>>    create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_iommu.h
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/Kconfig
>>> b/drivers/gpu/drm/amd/amdkfd/Kconfig
>>> index bc5a294..5bbeb95 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/Kconfig
>>> +++ b/drivers/gpu/drm/amd/amdkfd/Kconfig
>>> @@ -4,6 +4,6 @@
>>>      config HSA_AMD
>>>        tristate "HSA kernel driver for AMD GPU devices"
>>> -    depends on DRM_AMDGPU && AMD_IOMMU_V2 && X86_64
>>> +    depends on DRM_AMDGPU && X86_64
>> You still need a weak dependency on AMD_IOMMU_V2 here, in other words
>> add "imply AMD_IOMMU_V2".
>>
>> This prevents illegal combinations like linking amdkfd into the kernel
>> while amd_iommu_v2 is a module.
>>
>> But it should still allow to completely disable amd_iommu_v2 and
>> compile amdkfd without support for it.
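>>
>> Something like the following (untested sketch, just to illustrate the
>> idea; the rest of the entry stays as in your patch):
>>
>>     config HSA_AMD
>>         tristate "HSA kernel driver for AMD GPU devices"
>>         depends on DRM_AMDGPU && X86_64
>>         imply AMD_IOMMU_V2
>>         help
>>           Enable this if you want to use HSA features on AMD GPU devices.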
> Thanks, that's good to know. An updated patch is attached (to avoid
> resending the whole series).

Patch is Acked-by: Christian König <christian.koenig-5C7GfCeVMHo@public.gmane.org>.

Regards,
Christian.

>
> Regards,
>    Felix
>
>> Christian.
>>
>>>        help
>>>          Enable this if you want to use HSA features on AMD GPU devices.
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/Makefile
>>> b/drivers/gpu/drm/amd/amdkfd/Makefile
>>> index a317e76..0d02422 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/Makefile
>>> +++ b/drivers/gpu/drm/amd/amdkfd/Makefile
>>> @@ -37,6 +37,10 @@ amdkfd-y    := kfd_module.o kfd_device.o
>>> kfd_chardev.o kfd_topology.o \
>>>            kfd_interrupt.o kfd_events.o cik_event_interrupt.o \
>>>            kfd_dbgdev.o kfd_dbgmgr.o kfd_crat.o
>>>    +ifneq ($(CONFIG_AMD_IOMMU_V2),)
>>> +amdkfd-y += kfd_iommu.o
>>> +endif
>>> +
>>>    amdkfd-$(CONFIG_DEBUG_FS) += kfd_debugfs.o
>>>      obj-$(CONFIG_HSA_AMD)    += amdkfd.o
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
>>> b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
>>> index 2bc2816..7493f47 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
>>> @@ -22,10 +22,10 @@
>>>      #include <linux/pci.h>
>>>    #include <linux/acpi.h>
>>> -#include <linux/amd-iommu.h>
>>>    #include "kfd_crat.h"
>>>    #include "kfd_priv.h"
>>>    #include "kfd_topology.h"
>>> +#include "kfd_iommu.h"
>>>      /* GPU Processor ID base for dGPUs for which VCRAT needs to be
>>> created.
>>>     * GPU processor ID are expressed with Bit[31]=1.
>>> @@ -1037,15 +1037,11 @@ static int kfd_create_vcrat_image_gpu(void
>>> *pcrat_image,
>>>        struct crat_subtype_generic *sub_type_hdr;
>>>        struct crat_subtype_computeunit *cu;
>>>        struct kfd_cu_info cu_info;
>>> -    struct amd_iommu_device_info iommu_info;
>>>        int avail_size = *size;
>>>        uint32_t total_num_of_cu;
>>>        int num_of_cache_entries = 0;
>>>        int cache_mem_filled = 0;
>>>        int ret = 0;
>>> -    const u32 required_iommu_flags = AMD_IOMMU_DEVICE_FLAG_ATS_SUP |
>>> -                     AMD_IOMMU_DEVICE_FLAG_PRI_SUP |
>>> -                     AMD_IOMMU_DEVICE_FLAG_PASID_SUP;
>>>        struct kfd_local_mem_info local_mem_info;
>>>          if (!pcrat_image || avail_size < VCRAT_SIZE_FOR_GPU)
>>> @@ -1106,12 +1102,8 @@ static int kfd_create_vcrat_image_gpu(void
>>> *pcrat_image,
>>>        /* Check if this node supports IOMMU. During parsing this flag
>>> will
>>>         * translate to HSA_CAP_ATS_PRESENT
>>>         */
>>> -    iommu_info.flags = 0;
>>> -    if (amd_iommu_device_info(kdev->pdev, &iommu_info) == 0) {
>>> -        if ((iommu_info.flags & required_iommu_flags) ==
>>> -                required_iommu_flags)
>>> -            cu->hsa_capability |= CRAT_CU_FLAGS_IOMMU_PRESENT;
>>> -    }
>>> +    if (!kfd_iommu_check_device(kdev))
>>> +        cu->hsa_capability |= CRAT_CU_FLAGS_IOMMU_PRESENT;
>>>          crat_table->length += sub_type_hdr->length;
>>>        crat_table->total_entries++;
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
>>> b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
>>> index 83d6f41..4ac2d61 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
>>> @@ -20,7 +20,9 @@
>>>     * OTHER DEALINGS IN THE SOFTWARE.
>>>     */
>>>    +#if defined(CONFIG_AMD_IOMMU_V2_MODULE) ||
>>> defined(CONFIG_AMD_IOMMU_V2)
>>>    #include <linux/amd-iommu.h>
>>> +#endif
>>>    #include <linux/bsearch.h>
>>>    #include <linux/pci.h>
>>>    #include <linux/slab.h>
>>> @@ -28,9 +30,11 @@
>>>    #include "kfd_device_queue_manager.h"
>>>    #include "kfd_pm4_headers_vi.h"
>>>    #include "cwsr_trap_handler_gfx8.asm"
>>> +#include "kfd_iommu.h"
>>>      #define MQD_SIZE_ALIGNED 768
>>>    +#ifdef KFD_SUPPORT_IOMMU_V2
>>>    static const struct kfd_device_info kaveri_device_info = {
>>>        .asic_family = CHIP_KAVERI,
>>>        .max_pasid_bits = 16,
>>> @@ -41,6 +45,7 @@ static const struct kfd_device_info
>>> kaveri_device_info = {
>>>        .num_of_watch_points = 4,
>>>        .mqd_size_aligned = MQD_SIZE_ALIGNED,
>>>        .supports_cwsr = false,
>>> +    .needs_iommu_device = true,
>>>        .needs_pci_atomics = false,
>>>    };
>>>    @@ -54,8 +59,10 @@ static const struct kfd_device_info
>>> carrizo_device_info = {
>>>        .num_of_watch_points = 4,
>>>        .mqd_size_aligned = MQD_SIZE_ALIGNED,
>>>        .supports_cwsr = true,
>>> +    .needs_iommu_device = true,
>>>        .needs_pci_atomics = false,
>>>    };
>>> +#endif
>>>      static const struct kfd_device_info hawaii_device_info = {
>>>        .asic_family = CHIP_HAWAII,
>>> @@ -67,6 +74,7 @@ static const struct kfd_device_info
>>> hawaii_device_info = {
>>>        .num_of_watch_points = 4,
>>>        .mqd_size_aligned = MQD_SIZE_ALIGNED,
>>>        .supports_cwsr = false,
>>> +    .needs_iommu_device = false,
>>>        .needs_pci_atomics = false,
>>>    };
>>>    @@ -79,6 +87,7 @@ static const struct kfd_device_info
>>> tonga_device_info = {
>>>        .num_of_watch_points = 4,
>>>        .mqd_size_aligned = MQD_SIZE_ALIGNED,
>>>        .supports_cwsr = false,
>>> +    .needs_iommu_device = false,
>>>        .needs_pci_atomics = true,
>>>    };
>>>    @@ -91,6 +100,7 @@ static const struct kfd_device_info
>>> tonga_vf_device_info = {
>>>        .num_of_watch_points = 4,
>>>        .mqd_size_aligned = MQD_SIZE_ALIGNED,
>>>        .supports_cwsr = false,
>>> +    .needs_iommu_device = false,
>>>        .needs_pci_atomics = false,
>>>    };
>>>    @@ -103,6 +113,7 @@ static const struct kfd_device_info
>>> fiji_device_info = {
>>>        .num_of_watch_points = 4,
>>>        .mqd_size_aligned = MQD_SIZE_ALIGNED,
>>>        .supports_cwsr = true,
>>> +    .needs_iommu_device = false,
>>>        .needs_pci_atomics = true,
>>>    };
>>>    @@ -115,6 +126,7 @@ static const struct kfd_device_info
>>> fiji_vf_device_info = {
>>>        .num_of_watch_points = 4,
>>>        .mqd_size_aligned = MQD_SIZE_ALIGNED,
>>>        .supports_cwsr = true,
>>> +    .needs_iommu_device = false,
>>>        .needs_pci_atomics = false,
>>>    };
>>>    @@ -128,6 +140,7 @@ static const struct kfd_device_info
>>> polaris10_device_info = {
>>>        .num_of_watch_points = 4,
>>>        .mqd_size_aligned = MQD_SIZE_ALIGNED,
>>>        .supports_cwsr = true,
>>> +    .needs_iommu_device = false,
>>>        .needs_pci_atomics = true,
>>>    };
>>>    @@ -140,6 +153,7 @@ static const struct kfd_device_info
>>> polaris10_vf_device_info = {
>>>        .num_of_watch_points = 4,
>>>        .mqd_size_aligned = MQD_SIZE_ALIGNED,
>>>        .supports_cwsr = true,
>>> +    .needs_iommu_device = false,
>>>        .needs_pci_atomics = false,
>>>    };
>>>    @@ -152,6 +166,7 @@ static const struct kfd_device_info
>>> polaris11_device_info = {
>>>        .num_of_watch_points = 4,
>>>        .mqd_size_aligned = MQD_SIZE_ALIGNED,
>>>        .supports_cwsr = true,
>>> +    .needs_iommu_device = false,
>>>        .needs_pci_atomics = true,
>>>    };
>>>    @@ -162,6 +177,7 @@ struct kfd_deviceid {
>>>    };
>>>      static const struct kfd_deviceid supported_devices[] = {
>>> +#ifdef KFD_SUPPORT_IOMMU_V2
>>>        { 0x1304, &kaveri_device_info },    /* Kaveri */
>>>        { 0x1305, &kaveri_device_info },    /* Kaveri */
>>>        { 0x1306, &kaveri_device_info },    /* Kaveri */
>>> @@ -189,6 +205,7 @@ static const struct kfd_deviceid
>>> supported_devices[] = {
>>>        { 0x9875, &carrizo_device_info },    /* Carrizo */
>>>        { 0x9876, &carrizo_device_info },    /* Carrizo */
>>>        { 0x9877, &carrizo_device_info },    /* Carrizo */
>>> +#endif
>>>        { 0x67A0, &hawaii_device_info },    /* Hawaii */
>>>        { 0x67A1, &hawaii_device_info },    /* Hawaii */
>>>        { 0x67A2, &hawaii_device_info },    /* Hawaii */
>>> @@ -302,77 +319,6 @@ struct kfd_dev *kgd2kfd_probe(struct kgd_dev *kgd,
>>>        return kfd;
>>>    }
>>>    -static bool device_iommu_pasid_init(struct kfd_dev *kfd)
>>> -{
>>> -    const u32 required_iommu_flags = AMD_IOMMU_DEVICE_FLAG_ATS_SUP |
>>> -                    AMD_IOMMU_DEVICE_FLAG_PRI_SUP |
>>> -                    AMD_IOMMU_DEVICE_FLAG_PASID_SUP;
>>> -
>>> -    struct amd_iommu_device_info iommu_info;
>>> -    unsigned int pasid_limit;
>>> -    int err;
>>> -
>>> -    err = amd_iommu_device_info(kfd->pdev, &iommu_info);
>>> -    if (err < 0) {
>>> -        dev_err(kfd_device,
>>> -            "error getting iommu info. is the iommu enabled?\n");
>>> -        return false;
>>> -    }
>>> -
>>> -    if ((iommu_info.flags & required_iommu_flags) !=
>>> required_iommu_flags) {
>>> -        dev_err(kfd_device, "error required iommu flags ats %i, pri
>>> %i, pasid %i\n",
>>> -               (iommu_info.flags & AMD_IOMMU_DEVICE_FLAG_ATS_SUP) != 0,
>>> -               (iommu_info.flags & AMD_IOMMU_DEVICE_FLAG_PRI_SUP) != 0,
>>> -               (iommu_info.flags & AMD_IOMMU_DEVICE_FLAG_PASID_SUP)
>>> -                                    != 0);
>>> -        return false;
>>> -    }
>>> -
>>> -    pasid_limit = min_t(unsigned int,
>>> -            (unsigned int)(1 << kfd->device_info->max_pasid_bits),
>>> -            iommu_info.max_pasids);
>>> -
>>> -    if (!kfd_set_pasid_limit(pasid_limit)) {
>>> -        dev_err(kfd_device, "error setting pasid limit\n");
>>> -        return false;
>>> -    }
>>> -
>>> -    return true;
>>> -}
>>> -
>>> -static void iommu_pasid_shutdown_callback(struct pci_dev *pdev, int
>>> pasid)
>>> -{
>>> -    struct kfd_dev *dev = kfd_device_by_pci_dev(pdev);
>>> -
>>> -    if (dev)
>>> -        kfd_process_iommu_unbind_callback(dev, pasid);
>>> -}
>>> -
>>> -/*
>>> - * This function called by IOMMU driver on PPR failure
>>> - */
>>> -static int iommu_invalid_ppr_cb(struct pci_dev *pdev, int pasid,
>>> -        unsigned long address, u16 flags)
>>> -{
>>> -    struct kfd_dev *dev;
>>> -
>>> -    dev_warn(kfd_device,
>>> -            "Invalid PPR device %x:%x.%x pasid %d address 0x%lX
>>> flags 0x%X",
>>> -            PCI_BUS_NUM(pdev->devfn),
>>> -            PCI_SLOT(pdev->devfn),
>>> -            PCI_FUNC(pdev->devfn),
>>> -            pasid,
>>> -            address,
>>> -            flags);
>>> -
>>> -    dev = kfd_device_by_pci_dev(pdev);
>>> -    if (!WARN_ON(!dev))
>>> -        kfd_signal_iommu_event(dev, pasid, address,
>>> -            flags & PPR_FAULT_WRITE, flags & PPR_FAULT_EXEC);
>>> -
>>> -    return AMD_IOMMU_INV_PRI_RSP_INVALID;
>>> -}
>>> -
>>>    static void kfd_cwsr_init(struct kfd_dev *kfd)
>>>    {
>>>        if (cwsr_enable && kfd->device_info->supports_cwsr) {
>>> @@ -462,11 +408,9 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,
>>>            goto device_queue_manager_error;
>>>        }
>>>    -    if (!device_iommu_pasid_init(kfd)) {
>>> -        dev_err(kfd_device,
>>> -            "Error initializing iommuv2 for device %x:%x\n",
>>> -            kfd->pdev->vendor, kfd->pdev->device);
>>> -        goto device_iommu_pasid_error;
>>> +    if (kfd_iommu_device_init(kfd)) {
>>> +        dev_err(kfd_device, "Error initializing iommuv2\n");
>>> +        goto device_iommu_error;
>>>        }
>>>          kfd_cwsr_init(kfd);
>>> @@ -486,7 +430,7 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,
>>>        goto out;
>>>      kfd_resume_error:
>>> -device_iommu_pasid_error:
>>> +device_iommu_error:
>>>        device_queue_manager_uninit(kfd->dqm);
>>>    device_queue_manager_error:
>>>        kfd_interrupt_exit(kfd);
>>> @@ -527,11 +471,7 @@ void kgd2kfd_suspend(struct kfd_dev *kfd)
>>>          kfd->dqm->ops.stop(kfd->dqm);
>>>    -    kfd_unbind_processes_from_device(kfd);
>>> -
>>> -    amd_iommu_set_invalidate_ctx_cb(kfd->pdev, NULL);
>>> -    amd_iommu_set_invalid_ppr_cb(kfd->pdev, NULL);
>>> -    amd_iommu_free_device(kfd->pdev);
>>> +    kfd_iommu_suspend(kfd);
>>>    }
>>>      int kgd2kfd_resume(struct kfd_dev *kfd)
>>> @@ -546,19 +486,14 @@ int kgd2kfd_resume(struct kfd_dev *kfd)
>>>    static int kfd_resume(struct kfd_dev *kfd)
>>>    {
>>>        int err = 0;
>>> -    unsigned int pasid_limit = kfd_get_pasid_limit();
>>> -
>>> -    err = amd_iommu_init_device(kfd->pdev, pasid_limit);
>>> -    if (err)
>>> -        return -ENXIO;
>>> -    amd_iommu_set_invalidate_ctx_cb(kfd->pdev,
>>> -                    iommu_pasid_shutdown_callback);
>>> -    amd_iommu_set_invalid_ppr_cb(kfd->pdev,
>>> -                     iommu_invalid_ppr_cb);
>>>    -    err = kfd_bind_processes_to_device(kfd);
>>> -    if (err)
>>> -        goto processes_bind_error;
>>> +    err = kfd_iommu_resume(kfd);
>>> +    if (err) {
>>> +        dev_err(kfd_device,
>>> +            "Failed to resume IOMMU for device %x:%x\n",
>>> +            kfd->pdev->vendor, kfd->pdev->device);
>>> +        return err;
>>> +    }
>>>          err = kfd->dqm->ops.start(kfd->dqm);
>>>        if (err) {
>>> @@ -571,9 +506,7 @@ static int kfd_resume(struct kfd_dev *kfd)
>>>        return err;
>>>      dqm_start_error:
>>> -processes_bind_error:
>>> -    amd_iommu_free_device(kfd->pdev);
>>> -
>>> +    kfd_iommu_suspend(kfd);
>>>        return err;
>>>    }
>>>    diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
>>> b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
>>> index 93aae5c..6fb9c0d 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
>>> @@ -30,6 +30,7 @@
>>>    #include <linux/memory.h>
>>>    #include "kfd_priv.h"
>>>    #include "kfd_events.h"
>>> +#include "kfd_iommu.h"
>>>    #include <linux/device.h>
>>>      /*
>>> @@ -837,6 +838,7 @@ static void
>>> lookup_events_by_type_and_signal(struct kfd_process *p,
>>>        }
>>>    }
>>>    +#ifdef KFD_SUPPORT_IOMMU_V2
>>>    void kfd_signal_iommu_event(struct kfd_dev *dev, unsigned int pasid,
>>>            unsigned long address, bool is_write_requested,
>>>            bool is_execute_requested)
>>> @@ -905,6 +907,7 @@ void kfd_signal_iommu_event(struct kfd_dev *dev,
>>> unsigned int pasid,
>>>        mutex_unlock(&p->event_mutex);
>>>        kfd_unref_process(p);
>>>    }
>>> +#endif /* KFD_SUPPORT_IOMMU_V2 */
>>>      void kfd_signal_hw_exception_event(unsigned int pasid)
>>>    {
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_iommu.c
>>> b/drivers/gpu/drm/amd/amdkfd/kfd_iommu.c
>>> new file mode 100644
>>> index 0000000..81dee34
>>> --- /dev/null
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_iommu.c
>>> @@ -0,0 +1,356 @@
>>> +/*
>>> + * Copyright 2018 Advanced Micro Devices, Inc.
>>> + *
>>> + * Permission is hereby granted, free of charge, to any person
>>> obtaining a
>>> + * copy of this software and associated documentation files (the
>>> "Software"),
>>> + * to deal in the Software without restriction, including without
>>> limitation
>>> + * the rights to use, copy, modify, merge, publish, distribute,
>>> sublicense,
>>> + * and/or sell copies of the Software, and to permit persons to whom
>>> the
>>> + * Software is furnished to do so, subject to the following conditions:
>>> + *
>>> + * The above copyright notice and this permission notice shall be
>>> included in
>>> + * all copies or substantial portions of the Software.
>>> + *
>>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
>>> EXPRESS OR
>>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
>>> MERCHANTABILITY,
>>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO
>>> EVENT SHALL
>>> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM,
>>> DAMAGES OR
>>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
>>> OTHERWISE,
>>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
>>> USE OR
>>> + * OTHER DEALINGS IN THE SOFTWARE.
>>> + */
>>> +
>>> +#include <linux/printk.h>
>>> +#include <linux/device.h>
>>> +#include <linux/slab.h>
>>> +#include <linux/pci.h>
>>> +#include <linux/amd-iommu.h>
>>> +#include "kfd_priv.h"
>>> +#include "kfd_dbgmgr.h"
>>> +#include "kfd_topology.h"
>>> +#include "kfd_iommu.h"
>>> +
>>> +static const u32 required_iommu_flags = AMD_IOMMU_DEVICE_FLAG_ATS_SUP |
>>> +                    AMD_IOMMU_DEVICE_FLAG_PRI_SUP |
>>> +                    AMD_IOMMU_DEVICE_FLAG_PASID_SUP;
>>> +
>>> +/** kfd_iommu_check_device - Check whether IOMMU is available for
>>> device
>>> + */
>>> +int kfd_iommu_check_device(struct kfd_dev *kfd)
>>> +{
>>> +    struct amd_iommu_device_info iommu_info;
>>> +    int err;
>>> +
>>> +    if (!kfd->device_info->needs_iommu_device)
>>> +        return -ENODEV;
>>> +
>>> +    iommu_info.flags = 0;
>>> +    err = amd_iommu_device_info(kfd->pdev, &iommu_info);
>>> +    if (err)
>>> +        return err;
>>> +
>>> +    if ((iommu_info.flags & required_iommu_flags) !=
>>> required_iommu_flags)
>>> +        return -ENODEV;
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +/** kfd_iommu_device_init - Initialize IOMMU for device
>>> + */
>>> +int kfd_iommu_device_init(struct kfd_dev *kfd)
>>> +{
>>> +    struct amd_iommu_device_info iommu_info;
>>> +    unsigned int pasid_limit;
>>> +    int err;
>>> +
>>> +    if (!kfd->device_info->needs_iommu_device)
>>> +        return 0;
>>> +
>>> +    iommu_info.flags = 0;
>>> +    err = amd_iommu_device_info(kfd->pdev, &iommu_info);
>>> +    if (err < 0) {
>>> +        dev_err(kfd_device,
>>> +            "error getting iommu info. is the iommu enabled?\n");
>>> +        return -ENODEV;
>>> +    }
>>> +
>>> +    if ((iommu_info.flags & required_iommu_flags) !=
>>> required_iommu_flags) {
>>> +        dev_err(kfd_device, "error required iommu flags ats %i, pri
>>> %i, pasid %i\n",
>>> +               (iommu_info.flags & AMD_IOMMU_DEVICE_FLAG_ATS_SUP) != 0,
>>> +               (iommu_info.flags & AMD_IOMMU_DEVICE_FLAG_PRI_SUP) != 0,
>>> +               (iommu_info.flags & AMD_IOMMU_DEVICE_FLAG_PASID_SUP)
>>> +                                    != 0);
>>> +        return -ENODEV;
>>> +    }
>>> +
>>> +    pasid_limit = min_t(unsigned int,
>>> +            (unsigned int)(1 << kfd->device_info->max_pasid_bits),
>>> +            iommu_info.max_pasids);
>>> +
>>> +    if (!kfd_set_pasid_limit(pasid_limit)) {
>>> +        dev_err(kfd_device, "error setting pasid limit\n");
>>> +        return -EBUSY;
>>> +    }
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +/** kfd_iommu_bind_process_to_device - Have the IOMMU bind a process
>>> + *
>>> + * Binds the given process to the given device using its PASID. This
>>> + * enables IOMMUv2 address translation for the process on the device.
>>> + *
>>> + * This function assumes that the process mutex is held.
>>> + */
>>> +int kfd_iommu_bind_process_to_device(struct kfd_process_device *pdd)
>>> +{
>>> +    struct kfd_dev *dev = pdd->dev;
>>> +    struct kfd_process *p = pdd->process;
>>> +    int err;
>>> +
>>> +    if (!dev->device_info->needs_iommu_device || pdd->bound ==
>>> PDD_BOUND)
>>> +        return 0;
>>> +
>>> +    if (unlikely(pdd->bound == PDD_BOUND_SUSPENDED)) {
>>> +        pr_err("Binding PDD_BOUND_SUSPENDED pdd is unexpected!\n");
>>> +        return -EINVAL;
>>> +    }
>>> +
>>> +    err = amd_iommu_bind_pasid(dev->pdev, p->pasid, p->lead_thread);
>>> +    if (!err)
>>> +        pdd->bound = PDD_BOUND;
>>> +
>>> +    return err;
>>> +}
>>> +
>>> +/** kfd_iommu_unbind_process - Unbind process from all devices
>>> + *
>>> + * This removes all IOMMU device bindings of the process. To be used
>>> + * before process termination.
>>> + */
>>> +void kfd_iommu_unbind_process(struct kfd_process *p)
>>> +{
>>> +    struct kfd_process_device *pdd;
>>> +
>>> +    list_for_each_entry(pdd, &p->per_device_data, per_device_list)
>>> +        if (pdd->bound == PDD_BOUND)
>>> +            amd_iommu_unbind_pasid(pdd->dev->pdev, p->pasid);
>>> +}
>>> +
>>> +/* Callback for process shutdown invoked by the IOMMU driver */
>>> +static void iommu_pasid_shutdown_callback(struct pci_dev *pdev, int
>>> pasid)
>>> +{
>>> +    struct kfd_dev *dev = kfd_device_by_pci_dev(pdev);
>>> +    struct kfd_process *p;
>>> +    struct kfd_process_device *pdd;
>>> +
>>> +    if (!dev)
>>> +        return;
>>> +
>>> +    /*
>>> +     * Look for the process that matches the pasid. If there is no such
>>> +     * process, we either released it in amdkfd's own notifier, or
>>> there
>>> +     * is a bug. Unfortunately, there is no way to tell...
>>> +     */
>>> +    p = kfd_lookup_process_by_pasid(pasid);
>>> +    if (!p)
>>> +        return;
>>> +
>>> +    pr_debug("Unbinding process %d from IOMMU\n", pasid);
>>> +
>>> +    mutex_lock(kfd_get_dbgmgr_mutex());
>>> +
>>> +    if (dev->dbgmgr && dev->dbgmgr->pasid == p->pasid) {
>>> +        if (!kfd_dbgmgr_unregister(dev->dbgmgr, p)) {
>>> +            kfd_dbgmgr_destroy(dev->dbgmgr);
>>> +            dev->dbgmgr = NULL;
>>> +        }
>>> +    }
>>> +
>>> +    mutex_unlock(kfd_get_dbgmgr_mutex());
>>> +
>>> +    mutex_lock(&p->mutex);
>>> +
>>> +    pdd = kfd_get_process_device_data(dev, p);
>>> +    if (pdd)
>>> +        /* For GPU relying on IOMMU, we need to dequeue here
>>> +         * when PASID is still bound.
>>> +         */
>>> +        kfd_process_dequeue_from_device(pdd);
>>> +
>>> +    mutex_unlock(&p->mutex);
>>> +
>>> +    kfd_unref_process(p);
>>> +}
>>> +
>>> +/* This function called by IOMMU driver on PPR failure */
>>> +static int iommu_invalid_ppr_cb(struct pci_dev *pdev, int pasid,
>>> +        unsigned long address, u16 flags)
>>> +{
>>> +    struct kfd_dev *dev;
>>> +
>>> +    dev_warn(kfd_device,
>>> +            "Invalid PPR device %x:%x.%x pasid %d address 0x%lX
>>> flags 0x%X",
>>> +            PCI_BUS_NUM(pdev->devfn),
>>> +            PCI_SLOT(pdev->devfn),
>>> +            PCI_FUNC(pdev->devfn),
>>> +            pasid,
>>> +            address,
>>> +            flags);
>>> +
>>> +    dev = kfd_device_by_pci_dev(pdev);
>>> +    if (!WARN_ON(!dev))
>>> +        kfd_signal_iommu_event(dev, pasid, address,
>>> +            flags & PPR_FAULT_WRITE, flags & PPR_FAULT_EXEC);
>>> +
>>> +    return AMD_IOMMU_INV_PRI_RSP_INVALID;
>>> +}
>>> +
>>> +/*
>>> + * Bind processes do the device that have been temporarily unbound
>>> + * (PDD_BOUND_SUSPENDED) in kfd_unbind_processes_from_device.
>>> + */
>>> +static int kfd_bind_processes_to_device(struct kfd_dev *kfd)
>>> +{
>>> +    struct kfd_process_device *pdd;
>>> +    struct kfd_process *p;
>>> +    unsigned int temp;
>>> +    int err = 0;
>>> +
>>> +    int idx = srcu_read_lock(&kfd_processes_srcu);
>>> +
>>> +    hash_for_each_rcu(kfd_processes_table, temp, p, kfd_processes) {
>>> +        mutex_lock(&p->mutex);
>>> +        pdd = kfd_get_process_device_data(kfd, p);
>>> +
>>> +        if (WARN_ON(!pdd) || pdd->bound != PDD_BOUND_SUSPENDED) {
>>> +            mutex_unlock(&p->mutex);
>>> +            continue;
>>> +        }
>>> +
>>> +        err = amd_iommu_bind_pasid(kfd->pdev, p->pasid,
>>> +                p->lead_thread);
>>> +        if (err < 0) {
>>> +            pr_err("Unexpected pasid %d binding failure\n",
>>> +                    p->pasid);
>>> +            mutex_unlock(&p->mutex);
>>> +            break;
>>> +        }
>>> +
>>> +        pdd->bound = PDD_BOUND;
>>> +        mutex_unlock(&p->mutex);
>>> +    }
>>> +
>>> +    srcu_read_unlock(&kfd_processes_srcu, idx);
>>> +
>>> +    return err;
>>> +}
>>> +
>>> +/*
>>> + * Mark currently bound processes as PDD_BOUND_SUSPENDED. These
>>> + * processes will be restored to PDD_BOUND state in
>>> + * kfd_bind_processes_to_device.
>>> + */
>>> +static void kfd_unbind_processes_from_device(struct kfd_dev *kfd)
>>> +{
>>> +    struct kfd_process_device *pdd;
>>> +    struct kfd_process *p;
>>> +    unsigned int temp;
>>> +
>>> +    int idx = srcu_read_lock(&kfd_processes_srcu);
>>> +
>>> +    hash_for_each_rcu(kfd_processes_table, temp, p, kfd_processes) {
>>> +        mutex_lock(&p->mutex);
>>> +        pdd = kfd_get_process_device_data(kfd, p);
>>> +
>>> +        if (WARN_ON(!pdd)) {
>>> +            mutex_unlock(&p->mutex);
>>> +            continue;
>>> +        }
>>> +
>>> +        if (pdd->bound == PDD_BOUND)
>>> +            pdd->bound = PDD_BOUND_SUSPENDED;
>>> +        mutex_unlock(&p->mutex);
>>> +    }
>>> +
>>> +    srcu_read_unlock(&kfd_processes_srcu, idx);
>>> +}
>>> +
>>> +/** kfd_iommu_suspend - Prepare IOMMU for suspend
>>> + *
>>> + * This unbinds processes from the device and disables the IOMMU for
>>> + * the device.
>>> + */
>>> +void kfd_iommu_suspend(struct kfd_dev *kfd)
>>> +{
>>> +    if (!kfd->device_info->needs_iommu_device)
>>> +        return;
>>> +
>>> +    kfd_unbind_processes_from_device(kfd);
>>> +
>>> +    amd_iommu_set_invalidate_ctx_cb(kfd->pdev, NULL);
>>> +    amd_iommu_set_invalid_ppr_cb(kfd->pdev, NULL);
>>> +    amd_iommu_free_device(kfd->pdev);
>>> +}
>>> +
>>> +/** kfd_iommu_resume - Restore IOMMU after resume
>>> + *
>>> + * This reinitializes the IOMMU for the device and re-binds previously
>>> + * suspended processes to the device.
>>> + */
>>> +int kfd_iommu_resume(struct kfd_dev *kfd)
>>> +{
>>> +    unsigned int pasid_limit;
>>> +    int err;
>>> +
>>> +    if (!kfd->device_info->needs_iommu_device)
>>> +        return 0;
>>> +
>>> +    pasid_limit = kfd_get_pasid_limit();
>>> +
>>> +    err = amd_iommu_init_device(kfd->pdev, pasid_limit);
>>> +    if (err)
>>> +        return -ENXIO;
>>> +
>>> +    amd_iommu_set_invalidate_ctx_cb(kfd->pdev,
>>> +                    iommu_pasid_shutdown_callback);
>>> +    amd_iommu_set_invalid_ppr_cb(kfd->pdev,
>>> +                     iommu_invalid_ppr_cb);
>>> +
>>> +    err = kfd_bind_processes_to_device(kfd);
>>> +    if (err) {
>>> +        amd_iommu_set_invalidate_ctx_cb(kfd->pdev, NULL);
>>> +        amd_iommu_set_invalid_ppr_cb(kfd->pdev, NULL);
>>> +        amd_iommu_free_device(kfd->pdev);
>>> +        return err;
>>> +    }
>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +extern bool amd_iommu_pc_supported(void);
>>> +extern u8 amd_iommu_pc_get_max_banks(u16 devid);
>>> +extern u8 amd_iommu_pc_get_max_counters(u16 devid);
>>> +
>>> +/** kfd_iommu_add_perf_counters - Add IOMMU performance counters to
>>> topology
>>> + */
>>> +int kfd_iommu_add_perf_counters(struct kfd_topology_device *kdev)
>>> +{
>>> +    struct kfd_perf_properties *props;
>>> +
>>> +    if (!(kdev->node_props.capability & HSA_CAP_ATS_PRESENT))
>>> +        return 0;
>>> +
>>> +    if (!amd_iommu_pc_supported())
>>> +        return 0;
>>> +
>>> +    props = kfd_alloc_struct(props);
>>> +    if (!props)
>>> +        return -ENOMEM;
>>> +    strcpy(props->block_name, "iommu");
>>> +    props->max_concurrent = amd_iommu_pc_get_max_banks(0) *
>>> +        amd_iommu_pc_get_max_counters(0); /* assume one iommu */
>>> +    list_add_tail(&props->list, &kdev->perf_props);
>>> +
>>> +    return 0;
>>> +}
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_iommu.h
>>> b/drivers/gpu/drm/amd/amdkfd/kfd_iommu.h
>>> new file mode 100644
>>> index 0000000..dd23d9f
>>> --- /dev/null
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_iommu.h
>>> @@ -0,0 +1,78 @@
>>> +/*
>>> + * Copyright 2018 Advanced Micro Devices, Inc.
>>> + *
>>> + * Permission is hereby granted, free of charge, to any person
>>> obtaining a
>>> + * copy of this software and associated documentation files (the
>>> "Software"),
>>> + * to deal in the Software without restriction, including without
>>> limitation
>>> + * the rights to use, copy, modify, merge, publish, distribute,
>>> sublicense,
>>> + * and/or sell copies of the Software, and to permit persons to whom
>>> the
>>> + * Software is furnished to do so, subject to the following conditions:
>>> + *
>>> + * The above copyright notice and this permission notice shall be
>>> included in
>>> + * all copies or substantial portions of the Software.
>>> + *
>>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
>>> EXPRESS OR
>>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
>>> MERCHANTABILITY,
>>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO
>>> EVENT SHALL
>>> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM,
>>> DAMAGES OR
>>> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
>>> OTHERWISE,
>>> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
>>> USE OR
>>> + * OTHER DEALINGS IN THE SOFTWARE.
>>> + */
>>> +
>>> +#ifndef __KFD_IOMMU_H__
>>> +#define __KFD_IOMMU_H__
>>> +
>>> +#if defined(CONFIG_AMD_IOMMU_V2_MODULE) || defined(CONFIG_AMD_IOMMU_V2)
>>> +
>>> +#define KFD_SUPPORT_IOMMU_V2
>>> +
>>> +int kfd_iommu_check_device(struct kfd_dev *kfd);
>>> +int kfd_iommu_device_init(struct kfd_dev *kfd);
>>> +
>>> +int kfd_iommu_bind_process_to_device(struct kfd_process_device *pdd);
>>> +void kfd_iommu_unbind_process(struct kfd_process *p);
>>> +
>>> +void kfd_iommu_suspend(struct kfd_dev *kfd);
>>> +int kfd_iommu_resume(struct kfd_dev *kfd);
>>> +
>>> +int kfd_iommu_add_perf_counters(struct kfd_topology_device *kdev);
>>> +
>>> +#else
>>> +
>>> +static inline int kfd_iommu_check_device(struct kfd_dev *kfd)
>>> +{
>>> +    return -ENODEV;
>>> +}
>>> +static inline int kfd_iommu_device_init(struct kfd_dev *kfd)
>>> +{
>>> +    return 0;
>>> +}
>>> +
>>> +static inline int kfd_iommu_bind_process_to_device(
>>> +    struct kfd_process_device *pdd)
>>> +{
>>> +    return 0;
>>> +}
>>> +static inline void kfd_iommu_unbind_process(struct kfd_process *p)
>>> +{
>>> +    /* empty */
>>> +}
>>> +
>>> +static inline void kfd_iommu_suspend(struct kfd_dev *kfd)
>>> +{
>>> +    /* empty */
>>> +}
>>> +static inline int kfd_iommu_resume(struct kfd_dev *kfd)
>>> +{
>>> +    return 0;
>>> +}
>>> +
>>> +static inline int kfd_iommu_add_perf_counters(struct
>>> kfd_topology_device *kdev)
>>> +{
>>> +    return 0;
>>> +}
>>> +
>>> +#endif /* defined(CONFIG_AMD_IOMMU_V2) */
>>> +
>>> +#endif /* __KFD_IOMMU_H__ */
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>>> b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>>> index 594f853..f12eb5d 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>>> @@ -158,6 +158,7 @@ struct kfd_device_info {
>>>        uint8_t num_of_watch_points;
>>>        uint16_t mqd_size_aligned;
>>>        bool supports_cwsr;
>>> +    bool needs_iommu_device;
>>>        bool needs_pci_atomics;
>>>    };
>>>    @@ -517,15 +518,15 @@ struct kfd_process_device {
>>>        uint64_t scratch_base;
>>>        uint64_t scratch_limit;
>>>    -    /* Is this process/pasid bound to this device?
>>> (amd_iommu_bind_pasid) */
>>> -    enum kfd_pdd_bound bound;
>>> -
>>>        /* Flag used to tell the pdd has dequeued from the dqm.
>>>         * This is used to prevent dev->dqm->ops.process_termination()
>>> from
>>>         * being called twice when it is already called in IOMMU callback
>>>         * function.
>>>         */
>>>        bool already_dequeued;
>>> +
>>> +    /* Is this process/pasid bound to this device?
>>> (amd_iommu_bind_pasid) */
>>> +    enum kfd_pdd_bound bound;
>>>    };
>>>      #define qpd_to_pdd(x) container_of(x, struct kfd_process_device,
>>> qpd)
>>> @@ -590,6 +591,10 @@ struct kfd_process {
>>>        bool signal_event_limit_reached;
>>>    };
>>>    +#define KFD_PROCESS_TABLE_SIZE 5 /* bits: 32 entries */
>>> +extern DECLARE_HASHTABLE(kfd_processes_table, KFD_PROCESS_TABLE_SIZE);
>>> +extern struct srcu_struct kfd_processes_srcu;
>>> +
>>>    /**
>>>     * Ioctl function type.
>>>     *
>>> @@ -617,9 +622,6 @@ void kfd_unref_process(struct kfd_process *p);
>>>      struct kfd_process_device *kfd_bind_process_to_device(struct
>>> kfd_dev *dev,
>>>                            struct kfd_process *p);
>>> -int kfd_bind_processes_to_device(struct kfd_dev *dev);
>>> -void kfd_unbind_processes_from_device(struct kfd_dev *dev);
>>> -void kfd_process_iommu_unbind_callback(struct kfd_dev *dev, unsigned
>>> int pasid);
>>>    struct kfd_process_device *kfd_get_process_device_data(struct
>>> kfd_dev *dev,
>>>                                struct kfd_process *p);
>>>    struct kfd_process_device *kfd_create_process_device_data(struct
>>> kfd_dev *dev,
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>> b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>> index 4ff5f0f..e9aee76 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>> @@ -35,16 +35,16 @@ struct mm_struct;
>>>      #include "kfd_priv.h"
>>>    #include "kfd_dbgmgr.h"
>>> +#include "kfd_iommu.h"
>>>      /*
>>>     * List of struct kfd_process (field kfd_process).
>>>     * Unique/indexed by mm_struct*
>>>     */
>>> -#define KFD_PROCESS_TABLE_SIZE 5 /* bits: 32 entries */
>>> -static DEFINE_HASHTABLE(kfd_processes_table, KFD_PROCESS_TABLE_SIZE);
>>> +DEFINE_HASHTABLE(kfd_processes_table, KFD_PROCESS_TABLE_SIZE);
>>>    static DEFINE_MUTEX(kfd_processes_mutex);
>>>    -DEFINE_STATIC_SRCU(kfd_processes_srcu);
>>> +DEFINE_SRCU(kfd_processes_srcu);
>>>      static struct workqueue_struct *kfd_process_wq;
>>>    @@ -173,14 +173,8 @@ static void kfd_process_wq_release(struct
>>> work_struct *work)
>>>    {
>>>        struct kfd_process *p = container_of(work, struct kfd_process,
>>>                             release_work);
>>> -    struct kfd_process_device *pdd;
>>>    -    pr_debug("Releasing process (pasid %d) in workqueue\n",
>>> p->pasid);
>>> -
>>> -    list_for_each_entry(pdd, &p->per_device_data, per_device_list) {
>>> -        if (pdd->bound == PDD_BOUND)
>>> -            amd_iommu_unbind_pasid(pdd->dev->pdev, p->pasid);
>>> -    }
>>> +    kfd_iommu_unbind_process(p);
>>>          kfd_process_destroy_pdds(p);
>>>    @@ -429,133 +423,13 @@ struct kfd_process_device
>>> *kfd_bind_process_to_device(struct kfd_dev *dev,
>>>            return ERR_PTR(-ENOMEM);
>>>        }
>>>    -    if (pdd->bound == PDD_BOUND) {
>>> -        return pdd;
>>> -    } else if (unlikely(pdd->bound == PDD_BOUND_SUSPENDED)) {
>>> -        pr_err("Binding PDD_BOUND_SUSPENDED pdd is unexpected!\n");
>>> -        return ERR_PTR(-EINVAL);
>>> -    }
>>> -
>>> -    err = amd_iommu_bind_pasid(dev->pdev, p->pasid, p->lead_thread);
>>> -    if (err < 0)
>>> +    err = kfd_iommu_bind_process_to_device(pdd);
>>> +    if (err)
>>>            return ERR_PTR(err);
>>>    -    pdd->bound = PDD_BOUND;
>>> -
>>>        return pdd;
>>>    }
>>>    -/*
>>> - * Bind processes do the device that have been temporarily unbound
>>> - * (PDD_BOUND_SUSPENDED) in kfd_unbind_processes_from_device.
>>> - */
>>> -int kfd_bind_processes_to_device(struct kfd_dev *dev)
>>> -{
>>> -    struct kfd_process_device *pdd;
>>> -    struct kfd_process *p;
>>> -    unsigned int temp;
>>> -    int err = 0;
>>> -
>>> -    int idx = srcu_read_lock(&kfd_processes_srcu);
>>> -
>>> -    hash_for_each_rcu(kfd_processes_table, temp, p, kfd_processes) {
>>> -        mutex_lock(&p->mutex);
>>> -        pdd = kfd_get_process_device_data(dev, p);
>>> -
>>> -        if (WARN_ON(!pdd) || pdd->bound != PDD_BOUND_SUSPENDED) {
>>> -            mutex_unlock(&p->mutex);
>>> -            continue;
>>> -        }
>>> -
>>> -        err = amd_iommu_bind_pasid(dev->pdev, p->pasid,
>>> -                p->lead_thread);
>>> -        if (err < 0) {
>>> -            pr_err("Unexpected pasid %d binding failure\n",
>>> -                    p->pasid);
>>> -            mutex_unlock(&p->mutex);
>>> -            break;
>>> -        }
>>> -
>>> -        pdd->bound = PDD_BOUND;
>>> -        mutex_unlock(&p->mutex);
>>> -    }
>>> -
>>> -    srcu_read_unlock(&kfd_processes_srcu, idx);
>>> -
>>> -    return err;
>>> -}
>>> -
>>> -/*
>>> - * Mark currently bound processes as PDD_BOUND_SUSPENDED. These
>>> - * processes will be restored to PDD_BOUND state in
>>> - * kfd_bind_processes_to_device.
>>> - */
>>> -void kfd_unbind_processes_from_device(struct kfd_dev *dev)
>>> -{
>>> -    struct kfd_process_device *pdd;
>>> -    struct kfd_process *p;
>>> -    unsigned int temp;
>>> -
>>> -    int idx = srcu_read_lock(&kfd_processes_srcu);
>>> -
>>> -    hash_for_each_rcu(kfd_processes_table, temp, p, kfd_processes) {
>>> -        mutex_lock(&p->mutex);
>>> -        pdd = kfd_get_process_device_data(dev, p);
>>> -
>>> -        if (WARN_ON(!pdd)) {
>>> -            mutex_unlock(&p->mutex);
>>> -            continue;
>>> -        }
>>> -
>>> -        if (pdd->bound == PDD_BOUND)
>>> -            pdd->bound = PDD_BOUND_SUSPENDED;
>>> -        mutex_unlock(&p->mutex);
>>> -    }
>>> -
>>> -    srcu_read_unlock(&kfd_processes_srcu, idx);
>>> -}
>>> -
>>> -void kfd_process_iommu_unbind_callback(struct kfd_dev *dev, unsigned
>>> int pasid)
>>> -{
>>> -    struct kfd_process *p;
>>> -    struct kfd_process_device *pdd;
>>> -
>>> -    /*
>>> -     * Look for the process that matches the pasid. If there is no such
>>> -     * process, we either released it in amdkfd's own notifier, or
>>> there
>>> -     * is a bug. Unfortunately, there is no way to tell...
>>> -     */
>>> -    p = kfd_lookup_process_by_pasid(pasid);
>>> -    if (!p)
>>> -        return;
>>> -
>>> -    pr_debug("Unbinding process %d from IOMMU\n", pasid);
>>> -
>>> -    mutex_lock(kfd_get_dbgmgr_mutex());
>>> -
>>> -    if (dev->dbgmgr && dev->dbgmgr->pasid == p->pasid) {
>>> -        if (!kfd_dbgmgr_unregister(dev->dbgmgr, p)) {
>>> -            kfd_dbgmgr_destroy(dev->dbgmgr);
>>> -            dev->dbgmgr = NULL;
>>> -        }
>>> -    }
>>> -
>>> -    mutex_unlock(kfd_get_dbgmgr_mutex());
>>> -
>>> -    mutex_lock(&p->mutex);
>>> -
>>> -    pdd = kfd_get_process_device_data(dev, p);
>>> -    if (pdd)
>>> -        /* For GPU relying on IOMMU, we need to dequeue here
>>> -         * when PASID is still bound.
>>> -         */
>>> -        kfd_process_dequeue_from_device(pdd);
>>> -
>>> -    mutex_unlock(&p->mutex);
>>> -
>>> -    kfd_unref_process(p);
>>> -}
>>> -
>>>    struct kfd_process_device *kfd_get_first_process_device_data(
>>>                            struct kfd_process *p)
>>>    {
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>>> b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>>> index 7783250..2506155 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>>> @@ -35,6 +35,7 @@
>>>    #include "kfd_crat.h"
>>>    #include "kfd_topology.h"
>>>    #include "kfd_device_queue_manager.h"
>>> +#include "kfd_iommu.h"
>>>      /* topology_device_list - Master list of all topology devices */
>>>    static struct list_head topology_device_list;
>>> @@ -875,19 +876,8 @@ static void find_system_memory(const struct
>>> dmi_header *dm,
>>>     */
>>>    static int kfd_add_perf_to_topology(struct kfd_topology_device *kdev)
>>>    {
>>> -    struct kfd_perf_properties *props;
>>> -
>>> -    if (amd_iommu_pc_supported()) {
>>> -        props = kfd_alloc_struct(props);
>>> -        if (!props)
>>> -            return -ENOMEM;
>>> -        strcpy(props->block_name, "iommu");
>>> -        props->max_concurrent = amd_iommu_pc_get_max_banks(0) *
>>> -            amd_iommu_pc_get_max_counters(0); /* assume one iommu */
>>> -        list_add_tail(&props->list, &kdev->perf_props);
>>> -    }
>>> -
>>> -    return 0;
>>> +    /* These are the only counters supported so far */
>>> +    return kfd_iommu_add_perf_counters(kdev);
>>>    }
>>>      /* kfd_add_non_crat_information - Add information that is not
>>> currently
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>>> b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>>> index 53fca1f..c0be2be 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>>> @@ -25,7 +25,7 @@
>>>      #include <linux/types.h>
>>>    #include <linux/list.h>
>>> -#include "kfd_priv.h"
>>> +#include "kfd_crat.h"
>>>      #define KFD_TOPOLOGY_PUBLIC_NAME_SIZE 128
>>>    @@ -183,8 +183,4 @@ struct kfd_topology_device
>>> *kfd_create_topology_device(
>>>            struct list_head *device_list);
>>>    void kfd_release_topology_device_list(struct list_head *device_list);
>>>    -extern bool amd_iommu_pc_supported(void);
>>> -extern u8 amd_iommu_pc_get_max_banks(u16 devid);
>>> -extern u8 amd_iommu_pc_get_max_counters(u16 devid);
>>> -
>>>    #endif /* __KFD_TOPOLOGY_H__ */
>
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx


_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 14/25] drm/amdkfd: Populate DRM render device minor
       [not found]     ` <1517967174-21709-15-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
@ 2018-02-09 12:34       ` Christian König
       [not found]         ` <16aa5300-7ddc-518a-2080-70cb31b6ad56-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: Christian König @ 2018-02-09 12:34 UTC (permalink / raw)
  To: Felix Kuehling, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Oak Zeng

I really wonder if sharing the GPUVM instance from a render node file 
descriptor wouldn't be easier.

You could just use the existing IOCTL for allocating and mapping memory 
on a render node and then give the prepared fd to kfd to use.
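
For illustration, a rough userspace sketch of that flow. The allocation
and VA mapping below use the existing amdgpu render node UAPI; the final
step that hands the prepared fd to KFD is hypothetical, since no such KFD
interface exists in this series, and the device path is only an example.

#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <libdrm/amdgpu_drm.h>

/* size and va are assumed to be page aligned */
static int alloc_and_map_on_render_node(uint64_t size, uint64_t va)
{
	union drm_amdgpu_gem_create create;
	struct drm_amdgpu_gem_va map;
	int fd = open("/dev/dri/renderD128", O_RDWR | O_CLOEXEC);

	if (fd < 0)
		return -1;

	/* Allocate a VRAM BO through the existing GEM create ioctl. */
	memset(&create, 0, sizeof(create));
	create.in.bo_size = size;
	create.in.alignment = 4096;
	create.in.domains = AMDGPU_GEM_DOMAIN_VRAM;
	if (ioctl(fd, DRM_IOCTL_AMDGPU_GEM_CREATE, &create))
		return -1;

	/* Map it at a GPUVM address through the existing VA ioctl. */
	memset(&map, 0, sizeof(map));
	map.handle = create.out.handle;
	map.operation = AMDGPU_VA_OP_MAP;
	map.flags = AMDGPU_VM_PAGE_READABLE | AMDGPU_VM_PAGE_WRITEABLE;
	map.va_address = va;
	map.map_size = size;
	if (ioctl(fd, DRM_IOCTL_AMDGPU_GEM_VA, &map))
		return -1;

	/* Hypothetical final step: give the prepared render node fd to KFD
	 * so compute reuses this GPUVM instance, e.g. something like
	 * ioctl(kfd_fd, AMDKFD_IOC_ACQUIRE_VM, &args) with args.drm_fd = fd.
	 */
	return fd;
}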

Regards,
Christian.

Am 07.02.2018 um 02:32 schrieb Felix Kuehling:
> From: Oak Zeng <Oak.Zeng@amd.com>
>
> Populate DRM render device minor in kfd topology
>
> Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
> ---
>   drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 4 ++++
>   drivers/gpu/drm/amd/amdkfd/kfd_topology.h | 1 +
>   2 files changed, 5 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> index 2506155..ac28abc 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> @@ -441,6 +441,8 @@ static ssize_t node_show(struct kobject *kobj, struct attribute *attr,
>   			dev->node_props.device_id);
>   	sysfs_show_32bit_prop(buffer, "location_id",
>   			dev->node_props.location_id);
> +	sysfs_show_32bit_prop(buffer, "drm_render_minor",
> +			dev->node_props.drm_render_minor);
>   
>   	if (dev->gpu) {
>   		log_max_watch_addr =
> @@ -1214,6 +1216,8 @@ int kfd_topology_add_device(struct kfd_dev *gpu)
>   		dev->gpu->kfd2kgd->get_max_engine_clock_in_mhz(dev->gpu->kgd);
>   	dev->node_props.max_engine_clk_ccompute =
>   		cpufreq_quick_get_max(0) / 1000;
> +	dev->node_props.drm_render_minor =
> +		gpu->shared_resources.drm_render_minor;
>   
>   	kfd_fill_mem_clk_max_info(dev);
>   	kfd_fill_iolink_non_crat_info(dev);
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
> index c0be2be..eb54cfc 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
> @@ -71,6 +71,7 @@ struct kfd_node_properties {
>   	uint32_t location_id;
>   	uint32_t max_engine_clk_fcompute;
>   	uint32_t max_engine_clk_ccompute;
> +	int32_t  drm_render_minor;
>   	uint16_t marketing_name[KFD_TOPOLOGY_PUBLIC_NAME_SIZE];
>   };
>   

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 14/25] drm/amdkfd: Populate DRM render device minor
       [not found]         ` <16aa5300-7ddc-518a-2080-70cb31b6ad56-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2018-02-09 20:31           ` Felix Kuehling
       [not found]             ` <5f1c33b3-b3ed-f627-9b5c-347b2399d90f-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: Felix Kuehling @ 2018-02-09 20:31 UTC (permalink / raw)
  To: christian.koenig-5C7GfCeVMHo,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Oak Zeng

On 2018-02-09 07:34 AM, Christian König wrote:
> I really wonder if sharing the GPUVM instance from a render node file
> descriptor wouldn't be easier.
>
> You could just use the existing IOCTL for allocating and mapping
> memory on a render node and then give the prepared fd to kfd to use.

In amd-kfd-staging we have an ioctl to import a DMABuf into the KFD VM
for interoperability between graphics and compute. I'm planning to
upstream that later. It still depends on the KFD calls for managing the
GPU mappings. The KFD GPU mapping calls need to interact with the
eviction logic. TLB flushing involves the HIQ/KIQ for figuring out the
VMID/PASID mapping, which is managed asynchronously by the HWS, not
amdgpu_ids.c. User pointers need a different MMU notifier. AQL queue
buffers on some GPUs need a double-mapping workaround for a HW
wraparound bug. I could go on. So this is not easy to retrofit into
amdgpu_gem and amdgpu_cs. Also, KFD VMs are created differently
(AMDGPU_VM_CONTEXT_COMPUTE, amdkfd_vm wrapper structure).

What's more, buffers allocated through amdgpu_gem calls create GEM
objects that we don't need. And exporting and importing DMABufs adds
more overhead and a potential to run into the process file-descriptor
limit (maybe the FD could be discarded after importing).

I honestly thought about whether this would be feasible when we
implemented the CPU mapping through the DRM FD. But it would be nothing
short of a complete redesign of the KFD memory management code. It would
be months of work, more instability, for questionable benefits. I don't
think it would be in the interest of end users and customers.

I just thought of a slightly different approach I would consider more
realistic, without having thought through all the details: Adding
KFD-specific memory management ioctls to the amdgpu device. Basically
call amdgpu_amdkfd_gpuvm functions from amdgpu ioctl functions instead
of KFD ioctl functions. But we'd still have KFD ioctls for other things,
and the new amdgpu ioctls would be KFD-specific and not useful for
graphics applications. It's probably still several weeks of work, but
shouldn't have major teething issues because the internal logic and
functionality would be basically unchanged. It would just move the
ioctls from one device to another.
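
For illustration, a hypothetical sketch of that shape. Neither the ioctl
nor the argument struct below exists; the only real piece is
amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu() from patch 09 of this series,
and the VM lookup helper is made up.

struct drm_amdgpu_kfd_alloc {		/* made-up uapi struct */
	__u64 va_addr;
	__u64 size;
	__u64 mmap_offset;		/* out */
	__u64 handle;			/* out: refers to the kgd_mem */
	__u32 flags;
	__u32 pad;
};

static int amdgpu_kfd_alloc_ioctl(struct drm_device *dev, void *data,
				  struct drm_file *filp)
{
	struct amdgpu_device *adev = dev->dev_private;
	struct drm_amdgpu_kfd_alloc *args = data;
	struct kgd_mem *mem;
	void *vm;
	int r;

	/* However the final design finds the process's KFD GPUVM: */
	vm = kfd_lookup_vm_for_drm_file(filp);	/* made-up helper */
	if (!vm)
		return -EINVAL;

	/* The actual work stays in the existing KFD helper. */
	r = amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu((struct kgd_dev *)adev,
			args->va_addr, args->size, vm, &mem,
			&args->mmap_offset, args->flags);
	if (r)
		return r;

	args->handle = (__u64)(uintptr_t)mem;	/* a real design would use an IDR */
	return 0;
}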

Regards,
  Felix

>
> Regards,
> Christian.
>
> Am 07.02.2018 um 02:32 schrieb Felix Kuehling:
>> From: Oak Zeng <Oak.Zeng@amd.com>
>>
>> Populate DRM render device minor in kfd topology
>>
>> Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
>> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 4 ++++
>>   drivers/gpu/drm/amd/amdkfd/kfd_topology.h | 1 +
>>   2 files changed, 5 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>> b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>> index 2506155..ac28abc 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>> @@ -441,6 +441,8 @@ static ssize_t node_show(struct kobject *kobj,
>> struct attribute *attr,
>>               dev->node_props.device_id);
>>       sysfs_show_32bit_prop(buffer, "location_id",
>>               dev->node_props.location_id);
>> +    sysfs_show_32bit_prop(buffer, "drm_render_minor",
>> +            dev->node_props.drm_render_minor);
>>         if (dev->gpu) {
>>           log_max_watch_addr =
>> @@ -1214,6 +1216,8 @@ int kfd_topology_add_device(struct kfd_dev *gpu)
>>           dev->gpu->kfd2kgd->get_max_engine_clock_in_mhz(dev->gpu->kgd);
>>       dev->node_props.max_engine_clk_ccompute =
>>           cpufreq_quick_get_max(0) / 1000;
>> +    dev->node_props.drm_render_minor =
>> +        gpu->shared_resources.drm_render_minor;
>>         kfd_fill_mem_clk_max_info(dev);
>>       kfd_fill_iolink_non_crat_info(dev);
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>> b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>> index c0be2be..eb54cfc 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>> @@ -71,6 +71,7 @@ struct kfd_node_properties {
>>       uint32_t location_id;
>>       uint32_t max_engine_clk_fcompute;
>>       uint32_t max_engine_clk_ccompute;
>> +    int32_t  drm_render_minor;
>>       uint16_t marketing_name[KFD_TOPOLOGY_PUBLIC_NAME_SIZE];
>>   };
>>   
>

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 14/25] drm/amdkfd: Populate DRM render device minor
       [not found]             ` <5f1c33b3-b3ed-f627-9b5c-347b2399d90f-5C7GfCeVMHo@public.gmane.org>
@ 2018-02-11  9:55               ` Christian König
       [not found]                 ` <dd476550-9a48-3adc-30e6-8a94bd04833b-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: Christian König @ 2018-02-11  9:55 UTC (permalink / raw)
  To: Felix Kuehling, christian.koenig-5C7GfCeVMHo,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Oak Zeng

Am 09.02.2018 um 21:31 schrieb Felix Kuehling:
> On 2018-02-09 07:34 AM, Christian König wrote:
>> I really wonder if sharing the GPUVM instance from a render node file
>> descriptor wouldn't be easier.
>>
>> You could just use the existing IOCTL for allocating and mapping
>> memory on a render node and then give the prepared fd to kfd to use.
> In amd-kfd-staging we have an ioctl to import a DMABuf into the KFD VM
> for interoperability between graphics and compute. I'm planning to
> upstream that later.

Could you move that around and upstream it first?

> It still depends on the KFD calls for managing the
> GPU mappings. The KFD GPU mapping calls need to interact with the
> eviction logic. TLB flushing involves the HIQ/KIQ for figuring out the
> VMID/PASID mapping, which is managed asynchronously by the HWS, not
> amdgpu_ids.c. User pointers need a different MMU notifier. AQL queue
> buffers on some GPUs need a double-mapping workaround for a HW
> wraparound bug. I could go on. So this is not easy to retrofit into
> amdgpu_gem and amdgpu_cs. Also, KFD VMs are created differently
> (AMDGPU_VM_CONTEXT_COMPUTE, amdkfd_vm wrapper structure).

Well, that is actually the reason why I'm asking about it.

First of all, kernel development is not use-case driven, i.e. we should
not implement something because user space needs it in a specific way,
but rather because the hardware supports it in a specific way.

This means that in theory we should have the fixes for the HW problems
in both interfaces. That doesn't make sense at the moment because we
don't support user space queues through the render node file descriptor,
but people are starting to ask for that as well.

> What's more, buffers allocated through amdgpu_gem calls create GEM
> objects that we don't need.

I see that as an advantage rather than a problem, because it fixes a
couple of problems with the KFD where the address space of the inode is
not managed correctly as far as I can see.

Those implementation issues never caused problems until now because you
never tried to unmap doorbells. But with the new eviction code that is
about to change, isn't it?

> And exporting and importing DMABufs adds
> more overhead and a potential to run into the process file-descriptor
> limit (maybe the FD could be discarded after importing).

Closing the file descriptor is a must have after importing, so that 
isn't an issue.

But I agree that exporting and reimporting the file descriptor adds some 
additional overhead.
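
Roughly, the round trip under discussion looks like this from user space.
The PRIME export is existing libdrm API; the KFD import ioctl is the
amd-kfd-staging one mentioned above, so its name and arguments here are
only placeholders.

#include <stdint.h>
#include <unistd.h>
#include <xf86drm.h>

static int share_bo_with_kfd(int render_fd, uint32_t gem_handle, int kfd_fd)
{
	int dmabuf_fd;
	int r;

	/* Export the GEM handle as a dma-buf fd (existing PRIME helper). */
	r = drmPrimeHandleToFD(render_fd, gem_handle, DRM_CLOEXEC, &dmabuf_fd);
	if (r)
		return r;

	/* Placeholder for the KFD-side import that maps the dma-buf into
	 * the KFD VM, e.g. ioctl(kfd_fd, AMDKFD_IOC_IMPORT_DMABUF, &args).
	 */
	(void)kfd_fd;

	/* Close the fd right after importing; the cost is then a transient
	 * export/import, not a descriptor held against the per-process
	 * file-descriptor limit.
	 */
	close(dmabuf_fd);
	return 0;
}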

> I honestly thought about whether this would be feasible when we
> implemented the CPU mapping through the DRM FD. But it would be nothing
> short of a complete redesign of the KFD memory management code. It would
> be months of work, more instability, for questionable benefits. I don't
> think it would be in the interest of end users and customers.

Yeah, agree on that.

> I just thought of a slightly different approach I would consider more
> realistic, without having thought through all the details: Adding
> KFD-specific memory management ioctls to the amdgpu device. Basically
> call amdgpu_amdkfd_gpuvm functions from amdgpu ioctl functions instead
> of KFD ioctl functions. But we'd still have KFD ioctls for other things,
> and the new amdgpu ioctls would be KFD-specific and not useful for
> graphics applications. It's probably still several weeks of work, but
> shouldn't have major teething issues because the internal logic and
> functionality would be basically unchanged. It would just move the
> ioctls from one device to another.

My thinking went in a similar direction. But instead of exposing the
KFD IOCTLs through the DRM FD, I would let the KFD import a DRM FD.

And then use the DRM FD in the KFD for things like the buffer provider
of a device, i.e. no separate IDR for BOs in the KFD but rather a
reference to the DRM FD.

We can still manage the VM through KFD IOCTL, but the BOs and the VM are 
actually provided by the DRM FD.
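
To make that concrete, a very rough kernel-side sketch of letting KFD
adopt a DRM fd as its buffer and VM provider. All of the KFD-side names
and fields here are made up for illustration.

#include <linux/file.h>
#include <drm/drmP.h>

static int kfd_ioctl_import_drm_fd(struct kfd_process_device *pdd, int drm_fd)
{
	struct file *filp;
	struct drm_file *drm_priv;

	filp = fget(drm_fd);
	if (!filp)
		return -EINVAL;

	/* A real implementation must first verify that this really is an
	 * amdgpu render node before trusting private_data.
	 */
	drm_priv = filp->private_data;

	/* Keep the DRM file as the buffer and VM provider for this device,
	 * instead of a separate KFD-side IDR of BOs.
	 */
	pdd->drm_file = filp;			/* made-up field */
	pdd->vm = drm_priv->driver_priv;	/* amdgpu per-file state / VM */

	return 0;
}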

Regards,
Christian.

> Regards,
>    Felix
>
>> Regards,
>> Christian.
>>
>> Am 07.02.2018 um 02:32 schrieb Felix Kuehling:
>>> From: Oak Zeng <Oak.Zeng@amd.com>
>>>
>>> Populate DRM render device minor in kfd topology
>>>
>>> Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
>>> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
>>> ---
>>>    drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 4 ++++
>>>    drivers/gpu/drm/amd/amdkfd/kfd_topology.h | 1 +
>>>    2 files changed, 5 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>>> b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>>> index 2506155..ac28abc 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>>> @@ -441,6 +441,8 @@ static ssize_t node_show(struct kobject *kobj,
>>> struct attribute *attr,
>>>                dev->node_props.device_id);
>>>        sysfs_show_32bit_prop(buffer, "location_id",
>>>                dev->node_props.location_id);
>>> +    sysfs_show_32bit_prop(buffer, "drm_render_minor",
>>> +            dev->node_props.drm_render_minor);
>>>          if (dev->gpu) {
>>>            log_max_watch_addr =
>>> @@ -1214,6 +1216,8 @@ int kfd_topology_add_device(struct kfd_dev *gpu)
>>>            dev->gpu->kfd2kgd->get_max_engine_clock_in_mhz(dev->gpu->kgd);
>>>        dev->node_props.max_engine_clk_ccompute =
>>>            cpufreq_quick_get_max(0) / 1000;
>>> +    dev->node_props.drm_render_minor =
>>> +        gpu->shared_resources.drm_render_minor;
>>>          kfd_fill_mem_clk_max_info(dev);
>>>        kfd_fill_iolink_non_crat_info(dev);
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>>> b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>>> index c0be2be..eb54cfc 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>>> @@ -71,6 +71,7 @@ struct kfd_node_properties {
>>>        uint32_t location_id;
>>>        uint32_t max_engine_clk_fcompute;
>>>        uint32_t max_engine_clk_ccompute;
>>> +    int32_t  drm_render_minor;
>>>        uint16_t marketing_name[KFD_TOPOLOGY_PUBLIC_NAME_SIZE];
>>>    };
>>>    
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 06/25] drm/amdgpu: Add KFD eviction fence
       [not found]     ` <1517967174-21709-7-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
@ 2018-02-11 12:42       ` Oded Gabbay
  2018-02-12 19:19         ` Felix Kuehling
  0 siblings, 1 reply; 71+ messages in thread
From: Oded Gabbay @ 2018-02-11 12:42 UTC (permalink / raw)
  To: Felix Kuehling; +Cc: Harish Kasiviswanathan, amd-gfx list

On Wed, Feb 7, 2018 at 3:32 AM, Felix Kuehling <Felix.Kuehling@amd.com> wrote:
> This fence is used by KFD to keep memory resident while user mode
> queues are enabled. Trying to evict memory will trigger the
> enable_signaling callback, which starts a KFD eviction, which
> involves preempting user mode queues before signaling the fence.
> There is one such fence per process.
>
> v2:
> * Grab a reference to mm_struct
> * Dereference fence after NULL check
> * Simplify fence release, no need to signal without anyone waiting
> * Added signed-off-by Harish, who is the original author of this code
>
> Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/Makefile              |   1 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h       |  15 ++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c | 179 +++++++++++++++++++++++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h         |   5 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c         |  21 ++-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c          |  18 +++
>  drivers/gpu/drm/amd/include/kgd_kfd_interface.h  |   6 +
>  7 files changed, 241 insertions(+), 4 deletions(-)
>  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile
> index d6e5b72..43dc3f9 100644
> --- a/drivers/gpu/drm/amd/amdgpu/Makefile
> +++ b/drivers/gpu/drm/amd/amdgpu/Makefile
> @@ -130,6 +130,7 @@ amdgpu-y += \
>  # add amdkfd interfaces
>  amdgpu-y += \
>          amdgpu_amdkfd.o \
> +        amdgpu_amdkfd_fence.o \
>          amdgpu_amdkfd_gfx_v8.o
>
>  # add cgs
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> index 2a519f9..492c7af 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> @@ -29,6 +29,8 @@
>  #include <linux/mmu_context.h>
>  #include <kgd_kfd_interface.h>
>
> +extern const struct kgd2kfd_calls *kgd2kfd;
> +
>  struct amdgpu_device;
>
>  struct kgd_mem {
> @@ -37,6 +39,19 @@ struct kgd_mem {
>         void *cpu_ptr;
>  };
>
> +/* KFD Memory Eviction */
> +struct amdgpu_amdkfd_fence {
> +       struct dma_fence base;
> +       struct mm_struct *mm;
> +       spinlock_t lock;
> +       char timeline_name[TASK_COMM_LEN];
> +};
> +
> +struct amdgpu_amdkfd_fence *amdgpu_amdkfd_fence_create(u64 context,
> +                                                      struct mm_struct *mm);
> +bool amd_kfd_fence_check_mm(struct dma_fence *f, struct mm_struct *mm);
> +struct amdgpu_amdkfd_fence *to_amdgpu_amdkfd_fence(struct dma_fence *f);
> +
>  int amdgpu_amdkfd_init(void);
>  void amdgpu_amdkfd_fini(void);
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
> new file mode 100644
> index 0000000..cf2f1e9
> --- /dev/null
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
> @@ -0,0 +1,179 @@
> +/*
> + * Copyright 2016-2018 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + */
> +
> +#include <linux/dma-fence.h>
> +#include <linux/spinlock.h>
> +#include <linux/atomic.h>
> +#include <linux/stacktrace.h>
> +#include <linux/sched.h>
> +#include <linux/slab.h>
> +#include <linux/sched/mm.h>
> +#include "amdgpu_amdkfd.h"
> +
> +const struct dma_fence_ops amd_kfd_fence_ops;
> +static atomic_t fence_seq = ATOMIC_INIT(0);
> +
> +/* Eviction Fence
> + * Fence helper functions to deal with KFD memory eviction.
> + * Big Idea - Since KFD submissions are done by user queues, a BO cannot be
> + *  evicted unless all the user queues for that process are evicted.
> + *
> + * All the BOs in a process share an eviction fence. When process X wants
> + * to map VRAM memory but TTM can't find enough space, TTM will attempt to
> + * evict BOs from its LRU list. TTM checks if the BO is valuable to evict
> + * by calling ttm_bo_driver->eviction_valuable().
> + *
> + * ttm_bo_driver->eviction_valuable() - will return false if the BO belongs
> + *  to process X. Otherwise, it will return true to indicate BO can be
> + *  evicted by TTM.
> + *
> + * If ttm_bo_driver->eviction_valuable returns true, then TTM will continue
> + * the eviction process for that BO by calling ttm_bo_evict --> amdgpu_bo_move
> + * --> amdgpu_copy_buffer(). This sets up job in GPU scheduler.
> + *
> + * GPU Scheduler (amd_sched_main) - sets up a cb (fence_add_callback) to
> + *  notify when the BO is free to move. fence_add_callback --> enable_signaling
> + *  --> amdgpu_amdkfd_fence.enable_signaling
> + *
> + * amdgpu_amdkfd_fence.enable_signaling - Start a work item that will quiesce
> + * user queues and signal fence. The work item will also start another delayed
> + * work item to restore BOs
> + */
> +
> +struct amdgpu_amdkfd_fence *amdgpu_amdkfd_fence_create(u64 context,
> +                                                      struct mm_struct *mm)
> +{
> +       struct amdgpu_amdkfd_fence *fence = NULL;
> +
> +       fence = kzalloc(sizeof(*fence), GFP_KERNEL);
> +       if (fence == NULL)
> +               return NULL;
> +
> +       /* This reference gets released in amd_kfd_fence_release */
> +       mmgrab(mm);
> +       fence->mm = mm;
> +       get_task_comm(fence->timeline_name, current);
> +       spin_lock_init(&fence->lock);
> +
> +       dma_fence_init(&fence->base, &amd_kfd_fence_ops, &fence->lock,
> +                  context, atomic_inc_return(&fence_seq));
> +
> +       return fence;
> +}
> +
> +struct amdgpu_amdkfd_fence *to_amdgpu_amdkfd_fence(struct dma_fence *f)
> +{
> +       struct amdgpu_amdkfd_fence *fence;
> +
> +       if (!f)
> +               return NULL;
> +
> +       fence = container_of(f, struct amdgpu_amdkfd_fence, base);
> +       if (fence && f->ops == &amd_kfd_fence_ops)
> +               return fence;
> +
> +       return NULL;
> +}
> +
> +static const char *amd_kfd_fence_get_driver_name(struct dma_fence *f)
> +{
> +       return "amdgpu_amdkfd_fence";
> +}
> +
> +static const char *amd_kfd_fence_get_timeline_name(struct dma_fence *f)
> +{
> +       struct amdgpu_amdkfd_fence *fence = to_amdgpu_amdkfd_fence(f);
> +
> +       return fence->timeline_name;
> +}
> +
> +/**
> + * amd_kfd_fence_enable_signaling - This gets called when TTM wants to evict
> + *  a KFD BO and schedules a job to move the BO.
> + *  If fence is already signaled return true.
> + *  If fence is not signaled schedule a evict KFD process work item.
> + */
> +static bool amd_kfd_fence_enable_signaling(struct dma_fence *f)
> +{
> +       struct amdgpu_amdkfd_fence *fence = to_amdgpu_amdkfd_fence(f);
> +
> +       if (!fence)
> +               return false;
> +
> +       if (dma_fence_is_signaled(f))
> +               return true;
> +
> +       if (!kgd2kfd->schedule_evict_and_restore_process(fence->mm, f))
> +               return true;
> +
> +       return false;
> +}
> +
> +/**
> + * amd_kfd_fence_release - callback that fence can be freed
> + *
> + * @fence: fence
> + *
> + * This function is called when the reference count becomes zero.
> + * Drops the mm_struct reference and RCU schedules freeing up the fence.
> + */
> +static void amd_kfd_fence_release(struct dma_fence *f)
> +{
> +       struct amdgpu_amdkfd_fence *fence = to_amdgpu_amdkfd_fence(f);
> +
> +       /* Unconditionally signal the fence. The process is getting
> +        * terminated.
> +        */
> +       if (WARN_ON(!fence))
> +               return; /* Not an amdgpu_amdkfd_fence */
> +
> +       mmdrop(fence->mm);
> +       kfree_rcu(f, rcu);
> +}
> +
> +/**
> + * amd_kfd_fence_check_mm - Check if @mm is same as that of the fence @f
> + *  if same return TRUE else return FALSE.
> + *
> + * @f: [IN] fence
> + * @mm: [IN] mm that needs to be verified
> + */
> +bool amd_kfd_fence_check_mm(struct dma_fence *f, struct mm_struct *mm)
> +{
> +       struct amdgpu_amdkfd_fence *fence = to_amdgpu_amdkfd_fence(f);
> +
> +       if (!fence)
> +               return false;
> +       else if (fence->mm == mm)
> +               return true;
> +
> +       return false;
> +}
> +
> +const struct dma_fence_ops amd_kfd_fence_ops = {
> +       .get_driver_name = amd_kfd_fence_get_driver_name,
> +       .get_timeline_name = amd_kfd_fence_get_timeline_name,
> +       .enable_signaling = amd_kfd_fence_enable_signaling,
> +       .signaled = NULL,
> +       .wait = dma_fence_default_wait,
> +       .release = amd_kfd_fence_release,
> +};
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> index 65d5a4e..ca00dd2 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> @@ -36,8 +36,9 @@
>  #define AMDGPU_MAX_UVD_ENC_RINGS       2
>
>  /* some special values for the owner field */
> -#define AMDGPU_FENCE_OWNER_UNDEFINED   ((void*)0ul)
> -#define AMDGPU_FENCE_OWNER_VM          ((void*)1ul)
> +#define AMDGPU_FENCE_OWNER_UNDEFINED   ((void *)0ul)
> +#define AMDGPU_FENCE_OWNER_VM          ((void *)1ul)
> +#define AMDGPU_FENCE_OWNER_KFD         ((void *)2ul)
>
>  #define AMDGPU_FENCE_FLAG_64BIT         (1 << 0)
>  #define AMDGPU_FENCE_FLAG_INT           (1 << 1)
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
> index df65c66..b8d3b87 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
> @@ -31,6 +31,7 @@
>  #include <drm/drmP.h>
>  #include "amdgpu.h"
>  #include "amdgpu_trace.h"
> +#include "amdgpu_amdkfd.h"
>
>  struct amdgpu_sync_entry {
>         struct hlist_node       node;
> @@ -85,11 +86,20 @@ static bool amdgpu_sync_same_dev(struct amdgpu_device *adev,
>   */
>  static void *amdgpu_sync_get_owner(struct dma_fence *f)
>  {
> -       struct drm_sched_fence *s_fence = to_drm_sched_fence(f);
> +       struct drm_sched_fence *s_fence;
> +       struct amdgpu_amdkfd_fence *kfd_fence;
> +
> +       if (!f)
> +               return AMDGPU_FENCE_OWNER_UNDEFINED;
>
> +       s_fence = to_drm_sched_fence(f);
>         if (s_fence)
>                 return s_fence->owner;
>
> +       kfd_fence = to_amdgpu_amdkfd_fence(f);
> +       if (kfd_fence)
> +               return AMDGPU_FENCE_OWNER_KFD;
> +
>         return AMDGPU_FENCE_OWNER_UNDEFINED;
>  }
>
> @@ -204,11 +214,18 @@ int amdgpu_sync_resv(struct amdgpu_device *adev,
>         for (i = 0; i < flist->shared_count; ++i) {
>                 f = rcu_dereference_protected(flist->shared[i],
>                                               reservation_object_held(resv));
> +               /* We only want to trigger KFD eviction fences on
> +                * evict or move jobs. Skip KFD fences otherwise.
> +                */
> +               fence_owner = amdgpu_sync_get_owner(f);
> +               if (fence_owner == AMDGPU_FENCE_OWNER_KFD &&
> +                   owner != AMDGPU_FENCE_OWNER_UNDEFINED)
> +                       continue;
> +
>                 if (amdgpu_sync_same_dev(adev, f)) {
>                         /* VM updates are only interesting
>                          * for other VM updates and moves.
>                          */
> -                       fence_owner = amdgpu_sync_get_owner(f);
>                         if ((owner != AMDGPU_FENCE_OWNER_UNDEFINED) &&
>                             (fence_owner != AMDGPU_FENCE_OWNER_UNDEFINED) &&
>                             ((owner == AMDGPU_FENCE_OWNER_VM) !=
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index e4bb435..c3f33d3 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -46,6 +46,7 @@
>  #include "amdgpu.h"
>  #include "amdgpu_object.h"
>  #include "amdgpu_trace.h"
> +#include "amdgpu_amdkfd.h"
>  #include "bif/bif_4_1_d.h"
>
>  #define DRM_FILE_PAGE_OFFSET (0x100000000ULL >> PAGE_SHIFT)
> @@ -1170,6 +1171,23 @@ static bool amdgpu_ttm_bo_eviction_valuable(struct ttm_buffer_object *bo,
>  {
>         unsigned long num_pages = bo->mem.num_pages;
>         struct drm_mm_node *node = bo->mem.mm_node;
> +       struct reservation_object_list *flist;
> +       struct dma_fence *f;
> +       int i;
> +
> +       /* If bo is a KFD BO, check if the bo belongs to the current process.
> +        * If true, then return false as any KFD process needs all its BOs to
> +        * be resident to run successfully
> +        */
> +       flist = reservation_object_get_list(bo->resv);
> +       if (flist) {
> +               for (i = 0; i < flist->shared_count; ++i) {
> +                       f = rcu_dereference_protected(flist->shared[i],
> +                               reservation_object_held(bo->resv));
> +                       if (amd_kfd_fence_check_mm(f, current->mm))
> +                               return false;
> +               }
> +       }
>
>         switch (bo->mem.mem_type) {
>         case TTM_PL_TT:
> diff --git a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
> index 94eab548..9e35249 100644
> --- a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
> +++ b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
> @@ -30,6 +30,7 @@
>
>  #include <linux/types.h>
>  #include <linux/bitmap.h>
> +#include <linux/dma-fence.h>
>
>  struct pci_dev;
>
> @@ -286,6 +287,9 @@ struct kfd2kgd_calls {
>   *
>   * @resume: Notifies amdkfd about a resume action done to a kgd device
>   *
> + * @schedule_evict_and_restore_process: Schedules work queue that will prepare
> + * for safe eviction of KFD BOs that belong to the specified process.
> + *
>   * This structure contains function callback pointers so the kgd driver
>   * will notify to the amdkfd about certain status changes.
>   *
> @@ -300,6 +304,8 @@ struct kgd2kfd_calls {
>         void (*interrupt)(struct kfd_dev *kfd, const void *ih_ring_entry);
>         void (*suspend)(struct kfd_dev *kfd);
>         int (*resume)(struct kfd_dev *kfd);
> +       int (*schedule_evict_and_restore_process)(struct mm_struct *mm,
> +                       struct dma_fence *fence);
>  };
>
>  int kgd2kfd_init(unsigned interface_version,
> --
> 2.7.4
>

Hi Felix,
Do you object to me changing amd_kfd_ to amdkfd_ in the various
structures and functions?
So far, we don't have anything with the amd_kfd_ prefix, so I would like
to keep the naming consistent.

Other than that, this patch is:
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>


Oded
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 05/25] drm/amdgpu: Remove unused kfd2kgd interface
       [not found]     ` <1517967174-21709-6-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
@ 2018-02-11 12:44       ` Oded Gabbay
  0 siblings, 0 replies; 71+ messages in thread
From: Oded Gabbay @ 2018-02-11 12:44 UTC (permalink / raw)
  To: Felix Kuehling; +Cc: amd-gfx list

Patches 1-5 applied to -next
Thanks,
Oded

On Wed, Feb 7, 2018 at 3:32 AM, Felix Kuehling <Felix.Kuehling@amd.com> wrote:
> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
> Reviewed-by: Christian König <christian.koenig@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c |  9 ---------
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c | 10 ----------
>  drivers/gpu/drm/amd/include/kgd_kfd_interface.h   |  2 --
>  3 files changed, 21 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
> index b8be7b96..1362181 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
> @@ -139,7 +139,6 @@ static uint32_t kgd_address_watch_get_offset(struct kgd_dev *kgd,
>  static bool get_atc_vmid_pasid_mapping_valid(struct kgd_dev *kgd, uint8_t vmid);
>  static uint16_t get_atc_vmid_pasid_mapping_pasid(struct kgd_dev *kgd,
>                                                         uint8_t vmid);
> -static void write_vmid_invalidate_request(struct kgd_dev *kgd, uint8_t vmid);
>
>  static uint16_t get_fw_version(struct kgd_dev *kgd, enum kgd_engine_type type);
>  static void set_scratch_backing_va(struct kgd_dev *kgd,
> @@ -196,7 +195,6 @@ static const struct kfd2kgd_calls kfd2kgd = {
>         .address_watch_get_offset = kgd_address_watch_get_offset,
>         .get_atc_vmid_pasid_mapping_pasid = get_atc_vmid_pasid_mapping_pasid,
>         .get_atc_vmid_pasid_mapping_valid = get_atc_vmid_pasid_mapping_valid,
> -       .write_vmid_invalidate_request = write_vmid_invalidate_request,
>         .get_fw_version = get_fw_version,
>         .set_scratch_backing_va = set_scratch_backing_va,
>         .get_tile_config = get_tile_config,
> @@ -790,13 +788,6 @@ static uint16_t get_atc_vmid_pasid_mapping_pasid(struct kgd_dev *kgd,
>         return reg & ATC_VMID0_PASID_MAPPING__PASID_MASK;
>  }
>
> -static void write_vmid_invalidate_request(struct kgd_dev *kgd, uint8_t vmid)
> -{
> -       struct amdgpu_device *adev = (struct amdgpu_device *) kgd;
> -
> -       WREG32(mmVM_INVALIDATE_REQUEST, 1 << vmid);
> -}
> -
>  static void set_scratch_backing_va(struct kgd_dev *kgd,
>                                         uint64_t va, uint32_t vmid)
>  {
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
> index 744c05b..5130eac 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
> @@ -81,7 +81,6 @@ static int kgd_hqd_destroy(struct kgd_dev *kgd, void *mqd,
>                                 uint32_t queue_id);
>  static int kgd_hqd_sdma_destroy(struct kgd_dev *kgd, void *mqd,
>                                 unsigned int utimeout);
> -static void write_vmid_invalidate_request(struct kgd_dev *kgd, uint8_t vmid);
>  static int kgd_address_watch_disable(struct kgd_dev *kgd);
>  static int kgd_address_watch_execute(struct kgd_dev *kgd,
>                                         unsigned int watch_point_id,
> @@ -99,7 +98,6 @@ static bool get_atc_vmid_pasid_mapping_valid(struct kgd_dev *kgd,
>                 uint8_t vmid);
>  static uint16_t get_atc_vmid_pasid_mapping_pasid(struct kgd_dev *kgd,
>                 uint8_t vmid);
> -static void write_vmid_invalidate_request(struct kgd_dev *kgd, uint8_t vmid);
>  static uint16_t get_fw_version(struct kgd_dev *kgd, enum kgd_engine_type type);
>  static void set_scratch_backing_va(struct kgd_dev *kgd,
>                                         uint64_t va, uint32_t vmid);
> @@ -157,7 +155,6 @@ static const struct kfd2kgd_calls kfd2kgd = {
>                         get_atc_vmid_pasid_mapping_pasid,
>         .get_atc_vmid_pasid_mapping_valid =
>                         get_atc_vmid_pasid_mapping_valid,
> -       .write_vmid_invalidate_request = write_vmid_invalidate_request,
>         .get_fw_version = get_fw_version,
>         .set_scratch_backing_va = set_scratch_backing_va,
>         .get_tile_config = get_tile_config,
> @@ -707,13 +704,6 @@ static uint16_t get_atc_vmid_pasid_mapping_pasid(struct kgd_dev *kgd,
>         return reg & ATC_VMID0_PASID_MAPPING__PASID_MASK;
>  }
>
> -static void write_vmid_invalidate_request(struct kgd_dev *kgd, uint8_t vmid)
> -{
> -       struct amdgpu_device *adev = (struct amdgpu_device *) kgd;
> -
> -       WREG32(mmVM_INVALIDATE_REQUEST, 1 << vmid);
> -}
> -
>  static int kgd_address_watch_disable(struct kgd_dev *kgd)
>  {
>         return 0;
> diff --git a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
> index a6752bd..94eab548 100644
> --- a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
> +++ b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
> @@ -258,8 +258,6 @@ struct kfd2kgd_calls {
>         uint16_t (*get_atc_vmid_pasid_mapping_pasid)(
>                                         struct kgd_dev *kgd,
>                                         uint8_t vmid);
> -       void (*write_vmid_invalidate_request)(struct kgd_dev *kgd,
> -                                       uint8_t vmid);
>
>         uint16_t (*get_fw_version)(struct kgd_dev *kgd,
>                                 enum kgd_engine_type type);
> --
> 2.7.4
>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 07/25] drm/amdgpu: Update kgd2kfd_shared_resources for dGPU support
       [not found]     ` <1517967174-21709-8-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
@ 2018-02-11 12:54       ` Oded Gabbay
  0 siblings, 0 replies; 71+ messages in thread
From: Oded Gabbay @ 2018-02-11 12:54 UTC (permalink / raw)
  To: Felix Kuehling; +Cc: amd-gfx list

On Wed, Feb 7, 2018 at 3:32 AM, Felix Kuehling <Felix.Kuehling@amd.com> wrote:
> Add GPUVM size and DRM render node. Also add function to query the
> VMID mask to avoid hard-coding it in multiple places later.
>
> v2:
> * Cut off GPUVM size at the VA hole
>
> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c      | 20 ++++++++++++++++++--
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h      |  2 ++
>  drivers/gpu/drm/amd/include/kgd_kfd_interface.h |  6 ++++++
>  3 files changed, 26 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
> index c9f204d..25c2aed 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
> @@ -30,6 +30,8 @@
>  const struct kgd2kfd_calls *kgd2kfd;
>  bool (*kgd2kfd_init_p)(unsigned int, const struct kgd2kfd_calls**);
>
> +static const unsigned int compute_vmid_bitmap = 0xFF00;
> +
>  int amdgpu_amdkfd_init(void)
>  {
>         int ret;
> @@ -137,9 +139,13 @@ void amdgpu_amdkfd_device_init(struct amdgpu_device *adev)
>         int last_valid_bit;
>         if (adev->kfd) {
>                 struct kgd2kfd_shared_resources gpu_resources = {
> -                       .compute_vmid_bitmap = 0xFF00,
> +                       .compute_vmid_bitmap = compute_vmid_bitmap,
>                         .num_pipe_per_mec = adev->gfx.mec.num_pipe_per_mec,
> -                       .num_queue_per_pipe = adev->gfx.mec.num_queue_per_pipe
> +                       .num_queue_per_pipe = adev->gfx.mec.num_queue_per_pipe,
> +                       .gpuvm_size = min(adev->vm_manager.max_pfn
> +                                         << AMDGPU_GPU_PAGE_SHIFT,
> +                                         AMDGPU_VA_HOLE_START),
> +                       .drm_render_minor = adev->ddev->render->index
>                 };
>
>                 /* this is going to have a few of the MSBs set that we need to
> @@ -351,3 +357,13 @@ uint64_t amdgpu_amdkfd_get_vram_usage(struct kgd_dev *kgd)
>
>         return amdgpu_vram_mgr_usage(&adev->mman.bdev.man[TTM_PL_VRAM]);
>  }
> +
> +bool amdgpu_amdkfd_is_kfd_vmid(struct amdgpu_device *adev, u32 vmid)
> +{
> +       if (adev->kfd) {
> +               if ((1 << vmid) & compute_vmid_bitmap)
> +                       return true;
> +       }
> +
> +       return false;
> +}
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> index 492c7af..9bed9fc 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> @@ -66,6 +66,8 @@ void amdgpu_amdkfd_device_fini(struct amdgpu_device *adev);
>  struct kfd2kgd_calls *amdgpu_amdkfd_gfx_7_get_functions(void);
>  struct kfd2kgd_calls *amdgpu_amdkfd_gfx_8_0_get_functions(void);
>
> +bool amdgpu_amdkfd_is_kfd_vmid(struct amdgpu_device *adev, u32 vmid);
> +
>  /* Shared API */
>  int alloc_gtt_mem(struct kgd_dev *kgd, size_t size,
>                         void **mem_obj, uint64_t *gpu_addr,
> diff --git a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
> index 9e35249..36c706a 100644
> --- a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
> +++ b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
> @@ -108,6 +108,12 @@ struct kgd2kfd_shared_resources {
>
>         /* Number of bytes at start of aperture reserved for KGD. */
>         size_t doorbell_start_offset;
> +
> +       /* GPUVM address space size in bytes */
> +       uint64_t gpuvm_size;
> +
> +       /* Minor device number of the render node */
> +       int drm_render_minor;
>  };
>
>  struct tile_config {
> --
> 2.7.4
>
This patch is:
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 08/25] drm/amdgpu: add amdgpu_sync_clone
       [not found]     ` <1517967174-21709-9-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
@ 2018-02-11 12:54       ` Oded Gabbay
  0 siblings, 0 replies; 71+ messages in thread
From: Oded Gabbay @ 2018-02-11 12:54 UTC (permalink / raw)
  To: Felix Kuehling; +Cc: amd-gfx list

On Wed, Feb 7, 2018 at 3:32 AM, Felix Kuehling <Felix.Kuehling@amd.com> wrote:
> Cloning a sync object is useful for waiting for a sync object
> without locking the original structure indefinitely, blocking
> other threads.
>
> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c | 35 ++++++++++++++++++++++++++++++++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_sync.h |  1 +
>  2 files changed, 36 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
> index b8d3b87..2d6f5ec 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
> @@ -322,6 +322,41 @@ struct dma_fence *amdgpu_sync_get_fence(struct amdgpu_sync *sync, bool *explicit
>         return NULL;
>  }
>
> +/**
> + * amdgpu_sync_clone - clone a sync object
> + *
> + * @source: sync object to clone
> + * @clone: pointer to destination sync object
> + *
> + * Adds references to all unsignaled fences in @source to @clone. Also
> + * removes signaled fences from @source while at it.
> + */
> +int amdgpu_sync_clone(struct amdgpu_sync *source, struct amdgpu_sync *clone)
> +{
> +       struct amdgpu_sync_entry *e;
> +       struct hlist_node *tmp;
> +       struct dma_fence *f;
> +       int i, r;
> +
> +       hash_for_each_safe(source->fences, i, tmp, e, node) {
> +               f = e->fence;
> +               if (!dma_fence_is_signaled(f)) {
> +                       r = amdgpu_sync_fence(NULL, clone, f, e->explicit);
> +                       if (r)
> +                               return r;
> +               } else {
> +                       hash_del(&e->node);
> +                       dma_fence_put(f);
> +                       kmem_cache_free(amdgpu_sync_slab, e);
> +               }
> +       }
> +
> +       dma_fence_put(clone->last_vm_update);
> +       clone->last_vm_update = dma_fence_get(source->last_vm_update);
> +
> +       return 0;
> +}
> +
>  int amdgpu_sync_wait(struct amdgpu_sync *sync, bool intr)
>  {
>         struct amdgpu_sync_entry *e;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.h
> index 7aba38d..10cf23a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.h
> @@ -50,6 +50,7 @@ int amdgpu_sync_resv(struct amdgpu_device *adev,
>  struct dma_fence *amdgpu_sync_peek_fence(struct amdgpu_sync *sync,
>                                      struct amdgpu_ring *ring);
>  struct dma_fence *amdgpu_sync_get_fence(struct amdgpu_sync *sync, bool *explicit);
> +int amdgpu_sync_clone(struct amdgpu_sync *source, struct amdgpu_sync *clone);
>  int amdgpu_sync_wait(struct amdgpu_sync *sync, bool intr);
>  void amdgpu_sync_free(struct amdgpu_sync *sync);
>  int amdgpu_sync_init(void);
> --
> 2.7.4
>

This patch is:
Acked-by: Oded Gabbay <oded.gabbay@gmail.com>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 09/25] drm/amdgpu: Add GPUVM memory management functions for KFD
       [not found]     ` <1517967174-21709-10-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
@ 2018-02-12  8:42       ` Oded Gabbay
       [not found]         ` <CAFCwf10ThSfo8zphxPRH549LoyJ1H+XM89rpwpSNeJeuWYayAA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: Oded Gabbay @ 2018-02-12  8:42 UTC (permalink / raw)
  To: Felix Kuehling; +Cc: amd-gfx list

On Wed, Feb 7, 2018 at 3:32 AM, Felix Kuehling <Felix.Kuehling@amd.com> wrote:
> v2:
> * Removed unused flags from struct kgd_mem
> * Updated some comments
> * Added a check to unmap_memory_from_gpu whether BO was mapped
>
> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/Makefile               |    1 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c        |    1 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h        |   91 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c |   66 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c |   67 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 1501 +++++++++++++++++++++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c        |    4 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_object.h        |    2 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c           |    7 +
>  drivers/gpu/drm/amd/include/kgd_kfd_interface.h   |   77 ++
>  10 files changed, 1813 insertions(+), 4 deletions(-)
>  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile
> index 43dc3f9..180b2a8 100644
> --- a/drivers/gpu/drm/amd/amdgpu/Makefile
> +++ b/drivers/gpu/drm/amd/amdgpu/Makefile
> @@ -131,6 +131,7 @@ amdgpu-y += \
>  amdgpu-y += \
>          amdgpu_amdkfd.o \
>          amdgpu_amdkfd_fence.o \
> +        amdgpu_amdkfd_gpuvm.o \
>          amdgpu_amdkfd_gfx_v8.o
>
>  # add cgs
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
> index 25c2aed..01fb142 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
> @@ -58,6 +58,7 @@ int amdgpu_amdkfd_init(void)
>  #else
>         ret = -ENOENT;
>  #endif
> +       amdgpu_amdkfd_gpuvm_init_mem_limits();
>
>         return ret;
>  }
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> index 9bed9fc..87fb4e6 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> @@ -28,15 +28,41 @@
>  #include <linux/types.h>
>  #include <linux/mmu_context.h>
>  #include <kgd_kfd_interface.h>
> +#include <drm/ttm/ttm_execbuf_util.h>
> +#include "amdgpu_sync.h"
> +#include "amdgpu_vm.h"
>
>  extern const struct kgd2kfd_calls *kgd2kfd;
>
>  struct amdgpu_device;
>
> +struct kfd_bo_va_list {
> +       struct list_head bo_list;
> +       struct amdgpu_bo_va *bo_va;
> +       void *kgd_dev;
> +       bool is_mapped;
> +       uint64_t va;
> +       uint64_t pte_flags;
> +};
> +
>  struct kgd_mem {
> +       struct mutex lock;
>         struct amdgpu_bo *bo;
> -       uint64_t gpu_addr;
> -       void *cpu_ptr;
> +       struct list_head bo_va_list;
> +       /* protected by amdkfd_process_info.lock */
> +       struct ttm_validate_buffer validate_list;
> +       struct ttm_validate_buffer resv_list;
> +       uint32_t domain;
> +       unsigned int mapped_to_gpu_memory;
> +       uint64_t va;
> +
> +       uint32_t mapping_flags;
> +
> +       struct amdkfd_process_info *process_info;
> +
> +       struct amdgpu_sync sync;
> +
> +       bool aql_queue;
>  };
>
>  /* KFD Memory Eviction */
> @@ -52,6 +78,41 @@ struct amdgpu_amdkfd_fence *amdgpu_amdkfd_fence_create(u64 context,
>  bool amd_kfd_fence_check_mm(struct dma_fence *f, struct mm_struct *mm);
>  struct amdgpu_amdkfd_fence *to_amdgpu_amdkfd_fence(struct dma_fence *f);
>
> +struct amdkfd_process_info {
> +       /* List head of all VMs that belong to a KFD process */
> +       struct list_head vm_list_head;
> +       /* List head for all KFD BOs that belong to a KFD process. */
> +       struct list_head kfd_bo_list;
> +       /* Lock to protect kfd_bo_list */
> +       struct mutex lock;
> +
> +       /* Number of VMs */
> +       unsigned int n_vms;
> +       /* Eviction Fence */
> +       struct amdgpu_amdkfd_fence *eviction_fence;
> +};
> +
> +/* struct amdkfd_vm -
> + * For Memory Eviction KGD requires a mechanism to keep track of all KFD BOs
> + * belonging to a KFD process. All the VMs belonging to the same process point
> + * to the same amdkfd_process_info.
> + */
> +struct amdkfd_vm {
> +       /* Keep base as the first member for pointer compatibility between
> +        * amdkfd_vm and amdgpu_vm.
> +        */
> +       struct amdgpu_vm base;
> +
> +       /* List node in amdkfd_process_info.vm_list_head*/
> +       struct list_head vm_list_node;
> +
> +       struct amdgpu_device *adev;
> +       /* Points to the KFD process VM info*/
> +       struct amdkfd_process_info *process_info;
> +
> +       uint64_t pd_phys_addr;
> +};
> +
>  int amdgpu_amdkfd_init(void);
>  void amdgpu_amdkfd_fini(void);
>
> @@ -96,4 +157,30 @@ uint64_t amdgpu_amdkfd_get_vram_usage(struct kgd_dev *kgd);
>                 valid;                                                  \
>         })
>
> +/* GPUVM API */
> +int amdgpu_amdkfd_gpuvm_create_process_vm(struct kgd_dev *kgd, void **vm,
> +                                         void **process_info,
> +                                         struct dma_fence **ef);
> +void amdgpu_amdkfd_gpuvm_destroy_process_vm(struct kgd_dev *kgd, void *vm);
> +uint32_t amdgpu_amdkfd_gpuvm_get_process_page_dir(void *vm);
> +int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
> +               struct kgd_dev *kgd, uint64_t va, uint64_t size,
> +               void *vm, struct kgd_mem **mem,
> +               uint64_t *offset, uint32_t flags);
> +int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
> +               struct kgd_dev *kgd, struct kgd_mem *mem);
> +int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
> +               struct kgd_dev *kgd, struct kgd_mem *mem, void *vm);
> +int amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu(
> +               struct kgd_dev *kgd, struct kgd_mem *mem, void *vm);
> +int amdgpu_amdkfd_gpuvm_sync_memory(
> +               struct kgd_dev *kgd, struct kgd_mem *mem, bool intr);
> +int amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel(struct kgd_dev *kgd,
> +               struct kgd_mem *mem, void **kptr, uint64_t *size);
> +int amdgpu_amdkfd_gpuvm_restore_process_bos(void *process_info,
> +                                           struct dma_fence **ef);
> +
> +void amdgpu_amdkfd_gpuvm_init_mem_limits(void);
> +void amdgpu_amdkfd_unreserve_system_memory_limit(struct amdgpu_bo *bo);
> +
>  #endif /* AMDGPU_AMDKFD_H_INCLUDED */
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
> index 1362181..65783d1 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
> @@ -143,6 +143,10 @@ static uint16_t get_atc_vmid_pasid_mapping_pasid(struct kgd_dev *kgd,
>  static uint16_t get_fw_version(struct kgd_dev *kgd, enum kgd_engine_type type);
>  static void set_scratch_backing_va(struct kgd_dev *kgd,
>                                         uint64_t va, uint32_t vmid);
> +static void set_vm_context_page_table_base(struct kgd_dev *kgd, uint32_t vmid,
> +               uint32_t page_table_base);
> +static int invalidate_tlbs(struct kgd_dev *kgd, uint16_t pasid);
> +static int invalidate_tlbs_vmid(struct kgd_dev *kgd, uint16_t vmid);
>
>  /* Because of REG_GET_FIELD() being used, we put this function in the
>   * asic specific file.
> @@ -199,7 +203,20 @@ static const struct kfd2kgd_calls kfd2kgd = {
>         .set_scratch_backing_va = set_scratch_backing_va,
>         .get_tile_config = get_tile_config,
>         .get_cu_info = get_cu_info,
> -       .get_vram_usage = amdgpu_amdkfd_get_vram_usage
> +       .get_vram_usage = amdgpu_amdkfd_get_vram_usage,
> +       .create_process_vm = amdgpu_amdkfd_gpuvm_create_process_vm,
> +       .destroy_process_vm = amdgpu_amdkfd_gpuvm_destroy_process_vm,
> +       .get_process_page_dir = amdgpu_amdkfd_gpuvm_get_process_page_dir,
> +       .set_vm_context_page_table_base = set_vm_context_page_table_base,
> +       .alloc_memory_of_gpu = amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu,
> +       .free_memory_of_gpu = amdgpu_amdkfd_gpuvm_free_memory_of_gpu,
> +       .map_memory_to_gpu = amdgpu_amdkfd_gpuvm_map_memory_to_gpu,
> +       .unmap_memory_to_gpu = amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu,
> +       .sync_memory = amdgpu_amdkfd_gpuvm_sync_memory,
> +       .map_gtt_bo_to_kernel = amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel,
> +       .restore_process_bos = amdgpu_amdkfd_gpuvm_restore_process_bos,
> +       .invalidate_tlbs = invalidate_tlbs,
> +       .invalidate_tlbs_vmid = invalidate_tlbs_vmid,
>  };
>
>  struct kfd2kgd_calls *amdgpu_amdkfd_gfx_7_get_functions(void)
> @@ -855,3 +872,50 @@ static uint16_t get_fw_version(struct kgd_dev *kgd, enum kgd_engine_type type)
>         return hdr->common.ucode_version;
>  }
>
> +static void set_vm_context_page_table_base(struct kgd_dev *kgd, uint32_t vmid,
> +                       uint32_t page_table_base)
> +{
> +       struct amdgpu_device *adev = get_amdgpu_device(kgd);
> +
> +       if (!amdgpu_amdkfd_is_kfd_vmid(adev, vmid)) {
> +               pr_err("trying to set page table base for wrong VMID\n");
> +               return;
> +       }
> +       WREG32(mmVM_CONTEXT8_PAGE_TABLE_BASE_ADDR + vmid - 8, page_table_base);
> +}
> +
> +static int invalidate_tlbs(struct kgd_dev *kgd, uint16_t pasid)
> +{
> +       struct amdgpu_device *adev = (struct amdgpu_device *) kgd;
> +       int vmid;
> +       unsigned int tmp;
> +
> +       for (vmid = 0; vmid < 16; vmid++) {
> +               if (!amdgpu_amdkfd_is_kfd_vmid(adev, vmid))
> +                       continue;
> +
> +               tmp = RREG32(mmATC_VMID0_PASID_MAPPING + vmid);
> +               if ((tmp & ATC_VMID0_PASID_MAPPING__VALID_MASK) &&
> +                       (tmp & ATC_VMID0_PASID_MAPPING__PASID_MASK) == pasid) {
> +                       WREG32(mmVM_INVALIDATE_REQUEST, 1 << vmid);
> +                       RREG32(mmVM_INVALIDATE_RESPONSE);
> +                       break;
> +               }
> +       }
> +
> +       return 0;
> +}
> +
> +static int invalidate_tlbs_vmid(struct kgd_dev *kgd, uint16_t vmid)
> +{
> +       struct amdgpu_device *adev = (struct amdgpu_device *) kgd;
> +
> +       if (!amdgpu_amdkfd_is_kfd_vmid(adev, vmid)) {
> +               pr_err("non kfd vmid\n");
> +               return 0;
> +       }
> +
> +       WREG32(mmVM_INVALIDATE_REQUEST, 1 << vmid);
> +       RREG32(mmVM_INVALIDATE_RESPONSE);
> +       return 0;
> +}
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
> index 5130eac..1b5bf13 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
> @@ -101,6 +101,10 @@ static uint16_t get_atc_vmid_pasid_mapping_pasid(struct kgd_dev *kgd,
>  static uint16_t get_fw_version(struct kgd_dev *kgd, enum kgd_engine_type type);
>  static void set_scratch_backing_va(struct kgd_dev *kgd,
>                                         uint64_t va, uint32_t vmid);
> +static void set_vm_context_page_table_base(struct kgd_dev *kgd, uint32_t vmid,
> +               uint32_t page_table_base);
> +static int invalidate_tlbs(struct kgd_dev *kgd, uint16_t pasid);
> +static int invalidate_tlbs_vmid(struct kgd_dev *kgd, uint16_t vmid);
>
>  /* Because of REG_GET_FIELD() being used, we put this function in the
>   * asic specific file.
> @@ -159,7 +163,20 @@ static const struct kfd2kgd_calls kfd2kgd = {
>         .set_scratch_backing_va = set_scratch_backing_va,
>         .get_tile_config = get_tile_config,
>         .get_cu_info = get_cu_info,
> -       .get_vram_usage = amdgpu_amdkfd_get_vram_usage
> +       .get_vram_usage = amdgpu_amdkfd_get_vram_usage,
> +       .create_process_vm = amdgpu_amdkfd_gpuvm_create_process_vm,
> +       .destroy_process_vm = amdgpu_amdkfd_gpuvm_destroy_process_vm,
> +       .get_process_page_dir = amdgpu_amdkfd_gpuvm_get_process_page_dir,
> +       .set_vm_context_page_table_base = set_vm_context_page_table_base,
> +       .alloc_memory_of_gpu = amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu,
> +       .free_memory_of_gpu = amdgpu_amdkfd_gpuvm_free_memory_of_gpu,
> +       .map_memory_to_gpu = amdgpu_amdkfd_gpuvm_map_memory_to_gpu,
> +       .unmap_memory_to_gpu = amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu,
> +       .sync_memory = amdgpu_amdkfd_gpuvm_sync_memory,
> +       .map_gtt_bo_to_kernel = amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel,
> +       .restore_process_bos = amdgpu_amdkfd_gpuvm_restore_process_bos,
> +       .invalidate_tlbs = invalidate_tlbs,
> +       .invalidate_tlbs_vmid = invalidate_tlbs_vmid,
>  };
>
>  struct kfd2kgd_calls *amdgpu_amdkfd_gfx_8_0_get_functions(void)
> @@ -816,3 +833,51 @@ static uint16_t get_fw_version(struct kgd_dev *kgd, enum kgd_engine_type type)
>         /* Only 12 bit in use*/
>         return hdr->common.ucode_version;
>  }
> +
> +static void set_vm_context_page_table_base(struct kgd_dev *kgd, uint32_t vmid,
> +               uint32_t page_table_base)
> +{
> +       struct amdgpu_device *adev = get_amdgpu_device(kgd);
> +
> +       if (!amdgpu_amdkfd_is_kfd_vmid(adev, vmid)) {
> +               pr_err("trying to set page table base for wrong VMID\n");
> +               return;
> +       }
> +       WREG32(mmVM_CONTEXT8_PAGE_TABLE_BASE_ADDR + vmid - 8, page_table_base);
> +}
> +
> +static int invalidate_tlbs(struct kgd_dev *kgd, uint16_t pasid)
> +{
> +       struct amdgpu_device *adev = (struct amdgpu_device *) kgd;
> +       int vmid;
> +       unsigned int tmp;
> +
> +       for (vmid = 0; vmid < 16; vmid++) {
> +               if (!amdgpu_amdkfd_is_kfd_vmid(adev, vmid))
> +                       continue;
> +
> +               tmp = RREG32(mmATC_VMID0_PASID_MAPPING + vmid);
> +               if ((tmp & ATC_VMID0_PASID_MAPPING__VALID_MASK) &&
> +                       (tmp & ATC_VMID0_PASID_MAPPING__PASID_MASK) == pasid) {
> +                       WREG32(mmVM_INVALIDATE_REQUEST, 1 << vmid);
> +                       RREG32(mmVM_INVALIDATE_RESPONSE);
> +                       break;
> +               }
> +       }
> +
> +       return 0;
> +}
> +
> +static int invalidate_tlbs_vmid(struct kgd_dev *kgd, uint16_t vmid)
> +{
> +       struct amdgpu_device *adev = (struct amdgpu_device *) kgd;
> +
> +       if (!amdgpu_amdkfd_is_kfd_vmid(adev, vmid)) {
> +               pr_err("non kfd vmid %d\n", vmid);
> +               return -EINVAL;
> +       }
> +
> +       WREG32(mmVM_INVALIDATE_REQUEST, 1 << vmid);
> +       RREG32(mmVM_INVALIDATE_RESPONSE);
> +       return 0;
> +}
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> new file mode 100644
> index 0000000..9703fd0
> --- /dev/null
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
> @@ -0,0 +1,1501 @@
> +/*
> + * Copyright 2014-2018 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + */
> +
> +#define pr_fmt(fmt) "kfd2kgd: " fmt
> +
> +#include <linux/list.h>
> +#include <drm/drmP.h>
> +#include "amdgpu_object.h"
> +#include "amdgpu_vm.h"
> +#include "amdgpu_amdkfd.h"
> +
> +/* Special VM and GART address alignment needed for VI pre-Fiji due to
> + * a HW bug.
> + */
> +#define VI_BO_SIZE_ALIGN (0x8000)
> +
> +/* Impose limit on how much memory KFD can use */
> +static struct {
> +       uint64_t max_system_mem_limit;
> +       int64_t system_mem_used;
> +       spinlock_t mem_limit_lock;
> +} kfd_mem_limit;
> +
> +/* Struct used for amdgpu_amdkfd_bo_validate */
> +struct amdgpu_vm_parser {
> +       uint32_t        domain;
> +       bool            wait;
> +};
> +
> +static const char * const domain_bit_to_string[] = {
> +               "CPU",
> +               "GTT",
> +               "VRAM",
> +               "GDS",
> +               "GWS",
> +               "OA"
> +};
> +
> +#define domain_string(domain) domain_bit_to_string[ffs(domain)-1]
> +
> +
> +
> +static inline struct amdgpu_device *get_amdgpu_device(struct kgd_dev *kgd)
> +{
> +       return (struct amdgpu_device *)kgd;
> +}
> +
> +static bool check_if_add_bo_to_vm(struct amdgpu_vm *avm,
> +               struct kgd_mem *mem)
> +{
> +       struct kfd_bo_va_list *entry;
> +
> +       list_for_each_entry(entry, &mem->bo_va_list, bo_list)
> +               if (entry->bo_va->base.vm == avm)
> +                       return false;
> +
> +       return true;
> +}
> +
> +/* Set memory usage limits. Currently, the limits are:
> + *  System (kernel) memory - 3/8th of System RAM
> + */
> +void amdgpu_amdkfd_gpuvm_init_mem_limits(void)
> +{
> +       struct sysinfo si;
> +       uint64_t mem;
> +
> +       si_meminfo(&si);
> +       mem = si.totalram - si.totalhigh;
> +       mem *= si.mem_unit;
> +
> +       spin_lock_init(&kfd_mem_limit.mem_limit_lock);
> +       kfd_mem_limit.max_system_mem_limit = (mem >> 1) - (mem >> 3);
> +       pr_debug("Kernel memory limit %lluM\n",
> +               (kfd_mem_limit.max_system_mem_limit >> 20));
> +}
> +
> +static int amdgpu_amdkfd_reserve_system_mem_limit(struct amdgpu_device *adev,
> +                                             uint64_t size, u32 domain)
> +{
> +       size_t acc_size;
> +       int ret = 0;
> +
> +       acc_size = ttm_bo_dma_acc_size(&adev->mman.bdev, size,
> +                                      sizeof(struct amdgpu_bo));
> +
> +       spin_lock(&kfd_mem_limit.mem_limit_lock);
> +       if (domain == AMDGPU_GEM_DOMAIN_GTT) {
> +               if (kfd_mem_limit.system_mem_used + (acc_size + size) >
> +                       kfd_mem_limit.max_system_mem_limit) {
> +                       ret = -ENOMEM;
> +                       goto err_no_mem;
> +               }
> +               kfd_mem_limit.system_mem_used += (acc_size + size);
> +       }
> +err_no_mem:
> +       spin_unlock(&kfd_mem_limit.mem_limit_lock);
> +       return ret;
> +}
> +
> +static void unreserve_system_mem_limit(struct amdgpu_device *adev,
> +                                      uint64_t size, u32 domain)
> +{
> +       size_t acc_size;
> +
> +       acc_size = ttm_bo_dma_acc_size(&adev->mman.bdev, size,
> +                                      sizeof(struct amdgpu_bo));
> +
> +       spin_lock(&kfd_mem_limit.mem_limit_lock);
> +       if (domain == AMDGPU_GEM_DOMAIN_GTT)
> +               kfd_mem_limit.system_mem_used -= (acc_size + size);
> +       WARN_ONCE(kfd_mem_limit.system_mem_used < 0,
> +                 "kfd system memory accounting unbalanced");
> +
> +       spin_unlock(&kfd_mem_limit.mem_limit_lock);
> +}
> +
> +void amdgpu_amdkfd_unreserve_system_memory_limit(struct amdgpu_bo *bo)
> +{
> +       spin_lock(&kfd_mem_limit.mem_limit_lock);
> +
> +       if (bo->preferred_domains == AMDGPU_GEM_DOMAIN_GTT) {
> +               kfd_mem_limit.system_mem_used -=
> +                       (bo->tbo.acc_size + amdgpu_bo_size(bo));
> +       }
> +       WARN_ONCE(kfd_mem_limit.system_mem_used < 0,
> +                 "kfd system memory accounting unbalanced");
> +
> +       spin_unlock(&kfd_mem_limit.mem_limit_lock);
> +}
> +
> +
> +/* amdgpu_amdkfd_remove_eviction_fence - Removes eviction fence(s) from BO's
> + *  reservation object.
> + *
> + * @bo: [IN] Remove eviction fence(s) from this BO
> + * @ef: [IN] If ef is specified, then this eviction fence is removed if it
> + *  is present in the shared list.
> + * @ef_list: [OUT] Returns list of eviction fences. These fences are removed
> + *  from BO's reservation object shared list.
> + * @ef_count: [OUT] Number of fences in ef_list.
> + *
> + * NOTE: If called with ef_list, then amdgpu_amdkfd_add_eviction_fence must be
> + *  called to restore the eviction fences and to avoid memory leak. This is
> + *  useful for shared BOs.
> + * NOTE: Must be called with BO reserved i.e. bo->tbo.resv->lock held.
> + */
> +static int amdgpu_amdkfd_remove_eviction_fence(struct amdgpu_bo *bo,
> +                                       struct amdgpu_amdkfd_fence *ef,
> +                                       struct amdgpu_amdkfd_fence ***ef_list,
> +                                       unsigned int *ef_count)
> +{
> +       struct reservation_object_list *fobj;
> +       struct reservation_object *resv;
> +       unsigned int i = 0, j = 0, k = 0, shared_count;
> +       unsigned int count = 0;
> +       struct amdgpu_amdkfd_fence **fence_list;
> +
> +       if (!ef && !ef_list)
> +               return -EINVAL;
> +
> +       if (ef_list) {
> +               *ef_list = NULL;
> +               *ef_count = 0;
> +       }
> +
> +       resv = bo->tbo.resv;
> +       fobj = reservation_object_get_list(resv);
> +
> +       if (!fobj)
> +               return 0;
> +
> +       preempt_disable();
> +       write_seqcount_begin(&resv->seq);
> +
> +       /* Go through all the shared fences in the reservation object. If
> +        * ef is specified and it exists in the list, remove it and reduce the
> +        * count. If ef is not specified, then get the count of eviction fences
> +        * present.
> +        */
> +       shared_count = fobj->shared_count;
> +       for (i = 0; i < shared_count; ++i) {
> +               struct dma_fence *f;
> +
> +               f = rcu_dereference_protected(fobj->shared[i],
> +                                             reservation_object_held(resv));
> +
> +               if (ef) {
> +                       if (f->context == ef->base.context) {
> +                               dma_fence_put(f);
> +                               fobj->shared_count--;
> +                       } else
> +                               RCU_INIT_POINTER(fobj->shared[j++], f);
> +
> +               } else if (to_amdgpu_amdkfd_fence(f))
> +                       count++;
> +       }
> +       write_seqcount_end(&resv->seq);
> +       preempt_enable();
> +
> +       if (ef || !count)
> +               return 0;
> +
> +       /* Alloc memory for count number of eviction fence pointers. Fill the
> +        * ef_list array and ef_count
> +        */
> +       fence_list = kcalloc(count, sizeof(struct amdgpu_amdkfd_fence *),
> +                            GFP_KERNEL);
> +       if (!fence_list)
> +               return -ENOMEM;
> +
> +       preempt_disable();
> +       write_seqcount_begin(&resv->seq);
> +
> +       j = 0;
> +       for (i = 0; i < shared_count; ++i) {
> +               struct dma_fence *f;
> +               struct amdgpu_amdkfd_fence *efence;
> +
> +               f = rcu_dereference_protected(fobj->shared[i],
> +                       reservation_object_held(resv));
> +
> +               efence = to_amdgpu_amdkfd_fence(f);
> +               if (efence) {
> +                       fence_list[k++] = efence;
> +                       fobj->shared_count--;
> +               } else
> +                       RCU_INIT_POINTER(fobj->shared[j++], f);
> +       }
> +
> +       write_seqcount_end(&resv->seq);
> +       preempt_enable();
> +
> +       *ef_list = fence_list;
> +       *ef_count = k;
> +
> +       return 0;
> +}
> +
> +/* amdgpu_amdkfd_add_eviction_fence - Adds eviction fence(s) back into BO's
> + *  reservation object.
> + *
> + * @bo: [IN] Add eviction fences to this BO
> + * @ef_list: [IN] List of eviction fences to be added
> + * @ef_count: [IN] Number of fences in ef_list.
> + *
> + * NOTE: Must call amdgpu_amdkfd_remove_eviction_fence before calling this
> + *  function.
> + */
> +static void amdgpu_amdkfd_add_eviction_fence(struct amdgpu_bo *bo,
> +                               struct amdgpu_amdkfd_fence **ef_list,
> +                               unsigned int ef_count)
> +{
> +       int i;
> +
> +       if (!ef_list || !ef_count)
> +               return;
> +
> +       for (i = 0; i < ef_count; i++) {
> +               amdgpu_bo_fence(bo, &ef_list[i]->base, true);
> +               /* Readding the fence takes an additional reference. Drop that
> +                * reference.
> +                */
> +               dma_fence_put(&ef_list[i]->base);
> +       }
> +
> +       kfree(ef_list);
> +}
> +
> +static int amdgpu_amdkfd_bo_validate(struct amdgpu_bo *bo, uint32_t domain,
> +                                    bool wait)
> +{
> +       struct ttm_operation_ctx ctx = { false, false };
> +       int ret;
> +
> +       if (WARN(amdgpu_ttm_tt_get_usermm(bo->tbo.ttm),
> +                "Called with userptr BO"))
> +               return -EINVAL;
> +
> +       amdgpu_ttm_placement_from_domain(bo, domain);
> +
> +       ret = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
> +       if (ret)
> +               goto validate_fail;
> +       if (wait) {
> +               struct amdgpu_amdkfd_fence **ef_list;
> +               unsigned int ef_count;
> +
> +               ret = amdgpu_amdkfd_remove_eviction_fence(bo, NULL, &ef_list,
> +                                                         &ef_count);
> +               if (ret)
> +                       goto validate_fail;
> +
> +               ttm_bo_wait(&bo->tbo, false, false);
> +               amdgpu_amdkfd_add_eviction_fence(bo, ef_list, ef_count);
> +       }
> +
> +validate_fail:
> +       return ret;
> +}
> +
> +static int amdgpu_amdkfd_validate(void *param, struct amdgpu_bo *bo)
> +{
> +       struct amdgpu_vm_parser *p = param;
> +
> +       return amdgpu_amdkfd_bo_validate(bo, p->domain, p->wait);
> +}
> +
> +/* vm_validate_pt_pd_bos - Validate page table and directory BOs
> + *
> + * Page directories are not updated here because huge page handling
> + * during page table updates can invalidate page directory entries
> + * again. Page directories are only updated after updating page
> + * tables.
> + */
> +static int vm_validate_pt_pd_bos(struct amdkfd_vm *vm)
> +{
> +       struct amdgpu_bo *pd = vm->base.root.base.bo;
> +       struct amdgpu_device *adev = amdgpu_ttm_adev(pd->tbo.bdev);
> +       struct amdgpu_vm_parser param;
> +       uint64_t addr, flags = AMDGPU_PTE_VALID;
> +       int ret;
> +
> +       param.domain = AMDGPU_GEM_DOMAIN_VRAM;
> +       param.wait = false;
> +
> +       ret = amdgpu_vm_validate_pt_bos(adev, &vm->base, amdgpu_amdkfd_validate,
> +                                       &param);
> +       if (ret) {
> +               pr_err("amdgpu: failed to validate PT BOs\n");
> +               return ret;
> +       }
> +
> +       ret = amdgpu_amdkfd_validate(&param, pd);
> +       if (ret) {
> +               pr_err("amdgpu: failed to validate PD\n");
> +               return ret;
> +       }
> +
> +       addr = amdgpu_bo_gpu_offset(vm->base.root.base.bo);
> +       amdgpu_gart_get_vm_pde(adev, -1, &addr, &flags);
> +       vm->pd_phys_addr = addr;
> +
> +       if (vm->base.use_cpu_for_update) {
> +               ret = amdgpu_bo_kmap(pd, NULL);
> +               if (ret) {
> +                       pr_err("amdgpu: failed to kmap PD, ret=%d\n", ret);
> +                       return ret;
> +               }
> +       }
> +
> +       return 0;
> +}
> +
> +static int sync_vm_fence(struct amdgpu_device *adev, struct amdgpu_sync *sync,
> +                        struct dma_fence *f)
> +{
> +       int ret = amdgpu_sync_fence(adev, sync, f, false);
> +
> +       /* Sync objects can't handle multiple GPUs (contexts) updating
> +        * sync->last_vm_update. Fortunately we don't need it for
> +        * KFD's purposes, so we can just drop that fence.
> +        */
> +       if (sync->last_vm_update) {
> +               dma_fence_put(sync->last_vm_update);
> +               sync->last_vm_update = NULL;
> +       }
> +
> +       return ret;
> +}
> +
> +static int vm_update_pds(struct amdgpu_vm *vm, struct amdgpu_sync *sync)
> +{
> +       struct amdgpu_bo *pd = vm->root.base.bo;
> +       struct amdgpu_device *adev = amdgpu_ttm_adev(pd->tbo.bdev);
> +       int ret;
> +
> +       ret = amdgpu_vm_update_directories(adev, vm);
> +       if (ret)
> +               return ret;
> +
> +       return sync_vm_fence(adev, sync, vm->last_update);
> +}
> +
> +/* add_bo_to_vm - Add a BO to a VM
> + *
> + * Everything that needs to be done only once when a BO is first added
> + * to a VM. It can later be mapped and unmapped many times without
> + * repeating these steps.
> + *
> + * 1. Allocate and initialize BO VA entry data structure
> + * 2. Add BO to the VM
> + * 3. Determine ASIC-specific PTE flags
> + * 4. Alloc page tables and directories if needed
> + * 4a.  Validate new page tables and directories
> + */
> +static int add_bo_to_vm(struct amdgpu_device *adev, struct kgd_mem *mem,
> +               struct amdgpu_vm *avm, bool is_aql,
> +               struct kfd_bo_va_list **p_bo_va_entry)
> +{
> +       int ret;
> +       struct kfd_bo_va_list *bo_va_entry;
> +       struct amdkfd_vm *kvm = container_of(avm,
> +                                            struct amdkfd_vm, base);
> +       struct amdgpu_bo *pd = avm->root.base.bo;
> +       struct amdgpu_bo *bo = mem->bo;
> +       uint64_t va = mem->va;
> +       struct list_head *list_bo_va = &mem->bo_va_list;
> +       unsigned long bo_size = bo->tbo.mem.size;
> +
> +       if (!va) {
> +               pr_err("Invalid VA when adding BO to VM\n");
> +               return -EINVAL;
> +       }
> +
> +       if (is_aql)
> +               va += bo_size;
> +
> +       bo_va_entry = kzalloc(sizeof(*bo_va_entry), GFP_KERNEL);
> +       if (!bo_va_entry)
> +               return -ENOMEM;
> +
> +       pr_debug("\t add VA 0x%llx - 0x%llx to vm %p\n", va,
> +                       va + bo_size, avm);
> +
> +       /* Add BO to VM internal data structures*/
> +       bo_va_entry->bo_va = amdgpu_vm_bo_add(adev, avm, bo);
> +       if (!bo_va_entry->bo_va) {
> +               ret = -EINVAL;
> +               pr_err("Failed to add BO object to VM. ret == %d\n",
> +                               ret);
> +               goto err_vmadd;
> +       }
> +
> +       bo_va_entry->va = va;
> +       bo_va_entry->pte_flags = amdgpu_vm_get_pte_flags(adev,
> +                                                        mem->mapping_flags);
> +       bo_va_entry->kgd_dev = (void *)adev;
> +       list_add(&bo_va_entry->bo_list, list_bo_va);
> +
> +       if (p_bo_va_entry)
> +               *p_bo_va_entry = bo_va_entry;
> +
> +       /* Allocate new page tables if needed and validate
> +        * them. Clearing of new page tables and validation need to wait
> +        * on move fences. We don't want that to trigger the eviction
> +        * fence, so remove it temporarily.
> +        */
> +       amdgpu_amdkfd_remove_eviction_fence(pd,
> +                                       kvm->process_info->eviction_fence,
> +                                       NULL, NULL);
> +
> +       ret = amdgpu_vm_alloc_pts(adev, avm, va, amdgpu_bo_size(bo));
> +       if (ret) {
> +               pr_err("Failed to allocate pts, err=%d\n", ret);
> +               goto err_alloc_pts;
> +       }
> +
> +       ret = vm_validate_pt_pd_bos(kvm);
> +       if (ret) {
> +               pr_err("validate_pt_pd_bos() failed\n");
> +               goto err_alloc_pts;
> +       }
> +
> +       /* Add the eviction fence back */
> +       amdgpu_bo_fence(pd, &kvm->process_info->eviction_fence->base, true);
> +
> +       return 0;
> +
> +err_alloc_pts:
> +       amdgpu_bo_fence(pd, &kvm->process_info->eviction_fence->base, true);
> +       amdgpu_vm_bo_rmv(adev, bo_va_entry->bo_va);
> +       list_del(&bo_va_entry->bo_list);
> +err_vmadd:
> +       kfree(bo_va_entry);
> +       return ret;
> +}
> +
> +static void remove_bo_from_vm(struct amdgpu_device *adev,
> +               struct kfd_bo_va_list *entry, unsigned long size)
> +{
> +       pr_debug("\t remove VA 0x%llx - 0x%llx in entry %p\n",
> +                       entry->va,
> +                       entry->va + size, entry);
> +       amdgpu_vm_bo_rmv(adev, entry->bo_va);
> +       list_del(&entry->bo_list);
> +       kfree(entry);
> +}
> +
> +static void add_kgd_mem_to_kfd_bo_list(struct kgd_mem *mem,
> +                               struct amdkfd_process_info *process_info)
> +{
> +       struct ttm_validate_buffer *entry = &mem->validate_list;
> +       struct amdgpu_bo *bo = mem->bo;
> +
> +       INIT_LIST_HEAD(&entry->head);
> +       entry->shared = true;
> +       entry->bo = &bo->tbo;
> +       mutex_lock(&process_info->lock);
> +       list_add_tail(&entry->head, &process_info->kfd_bo_list);
> +       mutex_unlock(&process_info->lock);
> +}
> +
> +/* Reserving a BO and its page table BOs must happen atomically to
> + * avoid deadlocks. Some operations update multiple VMs at once. Track
> + * all the reservation info in a context structure. Optionally a sync
> + * object can track VM updates.
> + */
> +struct bo_vm_reservation_context {
> +       struct amdgpu_bo_list_entry kfd_bo; /* BO list entry for the KFD BO */
> +       unsigned int n_vms;                 /* Number of VMs reserved       */
> +       struct amdgpu_bo_list_entry *vm_pd; /* Array of VM BO list entries  */
> +       struct ww_acquire_ctx ticket;       /* Reservation ticket           */
> +       struct list_head list, duplicates;  /* BO lists                     */
> +       struct amdgpu_sync *sync;           /* Pointer to sync object       */
> +       bool reserved;                      /* Whether BOs are reserved     */
> +};
> +
> +enum bo_vm_match {
> +       BO_VM_NOT_MAPPED = 0,   /* Match VMs where a BO is not mapped */
> +       BO_VM_MAPPED,           /* Match VMs where a BO is mapped     */
> +       BO_VM_ALL,              /* Match all VMs a BO was added to    */
> +};
> +
> +/**
> + * reserve_bo_and_vm - reserve a BO and a VM unconditionally.
> + * @mem: KFD BO structure.
> + * @vm: the VM to reserve.
> + * @ctx: the struct that will be used in unreserve_bo_and_vms().
> + */
> +static int reserve_bo_and_vm(struct kgd_mem *mem,
> +                             struct amdgpu_vm *vm,
> +                             struct bo_vm_reservation_context *ctx)
> +{
> +       struct amdgpu_bo *bo = mem->bo;
> +       int ret;
> +
> +       WARN_ON(!vm);
> +
> +       ctx->reserved = false;
> +       ctx->n_vms = 1;
> +       ctx->sync = &mem->sync;
> +
> +       INIT_LIST_HEAD(&ctx->list);
> +       INIT_LIST_HEAD(&ctx->duplicates);
> +
> +       ctx->vm_pd = kcalloc(ctx->n_vms, sizeof(*ctx->vm_pd), GFP_KERNEL);
> +       if (!ctx->vm_pd)
> +               return -ENOMEM;
> +
> +       ctx->kfd_bo.robj = bo;
> +       ctx->kfd_bo.priority = 0;
> +       ctx->kfd_bo.tv.bo = &bo->tbo;
> +       ctx->kfd_bo.tv.shared = true;
> +       ctx->kfd_bo.user_pages = NULL;
> +       list_add(&ctx->kfd_bo.tv.head, &ctx->list);
> +
> +       amdgpu_vm_get_pd_bo(vm, &ctx->list, &ctx->vm_pd[0]);
> +
> +       ret = ttm_eu_reserve_buffers(&ctx->ticket, &ctx->list,
> +                                    false, &ctx->duplicates);
> +       if (!ret)
> +               ctx->reserved = true;
> +       else {
> +               pr_err("Failed to reserve buffers in ttm\n");
> +               kfree(ctx->vm_pd);
> +               ctx->vm_pd = NULL;
> +       }
> +
> +       return ret;
> +}
> +
> +/**
> + * reserve_bo_and_cond_vms - reserve a BO and some VMs conditionally
> + * @mem: KFD BO structure.
> + * @vm: the VM to reserve. If NULL, then all VMs associated with the BO
> + * are used. Otherwise, only the given VM associated with the BO is used.
> + * @map_type: the mapping status that will be used to filter the VMs.
> + * @ctx: the struct that will be used in unreserve_bo_and_vms().
> + *
> + * Returns 0 for success, negative for failure.
> + */
> +static int reserve_bo_and_cond_vms(struct kgd_mem *mem,
> +                               struct amdgpu_vm *vm, enum bo_vm_match map_type,
> +                               struct bo_vm_reservation_context *ctx)
> +{
> +       struct amdgpu_bo *bo = mem->bo;
> +       struct kfd_bo_va_list *entry;
> +       unsigned int i;
> +       int ret;
> +
> +       ctx->reserved = false;
> +       ctx->n_vms = 0;
> +       ctx->vm_pd = NULL;
> +       ctx->sync = &mem->sync;
> +
> +       INIT_LIST_HEAD(&ctx->list);
> +       INIT_LIST_HEAD(&ctx->duplicates);
> +
> +       list_for_each_entry(entry, &mem->bo_va_list, bo_list) {
> +               if ((vm && vm != entry->bo_va->base.vm) ||
> +                       (entry->is_mapped != map_type
> +                       && map_type != BO_VM_ALL))
> +                       continue;
> +
> +               ctx->n_vms++;
> +       }
> +
> +       if (ctx->n_vms != 0) {
> +               ctx->vm_pd = kcalloc(ctx->n_vms, sizeof(*ctx->vm_pd),
> +                                    GFP_KERNEL);
> +               if (!ctx->vm_pd)
> +                       return -ENOMEM;
> +       }
> +
> +       ctx->kfd_bo.robj = bo;
> +       ctx->kfd_bo.priority = 0;
> +       ctx->kfd_bo.tv.bo = &bo->tbo;
> +       ctx->kfd_bo.tv.shared = true;
> +       ctx->kfd_bo.user_pages = NULL;
> +       list_add(&ctx->kfd_bo.tv.head, &ctx->list);
> +
> +       i = 0;
> +       list_for_each_entry(entry, &mem->bo_va_list, bo_list) {
> +               if ((vm && vm != entry->bo_va->base.vm) ||
> +                       (entry->is_mapped != map_type
> +                       && map_type != BO_VM_ALL))
> +                       continue;
> +
> +               amdgpu_vm_get_pd_bo(entry->bo_va->base.vm, &ctx->list,
> +                               &ctx->vm_pd[i]);
> +               i++;
> +       }
> +
> +       ret = ttm_eu_reserve_buffers(&ctx->ticket, &ctx->list,
> +                                    false, &ctx->duplicates);
> +       if (!ret)
> +               ctx->reserved = true;
> +       else
> +               pr_err("Failed to reserve buffers in ttm.\n");
> +
> +       if (ret) {
> +               kfree(ctx->vm_pd);
> +               ctx->vm_pd = NULL;
> +       }
> +
> +       return ret;
> +}
> +
> +/**
> + * unreserve_bo_and_vms - Unreserve BO and VMs from a reservation context
> + * @ctx: Reservation context to unreserve
> + * @wait: Optionally wait for a sync object representing pending VM updates
> + * @intr: Whether the wait is interruptible
> + *
> + * Also frees any resources allocated in
> + * reserve_bo_and_(cond_)vm(s). Returns the status from
> + * amdgpu_sync_wait.
> + */
> +static int unreserve_bo_and_vms(struct bo_vm_reservation_context *ctx,
> +                                bool wait, bool intr)
> +{
> +       int ret = 0;
> +
> +       if (wait)
> +               ret = amdgpu_sync_wait(ctx->sync, intr);
> +
> +       if (ctx->reserved)
> +               ttm_eu_backoff_reservation(&ctx->ticket, &ctx->list);
> +       kfree(ctx->vm_pd);
> +
> +       ctx->sync = NULL;
> +
> +       ctx->reserved = false;
> +       ctx->vm_pd = NULL;
> +
> +       return ret;
> +}
> +
> +static int unmap_bo_from_gpuvm(struct amdgpu_device *adev,
> +                               struct kfd_bo_va_list *entry,
> +                               struct amdgpu_sync *sync)
> +{
> +       struct amdgpu_bo_va *bo_va = entry->bo_va;
> +       struct amdgpu_vm *vm = bo_va->base.vm;
> +       struct amdkfd_vm *kvm = container_of(vm, struct amdkfd_vm, base);
> +       struct amdgpu_bo *pd = vm->root.base.bo;
> +
> +       /* Remove eviction fence from PD (and thereby from PTs too as
> +        * they share the resv. object). Otherwise during PT update
> +        * job (see amdgpu_vm_bo_update_mapping), eviction fence would
> +        * get added to job->sync object and job execution would
> +        * trigger the eviction fence.
> +        */
> +       amdgpu_amdkfd_remove_eviction_fence(pd,
> +                                           kvm->process_info->eviction_fence,
> +                                           NULL, NULL);
> +       amdgpu_vm_bo_unmap(adev, bo_va, entry->va);
> +
> +       amdgpu_vm_clear_freed(adev, vm, &bo_va->last_pt_update);
> +
> +       /* Add the eviction fence back */
> +       amdgpu_bo_fence(pd, &kvm->process_info->eviction_fence->base, true);
> +
> +       sync_vm_fence(adev, sync, bo_va->last_pt_update);
> +
> +       return 0;
> +}
> +
> +static int update_gpuvm_pte(struct amdgpu_device *adev,
> +               struct kfd_bo_va_list *entry,
> +               struct amdgpu_sync *sync)
> +{
> +       int ret;
> +       struct amdgpu_vm *vm;
> +       struct amdgpu_bo_va *bo_va;
> +       struct amdgpu_bo *bo;
> +
> +       bo_va = entry->bo_va;
> +       vm = bo_va->base.vm;
> +       bo = bo_va->base.bo;
> +
> +       /* Update the page tables  */
> +       ret = amdgpu_vm_bo_update(adev, bo_va, false);
> +       if (ret) {
> +               pr_err("amdgpu_vm_bo_update failed\n");
> +               return ret;
> +       }
> +
> +       return sync_vm_fence(adev, sync, bo_va->last_pt_update);
> +}
> +
> +static int map_bo_to_gpuvm(struct amdgpu_device *adev,
> +               struct kfd_bo_va_list *entry, struct amdgpu_sync *sync)
> +{
> +       int ret;
> +
> +       /* Set virtual address for the allocation */
> +       ret = amdgpu_vm_bo_map(adev, entry->bo_va, entry->va, 0,
> +                              amdgpu_bo_size(entry->bo_va->base.bo),
> +                              entry->pte_flags);
> +       if (ret) {
> +               pr_err("Failed to map VA 0x%llx in vm. ret %d\n",
> +                               entry->va, ret);
> +               return ret;
> +       }
> +
> +       ret = update_gpuvm_pte(adev, entry, sync);
> +       if (ret) {
> +               pr_err("update_gpuvm_pte() failed\n");
> +               goto update_gpuvm_pte_failed;
> +       }
> +
> +       return 0;
> +
> +update_gpuvm_pte_failed:
> +       unmap_bo_from_gpuvm(adev, entry, sync);
> +       return ret;
> +}
> +
> +static int process_validate_vms(struct amdkfd_process_info *process_info)
> +{
> +       struct amdkfd_vm *peer_vm;
> +       int ret;
> +
> +       list_for_each_entry(peer_vm, &process_info->vm_list_head,
> +                           vm_list_node) {
> +               ret = vm_validate_pt_pd_bos(peer_vm);
> +               if (ret)
> +                       return ret;
> +       }
> +
> +       return 0;
> +}
> +
> +static int process_update_pds(struct amdkfd_process_info *process_info,
> +                             struct amdgpu_sync *sync)
> +{
> +       struct amdkfd_vm *peer_vm;
> +       int ret;
> +
> +       list_for_each_entry(peer_vm, &process_info->vm_list_head,
> +                           vm_list_node) {
> +               ret = vm_update_pds(&peer_vm->base, sync);
> +               if (ret)
> +                       return ret;
> +       }
> +
> +       return 0;
> +}
> +
> +int amdgpu_amdkfd_gpuvm_create_process_vm(struct kgd_dev *kgd, void **vm,
> +                                         void **process_info,
> +                                         struct dma_fence **ef)
> +{
> +       int ret;
> +       struct amdkfd_vm *new_vm;
> +       struct amdkfd_process_info *info;
> +       struct amdgpu_device *adev = get_amdgpu_device(kgd);
> +
> +       new_vm = kzalloc(sizeof(*new_vm), GFP_KERNEL);
> +       if (!new_vm)
> +               return -ENOMEM;
> +
> +       /* Initialize the VM context, allocate the page directory and zero it */
> +       ret = amdgpu_vm_init(adev, &new_vm->base, AMDGPU_VM_CONTEXT_COMPUTE, 0);
> +       if (ret) {
> +               pr_err("Failed init vm ret %d\n", ret);
> +               goto vm_init_fail;
> +       }
> +       new_vm->adev = adev;
> +
> +       if (!*process_info) {
> +               info = kzalloc(sizeof(*info), GFP_KERNEL);
> +               if (!info) {
> +                       ret = -ENOMEM;
> +                       goto alloc_process_info_fail;
> +               }
> +
> +               mutex_init(&info->lock);
> +               INIT_LIST_HEAD(&info->vm_list_head);
> +               INIT_LIST_HEAD(&info->kfd_bo_list);
> +
> +               info->eviction_fence =
> +                       amdgpu_amdkfd_fence_create(dma_fence_context_alloc(1),
> +                                                  current->mm);
> +               if (!info->eviction_fence) {
> +                       pr_err("Failed to create eviction fence\n");
> +                       goto create_evict_fence_fail;
> +               }
> +
> +               *process_info = info;
> +               *ef = dma_fence_get(&info->eviction_fence->base);
> +       }
> +
> +       new_vm->process_info = *process_info;
> +
> +       mutex_lock(&new_vm->process_info->lock);
> +       list_add_tail(&new_vm->vm_list_node,
> +                       &(new_vm->process_info->vm_list_head));
> +       new_vm->process_info->n_vms++;
> +       mutex_unlock(&new_vm->process_info->lock);
> +
> +       *vm = (void *) new_vm;
> +
> +       pr_debug("Created process vm %p\n", *vm);
> +
> +       return ret;
> +
> +create_evict_fence_fail:
We need to destroy the mutex inside info before freeing it here.
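
Something along these lines (untested sketch, just to show the intended
cleanup order with the mutex_destroy() added):

create_evict_fence_fail:
	mutex_destroy(&info->lock);
	kfree(info);
alloc_process_info_fail:
	amdgpu_vm_fini(adev, &new_vm->base);
vm_init_fail:
	kfree(new_vm);
	return ret;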

> +       kfree(info);
> +alloc_process_info_fail:
> +       amdgpu_vm_fini(adev, &new_vm->base);
> +vm_init_fail:
> +       kfree(new_vm);
> +       return ret;
> +
> +}
> +
> +void amdgpu_amdkfd_gpuvm_destroy_process_vm(struct kgd_dev *kgd, void *vm)
> +{
> +       struct amdgpu_device *adev = get_amdgpu_device(kgd);
> +       struct amdkfd_vm *kfd_vm = (struct amdkfd_vm *) vm;
> +       struct amdgpu_vm *avm = &kfd_vm->base;
> +       struct amdgpu_bo *pd;
> +       struct amdkfd_process_info *process_info;
> +
> +       if (WARN_ON(!kgd || !vm))
> +               return;
> +
> +       pr_debug("Destroying process vm %p\n", vm);
> +       /* Release eviction fence from PD */
> +       pd = avm->root.base.bo;
> +       amdgpu_bo_reserve(pd, false);
> +       amdgpu_bo_fence(pd, NULL, false);
> +       amdgpu_bo_unreserve(pd);
> +
> +       process_info = kfd_vm->process_info;
> +
> +       mutex_lock(&process_info->lock);
> +       process_info->n_vms--;
> +       list_del(&kfd_vm->vm_list_node);
> +       mutex_unlock(&process_info->lock);
> +
> +       /* Release per-process resources */
> +       if (!process_info->n_vms) {
> +               WARN_ON(!list_empty(&process_info->kfd_bo_list));
> +
> +               dma_fence_put(&process_info->eviction_fence->base);
I think we need to destroy the process_info mutex here before freeing it.
> +               kfree(process_info);
> +       }
> +
> +       /* Release the VM context */
> +       amdgpu_vm_fini(adev, avm);
> +       kfree(vm);
> +}
> +
> +uint32_t amdgpu_amdkfd_gpuvm_get_process_page_dir(void *vm)
> +{
> +       struct amdkfd_vm *avm = (struct amdkfd_vm *)vm;
> +
> +       return avm->pd_phys_addr >> AMDGPU_GPU_PAGE_SHIFT;
> +}
> +
> +int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
> +               struct kgd_dev *kgd, uint64_t va, uint64_t size,
> +               void *vm, struct kgd_mem **mem,
> +               uint64_t *offset, uint32_t flags)
> +{
> +       struct amdgpu_device *adev = get_amdgpu_device(kgd);
> +       struct amdkfd_vm *kfd_vm = (struct amdkfd_vm *)vm;
> +       struct amdgpu_bo *bo;
> +       int byte_align;
> +       u32 alloc_domain;
> +       u64 alloc_flags;
> +       uint32_t mapping_flags;
> +       int ret;
> +
> +       /*
> +        * Check on which domain to allocate BO
> +        */
> +       if (flags & ALLOC_MEM_FLAGS_VRAM) {
> +               alloc_domain = AMDGPU_GEM_DOMAIN_VRAM;
> +               alloc_flags = AMDGPU_GEM_CREATE_VRAM_CLEARED;
> +               alloc_flags |= (flags & ALLOC_MEM_FLAGS_PUBLIC) ?
> +                       AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED :
> +                       AMDGPU_GEM_CREATE_NO_CPU_ACCESS;
> +       } else if (flags & ALLOC_MEM_FLAGS_GTT) {
> +               alloc_domain = AMDGPU_GEM_DOMAIN_GTT;
> +               alloc_flags = 0;
> +       } else {
> +               return -EINVAL;
> +       }
> +
> +       *mem = kzalloc(sizeof(struct kgd_mem), GFP_KERNEL);
> +       if (!*mem)
> +               return -ENOMEM;
> +       INIT_LIST_HEAD(&(*mem)->bo_va_list);
> +       mutex_init(&(*mem)->lock);
> +       (*mem)->aql_queue     = !!(flags & ALLOC_MEM_FLAGS_AQL_QUEUE_MEM);
> +
> +       /* Workaround for AQL queue wraparound bug. Map the same
> +        * memory twice. That means we only actually allocate half
> +        * the memory.
> +        */
> +       if ((*mem)->aql_queue)
> +               size = size >> 1;
> +
> +       /* Workaround for TLB bug on older VI chips */
> +       byte_align = (adev->family == AMDGPU_FAMILY_VI &&
> +                       adev->asic_type != CHIP_FIJI &&
> +                       adev->asic_type != CHIP_POLARIS10 &&
> +                       adev->asic_type != CHIP_POLARIS11) ?
> +                       VI_BO_SIZE_ALIGN : 1;
> +
> +       mapping_flags = AMDGPU_VM_PAGE_READABLE;
> +       if (flags & ALLOC_MEM_FLAGS_WRITABLE)
> +               mapping_flags |= AMDGPU_VM_PAGE_WRITEABLE;
> +       if (flags & ALLOC_MEM_FLAGS_EXECUTABLE)
> +               mapping_flags |= AMDGPU_VM_PAGE_EXECUTABLE;
> +       if (flags & ALLOC_MEM_FLAGS_COHERENT)
> +               mapping_flags |= AMDGPU_VM_MTYPE_UC;
> +       else
> +               mapping_flags |= AMDGPU_VM_MTYPE_NC;
> +       (*mem)->mapping_flags = mapping_flags;
> +
> +       amdgpu_sync_create(&(*mem)->sync);
> +
> +       ret = amdgpu_amdkfd_reserve_system_mem_limit(adev, size, alloc_domain);
> +       if (ret) {
> +               pr_debug("Insufficient system memory\n");
> +               goto err_bo_create;
I suggest changing this to "goto err_reserve_system_mem" and adding that
label after the err_bo_create label; see the error-path sketch further down.

> +       }
> +
> +       pr_debug("\tcreate BO VA 0x%llx size 0x%llx domain %s\n",
> +                       va, size, domain_string(alloc_domain));
> +
> +       ret = amdgpu_bo_create(adev, size, byte_align, false,
> +                               alloc_domain, alloc_flags, NULL, NULL, 0, &bo);
> +       if (ret) {
> +               pr_debug("Failed to create BO on domain %s. ret %d\n",
> +                               domain_string(alloc_domain), ret);
> +               unreserve_system_mem_limit(adev, size, alloc_domain);
Move the above line under the "err_bo_create:" label (also reflected in the
sketch further down).

> +               goto err_bo_create;
> +       }
> +       bo->kfd_bo = *mem;
> +       (*mem)->bo = bo;
> +
> +       (*mem)->va = va;
> +       (*mem)->domain = alloc_domain;
> +       (*mem)->mapped_to_gpu_memory = 0;
> +       (*mem)->process_info = kfd_vm->process_info;
> +       add_kgd_mem_to_kfd_bo_list(*mem, kfd_vm->process_info);
> +
> +       if (offset)
> +               *offset = amdgpu_bo_mmap_offset(bo);
> +
> +       return 0;
> +
> +err_bo_create:
We also need to destroy the mutex before freeing mem.
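
Putting the three suggestions for this function together, the error path
could end up looking roughly like this (untested sketch, using the label
name suggested above):

	/* on amdgpu_amdkfd_reserve_system_mem_limit() failure: */
		goto err_reserve_system_mem;
	...
	/* on amdgpu_bo_create() failure: */
		goto err_bo_create;
	...
err_bo_create:
	unreserve_system_mem_limit(adev, size, alloc_domain);
err_reserve_system_mem:
	mutex_destroy(&(*mem)->lock);
	kfree(*mem);
	return ret;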

> +       kfree(*mem);
> +       return ret;
> +}
> +
> +int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
> +               struct kgd_dev *kgd, struct kgd_mem *mem)
> +{
> +       struct amdkfd_process_info *process_info = mem->process_info;
> +       unsigned long bo_size = mem->bo->tbo.mem.size;
> +       struct kfd_bo_va_list *entry, *tmp;
> +       struct bo_vm_reservation_context ctx;
> +       struct ttm_validate_buffer *bo_list_entry;
> +       int ret;
> +
> +       mutex_lock(&mem->lock);
> +
> +       if (mem->mapped_to_gpu_memory > 0) {
> +               pr_debug("BO VA 0x%llx size 0x%lx is still mapped.\n",
> +                               mem->va, bo_size);
> +               mutex_unlock(&mem->lock);
> +               return -EBUSY;
> +       }
> +
> +       mutex_unlock(&mem->lock);
> +       /* lock is not needed after this, since mem is unused and will
> +        * be freed anyway
> +        */
> +
> +       /* Make sure restore workers don't access the BO any more */
> +       bo_list_entry = &mem->validate_list;
> +       mutex_lock(&process_info->lock);
> +       list_del(&bo_list_entry->head);
> +       mutex_unlock(&process_info->lock);
> +
> +       ret = reserve_bo_and_cond_vms(mem, NULL, BO_VM_ALL, &ctx);
> +       if (unlikely(ret))
> +               return ret;
> +
> +       /* The eviction fence should be removed by the last unmap.
> +        * TODO: Log an error condition if the bo still has the eviction fence
> +        * attached
> +        */
> +       amdgpu_amdkfd_remove_eviction_fence(mem->bo,
> +                                       process_info->eviction_fence,
> +                                       NULL, NULL);
> +       pr_debug("Release VA 0x%llx - 0x%llx\n", mem->va,
> +               mem->va + bo_size * (1 + mem->aql_queue));
> +
> +       /* Remove from VM internal data structures */
> +       list_for_each_entry_safe(entry, tmp, &mem->bo_va_list, bo_list)
> +               remove_bo_from_vm((struct amdgpu_device *)entry->kgd_dev,
> +                               entry, bo_size);
> +
> +       ret = unreserve_bo_and_vms(&ctx, false, false);
> +
> +       /* Free the sync object */
> +       amdgpu_sync_free(&mem->sync);
> +
> +       /* Free the BO*/
> +       amdgpu_bo_unref(&mem->bo);
We should destroy the mem mutex before freeing the structure itself.
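
I.e. something like this (sketch of the intended order):

	amdgpu_bo_unref(&mem->bo);
	mutex_destroy(&mem->lock);
	kfree(mem);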

> +       kfree(mem);
> +
> +       return ret;
> +}
> +
> +int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
> +               struct kgd_dev *kgd, struct kgd_mem *mem, void *vm)
> +{
> +       struct amdgpu_device *adev = get_amdgpu_device(kgd);
> +       struct amdkfd_vm *kfd_vm = (struct amdkfd_vm *)vm;
> +       int ret;
> +       struct amdgpu_bo *bo;
> +       uint32_t domain;
> +       struct kfd_bo_va_list *entry;
> +       struct bo_vm_reservation_context ctx;
> +       struct kfd_bo_va_list *bo_va_entry = NULL;
> +       struct kfd_bo_va_list *bo_va_entry_aql = NULL;
> +       unsigned long bo_size;
> +
> +       /* Make sure restore is not running concurrently.
> +        */
> +       mutex_lock(&mem->process_info->lock);
> +
> +       mutex_lock(&mem->lock);
> +
> +       bo = mem->bo;
> +
> +       if (!bo) {
> +               pr_err("Invalid BO when mapping memory to GPU\n");
> +               ret = -EINVAL;
> +               goto out;
> +       }
> +
> +       domain = mem->domain;
> +       bo_size = bo->tbo.mem.size;
> +
> +       pr_debug("Map VA 0x%llx - 0x%llx to vm %p domain %s\n",
> +                       mem->va,
> +                       mem->va + bo_size * (1 + mem->aql_queue),
> +                       vm, domain_string(domain));
> +
> +       ret = reserve_bo_and_vm(mem, vm, &ctx);
> +       if (unlikely(ret))
> +               goto out;
> +
> +       if (check_if_add_bo_to_vm((struct amdgpu_vm *)vm, mem)) {
> +               ret = add_bo_to_vm(adev, mem, (struct amdgpu_vm *)vm, false,
> +                               &bo_va_entry);
> +               if (ret)
> +                       goto add_bo_to_vm_failed;
> +               if (mem->aql_queue) {
> +                       ret = add_bo_to_vm(adev, mem, (struct amdgpu_vm *)vm,
> +                                       true, &bo_va_entry_aql);
> +                       if (ret)
> +                               goto add_bo_to_vm_failed_aql;
> +               }
> +       } else {
> +               ret = vm_validate_pt_pd_bos((struct amdkfd_vm *)vm);
> +               if (unlikely(ret))
> +                       goto add_bo_to_vm_failed;
> +       }
> +
> +       if (mem->mapped_to_gpu_memory == 0) {
> +               /* Validate BO only once. The eviction fence gets added to BO
> +                * the first time it is mapped. Validate will wait for all
> +                * background evictions to complete.
> +                */
> +               ret = amdgpu_amdkfd_bo_validate(bo, domain, true);
> +               if (ret) {
> +                       pr_debug("Validate failed\n");
> +                       goto map_bo_to_gpuvm_failed;
> +               }
> +       }
> +
> +       list_for_each_entry(entry, &mem->bo_va_list, bo_list) {
> +               if (entry->bo_va->base.vm == vm && !entry->is_mapped) {
> +                       pr_debug("\t map VA 0x%llx - 0x%llx in entry %p\n",
> +                                       entry->va, entry->va + bo_size,
> +                                       entry);
> +
> +                       ret = map_bo_to_gpuvm(adev, entry, ctx.sync);
> +                       if (ret) {
> +                               pr_err("Failed to map radeon bo to gpuvm\n");
> +                               goto map_bo_to_gpuvm_failed;
> +                       }
> +
> +                       ret = vm_update_pds(vm, ctx.sync);
> +                       if (ret) {
> +                               pr_err("Failed to update page directories\n");
> +                               goto map_bo_to_gpuvm_failed;
> +                       }
> +
> +                       entry->is_mapped = true;
> +                       mem->mapped_to_gpu_memory++;
> +                       pr_debug("\t INC mapping count %d\n",
> +                                       mem->mapped_to_gpu_memory);
> +               }
> +       }
> +
> +       if (!amdgpu_ttm_tt_get_usermm(bo->tbo.ttm) && !bo->pin_count)
> +               amdgpu_bo_fence(bo,
> +                               &kfd_vm->process_info->eviction_fence->base,
> +                               true);
> +       ret = unreserve_bo_and_vms(&ctx, false, false);
> +
> +       goto out;
> +
> +map_bo_to_gpuvm_failed:
> +       if (bo_va_entry_aql)
> +               remove_bo_from_vm(adev, bo_va_entry_aql, bo_size);
> +add_bo_to_vm_failed_aql:
> +       if (bo_va_entry)
> +               remove_bo_from_vm(adev, bo_va_entry, bo_size);
> +add_bo_to_vm_failed:
> +       unreserve_bo_and_vms(&ctx, false, false);
> +
> +out:
> +       mutex_unlock(&mem->process_info->lock);
> +       mutex_unlock(&mem->lock);
> +       return ret;
> +}
> +
> +int amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu(
> +               struct kgd_dev *kgd, struct kgd_mem *mem, void *vm)
> +{
> +       struct amdgpu_device *adev = get_amdgpu_device(kgd);
> +       struct amdkfd_process_info *process_info =
> +               ((struct amdkfd_vm *)vm)->process_info;
> +       unsigned long bo_size = mem->bo->tbo.mem.size;
> +       struct kfd_bo_va_list *entry;
> +       struct bo_vm_reservation_context ctx;
> +       int ret;
> +
> +       mutex_lock(&mem->lock);
> +
> +       ret = reserve_bo_and_cond_vms(mem, vm, BO_VM_MAPPED, &ctx);
> +       if (unlikely(ret))
> +               goto out;
> +       /* If no VMs were reserved, it means the BO wasn't actually mapped */
> +       if (ctx.n_vms == 0) {
> +               ret = -EINVAL;
> +               goto unreserve_out;
> +       }
> +
> +       ret = vm_validate_pt_pd_bos((struct amdkfd_vm *)vm);
> +       if (unlikely(ret))
> +               goto unreserve_out;
> +
> +       pr_debug("Unmap VA 0x%llx - 0x%llx from vm %p\n",
> +               mem->va,
> +               mem->va + bo_size * (1 + mem->aql_queue),
> +               vm);
> +
> +       list_for_each_entry(entry, &mem->bo_va_list, bo_list) {
> +               if (entry->bo_va->base.vm == vm && entry->is_mapped) {
> +                       pr_debug("\t unmap VA 0x%llx - 0x%llx from entry %p\n",
> +                                       entry->va,
> +                                       entry->va + bo_size,
> +                                       entry);
> +
> +                       ret = unmap_bo_from_gpuvm(adev, entry, ctx.sync);
> +                       if (ret == 0) {
> +                               entry->is_mapped = false;
> +                       } else {
> +                               pr_err("failed to unmap VA 0x%llx\n",
> +                                               mem->va);
> +                               goto unreserve_out;
> +                       }
> +
> +                       mem->mapped_to_gpu_memory--;
> +                       pr_debug("\t DEC mapping count %d\n",
> +                                       mem->mapped_to_gpu_memory);
> +               }
> +       }
> +
> +       /* If BO is unmapped from all VMs, unfence it. It can be evicted if
> +        * required.
> +        */
> +       if (mem->mapped_to_gpu_memory == 0 &&
> +           !amdgpu_ttm_tt_get_usermm(mem->bo->tbo.ttm) && !mem->bo->pin_count)
> +               amdgpu_amdkfd_remove_eviction_fence(mem->bo,
> +                                               process_info->eviction_fence,
> +                                                   NULL, NULL);
> +
> +unreserve_out:
> +       unreserve_bo_and_vms(&ctx, false, false);
> +out:
> +       mutex_unlock(&mem->lock);
> +       return ret;
> +}
> +
> +int amdgpu_amdkfd_gpuvm_sync_memory(
> +               struct kgd_dev *kgd, struct kgd_mem *mem, bool intr)
> +{
> +       struct amdgpu_sync sync;
> +       int ret;
> +
> +       amdgpu_sync_create(&sync);
> +
> +       mutex_lock(&mem->lock);
> +       amdgpu_sync_clone(&mem->sync, &sync);
> +       mutex_unlock(&mem->lock);
> +
> +       ret = amdgpu_sync_wait(&sync, intr);
> +       amdgpu_sync_free(&sync);
> +       return ret;
> +}
> +
> +int amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel(struct kgd_dev *kgd,
> +               struct kgd_mem *mem, void **kptr, uint64_t *size)
> +{
> +       int ret;
> +       struct amdgpu_bo *bo = mem->bo;
> +
> +       if (amdgpu_ttm_tt_get_usermm(bo->tbo.ttm)) {
> +               pr_err("userptr can't be mapped to kernel\n");
> +               return -EINVAL;
> +       }
> +
> +       /* Remove kgd_mem from kfd_bo_list to avoid re-validating
> +        * this BO during restore after an eviction.
> +        */
> +       mutex_lock(&mem->process_info->lock);
> +
> +       ret = amdgpu_bo_reserve(bo, true);
> +       if (ret) {
> +               pr_err("Failed to reserve bo. ret %d\n", ret);
> +               goto bo_reserve_failed;
> +       }
> +
> +       ret = amdgpu_bo_pin(bo, AMDGPU_GEM_DOMAIN_GTT, NULL);
> +       if (ret) {
> +               pr_err("Failed to pin bo. ret %d\n", ret);
> +               goto pin_failed;
> +       }
> +
> +       ret = amdgpu_bo_kmap(bo, kptr);
> +       if (ret) {
> +               pr_err("Failed to map bo to kernel. ret %d\n", ret);
> +               goto kmap_failed;
> +       }
> +
> +       amdgpu_amdkfd_remove_eviction_fence(
> +               bo, mem->process_info->eviction_fence, NULL, NULL);
> +       list_del_init(&mem->validate_list.head);
> +
> +       if (size)
> +               *size = amdgpu_bo_size(bo);
> +
> +       amdgpu_bo_unreserve(bo);
> +
> +       mutex_unlock(&mem->process_info->lock);
> +       return 0;
> +
> +kmap_failed:
> +       amdgpu_bo_unpin(bo);
> +pin_failed:
> +       amdgpu_bo_unreserve(bo);
> +bo_reserve_failed:
> +       mutex_unlock(&mem->process_info->lock);
> +
> +       return ret;
> +}
> +
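As a side note (not part of the patch), a caller that needs permanent CPU access to a small GTT BO would typically go through this hook once, after the BO has been allocated and mapped for GPU access. The surrounding names (dev, mem) below are assumptions for illustration only:

	/* Hypothetical caller of the kernel-mapping hook; 'dev' and 'mem' are
	 * assumed to be a struct kfd_dev and an already allocated kgd_mem.
	 */
	void *kptr;
	uint64_t size;
	int r;

	r = dev->kfd2kgd->map_gtt_bo_to_kernel(dev->kgd, mem, &kptr, &size);
	if (r)
		return r;
	/* The BO is now pinned and removed from the KFD validate list, so this
	 * pointer stays valid (and the BO is never evicted) until it is freed.
	 */
	memset(kptr, 0, size);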
> +/** amdgpu_amdkfd_gpuvm_restore_process_bos - Restore all BOs for the given
> + *   KFD process identified by process_info
> + *
> + * @process_info: amdkfd_process_info of the KFD process
> + *
> + * After memory eviction, restore thread calls this function. The function
> + * should be called while the process is still valid. BO restore involves:
> + *
> + * 1.  Release old eviction fence and create new one
> + * 2.  Get two copies of PD BO list from all the VMs. Keep one copy as pd_list.
> + * 3.  Use the second PD list and kfd_bo_list to create a list (ctx.list) of
> + *     BOs that need to be reserved.
> + * 4.  Reserve all the BOs
> + * 5.  Validate PD and PT BOs.
> + * 6.  Validate all KFD BOs using kfd_bo_list, map them and add the new fence
> + * 7.  Add fence to all PD and PT BOs.
> + * 8.  Unreserve all BOs
> + */
> +int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence **ef)
> +{
> +       struct amdgpu_bo_list_entry *pd_bo_list;
> +       struct amdkfd_process_info *process_info = info;
> +       struct amdkfd_vm *peer_vm;
> +       struct kgd_mem *mem;
> +       struct bo_vm_reservation_context ctx;
> +       struct amdgpu_amdkfd_fence *new_fence;
> +       int ret = 0, i;
> +       struct list_head duplicate_save;
> +       struct amdgpu_sync sync_obj;
> +
> +       INIT_LIST_HEAD(&duplicate_save);
> +       INIT_LIST_HEAD(&ctx.list);
> +       INIT_LIST_HEAD(&ctx.duplicates);
> +
> +       pd_bo_list = kcalloc(process_info->n_vms,
> +                            sizeof(struct amdgpu_bo_list_entry),
> +                            GFP_KERNEL);
> +       if (!pd_bo_list)
> +               return -ENOMEM;
> +
> +       i = 0;
> +       mutex_lock(&process_info->lock);
> +       list_for_each_entry(peer_vm, &process_info->vm_list_head,
> +                       vm_list_node)
> +               amdgpu_vm_get_pd_bo(&peer_vm->base, &ctx.list,
> +                                   &pd_bo_list[i++]);
> +
> +       /* Reserve all BOs and page tables/directory. Add all BOs from
> +        * kfd_bo_list to ctx.list
> +        */
> +       list_for_each_entry(mem, &process_info->kfd_bo_list,
> +                           validate_list.head) {
> +
> +               list_add_tail(&mem->resv_list.head, &ctx.list);
> +               mem->resv_list.bo = mem->validate_list.bo;
> +               mem->resv_list.shared = mem->validate_list.shared;
> +       }
> +
> +       ret = ttm_eu_reserve_buffers(&ctx.ticket, &ctx.list,
> +                                    false, &duplicate_save);
> +       if (ret) {
> +               pr_debug("Memory eviction: TTM Reserve Failed. Try again\n");
> +               goto ttm_reserve_fail;
> +       }
> +
> +       amdgpu_sync_create(&sync_obj);
> +
> +       /* Validate PDs and PTs */
> +       ret = process_validate_vms(process_info);
> +       if (ret)
> +               goto validate_map_fail;
> +
> +       /* Wait for PD/PTs validate to finish */
> +       /* FIXME: I think this isn't needed */
> +       list_for_each_entry(peer_vm, &process_info->vm_list_head,
> +                           vm_list_node) {
> +               struct amdgpu_bo *bo = peer_vm->base.root.base.bo;
> +
> +               ttm_bo_wait(&bo->tbo, false, false);
> +       }
> +
> +       /* Validate BOs and map them to GPUVM (update VM page tables). */
> +       list_for_each_entry(mem, &process_info->kfd_bo_list,
> +                           validate_list.head) {
> +
> +               struct amdgpu_bo *bo = mem->bo;
> +               uint32_t domain = mem->domain;
> +               struct kfd_bo_va_list *bo_va_entry;
> +
> +               ret = amdgpu_amdkfd_bo_validate(bo, domain, false);
> +               if (ret) {
> +                       pr_debug("Memory eviction: Validate BOs failed. Try again\n");
> +                       goto validate_map_fail;
> +               }
> +
> +               list_for_each_entry(bo_va_entry, &mem->bo_va_list,
> +                                   bo_list) {
> +                       ret = update_gpuvm_pte((struct amdgpu_device *)
> +                                             bo_va_entry->kgd_dev,
> +                                             bo_va_entry,
> +                                             &sync_obj);
> +                       if (ret) {
> +                               pr_debug("Memory eviction: update PTE failed. Try again\n");
> +                               goto validate_map_fail;
> +                       }
> +               }
> +       }
> +
> +       /* Update page directories */
> +       ret = process_update_pds(process_info, &sync_obj);
> +       if (ret) {
> +               pr_debug("Memory eviction: update PDs failed. Try again\n");
> +               goto validate_map_fail;
> +       }
> +
> +       amdgpu_sync_wait(&sync_obj, false);
> +
> +       /* Release the old eviction fence and create a new one: a fence can
> +        * only go from unsignaled to signaled once, so it cannot be reused.
> +        * Use context and mm from the old fence.
> +        */
> +       new_fence = amdgpu_amdkfd_fence_create(
> +                               process_info->eviction_fence->base.context,
> +                               process_info->eviction_fence->mm);
> +       if (!new_fence) {
> +               pr_err("Failed to create eviction fence\n");
> +               ret = -ENOMEM;
> +               goto validate_map_fail;
> +       }
> +       dma_fence_put(&process_info->eviction_fence->base);
> +       process_info->eviction_fence = new_fence;
> +       *ef = dma_fence_get(&new_fence->base);
> +
> +       /* Wait for validate to finish and attach new eviction fence */
> +       list_for_each_entry(mem, &process_info->kfd_bo_list,
> +               validate_list.head)
> +               ttm_bo_wait(&mem->bo->tbo, false, false);
> +       list_for_each_entry(mem, &process_info->kfd_bo_list,
> +               validate_list.head)
> +               amdgpu_bo_fence(mem->bo,
> +                       &process_info->eviction_fence->base, true);
> +
> +       /* Attach eviction fence to PD / PT BOs */
> +       list_for_each_entry(peer_vm, &process_info->vm_list_head,
> +                           vm_list_node) {
> +               struct amdgpu_bo *bo = peer_vm->base.root.base.bo;
> +
> +               amdgpu_bo_fence(bo, &process_info->eviction_fence->base, true);
> +       }
> +
> +validate_map_fail:
> +       ttm_eu_backoff_reservation(&ctx.ticket, &ctx.list);
> +       amdgpu_sync_free(&sync_obj);
> +ttm_reserve_fail:
> +       mutex_unlock(&process_info->lock);
> +       kfree(pd_bo_list);
> +       return ret;
> +}
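To tie the steps above together, here is a condensed, hypothetical sketch of how a KFD restore worker might drive this function. The worker itself, the helper names and the process fields (kgd_process_info, ef) are assumptions for illustration, not code from this series:

	/* Hypothetical restore worker (sketch only) */
	static void example_restore_worker(struct kfd_dev *dev, struct kfd_process *p)
	{
		struct dma_fence *new_ef = NULL;

		/* Re-validate, re-map and re-fence all BOs of the process. On a
		 * transient TTM reservation failure, simply retry later.
		 */
		if (dev->kfd2kgd->restore_process_bos(p->kgd_process_info, &new_ef)) {
			schedule_restore_retry(p);	/* hypothetical helper */
			return;
		}

		/* restore_process_bos replaced the eviction fence; keep the new
		 * reference so the process can be evicted (and restored) again.
		 */
		dma_fence_put(p->ef);
		p->ef = new_ef;

		resume_process_queues(p);		/* hypothetical helper */
	}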
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> index 5c4c3e0..f608ecf 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> @@ -36,6 +36,7 @@
>  #include <drm/drm_cache.h>
>  #include "amdgpu.h"
>  #include "amdgpu_trace.h"
> +#include "amdgpu_amdkfd.h"
>
>  static bool amdgpu_need_backup(struct amdgpu_device *adev)
>  {
> @@ -54,6 +55,9 @@ static void amdgpu_ttm_bo_destroy(struct ttm_buffer_object *tbo)
>         struct amdgpu_device *adev = amdgpu_ttm_adev(tbo->bdev);
>         struct amdgpu_bo *bo = ttm_to_amdgpu_bo(tbo);
>
> +       if (bo->kfd_bo)
> +               amdgpu_amdkfd_unreserve_system_memory_limit(bo);
> +
>         amdgpu_bo_kunmap(bo);
>
>         drm_gem_object_release(&bo->gem_base);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
> index 33615e2..ba5330a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
> @@ -92,6 +92,8 @@ struct amdgpu_bo {
>                 struct list_head        mn_list;
>                 struct list_head        shadow_list;
>         };
> +
> +       struct kgd_mem                  *kfd_bo;
>  };
>
>  static inline struct amdgpu_bo *ttm_to_amdgpu_bo(struct ttm_buffer_object *tbo)
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index c3f33d3..76ee968 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -261,6 +261,13 @@ static int amdgpu_verify_access(struct ttm_buffer_object *bo, struct file *filp)
>  {
>         struct amdgpu_bo *abo = ttm_to_amdgpu_bo(bo);
>
> +       /*
> +        * Don't verify access for KFD BOs. They don't have a GEM
> +        * object associated with them.
> +        */
> +       if (abo->kfd_bo)
> +               return 0;
> +
>         if (amdgpu_ttm_tt_get_usermm(bo->ttm))
>                 return -EPERM;
>         return drm_vma_node_verify_access(&abo->gem_base.vma_node,
> diff --git a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
> index 36c706a..5984fec 100644
> --- a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
> +++ b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
> @@ -127,6 +127,25 @@ struct tile_config {
>         uint32_t num_ranks;
>  };
>
> +
> +/*
> + * Allocation flag domains
> + */
> +#define ALLOC_MEM_FLAGS_VRAM           (1 << 0)
> +#define ALLOC_MEM_FLAGS_GTT            (1 << 1)
> +#define ALLOC_MEM_FLAGS_USERPTR                (1 << 2) /* TODO */
> +#define ALLOC_MEM_FLAGS_DOORBELL       (1 << 3) /* TODO */
> +
> +/*
> + * Allocation flags attributes/access options.
> + */
> +#define ALLOC_MEM_FLAGS_WRITABLE       (1 << 31)
> +#define ALLOC_MEM_FLAGS_EXECUTABLE     (1 << 30)
> +#define ALLOC_MEM_FLAGS_PUBLIC         (1 << 29)
> +#define ALLOC_MEM_FLAGS_NO_SUBSTITUTE  (1 << 28) /* TODO */
> +#define ALLOC_MEM_FLAGS_AQL_QUEUE_MEM  (1 << 27)
> +#define ALLOC_MEM_FLAGS_COHERENT       (1 << 26) /* For GFXv9 or later */
> +
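For illustration only (not part of the patch), a typical shader-visible VRAM allocation would combine one domain flag with several access attributes, e.g.:

	/* Assumed flag combination, shown only to illustrate how the domain and
	 * attribute halves of the flag word are meant to be composed.
	 */
	uint32_t alloc_flags = ALLOC_MEM_FLAGS_VRAM |
			       ALLOC_MEM_FLAGS_WRITABLE |
			       ALLOC_MEM_FLAGS_EXECUTABLE |
			       ALLOC_MEM_FLAGS_NO_SUBSTITUTE;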
>  /**
>   * struct kfd2kgd_calls
>   *
> @@ -186,6 +205,41 @@ struct tile_config {
>   *
>   * @get_vram_usage: Returns current VRAM usage
>   *
> + * @create_process_vm: Create a VM address space for a given process and GPU
> + *
> + * @destroy_process_vm: Destroy a VM
> + *
> + * @get_process_page_dir: Get physical address of a VM page directory
> + *
> + * @set_vm_context_page_table_base: Program page table base for a VMID
> + *
> + * @alloc_memory_of_gpu: Allocate GPUVM memory
> + *
> + * @free_memory_of_gpu: Free GPUVM memory
> + *
> + * @map_memory_to_gpu: Map GPUVM memory into a specific VM address
> + * space. Allocates and updates page tables and page directories as
> + * needed. This function may return before all page table updates have
> + * completed. This allows multiple map operations (on multiple GPUs)
> + * to happen concurrently. Use sync_memory to synchronize with all
> + * pending updates.
> + *
> + * @unmap_memory_to_gpu: Unmap GPUVM memory from a specific VM address space
> + *
> + * @sync_memory: Wait for pending page table updates to complete
> + *
> + * @map_gtt_bo_to_kernel: Map a GTT BO for kernel access
> + * Pins the BO, maps it to kernel address space. Such BOs are never evicted.
> + * The kernel virtual address remains valid until the BO is freed.
> + *
> + * @restore_process_bos: Restore all BOs that belong to the
> + * process. This is intended for restoring memory mappings after a TTM
> + * eviction.
> + *
> + * @invalidate_tlbs: Invalidate TLBs for a specific PASID
> + *
> + * @invalidate_tlbs_vmid: Invalidate TLBs for a specific VMID
> + *
>   * This structure contains function pointers to services that the kgd driver
>   * provides to amdkfd driver.
>   *
> @@ -275,6 +329,29 @@ struct kfd2kgd_calls {
>         void (*get_cu_info)(struct kgd_dev *kgd,
>                         struct kfd_cu_info *cu_info);
>         uint64_t (*get_vram_usage)(struct kgd_dev *kgd);
> +
> +       int (*create_process_vm)(struct kgd_dev *kgd, void **vm,
> +                       void **process_info, struct dma_fence **ef);
> +       void (*destroy_process_vm)(struct kgd_dev *kgd, void *vm);
> +       uint32_t (*get_process_page_dir)(void *vm);
> +       void (*set_vm_context_page_table_base)(struct kgd_dev *kgd,
> +                       uint32_t vmid, uint32_t page_table_base);
> +       int (*alloc_memory_of_gpu)(struct kgd_dev *kgd, uint64_t va,
> +                       uint64_t size, void *vm,
> +                       struct kgd_mem **mem, uint64_t *offset,
> +                       uint32_t flags);
> +       int (*free_memory_of_gpu)(struct kgd_dev *kgd, struct kgd_mem *mem);
> +       int (*map_memory_to_gpu)(struct kgd_dev *kgd, struct kgd_mem *mem,
> +                       void *vm);
> +       int (*unmap_memory_to_gpu)(struct kgd_dev *kgd, struct kgd_mem *mem,
> +                       void *vm);
> +       int (*sync_memory)(struct kgd_dev *kgd, struct kgd_mem *mem, bool intr);
> +       int (*map_gtt_bo_to_kernel)(struct kgd_dev *kgd, struct kgd_mem *mem,
> +                       void **kptr, uint64_t *size);
> +       int (*restore_process_bos)(void *process_info, struct dma_fence **ef);
> +
> +       int (*invalidate_tlbs)(struct kgd_dev *kgd, uint16_t pasid);
> +       int (*invalidate_tlbs_vmid)(struct kgd_dev *kgd, uint16_t vmid);
>  };
>
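For orientation, here is a rough sketch (not from this series) of how amdkfd might drive the new hooks for one allocation mapped on two GPUs: allocate once, map into each device's VM, and synchronize once before the memory is used. The dev->kgd / dev->kfd2kgd accessors and the error-handling structure are simplified assumptions:

	/* Hypothetical two-GPU map-and-sync sketch */
	static int example_map_on_two_gpus(struct kfd_dev *dev0, void *vm0,
					   struct kfd_dev *dev1, void *vm1,
					   uint64_t va, uint64_t size)
	{
		struct kgd_mem *mem;
		uint32_t flags = ALLOC_MEM_FLAGS_VRAM | ALLOC_MEM_FLAGS_WRITABLE;
		int r;

		r = dev0->kfd2kgd->alloc_memory_of_gpu(dev0->kgd, va, size, vm0,
						       &mem, NULL, flags);
		if (r)
			return r;

		/* Maps may run concurrently; page table updates are pipelined */
		r = dev0->kfd2kgd->map_memory_to_gpu(dev0->kgd, mem, vm0);
		if (r)
			goto free_mem;
		r = dev1->kfd2kgd->map_memory_to_gpu(dev1->kgd, mem, vm1);
		if (r)
			goto unmap0;

		/* One sync waits for all pending page table updates */
		return dev0->kfd2kgd->sync_memory(dev0->kgd, mem, true);

	unmap0:
		dev0->kfd2kgd->unmap_memory_to_gpu(dev0->kgd, mem, vm0);
	free_mem:
		dev0->kfd2kgd->free_memory_of_gpu(dev0->kgd, mem);
		return r;
	}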
>  /**
> --
> 2.7.4
>

Hi Felix,
I wrote some minor comments. If you don't object to them, I'll just
add them to the patch to save you the trouble of re-sending.


This patch is:
Acked-by: Oded Gabbay <oded.gabbay@gmail.com>

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 11/25] drm/amdkfd: Centralize IOMMUv2 code and make it conditional
       [not found]                 ` <50866577-97a4-2786-18af-ddb60a435aea-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2018-02-12  9:06                   ` Oded Gabbay
  0 siblings, 0 replies; 71+ messages in thread
From: Oded Gabbay @ 2018-02-12  9:06 UTC (permalink / raw)
  To: Christian König; +Cc: Felix Kuehling, amd-gfx list

This patch is:
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>


On Thu, Feb 8, 2018 at 10:16 AM, Christian König
<ckoenig.leichtzumerken@gmail.com> wrote:
> Am 07.02.2018 um 21:51 schrieb Felix Kuehling:
>
> On 2018-02-07 06:20 AM, Christian König wrote:
>
> Am 07.02.2018 um 02:32 schrieb Felix Kuehling:
>
> dGPUs work without IOMMUv2. Make IOMMUv2 initialization dependent on
> ASIC information. Also allow building KFD without IOMMUv2 support.
> This is still useful for dGPUs and prepares for enabling KFD on
> architectures that don't support AMD IOMMUv2.
>
> v2:
> * Centralize IOMMUv2 code to avoid #ifdefs in too many places
>
> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
> ---
>   drivers/gpu/drm/amd/amdkfd/Kconfig        |   2 +-
>   drivers/gpu/drm/amd/amdkfd/Makefile       |   4 +
>   drivers/gpu/drm/amd/amdkfd/kfd_crat.c     |  14 +-
>   drivers/gpu/drm/amd/amdkfd/kfd_device.c   | 127 +++--------
>   drivers/gpu/drm/amd/amdkfd/kfd_events.c   |   3 +
>   drivers/gpu/drm/amd/amdkfd/kfd_iommu.c    | 356
> ++++++++++++++++++++++++++++++
>   drivers/gpu/drm/amd/amdkfd/kfd_iommu.h    |  78 +++++++
>   drivers/gpu/drm/amd/amdkfd/kfd_priv.h     |  14 +-
>   drivers/gpu/drm/amd/amdkfd/kfd_process.c  | 138 +-----------
>   drivers/gpu/drm/amd/amdkfd/kfd_topology.c |  16 +-
>   drivers/gpu/drm/amd/amdkfd/kfd_topology.h |   6 +-
>   11 files changed, 493 insertions(+), 265 deletions(-)
>   create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_iommu.c
>   create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_iommu.h
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/Kconfig
> b/drivers/gpu/drm/amd/amdkfd/Kconfig
> index bc5a294..5bbeb95 100644
> --- a/drivers/gpu/drm/amd/amdkfd/Kconfig
> +++ b/drivers/gpu/drm/amd/amdkfd/Kconfig
> @@ -4,6 +4,6 @@
>     config HSA_AMD
>       tristate "HSA kernel driver for AMD GPU devices"
> -    depends on DRM_AMDGPU && AMD_IOMMU_V2 && X86_64
> +    depends on DRM_AMDGPU && X86_64
>
> You still need a weak dependency on AMD_IOMMU_V2 here, in other words
> add "imply AMD_IOMMU_V2".
>
> This prevents illegal combinations like linking amdkfd into the kernel
> while amd_iommu_v2 is a module.
>
> But it should still allow to completely disable amd_iommu_v2 and
> compile amdkfd without support for it.
>
> Thanks, that's good to know. An updated patch is attached (to avoid
> resending the whole series).
>
>
> Patch is Acked-by: Christian König <christian.koenig@amd.com>.
>
> Regards,
> Christian.
>
>
> Regards,
>   Felix
>
> Christian.
>
>       help
>         Enable this if you want to use HSA features on AMD GPU devices.
> diff --git a/drivers/gpu/drm/amd/amdkfd/Makefile
> b/drivers/gpu/drm/amd/amdkfd/Makefile
> index a317e76..0d02422 100644
> --- a/drivers/gpu/drm/amd/amdkfd/Makefile
> +++ b/drivers/gpu/drm/amd/amdkfd/Makefile
> @@ -37,6 +37,10 @@ amdkfd-y    := kfd_module.o kfd_device.o
> kfd_chardev.o kfd_topology.o \
>           kfd_interrupt.o kfd_events.o cik_event_interrupt.o \
>           kfd_dbgdev.o kfd_dbgmgr.o kfd_crat.o
>   +ifneq ($(CONFIG_AMD_IOMMU_V2),)
> +amdkfd-y += kfd_iommu.o
> +endif
> +
>   amdkfd-$(CONFIG_DEBUG_FS) += kfd_debugfs.o
>     obj-$(CONFIG_HSA_AMD)    += amdkfd.o
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> index 2bc2816..7493f47 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> @@ -22,10 +22,10 @@
>     #include <linux/pci.h>
>   #include <linux/acpi.h>
> -#include <linux/amd-iommu.h>
>   #include "kfd_crat.h"
>   #include "kfd_priv.h"
>   #include "kfd_topology.h"
> +#include "kfd_iommu.h"
>     /* GPU Processor ID base for dGPUs for which VCRAT needs to be
> created.
>    * GPU processor ID are expressed with Bit[31]=1.
> @@ -1037,15 +1037,11 @@ static int kfd_create_vcrat_image_gpu(void
> *pcrat_image,
>       struct crat_subtype_generic *sub_type_hdr;
>       struct crat_subtype_computeunit *cu;
>       struct kfd_cu_info cu_info;
> -    struct amd_iommu_device_info iommu_info;
>       int avail_size = *size;
>       uint32_t total_num_of_cu;
>       int num_of_cache_entries = 0;
>       int cache_mem_filled = 0;
>       int ret = 0;
> -    const u32 required_iommu_flags = AMD_IOMMU_DEVICE_FLAG_ATS_SUP |
> -                     AMD_IOMMU_DEVICE_FLAG_PRI_SUP |
> -                     AMD_IOMMU_DEVICE_FLAG_PASID_SUP;
>       struct kfd_local_mem_info local_mem_info;
>         if (!pcrat_image || avail_size < VCRAT_SIZE_FOR_GPU)
> @@ -1106,12 +1102,8 @@ static int kfd_create_vcrat_image_gpu(void
> *pcrat_image,
>       /* Check if this node supports IOMMU. During parsing this flag
> will
>        * translate to HSA_CAP_ATS_PRESENT
>        */
> -    iommu_info.flags = 0;
> -    if (amd_iommu_device_info(kdev->pdev, &iommu_info) == 0) {
> -        if ((iommu_info.flags & required_iommu_flags) ==
> -                required_iommu_flags)
> -            cu->hsa_capability |= CRAT_CU_FLAGS_IOMMU_PRESENT;
> -    }
> +    if (!kfd_iommu_check_device(kdev))
> +        cu->hsa_capability |= CRAT_CU_FLAGS_IOMMU_PRESENT;
>         crat_table->length += sub_type_hdr->length;
>       crat_table->total_entries++;
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> index 83d6f41..4ac2d61 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> @@ -20,7 +20,9 @@
>    * OTHER DEALINGS IN THE SOFTWARE.
>    */
>   +#if defined(CONFIG_AMD_IOMMU_V2_MODULE) ||
> defined(CONFIG_AMD_IOMMU_V2)
>   #include <linux/amd-iommu.h>
> +#endif
>   #include <linux/bsearch.h>
>   #include <linux/pci.h>
>   #include <linux/slab.h>
> @@ -28,9 +30,11 @@
>   #include "kfd_device_queue_manager.h"
>   #include "kfd_pm4_headers_vi.h"
>   #include "cwsr_trap_handler_gfx8.asm"
> +#include "kfd_iommu.h"
>     #define MQD_SIZE_ALIGNED 768
>   +#ifdef KFD_SUPPORT_IOMMU_V2
>   static const struct kfd_device_info kaveri_device_info = {
>       .asic_family = CHIP_KAVERI,
>       .max_pasid_bits = 16,
> @@ -41,6 +45,7 @@ static const struct kfd_device_info
> kaveri_device_info = {
>       .num_of_watch_points = 4,
>       .mqd_size_aligned = MQD_SIZE_ALIGNED,
>       .supports_cwsr = false,
> +    .needs_iommu_device = true,
>       .needs_pci_atomics = false,
>   };
>   @@ -54,8 +59,10 @@ static const struct kfd_device_info
> carrizo_device_info = {
>       .num_of_watch_points = 4,
>       .mqd_size_aligned = MQD_SIZE_ALIGNED,
>       .supports_cwsr = true,
> +    .needs_iommu_device = true,
>       .needs_pci_atomics = false,
>   };
> +#endif
>     static const struct kfd_device_info hawaii_device_info = {
>       .asic_family = CHIP_HAWAII,
> @@ -67,6 +74,7 @@ static const struct kfd_device_info
> hawaii_device_info = {
>       .num_of_watch_points = 4,
>       .mqd_size_aligned = MQD_SIZE_ALIGNED,
>       .supports_cwsr = false,
> +    .needs_iommu_device = false,
>       .needs_pci_atomics = false,
>   };
>   @@ -79,6 +87,7 @@ static const struct kfd_device_info
> tonga_device_info = {
>       .num_of_watch_points = 4,
>       .mqd_size_aligned = MQD_SIZE_ALIGNED,
>       .supports_cwsr = false,
> +    .needs_iommu_device = false,
>       .needs_pci_atomics = true,
>   };
>   @@ -91,6 +100,7 @@ static const struct kfd_device_info
> tonga_vf_device_info = {
>       .num_of_watch_points = 4,
>       .mqd_size_aligned = MQD_SIZE_ALIGNED,
>       .supports_cwsr = false,
> +    .needs_iommu_device = false,
>       .needs_pci_atomics = false,
>   };
>   @@ -103,6 +113,7 @@ static const struct kfd_device_info
> fiji_device_info = {
>       .num_of_watch_points = 4,
>       .mqd_size_aligned = MQD_SIZE_ALIGNED,
>       .supports_cwsr = true,
> +    .needs_iommu_device = false,
>       .needs_pci_atomics = true,
>   };
>   @@ -115,6 +126,7 @@ static const struct kfd_device_info
> fiji_vf_device_info = {
>       .num_of_watch_points = 4,
>       .mqd_size_aligned = MQD_SIZE_ALIGNED,
>       .supports_cwsr = true,
> +    .needs_iommu_device = false,
>       .needs_pci_atomics = false,
>   };
>   @@ -128,6 +140,7 @@ static const struct kfd_device_info
> polaris10_device_info = {
>       .num_of_watch_points = 4,
>       .mqd_size_aligned = MQD_SIZE_ALIGNED,
>       .supports_cwsr = true,
> +    .needs_iommu_device = false,
>       .needs_pci_atomics = true,
>   };
>   @@ -140,6 +153,7 @@ static const struct kfd_device_info
> polaris10_vf_device_info = {
>       .num_of_watch_points = 4,
>       .mqd_size_aligned = MQD_SIZE_ALIGNED,
>       .supports_cwsr = true,
> +    .needs_iommu_device = false,
>       .needs_pci_atomics = false,
>   };
>   @@ -152,6 +166,7 @@ static const struct kfd_device_info
> polaris11_device_info = {
>       .num_of_watch_points = 4,
>       .mqd_size_aligned = MQD_SIZE_ALIGNED,
>       .supports_cwsr = true,
> +    .needs_iommu_device = false,
>       .needs_pci_atomics = true,
>   };
>   @@ -162,6 +177,7 @@ struct kfd_deviceid {
>   };
>     static const struct kfd_deviceid supported_devices[] = {
> +#ifdef KFD_SUPPORT_IOMMU_V2
>       { 0x1304, &kaveri_device_info },    /* Kaveri */
>       { 0x1305, &kaveri_device_info },    /* Kaveri */
>       { 0x1306, &kaveri_device_info },    /* Kaveri */
> @@ -189,6 +205,7 @@ static const struct kfd_deviceid
> supported_devices[] = {
>       { 0x9875, &carrizo_device_info },    /* Carrizo */
>       { 0x9876, &carrizo_device_info },    /* Carrizo */
>       { 0x9877, &carrizo_device_info },    /* Carrizo */
> +#endif
>       { 0x67A0, &hawaii_device_info },    /* Hawaii */
>       { 0x67A1, &hawaii_device_info },    /* Hawaii */
>       { 0x67A2, &hawaii_device_info },    /* Hawaii */
> @@ -302,77 +319,6 @@ struct kfd_dev *kgd2kfd_probe(struct kgd_dev *kgd,
>       return kfd;
>   }
>   -static bool device_iommu_pasid_init(struct kfd_dev *kfd)
> -{
> -    const u32 required_iommu_flags = AMD_IOMMU_DEVICE_FLAG_ATS_SUP |
> -                    AMD_IOMMU_DEVICE_FLAG_PRI_SUP |
> -                    AMD_IOMMU_DEVICE_FLAG_PASID_SUP;
> -
> -    struct amd_iommu_device_info iommu_info;
> -    unsigned int pasid_limit;
> -    int err;
> -
> -    err = amd_iommu_device_info(kfd->pdev, &iommu_info);
> -    if (err < 0) {
> -        dev_err(kfd_device,
> -            "error getting iommu info. is the iommu enabled?\n");
> -        return false;
> -    }
> -
> -    if ((iommu_info.flags & required_iommu_flags) !=
> required_iommu_flags) {
> -        dev_err(kfd_device, "error required iommu flags ats %i, pri
> %i, pasid %i\n",
> -               (iommu_info.flags & AMD_IOMMU_DEVICE_FLAG_ATS_SUP) != 0,
> -               (iommu_info.flags & AMD_IOMMU_DEVICE_FLAG_PRI_SUP) != 0,
> -               (iommu_info.flags & AMD_IOMMU_DEVICE_FLAG_PASID_SUP)
> -                                    != 0);
> -        return false;
> -    }
> -
> -    pasid_limit = min_t(unsigned int,
> -            (unsigned int)(1 << kfd->device_info->max_pasid_bits),
> -            iommu_info.max_pasids);
> -
> -    if (!kfd_set_pasid_limit(pasid_limit)) {
> -        dev_err(kfd_device, "error setting pasid limit\n");
> -        return false;
> -    }
> -
> -    return true;
> -}
> -
> -static void iommu_pasid_shutdown_callback(struct pci_dev *pdev, int
> pasid)
> -{
> -    struct kfd_dev *dev = kfd_device_by_pci_dev(pdev);
> -
> -    if (dev)
> -        kfd_process_iommu_unbind_callback(dev, pasid);
> -}
> -
> -/*
> - * This function called by IOMMU driver on PPR failure
> - */
> -static int iommu_invalid_ppr_cb(struct pci_dev *pdev, int pasid,
> -        unsigned long address, u16 flags)
> -{
> -    struct kfd_dev *dev;
> -
> -    dev_warn(kfd_device,
> -            "Invalid PPR device %x:%x.%x pasid %d address 0x%lX
> flags 0x%X",
> -            PCI_BUS_NUM(pdev->devfn),
> -            PCI_SLOT(pdev->devfn),
> -            PCI_FUNC(pdev->devfn),
> -            pasid,
> -            address,
> -            flags);
> -
> -    dev = kfd_device_by_pci_dev(pdev);
> -    if (!WARN_ON(!dev))
> -        kfd_signal_iommu_event(dev, pasid, address,
> -            flags & PPR_FAULT_WRITE, flags & PPR_FAULT_EXEC);
> -
> -    return AMD_IOMMU_INV_PRI_RSP_INVALID;
> -}
> -
>   static void kfd_cwsr_init(struct kfd_dev *kfd)
>   {
>       if (cwsr_enable && kfd->device_info->supports_cwsr) {
> @@ -462,11 +408,9 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,
>           goto device_queue_manager_error;
>       }
>   -    if (!device_iommu_pasid_init(kfd)) {
> -        dev_err(kfd_device,
> -            "Error initializing iommuv2 for device %x:%x\n",
> -            kfd->pdev->vendor, kfd->pdev->device);
> -        goto device_iommu_pasid_error;
> +    if (kfd_iommu_device_init(kfd)) {
> +        dev_err(kfd_device, "Error initializing iommuv2\n");
> +        goto device_iommu_error;
>       }
>         kfd_cwsr_init(kfd);
> @@ -486,7 +430,7 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,
>       goto out;
>     kfd_resume_error:
> -device_iommu_pasid_error:
> +device_iommu_error:
>       device_queue_manager_uninit(kfd->dqm);
>   device_queue_manager_error:
>       kfd_interrupt_exit(kfd);
> @@ -527,11 +471,7 @@ void kgd2kfd_suspend(struct kfd_dev *kfd)
>         kfd->dqm->ops.stop(kfd->dqm);
>   -    kfd_unbind_processes_from_device(kfd);
> -
> -    amd_iommu_set_invalidate_ctx_cb(kfd->pdev, NULL);
> -    amd_iommu_set_invalid_ppr_cb(kfd->pdev, NULL);
> -    amd_iommu_free_device(kfd->pdev);
> +    kfd_iommu_suspend(kfd);
>   }
>     int kgd2kfd_resume(struct kfd_dev *kfd)
> @@ -546,19 +486,14 @@ int kgd2kfd_resume(struct kfd_dev *kfd)
>   static int kfd_resume(struct kfd_dev *kfd)
>   {
>       int err = 0;
> -    unsigned int pasid_limit = kfd_get_pasid_limit();
> -
> -    err = amd_iommu_init_device(kfd->pdev, pasid_limit);
> -    if (err)
> -        return -ENXIO;
> -    amd_iommu_set_invalidate_ctx_cb(kfd->pdev,
> -                    iommu_pasid_shutdown_callback);
> -    amd_iommu_set_invalid_ppr_cb(kfd->pdev,
> -                     iommu_invalid_ppr_cb);
>   -    err = kfd_bind_processes_to_device(kfd);
> -    if (err)
> -        goto processes_bind_error;
> +    err = kfd_iommu_resume(kfd);
> +    if (err) {
> +        dev_err(kfd_device,
> +            "Failed to resume IOMMU for device %x:%x\n",
> +            kfd->pdev->vendor, kfd->pdev->device);
> +        return err;
> +    }
>         err = kfd->dqm->ops.start(kfd->dqm);
>       if (err) {
> @@ -571,9 +506,7 @@ static int kfd_resume(struct kfd_dev *kfd)
>       return err;
>     dqm_start_error:
> -processes_bind_error:
> -    amd_iommu_free_device(kfd->pdev);
> -
> +    kfd_iommu_suspend(kfd);
>       return err;
>   }
>   diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
> b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
> index 93aae5c..6fb9c0d 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
> @@ -30,6 +30,7 @@
>   #include <linux/memory.h>
>   #include "kfd_priv.h"
>   #include "kfd_events.h"
> +#include "kfd_iommu.h"
>   #include <linux/device.h>
>     /*
> @@ -837,6 +838,7 @@ static void
> lookup_events_by_type_and_signal(struct kfd_process *p,
>       }
>   }
>   +#ifdef KFD_SUPPORT_IOMMU_V2
>   void kfd_signal_iommu_event(struct kfd_dev *dev, unsigned int pasid,
>           unsigned long address, bool is_write_requested,
>           bool is_execute_requested)
> @@ -905,6 +907,7 @@ void kfd_signal_iommu_event(struct kfd_dev *dev,
> unsigned int pasid,
>       mutex_unlock(&p->event_mutex);
>       kfd_unref_process(p);
>   }
> +#endif /* KFD_SUPPORT_IOMMU_V2 */
>     void kfd_signal_hw_exception_event(unsigned int pasid)
>   {
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_iommu.c
> b/drivers/gpu/drm/amd/amdkfd/kfd_iommu.c
> new file mode 100644
> index 0000000..81dee34
> --- /dev/null
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_iommu.c
> @@ -0,0 +1,356 @@
> +/*
> + * Copyright 2018 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person
> obtaining a
> + * copy of this software and associated documentation files (the
> "Software"),
> + * to deal in the Software without restriction, including without
> limitation
> + * the rights to use, copy, modify, merge, publish, distribute,
> sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom
> the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be
> included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
> EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
> MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO
> EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM,
> DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
> OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
> USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + */
> +
> +#include <linux/printk.h>
> +#include <linux/device.h>
> +#include <linux/slab.h>
> +#include <linux/pci.h>
> +#include <linux/amd-iommu.h>
> +#include "kfd_priv.h"
> +#include "kfd_dbgmgr.h"
> +#include "kfd_topology.h"
> +#include "kfd_iommu.h"
> +
> +static const u32 required_iommu_flags = AMD_IOMMU_DEVICE_FLAG_ATS_SUP |
> +                    AMD_IOMMU_DEVICE_FLAG_PRI_SUP |
> +                    AMD_IOMMU_DEVICE_FLAG_PASID_SUP;
> +
> +/** kfd_iommu_check_device - Check whether IOMMU is available for
> device
> + */
> +int kfd_iommu_check_device(struct kfd_dev *kfd)
> +{
> +    struct amd_iommu_device_info iommu_info;
> +    int err;
> +
> +    if (!kfd->device_info->needs_iommu_device)
> +        return -ENODEV;
> +
> +    iommu_info.flags = 0;
> +    err = amd_iommu_device_info(kfd->pdev, &iommu_info);
> +    if (err)
> +        return err;
> +
> +    if ((iommu_info.flags & required_iommu_flags) !=
> required_iommu_flags)
> +        return -ENODEV;
> +
> +    return 0;
> +}
> +
> +/** kfd_iommu_device_init - Initialize IOMMU for device
> + */
> +int kfd_iommu_device_init(struct kfd_dev *kfd)
> +{
> +    struct amd_iommu_device_info iommu_info;
> +    unsigned int pasid_limit;
> +    int err;
> +
> +    if (!kfd->device_info->needs_iommu_device)
> +        return 0;
> +
> +    iommu_info.flags = 0;
> +    err = amd_iommu_device_info(kfd->pdev, &iommu_info);
> +    if (err < 0) {
> +        dev_err(kfd_device,
> +            "error getting iommu info. is the iommu enabled?\n");
> +        return -ENODEV;
> +    }
> +
> +    if ((iommu_info.flags & required_iommu_flags) !=
> required_iommu_flags) {
> +        dev_err(kfd_device, "error required iommu flags ats %i, pri
> %i, pasid %i\n",
> +               (iommu_info.flags & AMD_IOMMU_DEVICE_FLAG_ATS_SUP) != 0,
> +               (iommu_info.flags & AMD_IOMMU_DEVICE_FLAG_PRI_SUP) != 0,
> +               (iommu_info.flags & AMD_IOMMU_DEVICE_FLAG_PASID_SUP)
> +                                    != 0);
> +        return -ENODEV;
> +    }
> +
> +    pasid_limit = min_t(unsigned int,
> +            (unsigned int)(1 << kfd->device_info->max_pasid_bits),
> +            iommu_info.max_pasids);
> +
> +    if (!kfd_set_pasid_limit(pasid_limit)) {
> +        dev_err(kfd_device, "error setting pasid limit\n");
> +        return -EBUSY;
> +    }
> +
> +    return 0;
> +}
> +
> +/** kfd_iommu_bind_process_to_device - Have the IOMMU bind a process
> + *
> + * Binds the given process to the given device using its PASID. This
> + * enables IOMMUv2 address translation for the process on the device.
> + *
> + * This function assumes that the process mutex is held.
> + */
> +int kfd_iommu_bind_process_to_device(struct kfd_process_device *pdd)
> +{
> +    struct kfd_dev *dev = pdd->dev;
> +    struct kfd_process *p = pdd->process;
> +    int err;
> +
> +    if (!dev->device_info->needs_iommu_device || pdd->bound ==
> PDD_BOUND)
> +        return 0;
> +
> +    if (unlikely(pdd->bound == PDD_BOUND_SUSPENDED)) {
> +        pr_err("Binding PDD_BOUND_SUSPENDED pdd is unexpected!\n");
> +        return -EINVAL;
> +    }
> +
> +    err = amd_iommu_bind_pasid(dev->pdev, p->pasid, p->lead_thread);
> +    if (!err)
> +        pdd->bound = PDD_BOUND;
> +
> +    return err;
> +}
> +
> +/** kfd_iommu_unbind_process - Unbind process from all devices
> + *
> + * This removes all IOMMU device bindings of the process. To be used
> + * before process termination.
> + */
> +void kfd_iommu_unbind_process(struct kfd_process *p)
> +{
> +    struct kfd_process_device *pdd;
> +
> +    list_for_each_entry(pdd, &p->per_device_data, per_device_list)
> +        if (pdd->bound == PDD_BOUND)
> +            amd_iommu_unbind_pasid(pdd->dev->pdev, p->pasid);
> +}
> +
> +/* Callback for process shutdown invoked by the IOMMU driver */
> +static void iommu_pasid_shutdown_callback(struct pci_dev *pdev, int
> pasid)
> +{
> +    struct kfd_dev *dev = kfd_device_by_pci_dev(pdev);
> +    struct kfd_process *p;
> +    struct kfd_process_device *pdd;
> +
> +    if (!dev)
> +        return;
> +
> +    /*
> +     * Look for the process that matches the pasid. If there is no such
> +     * process, we either released it in amdkfd's own notifier, or
> there
> +     * is a bug. Unfortunately, there is no way to tell...
> +     */
> +    p = kfd_lookup_process_by_pasid(pasid);
> +    if (!p)
> +        return;
> +
> +    pr_debug("Unbinding process %d from IOMMU\n", pasid);
> +
> +    mutex_lock(kfd_get_dbgmgr_mutex());
> +
> +    if (dev->dbgmgr && dev->dbgmgr->pasid == p->pasid) {
> +        if (!kfd_dbgmgr_unregister(dev->dbgmgr, p)) {
> +            kfd_dbgmgr_destroy(dev->dbgmgr);
> +            dev->dbgmgr = NULL;
> +        }
> +    }
> +
> +    mutex_unlock(kfd_get_dbgmgr_mutex());
> +
> +    mutex_lock(&p->mutex);
> +
> +    pdd = kfd_get_process_device_data(dev, p);
> +    if (pdd)
> +        /* For GPU relying on IOMMU, we need to dequeue here
> +         * when PASID is still bound.
> +         */
> +        kfd_process_dequeue_from_device(pdd);
> +
> +    mutex_unlock(&p->mutex);
> +
> +    kfd_unref_process(p);
> +}
> +
> +/* This function called by IOMMU driver on PPR failure */
> +static int iommu_invalid_ppr_cb(struct pci_dev *pdev, int pasid,
> +        unsigned long address, u16 flags)
> +{
> +    struct kfd_dev *dev;
> +
> +    dev_warn(kfd_device,
> +            "Invalid PPR device %x:%x.%x pasid %d address 0x%lX
> flags 0x%X",
> +            PCI_BUS_NUM(pdev->devfn),
> +            PCI_SLOT(pdev->devfn),
> +            PCI_FUNC(pdev->devfn),
> +            pasid,
> +            address,
> +            flags);
> +
> +    dev = kfd_device_by_pci_dev(pdev);
> +    if (!WARN_ON(!dev))
> +        kfd_signal_iommu_event(dev, pasid, address,
> +            flags & PPR_FAULT_WRITE, flags & PPR_FAULT_EXEC);
> +
> +    return AMD_IOMMU_INV_PRI_RSP_INVALID;
> +}
> +
> +/*
> + * Bind processes to the device that have been temporarily unbound
> + * (PDD_BOUND_SUSPENDED) in kfd_unbind_processes_from_device.
> + */
> +static int kfd_bind_processes_to_device(struct kfd_dev *kfd)
> +{
> +    struct kfd_process_device *pdd;
> +    struct kfd_process *p;
> +    unsigned int temp;
> +    int err = 0;
> +
> +    int idx = srcu_read_lock(&kfd_processes_srcu);
> +
> +    hash_for_each_rcu(kfd_processes_table, temp, p, kfd_processes) {
> +        mutex_lock(&p->mutex);
> +        pdd = kfd_get_process_device_data(kfd, p);
> +
> +        if (WARN_ON(!pdd) || pdd->bound != PDD_BOUND_SUSPENDED) {
> +            mutex_unlock(&p->mutex);
> +            continue;
> +        }
> +
> +        err = amd_iommu_bind_pasid(kfd->pdev, p->pasid,
> +                p->lead_thread);
> +        if (err < 0) {
> +            pr_err("Unexpected pasid %d binding failure\n",
> +                    p->pasid);
> +            mutex_unlock(&p->mutex);
> +            break;
> +        }
> +
> +        pdd->bound = PDD_BOUND;
> +        mutex_unlock(&p->mutex);
> +    }
> +
> +    srcu_read_unlock(&kfd_processes_srcu, idx);
> +
> +    return err;
> +}
> +
> +/*
> + * Mark currently bound processes as PDD_BOUND_SUSPENDED. These
> + * processes will be restored to PDD_BOUND state in
> + * kfd_bind_processes_to_device.
> + */
> +static void kfd_unbind_processes_from_device(struct kfd_dev *kfd)
> +{
> +    struct kfd_process_device *pdd;
> +    struct kfd_process *p;
> +    unsigned int temp;
> +
> +    int idx = srcu_read_lock(&kfd_processes_srcu);
> +
> +    hash_for_each_rcu(kfd_processes_table, temp, p, kfd_processes) {
> +        mutex_lock(&p->mutex);
> +        pdd = kfd_get_process_device_data(kfd, p);
> +
> +        if (WARN_ON(!pdd)) {
> +            mutex_unlock(&p->mutex);
> +            continue;
> +        }
> +
> +        if (pdd->bound == PDD_BOUND)
> +            pdd->bound = PDD_BOUND_SUSPENDED;
> +        mutex_unlock(&p->mutex);
> +    }
> +
> +    srcu_read_unlock(&kfd_processes_srcu, idx);
> +}
> +
> +/** kfd_iommu_suspend - Prepare IOMMU for suspend
> + *
> + * This unbinds processes from the device and disables the IOMMU for
> + * the device.
> + */
> +void kfd_iommu_suspend(struct kfd_dev *kfd)
> +{
> +    if (!kfd->device_info->needs_iommu_device)
> +        return;
> +
> +    kfd_unbind_processes_from_device(kfd);
> +
> +    amd_iommu_set_invalidate_ctx_cb(kfd->pdev, NULL);
> +    amd_iommu_set_invalid_ppr_cb(kfd->pdev, NULL);
> +    amd_iommu_free_device(kfd->pdev);
> +}
> +
> +/** kfd_iommu_resume - Restore IOMMU after resume
> + *
> + * This reinitializes the IOMMU for the device and re-binds previously
> + * suspended processes to the device.
> + */
> +int kfd_iommu_resume(struct kfd_dev *kfd)
> +{
> +    unsigned int pasid_limit;
> +    int err;
> +
> +    if (!kfd->device_info->needs_iommu_device)
> +        return 0;
> +
> +    pasid_limit = kfd_get_pasid_limit();
> +
> +    err = amd_iommu_init_device(kfd->pdev, pasid_limit);
> +    if (err)
> +        return -ENXIO;
> +
> +    amd_iommu_set_invalidate_ctx_cb(kfd->pdev,
> +                    iommu_pasid_shutdown_callback);
> +    amd_iommu_set_invalid_ppr_cb(kfd->pdev,
> +                     iommu_invalid_ppr_cb);
> +
> +    err = kfd_bind_processes_to_device(kfd);
> +    if (err) {
> +        amd_iommu_set_invalidate_ctx_cb(kfd->pdev, NULL);
> +        amd_iommu_set_invalid_ppr_cb(kfd->pdev, NULL);
> +        amd_iommu_free_device(kfd->pdev);
> +        return err;
> +    }
> +
> +    return 0;
> +}
> +
> +extern bool amd_iommu_pc_supported(void);
> +extern u8 amd_iommu_pc_get_max_banks(u16 devid);
> +extern u8 amd_iommu_pc_get_max_counters(u16 devid);
> +
> +/** kfd_iommu_add_perf_counters - Add IOMMU performance counters to
> topology
> + */
> +int kfd_iommu_add_perf_counters(struct kfd_topology_device *kdev)
> +{
> +    struct kfd_perf_properties *props;
> +
> +    if (!(kdev->node_props.capability & HSA_CAP_ATS_PRESENT))
> +        return 0;
> +
> +    if (!amd_iommu_pc_supported())
> +        return 0;
> +
> +    props = kfd_alloc_struct(props);
> +    if (!props)
> +        return -ENOMEM;
> +    strcpy(props->block_name, "iommu");
> +    props->max_concurrent = amd_iommu_pc_get_max_banks(0) *
> +        amd_iommu_pc_get_max_counters(0); /* assume one iommu */
> +    list_add_tail(&props->list, &kdev->perf_props);
> +
> +    return 0;
> +}
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_iommu.h
> b/drivers/gpu/drm/amd/amdkfd/kfd_iommu.h
> new file mode 100644
> index 0000000..dd23d9f
> --- /dev/null
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_iommu.h
> @@ -0,0 +1,78 @@
> +/*
> + * Copyright 2018 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person
> obtaining a
> + * copy of this software and associated documentation files (the
> "Software"),
> + * to deal in the Software without restriction, including without
> limitation
> + * the rights to use, copy, modify, merge, publish, distribute,
> sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom
> the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be
> included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
> EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
> MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO
> EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM,
> DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
> OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE
> USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + */
> +
> +#ifndef __KFD_IOMMU_H__
> +#define __KFD_IOMMU_H__
> +
> +#if defined(CONFIG_AMD_IOMMU_V2_MODULE) || defined(CONFIG_AMD_IOMMU_V2)
> +
> +#define KFD_SUPPORT_IOMMU_V2
> +
> +int kfd_iommu_check_device(struct kfd_dev *kfd);
> +int kfd_iommu_device_init(struct kfd_dev *kfd);
> +
> +int kfd_iommu_bind_process_to_device(struct kfd_process_device *pdd);
> +void kfd_iommu_unbind_process(struct kfd_process *p);
> +
> +void kfd_iommu_suspend(struct kfd_dev *kfd);
> +int kfd_iommu_resume(struct kfd_dev *kfd);
> +
> +int kfd_iommu_add_perf_counters(struct kfd_topology_device *kdev);
> +
> +#else
> +
> +static inline int kfd_iommu_check_device(struct kfd_dev *kfd)
> +{
> +    return -ENODEV;
> +}
> +static inline int kfd_iommu_device_init(struct kfd_dev *kfd)
> +{
> +    return 0;
> +}
> +
> +static inline int kfd_iommu_bind_process_to_device(
> +    struct kfd_process_device *pdd)
> +{
> +    return 0;
> +}
> +static inline void kfd_iommu_unbind_process(struct kfd_process *p)
> +{
> +    /* empty */
> +}
> +
> +static inline void kfd_iommu_suspend(struct kfd_dev *kfd)
> +{
> +    /* empty */
> +}
> +static inline int kfd_iommu_resume(struct kfd_dev *kfd)
> +{
> +    return 0;
> +}
> +
> +static inline int kfd_iommu_add_perf_counters(struct
> kfd_topology_device *kdev)
> +{
> +    return 0;
> +}
> +
> +#endif /* defined(CONFIG_AMD_IOMMU_V2) */
> +
> +#endif /* __KFD_IOMMU_H__ */
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> index 594f853..f12eb5d 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> @@ -158,6 +158,7 @@ struct kfd_device_info {
>       uint8_t num_of_watch_points;
>       uint16_t mqd_size_aligned;
>       bool supports_cwsr;
> +    bool needs_iommu_device;
>       bool needs_pci_atomics;
>   };
>   @@ -517,15 +518,15 @@ struct kfd_process_device {
>       uint64_t scratch_base;
>       uint64_t scratch_limit;
>   -    /* Is this process/pasid bound to this device?
> (amd_iommu_bind_pasid) */
> -    enum kfd_pdd_bound bound;
> -
>       /* Flag used to tell the pdd has dequeued from the dqm.
>        * This is used to prevent dev->dqm->ops.process_termination()
> from
>        * being called twice when it is already called in IOMMU callback
>        * function.
>        */
>       bool already_dequeued;
> +
> +    /* Is this process/pasid bound to this device?
> (amd_iommu_bind_pasid) */
> +    enum kfd_pdd_bound bound;
>   };
>     #define qpd_to_pdd(x) container_of(x, struct kfd_process_device,
> qpd)
> @@ -590,6 +591,10 @@ struct kfd_process {
>       bool signal_event_limit_reached;
>   };
>   +#define KFD_PROCESS_TABLE_SIZE 5 /* bits: 32 entries */
> +extern DECLARE_HASHTABLE(kfd_processes_table, KFD_PROCESS_TABLE_SIZE);
> +extern struct srcu_struct kfd_processes_srcu;
> +
>   /**
>    * Ioctl function type.
>    *
> @@ -617,9 +622,6 @@ void kfd_unref_process(struct kfd_process *p);
>     struct kfd_process_device *kfd_bind_process_to_device(struct
> kfd_dev *dev,
>                           struct kfd_process *p);
> -int kfd_bind_processes_to_device(struct kfd_dev *dev);
> -void kfd_unbind_processes_from_device(struct kfd_dev *dev);
> -void kfd_process_iommu_unbind_callback(struct kfd_dev *dev, unsigned
> int pasid);
>   struct kfd_process_device *kfd_get_process_device_data(struct
> kfd_dev *dev,
>                               struct kfd_process *p);
>   struct kfd_process_device *kfd_create_process_device_data(struct
> kfd_dev *dev,
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> index 4ff5f0f..e9aee76 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> @@ -35,16 +35,16 @@ struct mm_struct;
>     #include "kfd_priv.h"
>   #include "kfd_dbgmgr.h"
> +#include "kfd_iommu.h"
>     /*
>    * List of struct kfd_process (field kfd_process).
>    * Unique/indexed by mm_struct*
>    */
> -#define KFD_PROCESS_TABLE_SIZE 5 /* bits: 32 entries */
> -static DEFINE_HASHTABLE(kfd_processes_table, KFD_PROCESS_TABLE_SIZE);
> +DEFINE_HASHTABLE(kfd_processes_table, KFD_PROCESS_TABLE_SIZE);
>   static DEFINE_MUTEX(kfd_processes_mutex);
>   -DEFINE_STATIC_SRCU(kfd_processes_srcu);
> +DEFINE_SRCU(kfd_processes_srcu);
>     static struct workqueue_struct *kfd_process_wq;
>   @@ -173,14 +173,8 @@ static void kfd_process_wq_release(struct
> work_struct *work)
>   {
>       struct kfd_process *p = container_of(work, struct kfd_process,
>                            release_work);
> -    struct kfd_process_device *pdd;
>   -    pr_debug("Releasing process (pasid %d) in workqueue\n",
> p->pasid);
> -
> -    list_for_each_entry(pdd, &p->per_device_data, per_device_list) {
> -        if (pdd->bound == PDD_BOUND)
> -            amd_iommu_unbind_pasid(pdd->dev->pdev, p->pasid);
> -    }
> +    kfd_iommu_unbind_process(p);
>         kfd_process_destroy_pdds(p);
>   @@ -429,133 +423,13 @@ struct kfd_process_device
> *kfd_bind_process_to_device(struct kfd_dev *dev,
>           return ERR_PTR(-ENOMEM);
>       }
>   -    if (pdd->bound == PDD_BOUND) {
> -        return pdd;
> -    } else if (unlikely(pdd->bound == PDD_BOUND_SUSPENDED)) {
> -        pr_err("Binding PDD_BOUND_SUSPENDED pdd is unexpected!\n");
> -        return ERR_PTR(-EINVAL);
> -    }
> -
> -    err = amd_iommu_bind_pasid(dev->pdev, p->pasid, p->lead_thread);
> -    if (err < 0)
> +    err = kfd_iommu_bind_process_to_device(pdd);
> +    if (err)
>           return ERR_PTR(err);
>   -    pdd->bound = PDD_BOUND;
> -
>       return pdd;
>   }
>   -/*
> - * Bind processes do the device that have been temporarily unbound
> - * (PDD_BOUND_SUSPENDED) in kfd_unbind_processes_from_device.
> - */
> -int kfd_bind_processes_to_device(struct kfd_dev *dev)
> -{
> -    struct kfd_process_device *pdd;
> -    struct kfd_process *p;
> -    unsigned int temp;
> -    int err = 0;
> -
> -    int idx = srcu_read_lock(&kfd_processes_srcu);
> -
> -    hash_for_each_rcu(kfd_processes_table, temp, p, kfd_processes) {
> -        mutex_lock(&p->mutex);
> -        pdd = kfd_get_process_device_data(dev, p);
> -
> -        if (WARN_ON(!pdd) || pdd->bound != PDD_BOUND_SUSPENDED) {
> -            mutex_unlock(&p->mutex);
> -            continue;
> -        }
> -
> -        err = amd_iommu_bind_pasid(dev->pdev, p->pasid,
> -                p->lead_thread);
> -        if (err < 0) {
> -            pr_err("Unexpected pasid %d binding failure\n",
> -                    p->pasid);
> -            mutex_unlock(&p->mutex);
> -            break;
> -        }
> -
> -        pdd->bound = PDD_BOUND;
> -        mutex_unlock(&p->mutex);
> -    }
> -
> -    srcu_read_unlock(&kfd_processes_srcu, idx);
> -
> -    return err;
> -}
> -
> -/*
> - * Mark currently bound processes as PDD_BOUND_SUSPENDED. These
> - * processes will be restored to PDD_BOUND state in
> - * kfd_bind_processes_to_device.
> - */
> -void kfd_unbind_processes_from_device(struct kfd_dev *dev)
> -{
> -    struct kfd_process_device *pdd;
> -    struct kfd_process *p;
> -    unsigned int temp;
> -
> -    int idx = srcu_read_lock(&kfd_processes_srcu);
> -
> -    hash_for_each_rcu(kfd_processes_table, temp, p, kfd_processes) {
> -        mutex_lock(&p->mutex);
> -        pdd = kfd_get_process_device_data(dev, p);
> -
> -        if (WARN_ON(!pdd)) {
> -            mutex_unlock(&p->mutex);
> -            continue;
> -        }
> -
> -        if (pdd->bound == PDD_BOUND)
> -            pdd->bound = PDD_BOUND_SUSPENDED;
> -        mutex_unlock(&p->mutex);
> -    }
> -
> -    srcu_read_unlock(&kfd_processes_srcu, idx);
> -}
> -
> -void kfd_process_iommu_unbind_callback(struct kfd_dev *dev, unsigned
> int pasid)
> -{
> -    struct kfd_process *p;
> -    struct kfd_process_device *pdd;
> -
> -    /*
> -     * Look for the process that matches the pasid. If there is no such
> -     * process, we either released it in amdkfd's own notifier, or
> there
> -     * is a bug. Unfortunately, there is no way to tell...
> -     */
> -    p = kfd_lookup_process_by_pasid(pasid);
> -    if (!p)
> -        return;
> -
> -    pr_debug("Unbinding process %d from IOMMU\n", pasid);
> -
> -    mutex_lock(kfd_get_dbgmgr_mutex());
> -
> -    if (dev->dbgmgr && dev->dbgmgr->pasid == p->pasid) {
> -        if (!kfd_dbgmgr_unregister(dev->dbgmgr, p)) {
> -            kfd_dbgmgr_destroy(dev->dbgmgr);
> -            dev->dbgmgr = NULL;
> -        }
> -    }
> -
> -    mutex_unlock(kfd_get_dbgmgr_mutex());
> -
> -    mutex_lock(&p->mutex);
> -
> -    pdd = kfd_get_process_device_data(dev, p);
> -    if (pdd)
> -        /* For GPU relying on IOMMU, we need to dequeue here
> -         * when PASID is still bound.
> -         */
> -        kfd_process_dequeue_from_device(pdd);
> -
> -    mutex_unlock(&p->mutex);
> -
> -    kfd_unref_process(p);
> -}
> -
>   struct kfd_process_device *kfd_get_first_process_device_data(
>                           struct kfd_process *p)
>   {
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> index 7783250..2506155 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
> @@ -35,6 +35,7 @@
>   #include "kfd_crat.h"
>   #include "kfd_topology.h"
>   #include "kfd_device_queue_manager.h"
> +#include "kfd_iommu.h"
>     /* topology_device_list - Master list of all topology devices */
>   static struct list_head topology_device_list;
> @@ -875,19 +876,8 @@ static void find_system_memory(const struct
> dmi_header *dm,
>    */
>   static int kfd_add_perf_to_topology(struct kfd_topology_device *kdev)
>   {
> -    struct kfd_perf_properties *props;
> -
> -    if (amd_iommu_pc_supported()) {
> -        props = kfd_alloc_struct(props);
> -        if (!props)
> -            return -ENOMEM;
> -        strcpy(props->block_name, "iommu");
> -        props->max_concurrent = amd_iommu_pc_get_max_banks(0) *
> -            amd_iommu_pc_get_max_counters(0); /* assume one iommu */
> -        list_add_tail(&props->list, &kdev->perf_props);
> -    }
> -
> -    return 0;
> +    /* These are the only counters supported so far */
> +    return kfd_iommu_add_perf_counters(kdev);
>   }
>     /* kfd_add_non_crat_information - Add information that is not
> currently
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
> b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
> index 53fca1f..c0be2be 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
> @@ -25,7 +25,7 @@
>     #include <linux/types.h>
>   #include <linux/list.h>
> -#include "kfd_priv.h"
> +#include "kfd_crat.h"
>     #define KFD_TOPOLOGY_PUBLIC_NAME_SIZE 128
>   @@ -183,8 +183,4 @@ struct kfd_topology_device
> *kfd_create_topology_device(
>           struct list_head *device_list);
>   void kfd_release_topology_device_list(struct list_head *device_list);
>   -extern bool amd_iommu_pc_supported(void);
> -extern u8 amd_iommu_pc_get_max_banks(u16 devid);
> -extern u8 amd_iommu_pc_get_max_counters(u16 devid);
> -
>   #endif /* __KFD_TOPOLOGY_H__ */
>
>
>
>
>

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 12/25] drm/amdkfd: Use per-device sched_policy
       [not found]     ` <1517967174-21709-13-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
@ 2018-02-12  9:07       ` Oded Gabbay
  0 siblings, 0 replies; 71+ messages in thread
From: Oded Gabbay @ 2018-02-12  9:07 UTC (permalink / raw)
  To: Felix Kuehling; +Cc: amd-gfx list

On Wed, Feb 7, 2018 at 3:32 AM, Felix Kuehling <Felix.Kuehling@amd.com> wrote:
> This was missed in a previous commit that made the scheduler policy
> a per-device setting.
>
> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> index dca6257..47d493e 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> @@ -1296,7 +1296,7 @@ struct device_queue_manager *device_queue_manager_init(struct kfd_dev *dev)
>                 dqm->ops.process_termination = process_termination_nocpsch;
>                 break;
>         default:
> -               pr_err("Invalid scheduling policy %d\n", sched_policy);
> +               pr_err("Invalid scheduling policy %d\n", dqm->sched_policy);
>                 goto out_free;
>         }
>
> --
> 2.7.4
>

Squashed into the original patch that changes the policy to per-device.
Oded
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 13/25] drm/amdkfd: Remove unaligned memory access
       [not found]     ` <1517967174-21709-14-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
@ 2018-02-12  9:11       ` Oded Gabbay
  0 siblings, 0 replies; 71+ messages in thread
From: Oded Gabbay @ 2018-02-12  9:11 UTC (permalink / raw)
  To: Felix Kuehling; +Cc: Harish Kasiviswanathan, amd-gfx list

On Wed, Feb 7, 2018 at 3:32 AM, Felix Kuehling <Felix.Kuehling@amd.com> wrote:
> From: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
>
> Unaligned atomic operations can cause problems on some CPU
> architectures. Use simpler bitmask operations instead. Atomic bit
> manipulations are not necessary since dqm->lock is held during these
> operations.
>
> Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
> ---
>  .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c  | 25 ++++++++--------------
>  1 file changed, 9 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> index 47d493e..1a28dc2 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> @@ -118,9 +118,8 @@ static int allocate_vmid(struct device_queue_manager *dqm,
>         if (dqm->vmid_bitmap == 0)
>                 return -ENOMEM;
>
> -       bit = find_first_bit((unsigned long *)&dqm->vmid_bitmap,
> -                               dqm->dev->vm_info.vmid_num_kfd);
> -       clear_bit(bit, (unsigned long *)&dqm->vmid_bitmap);
> +       bit = ffs(dqm->vmid_bitmap) - 1;
> +       dqm->vmid_bitmap &= ~(1 << bit);
>
>         allocated_vmid = bit + dqm->dev->vm_info.first_vmid_kfd;
>         pr_debug("vmid allocation %d\n", allocated_vmid);
> @@ -142,7 +141,7 @@ static void deallocate_vmid(struct device_queue_manager *dqm,
>         /* Release the vmid mapping */
>         set_pasid_vmid_mapping(dqm, 0, qpd->vmid);
>
> -       set_bit(bit, (unsigned long *)&dqm->vmid_bitmap);
> +       dqm->vmid_bitmap |= (1 << bit);
>         qpd->vmid = 0;
>         q->properties.vmid = 0;
>  }
> @@ -223,12 +222,8 @@ static int allocate_hqd(struct device_queue_manager *dqm, struct queue *q)
>                         continue;
>
>                 if (dqm->allocated_queues[pipe] != 0) {
> -                       bit = find_first_bit(
> -                               (unsigned long *)&dqm->allocated_queues[pipe],
> -                               get_queues_per_pipe(dqm));
> -
> -                       clear_bit(bit,
> -                               (unsigned long *)&dqm->allocated_queues[pipe]);
> +                       bit = ffs(dqm->allocated_queues[pipe]) - 1;
> +                       dqm->allocated_queues[pipe] &= ~(1 << bit);
>                         q->pipe = pipe;
>                         q->queue = bit;
>                         set = true;
> @@ -249,7 +244,7 @@ static int allocate_hqd(struct device_queue_manager *dqm, struct queue *q)
>  static inline void deallocate_hqd(struct device_queue_manager *dqm,
>                                 struct queue *q)
>  {
> -       set_bit(q->queue, (unsigned long *)&dqm->allocated_queues[q->pipe]);
> +       dqm->allocated_queues[q->pipe] |= (1 << q->queue);
>  }
>
>  static int create_compute_queue_nocpsch(struct device_queue_manager *dqm,
> @@ -589,10 +584,8 @@ static int allocate_sdma_queue(struct device_queue_manager *dqm,
>         if (dqm->sdma_bitmap == 0)
>                 return -ENOMEM;
>
> -       bit = find_first_bit((unsigned long *)&dqm->sdma_bitmap,
> -                               CIK_SDMA_QUEUES);
> -
> -       clear_bit(bit, (unsigned long *)&dqm->sdma_bitmap);
> +       bit = ffs(dqm->sdma_bitmap) - 1;
> +       dqm->sdma_bitmap &= ~(1 << bit);
>         *sdma_queue_id = bit;
>
>         return 0;
> @@ -603,7 +596,7 @@ static void deallocate_sdma_queue(struct device_queue_manager *dqm,
>  {
>         if (sdma_queue_id >= CIK_SDMA_QUEUES)
>                 return;
> -       set_bit(sdma_queue_id, (unsigned long *)&dqm->sdma_bitmap);
> +       dqm->sdma_bitmap |= (1 << sdma_queue_id);
>  }
>
>  static int create_sdma_queue_nocpsch(struct device_queue_manager *dqm,
> --
> 2.7.4
>

This patch is:
Acked-by: Oded Gabbay <oded.gabbay@gmail.com>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 15/25] drm/amdkfd: Add GPUVM virtual address space to PDD
       [not found]     ` <1517967174-21709-16-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
@ 2018-02-12  9:16       ` Oded Gabbay
  0 siblings, 0 replies; 71+ messages in thread
From: Oded Gabbay @ 2018-02-12  9:16 UTC (permalink / raw)
  To: Felix Kuehling; +Cc: amd-gfx list

On Wed, Feb 7, 2018 at 3:32 AM, Felix Kuehling <Felix.Kuehling@amd.com> wrote:
> Create/destroy the GPUVM context during PDD creation/destruction.
> Get VM page table base and program it during process registration
> (HWS) or VMID allocation (non-HWS).
>
> v2:
> * Used dev instead of pdd->dev in kfd_flush_tlb
>
> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
> ---
>  .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c  | 20 +++++++++++++
>  drivers/gpu/drm/amd/amdkfd/kfd_priv.h              | 13 +++++++++
>  drivers/gpu/drm/amd/amdkfd/kfd_process.c           | 33 ++++++++++++++++++++++
>  3 files changed, 66 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> index 1a28dc2..b7d0639 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> @@ -129,6 +129,15 @@ static int allocate_vmid(struct device_queue_manager *dqm,
>         set_pasid_vmid_mapping(dqm, q->process->pasid, q->properties.vmid);
>         program_sh_mem_settings(dqm, qpd);
>
> +       /* qpd->page_table_base is set earlier when register_process()
> +        * is called, i.e. when the first queue is created.
> +        */
> +       dqm->dev->kfd2kgd->set_vm_context_page_table_base(dqm->dev->kgd,
> +                       qpd->vmid,
> +                       qpd->page_table_base);
> +       /* invalidate the VM context after pasid and vmid mapping is set up */
> +       kfd_flush_tlb(qpd_to_pdd(qpd));
> +
>         return 0;
>  }
>
> @@ -138,6 +147,8 @@ static void deallocate_vmid(struct device_queue_manager *dqm,
>  {
>         int bit = qpd->vmid - dqm->dev->vm_info.first_vmid_kfd;
>
> +       kfd_flush_tlb(qpd_to_pdd(qpd));
> +
>         /* Release the vmid mapping */
>         set_pasid_vmid_mapping(dqm, 0, qpd->vmid);
>
> @@ -450,6 +461,8 @@ static int register_process(struct device_queue_manager *dqm,
>                                         struct qcm_process_device *qpd)
>  {
>         struct device_process_node *n;
> +       struct kfd_process_device *pdd;
> +       uint32_t pd_base;
>         int retval;
>
>         n = kzalloc(sizeof(*n), GFP_KERNEL);
> @@ -458,9 +471,16 @@ static int register_process(struct device_queue_manager *dqm,
>
>         n->qpd = qpd;
>
> +       pdd = qpd_to_pdd(qpd);
> +       /* Retrieve PD base */
> +       pd_base = dqm->dev->kfd2kgd->get_process_page_dir(pdd->vm);
> +
>         mutex_lock(&dqm->lock);
>         list_add(&n->list, &dqm->queues);
>
> +       /* Update PD Base in QPD */
> +       qpd->page_table_base = pd_base;
> +
>         retval = dqm->asic_ops.update_qpd(dqm, qpd);
>
>         dqm->processes_count++;
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> index f12eb5d..56c2e36 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> @@ -518,6 +518,9 @@ struct kfd_process_device {
>         uint64_t scratch_base;
>         uint64_t scratch_limit;
>
> +       /* VM context for GPUVM allocations */
> +       void *vm;
> +
>         /* Flag used to tell the pdd has dequeued from the dqm.
>          * This is used to prevent dev->dqm->ops.process_termination() from
>          * being called twice when it is already called in IOMMU callback
> @@ -589,6 +592,14 @@ struct kfd_process {
>         size_t signal_mapped_size;
>         size_t signal_event_count;
>         bool signal_event_limit_reached;
> +
> +       /* Information used for memory eviction */
> +       void *kgd_process_info;
> +       /* Eviction fence that is attached to all the BOs of this process. The
> +        * fence will be triggered during eviction and new one will be created
> +        * during restore
> +        */
> +       struct dma_fence *ef;
>  };
>
>  #define KFD_PROCESS_TABLE_SIZE 5 /* bits: 32 entries */
> @@ -802,6 +813,8 @@ int kfd_event_create(struct file *devkfd, struct kfd_process *p,
>                      uint64_t *event_page_offset, uint32_t *event_slot_index);
>  int kfd_event_destroy(struct kfd_process *p, uint32_t event_id);
>
> +void kfd_flush_tlb(struct kfd_process_device *pdd);
> +
>  int dbgdev_wave_reset_wavefronts(struct kfd_dev *dev, struct kfd_process *p);
>
>  /* Debugfs */
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> index e9aee76..cf4fa25 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> @@ -34,6 +34,7 @@
>  struct mm_struct;
>
>  #include "kfd_priv.h"
> +#include "kfd_device_queue_manager.h"
>  #include "kfd_dbgmgr.h"
>  #include "kfd_iommu.h"
>
> @@ -154,6 +155,10 @@ static void kfd_process_destroy_pdds(struct kfd_process *p)
>                 pr_debug("Releasing pdd (topology id %d) for process (pasid %d)\n",
>                                 pdd->dev->id, p->pasid);
>
> +               if (pdd->vm)
> +                       pdd->dev->kfd2kgd->destroy_process_vm(
> +                               pdd->dev->kgd, pdd->vm);
> +
>                 list_del(&pdd->per_device_list);
>
>                 if (pdd->qpd.cwsr_kaddr)
> @@ -177,6 +182,7 @@ static void kfd_process_wq_release(struct work_struct *work)
>         kfd_iommu_unbind_process(p);
>
>         kfd_process_destroy_pdds(p);
> +       dma_fence_put(p->ef);
>
>         kfd_event_free_process(p);
>
> @@ -401,7 +407,18 @@ struct kfd_process_device *kfd_create_process_device_data(struct kfd_dev *dev,
>         pdd->already_dequeued = false;
>         list_add(&pdd->per_device_list, &p->per_device_data);
>
> +       /* Create the GPUVM context for this specific device */
> +       if (dev->kfd2kgd->create_process_vm(dev->kgd, &pdd->vm,
> +                                           &p->kgd_process_info, &p->ef)) {
> +               pr_err("Failed to create process VM object\n");
> +               goto err_create_pdd;
> +       }
>         return pdd;
> +
> +err_create_pdd:
> +       list_del(&pdd->per_device_list);
> +       kfree(pdd);
> +       return NULL;
>  }
>
>  /*
> @@ -507,6 +524,22 @@ int kfd_reserved_mem_mmap(struct kfd_process *process,
>                                KFD_CWSR_TBA_TMA_SIZE, vma->vm_page_prot);
>  }
>
> +void kfd_flush_tlb(struct kfd_process_device *pdd)
> +{
> +       struct kfd_dev *dev = pdd->dev;
> +       const struct kfd2kgd_calls *f2g = dev->kfd2kgd;
> +
> +       if (dev->dqm->sched_policy == KFD_SCHED_POLICY_NO_HWS) {
> +               /* Nothing to flush until a VMID is assigned, which
> +                * only happens when the first queue is created.
> +                */
> +               if (pdd->qpd.vmid)
> +                       f2g->invalidate_tlbs_vmid(dev->kgd, pdd->qpd.vmid);
> +       } else {
> +               f2g->invalidate_tlbs(dev->kgd, pdd->process->pasid);
> +       }
> +}
> +
>  #if defined(CONFIG_DEBUG_FS)
>
>  int kfd_debugfs_mqds_by_process(struct seq_file *m, void *data)
> --
> 2.7.4
>


This patch is:
Acked-by: Oded Gabbay <oded.gabbay@gmail.com>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 16/25] drm/amdkfd: Implement KFD process eviction/restore
       [not found]     ` <1517967174-21709-17-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
@ 2018-02-12  9:36       ` Oded Gabbay
  0 siblings, 0 replies; 71+ messages in thread
From: Oded Gabbay @ 2018-02-12  9:36 UTC (permalink / raw)
  To: Felix Kuehling; +Cc: Harish Kasiviswanathan, amd-gfx list

On Wed, Feb 7, 2018 at 3:32 AM, Felix Kuehling <Felix.Kuehling@amd.com> wrote:
> When the TTM memory manager in KGD evicts BOs, all user mode queues
> potentially accessing these BOs must be evicted temporarily. Once
> user mode queues are evicted, the eviction fence is signaled,
> allowing the migration of the BO to proceed.
>
> A delayed worker is scheduled to restore all the BOs belonging to
> the evicted process and restart its queues.
>
> During suspend/resume of the GPU we also evict all processes to allow
> KGD to save BOs in system memory, since VRAM will be lost.
>
> v2:
> * Account for eviction when updating of q->is_active in MQD manager
>
> Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_device.c            |  65 +++++-
>  .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c  | 219 ++++++++++++++++++++-
>  .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.h  |   9 +
>  drivers/gpu/drm/amd/amdkfd/kfd_module.c            |   2 +
>  drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c   |   9 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c    |   6 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_priv.h              |  32 ++-
>  drivers/gpu/drm/amd/amdkfd/kfd_process.c           | 213 ++++++++++++++++++++
>  8 files changed, 547 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> index 4ac2d61..334669996 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> @@ -33,6 +33,7 @@
>  #include "kfd_iommu.h"
>
>  #define MQD_SIZE_ALIGNED 768
> +static atomic_t kfd_device_suspended = ATOMIC_INIT(0);
>
>  #ifdef KFD_SUPPORT_IOMMU_V2
>  static const struct kfd_device_info kaveri_device_info = {
> @@ -469,6 +470,10 @@ void kgd2kfd_suspend(struct kfd_dev *kfd)
>         if (!kfd->init_complete)
>                 return;
>
> +       /* For first KFD device suspend all the KFD processes */
> +       if (atomic_inc_return(&kfd_device_suspended) == 1)
> +               kfd_suspend_all_processes();
> +
>         kfd->dqm->ops.stop(kfd->dqm);
>
>         kfd_iommu_suspend(kfd);
> @@ -476,11 +481,21 @@ void kgd2kfd_suspend(struct kfd_dev *kfd)
>
>  int kgd2kfd_resume(struct kfd_dev *kfd)
>  {
> +       int ret, count;
> +
>         if (!kfd->init_complete)
>                 return 0;
>
> -       return kfd_resume(kfd);
> +       ret = kfd_resume(kfd);
> +       if (ret)
> +               return ret;
> +
> +       count = atomic_dec_return(&kfd_device_suspended);
> +       WARN_ONCE(count < 0, "KFD suspend / resume ref. error");
> +       if (count == 0)
> +               ret = kfd_resume_all_processes();
>
> +       return ret;
>  }
>
>  static int kfd_resume(struct kfd_dev *kfd)
> @@ -526,6 +541,54 @@ void kgd2kfd_interrupt(struct kfd_dev *kfd, const void *ih_ring_entry)
>         spin_unlock(&kfd->interrupt_lock);
>  }
>
> +/** kgd2kfd_schedule_evict_and_restore_process - Schedules work queue that will
> + *   prepare for safe eviction of KFD BOs that belong to the specified
> + *   process.
> + *
> + * @mm: mm_struct that identifies the specified KFD process
> + * @fence: eviction fence attached to KFD process BOs
> + *
> + */
> +int kgd2kfd_schedule_evict_and_restore_process(struct mm_struct *mm,
> +                                              struct dma_fence *fence)
> +{
> +       struct kfd_process *p;
> +       unsigned long active_time;
> +       unsigned long delay_jiffies = msecs_to_jiffies(PROCESS_ACTIVE_TIME_MS);
> +
> +       if (!fence)
> +               return -EINVAL;
> +
> +       if (dma_fence_is_signaled(fence))
> +               return 0;
> +
> +       p = kfd_lookup_process_by_mm(mm);
> +       if (!p)
> +               return -ENODEV;
> +
> +       if (fence->seqno == p->last_eviction_seqno)
> +               goto out;
> +
> +       p->last_eviction_seqno = fence->seqno;
> +
> +       /* Avoid KFD process starvation. Wait for at least
> +        * PROCESS_ACTIVE_TIME_MS before evicting the process again
> +        */
> +       active_time = get_jiffies_64() - p->last_restore_timestamp;
> +       if (delay_jiffies > active_time)
> +               delay_jiffies -= active_time;
> +       else
> +               delay_jiffies = 0;
> +
> +       /* During process initialization eviction_work.dwork is initialized
> +        * to kfd_evict_bo_worker
> +        */
> +       schedule_delayed_work(&p->eviction_work, delay_jiffies);
> +out:
> +       kfd_unref_process(p);
> +       return 0;
> +}
> +
>  static int kfd_gtt_sa_init(struct kfd_dev *kfd, unsigned int buf_size,
>                                 unsigned int chunk_size)
>  {
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> index b7d0639..b3b6dab 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> @@ -21,10 +21,11 @@
>   *
>   */
>
> +#include <linux/ratelimit.h>
> +#include <linux/printk.h>
>  #include <linux/slab.h>
>  #include <linux/list.h>
>  #include <linux/types.h>
> -#include <linux/printk.h>
>  #include <linux/bitops.h>
>  #include <linux/sched.h>
>  #include "kfd_priv.h"
> @@ -180,6 +181,14 @@ static int create_queue_nocpsch(struct device_queue_manager *dqm,
>                         goto out_unlock;
>         }
>         q->properties.vmid = qpd->vmid;
> +       /*
> +        * Eviction state logic: we only mark active queues as evicted
> +        * to avoid the overhead of restoring inactive queues later
> +        */
> +       if (qpd->evicted)
> +               q->properties.is_evicted = (q->properties.queue_size > 0 &&
> +                                           q->properties.queue_percent > 0 &&
> +                                           q->properties.queue_address != 0);
>
>         q->properties.tba_addr = qpd->tba_addr;
>         q->properties.tma_addr = qpd->tma_addr;
> @@ -377,15 +386,29 @@ static int update_queue(struct device_queue_manager *dqm, struct queue *q)
>  {
>         int retval;
>         struct mqd_manager *mqd;
> +       struct kfd_process_device *pdd;
>         bool prev_active = false;
>
>         mutex_lock(&dqm->lock);
> +       pdd = kfd_get_process_device_data(q->device, q->process);
> +       if (!pdd) {
> +               retval = -ENODEV;
> +               goto out_unlock;
> +       }
>         mqd = dqm->ops.get_mqd_manager(dqm,
>                         get_mqd_type_from_queue_type(q->properties.type));
>         if (!mqd) {
>                 retval = -ENOMEM;
>                 goto out_unlock;
>         }
> +       /*
> +        * Eviction state logic: we only mark active queues as evicted
> +        * to avoid the overhead of restoring inactive queues later
> +        */
> +       if (pdd->qpd.evicted)
> +               q->properties.is_evicted = (q->properties.queue_size > 0 &&
> +                                           q->properties.queue_percent > 0 &&
> +                                           q->properties.queue_address != 0);
>
>         /* Save previous activity state for counters */
>         prev_active = q->properties.is_active;
> @@ -457,6 +480,187 @@ static struct mqd_manager *get_mqd_manager(
>         return mqd;
>  }
>
> +static int evict_process_queues_nocpsch(struct device_queue_manager *dqm,
> +                                       struct qcm_process_device *qpd)
> +{
> +       struct queue *q;
> +       struct mqd_manager *mqd;
> +       struct kfd_process_device *pdd;
> +       int retval = 0;
> +
> +       mutex_lock(&dqm->lock);
> +       if (qpd->evicted++ > 0) /* already evicted, do nothing */
> +               goto out;
> +
> +       pdd = qpd_to_pdd(qpd);
> +       pr_info_ratelimited("Evicting PASID %u queues\n",
> +                           pdd->process->pasid);
> +
> +       /* unactivate all active queues on the qpd */
> +       list_for_each_entry(q, &qpd->queues_list, list) {
> +               if (!q->properties.is_active)
> +                       continue;
> +               mqd = dqm->ops.get_mqd_manager(dqm,
> +                       get_mqd_type_from_queue_type(q->properties.type));
> +               if (!mqd) { /* should not be here */
> +                       pr_err("Cannot evict queue, mqd mgr is NULL\n");
> +                       retval = -ENOMEM;
> +                       goto out;
> +               }
> +               q->properties.is_evicted = true;
> +               q->properties.is_active = false;
> +               retval = mqd->destroy_mqd(mqd, q->mqd,
> +                               KFD_PREEMPT_TYPE_WAVEFRONT_DRAIN,
> +                               KFD_UNMAP_LATENCY_MS, q->pipe, q->queue);
> +               if (retval)
> +                       goto out;
> +               dqm->queue_count--;
> +       }
> +
> +out:
> +       mutex_unlock(&dqm->lock);
> +       return retval;
> +}
> +
> +static int evict_process_queues_cpsch(struct device_queue_manager *dqm,
> +                                     struct qcm_process_device *qpd)
> +{
> +       struct queue *q;
> +       struct kfd_process_device *pdd;
> +       int retval = 0;
> +
> +       mutex_lock(&dqm->lock);
> +       if (qpd->evicted++ > 0) /* already evicted, do nothing */
> +               goto out;
> +
> +       pdd = qpd_to_pdd(qpd);
> +       pr_info_ratelimited("Evicting PASID %u queues\n",
> +                           pdd->process->pasid);
> +
> +       /* unactivate all active queues on the qpd */
> +       list_for_each_entry(q, &qpd->queues_list, list) {
> +               if (!q->properties.is_active)
> +                       continue;
> +               q->properties.is_evicted = true;
> +               q->properties.is_active = false;
> +               dqm->queue_count--;
> +       }
> +       retval = execute_queues_cpsch(dqm,
> +                               qpd->is_debug ?
> +                               KFD_UNMAP_QUEUES_FILTER_ALL_QUEUES :
> +                               KFD_UNMAP_QUEUES_FILTER_DYNAMIC_QUEUES, 0);
> +
> +out:
> +       mutex_unlock(&dqm->lock);
> +       return retval;
> +}
> +
> +static int restore_process_queues_nocpsch(struct device_queue_manager *dqm,
> +                                         struct qcm_process_device *qpd)
> +{
> +       struct queue *q;
> +       struct mqd_manager *mqd;
> +       struct kfd_process_device *pdd;
> +       uint32_t pd_base;
> +       int retval = 0;
> +
> +       pdd = qpd_to_pdd(qpd);
> +       /* Retrieve PD base */
> +       pd_base = dqm->dev->kfd2kgd->get_process_page_dir(pdd->vm);
> +
> +       mutex_lock(&dqm->lock);
> +       if (WARN_ON_ONCE(!qpd->evicted)) /* already restored, do nothing */
> +               goto out;
> +       if (qpd->evicted > 1) { /* ref count still > 0, decrement & quit */
> +               qpd->evicted--;
> +               goto out;
> +       }
> +
> +       pr_info_ratelimited("Restoring PASID %u queues\n",
> +                           pdd->process->pasid);
> +
> +       /* Update PD Base in QPD */
> +       qpd->page_table_base = pd_base;
> +       pr_debug("Updated PD address to 0x%08x\n", pd_base);
> +
> +       if (!list_empty(&qpd->queues_list)) {
> +               dqm->dev->kfd2kgd->set_vm_context_page_table_base(
> +                               dqm->dev->kgd,
> +                               qpd->vmid,
> +                               qpd->page_table_base);
> +               kfd_flush_tlb(pdd);
> +       }
> +
> +       /* activate all active queues on the qpd */
> +       list_for_each_entry(q, &qpd->queues_list, list) {
> +               if (!q->properties.is_evicted)
> +                       continue;
> +               mqd = dqm->ops.get_mqd_manager(dqm,
> +                       get_mqd_type_from_queue_type(q->properties.type));
> +               if (!mqd) { /* should not be here */
> +                       pr_err("Cannot restore queue, mqd mgr is NULL\n");
> +                       retval = -ENOMEM;
> +                       goto out;
> +               }
> +               q->properties.is_evicted = false;
> +               q->properties.is_active = true;
> +               retval = mqd->load_mqd(mqd, q->mqd, q->pipe,
> +                                      q->queue, &q->properties,
> +                                      q->process->mm);
> +               if (retval)
> +                       goto out;
> +               dqm->queue_count++;
> +       }
> +       qpd->evicted = 0;
> +out:
> +       mutex_unlock(&dqm->lock);
> +       return retval;
> +}
> +
> +static int restore_process_queues_cpsch(struct device_queue_manager *dqm,
> +                                       struct qcm_process_device *qpd)
> +{
> +       struct queue *q;
> +       struct kfd_process_device *pdd;
> +       uint32_t pd_base;
> +       int retval = 0;
> +
> +       pdd = qpd_to_pdd(qpd);
> +       /* Retrieve PD base */
> +       pd_base = dqm->dev->kfd2kgd->get_process_page_dir(pdd->vm);
> +
> +       mutex_lock(&dqm->lock);
> +       if (WARN_ON_ONCE(!qpd->evicted)) /* already restored, do nothing */
> +               goto out;
> +       if (qpd->evicted > 1) { /* ref count still > 0, decrement & quit */
> +               qpd->evicted--;
> +               goto out;
> +       }
> +
> +       pr_info_ratelimited("Restoring PASID %u queues\n",
> +                           pdd->process->pasid);
> +
> +       /* Update PD Base in QPD */
> +       qpd->page_table_base = pd_base;
> +       pr_debug("Updated PD address to 0x%08x\n", pd_base);
> +
> +       /* activate all active queues on the qpd */
> +       list_for_each_entry(q, &qpd->queues_list, list) {
> +               if (!q->properties.is_evicted)
> +                       continue;
> +               q->properties.is_evicted = false;
> +               q->properties.is_active = true;
> +               dqm->queue_count++;
> +       }
> +       retval = execute_queues_cpsch(dqm,
> +                               KFD_UNMAP_QUEUES_FILTER_DYNAMIC_QUEUES, 0);
> +       if (!retval)
> +               qpd->evicted = 0;
> +out:
> +       mutex_unlock(&dqm->lock);
> +       return retval;
> +}
> +
>  static int register_process(struct device_queue_manager *dqm,
>                                         struct qcm_process_device *qpd)
>  {
> @@ -853,6 +1057,14 @@ static int create_queue_cpsch(struct device_queue_manager *dqm, struct queue *q,
>                 retval = -ENOMEM;
>                 goto out;
>         }
> +       /*
> +        * Eviction state logic: we only mark active queues as evicted
> +        * to avoid the overhead of restoring inactive queues later
> +        */
> +       if (qpd->evicted)
> +               q->properties.is_evicted = (q->properties.queue_size > 0 &&
> +                                           q->properties.queue_percent > 0 &&
> +                                           q->properties.queue_address != 0);
>
>         dqm->asic_ops.init_sdma_vm(dqm, q, qpd);
>
> @@ -1291,6 +1503,8 @@ struct device_queue_manager *device_queue_manager_init(struct kfd_dev *dev)
>                 dqm->ops.set_cache_memory_policy = set_cache_memory_policy;
>                 dqm->ops.set_trap_handler = set_trap_handler;
>                 dqm->ops.process_termination = process_termination_cpsch;
> +               dqm->ops.evict_process_queues = evict_process_queues_cpsch;
> +               dqm->ops.restore_process_queues = restore_process_queues_cpsch;
>                 break;
>         case KFD_SCHED_POLICY_NO_HWS:
>                 /* initialize dqm for no cp scheduling */
> @@ -1307,6 +1521,9 @@ struct device_queue_manager *device_queue_manager_init(struct kfd_dev *dev)
>                 dqm->ops.set_cache_memory_policy = set_cache_memory_policy;
>                 dqm->ops.set_trap_handler = set_trap_handler;
>                 dqm->ops.process_termination = process_termination_nocpsch;
> +               dqm->ops.evict_process_queues = evict_process_queues_nocpsch;
> +               dqm->ops.restore_process_queues =
> +                       restore_process_queues_nocpsch;
>                 break;
>         default:
>                 pr_err("Invalid scheduling policy %d\n", dqm->sched_policy);
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
> index 68be0aa..412beff 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
> @@ -79,6 +79,10 @@ struct device_process_node {
>   *
>   * @process_termination: Clears all process queues belongs to that device.
>   *
> + * @evict_process_queues: Evict all active queues of a process
> + *
> + * @restore_process_queues: Restore all evicted queues of a process
> + *
>   */
>
>  struct device_queue_manager_ops {
> @@ -129,6 +133,11 @@ struct device_queue_manager_ops {
>
>         int (*process_termination)(struct device_queue_manager *dqm,
>                         struct qcm_process_device *qpd);
> +
> +       int (*evict_process_queues)(struct device_queue_manager *dqm,
> +                                   struct qcm_process_device *qpd);
> +       int (*restore_process_queues)(struct device_queue_manager *dqm,
> +                                     struct qcm_process_device *qpd);
>  };
>
>  struct device_queue_manager_asic_ops {
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_module.c b/drivers/gpu/drm/amd/amdkfd/kfd_module.c
> index 3ac72be..65574c6 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_module.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_module.c
> @@ -43,6 +43,8 @@ static const struct kgd2kfd_calls kgd2kfd = {
>         .interrupt      = kgd2kfd_interrupt,
>         .suspend        = kgd2kfd_suspend,
>         .resume         = kgd2kfd_resume,
> +       .schedule_evict_and_restore_process =
> +                         kgd2kfd_schedule_evict_and_restore_process,
>  };
>
>  int sched_policy = KFD_SCHED_POLICY_HWS;
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c
> index fbe3f83..c00c325 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c
> @@ -202,7 +202,8 @@ static int __update_mqd(struct mqd_manager *mm, void *mqd,
>
>         q->is_active = (q->queue_size > 0 &&
>                         q->queue_address != 0 &&
> -                       q->queue_percent > 0);
> +                       q->queue_percent > 0 &&
> +                       !q->is_evicted);
>
>         return 0;
>  }
> @@ -245,7 +246,8 @@ static int update_mqd_sdma(struct mqd_manager *mm, void *mqd,
>
>         q->is_active = (q->queue_size > 0 &&
>                         q->queue_address != 0 &&
> -                       q->queue_percent > 0);
> +                       q->queue_percent > 0 &&
> +                       !q->is_evicted);
>
>         return 0;
>  }
> @@ -377,7 +379,8 @@ static int update_mqd_hiq(struct mqd_manager *mm, void *mqd,
>
>         q->is_active = (q->queue_size > 0 &&
>                         q->queue_address != 0 &&
> -                       q->queue_percent > 0);
> +                       q->queue_percent > 0 &&
> +                       !q->is_evicted);
>
>         return 0;
>  }
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c
> index 58221c1..89e4242 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c
> @@ -198,7 +198,8 @@ static int __update_mqd(struct mqd_manager *mm, void *mqd,
>
>         q->is_active = (q->queue_size > 0 &&
>                         q->queue_address != 0 &&
> -                       q->queue_percent > 0);
> +                       q->queue_percent > 0 &&
> +                       !q->is_evicted);
>
>         return 0;
>  }
> @@ -342,7 +343,8 @@ static int update_mqd_sdma(struct mqd_manager *mm, void *mqd,
>
>         q->is_active = (q->queue_size > 0 &&
>                         q->queue_address != 0 &&
> -                       q->queue_percent > 0);
> +                       q->queue_percent > 0 &&
> +                       !q->is_evicted);
>
>         return 0;
>  }
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> index 56c2e36..cac7aa2 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
> @@ -335,7 +335,11 @@ enum kfd_queue_format {
>   * @is_interop: Defines if this is a interop queue. Interop queue means that
>   * the queue can access both graphics and compute resources.
>   *
> - * @is_active: Defines if the queue is active or not.
> + * @is_evicted: Defines if the queue is evicted. Only active queues
> + * are evicted, rendering them inactive.
> + *
> + * @is_active: Defines if the queue is active or not. @is_active and
> + * @is_evicted are protected by the DQM lock.
>   *
>   * @vmid: If the scheduling mode is no cp scheduling the field defines the vmid
>   * of the queue.
> @@ -357,6 +361,7 @@ struct queue_properties {
>         uint32_t __iomem *doorbell_ptr;
>         uint32_t doorbell_off;
>         bool is_interop;
> +       bool is_evicted;
>         bool is_active;
>         /* Not relevant for user mode queues in cp scheduling */
>         unsigned int vmid;
> @@ -460,6 +465,7 @@ struct qcm_process_device {
>         unsigned int queue_count;
>         unsigned int vmid;
>         bool is_debug;
> +       unsigned int evicted; /* eviction counter, 0=active */
>
>         /* This flag tells if we should reset all wavefronts on
>          * process termination
> @@ -486,6 +492,17 @@ struct qcm_process_device {
>         uint64_t tma_addr;
>  };
>
> +/* KFD Memory Eviction */
> +
> +/* Approx. wait time before attempting to restore evicted BOs */
> +#define PROCESS_RESTORE_TIME_MS 100
> +/* Approx. back off time if restore fails due to lack of memory */
> +#define PROCESS_BACK_OFF_TIME_MS 100
> +/* Approx. time before evicting the process again */
> +#define PROCESS_ACTIVE_TIME_MS 10
> +
> +int kgd2kfd_schedule_evict_and_restore_process(struct mm_struct *mm,
> +                                              struct dma_fence *fence);
>
>  enum kfd_pdd_bound {
>         PDD_UNBOUND = 0,
> @@ -600,6 +617,16 @@ struct kfd_process {
>          * during restore
>          */
>         struct dma_fence *ef;
> +
> +       /* Work items for evicting and restoring BOs */
> +       struct delayed_work eviction_work;
> +       struct delayed_work restore_work;
> +       /* seqno of the last scheduled eviction */
> +       unsigned int last_eviction_seqno;
> +       /* Approx. the last timestamp (in jiffies) when the process was
> +        * restored after an eviction
> +        */
> +       unsigned long last_restore_timestamp;
>  };
>
>  #define KFD_PROCESS_TABLE_SIZE 5 /* bits: 32 entries */
> @@ -629,7 +656,10 @@ void kfd_process_destroy_wq(void);
>  struct kfd_process *kfd_create_process(struct file *filep);
>  struct kfd_process *kfd_get_process(const struct task_struct *);
>  struct kfd_process *kfd_lookup_process_by_pasid(unsigned int pasid);
> +struct kfd_process *kfd_lookup_process_by_mm(const struct mm_struct *mm);
>  void kfd_unref_process(struct kfd_process *p);
> +void kfd_suspend_all_processes(void);
> +int kfd_resume_all_processes(void);
>
>  struct kfd_process_device *kfd_bind_process_to_device(struct kfd_dev *dev,
>                                                 struct kfd_process *p);
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> index cf4fa25..18b2b86 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
> @@ -55,6 +55,9 @@ static struct kfd_process *create_process(const struct task_struct *thread,
>                                         struct file *filep);
>  static int kfd_process_init_cwsr(struct kfd_process *p, struct file *filep);
>
> +static void evict_process_worker(struct work_struct *work);
> +static void restore_process_worker(struct work_struct *work);
> +
>
>  void kfd_process_create_wq(void)
>  {
> @@ -230,6 +233,9 @@ static void kfd_process_notifier_release(struct mmu_notifier *mn,
>         mutex_unlock(&kfd_processes_mutex);
>         synchronize_srcu(&kfd_processes_srcu);
>
> +       cancel_delayed_work_sync(&p->eviction_work);
> +       cancel_delayed_work_sync(&p->restore_work);
> +
>         mutex_lock(&p->mutex);
>
>         /* Iterate over all process device data structures and if the
> @@ -351,6 +357,10 @@ static struct kfd_process *create_process(const struct task_struct *thread,
>         if (err != 0)
>                 goto err_init_apertures;
>
> +       INIT_DELAYED_WORK(&process->eviction_work, evict_process_worker);
> +       INIT_DELAYED_WORK(&process->restore_work, restore_process_worker);
> +       process->last_restore_timestamp = get_jiffies_64();
> +
>         err = kfd_process_init_cwsr(process, filep);
>         if (err)
>                 goto err_init_cwsr;
> @@ -402,6 +412,7 @@ struct kfd_process_device *kfd_create_process_device_data(struct kfd_dev *dev,
>         INIT_LIST_HEAD(&pdd->qpd.priv_queue_list);
>         pdd->qpd.dqm = dev->dqm;
>         pdd->qpd.pqm = &p->pqm;
> +       pdd->qpd.evicted = 0;
>         pdd->process = p;
>         pdd->bound = PDD_UNBOUND;
>         pdd->already_dequeued = false;
> @@ -490,6 +501,208 @@ struct kfd_process *kfd_lookup_process_by_pasid(unsigned int pasid)
>         return ret_p;
>  }
>
> +/* This increments the process->ref counter. */
> +struct kfd_process *kfd_lookup_process_by_mm(const struct mm_struct *mm)
> +{
> +       struct kfd_process *p;
> +
> +       int idx = srcu_read_lock(&kfd_processes_srcu);
> +
> +       p = find_process_by_mm(mm);
> +       if (p)
> +               kref_get(&p->ref);
> +
> +       srcu_read_unlock(&kfd_processes_srcu, idx);
> +
> +       return p;
> +}
> +
> +/* process_evict_queues - Evict all user queues of a process
> + *
> + * Eviction is reference-counted per process-device. This means multiple
> + * evictions from different sources can be nested safely.
> + */
> +static int process_evict_queues(struct kfd_process *p)
> +{
> +       struct kfd_process_device *pdd;
> +       int r = 0;
> +       unsigned int n_evicted = 0;
> +
> +       list_for_each_entry(pdd, &p->per_device_data, per_device_list) {
> +               r = pdd->dev->dqm->ops.evict_process_queues(pdd->dev->dqm,
> +                                                           &pdd->qpd);
> +               if (r) {
> +                       pr_err("Failed to evict process queues\n");
> +                       goto fail;
> +               }
> +               n_evicted++;
> +       }
> +
> +       return r;
> +
> +fail:
> +       /* To keep state consistent, roll back partial eviction by
> +        * restoring queues
> +        */
> +       list_for_each_entry(pdd, &p->per_device_data, per_device_list) {
> +               if (n_evicted == 0)
> +                       break;
> +               if (pdd->dev->dqm->ops.restore_process_queues(pdd->dev->dqm,
> +                                                             &pdd->qpd))
> +                       pr_err("Failed to restore queues\n");
> +
> +               n_evicted--;
> +       }
> +
> +       return r;
> +}
> +
> +/* process_restore_queues - Restore all user queues of a process */
> +static  int process_restore_queues(struct kfd_process *p)
> +{
> +       struct kfd_process_device *pdd;
> +       int r, ret = 0;
> +
> +       list_for_each_entry(pdd, &p->per_device_data, per_device_list) {
> +               r = pdd->dev->dqm->ops.restore_process_queues(pdd->dev->dqm,
> +                                                             &pdd->qpd);
> +               if (r) {
> +                       pr_err("Failed to restore process queues\n");
> +                       if (!ret)
> +                               ret = r;
> +               }
> +       }
> +
> +       return ret;
> +}
> +
> +static void evict_process_worker(struct work_struct *work)
> +{
> +       int ret;
> +       struct kfd_process *p;
> +       struct delayed_work *dwork;
> +
> +       dwork = to_delayed_work(work);
> +
> +       /* Process termination destroys this worker thread. So during the
> +        * lifetime of this thread, kfd_process p will be valid
> +        */
> +       p = container_of(dwork, struct kfd_process, eviction_work);
> +       WARN_ONCE(p->last_eviction_seqno != p->ef->seqno,
> +                 "Eviction fence mismatch\n");
> +
> +       /* A narrow window of overlap between the restore and evict work
> +        * items is possible. Once amdgpu_amdkfd_gpuvm_restore_process_bos
> +        * unreserves the KFD BOs, they can be evicted again. But restore
> +        * still has a few more steps to finish, so let's wait for any
> +        * previous restore work to complete.
> +        */
> +       flush_delayed_work(&p->restore_work);
> +
> +       pr_debug("Started evicting pasid %d\n", p->pasid);
> +       ret = process_evict_queues(p);
> +       if (!ret) {
> +               dma_fence_signal(p->ef);
> +               dma_fence_put(p->ef);
> +               p->ef = NULL;
> +               schedule_delayed_work(&p->restore_work,
> +                               msecs_to_jiffies(PROCESS_RESTORE_TIME_MS));
> +
> +               pr_debug("Finished evicting pasid %d\n", p->pasid);
> +       } else
> +               pr_err("Failed to evict queues of pasid %d\n", p->pasid);
> +}
> +
> +static void restore_process_worker(struct work_struct *work)
> +{
> +       struct delayed_work *dwork;
> +       struct kfd_process *p;
> +       struct kfd_process_device *pdd;
> +       int ret = 0;
> +
> +       dwork = to_delayed_work(work);
> +
> +       /* Process termination destroys this worker thread. So during the
> +        * lifetime of this thread, kfd_process p will be valid
> +        */
> +       p = container_of(dwork, struct kfd_process, restore_work);
> +
> +       /* Call restore_process_bos on the first KGD device. This function
> +        * takes care of restoring the whole process including other devices.
> +        * Restore can fail if enough memory is not available. If so,
> +        * reschedule again.
> +        */
> +       pdd = list_first_entry(&p->per_device_data,
> +                              struct kfd_process_device,
> +                              per_device_list);
> +
> +       pr_debug("Started restoring pasid %d\n", p->pasid);
> +
> +       /* Setting last_restore_timestamp before successful restoration.
> +        * Otherwise this would have to be set by KGD (restore_process_bos)
> +        * before KFD BOs are unreserved. If not, the process can be evicted
> +        * again before the timestamp is set.
> +        * If restore fails, the timestamp will be set again in the next
> +        * attempt. This would mean that the minimum GPU quanta would be
> +        * PROCESS_ACTIVE_TIME_MS - (time to execute the following two
> +        * functions)
> +        */
> +
> +       p->last_restore_timestamp = get_jiffies_64();
> +       ret = pdd->dev->kfd2kgd->restore_process_bos(p->kgd_process_info,
> +                                                    &p->ef);
> +       if (ret) {
> +               pr_debug("Failed to restore BOs of pasid %d, retry after %d ms\n",
> +                        p->pasid, PROCESS_BACK_OFF_TIME_MS);
> +               ret = schedule_delayed_work(&p->restore_work,
> +                               msecs_to_jiffies(PROCESS_BACK_OFF_TIME_MS));
> +               WARN(!ret, "reschedule restore work failed\n");
> +               return;
> +       }
> +
> +       ret = process_restore_queues(p);
> +       if (!ret)
> +               pr_debug("Finished restoring pasid %d\n", p->pasid);
> +       else
> +               pr_err("Failed to restore queues of pasid %d\n", p->pasid);
> +}
> +
> +void kfd_suspend_all_processes(void)
> +{
> +       struct kfd_process *p;
> +       unsigned int temp;
> +       int idx = srcu_read_lock(&kfd_processes_srcu);
> +
> +       hash_for_each_rcu(kfd_processes_table, temp, p, kfd_processes) {
> +               cancel_delayed_work_sync(&p->eviction_work);
> +               cancel_delayed_work_sync(&p->restore_work);
> +
> +               if (process_evict_queues(p))
> +                       pr_err("Failed to suspend process %d\n", p->pasid);
> +               dma_fence_signal(p->ef);
> +               dma_fence_put(p->ef);
> +               p->ef = NULL;
> +       }
> +       srcu_read_unlock(&kfd_processes_srcu, idx);
> +}
> +
> +int kfd_resume_all_processes(void)
> +{
> +       struct kfd_process *p;
> +       unsigned int temp;
> +       int ret = 0, idx = srcu_read_lock(&kfd_processes_srcu);
> +
> +       hash_for_each_rcu(kfd_processes_table, temp, p, kfd_processes) {
> +               if (!schedule_delayed_work(&p->restore_work, 0)) {
> +                       pr_err("Restore process %d failed during resume\n",
> +                              p->pasid);
> +                       ret = -EFAULT;
> +               }
> +       }
> +       srcu_read_unlock(&kfd_processes_srcu, idx);
> +       return ret;
> +}
> +
>  int kfd_reserved_mem_mmap(struct kfd_process *process,
>                           struct vm_area_struct *vma)
>  {
> --
> 2.7.4
>

This patch is:
Acked-by: Oded Gabbay <oded.gabbay@gmail.com>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 17/25] uapi: Fix type used in ioctl parameter structures
       [not found]     ` <1517967174-21709-18-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
@ 2018-02-12  9:41       ` Oded Gabbay
  0 siblings, 0 replies; 71+ messages in thread
From: Oded Gabbay @ 2018-02-12  9:41 UTC (permalink / raw)
  To: Felix Kuehling; +Cc: amd-gfx list

On Wed, Feb 7, 2018 at 3:32 AM, Felix Kuehling <Felix.Kuehling@amd.com> wrote:
> Use __u32 and __u64 instead of POSIX types that may not be defined
> in user mode builds.
>
> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
> ---
>  include/uapi/linux/kfd_ioctl.h | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
> index f4cab5b..111d73b 100644
> --- a/include/uapi/linux/kfd_ioctl.h
> +++ b/include/uapi/linux/kfd_ioctl.h
> @@ -263,10 +263,10 @@ struct kfd_ioctl_get_tile_config_args {
>  };
>
>  struct kfd_ioctl_set_trap_handler_args {
> -       uint64_t tba_addr;              /* to KFD */
> -       uint64_t tma_addr;              /* to KFD */
> -       uint32_t gpu_id;                /* to KFD */
> -       uint32_t pad;
> +       __u64 tba_addr;         /* to KFD */
> +       __u64 tma_addr;         /* to KFD */
> +       __u32 gpu_id;           /* to KFD */
> +       __u32 pad;
>  };
>
>  #define AMDKFD_IOCTL_BASE 'K'
> --
> 2.7.4
>
This patch is:
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 14/25] drm/amdkfd: Populate DRM render device minor
       [not found]                 ` <dd476550-9a48-3adc-30e6-8a94bd04833b-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2018-02-12 16:57                   ` Felix Kuehling
       [not found]                     ` <d11c598a-b51f-b957-7dae-485025a1ad34-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: Felix Kuehling @ 2018-02-12 16:57 UTC (permalink / raw)
  To: christian.koenig-5C7GfCeVMHo,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Oak Zeng

On 2018-02-11 04:55 AM, Christian König wrote:
> On 09.02.2018 at 21:31, Felix Kuehling wrote:
>> On 2018-02-09 07:34 AM, Christian König wrote:
>>> I really wonder if sharing the GPUVM instance from a render node file
>>> descriptor wouldn't be easier.
>>>
>>> You could just use the existing IOCTL for allocating and mapping
>>> memory on a render node and then give the prepared fd to kfd to use.
>> In amd-kfd-staging we have an ioctl to import a DMABuf into the KFD VM
>> for interoperability between graphics and compute. I'm planning to
>> upstream that later.
>
> Could you move that around and upstream it first?

Sure. I assume with "first" you mean before userptr and Vega10 support.
I was going to leave it for later because graphics interoperability is not
an essential feature. Userptr is more important to get existing ROCm
user mode to work properly.

>
>> It still depends on the KFD calls for managing the
>> GPU mappings. The KFD GPU mapping calls need to interact with the
>> eviction logic. TLB flushing involves the HIQ/KIQ for figuring out the
>> VMID/PASID mapping, which is managed asynchronously by the HWS, not
>> amdgpu_ids.c. User pointers need a different MMU notifier. AQL queue
>> buffers on some GPUs need a double-mapping workaround for a HW
>> wraparound bug. I could go on. So this is not easy to retrofit into
>> amdgpu_gem and amdgpu_cs. Also, KFD VMs are created differently
>> (AMDGPU_VM_CONTEXT_COMPUTE, amdkfd_vm wrapper structure).
>
> Well, that is actually the reason why I'm asking about it.
>
> First of all, kernel development is not use-case driven, e.g. we should
> not implement something because userspace needs it in a specific way,
> but rather because the hardware supports it in a specific way.
>
> This means that in theory we should have the fixes for the HW problems
> in both interfaces. That doesn't make sense at the moment because we
> don't support user space queue through the render node file
> descriptor, but people are starting to ask for that as well.
>> What's more, buffers allocated through amdgpu_gem calls create GEM
>> objects that we don't need.
>
> I see that as an advantage rather than a problem, because it fixes a
> couple of problems with the KFD where the address space of the inode
> is not managed correctly as far as I can see.

I don't think GEM is involved in the management of address space. That's
handled inside TTM and drm_vma_manager, both of which we are using. We
have tested CPU mappings with evictions and this is working correctly now.

We had problems with this before, when we were CPU-mapping our buffers
through the KFD device FD.

>
> Those implementation issues never caused problems until now because you
> never tried to unmap doorbells. But with the new eviction code that is
> about to change, isn't it?

I don't understand this comment. What does this have to do with doorbells?

Doorbells are never unmapped. When queues are evicted doorbells remain
mapped to the process. User mode can continue writing to doorbells,
though they won't have any immediate effect while the corresponding
queues are unmapped from the HQDs.

>
>> And exporting and importing DMABufs adds
>> more overhead and a potential to run into the process file-descriptor
>> limit (maybe the FD could be discarded after importing).
>
> Closing the file descriptor is a must have after importing, so that
> isn't an issue.
>
> But I agree that exporting and reimporting the file descriptor adds
> some additional overhead.
>
>> I honestly thought about whether this would be feasible when we
>> implemented the CPU mapping through the DRM FD. But it would be nothing
>> short of a complete redesign of the KFD memory management code. It would
>> be months of work, more instability, for questionable benefits. I don't
>> think it would be in the interest of end users and customers.
>
> Yeah, agree on that.
>
>> I just thought of a slightly different approach I would consider more
>> realistic, without having thought through all the details: Adding
>> KFD-specific memory management ioctls to the amdgpu device. Basically
>> call amdgpu_amdkfd_gpuvm functions from amdgpu ioctl functions instead
>> of KFD ioctl functions. But we'd still have KFD ioctls for other things,
>> and the new amdgpu ioctls would be KFD-specific and not useful for
>> graphics applications. It's probably still several weeks of work, but
>> shouldn't have major teething issues because the internal logic and
>> functionality would be basically unchanged. It would just move the
>> ioctls from one device to another.
>
> My thinking went in a similar direction. But instead of exposing the
> KFD IOCTLs through the DRM FD, I would let the KFD import a DRM FD.
>
> And then use the DRM FD in the KFD for things like the buffer provider
> of a device, e.g. no separate IDR for BOs in the KFD but rather a
> reference to the DRM FD.

I'm not sure I understand this. With "DRM FD" I think you mean the
device file descriptor. Then have KFD call file operations on the FD to
call AMDGPU ioctls? Is there any precedent for this that I can look at
for an example?

Or do you mean a DMABuf FD exported by GEM? After importing the buffer
and closing the FD, we still need a way to refer to the buffer for
managing the GPU mappings and for releasing the reference. So we still
need handles that are currently provided by our own IDR. I think we'll
still need that.
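
(For illustration only: a minimal sketch of such a per-process handle
scheme built on the kernel's IDR API. The names kfd_bo_ref,
bo_handle_create/lookup/destroy are made up for this sketch and are not
the actual KFD implementation; locking is omitted for brevity.)

#include <linux/idr.h>
#include <linux/slab.h>

struct kfd_bo_ref {		/* hypothetical wrapper around one BO */
	void *mem;		/* opaque KGD memory object */
};

static DEFINE_IDR(bo_idr);	/* per-process in the real driver */

static int bo_handle_create(void *mem)
{
	struct kfd_bo_ref *ref = kzalloc(sizeof(*ref), GFP_KERNEL);
	int handle;

	if (!ref)
		return -ENOMEM;
	ref->mem = mem;
	/* returns a small positive handle, or a negative errno */
	handle = idr_alloc(&bo_idr, ref, 1, 0, GFP_KERNEL);
	if (handle < 0)
		kfree(ref);
	return handle;
}

static void *bo_handle_lookup(int handle)
{
	struct kfd_bo_ref *ref = idr_find(&bo_idr, handle);

	return ref ? ref->mem : NULL;
}

static void bo_handle_destroy(int handle)
{
	struct kfd_bo_ref *ref = idr_remove(&bo_idr, handle);

	kfree(ref);
}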

Finally, the kfd_ioctl_alloc_memory_of_gpu will later also be used for
creating userptr BOs. Those can't be exported as DMABufs, so we still
need our own ioctl for creating that type of BO. That was going to be my
next patch series.

Regards,
  Felix

>
> We can still manage the VM through KFD IOCTL, but the BOs and the VM
> are actually provided by the DRM FD.
>
> Regards,
> Christian.
>
>> Regards,
>>    Felix
>>
>>> Regards,
>>> Christian.
>>>
>>> On 07.02.2018 at 02:32, Felix Kuehling wrote:
>>>> From: Oak Zeng <Oak.Zeng@amd.com>
>>>>
>>>> Populate DRM render device minor in kfd topology
>>>>
>>>> Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
>>>> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
>>>> ---
>>>>    drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 4 ++++
>>>>    drivers/gpu/drm/amd/amdkfd/kfd_topology.h | 1 +
>>>>    2 files changed, 5 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>>>> b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>>>> index 2506155..ac28abc 100644
>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
>>>> @@ -441,6 +441,8 @@ static ssize_t node_show(struct kobject *kobj,
>>>> struct attribute *attr,
>>>>                dev->node_props.device_id);
>>>>        sysfs_show_32bit_prop(buffer, "location_id",
>>>>                dev->node_props.location_id);
>>>> +    sysfs_show_32bit_prop(buffer, "drm_render_minor",
>>>> +            dev->node_props.drm_render_minor);
>>>>          if (dev->gpu) {
>>>>            log_max_watch_addr =
>>>> @@ -1214,6 +1216,8 @@ int kfd_topology_add_device(struct kfd_dev *gpu)
>>>>            dev->gpu->kfd2kgd->get_max_engine_clock_in_mhz(dev->gpu->kgd);
>>>>        dev->node_props.max_engine_clk_ccompute =
>>>>            cpufreq_quick_get_max(0) / 1000;
>>>> +    dev->node_props.drm_render_minor =
>>>> +        gpu->shared_resources.drm_render_minor;
>>>>          kfd_fill_mem_clk_max_info(dev);
>>>>        kfd_fill_iolink_non_crat_info(dev);
>>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>>>> index c0be2be..eb54cfc 100644
>>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
>>>> @@ -71,6 +71,7 @@ struct kfd_node_properties {
>>>>        uint32_t location_id;
>>>>        uint32_t max_engine_clk_fcompute;
>>>>        uint32_t max_engine_clk_ccompute;
>>>> +    int32_t  drm_render_minor;
>>>>        uint16_t marketing_name[KFD_TOPOLOGY_PUBLIC_NAME_SIZE];
>>>>    };
>>>>    
>


* Re: [PATCH 06/25] drm/amdgpu: Add KFD eviction fence
  2018-02-11 12:42       ` Oded Gabbay
@ 2018-02-12 19:19         ` Felix Kuehling
  0 siblings, 0 replies; 71+ messages in thread
From: Felix Kuehling @ 2018-02-12 19:19 UTC (permalink / raw)
  To: Oded Gabbay; +Cc: Harish Kasiviswanathan, amd-gfx list

On 2018-02-11 07:42 AM, Oded Gabbay wrote:
> Hi Felix,
> Do you object to me changing amd_kfd_ to amdkfd_ in the various
> structures and functions?
> So far, we don't have anything with the prefix amd_kfd_, so I would like
> to keep things consistent.

We use the prefix amdgpu_amdkfd_ throughout the KFD-related amdgpu code.
Not sure why we did something different here. I'm OK with changing it
for consistency.

Thanks,
  Felix

>
> Other than that, this patch is:
> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
>
>
> Oded


* Re: [PATCH 09/25] drm/amdgpu: Add GPUVM memory management functions for KFD
       [not found]         ` <CAFCwf10ThSfo8zphxPRH549LoyJ1H+XM89rpwpSNeJeuWYayAA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2018-02-12 19:20           ` Felix Kuehling
  0 siblings, 0 replies; 71+ messages in thread
From: Felix Kuehling @ 2018-02-12 19:20 UTC (permalink / raw)
  To: Oded Gabbay; +Cc: amd-gfx list

On 2018-02-12 03:42 AM, Oded Gabbay wrote:
> Hi Felix,
> I wrote some minor comments. If you don't object to them, I'll just
> add them to the patch to save you the trouble of re-sending.

Thanks for going through this. I agree with your comments.

Regards,
  Felix

>
>
> This patch is:
> Acked-by: Oded Gabbay <oded.gabbay@gmail.com>


* Re: [PATCH 14/25] drm/amdkfd: Populate DRM render device minor
       [not found]                     ` <d11c598a-b51f-b957-7dae-485025a1ad34-5C7GfCeVMHo@public.gmane.org>
@ 2018-02-12 23:23                       ` Felix Kuehling
       [not found]                         ` <ce14b4cd-2bb7-8f19-b464-ddf9f68f45ad-5C7GfCeVMHo@public.gmane.org>
  2018-02-13 10:46                       ` Christian König
  1 sibling, 1 reply; 71+ messages in thread
From: Felix Kuehling @ 2018-02-12 23:23 UTC (permalink / raw)
  To: christian.koenig-5C7GfCeVMHo,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Oak Zeng


On 2018-02-12 11:57 AM, Felix Kuehling wrote:
>>> I just thought of a slightly different approach I would consider more
>>> realistic, without having thought through all the details: Adding
>>> KFD-specific memory management ioctls to the amdgpu device. Basically
>>> call amdgpu_amdkfd_gpuvm functions from amdgpu ioctl functions instead
>>> of KFD ioctl functions. But we'd still have KFD ioctls for other things,
>>> and the new amdgpu ioctls would be KFD-specific and not useful for
>>> graphics applications. It's probably still several weeks of work, but
>>> shouldn't have major teething issues because the internal logic and
>>> functionality would be basically unchanged. It would just move the
>>> ioctls from one device to another.
>> My thinking went into a similar direction. But instead of exposing the
>> KFD IOCTLs through the DRM FD, I would let the KFD import a DRM FD.
>>
>> And then use the DRM FD in the KFD for things like the buffer provider
>> of a device, e.g. no separate IDR for BOs in the KFD but rather a
>> reference to the DRM FD.
> I'm not sure I understand this. With "DRM FD" I think you mean the
> device file descriptor. Then have KFD call file operation on the FD to
> call AMDGPU ioctls? Is there any precedent for this that I can look at
> for an example?

OK, I think this could work for finding a VM from a DRM file descriptor:

    struct file *filp = fget(fd);
    struct drm_file *drm_priv = filp->private_data;
    struct amdgpu_fpriv *drv_priv = drm_priv->driver_priv;
    struct amdgpu_vm *vm = &drv_priv->vm;

That would let us use the DRM VM instead of creating our own and would
avoid wasting a perfectly good VM that gets created by opening the DRM
device. We'd need a new ioctl to import VMs into KFD. But that's about
as far as I would take it.
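
For illustration, a slightly fuller sketch of what such an import helper could
look like (the function name and error handling here are hypothetical, and the
caller would still have to verify that the file really belongs to amdgpu):

    static struct amdgpu_vm *kfd_acquire_vm_from_fd(int fd, struct file **filp_out)
    {
        struct file *filp = fget(fd);   /* takes a reference on the file */
        struct drm_file *drm_priv;
        struct amdgpu_fpriv *drv_priv;

        if (!filp)
            return ERR_PTR(-EINVAL);

        drm_priv = filp->private_data;
        drv_priv = drm_priv->driver_priv;

        /* Keep filp referenced for as long as KFD uses the VM. */
        *filp_out = filp;
        return &drv_priv->vm;
    }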

We'd still need to add KFD-specific data to the VM. If we use an
existing VM, we can't wrap it in our own structure any more. We'd need
to come up with a different solution or extend struct amdgpu_vm with
what we need.

And we still need our own BO creation ioctl. Importing DMABufs adds
extra overhead (more system calls, file descriptors, GEM objects) and
doesn't cover some of our use cases (userptr). We also need our own idr
for managing buffer handles that are used with all our memory and VM
management functions.

Regards,
  Felix


>
> Or do you mean a DMABuf FD exported by GEM? After importing the buffer
> and closing the FD, we still need a way to refer to the buffer for
> managing the GPU mappings and for releasing the reference. So we still
> need handles that are currently provided by our own IDR. I think we'll
> still need that.
>
> Finally, the kfd_ioctl_alloc_memory_of_gpu will later also be used for
> creating userptr BOs. Those can't be exported as DMABufs, so we still
> need our own ioctl for creating that type of BO. That was going to be my
> next patch series.
>
> Regards,
>   Felix
>


* Re: [PATCH 14/25] drm/amdkfd: Populate DRM render device minor
       [not found]                         ` <ce14b4cd-2bb7-8f19-b464-ddf9f68f45ad-5C7GfCeVMHo@public.gmane.org>
@ 2018-02-13 10:25                           ` Christian König
       [not found]                             ` <cbd18308-c464-125e-ef9f-180c12a9926a-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: Christian König @ 2018-02-13 10:25 UTC (permalink / raw)
  To: Felix Kuehling, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Oak Zeng

Am 13.02.2018 um 00:23 schrieb Felix Kuehling:
> On 2018-02-12 11:57 AM, Felix Kuehling wrote:
>>>> I just thought of a slightly different approach I would consider more
>>>> realistic, without having thought through all the details: Adding
>>>> KFD-specific memory management ioctls to the amdgpu device. Basically
>>>> call amdgpu_amdkfd_gpuvm functions from amdgpu ioctl functions instead
>>>> of KFD ioctl functions. But we'd still have KFD ioctls for other things,
>>>> and the new amdgpu ioctls would be KFD-specific and not useful for
>>>> graphics applications. It's probably still several weeks of work, but
>>>> shouldn't have major teething issues because the internal logic and
>>>> functionality would be basically unchanged. It would just move the
>>>> ioctls from one device to another.
>>> My thinking went into a similar direction. But instead of exposing the
>>> KFD IOCTLs through the DRM FD, I would let the KFD import a DRM FD.
>>>
>>> And then use the DRM FD in the KFD for things like the buffer provider
>>> of a device, e.g. no separate IDR for BOs in the KFD but rather a
>>> reference to the DRM FD.
>> I'm not sure I understand this. With "DRM FD" I think you mean the
>> device file descriptor. Then have KFD call file operation on the FD to
>> call AMDGPU ioctls? Is there any precedent for this that I can look at
>> for an example?
> OK, I think this could work for finding a VM from a DRM file descriptor:
>
>      struct file *filp = fget(fd);
>      struct drm_file *drm_priv = filp->private_data;
>      struct amdgpu_fpriv *drv_priv = drm_priv->driver_priv;
>      struct amdgpu_vm *vm = &drv_priv->vm;
>
> That would let us use the DRM VM instead of creating our own and would
> avoid wasting a perfectly good VM that gets created by opening the DRM
> device. We'd need a new ioctl to import VMs into KFD. But that's about
> as far as I would take it.
>
> We'd still need to add KFD-specific data to the VM. If we use an
> existing VM, we can't wrap it in our own structure any more. We'd need
> to come up with a different solution or extend struct amdgpu_vm with
> what we need.

Well feel free to extend the VM structure with a pointer for KFD data.

> And we still need our own BO creation ioctl. Importing DMABufs adds
> extra overhead (more system calls, file descriptors, GEM objects) and
> doesn't cover some of our use cases (userptr). We also need our own idr
> for managing buffer handles that are used with all our memory and VM
> management functions.

Yeah, well that is the second part of that idea.

When you have the drm_file for the VM, you can also use it to call 
drm_gem_object_lookup().

And when you need to iterate over all the BOs you can just use the 
object_idr or the VM structure as well.
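
As a rough illustration of the lookup part (the wrapper name is hypothetical,
and the caller would be responsible for dropping the GEM reference again):

    static struct amdgpu_bo *kfd_bo_from_handle(struct drm_file *drm_priv,
                                                u32 handle)
    {
        /* Takes a reference on the GEM object on success. */
        struct drm_gem_object *gobj = drm_gem_object_lookup(drm_priv, handle);

        return gobj ? gem_to_amdgpu_bo(gobj) : NULL;
    }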

Regards,
Christian.

>
> Regards,
>    Felix
>
>
>> Or do you mean a DMABuf FD exported by GEM? After importing the buffer
>> and closing the FD, we still need a way to refer to the buffer for
>> managing the GPU mappings and for releasing the reference. So we still
>> need handles that are currently provided by our own IDR. I think we'll
>> still need that.
>>
>> Finally, the kfd_ioctl_alloc_memory_of_gpu will later also be used for
>> creating userptr BOs. Those can't be exported as DMABufs, so we still
>> need our own ioctl for creating that type of BO. That was going to be my
>> next patch series.
>>
>> Regards,
>>    Felix
>>


* Re: [PATCH 14/25] drm/amdkfd: Populate DRM render device minor
       [not found]                     ` <d11c598a-b51f-b957-7dae-485025a1ad34-5C7GfCeVMHo@public.gmane.org>
  2018-02-12 23:23                       ` Felix Kuehling
@ 2018-02-13 10:46                       ` Christian König
       [not found]                         ` <d5499f91-6ebf-94ae-f933-d57cd953e01d-5C7GfCeVMHo@public.gmane.org>
  1 sibling, 1 reply; 71+ messages in thread
From: Christian König @ 2018-02-13 10:46 UTC (permalink / raw)
  To: Felix Kuehling, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Oak Zeng

Am 12.02.2018 um 17:57 schrieb Felix Kuehling:
>> [SNIP]
>> I see that as an advantage rather than a problem, cause it fixes a
>> couple of problems with the KFD where the address space of the inode
>> is not managed correctly as far as I can see.
> I don't think GEM is involved in the management of address space. That's
> handled inside TTM and drm_vma_manager, both of which we are using. We
> have tested CPU mappings with evictions and this is working correctly now.

Ok, correct. I was thinking about that as one large functionality. When 
you see it like this GEM basically only implements looking up BOs and 
reference counting them, but both are still useful features.

> We had problems with this before, when we were CPU-mapping our buffers
> through the KFD device FD.
>
>> Those implementation issues never caused problems until now because you
>> never tried to unmap doorbells. But with the new eviction code that is
>> about to change, isn't it?
> I don't understand this comment. What does this have to do with doorbells?

I was assuming that doorbells are now unmapped to figure out when to 
start a process again.

See, what I'm missing is how userspace figures out which address to 
use to map the doorbell? E.g. I don't see a call to 
drm_vma_offset_manager_init() or something like that.

> Doorbells are never unmapped. When queues are evicted doorbells remain
> mapped to the process. User mode can continue writing to doorbells,
> though they won't have any immediate effect while the corresponding
> queues are unmapped from the HQDs.

Do you simply assume that after evicting a process it always needs to be 
restarted without checking if it actually does something? Or how does 
that work?

Regards,
Christian.

* Re: [PATCH 14/25] drm/amdkfd: Populate DRM render device minor
       [not found]                             ` <cbd18308-c464-125e-ef9f-180c12a9926a-5C7GfCeVMHo@public.gmane.org>
@ 2018-02-13 16:42                               ` Felix Kuehling
       [not found]                                 ` <a2e39184-8db8-407d-6608-6ae211563459-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: Felix Kuehling @ 2018-02-13 16:42 UTC (permalink / raw)
  To: Christian König, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Oak Zeng


On 2018-02-13 05:25 AM, Christian König wrote:
> Am 13.02.2018 um 00:23 schrieb Felix Kuehling:
>> On 2018-02-12 11:57 AM, Felix Kuehling wrote:
>>>>> I just thought of a slightly different approach I would consider more
>>>>> realistic, without having thought through all the details: Adding
>>>>> KFD-specific memory management ioctls to the amdgpu device. Basically
>>>>> call amdgpu_amdkfd_gpuvm functions from amdgpu ioctl functions
>>>>> instead
>>>>> of KFD ioctl functions. But we'd still have KFD ioctls for other
>>>>> things,
>>>>> and the new amdgpu ioctls would be KFD-specific and not useful for
>>>>> graphics applications. It's probably still several weeks of work, but
>>>>> shouldn't have major teething issues because the internal logic and
>>>>> functionality would be basically unchanged. It would just move the
>>>>> ioctls from one device to another.
>>>> My thinking went into a similar direction. But instead of exposing the
>>>> KFD IOCTLs through the DRM FD, I would let the KFD import a DRM FD.
>>>>
>>>> And then use the DRM FD in the KFD for things like the buffer provider
>>>> of a device, e.g. no separate IDR for BOs in the KFD but rather a
>>>> reference to the DRM FD.
>>> I'm not sure I understand this. With "DRM FD" I think you mean the
>>> device file descriptor. Then have KFD call file operation on the FD to
>>> call AMDGPU ioctls? Is there any precedent for this that I can look at
>>> for an example?
>> OK, I think this could work for finding a VM from a DRM file descriptor:
>>
>>      struct file *filp = fget(fd);
>>      struct drm_file *drm_priv = filp->private_data;
>>      struct amdgpu_fpriv *drv_priv = drm_priv->driver_priv;
>>      struct amdgpu_vm *vm = &drv_priv->vm;
>>
>> That would let us use the DRM VM instead of creating our own and would
>> avoid wasting a perfectly good VM that gets created by opening the DRM
>> device. We'd need a new ioctl to import VMs into KFD. But that's about
>> as far as I would take it.
>>
>> We'd still need to add KFD-specific data to the VM. If we use an
>> existing VM, we can't wrap it in our own structure any more. We'd need
>> to come up with a different solution or extend struct amdgpu_vm with
>> what we need.
>
> Well feel free to extend the VM structure with a pointer for KFD data.

Some more thoughts about that: Currently the lifetime of the VM is tied
to the file descriptor. If we import it into KFD, we either have to make
struct amdgpu_vm reference counted, or we have to keep a reference to
the file descriptor in KFD just to keep the VM alive until we drop our
reference to it.
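
A minimal sketch of the second option, assuming a new (hypothetical) drm_file
member in the per-process device data:

    /* Pin the DRM file so the VM it owns stays alive while KFD uses it. */
    static int kfd_pdd_attach_drm_file(struct kfd_process_device *pdd, int fd)
    {
        struct file *filp = fget(fd);

        if (!filp)
            return -EINVAL;
        pdd->drm_file = filp;           /* assumed new member of the PDD */
        return 0;
    }

    static void kfd_pdd_detach_drm_file(struct kfd_process_device *pdd)
    {
        if (pdd->drm_file) {
            fput(pdd->drm_file);        /* dropping this may tear down the VM */
            pdd->drm_file = NULL;
        }
    }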

>
>> And we still need our own BO creation ioctl. Importing DMABufs adds
>> extra overhead (more system calls, file descriptors, GEM objects) and
>> doesn't cover some of our use cases (userptr). We also need our own idr
>> for managing buffer handles that are used with all our memory and VM
>> management functions.
>
> Yeah, well that is the second part of that idea.
>
> When you have the drm_file for the VM, you can also use it to call
> drm_gem_object_lookup().

KFD only has one device, so we need a buffer ID that's globally unique,
not per-device. Or we'd need to identify buffers by a pair of a GEM
handle and a device ID. Our BO handles are 64-bit, so we could pack both
into a single handle.
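
For example, a straightforward (purely hypothetical) encoding could keep the
gpu_id in the upper 32 bits and the per-device GEM handle in the lower 32 bits:

    #define MAKE_KFD_BO_HANDLE(gpu_id, gem_handle) \
        (((uint64_t)(gpu_id) << 32) | (uint32_t)(gem_handle))
    #define KFD_BO_HANDLE_GPU_ID(handle)    ((uint32_t)((handle) >> 32))
    #define KFD_BO_HANDLE_GEM(handle)       ((uint32_t)((handle) & 0xffffffffULL))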

Either way, that still requires a GEM object, which adds not just a
minor bit of overhead. It includes a struct file * which points to a
shared memory file that gets created for each BO in drm_gem_object_init.

>
> And when you need to iterate over all the BOs you can just use the
> object_idr or the VM structure as well.

object_idr only gives us BOs allocated from a particular device. The VM
structure has various BO lists, but they only contain BOs that were
added to the VM. The same BO can be added multiple times or to multiple
VMs, or it may not be in any VM at all.

In later patch series we also need to track additional information for
KFD buffers in KFD itself. At that point we change the KFD IDR to point
to our own structure, which in turn points to the BO. For example our
BOs get their VA assigned at allocation time, and we later do reverse
lookups from the address to the BO using an interval tree.
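
A rough sketch of that reverse lookup using the kernel's generic interval
tree (the structure and helper names are assumptions, not the actual code):

    #include <linux/interval_tree.h>

    struct kfd_bo_record {
        struct interval_tree_node it;   /* it.start/it.last span the GPU VA range */
        struct amdgpu_bo *bo;
    };

    static struct kfd_bo_record *kfd_lookup_bo_by_va(struct rb_root_cached *root,
                                                     unsigned long va)
    {
        struct interval_tree_node *node = interval_tree_iter_first(root, va, va);

        return node ? container_of(node, struct kfd_bo_record, it) : NULL;
    }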

Regards,
  Felix

>
> Regards,
> Christian.
>
>>
>> Regards,
>>    Felix
>>
>>
>>> Or do you mean a DMABuf FD exported by GEM? After importing the buffer
>>> and closing the FD, we still need a way to refer to the buffer for
>>> managing the GPU mappings and for releasing the reference. So we still
>>> need handles that are currently provided by our own IDR. I think we'll
>>> still need that.
>>>
>>> Finally, the kfd_ioctl_alloc_memory_of_gpu will later also be used for
>>> creating userptr BOs. Those can't be exported as DMABufs, so we still
>>> need our own ioctl for creating that type of BO. That was going to
>>> be my
>>> next patch series.
>>>
>>> Regards,
>>>    Felix
>>>
>


* Re: [PATCH 14/25] drm/amdkfd: Populate DRM render device minor
       [not found]                         ` <d5499f91-6ebf-94ae-f933-d57cd953e01d-5C7GfCeVMHo@public.gmane.org>
@ 2018-02-13 16:56                           ` Felix Kuehling
       [not found]                             ` <a776f882-2612-35a1-431b-2e939cd36f29-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: Felix Kuehling @ 2018-02-13 16:56 UTC (permalink / raw)
  To: Christian König, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Oak Zeng


On 2018-02-13 05:46 AM, Christian König wrote:
> Am 12.02.2018 um 17:57 schrieb Felix Kuehling:
>>> [SNIP]
>>> I see that as an advantage rather than a problem, because it fixes a
>>> couple of problems with the KFD where the address space of the inode
>>> is not managed correctly as far as I can see.
>> I don't think GEM is involved in the management of address space. That's
>> handled inside TTM and drm_vma_manager, both of which we are using. We
>> have tested CPU mappings with evictions and this is working correctly
>> now.
>
> Ok, correct. I was thinking about that as one large functionality.
> When you see it like this GEM basically only implements looking up BOs
> and reference counting them, but both are still useful features.
>
>> We had problems with this before, when we were CPU-mapping our buffers
>> through the KFD device FD.
>>
>>> Those implementation issues never caused problems until now because you
>>> never tried to unmap doorbells. But with the new eviction code that is
>>> about to change, isn't it?
>> I don't understand this comment. What does this have to do with
>> doorbells?
>
> I was assuming that doorbells are now unmapped to figure out when to
> start a process again.

Restart is done based on a timer.

>
> See, what I'm missing is how userspace figures out which address
> to use to map the doorbell? E.g. I don't see a call to
> drm_vma_offset_manager_init() or something like that.

Each process gets a whole page of the doorbell aperture assigned to it.
The assumption is that amdgpu only uses the first page of the doorbell
aperture, so KFD uses all the rest. On GFX8 and before, the queue ID is
used as the offset into the doorbell page. On GFX9 the hardware does
some engine-specific doorbell routing, so we added another layer of
doorbell management that's decoupled from the queue ID.

Either way, an entire doorbell page gets mapped into user mode and user
mode knows the offset of the doorbells for specific queues. The mapping
is currently handled by kfd_mmap in kfd_chardev.c. A later patch will
add the ability to map doorbells into GPUVM address space so that GPUs
can dispatch work to themselves (or each other) by creating a special
doorbell BO that can be mapped both to the GPU and the CPU. It works by
creating an amdgpu_bo with an SG describing the doorbell page of the
process.
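
From the user mode side, the CPU mapping part boils down to something like the
following sketch (parameter names are made up; the mmap offset is whatever KFD
reports for the process, and on GFX8 and older each doorbell is 32 bits wide):

    #include <stdint.h>
    #include <sys/mman.h>

    static volatile uint32_t *map_queue_doorbell(int kfd_fd, off_t doorbell_offset,
                                                 size_t doorbell_page_size,
                                                 unsigned int queue_id)
    {
        void *page = mmap(NULL, doorbell_page_size, PROT_READ | PROT_WRITE,
                          MAP_SHARED, kfd_fd, doorbell_offset);

        if (page == MAP_FAILED)
            return NULL;
        /* One 32-bit doorbell per queue ID within the process' doorbell page. */
        return (volatile uint32_t *)page + queue_id;
    }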

>
>> Doorbells are never unmapped. When queues are evicted doorbells remain
>> mapped to the process. User mode can continue writing to doorbells,
>> though they won't have any immediate effect while the corresponding
>> queues are unmapped from the HQDs.
>
> Do you simply assume that after evicting a process it always needs to
> be restarted without checking if it actually does something? Or how
> does that work?

Exactly. With the later addition of GPU self-dispatch, a page-fault based
mechanism wouldn't work any more. We have to restart the queues blindly
with a timer. See evict_process_worker, which schedules the restore with
a delayed worker.

The user mode queue ABI specifies that user mode updates both the
doorbell and a WPTR in memory. When we restart queues we (or the CP
firmware) use the WPTR to make sure we catch up with any work that was
submitted while the queues were unmapped.
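
A simplified sketch of that flow (member and constant names are assumptions,
not necessarily the exact ones used in the patches):

    static void evict_process_worker_sketch(struct work_struct *work)
    {
        struct kfd_process *p = container_of(work, struct kfd_process,
                                             eviction_work.work);

        /* ... unmap all of the process' queues from the HQDs here ... */

        /* Blindly schedule the restore; when the queues are remapped, the CP
         * catches up with anything submitted in the meantime via the WPTR. */
        schedule_delayed_work(&p->restore_work,
                              msecs_to_jiffies(PROCESS_RESTORE_TIME_MS));
    }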

Regards,
  Felix

>
> Regards,
> Christian.


* Re: [PATCH 14/25] drm/amdkfd: Populate DRM render device minor
       [not found]                                 ` <a2e39184-8db8-407d-6608-6ae211563459-5C7GfCeVMHo@public.gmane.org>
@ 2018-02-13 17:06                                   ` Christian König
       [not found]                                     ` <8424282f-d196-3cb6-9a6e-a26f8be7d198-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: Christian König @ 2018-02-13 17:06 UTC (permalink / raw)
  To: Felix Kuehling, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Oak Zeng

Am 13.02.2018 um 17:42 schrieb Felix Kuehling:
> On 2018-02-13 05:25 AM, Christian König wrote:
>> Am 13.02.2018 um 00:23 schrieb Felix Kuehling:
>>> On 2018-02-12 11:57 AM, Felix Kuehling wrote:
>>>>>> I just thought of a slightly different approach I would consider more
>>>>>> realistic, without having thought through all the details: Adding
>>>>>> KFD-specific memory management ioctls to the amdgpu device. Basically
>>>>>> call amdgpu_amdkfd_gpuvm functions from amdgpu ioctl functions
>>>>>> instead
>>>>>> of KFD ioctl functions. But we'd still have KFD ioctls for other
>>>>>> things,
>>>>>> and the new amdgpu ioctls would be KFD-specific and not useful for
>>>>>> graphics applications. It's probably still several weeks of work, but
>>>>>> shouldn't have major teething issues because the internal logic and
>>>>>> functionality would be basically unchanged. It would just move the
>>>>>> ioctls from one device to another.
>>>>> My thinking went into a similar direction. But instead of exposing the
>>>>> KFD IOCTLs through the DRM FD, I would let the KFD import a DRM FD.
>>>>>
>>>>> And then use the DRM FD in the KFD for things like the buffer provider
>>>>> of a device, e.g. no separate IDR for BOs in the KFD but rather a
>>>>> reference to the DRM FD.
>>>> I'm not sure I understand this. With "DRM FD" I think you mean the
>>>> device file descriptor. Then have KFD call file operation on the FD to
>>>> call AMDGPU ioctls? Is there any precedent for this that I can look at
>>>> for an example?
>>> OK, I think this could work for finding a VM from a DRM file descriptor:
>>>
>>>       struct file *filp = fget(fd);
>>>       struct drm_file *drm_priv = filp->private_data;
>>>       struct amdgpu_fpriv *drv_priv = drm_priv->driver_priv;
>>>       struct amdgpu_vm *vm = &drv_priv->vm;
>>>
>>> That would let us use the DRM VM instead of creating our own and would
>>> avoid wasting a perfectly good VM that gets created by opening the DRM
>>> device. We'd need a new ioctl to import VMs into KFD. But that's about
>>> as far as I would take it.
>>>
>>> We'd still need to add KFD-specific data to the VM. If we use an
>>> existing VM, we can't wrap it in our own structure any more. We'd need
>>> to come up with a different solution or extend struct amdgpu_vm with
>>> what we need.
>> Well feel free to extend the VM structure with a pointer for KFD data.
> Some more thoughts about that: Currently the lifetime of the VM is tied
> to the file descriptor. If we import it into KFD, we either have to make
> struct amdgpu_vm reference counted, or we have to keep a reference to
> the file descriptor in KFD just to keep the VM alive until we drop our
> reference to it.
>
>>> And we still need our own BO creation ioctl. Importing DMABufs adds
>>> extra overhead (more system calls, file descriptors, GEM objects) and
>>> doesn't cover some of our use cases (userptr). We also need our own idr
>>> for managing buffer handles that are used with all our memory and VM
>>> management functions.
>> Yeah, well that is the second part of that idea.
>>
>> When you have the drm_file for the VM, you can also use it to call
>> drm_gem_object_lookup().
> KFD only has one device, so we need a buffer ID that's globally unique,
> not per-device. Or we'd need to identify buffers by a pair of a GEM
> handle and a device ID. Our BO handles are 64-bit, so we could pack both
> into a single handle.

Ah, yeah, that is also a point I wanted to talk about with you.

The approach of using the same buffer object with multiple amdgpu 
devices doesn't work in general.

We need separate TTM object for each BO in each device or otherwise we 
break A+A laptops.

That is also the reason we had to disable this feature again in the 
hybrid branches.

So we need BO handles per device here or some other solution.

> Either way, that still requires a GEM object, which adds not just a
> minor bit of overhead. It includes a struct file * which points to a
> shared memory file that gets created for each BO in drm_gem_object_init.

WHOA, WHAT? Why the heck do we still do this?

Sorry, something went wrong here and yes, that shmem file is completely 
superfluous even for the gfx side.

>> And when you need to iterate over all the BOs you can just use the
>> object_idr or the VM structure as well.
> object_idr only gives us BOs allocated from a particular device. The VM
> structure has various BO lists, but they only contain BOs that were
> added to the VM. The same BO can be added multiple times or to multiple
> VMs, or it may not be in any VM at all.

Ok, yeah that probably isn't sufficient for the BO handling like you 
need it for eviction.

> In later patch series we also need to track additional information for
> KFD buffers in KFD itself. At that point we change the KFD IDR to point
> to our own structure, which in turn points to the BO. For example our
> BOs get their VA assigned at allocation time, and we later do reverse
> lookups from the address to the BO using an interval tree.

You can do this with the standard VM structure as well, that is needed 
for UVD and VCE anyway. See amdgpu_vm_bo_lookup_mapping.
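
For reference, a usage sketch (the wrapper is hypothetical; like the existing
callers, it passes the address in GPU page units and takes the BO from the
returned mapping):

    static struct amdgpu_bo *lookup_bo_by_gpu_va(struct amdgpu_vm *vm, uint64_t va)
    {
        struct amdgpu_bo_va_mapping *mapping;

        mapping = amdgpu_vm_bo_lookup_mapping(vm, va >> AMDGPU_GPU_PAGE_SHIFT);
        return mapping ? mapping->bo_va->base.bo : NULL;
    }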

Regards,
Christian.

* Re: [PATCH 14/25] drm/amdkfd: Populate DRM render device minor
       [not found]                                     ` <8424282f-d196-3cb6-9a6e-a26f8be7d198-5C7GfCeVMHo@public.gmane.org>
@ 2018-02-13 17:18                                       ` Felix Kuehling
       [not found]                                         ` <ebf4d6d7-2424-764f-0bc0-615240c82483-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: Felix Kuehling @ 2018-02-13 17:18 UTC (permalink / raw)
  To: Christian König, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Oak Zeng

On 2018-02-13 12:06 PM, Christian König wrote:
> Am 13.02.2018 um 17:42 schrieb Felix Kuehling:
>> On 2018-02-13 05:25 AM, Christian König wrote:
>>> Am 13.02.2018 um 00:23 schrieb Felix Kuehling:
>>>> On 2018-02-12 11:57 AM, Felix Kuehling wrote:
>>>>>>> I just thought of a slightly different approach I would consider
>>>>>>> more
>>>>>>> realistic, without having thought through all the details: Adding
>>>>>>> KFD-specific memory management ioctls to the amdgpu device.
>>>>>>> Basically
>>>>>>> call amdgpu_amdkfd_gpuvm functions from amdgpu ioctl functions
>>>>>>> instead
>>>>>>> of KFD ioctl functions. But we'd still have KFD ioctls for other
>>>>>>> things,
>>>>>>> and the new amdgpu ioctls would be KFD-specific and not useful for
>>>>>>> graphics applications. It's probably still several weeks of
>>>>>>> work, but
>>>>>>> shouldn't have major teething issues because the internal logic and
>>>>>>> functionality would be basically unchanged. It would just move the
>>>>>>> ioctls from one device to another.
>>>>>> My thinking went into a similar direction. But instead of
>>>>>> exposing the
>>>>>> KFD IOCTLs through the DRM FD, I would let the KFD import a DRM FD.
>>>>>>
>>>>>> And then use the DRM FD in the KFD for things like the buffer
>>>>>> provider
>>>>>> of a device, e.g. no separate IDR for BOs in the KFD but rather a
>>>>>> reference to the DRM FD.
>>>>> I'm not sure I understand this. With "DRM FD" I think you mean the
>>>>> device file descriptor. Then have KFD call file operation on the
>>>>> FD to
>>>>> call AMDGPU ioctls? Is there any precedent for this that I can
>>>>> look at
>>>>> for an example?
>>>> OK, I think this could work for finding a VM from a DRM file
>>>> descriptor:
>>>>
>>>>       struct file *filp = fget(fd);
>>>>       struct drm_file *drm_priv = filp->private_data;
>>>>       struct amdgpu_fpriv *drv_priv = drm_priv->driver_priv;
>>>>       struct amdgpu_vm *vm = &drv_priv->vm;
>>>>
>>>> That would let us use the DRM VM instead of creating our own and would
>>>> avoid wasting a perfectly good VM that gets created by opening the DRM
>>>> device. We'd need a new ioctl to import VMs into KFD. But that's about
>>>> as far as I would take it.
>>>>
>>>> We'd still need to add KFD-specific data to the VM. If we use an
>>>> existing VM, we can't wrap it in our own structure any more. We'd need
>>>> to come up with a different solution or extend struct amdgpu_vm with
>>>> what we need.
>>> Well feel free to extend the VM structure with a pointer for KFD data.
>> Some more thoughts about that: Currently the lifetime of the VM is tied
>> to the file descriptor. If we import it into KFD, we either have to make
>> struct amdgpu_vm reference counted, or we have to keep a reference to
>> the file descriptor in KFD just to keep the VM alive until we drop our
>> reference to it.
>>
>>>> And we still need our own BO creation ioctl. Importing DMABufs adds
>>>> extra overhead (more system calls, file descriptors, GEM objects) and
>>>> doesn't cover some of our use cases (userptr). We also need our own
>>>> idr
>>>> for managing buffer handles that are used with all our memory and VM
>>>> management functions.
>>> Yeah, well that is the second part of that idea.
>>>
>>> When you have the drm_file for the VM, you can also use it to call
>>> drm_gem_object_lookup().
>> KFD only has one device, so we need a buffer ID that's globally unique,
>> not per-device. Or we'd need to identify buffers by a pair of a GEM
>> handle and a device ID. Our BO handles are 64-bit, so we could pack both
>> into a single handle.
>
> Ah, yeah, that is also a point I wanted to talk about with you.
>
> The approach of using the same buffer object with multiple amdgpu
> devices doesn't work in general.
>
> We need separate TTM object for each BO in each device or otherwise we
> break A+A laptops.

I think it broke for VRAM BOs because we enabled P2P on systems that
didn't support it properly. But at least system memory BOs can be shared
quite well between devices and we do it all the time. I don't see how
you can have separate TTM objects referring to the same memory.

>
> That is also the reason we had to disable this feature again in the
> hybrid branches.

What you disabled on hybrid branches was P2P, which only affects
large-BAR systems. Sharing of system memory BOs between devices still
works fine.

>
> So we need BO handles per device here or some other solution.
>
>> Either way, that still requires a GEM object, which adds not just a
>> minor bit of overhead. It includes a struct file * which points to a
>> shared memory file that gets created for each BO in drm_gem_object_init.
>
> WHOA, WHAT? Why the heck do we still do this?
>
> Sorry, something went wrong here and yes, that shmem file is completely
> superfluous even for the gfx side.
>
>>> And when you need to iterate over all the BOs you can just use the
>>> object_idr or the VM structure as well.
>> object_idr only gives us BOs allocated from a particular device. The VM
>> structure has various BO lists, but they only contain BOs that were
>> added to the VM. The same BO can be added multiple times or to multiple
>> VMs, or it may not be in any VM at all.
>
> Ok, yeah that probably isn't sufficient for the BO handling like you
> need it for eviction.
>
>> In later patch series we also need to track additional information for
>> KFD buffers in KFD itself. At that point we change the KFD IDR to point
>> to our own structure, which in turn points to the BO. For example our
>> BOs get their VA assigned at allocation time, and we later do reverse
>> lookups from the address to the BO using an interval tree.
>
> You can do this with the standard VM structure as well, that is needed
> for UVD and VCE anyway. See amdgpu_vm_bo_lookup_mapping.

Then we need to know which VM to search. If all we have is a pointer,
we'd have to try potentially all VMs.

Regards,
  Felix

>
> Regards,
> Christian.


* Re: [PATCH 14/25] drm/amdkfd: Populate DRM render device minor
       [not found]                                         ` <ebf4d6d7-2424-764f-0bc0-615240c82483-5C7GfCeVMHo@public.gmane.org>
@ 2018-02-13 18:15                                           ` Christian König
       [not found]                                             ` <9f078a60-0cca-ba43-3e1c-c67c2b758988-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: Christian König @ 2018-02-13 18:15 UTC (permalink / raw)
  To: Felix Kuehling, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Oak Zeng

Am 13.02.2018 um 18:18 schrieb Felix Kuehling:
> On 2018-02-13 12:06 PM, Christian König wrote:
>> [SNIP]
>> Ah, yeah, that is also a point I wanted to talk about with you.
>>
>> The approach of using the same buffer object with multiple amdgpu
>> devices doesn't work in general.
>>
>> We need separate TTM object for each BO in each device or otherwise we
>> break A+A laptops.
> I think it broke for VRAM BOs because we enabled P2P on systems that
> didn't support it properly. But at least system memory BOs can be shared
> quite well between devices and we do it all the time.

Sharing VRAM BOs is one issue, but the problem goes deeper than just that.

Starting with Carrizo we can scan out from system memory to avoid the 
extra copy on A+A laptops. For this to work we need the BO mapped to 
GART (and I mean a real VMID0 mapping, not just in the GTT domain). And 
for this to work in turn we need a TTM object per device and not a 
global one.

> I don't see how you can have separate TTM objects referring to the same memory.

Well that is trivial, we do this all the time with prime and I+A laptops.

>> That is also the reason we had to disable this feature again in the
>> hybrid branches.
> What you disabled on hybrid branches was P2P, which only affects
> large-BAR systems. Sharing of system memory BOs between devices still
> works fine.

No, it doesn't. It completely breaks any scanout on Carrizo, Stoney and 
Raven. In addition to that, we found that it breaks some aspects of the 
user space interface.

So end result is that we probably need to revert it and find a different 
solution. I'm already working on this for a couple of weeks now and 
should have something ready after I'm done with the PASID handling.

Regards,
Christian.

* Re: [PATCH 14/25] drm/amdkfd: Populate DRM render device minor
       [not found]                             ` <a776f882-2612-35a1-431b-2e939cd36f29-5C7GfCeVMHo@public.gmane.org>
@ 2018-02-13 18:45                               ` Christian König
       [not found]                                 ` <6c9d2b9e-7ae9-099a-9d02-bc2a4985a95a-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: Christian König @ 2018-02-13 18:45 UTC (permalink / raw)
  To: Felix Kuehling, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Oak Zeng

Am 13.02.2018 um 17:56 schrieb Felix Kuehling:
> [SNIP]
> Each process gets a whole page of the doorbell aperture assigned to it.
> The assumption is that amdgpu only uses the first page of the doorbell
> aperture, so KFD uses all the rest. On GFX8 and before, the queue ID is
> used as the offset into the doorbell page. On GFX9 the hardware does
> some engine-specific doorbell routing, so we added another layer of
> doorbell management that's decoupled from the queue ID.
>
> Either way, an entire doorbell page gets mapped into user mode and user
> mode knows the offset of the doorbells for specific queues. The mapping
> is currently handled by kfd_mmap in kfd_chardev.c.

Ok, wait a second. Taking a look at kfd_doorbell_mmap() it almost looks 
like you map different doorbells with the same offset depending on which 
process is calling this.

Is that correct? If yes then that would be illegal and a problem if I'm 
not completely mistaken.

>> Do you simply assume that after evicting a process it always needs to
>> be restarted without checking if it actually does something? Or how
>> does that work?
> Exactly.

Ok, understood. Well that limits the usefulness of the whole eviction 
drastically.

> With the later addition of GPU self-dispatch, a page-fault based
> mechanism wouldn't work any more. We have to restart the queues blindly
> with a timer. See evict_process_worker, which schedules the restore with
> a delayed worker.
> The user mode queue ABI specifies that user mode updates both the
> doorbell and a WPTR in memory. When we restart queues we (or the CP
> firmware) use the WPTR to make sure we catch up with any work that was
> submitted while the queues were unmapped.

Putting cross-process work dispatch aside for a moment, GPU self-dispatch 
only works when there is already work running on the GPU.

So you can still check if there is any work pending after you unmapped 
everything and only restart the queues when there is new work based on 
the page fault.

In other words, either there is work pending and it doesn't matter whether it 
was sent by the GPU or by the CPU, or there is no work pending and we can 
delay restarting everything until there is.

Regards,
Christian.

>
> Regards,
>    Felix
>
>> Regards,
>> Christian.


* Re: [PATCH 14/25] drm/amdkfd: Populate DRM render device minor
       [not found]                                             ` <9f078a60-0cca-ba43-3e1c-c67c2b758988-5C7GfCeVMHo@public.gmane.org>
@ 2018-02-13 19:18                                               ` Felix Kuehling
       [not found]                                                 ` <c991529e-2489-169c-cc34-96ed5bb94a12-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: Felix Kuehling @ 2018-02-13 19:18 UTC (permalink / raw)
  To: Christian König, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Oak Zeng

On 2018-02-13 01:15 PM, Christian König wrote:
> Am 13.02.2018 um 18:18 schrieb Felix Kuehling:
>> On 2018-02-13 12:06 PM, Christian König wrote:
>>> [SNIP]
>>> Ah, yeah, that is also a point I wanted to talk about with you.
>>>
>>> The approach of using the same buffer object with multiple amdgpu
>>> devices doesn't work in general.
>>>
>>> We need separate TTM object for each BO in each device or otherwise we
>>> break A+A laptops.
>> I think it broke for VRAM BOs because we enabled P2P on systems that
>> didn't support it properly. But at least system memory BOs can be shared
>> quite well between devices and we do it all the time.
>
> Sharing VRAM BOs is one issue, but the problem goes deeper than just
> that.
>
> Starting with Carrizo we can scan out from system memory to avoid the
> extra copy on A+A laptops. For this to work we need the BO mapped to
> GART (and I mean a real VMID0 mapping, not just in the GTT domain).
> And for this to work in turn we need a TTM object per device and not a
> global one.

I still don't understand. I think what you're talking about applies only
to BOs used for scan-out. Every BO is allocated from a specific device
and can only be GART-mapped on that device. What we do is map the same
BO in VMs on other devices. It has no effect on GART mappings.

>
>> I don't see how you can have separate TTM objects referring to the
>> same memory.
>
> Well that is trivial, we do this all the time with prime and I+A laptops.

As I understand it, you use DMABuf to export/import buffers on multiple
devices. I believe all devices share a single amdgpu_bo, which contains
the ttm_buffer_object. The only way you can have a new TTM buffer object
per device is by using SG tables and pinning the BOs. But I think we
want to avoid pinning BOs.

What we do does not involve pinning of BOs, even when they're shared
between multiple devices' VMs.

>
>>> That is also the reason we had to disable this feature again in the
>>> hybrid branches.
>> What you disabled on hybrid branches was P2P, which only affects
>> large-BAR systems. Sharing of system memory BOs between devices still
>> works fine.
>
> No, it doesn't. It completely breaks any scanout on Carrizo, Stoney
> and Raven. In addition to that, we found that it breaks some aspects of
> the user space interface.

Let me check that with my current patch series. The patches I submitted
here shouldn't include anything that breaks the use cases you describe.
But I'm quite sure it will support sharing BOs between multiple devices'
VMs.

Regards,
  Felix

>
> So end result is that we probably need to revert it and find a
> different solution. I'm already working on this for a couple of weeks
> now and should have something ready after I'm done with the PASID
> handling.
>
> Regards,
> Christian.


* Re: [PATCH 14/25] drm/amdkfd: Populate DRM render device minor
       [not found]                                 ` <6c9d2b9e-7ae9-099a-9d02-bc2a4985a95a-5C7GfCeVMHo@public.gmane.org>
@ 2018-02-13 19:22                                   ` Felix Kuehling
       [not found]                                     ` <b6af300e-be33-b802-385a-20980d95545d-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: Felix Kuehling @ 2018-02-13 19:22 UTC (permalink / raw)
  To: Christian König, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Oak Zeng

On 2018-02-13 01:45 PM, Christian König wrote:
> Am 13.02.2018 um 17:56 schrieb Felix Kuehling:
>> [SNIP]
>> Each process gets a whole page of the doorbell aperture assigned to it.
>> The assumption is that amdgpu only uses the first page of the doorbell
>> aperture, so KFD uses all the rest. On GFX8 and before, the queue ID is
>> used as the offset into the doorbell page. On GFX9 the hardware does
>> some engine-specific doorbell routing, so we added another layer of
>> doorbell management that's decoupled from the queue ID.
>>
>> Either way, an entire doorbell page gets mapped into user mode and user
>> mode knows the offset of the doorbells for specific queues. The mapping
>> is currently handled by kfd_mmap in kfd_chardev.c.
>
> Ok, wait a second. Taking a look at kfd_doorbell_mmap() it almost
> looks like you map different doorbells with the same offset depending
> on which process is calling this.
>
> Is that correct? If yes then that would be illegal and a problem if
> I'm not completely mistaken.

Why is that a problem? Each process has its own file descriptor. The
mapping is done using io_remap_pfn_range in kfd_doorbell_mmap. This is
nothing new. It's been done like this forever even on Kaveri and Carrizo.
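
For context, the mapping itself is conceptually nothing more than the
following sketch (field names are simplified assumptions; the real code is in
kfd_doorbell_mmap). Because each process owns its own physical doorbell page,
the same mmap offset can legitimately map to different pages for different
processes:

    static int kfd_doorbell_mmap_sketch(struct kfd_dev *dev,
                                        struct kfd_process *process,
                                        struct vm_area_struct *vma)
    {
        phys_addr_t addr = dev->doorbell_base +
                           process->doorbell_index * PAGE_SIZE;  /* assumed field */

        vma->vm_flags |= VM_IO | VM_DONTCOPY | VM_DONTEXPAND | VM_NORESERVE;
        vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);

        return io_remap_pfn_range(vma, vma->vm_start, addr >> PAGE_SHIFT,
                                  PAGE_SIZE, vma->vm_page_prot);
    }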

>
>>> Do you simply assume that after evicting a process it always needs to
>>> be restarted without checking if it actually does something? Or how
>>> does that work?
>> Exactly.
>
> Ok, understood. Well that limits the usefulness of the whole eviction
> drastically.
>
>> With the later addition of GPU self-dispatch, a page-fault based
>> mechanism wouldn't work any more. We have to restart the queues blindly
>> with a timer. See evict_process_worker, which schedules the restore with
>> a delayed worker.
>> The user mode queue ABI specifies that user mode updates both the
>> doorbell and a WPTR in memory. When we restart queues we (or the CP
>> firmware) use the WPTR to make sure we catch up with any work that was
>> submitted while the queues were unmapped.
>
> Putting cross-process work dispatch aside for a moment, GPU
> self-dispatch only works when there is already work running on the GPU.
>
> So you can still check if there is any work pending after you
> unmapped everything and only restart the queues when there is new work
> based on the page fault.
>
> In other words, either there is work pending and it doesn't matter whether
> it was sent by the GPU or by the CPU, or there is no work pending and
> we can delay restarting everything until there is.

That sounds like a useful optimization.

Regards,
  Felix

>
> Regards,
> Christian.
>
>>
>> Regards,
>>    Felix
>>
>>> Regards,
>>> Christian.
>


* Re: [PATCH 14/25] drm/amdkfd: Populate DRM render device minor
       [not found]                                                 ` <c991529e-2489-169c-cc34-96ed5bb94a12-5C7GfCeVMHo@public.gmane.org>
@ 2018-02-13 23:17                                                   ` Felix Kuehling
       [not found]                                                     ` <b28b0e4c-f16b-0751-7957-45196c26da82-5C7GfCeVMHo@public.gmane.org>
  2018-02-14  8:50                                                   ` Michel Dänzer
  1 sibling, 1 reply; 71+ messages in thread
From: Felix Kuehling @ 2018-02-13 23:17 UTC (permalink / raw)
  To: Christian König, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Oak Zeng

On 2018-02-13 02:18 PM, Felix Kuehling wrote:
> On 2018-02-13 01:15 PM, Christian König wrote:
>> Am 13.02.2018 um 18:18 schrieb Felix Kuehling:
>>> On 2018-02-13 12:06 PM, Christian König wrote:
>>>> [SNIP]
>>>> Ah, yeah, that is also a point I wanted to talk about with you.
>>>>
>>>> The approach of using the same buffer object with multiple amdgpu
>>>> devices doesn't work in general.
>>>>
>>>> We need separate TTM object for each BO in each device or otherwise we
>>>> break A+A laptops.
>>> I think it broke for VRAM BOs because we enabled P2P on systems that
>>> didn't support it properly. But at least system memory BOs can be shared
>>> quite well between devices and we do it all the time.
>> Sharing VRAM BOs is one issue, but the problem goes deeper than just
>> that.
>>
>> Starting with Carrizo we can scan out from system memory to avoid the
>> extra copy on A+A laptops. For this to work we need the BO mapped to
>> GART (and I mean a real VMID0 mapping, not just in the GTT domain).
>> And for this to work in turn we need a TTM object per device and not a
>> global one.
> I still don't understand. I think what you're talking about applies only
> to BOs used for scan-out. Every BO is allocated from a specific device
> and can only be GART-mapped on that device. What we do is map the same
> BO in VMs on other devices. It has no effect on GART mappings.
>
>>> I don't see how you can have separate TTM objects referring to the
>>> same memory.
>> Well that is trivial, we do this all the time with prime and I+A laptops.
> As I understand it, you use DMABuf to export/import buffers on multiple
> devices. I believe all devices share a single amdgpu_bo, which contains
> the ttm_buffer_object. The only way you can have a new TTM buffer object
> per device is by using SG tables and pinning the BOs. But I think we
> want to avoid pinning BOs.
>
> What we do does not involve pinning of BOs, even when they're shared
> between multiple devices' VMs.
>
>>>> That is also the reason we had to disable this feature again in the
>>>> hybrid branches.
>>> What you disabled on hybrid branches was P2P, which only affects
>>> large-BAR systems. Sharing of system memory BOs between devices still
>>> works fine.
>> No, it doesn't. It completely breaks any scanout on Carrizo, Stoney
>> and Raven. In addition to that, we found that it breaks some aspects of
>> the user space interface.
> Let me check that with my current patch series. The patches I submitted
> here shouldn't include anything that breaks the use cases you describe.
> But I'm quite sure it will support sharing BOs between multiple devices'
> VMs.

Confirmed with the current patch series. I can allocate buffers on GPU 1
and map them into a VM on GPU 2.

BTW, this is the normal mode of operation for system memory BOs on
multi-GPU systems. Logically system memory BOs belong to the CPU node in
the KFD topology. So the Thunk API isn't told which GPU to allocate the
system memory BO from. It just picks the first one. Then we can map
those BOs on any GPUs we want. If we want to use them on GPU2, that's
where they get mapped.

So I just put two GPUs in a system and ran a test on GPU2. The system
memory buffers are allocated from the GPU1 device, but mapped into the
GPU2 VM. The tests work normally.

If this is enabled by any changes that break existing buffer sharing for
A+A or A+I systems, please point it out to me. I'm not aware that this
patch series does anything to that effect.

Regards,
  Felix

>
> Regards,
>   Felix
>
>> So end result is that we probably need to revert it and find a
>> different solution. I'm already working on this for a couple of weeks
>> now and should have something ready after I'm done with the PASID
>> handling.
>>
>> Regards,
>> Christian.


* Re: [PATCH 14/25] drm/amdkfd: Populate DRM render device minor
       [not found]                                                     ` <b28b0e4c-f16b-0751-7957-45196c26da82-5C7GfCeVMHo@public.gmane.org>
@ 2018-02-14  7:42                                                       ` Christian König
       [not found]                                                         ` <ca973468-f1af-b510-a6db-af29e279f5ca-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: Christian König @ 2018-02-14  7:42 UTC (permalink / raw)
  To: Felix Kuehling, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Oak Zeng

Am 14.02.2018 um 00:17 schrieb Felix Kuehling:
> On 2018-02-13 02:18 PM, Felix Kuehling wrote:
>> On 2018-02-13 01:15 PM, Christian König wrote:
>>> Am 13.02.2018 um 18:18 schrieb Felix Kuehling:
>>>> On 2018-02-13 12:06 PM, Christian König wrote:
>>>>> [SNIP]
>>>>> Ah, yeah, that is also a point I wanted to talk about with you.
>>>>>
>>>>> The approach of using the same buffer object with multiple amdgpu
>>>>> devices doesn't work in general.
>>>>>
>>>>> We need separate TTM object for each BO in each device or otherwise we
>>>>> break A+A laptops.
>>>> I think it broke for VRAM BOs because we enabled P2P on systems that
>>>> didn't support it properly. But at least system memory BOs can be shared
>>>> quite well between devices and we do it all the time.
>>> Sharing VRAM BOs is one issue, but the problem goes deeper than just
>>> that.
>>>
>>> Starting with Carrizo we can scan out from system memory to avoid the
>>> extra copy on A+A laptops. For this to work we need the BO mapped to
>>> GART (and I mean a real VMID0 mapping, not just in the GTT domain).
>>> And for this to work in turn we need a TTM object per device and not a
>>> global one.
>> I still don't understand. I think what you're talking about applies only
>> to BOs used for scan-out. Every BO is allocated from a specific device
>> and can only be GART-mapped on that device.

Exactly that assumption is incorrect. BOs can be GART mapped into any 
device.

>>   What we do is map the same
>> BO in VMs on other devices. It has no effect on GART mappings.

Correct, VM mapping is unaffected here.

>>>> I don't see how you can have separate TTM objects referring to the
>>>> same memory.
>>> Well that is trivial, we do this all the time with prime and I+A laptops.
>> As I understand it, you use DMABuf to export/import buffers on multiple
>> devices. I believe all devices share a single amdgpu_bo, which contains
>> the ttm_buffer_object.

That's incorrect as well. Normally multiple devices have multiple 
ttm_buffer_objects, one for each device.

Going a bit higher, that actually makes sense because the status of each 
BO is different for each device. E.g. one device could have the BO in 
active use while it is idle on another device.

The same is true for the placement of the BO. E.g. a VRAM placement on one 
device is actually a GTT placement for another.

>>   The only way you can have a new TTM buffer object
>> per device is by using SG tables and pinning the BOs. But I think we
>> want to avoid pinning BOs.
>>
>> What we do does not involve pinning of BOs, even when they're shared
>> between multiple devices' VMs.
>>
>>>>> That is also the reason we had to disable this feature again in the
>>>>> hybrid branches.
>>>> What you disabled on hybrid branches was P2P, which only affects
>>>> large-BAR systems. Sharing of system memory BOs between devices still
>>>> works fine.
>>> No, it doesn't. It completely breaks any scanout on Carrizo, Stoney
>>> and Raven. Additionally, we found that it breaks some aspects of
>>> the user space interface.
>> Let me check that with my current patch series. The patches I submitted
>> here shouldn't include anything that breaks the use cases you describe.
>> But I'm quite sure it will support sharing BOs between multiple devices'
>> VMs.
> Confirmed with the current patch series. I can allocate buffers on GPU 1
> and map them into a VM on GPU 2.

As I said VM mapping is not the problem here.

> BTW, this is the normal mode of operation for system memory BOs on
> multi-GPU systems. Logically system memory BOs belong to the CPU node in
> the KFD topology. So the Thunk API isn't told which GPU to allocate the
> system memory BO from. It just picks the first one. Then we can map
> those BOs on any GPUs we want. If we want to use them on GPU2, that's
> where they get mapped.

Well, keeping NUMA in mind, that actually sounds like a design problem to me.

On a NUMA system some parts of the system memory can be "closer" to a GPU 
than other parts.

> So I just put two GPUs in a system and ran a test on GPU2. The system
> memory buffers are allocated from the GPU1 device, but mapped into the
> GPU2 VM. The tests work normally.
>
> If this is enabled by any changes that break existing buffer sharing for
> A+A or A+I systems, please point it out to me. I'm not aware that this
> patch series does anything to that effect.

As I said it completely breaks scanout with A+I systems.

Overall it looks like the change causes more problems than it solves. 
So I'm going to block upstreaming it until we have found a way to make 
it work for everybody.

Regards,
Christian.

>
> Regards,
>    Felix
>
>> Regards,
>>    Felix
>>
>>> So end result is that we probably need to revert it and find a
>>> different solution. I'm already working on this for a couple of weeks
>>> now and should have something ready after I'm done with the PASID
>>> handling.
>>>
>>> Regards,
>>> Christian.

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 14/25] drm/amdkfd: Populate DRM render device minor
       [not found]                                     ` <b6af300e-be33-b802-385a-20980d95545d-5C7GfCeVMHo@public.gmane.org>
@ 2018-02-14  7:49                                       ` Christian König
  0 siblings, 0 replies; 71+ messages in thread
From: Christian König @ 2018-02-14  7:49 UTC (permalink / raw)
  To: Felix Kuehling, Christian König,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Oak Zeng

Am 13.02.2018 um 20:22 schrieb Felix Kuehling:
> On 2018-02-13 01:45 PM, Christian König wrote:
>> Am 13.02.2018 um 17:56 schrieb Felix Kuehling:
>>> [SNIP]
>>> Each process gets a whole page of the doorbell aperture assigned to it.
>>> The assumption is that amdgpu only uses the first page of the doorbell
>>> aperture, so KFD uses all the rest. On GFX8 and before, the queue ID is
>>> used as the offset into the doorbell page. On GFX9 the hardware does
>>> some engine-specific doorbell routing, so we added another layer of
>>> doorbell management that's decoupled from the queue ID.
>>>
>>> Either way, an entire doorbell page gets mapped into user mode and user
>>> mode knows the offset of the doorbells for specific queues. The mapping
>>> is currently handled by kfd_mmap in kfd_chardev.c.
>> Ok, wait a second. Taking a look at kfd_doorbell_mmap() it almost
>> looks like you map different doorbells with the same offset depending
>> on which process is calling this.
>>
>> Is that correct? If yes then that would be illegal and a problem if
>> I'm not completely mistaken.
> Why is that a problem? Each process has its own file descriptor. The
> mapping is done using io_remap_pfn_range in kfd_doorbell_mmap.

Yeah, but they all share the same file address space from the inode, don't 
they?

E.g. imagine that you map an offset from a normal file into process A 
and the same offset from the same file into process B, what do you get? 
Exactly, the same page mapped into both processes.

Now take a look at the KFD interface, you map the same offset from the 
same file (or rather inode) into two different processes and get two 
different things.
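
For illustration, a minimal sketch of the mmap pattern being discussed. It 
is not the actual kfd_doorbell_mmap(); kfd_get_process_doorbell_pfn() is a 
placeholder for however the per-process doorbell page is looked up:

static int doorbell_mmap_sketch(struct kfd_dev *dev,
				struct kfd_process *process,
				struct vm_area_struct *vma)
{
	unsigned long pfn;

	if (vma->vm_end - vma->vm_start > PAGE_SIZE)
		return -EINVAL;

	/* The physical page is chosen per (dev, process), independent of
	 * vma->vm_pgoff, so two processes mapping the same inode offset
	 * end up with different pages -- which is what confuses the
	 * reverse mapping.
	 */
	pfn = kfd_get_process_doorbell_pfn(dev, process);

	vma->vm_flags |= VM_IO | VM_DONTCOPY | VM_DONTEXPAND |
			 VM_NORESERVE | VM_DONTDUMP | VM_PFNMAP;
	vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);

	return io_remap_pfn_range(vma, vma->vm_start, pfn,
				  vma->vm_end - vma->vm_start,
				  vma->vm_page_prot);
}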

> This is nothing new. It's been done like this forever even on Kaveri and Carrizo.

It most likely doesn't cause any problems off hand, or otherwise we would 
have noticed already, but at a bare minimum it totally confuses the 
reverse mapping code.

I need to think about whether that is really an issue or not, and if so, 
how to fix/mitigate it.

Regards,
Christian.

>
>>>> Do you simply assume that after evicting a process it always needs to
>>>> be restarted without checking if it actually does something? Or how
>>>> does that work?
>>> Exactly.
>> Ok, understood. Well that limits the usefulness of the whole eviction
>> drastically.
>>
>>> With later addition of GPU self-dispatch a page-fault based
>>> mechanism wouldn't work any more. We have to restart the queues blindly
>>> with a timer. See evict_process_worker, which schedules the restore with
>>> a delayed worker.
>>> The user mode queue ABI specifies that user mode updates both the
>>> doorbell and a WPTR in memory. When we restart queues we (or the CP
>>> firmware) use the WPTR to make sure we catch up with any work that was
>>> submitted while the queues were unmapped.
>> Putting cross-process work dispatch aside for a moment, GPU
>> self-dispatch only works when there is work running on the GPU.
>>
>> So you can still check if there is some work pending after you
>> unmapped everything, and only restart the queues when there is new
>> work, based on the page fault.
>>
>> In other words, either there is work pending and it doesn't matter if
>> it was sent by the GPU or by the CPU, or there is no work pending and
>> we can delay restarting everything until there is.
> That sounds like a useful optimization.
>
> Regards,
>    Felix
>
>> Regards,
>> Christian.
>>
>>> Regards,
>>>     Felix
>>>
>>>> Regards,
>>>> Christian.
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 14/25] drm/amdkfd: Populate DRM render device minor
       [not found]                                                 ` <c991529e-2489-169c-cc34-96ed5bb94a12-5C7GfCeVMHo@public.gmane.org>
  2018-02-13 23:17                                                   ` Felix Kuehling
@ 2018-02-14  8:50                                                   ` Michel Dänzer
       [not found]                                                     ` <255915a6-f101-554a-9087-1fd792ee1de3-otUistvHUpPR7s880joybQ@public.gmane.org>
  1 sibling, 1 reply; 71+ messages in thread
From: Michel Dänzer @ 2018-02-14  8:50 UTC (permalink / raw)
  To: Felix Kuehling, Christian König, oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Oak Zeng, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On 2018-02-13 08:18 PM, Felix Kuehling wrote:
> On 2018-02-13 01:15 PM, Christian König wrote:
>> Am 13.02.2018 um 18:18 schrieb Felix Kuehling:
>>
>>> I don't see how you can have separate TTM objects referring to the
>>> same memory.
>>
>> Well that is trivial, we do this all the time with prime and I+A laptops.
> 
> As I understand it, you use DMABuf to export/import buffers on multiple
> devices. I believe all devices share a single amdgpu_bo, which contains
> the ttm_buffer_object.

The dma-buf exporter and importer can be different drivers, so this is
not possible.


-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 14/25] drm/amdkfd: Populate DRM render device minor
       [not found]                                                         ` <ca973468-f1af-b510-a6db-af29e279f5ca-5C7GfCeVMHo@public.gmane.org>
@ 2018-02-14 16:35                                                           ` Felix Kuehling
       [not found]                                                             ` <50befcd2-6a1a-534e-1699-8556c4977b76-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: Felix Kuehling @ 2018-02-14 16:35 UTC (permalink / raw)
  To: Christian König, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Oak Zeng



On 2018-02-14 02:42 AM, Christian König wrote:
> Am 14.02.2018 um 00:17 schrieb Felix Kuehling:
>> On 2018-02-13 02:18 PM, Felix Kuehling wrote:
>>> On 2018-02-13 01:15 PM, Christian König wrote:
>>>> Am 13.02.2018 um 18:18 schrieb Felix Kuehling:
>>>>> On 2018-02-13 12:06 PM, Christian König wrote:
>>>>>> [SNIP]
>>>>>> Ah, yeah that is also a point I wanted to to talk about with you.
>>>>>>
>>>>>> The approach of using the same buffer object with multiple amdgpu
>>>>>> devices doesn't work in general.
>>>>>>
>>>>>> We need separate TTM object for each BO in each device or
>>>>>> otherwise we
>>>>>> break A+A laptops.
>>>>> I think it broke for VRAM BOs because we enabled P2P on systems that
>>>>> didn't support it properly. But at least system memory BOs can be
>>>>> shared
>>>>> quite well between devices and we do it all the time.
>>>> Sharing VRAM BOs is one issue, but the problem goes deeper than just
>>>> that.
>>>>
>>>> Starting with Carizzo we can scanout from system memory to avoid the
>>>> extra copy on A+A laptops. For this to work we need the BO mapped to
>>>> GART (and I mean a real VMID0 mapping, not just in the GTT domain).
>>>> And for this to work in turn we need a TTM object per device and not a
>>>> global one.
>>> I still don't understand. I think what you're talking about applies
>>> only
>>> to BOs used for scan-out. Every BO is allocated from a specific device
>>> and can only be GART-mapped on that device.
>
> Exactly that assumption is incorrect. BOs can be GART mapped into any
> device.

Fine. My point is, we're not doing that.

>
>>>   What we do is map the same
>>> BO in VMs on other devices. It has no effect on GART mappings.
>
> Correct VM mapping is unaffected here.

Great.

>
>>>>> I don't see how you can have separate TTM objects referring to the
>>>>> same memory.
>>>> Well that is trivial, we do this all the time with prime and I+A
>>>> laptops.
>>> As I understand it, you use DMABuf to export/import buffers on multiple
>>> devices. I believe all devices share a single amdgpu_bo, which contains
>>> the ttm_buffer_object.
>
> That's incorrect as well. Normally multiple devices have multiple
> ttm_buffer_object, one for each device.
> Going a bit higher that actually makes sense because the status of
> each BO is deferent for each device. E.g. one device could have the BO
> in access while it could be idle on another device.

Can you point me to where this is done? I'm looking at
amdgpu_gem_prime_foreign_bo. It is used if an AMDGPU BO is imported into
a different AMDGPU device. It creates a new GEM object, with a reference
to the same amdgpu BO (gobj->bo = amdgpu_bo_ref(bo)). To me this looks
very much like the same amdgpu_bo, and consequently the same TTM BO being
shared by two GEM objects and two devices.
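
A rough sketch of the pattern described above (the wrapper struct is
invented for illustration and is not the real hybrid-branch
amdgpu_gem_object definition):

/* Hedged sketch: wrap the existing amdgpu_bo in a second GEM object for
 * the importing device instead of creating a second TTM BO.
 */
struct foreign_gem_wrapper {
	struct drm_gem_object base;	/* new GEM object for the importer */
	struct amdgpu_bo *bo;		/* the shared amdgpu_bo / TTM BO */
};

static struct drm_gem_object *
foreign_bo_sketch(struct drm_device *importer, struct amdgpu_bo *bo)
{
	struct foreign_gem_wrapper *gobj;

	gobj = kzalloc(sizeof(*gobj), GFP_KERNEL);
	if (!gobj)
		return ERR_PTR(-ENOMEM);

	drm_gem_private_object_init(importer, &gobj->base,
				    amdgpu_bo_size(bo));
	gobj->bo = amdgpu_bo_ref(bo);	/* same TTM BO, now visible to two devices */

	return &gobj->base;
}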

>
> Same is true for placement of the BO. E.g. a VRAM placement of one
> device is actually a GTT placement for another.
>
>>>   The only way you can have a new TTM buffer object
>>> per device is by using SG tables and pinning the BOs. But I think we
>>> want to avoid pinning BOs.
>>>
>>> What we do does not involve pinning of BOs, even when they're shared
>>> between multiple devices' VMs.
>>>
>>>>>> That is also the reason we had to disable this feature again in the
>>>>>> hybrid branches.
>>>>> What you disabled on hybrid branches was P2P, which only affects
>>>>> large-BAR systems. Sharing of system memory BOs between devices still
>>>>> works fine.
>>>> No, it doesn't. It completely breaks any scanout on Carizzo, Stoney
>>>> and Raven. Additional to that we found that it breaks some aspects of
>>>> the user space interface.
>>> Let me check that with my current patch series. The patches I submitted
>>> here shouldn't include anything that breaks the use cases you describe.
>>> But I'm quite sure it will support sharing BOs between multiple
>>> devices'
>>> VMs.
>> Confirmed with the current patch series. I can allocate buffers on GPU 1
>> and map them into a VM on GPU 2.
>
> As I said VM mapping is not the problem here.

Great.

>
>> BTW, this is the normal mode of operation for system memory BOs on
>> multi-GPU systems. Logically system memory BOs belong to the CPU node in
>> the KFD topology. So the Thunk API isn't told which GPU to allocate the
>> system memory BO from. It just picks the first one. Then we can map
>> those BOs on any GPUs we want. If we want to use them on GPU2, that's
>> where they get mapped.
>
> Well keeping NUMA in mind that actually sounds like a design problem
> to me.

We're not doing NUMA with TTM because TTM is not NUMA aware. We have
some prototype NUMA code in the Thunk that uses userptr to map memory
allocated with NUMA awareness to the GPU.
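
Roughly, that kind of approach looks like the following user space sketch,
assuming the libnuma and libhsakmt calls shown (illustrative only, not the
actual Thunk prototype code):

#include <numa.h>
#include <hsakmt.h>

/* Allocate memory on a specific NUMA node and hand it to the GPU as a
 * userptr-backed buffer.
 */
static void *alloc_near_gpu(size_t size, int numa_node)
{
	HSAuint64 gpu_va;
	void *buf = numa_alloc_onnode(size, numa_node);

	if (!buf)
		return NULL;

	/* Register the CPU pages with KFD and map them into the GPU VM */
	if (hsaKmtRegisterMemory(buf, size) != HSAKMT_STATUS_SUCCESS ||
	    hsaKmtMapMemoryToGPU(buf, size, &gpu_va) != HSAKMT_STATUS_SUCCESS) {
		numa_free(buf, size);
		return NULL;
	}

	return buf;
}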

>
> On NUMA system some parts of the system memory can be "closer" to a
> GPU than other parts.

Yes, I understand that.

>
>> So I just put two GPUs in a system and ran a test on GPU2. The system
>> memory buffers are allocated from the GPU1 device, but mapped into the
>> GPU2 VM. The tests work normally.
>>
>> If this is enabled by any changes that break existing buffer sharing for
>> A+A or A+I systems, please point it out to me. I'm not aware that this
>> patch series does anything to that effect.
>
> As I said it completely breaks scanout with A+I systems.

Please tell me what "it" is. What in the changes I have posted is
breaking A+I systems? I don't see it.

>
> Over all it looks like the change causes more problems than it solves.
> So I'm going to block upstreaming it until we have found a way to make
> it work for everybody.

Again, I don't know what "it" is.

Regards,
  Felix

>
> Regards,
> Christian.
>
>>
>> Regards,
>>    Felix
>>
>>> Regards,
>>>    Felix
>>>
>>>> So end result is that we probably need to revert it and find a
>>>> different solution. I'm already working on this for a couple of weeks
>>>> now and should have something ready after I'm done with the PASID
>>>> handling.
>>>>
>>>> Regards,
>>>> Christian.
>

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 14/25] drm/amdkfd: Populate DRM render device minor
       [not found]                                                     ` <255915a6-f101-554a-9087-1fd792ee1de3-otUistvHUpPR7s880joybQ@public.gmane.org>
@ 2018-02-14 16:39                                                       ` Felix Kuehling
  0 siblings, 0 replies; 71+ messages in thread
From: Felix Kuehling @ 2018-02-14 16:39 UTC (permalink / raw)
  To: Michel Dänzer, Christian König,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Oak Zeng, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW


On 2018-02-14 03:50 AM, Michel Dänzer wrote:
> On 2018-02-13 08:18 PM, Felix Kuehling wrote:
>> On 2018-02-13 01:15 PM, Christian König wrote:
>>> Am 13.02.2018 um 18:18 schrieb Felix Kuehling:
>>>
>>>> I don't see how you can have separate TTM objects referring to the
>>>> same memory.
>>> Well that is trivial, we do this all the time with prime and I+A laptops.
>> As I understand it, you use DMABuf to export/import buffers on multiple
>> devices. I believe all devices share a single amdgpu_bo, which contains
>> the ttm_buffer_object.
> The dma-buf exporter and importer can be different drivers, so this is
> not possible.

Yes. In the general case this is handled by SG tables. However, GEM and
AMDGPU have some special cases for importing buffers from the same driver.

The discussion here is about sharing BOs between different AMDGPU
devices (with or without involving DMABufs). I'm not talking about
sharing BOs with different drivers.
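
For reference, the general cross-driver path mentioned above boils down to
a dma-buf attach/map returning an sg_table; a minimal sketch (error
handling trimmed):

static struct sg_table *import_via_sg_sketch(struct device *importer_dev,
					     struct dma_buf *dma_buf)
{
	struct dma_buf_attachment *attach;
	struct sg_table *sgt;

	attach = dma_buf_attach(dma_buf, importer_dev);
	if (IS_ERR(attach))
		return ERR_CAST(attach);

	/* The importer works on scatter/gather pages rather than the
	 * exporter's TTM BO; with the current helpers this also implies
	 * pinning the exporter's pages.
	 */
	sgt = dma_buf_map_attachment(attach, DMA_BIDIRECTIONAL);
	if (IS_ERR(sgt))
		dma_buf_detach(dma_buf, attach);

	return sgt;
}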

Regards
  Felix

>
>

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 14/25] drm/amdkfd: Populate DRM render device minor
       [not found]                                                             ` <50befcd2-6a1a-534e-1699-8556c4977b76-5C7GfCeVMHo@public.gmane.org>
@ 2018-02-14 16:50                                                               ` Michel Dänzer
       [not found]                                                                 ` <ef95f1a4-7b30-47c8-8f33-4e6379d0694b-otUistvHUpPR7s880joybQ@public.gmane.org>
  2018-02-14 18:15                                                               ` Christian König
  1 sibling, 1 reply; 71+ messages in thread
From: Michel Dänzer @ 2018-02-14 16:50 UTC (permalink / raw)
  To: Felix Kuehling, Christian König, oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Oak Zeng, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On 2018-02-14 05:35 PM, Felix Kuehling wrote:
> On 2018-02-14 02:42 AM, Christian König wrote:
>> Am 14.02.2018 um 00:17 schrieb Felix Kuehling:
>>> On 2018-02-13 02:18 PM, Felix Kuehling wrote:
>>>> On 2018-02-13 01:15 PM, Christian König wrote:
>>>>> Am 13.02.2018 um 18:18 schrieb Felix Kuehling:
>>>>>>
>>>>>> I don't see how you can have separate TTM objects referring to the
>>>>>> same memory.
>>>>> Well that is trivial, we do this all the time with prime and I+A
>>>>> laptops.
>>>> As I understand it, you use DMABuf to export/import buffers on multiple
>>>> devices. I believe all devices share a single amdgpu_bo, which contains
>>>> the ttm_buffer_object.
>>
>> That's incorrect as well. Normally multiple devices have multiple
>> ttm_buffer_object, one for each device.
>> Going a bit higher that actually makes sense because the status of
>> each BO is deferent for each device. E.g. one device could have the BO
>> in access while it could be idle on another device.
> 
> Can you point me where this is done? I'm looking at
> amdgpu_gem_prime_foreign_bo. It is used if an AMDGPU BO is imported into
> a different AMDGPU device. It creates a new GEM object, with a reference
> to the same amdgpu BO (gobj->bo = amdgpu_bo_ref(bo)). To me this looks
> very much like the same amdgpu_bo, and cosequently the same TTM BO being
> shared by two GEM objects and two devices.

amdgpu_gem_prime_foreign_bo doesn't exist in amd-staging-drm-next, let
alone upstream. Even on current internal branches, it's no longer used
for dma-buf import AFAICT.


-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 14/25] drm/amdkfd: Populate DRM render device minor
       [not found]                                                                 ` <ef95f1a4-7b30-47c8-8f33-4e6379d0694b-otUistvHUpPR7s880joybQ@public.gmane.org>
@ 2018-02-14 18:12                                                                   ` Felix Kuehling
  0 siblings, 0 replies; 71+ messages in thread
From: Felix Kuehling @ 2018-02-14 18:12 UTC (permalink / raw)
  To: Michel Dänzer, Christian König,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Oak Zeng, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On 2018-02-14 11:50 AM, Michel Dänzer wrote:
> On 2018-02-14 05:35 PM, Felix Kuehling wrote:
>> On 2018-02-14 02:42 AM, Christian König wrote:
>>> Am 14.02.2018 um 00:17 schrieb Felix Kuehling:
>>>> On 2018-02-13 02:18 PM, Felix Kuehling wrote:
>>>>> On 2018-02-13 01:15 PM, Christian König wrote:
>>>>>> Am 13.02.2018 um 18:18 schrieb Felix Kuehling:
>>>>>>> I don't see how you can have separate TTM objects referring to the
>>>>>>> same memory.
>>>>>> Well that is trivial, we do this all the time with prime and I+A
>>>>>> laptops.
>>>>> As I understand it, you use DMABuf to export/import buffers on multiple
>>>>> devices. I believe all devices share a single amdgpu_bo, which contains
>>>>> the ttm_buffer_object.
>>> That's incorrect as well. Normally multiple devices have multiple
>>> ttm_buffer_object, one for each device.
>>> Going a bit higher that actually makes sense because the status of
>>> each BO is deferent for each device. E.g. one device could have the BO
>>> in access while it could be idle on another device.
>> Can you point me where this is done? I'm looking at
>> amdgpu_gem_prime_foreign_bo. It is used if an AMDGPU BO is imported into
>> a different AMDGPU device. It creates a new GEM object, with a reference
>> to the same amdgpu BO (gobj->bo = amdgpu_bo_ref(bo)). To me this looks
>> very much like the same amdgpu_bo, and cosequently the same TTM BO being
>> shared by two GEM objects and two devices.
> amdgpu_gem_prime_foreign_bo doesn't exist in amd-staging-drm-next, let
> alone upstream. Even on current internal branches, it's no longer used
> for dma-buf import AFAICT.

You're right. It has existed on the KFD branch for so long that I was
taking it for granted. I see that on amd-staging-drm-next, importing a BO
from a different device, even with the same driver, results in pinning and
using SG tables.

Either way, the patch series discussed here doesn't touch any of that
code. All we do is map system memory BOs into multiple VMs on different
devices. And according to Christian that is OK. So I'm a bit perplexed
about the opposition I'm facing from him.

Maybe it's time to take a step back from discussing details that are
irrelevant to the patch series being reviewed here. I understand and I
agree that the hacks we have in the KFD branch for enabling P2P
(including amdgpu_gem_prime_foreign_bo) will not be accepted upstream.
We discussed this a few months ago with a patch series that was rejected.
Maybe your understanding has evolved beyond that since then and I'm just
catching up on that.

I also want to make it clear that none of those hacks are included in
this patch series. They are not needed for enabling multi-GPU support
with system memory for KFD. If anything slipped into this patch series
that is objectionable from your point of view, please point it out to
me, and I will gladly address it.

The constructive feedback I've gathered so far concerns:

  * Use of VMs from DRM file descriptors for KFD instead of creating our own
      o I agree with this one because it's a more efficient use of resources
  * Creating GEM objects for KFD buffers
      o I disagree with this one because we don't need GEM objects and I
        haven't seen a good reason to convince me otherwise
  * Using GEM buffer handles (and IDR) instead of our own in KFD
      o Related to the previous one. I think the overhead of KFD having
        its own per-process IDR is small, compared to the overhead of
        creating a GEM object for each KFD BO. Again, I haven't seen a
        good argument to convince me otherwise

Thanks,
  Felix

>
>

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 14/25] drm/amdkfd: Populate DRM render device minor
       [not found]                                                             ` <50befcd2-6a1a-534e-1699-8556c4977b76-5C7GfCeVMHo@public.gmane.org>
  2018-02-14 16:50                                                               ` Michel Dänzer
@ 2018-02-14 18:15                                                               ` Christian König
       [not found]                                                                 ` <df9f32ce-7cfe-8684-1090-48f37863a3c7-5C7GfCeVMHo@public.gmane.org>
  1 sibling, 1 reply; 71+ messages in thread
From: Christian König @ 2018-02-14 18:15 UTC (permalink / raw)
  To: Felix Kuehling, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Oak Zeng

Am 14.02.2018 um 17:35 schrieb Felix Kuehling:
> [SNIP]
>>>>>> I don't see how you can have separate TTM objects referring to the
>>>>>> same memory.
>>>>> Well that is trivial, we do this all the time with prime and I+A
>>>>> laptops.
>>>> As I understand it, you use DMABuf to export/import buffers on multiple
>>>> devices. I believe all devices share a single amdgpu_bo, which contains
>>>> the ttm_buffer_object.
>> That's incorrect as well. Normally multiple devices have multiple
>> ttm_buffer_object, one for each device.
>> Going a bit higher that actually makes sense because the status of
>> each BO is deferent for each device. E.g. one device could have the BO
>> in access while it could be idle on another device.
> Can you point me where this is done? I'm looking at
> amdgpu_gem_prime_foreign_bo. It is used if an AMDGPU BO is imported into
> a different AMDGPU device. It creates a new GEM object, with a reference
> to the same amdgpu BO (gobj->bo = amdgpu_bo_ref(bo)). To me this looks
> very much like the same amdgpu_bo, and cosequently the same TTM BO being
> shared by two GEM objects and two devices.

As Michel pointed out as well, that stuff isn't upstream, and judging from 
the recent requirements it will never go upstream.

>> If this is enabled by any changes that break existing buffer sharing for
>> A+A or A+I systems, please point it out to me. I'm not aware that this
>> patch series does anything to that effect.
>> As I said it completely breaks scanout with A+I systems.
> Please tell me what "it" is. What in the changes I have posted is
> breaking A+I systems. I don't see it.

Using the same amdgpu_bo structure with multiple devices is what "it" 
means here.

As I said that concept is incompatible with the requirements on A+A 
systems, so we need to find another solution to provide the functionality.

What's on my TODO list anyway is to extend DMA-buf to not require 
pinning and to be able to deal with P2P.

The former is actually rather easy and already mostly done by sharing 
the reservation object between exporter and importer.

The latter is a bit more tricky because I need to create the necessary 
P2P infrastructure, but even that is doable in the mid term.
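
For the first part, the rough idea is that the importer's BO reuses the
exporter's reservation object instead of allocating its own, so both
devices synchronize on the same fences and the buffer no longer has to be
pinned. A hedged sketch, with create_import_bo() standing in for whatever
the real creation helper ends up being:

static struct amdgpu_bo *
import_with_shared_resv_sketch(struct amdgpu_device *importer,
			       struct amdgpu_bo *exported_bo)
{
	/* Reuse the exporter's reservation object ... */
	struct reservation_object *resv = exported_bo->tbo.resv;

	/* ... and pass it down so ttm_bo_init() uses it for the new BO
	 * instead of creating a private one.  create_import_bo() is a
	 * placeholder, not an existing amdgpu function.
	 */
	return create_import_bo(importer, amdgpu_bo_size(exported_bo), resv);
}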

Regards,
Christian.
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 14/25] drm/amdkfd: Populate DRM render device minor
       [not found]                                                                 ` <df9f32ce-7cfe-8684-1090-48f37863a3c7-5C7GfCeVMHo@public.gmane.org>
@ 2018-02-14 18:24                                                                   ` Felix Kuehling
       [not found]                                                                     ` <78274290-eaf9-0f79-eb2b-ec7866a4cb70-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: Felix Kuehling @ 2018-02-14 18:24 UTC (permalink / raw)
  To: Christian König, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Oak Zeng

On 2018-02-14 01:15 PM, Christian König wrote:
> Am 14.02.2018 um 17:35 schrieb Felix Kuehling:
>> [SNIP]
>>>>>>> I don't see how you can have separate TTM objects referring to the
>>>>>>> same memory.
>>>>>> Well that is trivial, we do this all the time with prime and I+A
>>>>>> laptops.
>>>>> As I understand it, you use DMABuf to export/import buffers on
>>>>> multiple
>>>>> devices. I believe all devices share a single amdgpu_bo, which
>>>>> contains
>>>>> the ttm_buffer_object.
>>> That's incorrect as well. Normally multiple devices have multiple
>>> ttm_buffer_object, one for each device.
>>> Going a bit higher that actually makes sense because the status of
>>> each BO is deferent for each device. E.g. one device could have the BO
>>> in access while it could be idle on another device.
>> Can you point me where this is done? I'm looking at
>> amdgpu_gem_prime_foreign_bo. It is used if an AMDGPU BO is imported into
>> a different AMDGPU device. It creates a new GEM object, with a reference
>> to the same amdgpu BO (gobj->bo = amdgpu_bo_ref(bo)). To me this looks
>> very much like the same amdgpu_bo, and cosequently the same TTM BO being
>> shared by two GEM objects and two devices.
>
> As Michel pointed out as well that stuff isn't upstream and judging
> from the recent requirements it will never go upstream.
>
>>> If this is enabled by any changes that break existing buffer sharing
>>> for
>>> A+A or A+I systems, please point it out to me. I'm not aware that this
>>> patch series does anything to that effect.
>>> As I said it completely breaks scanout with A+I systems.
>> Please tell me what "it" is. What in the changes I have posted is
>> breaking A+I systems. I don't see it.
>
> Using the same amdgpu_bo structure with multiple devices is what "it"
> means here.

That statement seems to contradict this previous statement by you:
>>>   What we do is map the same
>>> BO in VMs on other devices. It has no effect on GART mappings.
>
> Correct VM mapping is unaffected here. 
Can you clarify that contradiction? Is it OK for us to map the same BO
in multiple VMs or not?

> As I said that concept is incompatible with the requirements on A+A
> systems, so we need to find another solution to provide the
> functionality.

Do you mean you need to find another solution for A+A buffer sharing
specifically? Or is this a more general statement that includes the
mapping of BOs to multiple VMs on different devices?

> What's on my TODO list anyway is to extend DMA-buf to not require
> pinning and to be able to deal with P2P.

Sounds good. That said, KFD is not using DMABufs here.

>
> The former is actually rather easy and already mostly done by sharing
> the reservation object between exporter and importer.
>
> The later is a bit more tricky because I need to create the necessary
> P2P infrastructure, but even that is doable in the mid term.

The sooner you can share your plans, the better. Right now I'm in a bit
of limbo. I feel you're blocking KFD upstreaming based on AMDGPU plans
and changes that no one has seen yet.

Thanks,
  Felix

>
> Regards,
> Christian.

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 14/25] drm/amdkfd: Populate DRM render device minor
       [not found]                                                                     ` <78274290-eaf9-0f79-eb2b-ec7866a4cb70-5C7GfCeVMHo@public.gmane.org>
@ 2018-02-14 18:33                                                                       ` Christian König
       [not found]                                                                         ` <e236e458-5114-49ec-9266-945d11f29035-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: Christian König @ 2018-02-14 18:33 UTC (permalink / raw)
  To: Felix Kuehling, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Oak Zeng

Am 14.02.2018 um 19:24 schrieb Felix Kuehling:
> On 2018-02-14 01:15 PM, Christian König wrote:
>> Am 14.02.2018 um 17:35 schrieb Felix Kuehling:
>>> [SNIP]
>>>>>>>> I don't see how you can have separate TTM objects referring to the
>>>>>>>> same memory.
>>>>>>> Well that is trivial, we do this all the time with prime and I+A
>>>>>>> laptops.
>>>>>> As I understand it, you use DMABuf to export/import buffers on
>>>>>> multiple
>>>>>> devices. I believe all devices share a single amdgpu_bo, which
>>>>>> contains
>>>>>> the ttm_buffer_object.
>>>> That's incorrect as well. Normally multiple devices have multiple
>>>> ttm_buffer_object, one for each device.
>>>> Going a bit higher that actually makes sense because the status of
>>>> each BO is deferent for each device. E.g. one device could have the BO
>>>> in access while it could be idle on another device.
>>> Can you point me where this is done? I'm looking at
>>> amdgpu_gem_prime_foreign_bo. It is used if an AMDGPU BO is imported into
>>> a different AMDGPU device. It creates a new GEM object, with a reference
>>> to the same amdgpu BO (gobj->bo = amdgpu_bo_ref(bo)). To me this looks
>>> very much like the same amdgpu_bo, and cosequently the same TTM BO being
>>> shared by two GEM objects and two devices.
>> As Michel pointed out as well that stuff isn't upstream and judging
>> from the recent requirements it will never go upstream.
>>
>>>> If this is enabled by any changes that break existing buffer sharing
>>>> for
>>>> A+A or A+I systems, please point it out to me. I'm not aware that this
>>>> patch series does anything to that effect.
>>>> As I said it completely breaks scanout with A+I systems.
>>> Please tell me what "it" is. What in the changes I have posted is
>>> breaking A+I systems. I don't see it.
>> Using the same amdgpu_bo structure with multiple devices is what "it"
>> means here.
> That statement seems to contradict this previous statement by you:
>>>>    What we do is map the same
>>>> BO in VMs on other devices. It has no effect on GART mappings.
>> Correct VM mapping is unaffected here.
> Can you clarify that contradiction? Is it OK for us to map the same BO
> in multiple VMs or not?

By the current requirements I have, I think the answer is no.

>> As I said that concept is incompatible with the requirements on A+A
>> systems, so we need to find another solution to provide the
>> functionality.
> Do you mean you need to find another solution for A+A buffer sharing
> specifically? Or is this a more general statement that includes the
> mapping of BOs to multiple VMs on different devices?

A more general statement. We need to find a solution which works for 
everybody, not one that works like this in the KFD but breaks A+A buffer 
sharing and so needs to be disabled there.

>
>> What's on my TODO list anyway is to extend DMA-buf to not require
>> pinning and to be able to deal with P2P.
> Sounds good. That said, KFD is not using DMABufs here.
>
>> The former is actually rather easy and already mostly done by sharing
>> the reservation object between exporter and importer.
>>
>> The later is a bit more tricky because I need to create the necessary
>> P2P infrastructure, but even that is doable in the mid term.
> The sooner you can share your plans, the better. Right now I'm in a bit
> of limbo. I feel you're blocking KFD upstreaming based on AMDGPU plans
> and changes that no one has seen yet.

Well, as far as I understand it that is not blocking the current 
upstreaming, because you didn't plan to upstream this use case anyway, 
did you?

Regards,
Christian.

>
> Thanks,
>    Felix
>
>> Regards,
>> Christian.

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 14/25] drm/amdkfd: Populate DRM render device minor
       [not found]                                                                         ` <e236e458-5114-49ec-9266-945d11f29035-5C7GfCeVMHo@public.gmane.org>
@ 2018-02-14 19:01                                                                           ` Felix Kuehling
       [not found]                                                                             ` <2421ca47-773e-d9c7-0fec-d573812df2c4-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 71+ messages in thread
From: Felix Kuehling @ 2018-02-14 19:01 UTC (permalink / raw)
  To: Christian König, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Oak Zeng

On 2018-02-14 01:33 PM, Christian König wrote:
> Am 14.02.2018 um 19:24 schrieb Felix Kuehling:
>> On 2018-02-14 01:15 PM, Christian König wrote:
>>> Am 14.02.2018 um 17:35 schrieb Felix Kuehling:
>>>> [SNIP]
>>>>>>>>> I don't see how you can have separate TTM objects referring to
>>>>>>>>> the
>>>>>>>>> same memory.
>>>>>>>> Well that is trivial, we do this all the time with prime and I+A
>>>>>>>> laptops.
>>>>>>> As I understand it, you use DMABuf to export/import buffers on
>>>>>>> multiple
>>>>>>> devices. I believe all devices share a single amdgpu_bo, which
>>>>>>> contains
>>>>>>> the ttm_buffer_object.
>>>>> That's incorrect as well. Normally multiple devices have multiple
>>>>> ttm_buffer_object, one for each device.
>>>>> Going a bit higher that actually makes sense because the status of
>>>>> each BO is deferent for each device. E.g. one device could have
>>>>> the BO
>>>>> in access while it could be idle on another device.
>>>> Can you point me where this is done? I'm looking at
>>>> amdgpu_gem_prime_foreign_bo. It is used if an AMDGPU BO is imported
>>>> into
>>>> a different AMDGPU device. It creates a new GEM object, with a
>>>> reference
>>>> to the same amdgpu BO (gobj->bo = amdgpu_bo_ref(bo)). To me this looks
>>>> very much like the same amdgpu_bo, and cosequently the same TTM BO
>>>> being
>>>> shared by two GEM objects and two devices.
>>> As Michel pointed out as well that stuff isn't upstream and judging
>>> from the recent requirements it will never go upstream.
>>>
>>>>> If this is enabled by any changes that break existing buffer sharing
>>>>> for
>>>>> A+A or A+I systems, please point it out to me. I'm not aware that
>>>>> this
>>>>> patch series does anything to that effect.
>>>>> As I said it completely breaks scanout with A+I systems.
>>>> Please tell me what "it" is. What in the changes I have posted is
>>>> breaking A+I systems. I don't see it.
>>> Using the same amdgpu_bo structure with multiple devices is what "it"
>>> means here.
>> That statement seems to contradict this previous statement by you:
>>>>>    What we do is map the same
>>>>> BO in VMs on other devices. It has no effect on GART mappings.
>>> Correct VM mapping is unaffected here.
>> Can you clarify that contradiction? Is it OK for us to map the same BO
>> in multiple VMs or not?
>
> By  the current requirements I have I think the answer is no.
>
>>> As I said that concept is incompatible with the requirements on A+A
>>> systems, so we need to find another solution to provide the
>>> functionality.
>> Do you mean you need to find another solution for A+A buffer sharing
>> specifically? Or is this a more general statement that includes the
>> mapping of BOs to multiple VMs on different devices?
>
> A more general statement. We need to find a solution which works for
> everybody and not just works like this in the KFD but breaks A+A
> buffer sharing and so needs to be disabled there.

Well, KFD sharing system memory BOs between GPUs doesn't break A+A.
Implementing a solution for A+A that involves DMABufs will not affect
KFD. And KFD isn't actually broken as far as I know. Once you have a
solution for A+A, maybe it will help me understand the problem and I can
evaluate whether the solution is applicable to KFD and worth adopting.
But for now I have neither a good understanding of the problem, nor
evidence that there is a problem affecting KFD, nor a way towards a
solution.

>
>>
>>> What's on my TODO list anyway is to extend DMA-buf to not require
>>> pinning and to be able to deal with P2P.
>> Sounds good. That said, KFD is not using DMABufs here.
>>
>>> The former is actually rather easy and already mostly done by sharing
>>> the reservation object between exporter and importer.
>>>
>>> The later is a bit more tricky because I need to create the necessary
>>> P2P infrastructure, but even that is doable in the mid term.
>> The sooner you can share your plans, the better. Right now I'm in a bit
>> of limbo. I feel you're blocking KFD upstreaming based on AMDGPU plans
>> and changes that no one has seen yet.
>
> Well as far as I understand it that is not blocking for the current
> upstreaming because you didn't planned to upstream this use case
> anyway, didn't you?

Which use case? The current patch series enables multi-GPU buffer
sharing of system memory BOs. If it is actually broken, I can reduce the
scope to single-GPU support. But I have no evidence that multi-GPU is
actually broken.

Regards,
  Felix

>
> Regards,
> Christian.
>
>>
>> Thanks,
>>    Felix
>>
>>> Regards,
>>> Christian.
>

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 71+ messages in thread

* Re: [PATCH 14/25] drm/amdkfd: Populate DRM render device minor
       [not found]                                                                             ` <2421ca47-773e-d9c7-0fec-d573812df2c4-5C7GfCeVMHo@public.gmane.org>
@ 2018-02-14 19:04                                                                               ` Felix Kuehling
  0 siblings, 0 replies; 71+ messages in thread
From: Felix Kuehling @ 2018-02-14 19:04 UTC (permalink / raw)
  To: Christian König, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Oak Zeng


On 2018-02-14 02:01 PM, Felix Kuehling wrote:
> On 2018-02-14 01:33 PM, Christian König wrote:
>> Am 14.02.2018 um 19:24 schrieb Felix Kuehling:
>>> On 2018-02-14 01:15 PM, Christian König wrote:
>>>
>>>> As I said that concept is incompatible with the requirements on A+A
>>>> systems, so we need to find another solution to provide the
>>>> functionality.
>>> Do you mean you need to find another solution for A+A buffer sharing
>>> specifically? Or is this a more general statement that includes the
>>> mapping of BOs to multiple VMs on different devices?
>> A more general statement. We need to find a solution which works for
>> everybody and not just works like this in the KFD but breaks A+A
>> buffer sharing and so needs to be disabled there.
> Well, KFD sharing system memory BOs between GPUs doesn't break A+A.
> Implementing a solution for A+A that involves DMABufs will not affect
> KFD. And KFD isn't actually broken as far as I know. Once you have a
> solution for A+A, maybe it will help me understand the problem and I can
> evaluate whether the solution is applicable to KFD and worth adopting.
> But for now I have neither a good understanding of the problem, no
> evidence that there is a problem affecting KFD, and no way towards a
> solution.

Let me add, I'm definitely interested in your solution for P2P, because
we want to enable that for KFD for large-BAR systems. For now I'm not
upstreaming any P2P support, because I know that our current hack is
going to be superseded by the solution you're working on.

Thanks,
  Felix

>
>>>> What's on my TODO list anyway is to extend DMA-buf to not require
>>>> pinning and to be able to deal with P2P.
>>> Sounds good. That said, KFD is not using DMABufs here.
>>>
>>>> The former is actually rather easy and already mostly done by sharing
>>>> the reservation object between exporter and importer.
>>>>
>>>> The later is a bit more tricky because I need to create the necessary
>>>> P2P infrastructure, but even that is doable in the mid term.
>>> The sooner you can share your plans, the better. Right now I'm in a bit
>>> of limbo. I feel you're blocking KFD upstreaming based on AMDGPU plans
>>> and changes that no one has seen yet.
>> Well as far as I understand it that is not blocking for the current
>> upstreaming because you didn't planned to upstream this use case
>> anyway, didn't you?
> Which use case? The current patch series enables multi-GPU buffer
> sharing of system memory BOs. If it is actually broken, I can reduce the
> scope to single-GPU support. But I have no evidence that multi-GPU is
> actually broken.
>
> Regards,
>   Felix
>
>> Regards,
>> Christian.
>>
>>> Thanks,
>>>    Felix
>>>
>>>> Regards,
>>>> Christian.

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 71+ messages in thread

* [PATCH 16/25] drm/amdkfd: Implement KFD process eviction/restore
       [not found] ` <1517015381-1080-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
@ 2018-01-27  1:09   ` Felix Kuehling
  0 siblings, 0 replies; 71+ messages in thread
From: Felix Kuehling @ 2018-01-27  1:09 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling, Harish Kasiviswanathan

When the TTM memory manager in KGD evicts BOs, all user mode queues
potentially accessing these BOs must be evicted temporarily. Once
user mode queues are evicted, the eviction fence is signaled,
allowing the migration of the BO to proceed.

A delayed worker is scheduled to restore all the BOs belonging to
the evicted process and restart its queues.

During suspend/resume of the GPU we also evict all processes to allow
KGD to save BOs in system memory, since VRAM will be lost.

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_device.c            |  65 +++++-
 .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c  | 219 ++++++++++++++++++++-
 .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.h  |   9 +
 drivers/gpu/drm/amd/amdkfd/kfd_module.c            |   2 +
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h              |  32 ++-
 drivers/gpu/drm/amd/amdkfd/kfd_process.c           | 213 ++++++++++++++++++++
 6 files changed, 537 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index 612afaf..9299a91 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -32,6 +32,7 @@
 #include "cwsr_trap_handler_gfx8.asm"
 
 #define MQD_SIZE_ALIGNED 768
+static atomic_t kfd_device_suspended = ATOMIC_INIT(0);
 
 #if defined(CONFIG_AMD_IOMMU_V2_MODULE) || defined(CONFIG_AMD_IOMMU_V2)
 static const struct kfd_device_info kaveri_device_info = {
@@ -545,6 +546,10 @@ void kgd2kfd_suspend(struct kfd_dev *kfd)
 	if (!kfd->init_complete)
 		return;
 
+	/* For first KFD device suspend all the KFD processes */
+	if (atomic_inc_return(&kfd_device_suspended) == 1)
+		kfd_suspend_all_processes();
+
 	kfd->dqm->ops.stop(kfd->dqm);
 
 #if defined(CONFIG_AMD_IOMMU_V2_MODULE) || defined(CONFIG_AMD_IOMMU_V2)
@@ -561,11 +566,21 @@ void kgd2kfd_suspend(struct kfd_dev *kfd)
 
 int kgd2kfd_resume(struct kfd_dev *kfd)
 {
+	int ret, count;
+
 	if (!kfd->init_complete)
 		return 0;
 
-	return kfd_resume(kfd);
+	ret = kfd_resume(kfd);
+	if (ret)
+		return ret;
+
+	count = atomic_dec_return(&kfd_device_suspended);
+	WARN_ONCE(count < 0, "KFD suspend / resume ref. error");
+	if (count == 0)
+		ret = kfd_resume_all_processes();
 
+	return ret;
 }
 
 static int kfd_resume(struct kfd_dev *kfd)
@@ -625,6 +640,54 @@ void kgd2kfd_interrupt(struct kfd_dev *kfd, const void *ih_ring_entry)
 	spin_unlock(&kfd->interrupt_lock);
 }
 
+/** kgd2kfd_schedule_evict_and_restore_process - Schedules work queue that will
+ *   prepare for safe eviction of KFD BOs that belong to the specified
+ *   process.
+ *
+ * @mm: mm_struct that identifies the specified KFD process
+ * @fence: eviction fence attached to KFD process BOs
+ *
+ */
+int kgd2kfd_schedule_evict_and_restore_process(struct mm_struct *mm,
+					       struct dma_fence *fence)
+{
+	struct kfd_process *p;
+	unsigned long active_time;
+	unsigned long delay_jiffies = msecs_to_jiffies(PROCESS_ACTIVE_TIME_MS);
+
+	if (!fence)
+		return -EINVAL;
+
+	if (dma_fence_is_signaled(fence))
+		return 0;
+
+	p = kfd_lookup_process_by_mm(mm);
+	if (!p)
+		return -ENODEV;
+
+	if (fence->seqno == p->last_eviction_seqno)
+		goto out;
+
+	p->last_eviction_seqno = fence->seqno;
+
+	/* Avoid KFD process starvation. Wait for at least
+	 * PROCESS_ACTIVE_TIME_MS before evicting the process again
+	 */
+	active_time = get_jiffies_64() - p->last_restore_timestamp;
+	if (delay_jiffies > active_time)
+		delay_jiffies -= active_time;
+	else
+		delay_jiffies = 0;
+
+	/* During process initialization eviction_work.dwork is initialized
+	 * to kfd_evict_bo_worker
+	 */
+	schedule_delayed_work(&p->eviction_work, delay_jiffies);
+out:
+	kfd_unref_process(p);
+	return 0;
+}
+
 static int kfd_gtt_sa_init(struct kfd_dev *kfd, unsigned int buf_size,
 				unsigned int chunk_size)
 {
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index b7d0639..b3b6dab 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -21,10 +21,11 @@
  *
  */
 
+#include <linux/ratelimit.h>
+#include <linux/printk.h>
 #include <linux/slab.h>
 #include <linux/list.h>
 #include <linux/types.h>
-#include <linux/printk.h>
 #include <linux/bitops.h>
 #include <linux/sched.h>
 #include "kfd_priv.h"
@@ -180,6 +181,14 @@ static int create_queue_nocpsch(struct device_queue_manager *dqm,
 			goto out_unlock;
 	}
 	q->properties.vmid = qpd->vmid;
+	/*
+	 * Eviction state logic: we only mark active queues as evicted
+	 * to avoid the overhead of restoring inactive queues later
+	 */
+	if (qpd->evicted)
+		q->properties.is_evicted = (q->properties.queue_size > 0 &&
+					    q->properties.queue_percent > 0 &&
+					    q->properties.queue_address != 0);
 
 	q->properties.tba_addr = qpd->tba_addr;
 	q->properties.tma_addr = qpd->tma_addr;
@@ -377,15 +386,29 @@ static int update_queue(struct device_queue_manager *dqm, struct queue *q)
 {
 	int retval;
 	struct mqd_manager *mqd;
+	struct kfd_process_device *pdd;
 	bool prev_active = false;
 
 	mutex_lock(&dqm->lock);
+	pdd = kfd_get_process_device_data(q->device, q->process);
+	if (!pdd) {
+		retval = -ENODEV;
+		goto out_unlock;
+	}
 	mqd = dqm->ops.get_mqd_manager(dqm,
 			get_mqd_type_from_queue_type(q->properties.type));
 	if (!mqd) {
 		retval = -ENOMEM;
 		goto out_unlock;
 	}
+	/*
+	 * Eviction state logic: we only mark active queues as evicted
+	 * to avoid the overhead of restoring inactive queues later
+	 */
+	if (pdd->qpd.evicted)
+		q->properties.is_evicted = (q->properties.queue_size > 0 &&
+					    q->properties.queue_percent > 0 &&
+					    q->properties.queue_address != 0);
 
 	/* Save previous activity state for counters */
 	prev_active = q->properties.is_active;
@@ -457,6 +480,187 @@ static struct mqd_manager *get_mqd_manager(
 	return mqd;
 }
 
+static int evict_process_queues_nocpsch(struct device_queue_manager *dqm,
+					struct qcm_process_device *qpd)
+{
+	struct queue *q;
+	struct mqd_manager *mqd;
+	struct kfd_process_device *pdd;
+	int retval = 0;
+
+	mutex_lock(&dqm->lock);
+	if (qpd->evicted++ > 0) /* already evicted, do nothing */
+		goto out;
+
+	pdd = qpd_to_pdd(qpd);
+	pr_info_ratelimited("Evicting PASID %u queues\n",
+			    pdd->process->pasid);
+
+	/* unactivate all active queues on the qpd */
+	list_for_each_entry(q, &qpd->queues_list, list) {
+		if (!q->properties.is_active)
+			continue;
+		mqd = dqm->ops.get_mqd_manager(dqm,
+			get_mqd_type_from_queue_type(q->properties.type));
+		if (!mqd) { /* should not be here */
+			pr_err("Cannot evict queue, mqd mgr is NULL\n");
+			retval = -ENOMEM;
+			goto out;
+		}
+		q->properties.is_evicted = true;
+		q->properties.is_active = false;
+		retval = mqd->destroy_mqd(mqd, q->mqd,
+				KFD_PREEMPT_TYPE_WAVEFRONT_DRAIN,
+				KFD_UNMAP_LATENCY_MS, q->pipe, q->queue);
+		if (retval)
+			goto out;
+		dqm->queue_count--;
+	}
+
+out:
+	mutex_unlock(&dqm->lock);
+	return retval;
+}
+
+static int evict_process_queues_cpsch(struct device_queue_manager *dqm,
+				      struct qcm_process_device *qpd)
+{
+	struct queue *q;
+	struct kfd_process_device *pdd;
+	int retval = 0;
+
+	mutex_lock(&dqm->lock);
+	if (qpd->evicted++ > 0) /* already evicted, do nothing */
+		goto out;
+
+	pdd = qpd_to_pdd(qpd);
+	pr_info_ratelimited("Evicting PASID %u queues\n",
+			    pdd->process->pasid);
+
+	/* unactivate all active queues on the qpd */
+	list_for_each_entry(q, &qpd->queues_list, list) {
+		if (!q->properties.is_active)
+			continue;
+		q->properties.is_evicted = true;
+		q->properties.is_active = false;
+		dqm->queue_count--;
+	}
+	retval = execute_queues_cpsch(dqm,
+				qpd->is_debug ?
+				KFD_UNMAP_QUEUES_FILTER_ALL_QUEUES :
+				KFD_UNMAP_QUEUES_FILTER_DYNAMIC_QUEUES, 0);
+
+out:
+	mutex_unlock(&dqm->lock);
+	return retval;
+}
+
+static int restore_process_queues_nocpsch(struct device_queue_manager *dqm,
+					  struct qcm_process_device *qpd)
+{
+	struct queue *q;
+	struct mqd_manager *mqd;
+	struct kfd_process_device *pdd;
+	uint32_t pd_base;
+	int retval = 0;
+
+	pdd = qpd_to_pdd(qpd);
+	/* Retrieve PD base */
+	pd_base = dqm->dev->kfd2kgd->get_process_page_dir(pdd->vm);
+
+	mutex_lock(&dqm->lock);
+	if (WARN_ON_ONCE(!qpd->evicted)) /* already restored, do nothing */
+		goto out;
+	if (qpd->evicted > 1) { /* ref count still > 0, decrement & quit */
+		qpd->evicted--;
+		goto out;
+	}
+
+	pr_info_ratelimited("Restoring PASID %u queues\n",
+			    pdd->process->pasid);
+
+	/* Update PD Base in QPD */
+	qpd->page_table_base = pd_base;
+	pr_debug("Updated PD address to 0x%08x\n", pd_base);
+
+	if (!list_empty(&qpd->queues_list)) {
+		dqm->dev->kfd2kgd->set_vm_context_page_table_base(
+				dqm->dev->kgd,
+				qpd->vmid,
+				qpd->page_table_base);
+		kfd_flush_tlb(pdd);
+	}
+
+	/* activate all active queues on the qpd */
+	list_for_each_entry(q, &qpd->queues_list, list) {
+		if (!q->properties.is_evicted)
+			continue;
+		mqd = dqm->ops.get_mqd_manager(dqm,
+			get_mqd_type_from_queue_type(q->properties.type));
+		if (!mqd) { /* should not be here */
+			pr_err("Cannot restore queue, mqd mgr is NULL\n");
+			retval = -ENOMEM;
+			goto out;
+		}
+		q->properties.is_evicted = false;
+		q->properties.is_active = true;
+		retval = mqd->load_mqd(mqd, q->mqd, q->pipe,
+				       q->queue, &q->properties,
+				       q->process->mm);
+		if (retval)
+			goto out;
+		dqm->queue_count++;
+	}
+	qpd->evicted = 0;
+out:
+	mutex_unlock(&dqm->lock);
+	return retval;
+}
+
+static int restore_process_queues_cpsch(struct device_queue_manager *dqm,
+					struct qcm_process_device *qpd)
+{
+	struct queue *q;
+	struct kfd_process_device *pdd;
+	uint32_t pd_base;
+	int retval = 0;
+
+	pdd = qpd_to_pdd(qpd);
+	/* Retrieve PD base */
+	pd_base = dqm->dev->kfd2kgd->get_process_page_dir(pdd->vm);
+
+	mutex_lock(&dqm->lock);
+	if (WARN_ON_ONCE(!qpd->evicted)) /* already restored, do nothing */
+		goto out;
+	if (qpd->evicted > 1) { /* ref count still > 0, decrement & quit */
+		qpd->evicted--;
+		goto out;
+	}
+
+	pr_info_ratelimited("Restoring PASID %u queues\n",
+			    pdd->process->pasid);
+
+	/* Update PD Base in QPD */
+	qpd->page_table_base = pd_base;
+	pr_debug("Updated PD address to 0x%08x\n", pd_base);
+
+	/* activate all active queues on the qpd */
+	list_for_each_entry(q, &qpd->queues_list, list) {
+		if (!q->properties.is_evicted)
+			continue;
+		q->properties.is_evicted = false;
+		q->properties.is_active = true;
+		dqm->queue_count++;
+	}
+	retval = execute_queues_cpsch(dqm,
+				KFD_UNMAP_QUEUES_FILTER_DYNAMIC_QUEUES, 0);
+	if (!retval)
+		qpd->evicted = 0;
+out:
+	mutex_unlock(&dqm->lock);
+	return retval;
+}
+
 static int register_process(struct device_queue_manager *dqm,
 					struct qcm_process_device *qpd)
 {
@@ -853,6 +1057,14 @@ static int create_queue_cpsch(struct device_queue_manager *dqm, struct queue *q,
 		retval = -ENOMEM;
 		goto out;
 	}
+	/*
+	 * Eviction state logic: we only mark active queues as evicted
+	 * to avoid the overhead of restoring inactive queues later
+	 */
+	if (qpd->evicted)
+		q->properties.is_evicted = (q->properties.queue_size > 0 &&
+					    q->properties.queue_percent > 0 &&
+					    q->properties.queue_address != 0);
 
 	dqm->asic_ops.init_sdma_vm(dqm, q, qpd);
 
@@ -1291,6 +1503,8 @@ struct device_queue_manager *device_queue_manager_init(struct kfd_dev *dev)
 		dqm->ops.set_cache_memory_policy = set_cache_memory_policy;
 		dqm->ops.set_trap_handler = set_trap_handler;
 		dqm->ops.process_termination = process_termination_cpsch;
+		dqm->ops.evict_process_queues = evict_process_queues_cpsch;
+		dqm->ops.restore_process_queues = restore_process_queues_cpsch;
 		break;
 	case KFD_SCHED_POLICY_NO_HWS:
 		/* initialize dqm for no cp scheduling */
@@ -1307,6 +1521,9 @@ struct device_queue_manager *device_queue_manager_init(struct kfd_dev *dev)
 		dqm->ops.set_cache_memory_policy = set_cache_memory_policy;
 		dqm->ops.set_trap_handler = set_trap_handler;
 		dqm->ops.process_termination = process_termination_nocpsch;
+		dqm->ops.evict_process_queues = evict_process_queues_nocpsch;
+		dqm->ops.restore_process_queues =
+			restore_process_queues_nocpsch;
 		break;
 	default:
 		pr_err("Invalid scheduling policy %d\n", dqm->sched_policy);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
index 68be0aa..412beff 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
@@ -79,6 +79,10 @@ struct device_process_node {
  *
  * @process_termination: Clears all process queues belongs to that device.
  *
+ * @evict_process_queues: Evict all active queues of a process
+ *
+ * @restore_process_queues: Restore all evicted queues of a process
+ *
  */
 
 struct device_queue_manager_ops {
@@ -129,6 +133,11 @@ struct device_queue_manager_ops {
 
 	int (*process_termination)(struct device_queue_manager *dqm,
 			struct qcm_process_device *qpd);
+
+	int (*evict_process_queues)(struct device_queue_manager *dqm,
+				    struct qcm_process_device *qpd);
+	int (*restore_process_queues)(struct device_queue_manager *dqm,
+				      struct qcm_process_device *qpd);
 };
 
 struct device_queue_manager_asic_ops {
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_module.c b/drivers/gpu/drm/amd/amdkfd/kfd_module.c
index 3ac72be..65574c6 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_module.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_module.c
@@ -43,6 +43,8 @@ static const struct kgd2kfd_calls kgd2kfd = {
 	.interrupt	= kgd2kfd_interrupt,
 	.suspend	= kgd2kfd_suspend,
 	.resume		= kgd2kfd_resume,
+	.schedule_evict_and_restore_process =
+			  kgd2kfd_schedule_evict_and_restore_process,
 };
 
 int sched_policy = KFD_SCHED_POLICY_HWS;
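
The only functional change in this file is the new kgd2kfd table entry, which lets amdgpu kick off the eviction/restore cycle for the process owning a given mm_struct when one of its eviction fences is about to be triggered. The actual call site is on the amdgpu side and is not part of this hunk; the following is only a rough, self-contained sketch of the dispatch-through-a-call-table pattern, with made-up names, not the driver interface itself:

#include <stdio.h>

struct demo_mm { int dummy; };
struct demo_fence { unsigned int seqno; };

struct demo_calls {
	int (*schedule_evict_and_restore_process)(struct demo_mm *mm,
						  struct demo_fence *fence);
};

static int demo_schedule(struct demo_mm *mm, struct demo_fence *fence)
{
	(void)mm;
	/* The real callback looks up the KFD process by mm and schedules
	 * its eviction_work; here we only log the request.
	 */
	printf("schedule eviction, fence seqno %u\n", fence->seqno);
	return 0;
}

static const struct demo_calls calls = {
	.schedule_evict_and_restore_process = demo_schedule,
};

int main(void)
{
	struct demo_mm mm = { 0 };
	struct demo_fence fence = { .seqno = 42 };

	/* The GPU driver side would invoke this when an eviction fence
	 * is about to be signalled.
	 */
	return calls.schedule_evict_and_restore_process(&mm, &fence);
}
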
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 0687161..785161e 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -335,7 +335,11 @@ enum kfd_queue_format {
  * @is_interop: Defines if this is a interop queue. Interop queue means that
  * the queue can access both graphics and compute resources.
  *
- * @is_active: Defines if the queue is active or not.
+ * @is_evicted: Defines if the queue is evicted. Only active queues
+ * are evicted, rendering them inactive.
+ *
+ * @is_active: Defines if the queue is active or not. @is_active and
+ * @is_evicted are protected by the DQM lock.
  *
  * @vmid: If the scheduling mode is no cp scheduling the field defines the vmid
  * of the queue.
@@ -357,6 +361,7 @@ struct queue_properties {
 	uint32_t __iomem *doorbell_ptr;
 	uint32_t doorbell_off;
 	bool is_interop;
+	bool is_evicted;
 	bool is_active;
 	/* Not relevant for user mode queues in cp scheduling */
 	unsigned int vmid;
@@ -460,6 +465,7 @@ struct qcm_process_device {
 	unsigned int queue_count;
 	unsigned int vmid;
 	bool is_debug;
+	unsigned int evicted; /* eviction counter, 0=active */
 
 	/* This flag tells if we should reset all wavefronts on
 	 * process termination
@@ -486,6 +492,17 @@ struct qcm_process_device {
 	uint64_t tma_addr;
 };
 
+/* KFD Memory Eviction */
+
+/* Approx. wait time before attempting to restore evicted BOs */
+#define PROCESS_RESTORE_TIME_MS 100
+/* Approx. back off time if restore fails due to lack of memory */
+#define PROCESS_BACK_OFF_TIME_MS 100
+/* Approx. time before evicting the process again */
+#define PROCESS_ACTIVE_TIME_MS 10
+
+int kgd2kfd_schedule_evict_and_restore_process(struct mm_struct *mm,
+					       struct dma_fence *fence);
 
 enum kfd_pdd_bound {
 	PDD_UNBOUND = 0,
@@ -600,6 +617,16 @@ struct kfd_process {
 	 * during restore
 	 */
 	struct dma_fence *ef;
+
+	/* Work items for evicting and restoring BOs */
+	struct delayed_work eviction_work;
+	struct delayed_work restore_work;
+	/* seqno of the last scheduled eviction */
+	unsigned int last_eviction_seqno;
+	/* Approximate timestamp (in jiffies) of the last time the process
+	 * was restored after an eviction
+	 */
+	unsigned long last_restore_timestamp;
 };
 
 /**
@@ -625,7 +652,10 @@ void kfd_process_destroy_wq(void);
 struct kfd_process *kfd_create_process(struct file *filep);
 struct kfd_process *kfd_get_process(const struct task_struct *);
 struct kfd_process *kfd_lookup_process_by_pasid(unsigned int pasid);
+struct kfd_process *kfd_lookup_process_by_mm(const struct mm_struct *mm);
 void kfd_unref_process(struct kfd_process *p);
+void kfd_suspend_all_processes(void);
+int kfd_resume_all_processes(void);
 
 struct kfd_process_device *kfd_bind_process_to_device(struct kfd_dev *dev,
 						struct kfd_process *p);
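
A note on the three new timing constants: a process that has just been restored is meant to get at least roughly PROCESS_ACTIVE_TIME_MS of GPU time before it is evicted again, a restore attempt is scheduled roughly PROCESS_RESTORE_TIME_MS after a successful eviction, and a failed restore is retried after roughly PROCESS_BACK_OFF_TIME_MS. The helper declared above (kgd2kfd_schedule_evict_and_restore_process) is implemented elsewhere in this patch and not shown here, so the arithmetic below is only a sketch of how such a delay can be derived from last_restore_timestamp (plain user-space C, milliseconds instead of jiffies, not the kernel implementation):

#include <stdio.h>

#define PROCESS_RESTORE_TIME_MS  100
#define PROCESS_BACK_OFF_TIME_MS 100
#define PROCESS_ACTIVE_TIME_MS    10

/* Give the process at least PROCESS_ACTIVE_TIME_MS of GPU time since
 * its last restore before scheduling the next eviction.
 */
static unsigned long eviction_delay_ms(unsigned long now_ms,
				       unsigned long last_restore_ms)
{
	unsigned long active_ms = now_ms - last_restore_ms;

	if (active_ms >= PROCESS_ACTIVE_TIME_MS)
		return 0;	/* had its quantum, evict right away */
	return PROCESS_ACTIVE_TIME_MS - active_ms;
}

int main(void)
{
	/* Eviction requested 3 ms after the last restore finished:
	 * wait another 7 ms before actually evicting.
	 */
	printf("delay = %lu ms\n", eviction_delay_ms(1003, 1000));
	/* Eviction requested 50 ms after the last restore: no delay. */
	printf("delay = %lu ms\n", eviction_delay_ms(1050, 1000));
	return 0;
}
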
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index e82f4ac..7eeadfe 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -55,6 +55,9 @@ static struct kfd_process *create_process(const struct task_struct *thread,
 					struct file *filep);
 static int kfd_process_init_cwsr(struct kfd_process *p, struct file *filep);
 
+static void evict_process_worker(struct work_struct *work);
+static void restore_process_worker(struct work_struct *work);
+
 
 void kfd_process_create_wq(void)
 {
@@ -239,6 +242,9 @@ static void kfd_process_notifier_release(struct mmu_notifier *mn,
 	mutex_unlock(&kfd_processes_mutex);
 	synchronize_srcu(&kfd_processes_srcu);
 
+	cancel_delayed_work_sync(&p->eviction_work);
+	cancel_delayed_work_sync(&p->restore_work);
+
 	mutex_lock(&p->mutex);
 
 	/* Iterate over all process device data structures and if the
@@ -360,6 +366,10 @@ static struct kfd_process *create_process(const struct task_struct *thread,
 	if (err != 0)
 		goto err_init_apertures;
 
+	INIT_DELAYED_WORK(&process->eviction_work, evict_process_worker);
+	INIT_DELAYED_WORK(&process->restore_work, restore_process_worker);
+	process->last_restore_timestamp = get_jiffies_64();
+
 	err = kfd_process_init_cwsr(process, filep);
 	if (err)
 		goto err_init_cwsr;
@@ -411,6 +421,7 @@ struct kfd_process_device *kfd_create_process_device_data(struct kfd_dev *dev,
 	INIT_LIST_HEAD(&pdd->qpd.priv_queue_list);
 	pdd->qpd.dqm = dev->dqm;
 	pdd->qpd.pqm = &p->pqm;
+	pdd->qpd.evicted = 0;
 	pdd->process = p;
 	pdd->bound = PDD_UNBOUND;
 	pdd->already_dequeued = false;
@@ -625,6 +636,208 @@ struct kfd_process *kfd_lookup_process_by_pasid(unsigned int pasid)
 	return ret_p;
 }
 
+/* This increments the process->ref counter. */
+struct kfd_process *kfd_lookup_process_by_mm(const struct mm_struct *mm)
+{
+	struct kfd_process *p;
+
+	int idx = srcu_read_lock(&kfd_processes_srcu);
+
+	p = find_process_by_mm(mm);
+	if (p)
+		kref_get(&p->ref);
+
+	srcu_read_unlock(&kfd_processes_srcu, idx);
+
+	return p;
+}
+
+/* process_evict_queues - Evict all user queues of a process
+ *
+ * Eviction is reference-counted per process-device. This means multiple
+ * evictions from different sources can be nested safely.
+ */
+static int process_evict_queues(struct kfd_process *p)
+{
+	struct kfd_process_device *pdd;
+	int r = 0;
+	unsigned int n_evicted = 0;
+
+	list_for_each_entry(pdd, &p->per_device_data, per_device_list) {
+		r = pdd->dev->dqm->ops.evict_process_queues(pdd->dev->dqm,
+							    &pdd->qpd);
+		if (r) {
+			pr_err("Failed to evict process queues\n");
+			goto fail;
+		}
+		n_evicted++;
+	}
+
+	return r;
+
+fail:
+	/* To keep state consistent, roll back partial eviction by
+	 * restoring queues
+	 */
+	list_for_each_entry(pdd, &p->per_device_data, per_device_list) {
+		if (n_evicted == 0)
+			break;
+		if (pdd->dev->dqm->ops.restore_process_queues(pdd->dev->dqm,
+							      &pdd->qpd))
+			pr_err("Failed to restore queues\n");
+
+		n_evicted--;
+	}
+
+	return r;
+}
+
+/* process_restore_queues - Restore all user queues of a process */
+static int process_restore_queues(struct kfd_process *p)
+{
+	struct kfd_process_device *pdd;
+	int r, ret = 0;
+
+	list_for_each_entry(pdd, &p->per_device_data, per_device_list) {
+		r = pdd->dev->dqm->ops.restore_process_queues(pdd->dev->dqm,
+							      &pdd->qpd);
+		if (r) {
+			pr_err("Failed to restore process queues\n");
+			if (!ret)
+				ret = r;
+		}
+	}
+
+	return ret;
+}
+
+static void evict_process_worker(struct work_struct *work)
+{
+	int ret;
+	struct kfd_process *p;
+	struct delayed_work *dwork;
+
+	dwork = to_delayed_work(work);
+
+	/* Process termination cancels this work item before the process is
+	 * freed, so kfd_process p stays valid for the lifetime of this
+	 * function
+	 */
+	p = container_of(dwork, struct kfd_process, eviction_work);
+	WARN_ONCE(p->last_eviction_seqno != p->ef->seqno,
+		  "Eviction fence mismatch\n");
+
+	/* A narrow window of overlap between the restore and evict work
+	 * items is possible. Once amdgpu_amdkfd_gpuvm_restore_process_bos
+	 * unreserves the KFD BOs, the process can be evicted again while
+	 * restore still has a few more steps to finish. So wait for any
+	 * previous restore work to complete first.
+	 */
+	flush_delayed_work(&p->restore_work);
+
+	pr_debug("Started evicting pasid %d\n", p->pasid);
+	ret = process_evict_queues(p);
+	if (!ret) {
+		dma_fence_signal(p->ef);
+		dma_fence_put(p->ef);
+		p->ef = NULL;
+		schedule_delayed_work(&p->restore_work,
+				msecs_to_jiffies(PROCESS_RESTORE_TIME_MS));
+
+		pr_debug("Finished evicting pasid %d\n", p->pasid);
+	} else {
+		pr_err("Failed to evict queues of pasid %d\n", p->pasid);
+	}
+}
+
+static void restore_process_worker(struct work_struct *work)
+{
+	struct delayed_work *dwork;
+	struct kfd_process *p;
+	struct kfd_process_device *pdd;
+	int ret = 0;
+
+	dwork = to_delayed_work(work);
+
+	/* Process termination cancels this work item before the process is
+	 * freed, so kfd_process p stays valid for the lifetime of this
+	 * function
+	 */
+	p = container_of(dwork, struct kfd_process, restore_work);
+
+	/* Call restore_process_bos on the first KGD device. This function
+	 * takes care of restoring the whole process, including its BOs on
+	 * other devices. Restore can fail if not enough memory is
+	 * available; if so, schedule another attempt.
+	 */
+	pdd = list_first_entry(&p->per_device_data,
+			       struct kfd_process_device,
+			       per_device_list);
+
+	pr_debug("Started restoring pasid %d\n", p->pasid);
+
+	/* Set last_restore_timestamp before the restore actually succeeds.
+	 * Otherwise it would have to be set by KGD (restore_process_bos)
+	 * before the KFD BOs are unreserved; if it is not, the process can
+	 * be evicted again before the timestamp is updated.
+	 * If the restore fails, the timestamp is set again on the next
+	 * attempt. This means the minimum GPU quantum is effectively
+	 * PROCESS_ACTIVE_TIME_MS minus the time it takes to execute the
+	 * following two functions.
+	 */
+
+	p->last_restore_timestamp = get_jiffies_64();
+	ret = pdd->dev->kfd2kgd->restore_process_bos(p->kgd_process_info,
+						     &p->ef);
+	if (ret) {
+		pr_debug("Failed to restore BOs of pasid %d, retry after %d ms\n",
+			 p->pasid, PROCESS_BACK_OFF_TIME_MS);
+		ret = schedule_delayed_work(&p->restore_work,
+				msecs_to_jiffies(PROCESS_BACK_OFF_TIME_MS));
+		WARN(!ret, "reschedule restore work failed\n");
+		return;
+	}
+
+	ret = process_restore_queues(p);
+	if (!ret)
+		pr_debug("Finished restoring pasid %d\n", p->pasid);
+	else
+		pr_err("Failed to restore queues of pasid %d\n", p->pasid);
+}
+
+void kfd_suspend_all_processes(void)
+{
+	struct kfd_process *p;
+	unsigned int temp;
+	int idx = srcu_read_lock(&kfd_processes_srcu);
+
+	hash_for_each_rcu(kfd_processes_table, temp, p, kfd_processes) {
+		cancel_delayed_work_sync(&p->eviction_work);
+		cancel_delayed_work_sync(&p->restore_work);
+
+		if (process_evict_queues(p))
+			pr_err("Failed to suspend process %d\n", p->pasid);
+		dma_fence_signal(p->ef);
+		dma_fence_put(p->ef);
+		p->ef = NULL;
+	}
+	srcu_read_unlock(&kfd_processes_srcu, idx);
+}
+
+int kfd_resume_all_processes(void)
+{
+	struct kfd_process *p;
+	unsigned int temp;
+	int ret = 0, idx = srcu_read_lock(&kfd_processes_srcu);
+
+	hash_for_each_rcu(kfd_processes_table, temp, p, kfd_processes) {
+		if (!schedule_delayed_work(&p->restore_work, 0)) {
+			pr_err("Restore process %d failed during resume\n",
+			       p->pasid);
+			ret = -EFAULT;
+		}
+	}
+	srcu_read_unlock(&kfd_processes_srcu, idx);
+	return ret;
+}
+
 int kfd_reserved_mem_mmap(struct kfd_process *process,
 			  struct vm_area_struct *vma)
 {
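
Taken together, the two work items form a simple cycle: evict_process_worker quiesces the user queues and signals the eviction fence so the amdgpu side can move the BOs, then schedules restore_process_worker; the restore worker revalidates the BOs (rescheduling itself with a back-off if memory is short) and finally re-enables the queues. A condensed, stand-alone model of that sequencing (illustrative only, no kernel APIs; the loop stands in for the rescheduled work item) is:

#include <stdbool.h>
#include <stdio.h>

struct demo_process {
	bool queues_active;
	bool eviction_fence_signaled;
	int restore_tries;	/* pretend the first BO restore fails */
};

static void demo_evict_worker(struct demo_process *p)
{
	p->queues_active = false;		/* process_evict_queues() */
	p->eviction_fence_signaled = true;	/* let the BOs be moved */
	printf("evicted, restore due in ~PROCESS_RESTORE_TIME_MS\n");
}

static bool demo_restore_bos(struct demo_process *p)
{
	return ++p->restore_tries > 1;	/* first try "runs out of memory" */
}

static void demo_restore_worker(struct demo_process *p)
{
	while (!demo_restore_bos(p))
		printf("restore failed, retry in ~PROCESS_BACK_OFF_TIME_MS\n");
	p->eviction_fence_signaled = false;	/* new fence installed */
	p->queues_active = true;		/* process_restore_queues() */
	printf("restored\n");
}

int main(void)
{
	struct demo_process p = { .queues_active = true };

	demo_evict_worker(&p);
	demo_restore_worker(&p);
	return 0;
}
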
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
