* [PATCH 00/25] Add KFD GPUVM support for dGPUs
From: Felix Kuehling @ 2018-01-27  1:09 UTC
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling

I split this into an AMDGPU and AMDKFD part. The bigger patches that
add lots of new code are not cherry-picked and squashed. Instead I
copied, reorganized and cleaned up the code by hand and then split it
into some semblance of a sensible history. I acknowledged major
contributors with Signed-off-by lines but didn't list everyone who
ever touched that code (would probably be most of the team).

I pushed an updated Thunk (rebased on ROCm 1.7) that works with this
KFD update. Most testing was done on Fiji with KFDTest (Yong started
working on open-sourcing it). I was also able to run the OpenCL
version of SHOC, though most sub-tests still fail.

KFDTest can manage VRAM and system memory, submit shader dispatches, and
receive events. I haven't tested multi-GPU yet, but in theory that
should also work, with system memory buffers shared between multiple
GPUs.

The big missing piece at this point is support for userptr memory
(user-allocated memory mapped for GPU access). That's going to be my
next patch series, which should enable a much wider range of real-world
applications.

AMDGPU:
Patches 1-5 are minor cleanups and fixes
Patches 6-10 add and implement KFD->KGD interfaces for GPUVM

AMDKFD:
Patches 11-13 are minor cleanups and fixes
Patches 14-25 add all the GPUVM memory management functionality

Felix Kuehling (22):
  drm/amdgpu: remove useless BUG_ONs
  drm/amdgpu: Fix header file dependencies
  drm/amdgpu: Fix wrong mask in get_atc_vmid_pasid_mapping_pasid
  drm/amdgpu: Remove unused kfd2kgd interface
  drm/amdgpu: Add KFD eviction fence
  drm/amdgpu: Update kgd2kfd_shared_resources for dGPU support
  drm/amdgpu: add amdgpu_sync_clone
  drm/amdgpu: Add GPUVM memory management functions for KFD
  drm/amdgpu: Add submit IB function for KFD
  drm/amdkfd: Add missing #ifdef CONFIG_AMD_IOMMU_V2 guard
  drm/amdkfd: Use per-device sched_policy
  drm/amdkfd: Add GPUVM virtual address space to PDD
  drm/amdkfd: Implement KFD process eviction/restore
  uapi: Fix type used in ioctl parameter structures
  drm/amdkfd: Remove limit on number of GPUs
  drm/amdkfd: Aperture setup for dGPUs
  drm/amdkfd: Add per-process IDR for buffer handles
  drm/amdkfd: Allocate CWSR trap handler memory for dGPUs
  drm/amdkfd: Add TC flush on VMID deallocation for Hawaii
  drm/amdkfd: Add ioctls for GPUVM memory management
  drm/amdkfd: Kmap event page for dGPUs
  drm/amdkfd: Add module option for testing large-BAR functionality

Harish Kasiviswanathan (1):
  drm/amdkfd: Remove unaligned memory access

Oak Zeng (1):
  drm/amdkfd: Populate DRM render device minor

Yong Zhao (1):
  drm/amdgpu: Replace kgd_mem with amdgpu_bo for kernel pinned gtt mem

 drivers/gpu/drm/amd/amdgpu/Makefile                |    2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c         |  127 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h         |  115 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c   |  196 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c  |   80 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c  |   82 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c   | 1500 ++++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c         |    4 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.h         |    2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h           |    6 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c           |   53 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_sync.h           |    1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c            |   25 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h             |    1 +
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c           |  484 +++++++
 drivers/gpu/drm/amd/amdkfd/kfd_crat.c              |    3 +
 drivers/gpu/drm/amd/amdkfd/kfd_device.c            |   65 +-
 .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c  |  290 +++-
 .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.h  |    9 +
 drivers/gpu/drm/amd/amdkfd/kfd_events.c            |   31 +-
 drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c       |   59 +-
 drivers/gpu/drm/amd/amdkfd/kfd_module.c            |    7 +
 drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c    |   37 +
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h              |   79 +-
 drivers/gpu/drm/amd/amdkfd/kfd_process.c           |  490 ++++++-
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c          |    4 +
 drivers/gpu/drm/amd/amdkfd/kfd_topology.h          |    1 +
 drivers/gpu/drm/amd/include/kgd_kfd_interface.h    |  101 +-
 include/uapi/linux/kfd_ioctl.h                     |   87 +-
 29 files changed, 3811 insertions(+), 130 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c

-- 
2.7.4

* [PATCH 01/25] drm/amdgpu: remove useless BUG_ONs
From: Felix Kuehling @ 2018-01-27  1:09 UTC
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling

Dereferencing NULL pointers will cause a BUG anyway. No need to do
an explicit check.
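
To illustrate the reasoning with a minimal sketch (hypothetical helper,
not part of this patch): the dereference itself already faults and BUGs
on a NULL pointer, so the preceding BUG_ON is dead weight.

	void touch(struct kgd_mem *mem)
	{
		BUG_ON(mem == NULL);	/* redundant; this pattern is removed below */
		mem->bo = NULL;		/* a NULL mem already faults and BUGs here */
	}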

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c        | 6 ------
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c | 2 --
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c | 2 --
 3 files changed, 10 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index 3abed1e..1f620b8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -212,10 +212,6 @@ int alloc_gtt_mem(struct kgd_dev *kgd, size_t size,
 	struct kgd_mem **mem = (struct kgd_mem **) mem_obj;
 	int r;
 
-	BUG_ON(kgd == NULL);
-	BUG_ON(gpu_addr == NULL);
-	BUG_ON(cpu_ptr == NULL);
-
 	*mem = kmalloc(sizeof(struct kgd_mem), GFP_KERNEL);
 	if ((*mem) == NULL)
 		return -ENOMEM;
@@ -270,8 +266,6 @@ void free_gtt_mem(struct kgd_dev *kgd, void *mem_obj)
 {
 	struct kgd_mem *mem = (struct kgd_mem *) mem_obj;
 
-	BUG_ON(mem == NULL);
-
 	amdgpu_bo_reserve(mem->bo, true);
 	amdgpu_bo_kunmap(mem->bo);
 	amdgpu_bo_unpin(mem->bo);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
index a9e6aea..74fcb8b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
@@ -812,8 +812,6 @@ static uint16_t get_fw_version(struct kgd_dev *kgd, enum kgd_engine_type type)
 	struct amdgpu_device *adev = (struct amdgpu_device *) kgd;
 	const union amdgpu_firmware_header *hdr;
 
-	BUG_ON(kgd == NULL);
-
 	switch (type) {
 	case KGD_ENGINE_PFP:
 		hdr = (const union amdgpu_firmware_header *)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
index b127259..c70c8e1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
@@ -775,8 +775,6 @@ static uint16_t get_fw_version(struct kgd_dev *kgd, enum kgd_engine_type type)
 	struct amdgpu_device *adev = (struct amdgpu_device *) kgd;
 	const union amdgpu_firmware_header *hdr;
 
-	BUG_ON(kgd == NULL);
-
 	switch (type) {
 	case KGD_ENGINE_PFP:
 		hdr = (const union amdgpu_firmware_header *)
-- 
2.7.4

* [PATCH 02/25] drm/amdgpu: Replace kgd_mem with amdgpu_bo for kernel pinned gtt mem
From: Felix Kuehling @ 2018-01-27  1:09 UTC
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Yong Zhao, Felix Kuehling

From: Yong Zhao <yong.zhao@amd.com>

The extra fields in struct kgd_mem aren't actually needed. This struct
will be used for GPUVM allocations later.
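
A hedged caller-side sketch of the change (variable names illustrative,
error handling omitted): the opaque mem_obj handle now carries the
struct amdgpu_bo pointer directly, so there is no separate kgd_mem
allocation to manage.

	void *mem_obj;		/* now a struct amdgpu_bo *, not a struct kgd_mem * */
	uint64_t gpu_addr;
	void *cpu_ptr;
	int r;

	r = alloc_gtt_mem(kgd, size, &mem_obj, &gpu_addr, &cpu_ptr);
	if (!r)
		free_gtt_mem(kgd, mem_obj);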

Signed-off-by: Yong Zhao <yong.zhao@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 48 ++++++++++++++----------------
 1 file changed, 23 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index 1f620b8..c9f204d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -209,16 +209,13 @@ int alloc_gtt_mem(struct kgd_dev *kgd, size_t size,
 			void **cpu_ptr)
 {
 	struct amdgpu_device *adev = (struct amdgpu_device *)kgd;
-	struct kgd_mem **mem = (struct kgd_mem **) mem_obj;
+	struct amdgpu_bo *bo = NULL;
 	int r;
-
-	*mem = kmalloc(sizeof(struct kgd_mem), GFP_KERNEL);
-	if ((*mem) == NULL)
-		return -ENOMEM;
+	uint64_t gpu_addr_tmp = 0;
+	void *cpu_ptr_tmp = NULL;
 
 	r = amdgpu_bo_create(adev, size, PAGE_SIZE, true, AMDGPU_GEM_DOMAIN_GTT,
-			     AMDGPU_GEM_CREATE_CPU_GTT_USWC, NULL, NULL, 0,
-			     &(*mem)->bo);
+			AMDGPU_GEM_CREATE_CPU_GTT_USWC, NULL, NULL, 0, &bo);
 	if (r) {
 		dev_err(adev->dev,
 			"failed to allocate BO for amdkfd (%d)\n", r);
@@ -226,52 +223,53 @@ int alloc_gtt_mem(struct kgd_dev *kgd, size_t size,
 	}
 
 	/* map the buffer */
-	r = amdgpu_bo_reserve((*mem)->bo, true);
+	r = amdgpu_bo_reserve(bo, true);
 	if (r) {
 		dev_err(adev->dev, "(%d) failed to reserve bo for amdkfd\n", r);
 		goto allocate_mem_reserve_bo_failed;
 	}
 
-	r = amdgpu_bo_pin((*mem)->bo, AMDGPU_GEM_DOMAIN_GTT,
-				&(*mem)->gpu_addr);
+	r = amdgpu_bo_pin(bo, AMDGPU_GEM_DOMAIN_GTT,
+				&gpu_addr_tmp);
 	if (r) {
 		dev_err(adev->dev, "(%d) failed to pin bo for amdkfd\n", r);
 		goto allocate_mem_pin_bo_failed;
 	}
-	*gpu_addr = (*mem)->gpu_addr;
 
-	r = amdgpu_bo_kmap((*mem)->bo, &(*mem)->cpu_ptr);
+	r = amdgpu_bo_kmap(bo, &cpu_ptr_tmp);
 	if (r) {
 		dev_err(adev->dev,
 			"(%d) failed to map bo to kernel for amdkfd\n", r);
 		goto allocate_mem_kmap_bo_failed;
 	}
-	*cpu_ptr = (*mem)->cpu_ptr;
 
-	amdgpu_bo_unreserve((*mem)->bo);
+	*mem_obj = bo;
+	*gpu_addr = gpu_addr_tmp;
+	*cpu_ptr = cpu_ptr_tmp;
+
+	amdgpu_bo_unreserve(bo);
 
 	return 0;
 
 allocate_mem_kmap_bo_failed:
-	amdgpu_bo_unpin((*mem)->bo);
+	amdgpu_bo_unpin(bo);
 allocate_mem_pin_bo_failed:
-	amdgpu_bo_unreserve((*mem)->bo);
+	amdgpu_bo_unreserve(bo);
 allocate_mem_reserve_bo_failed:
-	amdgpu_bo_unref(&(*mem)->bo);
+	amdgpu_bo_unref(&bo);
 
 	return r;
 }
 
 void free_gtt_mem(struct kgd_dev *kgd, void *mem_obj)
 {
-	struct kgd_mem *mem = (struct kgd_mem *) mem_obj;
-
-	amdgpu_bo_reserve(mem->bo, true);
-	amdgpu_bo_kunmap(mem->bo);
-	amdgpu_bo_unpin(mem->bo);
-	amdgpu_bo_unreserve(mem->bo);
-	amdgpu_bo_unref(&(mem->bo));
-	kfree(mem);
+	struct amdgpu_bo *bo = (struct amdgpu_bo *) mem_obj;
+
+	amdgpu_bo_reserve(bo, true);
+	amdgpu_bo_kunmap(bo);
+	amdgpu_bo_unpin(bo);
+	amdgpu_bo_unreserve(bo);
+	amdgpu_bo_unref(&(bo));
 }
 
 void get_local_mem_info(struct kgd_dev *kgd,
-- 
2.7.4

* [PATCH 03/25] drm/amdgpu: Fix header file dependencies
From: Felix Kuehling @ 2018-01-27  1:09 UTC
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h   | 1 +
 2 files changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
index 102dad3..65d5a4e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -26,6 +26,7 @@
 
 #include <drm/amdgpu_drm.h>
 #include <drm/gpu_scheduler.h>
+#include <drm/drm_print.h>
 
 /* max number of rings */
 #define AMDGPU_MAX_RINGS		18
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index 21a80f1..13c367a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -28,6 +28,7 @@
 #include <linux/kfifo.h>
 #include <linux/rbtree.h>
 #include <drm/gpu_scheduler.h>
+#include <drm/drm_file.h>
 
 #include "amdgpu_sync.h"
 #include "amdgpu_ring.h"
-- 
2.7.4

* [PATCH 04/25] drm/amdgpu: Fix wrong mask in get_atc_vmid_pasid_mapping_pasid
From: Felix Kuehling @ 2018-01-27  1:09 UTC
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c | 2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
index 74fcb8b..b8be7b96 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
@@ -787,7 +787,7 @@ static uint16_t get_atc_vmid_pasid_mapping_pasid(struct kgd_dev *kgd,
 	struct amdgpu_device *adev = (struct amdgpu_device *) kgd;
 
 	reg = RREG32(mmATC_VMID0_PASID_MAPPING + vmid);
-	return reg & ATC_VMID0_PASID_MAPPING__VALID_MASK;
+	return reg & ATC_VMID0_PASID_MAPPING__PASID_MASK;
 }
 
 static void write_vmid_invalidate_request(struct kgd_dev *kgd, uint8_t vmid)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
index c70c8e1..744c05b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
@@ -704,7 +704,7 @@ static uint16_t get_atc_vmid_pasid_mapping_pasid(struct kgd_dev *kgd,
 	struct amdgpu_device *adev = (struct amdgpu_device *) kgd;
 
 	reg = RREG32(mmATC_VMID0_PASID_MAPPING + vmid);
-	return reg & ATC_VMID0_PASID_MAPPING__VALID_MASK;
+	return reg & ATC_VMID0_PASID_MAPPING__PASID_MASK;
 }
 
 static void write_vmid_invalidate_request(struct kgd_dev *kgd, uint8_t vmid)
-- 
2.7.4

* [PATCH 05/25] drm/amdgpu: Remove unused kfd2kgd interface
From: Felix Kuehling @ 2018-01-27  1:09 UTC
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c |  9 ---------
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c | 10 ----------
 drivers/gpu/drm/amd/include/kgd_kfd_interface.h   |  2 --
 3 files changed, 21 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
index b8be7b96..1362181 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
@@ -139,7 +139,6 @@ static uint32_t kgd_address_watch_get_offset(struct kgd_dev *kgd,
 static bool get_atc_vmid_pasid_mapping_valid(struct kgd_dev *kgd, uint8_t vmid);
 static uint16_t get_atc_vmid_pasid_mapping_pasid(struct kgd_dev *kgd,
 							uint8_t vmid);
-static void write_vmid_invalidate_request(struct kgd_dev *kgd, uint8_t vmid);
 
 static uint16_t get_fw_version(struct kgd_dev *kgd, enum kgd_engine_type type);
 static void set_scratch_backing_va(struct kgd_dev *kgd,
@@ -196,7 +195,6 @@ static const struct kfd2kgd_calls kfd2kgd = {
 	.address_watch_get_offset = kgd_address_watch_get_offset,
 	.get_atc_vmid_pasid_mapping_pasid = get_atc_vmid_pasid_mapping_pasid,
 	.get_atc_vmid_pasid_mapping_valid = get_atc_vmid_pasid_mapping_valid,
-	.write_vmid_invalidate_request = write_vmid_invalidate_request,
 	.get_fw_version = get_fw_version,
 	.set_scratch_backing_va = set_scratch_backing_va,
 	.get_tile_config = get_tile_config,
@@ -790,13 +788,6 @@ static uint16_t get_atc_vmid_pasid_mapping_pasid(struct kgd_dev *kgd,
 	return reg & ATC_VMID0_PASID_MAPPING__PASID_MASK;
 }
 
-static void write_vmid_invalidate_request(struct kgd_dev *kgd, uint8_t vmid)
-{
-	struct amdgpu_device *adev = (struct amdgpu_device *) kgd;
-
-	WREG32(mmVM_INVALIDATE_REQUEST, 1 << vmid);
-}
-
 static void set_scratch_backing_va(struct kgd_dev *kgd,
 					uint64_t va, uint32_t vmid)
 {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
index 744c05b..5130eac 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
@@ -81,7 +81,6 @@ static int kgd_hqd_destroy(struct kgd_dev *kgd, void *mqd,
 				uint32_t queue_id);
 static int kgd_hqd_sdma_destroy(struct kgd_dev *kgd, void *mqd,
 				unsigned int utimeout);
-static void write_vmid_invalidate_request(struct kgd_dev *kgd, uint8_t vmid);
 static int kgd_address_watch_disable(struct kgd_dev *kgd);
 static int kgd_address_watch_execute(struct kgd_dev *kgd,
 					unsigned int watch_point_id,
@@ -99,7 +98,6 @@ static bool get_atc_vmid_pasid_mapping_valid(struct kgd_dev *kgd,
 		uint8_t vmid);
 static uint16_t get_atc_vmid_pasid_mapping_pasid(struct kgd_dev *kgd,
 		uint8_t vmid);
-static void write_vmid_invalidate_request(struct kgd_dev *kgd, uint8_t vmid);
 static uint16_t get_fw_version(struct kgd_dev *kgd, enum kgd_engine_type type);
 static void set_scratch_backing_va(struct kgd_dev *kgd,
 					uint64_t va, uint32_t vmid);
@@ -157,7 +155,6 @@ static const struct kfd2kgd_calls kfd2kgd = {
 			get_atc_vmid_pasid_mapping_pasid,
 	.get_atc_vmid_pasid_mapping_valid =
 			get_atc_vmid_pasid_mapping_valid,
-	.write_vmid_invalidate_request = write_vmid_invalidate_request,
 	.get_fw_version = get_fw_version,
 	.set_scratch_backing_va = set_scratch_backing_va,
 	.get_tile_config = get_tile_config,
@@ -707,13 +704,6 @@ static uint16_t get_atc_vmid_pasid_mapping_pasid(struct kgd_dev *kgd,
 	return reg & ATC_VMID0_PASID_MAPPING__PASID_MASK;
 }
 
-static void write_vmid_invalidate_request(struct kgd_dev *kgd, uint8_t vmid)
-{
-	struct amdgpu_device *adev = (struct amdgpu_device *) kgd;
-
-	WREG32(mmVM_INVALIDATE_REQUEST, 1 << vmid);
-}
-
 static int kgd_address_watch_disable(struct kgd_dev *kgd)
 {
 	return 0;
diff --git a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
index a6752bd..94eab54 100644
--- a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
+++ b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
@@ -258,8 +258,6 @@ struct kfd2kgd_calls {
 	uint16_t (*get_atc_vmid_pasid_mapping_pasid)(
 					struct kgd_dev *kgd,
 					uint8_t vmid);
-	void (*write_vmid_invalidate_request)(struct kgd_dev *kgd,
-					uint8_t vmid);
 
 	uint16_t (*get_fw_version)(struct kgd_dev *kgd,
 				enum kgd_engine_type type);
-- 
2.7.4

* [PATCH 06/25] drm/amdgpu: Add KFD eviction fence
From: Felix Kuehling @ 2018-01-27  1:09 UTC
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling

This fence is used by KFD to keep memory resident while user mode
queues are enabled. Trying to evict memory will trigger the
enable_signaling callback, which starts a KFD eviction; this involves
preempting user mode queues before signaling the fence.
There is one such fence per process.
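
For illustration, a hedged sketch of creating such a fence and attaching
it to a reserved BO so that eviction attempts hit enable_signaling (the
actual attachment to process BOs lands later in this series):

	struct amdgpu_amdkfd_fence *ef;

	ef = amdgpu_amdkfd_fence_create(dma_fence_context_alloc(1),
					current->mm);
	if (ef)
		/* Any evict/move of bo must now enable signaling on ef,
		 * which kicks off the KFD process eviction.
		 */
		amdgpu_bo_fence(bo, &ef->base, true);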

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/Makefile              |   1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h       |  15 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c | 196 +++++++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h         |   5 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c         |  18 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c          |  18 +++
 drivers/gpu/drm/amd/include/kgd_kfd_interface.h  |   6 +
 7 files changed, 256 insertions(+), 3 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c

diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile
index d6e5b72..43dc3f9 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -130,6 +130,7 @@ amdgpu-y += \
 # add amdkfd interfaces
 amdgpu-y += \
 	 amdgpu_amdkfd.o \
+	 amdgpu_amdkfd_fence.o \
 	 amdgpu_amdkfd_gfx_v8.o
 
 # add cgs
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index 2a519f9..8d92f5c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -29,6 +29,8 @@
 #include <linux/mmu_context.h>
 #include <kgd_kfd_interface.h>
 
+extern const struct kgd2kfd_calls *kgd2kfd;
+
 struct amdgpu_device;
 
 struct kgd_mem {
@@ -37,6 +39,19 @@ struct kgd_mem {
 	void *cpu_ptr;
 };
 
+/* KFD Memory Eviction */
+struct amdgpu_amdkfd_fence {
+	struct dma_fence base;
+	void *mm;
+	spinlock_t lock;
+	char timeline_name[TASK_COMM_LEN];
+};
+
+struct amdgpu_amdkfd_fence *amdgpu_amdkfd_fence_create(u64 context,
+						       void *mm);
+bool amd_kfd_fence_check_mm(struct dma_fence *f, void *mm);
+struct amdgpu_amdkfd_fence *to_amdgpu_amdkfd_fence(struct dma_fence *f);
+
 int amdgpu_amdkfd_init(void);
 void amdgpu_amdkfd_fini(void);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
new file mode 100644
index 0000000..252e44e
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
@@ -0,0 +1,196 @@
+/*
+ * Copyright 2016-2018 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#include <linux/dma-fence.h>
+#include <linux/spinlock.h>
+#include <linux/atomic.h>
+#include <linux/stacktrace.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include "amdgpu_amdkfd.h"
+
+const struct dma_fence_ops amd_kfd_fence_ops;
+static atomic_t fence_seq = ATOMIC_INIT(0);
+
+static int amd_kfd_fence_signal(struct dma_fence *f);
+
+/* Eviction Fence
+ * Fence helper functions to deal with KFD memory eviction.
+ * Big Idea - Since KFD submissions are done by user queues, a BO cannot be
+ *  evicted unless all the user queues for that process are evicted.
+ *
+ * All the BOs in a process share an eviction fence. When process X wants
+ * to map VRAM memory but TTM can't find enough space, TTM will attempt to
+ * evict BOs from its LRU list. TTM checks if the BO is valuable to evict
+ * by calling ttm_bo_driver->eviction_valuable().
+ *
+ * ttm_bo_driver->eviction_valuable() - will return false if the BO belongs
+ *  to process X. Otherwise, it will return true to indicate BO can be
+ *  evicted by TTM.
+ *
+ * If ttm_bo_driver->eviction_valuable returns true, then TTM will continue
+ * the eviction process for that BO by calling ttm_bo_evict --> amdgpu_bo_move
+ * --> amdgpu_copy_buffer(). This sets up a job in the GPU scheduler.
+ *
+ * GPU Scheduler (amd_sched_main) - sets up a cb (fence_add_callback) to
+ *  notify when the BO is free to move. fence_add_callback --> enable_signaling
+ *  --> amdgpu_amdkfd_fence.enable_signaling
+ *
+ * amdgpu_amdkfd_fence.enable_signaling - Start a work item that will quiesce
+ * user queues and signal fence. The work item will also start another delayed
+ * work item to restore BOs
+ */
+
+struct amdgpu_amdkfd_fence *amdgpu_amdkfd_fence_create(u64 context,
+						       void *mm)
+{
+	struct amdgpu_amdkfd_fence *fence = NULL;
+
+	fence = kzalloc(sizeof(*fence), GFP_KERNEL);
+	if (fence == NULL)
+		return NULL;
+
+	/* mm_struct mm is used as a void pointer to identify the parent
+	 * KFD process. Don't dereference it. The fence and any threads
+	 * using mm are guaranteed to be released before process termination.
+	 */
+	fence->mm = mm;
+	get_task_comm(fence->timeline_name, current);
+	spin_lock_init(&fence->lock);
+
+	dma_fence_init(&fence->base, &amd_kfd_fence_ops, &fence->lock,
+		   context, atomic_inc_return(&fence_seq));
+
+	return fence;
+}
+
+struct amdgpu_amdkfd_fence *to_amdgpu_amdkfd_fence(struct dma_fence *f)
+{
+	struct amdgpu_amdkfd_fence *fence;
+
+	if (!f)
+		return NULL;
+
+	fence = container_of(f, struct amdgpu_amdkfd_fence, base);
+	if (fence && f->ops == &amd_kfd_fence_ops)
+		return fence;
+
+	return NULL;
+}
+
+static const char *amd_kfd_fence_get_driver_name(struct dma_fence *f)
+{
+	return "amdgpu_amdkfd_fence";
+}
+
+static const char *amd_kfd_fence_get_timeline_name(struct dma_fence *f)
+{
+	struct amdgpu_amdkfd_fence *fence = to_amdgpu_amdkfd_fence(f);
+
+	return fence->timeline_name;
+}
+
+/**
+ * amd_kfd_fence_enable_signaling - This gets called when TTM wants to evict
+ *  a KFD BO and schedules a job to move the BO.
+ *  If the fence is already signaled, return true.
+ *  If the fence is not signaled, schedule a KFD process eviction work item.
+ */
+static bool amd_kfd_fence_enable_signaling(struct dma_fence *f)
+{
+	struct amdgpu_amdkfd_fence *fence = to_amdgpu_amdkfd_fence(f);
+
+	if (!fence)
+		return false;
+
+	if (dma_fence_is_signaled(f))
+		return true;
+
+	if (!kgd2kfd->schedule_evict_and_restore_process(
+				(struct mm_struct *)fence->mm, f))
+		return true;
+
+	return false;
+}
+
+static int amd_kfd_fence_signal(struct dma_fence *f)
+{
+	unsigned long flags;
+	int ret;
+
+	spin_lock_irqsave(f->lock, flags);
+	/* Set enabled bit so the callback will be called */
+	set_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &f->flags);
+	ret = dma_fence_signal_locked(f);
+	spin_unlock_irqrestore(f->lock, flags);
+
+	return ret;
+}
+
+/**
+ * amd_kfd_fence_release - callback to free the fence
+ *
+ * @fence: fence
+ *
+ * This function is called when the reference count becomes zero.
+ * It just RCU schedules freeing up the fence.
+ */
+static void amd_kfd_fence_release(struct dma_fence *f)
+{
+	struct amdgpu_amdkfd_fence *fence = to_amdgpu_amdkfd_fence(f);
+	/* Unconditionally signal the fence. The process is getting
+	 * terminated.
+	 */
+	if (WARN_ON(!fence))
+		return; /* Not an amdgpu_amdkfd_fence */
+
+	amd_kfd_fence_signal(f);
+	kfree_rcu(f, rcu);
+}
+
+/**
+ * amd_kfd_fence_check_mm - Check whether @mm matches the mm of fence @f.
+ *  Returns true if they match, false otherwise.
+ *
+ * @f: [IN] fence
+ * @mm: [IN] mm that needs to be verified
+ */
+bool amd_kfd_fence_check_mm(struct dma_fence *f, void *mm)
+{
+	struct amdgpu_amdkfd_fence *fence = to_amdgpu_amdkfd_fence(f);
+
+	if (!fence)
+		return false;
+	else if (fence->mm == mm)
+		return true;
+
+	return false;
+}
+
+const struct dma_fence_ops amd_kfd_fence_ops = {
+	.get_driver_name = amd_kfd_fence_get_driver_name,
+	.get_timeline_name = amd_kfd_fence_get_timeline_name,
+	.enable_signaling = amd_kfd_fence_enable_signaling,
+	.signaled = NULL,
+	.wait = dma_fence_default_wait,
+	.release = amd_kfd_fence_release,
+};
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
index 65d5a4e..ca00dd2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -36,8 +36,9 @@
 #define AMDGPU_MAX_UVD_ENC_RINGS	2
 
 /* some special values for the owner field */
-#define AMDGPU_FENCE_OWNER_UNDEFINED	((void*)0ul)
-#define AMDGPU_FENCE_OWNER_VM		((void*)1ul)
+#define AMDGPU_FENCE_OWNER_UNDEFINED	((void *)0ul)
+#define AMDGPU_FENCE_OWNER_VM		((void *)1ul)
+#define AMDGPU_FENCE_OWNER_KFD		((void *)2ul)
 
 #define AMDGPU_FENCE_FLAG_64BIT         (1 << 0)
 #define AMDGPU_FENCE_FLAG_INT           (1 << 1)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
index df65c66..0cb31d9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
@@ -31,6 +31,7 @@
 #include <drm/drmP.h>
 #include "amdgpu.h"
 #include "amdgpu_trace.h"
+#include "amdgpu_amdkfd.h"
 
 struct amdgpu_sync_entry {
 	struct hlist_node	node;
@@ -86,10 +87,18 @@ static bool amdgpu_sync_same_dev(struct amdgpu_device *adev,
 static void *amdgpu_sync_get_owner(struct dma_fence *f)
 {
 	struct drm_sched_fence *s_fence = to_drm_sched_fence(f);
+	struct amdgpu_amdkfd_fence *kfd_fence;
+
+	if (!f)
+		return AMDGPU_FENCE_OWNER_UNDEFINED;
 
 	if (s_fence)
 		return s_fence->owner;
 
+	kfd_fence = to_amdgpu_amdkfd_fence(f);
+	if (kfd_fence)
+		return AMDGPU_FENCE_OWNER_KFD;
+
 	return AMDGPU_FENCE_OWNER_UNDEFINED;
 }
 
@@ -204,11 +213,18 @@ int amdgpu_sync_resv(struct amdgpu_device *adev,
 	for (i = 0; i < flist->shared_count; ++i) {
 		f = rcu_dereference_protected(flist->shared[i],
 					      reservation_object_held(resv));
+		/* We only want to trigger KFD eviction fences on
+		 * evict or move jobs. Skip KFD fences otherwise.
+		 */
+		fence_owner = amdgpu_sync_get_owner(f);
+		if (fence_owner == AMDGPU_FENCE_OWNER_KFD &&
+		    owner != AMDGPU_FENCE_OWNER_UNDEFINED)
+			continue;
+
 		if (amdgpu_sync_same_dev(adev, f)) {
 			/* VM updates are only interesting
 			 * for other VM updates and moves.
 			 */
-			fence_owner = amdgpu_sync_get_owner(f);
 			if ((owner != AMDGPU_FENCE_OWNER_UNDEFINED) &&
 			    (fence_owner != AMDGPU_FENCE_OWNER_UNDEFINED) &&
 			    ((owner == AMDGPU_FENCE_OWNER_VM) !=
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index e4bb435..c3f33d3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -46,6 +46,7 @@
 #include "amdgpu.h"
 #include "amdgpu_object.h"
 #include "amdgpu_trace.h"
+#include "amdgpu_amdkfd.h"
 #include "bif/bif_4_1_d.h"
 
 #define DRM_FILE_PAGE_OFFSET (0x100000000ULL >> PAGE_SHIFT)
@@ -1170,6 +1171,23 @@ static bool amdgpu_ttm_bo_eviction_valuable(struct ttm_buffer_object *bo,
 {
 	unsigned long num_pages = bo->mem.num_pages;
 	struct drm_mm_node *node = bo->mem.mm_node;
+	struct reservation_object_list *flist;
+	struct dma_fence *f;
+	int i;
+
+	/* If bo is a KFD BO, check if the bo belongs to the current process.
+	 * If true, then return false as any KFD process needs all its BOs to
+	 * be resident to run successfully
+	 */
+	flist = reservation_object_get_list(bo->resv);
+	if (flist) {
+		for (i = 0; i < flist->shared_count; ++i) {
+			f = rcu_dereference_protected(flist->shared[i],
+				reservation_object_held(bo->resv));
+			if (amd_kfd_fence_check_mm(f, current->mm))
+				return false;
+		}
+	}
 
 	switch (bo->mem.mem_type) {
 	case TTM_PL_TT:
diff --git a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
index 94eab54..9e35249 100644
--- a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
+++ b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
@@ -30,6 +30,7 @@
 
 #include <linux/types.h>
 #include <linux/bitmap.h>
+#include <linux/dma-fence.h>
 
 struct pci_dev;
 
@@ -286,6 +287,9 @@ struct kfd2kgd_calls {
  *
  * @resume: Notifies amdkfd about a resume action done to a kgd device
  *
+ * @schedule_evict_and_restore_process: Schedules work queue that will prepare
+ * for safe eviction of KFD BOs that belong to the specified process.
+ *
  * This structure contains function callback pointers so the kgd driver
  * will notify to the amdkfd about certain status changes.
  *
@@ -300,6 +304,8 @@ struct kgd2kfd_calls {
 	void (*interrupt)(struct kfd_dev *kfd, const void *ih_ring_entry);
 	void (*suspend)(struct kfd_dev *kfd);
 	int (*resume)(struct kfd_dev *kfd);
+	int (*schedule_evict_and_restore_process)(struct mm_struct *mm,
+			struct dma_fence *fence);
 };
 
 int kgd2kfd_init(unsigned interface_version,
-- 
2.7.4

* [PATCH 07/25] drm/amdgpu: Update kgd2kfd_shared_resources for dGPU support
From: Felix Kuehling @ 2018-01-27  1:09 UTC
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling

Add GPUVM size and DRM render node to the shared resources. Also add a
function to query the compute VMID mask, to avoid hard-coding it in
multiple places later.
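
A hedged usage sketch (the wrapper function is hypothetical): call
sites can test a VMID against the compute mask without open-coding it.

	static bool fault_is_kfd(struct amdgpu_device *adev, u32 vmid)
	{
		/* replaces open-coded tests like ((1 << vmid) & 0xFF00) */
		return amdgpu_amdkfd_is_kfd_vmid(adev, vmid);
	}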

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c      | 19 +++++++++++++++++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h      |  2 ++
 drivers/gpu/drm/amd/include/kgd_kfd_interface.h |  6 ++++++
 3 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index c9f204d..294c467 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -30,6 +30,8 @@
 const struct kgd2kfd_calls *kgd2kfd;
 bool (*kgd2kfd_init_p)(unsigned int, const struct kgd2kfd_calls**);
 
+static const unsigned int compute_vmid_bitmap = 0xFF00;
+
 int amdgpu_amdkfd_init(void)
 {
 	int ret;
@@ -137,9 +139,12 @@ void amdgpu_amdkfd_device_init(struct amdgpu_device *adev)
 	int last_valid_bit;
 	if (adev->kfd) {
 		struct kgd2kfd_shared_resources gpu_resources = {
-			.compute_vmid_bitmap = 0xFF00,
+			.compute_vmid_bitmap = compute_vmid_bitmap,
 			.num_pipe_per_mec = adev->gfx.mec.num_pipe_per_mec,
-			.num_queue_per_pipe = adev->gfx.mec.num_queue_per_pipe
+			.num_queue_per_pipe = adev->gfx.mec.num_queue_per_pipe,
+			.gpuvm_size = adev->vm_manager.max_pfn
+						<< AMDGPU_GPU_PAGE_SHIFT,
+			.drm_render_minor = adev->ddev->render->index
 		};
 
 		/* this is going to have a few of the MSBs set that we need to
@@ -351,3 +356,13 @@ uint64_t amdgpu_amdkfd_get_vram_usage(struct kgd_dev *kgd)
 
 	return amdgpu_vram_mgr_usage(&adev->mman.bdev.man[TTM_PL_VRAM]);
 }
+
+bool amdgpu_amdkfd_is_kfd_vmid(struct amdgpu_device *adev, u32 vmid)
+{
+	if (adev->kfd) {
+		if ((1 << vmid) & compute_vmid_bitmap)
+			return true;
+	}
+
+	return false;
+}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index 8d92f5c..cc3aa13 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -66,6 +66,8 @@ void amdgpu_amdkfd_device_fini(struct amdgpu_device *adev);
 struct kfd2kgd_calls *amdgpu_amdkfd_gfx_7_get_functions(void);
 struct kfd2kgd_calls *amdgpu_amdkfd_gfx_8_0_get_functions(void);
 
+bool amdgpu_amdkfd_is_kfd_vmid(struct amdgpu_device *adev, u32 vmid);
+
 /* Shared API */
 int alloc_gtt_mem(struct kgd_dev *kgd, size_t size,
 			void **mem_obj, uint64_t *gpu_addr,
diff --git a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
index 9e35249..36c706a 100644
--- a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
+++ b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
@@ -108,6 +108,12 @@ struct kgd2kfd_shared_resources {
 
 	/* Number of bytes at start of aperture reserved for KGD. */
 	size_t doorbell_start_offset;
+
+	/* GPUVM address space size in bytes */
+	uint64_t gpuvm_size;
+
+	/* Minor device number of the render node */
+	int drm_render_minor;
 };
 
 struct tile_config {
-- 
2.7.4

* [PATCH 08/25] drm/amdgpu: add amdgpu_sync_clone
From: Felix Kuehling @ 2018-01-27  1:09 UTC
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling

Cloning a sync object is useful for waiting on its fences without
keeping the original structure locked indefinitely, which would block
other threads.
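
A hedged usage sketch (the surrounding lock is hypothetical): snapshot
the unsignaled fences under the lock, then wait on the clone after
dropping it.

	struct amdgpu_sync clone;
	int r;

	amdgpu_sync_create(&clone);
	mutex_lock(&lock);			/* protects 'source' */
	r = amdgpu_sync_clone(&source, &clone);
	mutex_unlock(&lock);
	if (!r)
		r = amdgpu_sync_wait(&clone, true);	/* interruptible */
	amdgpu_sync_free(&clone);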

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c | 35 ++++++++++++++++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_sync.h |  1 +
 2 files changed, 36 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
index 0cb31d9..b871b97 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
@@ -321,6 +321,41 @@ struct dma_fence *amdgpu_sync_get_fence(struct amdgpu_sync *sync, bool *explicit
 	return NULL;
 }
 
+/**
+ * amdgpu_sync_clone - clone a sync object
+ *
+ * @source: sync object to clone
+ * @clone: pointer to destination sync object
+ *
+ * Adds references to all unsignaled fences in @source to @clone. Also
+ * removes signaled fences from @source while at it.
+ */
+int amdgpu_sync_clone(struct amdgpu_sync *source, struct amdgpu_sync *clone)
+{
+	struct amdgpu_sync_entry *e;
+	struct hlist_node *tmp;
+	struct dma_fence *f;
+	int i, r;
+
+	hash_for_each_safe(source->fences, i, tmp, e, node) {
+		f = e->fence;
+		if (!dma_fence_is_signaled(f)) {
+			r = amdgpu_sync_fence(NULL, clone, f, e->explicit);
+			if (r)
+				return r;
+		} else {
+			hash_del(&e->node);
+			dma_fence_put(f);
+			kmem_cache_free(amdgpu_sync_slab, e);
+		}
+	}
+
+	dma_fence_put(clone->last_vm_update);
+	clone->last_vm_update = dma_fence_get(source->last_vm_update);
+
+	return 0;
+}
+
 int amdgpu_sync_wait(struct amdgpu_sync *sync, bool intr)
 {
 	struct amdgpu_sync_entry *e;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.h
index 7aba38d..10cf23a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.h
@@ -50,6 +50,7 @@ int amdgpu_sync_resv(struct amdgpu_device *adev,
 struct dma_fence *amdgpu_sync_peek_fence(struct amdgpu_sync *sync,
 				     struct amdgpu_ring *ring);
 struct dma_fence *amdgpu_sync_get_fence(struct amdgpu_sync *sync, bool *explicit);
+int amdgpu_sync_clone(struct amdgpu_sync *source, struct amdgpu_sync *clone);
 int amdgpu_sync_wait(struct amdgpu_sync *sync, bool intr);
 void amdgpu_sync_free(struct amdgpu_sync *sync);
 int amdgpu_sync_init(void);
-- 
2.7.4

* [PATCH 09/25] drm/amdgpu: Add GPUVM memory management functions for KFD
From: Felix Kuehling @ 2018-01-27  1:09 UTC
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/Makefile               |    1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c        |    1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h        |   94 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c |   66 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c |   67 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 1500 +++++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c        |    4 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.h        |    2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c           |    7 +
 drivers/gpu/drm/amd/include/kgd_kfd_interface.h   |   77 ++
 10 files changed, 1815 insertions(+), 4 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c

diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile
index 43dc3f9..180b2a8 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -131,6 +131,7 @@ amdgpu-y += \
 amdgpu-y += \
 	 amdgpu_amdkfd.o \
 	 amdgpu_amdkfd_fence.o \
+	 amdgpu_amdkfd_gpuvm.o \
 	 amdgpu_amdkfd_gfx_v8.o
 
 # add cgs
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index 294c467..a44b146 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -58,6 +58,7 @@ int amdgpu_amdkfd_init(void)
 #else
 	ret = -ENOENT;
 #endif
+	amdgpu_amdkfd_gpuvm_init_mem_limits();
 
 	return ret;
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index cc3aa13..5e5a9fb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -28,15 +28,44 @@
 #include <linux/types.h>
 #include <linux/mmu_context.h>
 #include <kgd_kfd_interface.h>
+#include <drm/ttm/ttm_execbuf_util.h>
+#include "amdgpu_sync.h"
+#include "amdgpu_vm.h"
 
 extern const struct kgd2kfd_calls *kgd2kfd;
 
 struct amdgpu_device;
 
+struct kfd_bo_va_list {
+	struct list_head bo_list;
+	struct amdgpu_bo_va *bo_va;
+	void *kgd_dev;
+	bool is_mapped;
+	uint64_t va;
+	uint64_t pte_flags;
+};
+
 struct kgd_mem {
+	struct mutex lock;
 	struct amdgpu_bo *bo;
-	uint64_t gpu_addr;
-	void *cpu_ptr;
+	struct list_head bo_va_list;
+	/* protected by amdkfd_process_info.lock */
+	struct ttm_validate_buffer validate_list;
+	struct ttm_validate_buffer resv_list;
+	uint32_t domain;
+	unsigned int mapped_to_gpu_memory;
+	uint64_t va;
+
+	uint32_t mapping_flags;
+
+	struct amdkfd_process_info *process_info;
+
+	struct amdgpu_sync sync;
+
+	/* flags bitfield */
+	bool coherent      : 1;
+	bool no_substitute : 1;
+	bool aql_queue     : 1;
 };
 
 /* KFD Memory Eviction */
@@ -52,6 +81,41 @@ struct amdgpu_amdkfd_fence *amdgpu_amdkfd_fence_create(u64 context,
 bool amd_kfd_fence_check_mm(struct dma_fence *f, void *mm);
 struct amdgpu_amdkfd_fence *to_amdgpu_amdkfd_fence(struct dma_fence *f);
 
+struct amdkfd_process_info {
+	/* List head of all VMs that belong to a KFD process */
+	struct list_head vm_list_head;
+	/* List head for all KFD BOs that belong to a KFD process. */
+	struct list_head kfd_bo_list;
+	/* Lock to protect kfd_bo_list */
+	struct mutex lock;
+
+	/* Number of VMs */
+	unsigned int n_vms;
+	/* Eviction Fence */
+	struct amdgpu_amdkfd_fence *eviction_fence;
+};
+
+/* struct amdkfd_vm -
+ * For Memory Eviction KGD requires a mechanism to keep track of all KFD BOs
+ * belonging to a KFD process. All the VMs belonging to the same process point
+ * to the same amdkfd_process_info.
+ */
+struct amdkfd_vm {
+	/* Keep base as the first parameter for pointer compatibility between
+	 * amdkfd_vm and amdgpu_vm.
+	 */
+	struct amdgpu_vm base;
+
+	/* List node in amdkfd_process_info.vm_list_head */
+	struct list_head vm_list_node;
+
+	struct amdgpu_device *adev;
+	/* Points to the KFD process VM info */
+	struct amdkfd_process_info *process_info;
+
+	uint64_t pd_phys_addr;
+};
+
 int amdgpu_amdkfd_init(void);
 void amdgpu_amdkfd_fini(void);
 
@@ -96,4 +160,30 @@ uint64_t amdgpu_amdkfd_get_vram_usage(struct kgd_dev *kgd);
 		valid;							\
 	})
 
+/* GPUVM API */
+int amdgpu_amdkfd_gpuvm_create_process_vm(struct kgd_dev *kgd, void **vm,
+					  void **process_info,
+					  struct dma_fence **ef);
+void amdgpu_amdkfd_gpuvm_destroy_process_vm(struct kgd_dev *kgd, void *vm);
+uint32_t amdgpu_amdkfd_gpuvm_get_process_page_dir(void *vm);
+int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
+		struct kgd_dev *kgd, uint64_t va, uint64_t size,
+		void *vm, struct kgd_mem **mem,
+		uint64_t *offset, uint32_t flags);
+int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
+		struct kgd_dev *kgd, struct kgd_mem *mem);
+int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
+		struct kgd_dev *kgd, struct kgd_mem *mem, void *vm);
+int amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu(
+		struct kgd_dev *kgd, struct kgd_mem *mem, void *vm);
+int amdgpu_amdkfd_gpuvm_sync_memory(
+		struct kgd_dev *kgd, struct kgd_mem *mem, bool intr);
+int amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel(struct kgd_dev *kgd,
+		struct kgd_mem *mem, void **kptr, uint64_t *size);
+int amdgpu_amdkfd_gpuvm_restore_process_bos(void *process_info,
+					    struct dma_fence **ef);
+
+void amdgpu_amdkfd_gpuvm_init_mem_limits(void);
+void amdgpu_amdkfd_unreserve_system_memory_limit(struct amdgpu_bo *bo);
+
 #endif /* AMDGPU_AMDKFD_H_INCLUDED */
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
index 1362181..65783d1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
@@ -143,6 +143,10 @@ static uint16_t get_atc_vmid_pasid_mapping_pasid(struct kgd_dev *kgd,
 static uint16_t get_fw_version(struct kgd_dev *kgd, enum kgd_engine_type type);
 static void set_scratch_backing_va(struct kgd_dev *kgd,
 					uint64_t va, uint32_t vmid);
+static void set_vm_context_page_table_base(struct kgd_dev *kgd, uint32_t vmid,
+		uint32_t page_table_base);
+static int invalidate_tlbs(struct kgd_dev *kgd, uint16_t pasid);
+static int invalidate_tlbs_vmid(struct kgd_dev *kgd, uint16_t vmid);
 
 /* Because of REG_GET_FIELD() being used, we put this function in the
  * asic specific file.
@@ -199,7 +203,20 @@ static const struct kfd2kgd_calls kfd2kgd = {
 	.set_scratch_backing_va = set_scratch_backing_va,
 	.get_tile_config = get_tile_config,
 	.get_cu_info = get_cu_info,
-	.get_vram_usage = amdgpu_amdkfd_get_vram_usage
+	.get_vram_usage = amdgpu_amdkfd_get_vram_usage,
+	.create_process_vm = amdgpu_amdkfd_gpuvm_create_process_vm,
+	.destroy_process_vm = amdgpu_amdkfd_gpuvm_destroy_process_vm,
+	.get_process_page_dir = amdgpu_amdkfd_gpuvm_get_process_page_dir,
+	.set_vm_context_page_table_base = set_vm_context_page_table_base,
+	.alloc_memory_of_gpu = amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu,
+	.free_memory_of_gpu = amdgpu_amdkfd_gpuvm_free_memory_of_gpu,
+	.map_memory_to_gpu = amdgpu_amdkfd_gpuvm_map_memory_to_gpu,
+	.unmap_memory_to_gpu = amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu,
+	.sync_memory = amdgpu_amdkfd_gpuvm_sync_memory,
+	.map_gtt_bo_to_kernel = amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel,
+	.restore_process_bos = amdgpu_amdkfd_gpuvm_restore_process_bos,
+	.invalidate_tlbs = invalidate_tlbs,
+	.invalidate_tlbs_vmid = invalidate_tlbs_vmid,
 };
 
 struct kfd2kgd_calls *amdgpu_amdkfd_gfx_7_get_functions(void)
@@ -855,3 +872,50 @@ static uint16_t get_fw_version(struct kgd_dev *kgd, enum kgd_engine_type type)
 	return hdr->common.ucode_version;
 }
 
+static void set_vm_context_page_table_base(struct kgd_dev *kgd, uint32_t vmid,
+			uint32_t page_table_base)
+{
+	struct amdgpu_device *adev = get_amdgpu_device(kgd);
+
+	if (!amdgpu_amdkfd_is_kfd_vmid(adev, vmid)) {
+		pr_err("trying to set page table base for wrong VMID\n");
+		return;
+	}
+	WREG32(mmVM_CONTEXT8_PAGE_TABLE_BASE_ADDR + vmid - 8, page_table_base);
+}
+
+static int invalidate_tlbs(struct kgd_dev *kgd, uint16_t pasid)
+{
+	struct amdgpu_device *adev = (struct amdgpu_device *) kgd;
+	int vmid;
+	unsigned int tmp;
+
+	for (vmid = 0; vmid < 16; vmid++) {
+		if (!amdgpu_amdkfd_is_kfd_vmid(adev, vmid))
+			continue;
+
+		tmp = RREG32(mmATC_VMID0_PASID_MAPPING + vmid);
+		if ((tmp & ATC_VMID0_PASID_MAPPING__VALID_MASK) &&
+			(tmp & ATC_VMID0_PASID_MAPPING__PASID_MASK) == pasid) {
+			WREG32(mmVM_INVALIDATE_REQUEST, 1 << vmid);
+			RREG32(mmVM_INVALIDATE_RESPONSE);
+			break;
+		}
+	}
+
+	return 0;
+}
+
+static int invalidate_tlbs_vmid(struct kgd_dev *kgd, uint16_t vmid)
+{
+	struct amdgpu_device *adev = (struct amdgpu_device *) kgd;
+
+	if (!amdgpu_amdkfd_is_kfd_vmid(adev, vmid)) {
+		pr_err("non kfd vmid\n");
+		return 0;
+	}
+
+	WREG32(mmVM_INVALIDATE_REQUEST, 1 << vmid);
+	RREG32(mmVM_INVALIDATE_RESPONSE);
+	return 0;
+}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
index 5130eac..1b5bf13 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
@@ -101,6 +101,10 @@ static uint16_t get_atc_vmid_pasid_mapping_pasid(struct kgd_dev *kgd,
 static uint16_t get_fw_version(struct kgd_dev *kgd, enum kgd_engine_type type);
 static void set_scratch_backing_va(struct kgd_dev *kgd,
 					uint64_t va, uint32_t vmid);
+static void set_vm_context_page_table_base(struct kgd_dev *kgd, uint32_t vmid,
+		uint32_t page_table_base);
+static int invalidate_tlbs(struct kgd_dev *kgd, uint16_t pasid);
+static int invalidate_tlbs_vmid(struct kgd_dev *kgd, uint16_t vmid);
 
 /* Because of REG_GET_FIELD() being used, we put this function in the
  * asic specific file.
@@ -159,7 +163,20 @@ static const struct kfd2kgd_calls kfd2kgd = {
 	.set_scratch_backing_va = set_scratch_backing_va,
 	.get_tile_config = get_tile_config,
 	.get_cu_info = get_cu_info,
-	.get_vram_usage = amdgpu_amdkfd_get_vram_usage
+	.get_vram_usage = amdgpu_amdkfd_get_vram_usage,
+	.create_process_vm = amdgpu_amdkfd_gpuvm_create_process_vm,
+	.destroy_process_vm = amdgpu_amdkfd_gpuvm_destroy_process_vm,
+	.get_process_page_dir = amdgpu_amdkfd_gpuvm_get_process_page_dir,
+	.set_vm_context_page_table_base = set_vm_context_page_table_base,
+	.alloc_memory_of_gpu = amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu,
+	.free_memory_of_gpu = amdgpu_amdkfd_gpuvm_free_memory_of_gpu,
+	.map_memory_to_gpu = amdgpu_amdkfd_gpuvm_map_memory_to_gpu,
+	.unmap_memory_to_gpu = amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu,
+	.sync_memory = amdgpu_amdkfd_gpuvm_sync_memory,
+	.map_gtt_bo_to_kernel = amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel,
+	.restore_process_bos = amdgpu_amdkfd_gpuvm_restore_process_bos,
+	.invalidate_tlbs = invalidate_tlbs,
+	.invalidate_tlbs_vmid = invalidate_tlbs_vmid,
 };
 
 struct kfd2kgd_calls *amdgpu_amdkfd_gfx_8_0_get_functions(void)
@@ -816,3 +833,51 @@ static uint16_t get_fw_version(struct kgd_dev *kgd, enum kgd_engine_type type)
 	/* Only 12 bit in use*/
 	return hdr->common.ucode_version;
 }
+
+static void set_vm_context_page_table_base(struct kgd_dev *kgd, uint32_t vmid,
+		uint32_t page_table_base)
+{
+	struct amdgpu_device *adev = get_amdgpu_device(kgd);
+
+	if (!amdgpu_amdkfd_is_kfd_vmid(adev, vmid)) {
+		pr_err("trying to set page table base for wrong VMID\n");
+		return;
+	}
+	WREG32(mmVM_CONTEXT8_PAGE_TABLE_BASE_ADDR + vmid - 8, page_table_base);
+}
+
+static int invalidate_tlbs(struct kgd_dev *kgd, uint16_t pasid)
+{
+	struct amdgpu_device *adev = (struct amdgpu_device *) kgd;
+	int vmid;
+	unsigned int tmp;
+
+	for (vmid = 0; vmid < 16; vmid++) {
+		if (!amdgpu_amdkfd_is_kfd_vmid(adev, vmid))
+			continue;
+
+		tmp = RREG32(mmATC_VMID0_PASID_MAPPING + vmid);
+		if ((tmp & ATC_VMID0_PASID_MAPPING__VALID_MASK) &&
+			(tmp & ATC_VMID0_PASID_MAPPING__PASID_MASK) == pasid) {
+			WREG32(mmVM_INVALIDATE_REQUEST, 1 << vmid);
+			RREG32(mmVM_INVALIDATE_RESPONSE);
+			break;
+		}
+	}
+
+	return 0;
+}
+
+static int invalidate_tlbs_vmid(struct kgd_dev *kgd, uint16_t vmid)
+{
+	struct amdgpu_device *adev = (struct amdgpu_device *) kgd;
+
+	if (!amdgpu_amdkfd_is_kfd_vmid(adev, vmid)) {
+		pr_err("non kfd vmid %d\n", vmid);
+		return -EINVAL;
+	}
+
+	WREG32(mmVM_INVALIDATE_REQUEST, 1 << vmid);
+	RREG32(mmVM_INVALIDATE_RESPONSE);
+	return 0;
+}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
new file mode 100644
index 0000000..b02c297
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -0,0 +1,1500 @@
+/*
+ * Copyright 2014-2018 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#define pr_fmt(fmt) "kfd2kgd: " fmt
+
+#include <linux/list.h>
+#include <drm/drmP.h>
+#include "amdgpu_object.h"
+#include "amdgpu_vm.h"
+#include "amdgpu_amdkfd.h"
+
+/* Special VM and GART address alignment needed for VI pre-Fiji due to
+ * a HW bug.
+ */
+#define VI_BO_SIZE_ALIGN (0x8000)
+
+/* Impose limit on how much memory KFD can use */
+static struct {
+	uint64_t max_system_mem_limit;
+	int64_t system_mem_used;
+	spinlock_t mem_limit_lock;
+} kfd_mem_limit;
+
+/* Struct used for amdgpu_amdkfd_bo_validate */
+struct amdgpu_vm_parser {
+	uint32_t        domain;
+	bool            wait;
+};
+
+static const char * const domain_bit_to_string[] = {
+		"CPU",
+		"GTT",
+		"VRAM",
+		"GDS",
+		"GWS",
+		"OA"
+};
+
+#define domain_string(domain) domain_bit_to_string[ffs(domain)-1]
+
+static inline struct amdgpu_device *get_amdgpu_device(struct kgd_dev *kgd)
+{
+	return (struct amdgpu_device *)kgd;
+}
+
+static bool check_if_add_bo_to_vm(struct amdgpu_vm *avm,
+		struct kgd_mem *mem)
+{
+	struct kfd_bo_va_list *entry;
+
+	list_for_each_entry(entry, &mem->bo_va_list, bo_list)
+		if (entry->bo_va->base.vm == avm)
+			return false;
+
+	return true;
+}
+
+/* Set memory usage limits. Currently the only limit is:
+ *  System (kernel) memory - 3/8th of system RAM,
+ *  computed below as (mem >> 1) - (mem >> 3)
+ */
+void amdgpu_amdkfd_gpuvm_init_mem_limits(void)
+{
+	struct sysinfo si;
+	uint64_t mem;
+
+	si_meminfo(&si);
+	mem = si.totalram - si.totalhigh;
+	mem *= si.mem_unit;
+
+	spin_lock_init(&kfd_mem_limit.mem_limit_lock);
+	kfd_mem_limit.max_system_mem_limit = (mem >> 1) - (mem >> 3);
+	pr_debug("Kernel memory limit %lluM\n",
+		(kfd_mem_limit.max_system_mem_limit >> 20));
+}
+
+static int amdgpu_amdkfd_reserve_system_mem_limit(struct amdgpu_device *adev,
+					      uint64_t size, u32 domain)
+{
+	size_t acc_size;
+	int ret = 0;
+
+	acc_size = ttm_bo_dma_acc_size(&adev->mman.bdev, size,
+				       sizeof(struct amdgpu_bo));
+
+	spin_lock(&kfd_mem_limit.mem_limit_lock);
+	if (domain == AMDGPU_GEM_DOMAIN_GTT) {
+		if (kfd_mem_limit.system_mem_used + (acc_size + size) >
+			kfd_mem_limit.max_system_mem_limit) {
+			ret = -ENOMEM;
+			goto err_no_mem;
+		}
+		kfd_mem_limit.system_mem_used += (acc_size + size);
+	}
+err_no_mem:
+	spin_unlock(&kfd_mem_limit.mem_limit_lock);
+	return ret;
+}
+
+static void unreserve_system_mem_limit(struct amdgpu_device *adev,
+				       uint64_t size, u32 domain)
+{
+	size_t acc_size;
+
+	acc_size = ttm_bo_dma_acc_size(&adev->mman.bdev, size,
+				       sizeof(struct amdgpu_bo));
+
+	spin_lock(&kfd_mem_limit.mem_limit_lock);
+	if (domain == AMDGPU_GEM_DOMAIN_GTT)
+		kfd_mem_limit.system_mem_used -= (acc_size + size);
+	WARN_ONCE(kfd_mem_limit.system_mem_used < 0,
+		  "kfd system memory accounting unbalanced");
+
+	spin_unlock(&kfd_mem_limit.mem_limit_lock);
+}
+
+void amdgpu_amdkfd_unreserve_system_memory_limit(struct amdgpu_bo *bo)
+{
+	spin_lock(&kfd_mem_limit.mem_limit_lock);
+
+	if (bo->preferred_domains == AMDGPU_GEM_DOMAIN_GTT) {
+		kfd_mem_limit.system_mem_used -=
+			(bo->tbo.acc_size + amdgpu_bo_size(bo));
+	}
+	WARN_ONCE(kfd_mem_limit.system_mem_used < 0,
+		  "kfd system memory accounting unbalanced");
+
+	spin_unlock(&kfd_mem_limit.mem_limit_lock);
+}
+
+/* amdgpu_amdkfd_remove_eviction_fence - Removes eviction fence(s) from BO's
+ *  reservation object.
+ *
+ * @bo: [IN] Remove eviction fence(s) from this BO
+ * @ef: [IN] If ef is specified, then this eviction fence is removed if it
+ *  is present in the shared list.
+ * @ef_list: [OUT] Returns list of eviction fences. These fences are removed
+ *  from BO's reservation object shared list.
+ * @ef_count: [OUT] Number of fences in ef_list.
+ *
+ * NOTE: If called with ef_list, then amdgpu_amdkfd_add_eviction_fence must be
+ *  called to restore the eviction fences and to avoid memory leak. This is
+ *  useful for shared BOs.
+ * NOTE: Must be called with BO reserved i.e. bo->tbo.resv->lock held.
+ */
+static int amdgpu_amdkfd_remove_eviction_fence(struct amdgpu_bo *bo,
+					struct amdgpu_amdkfd_fence *ef,
+					struct amdgpu_amdkfd_fence ***ef_list,
+					unsigned int *ef_count)
+{
+	struct reservation_object_list *fobj;
+	struct reservation_object *resv;
+	unsigned int i = 0, j = 0, k = 0, shared_count;
+	unsigned int count = 0;
+	struct amdgpu_amdkfd_fence **fence_list;
+
+	if (!ef && !ef_list)
+		return -EINVAL;
+
+	if (ef_list) {
+		*ef_list = NULL;
+		*ef_count = 0;
+	}
+
+	resv = bo->tbo.resv;
+	fobj = reservation_object_get_list(resv);
+
+	if (!fobj)
+		return 0;
+
+	preempt_disable();
+	write_seqcount_begin(&resv->seq);
+
+	/* Go through all the shared fences in the reservation object. If
+	 * ef is specified and it exists in the list, remove it and reduce the
+	 * count. If ef is not specified, then get the count of eviction fences
+	 * present.
+	 */
+	shared_count = fobj->shared_count;
+	for (i = 0; i < shared_count; ++i) {
+		struct dma_fence *f;
+
+		f = rcu_dereference_protected(fobj->shared[i],
+					      reservation_object_held(resv));
+
+		if (ef) {
+			if (f->context == ef->base.context) {
+				dma_fence_put(f);
+				fobj->shared_count--;
+			} else
+				RCU_INIT_POINTER(fobj->shared[j++], f);
+
+		} else if (to_amdgpu_amdkfd_fence(f))
+			count++;
+	}
+	write_seqcount_end(&resv->seq);
+	preempt_enable();
+
+	if (ef || !count)
+		return 0;
+
+	/* Alloc memory for count number of eviction fence pointers. Fill the
+	 * ef_list array and ef_count
+	 */
+	fence_list = kcalloc(count, sizeof(struct amdgpu_amdkfd_fence *),
+			     GFP_KERNEL);
+	if (!fence_list)
+		return -ENOMEM;
+
+	preempt_disable();
+	write_seqcount_begin(&resv->seq);
+
+	j = 0;
+	for (i = 0; i < shared_count; ++i) {
+		struct dma_fence *f;
+		struct amdgpu_amdkfd_fence *efence;
+
+		f = rcu_dereference_protected(fobj->shared[i],
+			reservation_object_held(resv));
+
+		efence = to_amdgpu_amdkfd_fence(f);
+		if (efence) {
+			fence_list[k++] = efence;
+			fobj->shared_count--;
+		} else
+			RCU_INIT_POINTER(fobj->shared[j++], f);
+	}
+
+	write_seqcount_end(&resv->seq);
+	preempt_enable();
+
+	*ef_list = fence_list;
+	*ef_count = k;
+
+	return 0;
+}
+
+/* amdgpu_amdkfd_add_eviction_fence - Adds eviction fence(s) back into BO's
+ *  reservation object.
+ *
+ * @bo: [IN] Add eviction fences to this BO
+ * @ef_list: [IN] List of eviction fences to be added
+ * @ef_count: [IN] Number of fences in ef_list.
+ *
+ * NOTE: Must call amdgpu_amdkfd_remove_eviction_fence before calling this
+ *  function.
+ */
+static void amdgpu_amdkfd_add_eviction_fence(struct amdgpu_bo *bo,
+				struct amdgpu_amdkfd_fence **ef_list,
+				unsigned int ef_count)
+{
+	int i;
+
+	if (!ef_list || !ef_count)
+		return;
+
+	for (i = 0; i < ef_count; i++) {
+		amdgpu_bo_fence(bo, &ef_list[i]->base, true);
+		/* Re-adding the fence takes an additional reference. Drop that
+		 * reference.
+		 */
+		dma_fence_put(&ef_list[i]->base);
+	}
+
+	kfree(ef_list);
+}
+
+static int amdgpu_amdkfd_bo_validate(struct amdgpu_bo *bo, uint32_t domain,
+				     bool wait)
+{
+	struct ttm_operation_ctx ctx = { false, false };
+	int ret;
+
+	if (WARN(amdgpu_ttm_tt_get_usermm(bo->tbo.ttm),
+		 "Called with userptr BO"))
+		return -EINVAL;
+
+	amdgpu_ttm_placement_from_domain(bo, domain);
+
+	ret = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
+	if (ret)
+		goto validate_fail;
+	if (wait) {
+		struct amdgpu_amdkfd_fence **ef_list;
+		unsigned int ef_count;
+
+		ret = amdgpu_amdkfd_remove_eviction_fence(bo, NULL, &ef_list,
+							  &ef_count);
+		if (ret)
+			goto validate_fail;
+
+		ttm_bo_wait(&bo->tbo, false, false);
+		amdgpu_amdkfd_add_eviction_fence(bo, ef_list, ef_count);
+	}
+
+validate_fail:
+	return ret;
+}
+
+static int amdgpu_amdkfd_validate(void *param, struct amdgpu_bo *bo)
+{
+	struct amdgpu_vm_parser *p = param;
+
+	return amdgpu_amdkfd_bo_validate(bo, p->domain, p->wait);
+}
+
+/* vm_validate_pt_pd_bos - Validate page table and directory BOs
+ *
+ * Page directories are not updated here because huge page handling
+ * during page table updates can invalidate page directory entries
+ * again. Page directories are only updated after updating page
+ * tables.
+ */
+static int vm_validate_pt_pd_bos(struct amdkfd_vm *vm)
+{
+	struct amdgpu_bo *pd = vm->base.root.base.bo;
+	struct amdgpu_device *adev = amdgpu_ttm_adev(pd->tbo.bdev);
+	struct amdgpu_vm_parser param;
+	uint64_t addr, flags = AMDGPU_PTE_VALID;
+	int ret;
+
+	param.domain = AMDGPU_GEM_DOMAIN_VRAM;
+	param.wait = false;
+
+	ret = amdgpu_vm_validate_pt_bos(adev, &vm->base, amdgpu_amdkfd_validate,
+					&param);
+	if (ret) {
+		pr_err("amdgpu: failed to validate PT BOs\n");
+		return ret;
+	}
+
+	ret = amdgpu_amdkfd_validate(&param, pd);
+	if (ret) {
+		pr_err("amdgpu: failed to validate PD\n");
+		return ret;
+	}
+
+	addr = amdgpu_bo_gpu_offset(vm->base.root.base.bo);
+	amdgpu_gart_get_vm_pde(adev, -1, &addr, &flags);
+	vm->pd_phys_addr = addr;
+
+	if (vm->base.use_cpu_for_update) {
+		ret = amdgpu_bo_kmap(pd, NULL);
+		if (ret) {
+			pr_err("amdgpu: failed to kmap PD, ret=%d\n", ret);
+			return ret;
+		}
+	}
+
+	return 0;
+}
+
+static int sync_vm_fence(struct amdgpu_device *adev, struct amdgpu_sync *sync,
+			 struct dma_fence *f)
+{
+	int ret = amdgpu_sync_fence(adev, sync, f, false);
+
+	/* Sync objects can't handle multiple GPUs (contexts) updating
+	 * sync->last_vm_update. Fortunately we don't need it for
+	 * KFD's purposes, so we can just drop that fence.
+	 */
+	if (sync->last_vm_update) {
+		dma_fence_put(sync->last_vm_update);
+		sync->last_vm_update = NULL;
+	}
+
+	return ret;
+}
+
+static int vm_update_pds(struct amdgpu_vm *vm, struct amdgpu_sync *sync)
+{
+	struct amdgpu_bo *pd = vm->root.base.bo;
+	struct amdgpu_device *adev = amdgpu_ttm_adev(pd->tbo.bdev);
+	int ret;
+
+	ret = amdgpu_vm_update_directories(adev, vm);
+	if (ret)
+		return ret;
+
+	return sync_vm_fence(adev, sync, vm->last_update);
+}
+
+/* add_bo_to_vm - Add a BO to a VM
+ *
+ * Everything that needs to be done only once when a BO is first added
+ * to a VM. It can later be mapped and unmapped many times without
+ * repeating these steps.
+ *
+ * 1. Allocate and initialize BO VA entry data structure
+ * 2. Add BO to the VM
+ * 3. Determine ASIC-specific PTE flags
+ * 4. Alloc page tables and directories if needed
+ * 4a.  Validate new page tables and directories
+ */
+static int add_bo_to_vm(struct amdgpu_device *adev, struct kgd_mem *mem,
+		struct amdgpu_vm *avm, bool is_aql,
+		struct kfd_bo_va_list **p_bo_va_entry)
+{
+	int ret;
+	struct kfd_bo_va_list *bo_va_entry;
+	struct amdkfd_vm *kvm = container_of(avm,
+					     struct amdkfd_vm, base);
+	struct amdgpu_bo *pd = avm->root.base.bo;
+	struct amdgpu_bo *bo = mem->bo;
+	uint64_t va = mem->va;
+	struct list_head *list_bo_va = &mem->bo_va_list;
+	unsigned long bo_size = bo->tbo.mem.size;
+
+	if (!va) {
+		pr_err("Invalid VA when adding BO to VM\n");
+		return -EINVAL;
+	}
+
+	if (is_aql)
+		va += bo_size;
+
+	bo_va_entry = kzalloc(sizeof(*bo_va_entry), GFP_KERNEL);
+	if (!bo_va_entry)
+		return -ENOMEM;
+
+	pr_debug("\t add VA 0x%llx - 0x%llx to vm %p\n", va,
+			va + bo_size, avm);
+
+	/* Add BO to VM internal data structures*/
+	bo_va_entry->bo_va = amdgpu_vm_bo_add(adev, avm, bo);
+	if (!bo_va_entry->bo_va) {
+		ret = -EINVAL;
+		pr_err("Failed to add BO object to VM. ret == %d\n",
+				ret);
+		goto err_vmadd;
+	}
+
+	bo_va_entry->va = va;
+	bo_va_entry->pte_flags = amdgpu_vm_get_pte_flags(adev,
+							 mem->mapping_flags);
+	bo_va_entry->kgd_dev = (void *)adev;
+	list_add(&bo_va_entry->bo_list, list_bo_va);
+
+	if (p_bo_va_entry)
+		*p_bo_va_entry = bo_va_entry;
+
+	/* Allocate new page tables if needed and validate them.
+	 * Clearing new page tables and validating them needs to wait
+	 * on move fences. We don't want that to trigger the eviction
+	 * fence, so remove it temporarily.
+	 */
+	amdgpu_amdkfd_remove_eviction_fence(pd,
+					kvm->process_info->eviction_fence,
+					NULL, NULL);
+
+	ret = amdgpu_vm_alloc_pts(adev, avm, va, amdgpu_bo_size(bo));
+	if (ret) {
+		pr_err("Failed to allocate pts, err=%d\n", ret);
+		goto err_alloc_pts;
+	}
+
+	ret = vm_validate_pt_pd_bos(kvm);
+	if (ret) {
+		pr_err("validate_pt_pd_bos() failed\n");
+		goto err_alloc_pts;
+	}
+
+	/* Add the eviction fence back */
+	amdgpu_bo_fence(pd, &kvm->process_info->eviction_fence->base, true);
+
+	return 0;
+
+err_alloc_pts:
+	amdgpu_bo_fence(pd, &kvm->process_info->eviction_fence->base, true);
+	amdgpu_vm_bo_rmv(adev, bo_va_entry->bo_va);
+	list_del(&bo_va_entry->bo_list);
+err_vmadd:
+	kfree(bo_va_entry);
+	return ret;
+}
+
+static void remove_bo_from_vm(struct amdgpu_device *adev,
+		struct kfd_bo_va_list *entry, unsigned long size)
+{
+	pr_debug("\t remove VA 0x%llx - 0x%llx in entry %p\n",
+			entry->va,
+			entry->va + size, entry);
+	amdgpu_vm_bo_rmv(adev, entry->bo_va);
+	list_del(&entry->bo_list);
+	kfree(entry);
+}
+
+static void add_kgd_mem_to_kfd_bo_list(struct kgd_mem *mem,
+				struct amdkfd_process_info *process_info)
+{
+	struct ttm_validate_buffer *entry = &mem->validate_list;
+	struct amdgpu_bo *bo = mem->bo;
+
+	INIT_LIST_HEAD(&entry->head);
+	entry->shared = true;
+	entry->bo = &bo->tbo;
+	mutex_lock(&process_info->lock);
+	list_add_tail(&entry->head, &process_info->kfd_bo_list);
+	mutex_unlock(&process_info->lock);
+}
+
+/* Reserving a BO and its page table BOs must happen atomically to
+ * avoid deadlocks. Some operations update multiple VMs at once. Track
+ * all the reservation info in a context structure. Optionally a sync
+ * object can track VM updates.
+ */
+struct bo_vm_reservation_context {
+	struct amdgpu_bo_list_entry kfd_bo; /* BO list entry for the KFD BO */
+	unsigned int n_vms;		    /* Number of VMs reserved	    */
+	struct amdgpu_bo_list_entry *vm_pd; /* Array of VM BO list entries  */
+	struct ww_acquire_ctx ticket;	    /* Reservation ticket	    */
+	struct list_head list, duplicates;  /* BO lists			    */
+	struct amdgpu_sync *sync;	    /* Pointer to sync object	    */
+	bool reserved;			    /* Whether BOs are reserved	    */
+};
+
+enum bo_vm_match {
+	BO_VM_NOT_MAPPED = 0,	/* Match VMs where a BO is not mapped */
+	BO_VM_MAPPED,		/* Match VMs where a BO is mapped     */
+	BO_VM_ALL,		/* Match all VMs a BO was added to    */
+};
+
+/**
+ * reserve_bo_and_vm - reserve a BO and a VM unconditionally.
+ * @mem: KFD BO structure.
+ * @vm: the VM to reserve.
+ * @ctx: the struct that will be used in unreserve_bo_and_vms().
+ */
+static int reserve_bo_and_vm(struct kgd_mem *mem,
+			      struct amdgpu_vm *vm,
+			      struct bo_vm_reservation_context *ctx)
+{
+	struct amdgpu_bo *bo = mem->bo;
+	int ret;
+
+	WARN_ON(!vm);
+
+	ctx->reserved = false;
+	ctx->n_vms = 1;
+	ctx->sync = &mem->sync;
+
+	INIT_LIST_HEAD(&ctx->list);
+	INIT_LIST_HEAD(&ctx->duplicates);
+
+	ctx->vm_pd = kcalloc(ctx->n_vms, sizeof(*ctx->vm_pd), GFP_KERNEL);
+	if (!ctx->vm_pd)
+		return -ENOMEM;
+
+	ctx->kfd_bo.robj = bo;
+	ctx->kfd_bo.priority = 0;
+	ctx->kfd_bo.tv.bo = &bo->tbo;
+	ctx->kfd_bo.tv.shared = true;
+	ctx->kfd_bo.user_pages = NULL;
+	list_add(&ctx->kfd_bo.tv.head, &ctx->list);
+
+	amdgpu_vm_get_pd_bo(vm, &ctx->list, &ctx->vm_pd[0]);
+
+	ret = ttm_eu_reserve_buffers(&ctx->ticket, &ctx->list,
+				     false, &ctx->duplicates);
+	if (!ret)
+		ctx->reserved = true;
+	else {
+		pr_err("Failed to reserve buffers in ttm\n");
+		kfree(ctx->vm_pd);
+		ctx->vm_pd = NULL;
+	}
+
+	return ret;
+}
+
+/**
+ * reserve_bo_and_cond_vms - reserve a BO and some VMs that the BO has been
+ * added to, conditionally based on map_type.
+ * @mem: KFD BO structure.
+ * @vm: the VM to reserve. If NULL, then all VMs associated with the BO
+ * are reserved. Otherwise, only the given VM is reserved.
+ * @map_type: the mapping status that will be used to filter the VMs.
+ * @ctx: the struct that will be used in unreserve_bo_and_vms().
+ *
+ * Returns 0 for success, negative for failure. If no VMs were reserved,
+ * -EINVAL is returned.
+ */
+static int reserve_bo_and_cond_vms(struct kgd_mem *mem,
+				struct amdgpu_vm *vm, enum bo_vm_match map_type,
+				struct bo_vm_reservation_context *ctx)
+{
+	struct amdgpu_bo *bo = mem->bo;
+	struct kfd_bo_va_list *entry;
+	unsigned int i;
+	int ret;
+
+	ctx->reserved = false;
+	ctx->n_vms = 0;
+	ctx->vm_pd = NULL;
+	ctx->sync = &mem->sync;
+
+	INIT_LIST_HEAD(&ctx->list);
+	INIT_LIST_HEAD(&ctx->duplicates);
+
+	list_for_each_entry(entry, &mem->bo_va_list, bo_list) {
+		if ((vm && vm != entry->bo_va->base.vm) ||
+			(entry->is_mapped != map_type
+			&& map_type != BO_VM_ALL))
+			continue;
+
+		ctx->n_vms++;
+	}
+
+	if (ctx->n_vms != 0) {
+		ctx->vm_pd = kcalloc(ctx->n_vms, sizeof(*ctx->vm_pd),
+				     GFP_KERNEL);
+		if (!ctx->vm_pd)
+			return -ENOMEM;
+	}
+
+	ctx->kfd_bo.robj = bo;
+	ctx->kfd_bo.priority = 0;
+	ctx->kfd_bo.tv.bo = &bo->tbo;
+	ctx->kfd_bo.tv.shared = true;
+	ctx->kfd_bo.user_pages = NULL;
+	list_add(&ctx->kfd_bo.tv.head, &ctx->list);
+
+	i = 0;
+	list_for_each_entry(entry, &mem->bo_va_list, bo_list) {
+		if ((vm && vm != entry->bo_va->base.vm) ||
+			(entry->is_mapped != map_type
+			&& map_type != BO_VM_ALL))
+			continue;
+
+		amdgpu_vm_get_pd_bo(entry->bo_va->base.vm, &ctx->list,
+				&ctx->vm_pd[i]);
+		i++;
+	}
+
+	ret = ttm_eu_reserve_buffers(&ctx->ticket, &ctx->list,
+				     false, &ctx->duplicates);
+	if (!ret)
+		ctx->reserved = true;
+	else {
+		pr_err("Failed to reserve buffers in ttm.\n");
+		kfree(ctx->vm_pd);
+		ctx->vm_pd = NULL;
+	}
+
+	return ret;
+}
+
+/**
+ * unreserve_bo_and_vms - Unreserve BO and VMs from a reservation context
+ * @ctx: Reservation context to unreserve
+ * @wait: Optionally wait for a sync object representing pending VM updates
+ * @intr: Whether the wait is interruptible
+ *
+ * Also frees any resources allocated in
+ * reserve_bo_and_(cond_)vm(s). Returns the status from
+ * amdgpu_sync_wait.
+ */
+static int unreserve_bo_and_vms(struct bo_vm_reservation_context *ctx,
+				 bool wait, bool intr)
+{
+	int ret = 0;
+
+	if (wait)
+		ret = amdgpu_sync_wait(ctx->sync, intr);
+
+	if (ctx->reserved)
+		ttm_eu_backoff_reservation(&ctx->ticket, &ctx->list);
+	kfree(ctx->vm_pd);
+
+	ctx->sync = NULL;
+
+	ctx->reserved = false;
+	ctx->vm_pd = NULL;
+
+	return ret;
+}
+
+static int unmap_bo_from_gpuvm(struct amdgpu_device *adev,
+				struct kfd_bo_va_list *entry,
+				struct amdgpu_sync *sync)
+{
+	struct amdgpu_bo_va *bo_va = entry->bo_va;
+	struct amdgpu_vm *vm = bo_va->base.vm;
+	struct amdkfd_vm *kvm = container_of(vm, struct amdkfd_vm, base);
+	struct amdgpu_bo *pd = vm->root.base.bo;
+
+	/* Remove eviction fence from PD (and thereby from PTs too as
+	 * they share the resv. object). Otherwise during PT update
+	 * job (see amdgpu_vm_bo_update_mapping), eviction fence would
+	 * get added to job->sync object and job execution would
+	 * trigger the eviction fence.
+	 */
+	amdgpu_amdkfd_remove_eviction_fence(pd,
+					    kvm->process_info->eviction_fence,
+					    NULL, NULL);
+	amdgpu_vm_bo_unmap(adev, bo_va, entry->va);
+
+	amdgpu_vm_clear_freed(adev, vm, &bo_va->last_pt_update);
+
+	/* Add the eviction fence back */
+	amdgpu_bo_fence(pd, &kvm->process_info->eviction_fence->base, true);
+
+	sync_vm_fence(adev, sync, bo_va->last_pt_update);
+
+	return 0;
+}
+
+static int update_gpuvm_pte(struct amdgpu_device *adev,
+		struct kfd_bo_va_list *entry,
+		struct amdgpu_sync *sync)
+{
+	int ret;
+	struct amdgpu_vm *vm;
+	struct amdgpu_bo_va *bo_va;
+	struct amdgpu_bo *bo;
+
+	bo_va = entry->bo_va;
+	vm = bo_va->base.vm;
+	bo = bo_va->base.bo;
+
+	/* Update the page tables  */
+	ret = amdgpu_vm_bo_update(adev, bo_va, false);
+	if (ret) {
+		pr_err("amdgpu_vm_bo_update failed\n");
+		return ret;
+	}
+
+	return sync_vm_fence(adev, sync, bo_va->last_pt_update);
+}
+
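+/* Map a BO at its assigned virtual address and update the page
+ * tables. The mapping is rolled back if the PTE update fails.
+ */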
+static int map_bo_to_gpuvm(struct amdgpu_device *adev,
+		struct kfd_bo_va_list *entry, struct amdgpu_sync *sync)
+{
+	int ret;
+
+	/* Set virtual address for the allocation */
+	ret = amdgpu_vm_bo_map(adev, entry->bo_va, entry->va, 0,
+			       amdgpu_bo_size(entry->bo_va->base.bo),
+			       entry->pte_flags);
+	if (ret) {
+		pr_err("Failed to map VA 0x%llx in vm. ret %d\n",
+				entry->va, ret);
+		return ret;
+	}
+
+	ret = update_gpuvm_pte(adev, entry, sync);
+	if (ret) {
+		pr_err("update_gpuvm_pte() failed\n");
+		goto update_gpuvm_pte_failed;
+	}
+
+	return 0;
+
+update_gpuvm_pte_failed:
+	unmap_bo_from_gpuvm(adev, entry, sync);
+	return ret;
+}
+
+static int process_validate_vms(struct amdkfd_process_info *process_info)
+{
+	struct amdkfd_vm *peer_vm;
+	int ret;
+
+	list_for_each_entry(peer_vm, &process_info->vm_list_head,
+			    vm_list_node) {
+		ret = vm_validate_pt_pd_bos(peer_vm);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
+static int process_update_pds(struct amdkfd_process_info *process_info,
+			      struct amdgpu_sync *sync)
+{
+	struct amdkfd_vm *peer_vm;
+	int ret;
+
+	list_for_each_entry(peer_vm, &process_info->vm_list_head,
+			    vm_list_node) {
+		ret = vm_update_pds(&peer_vm->base, sync);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
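+/* Create a GPUVM address space for a KFD process. On the first call
+ * for a process this also allocates the shared amdkfd_process_info
+ * and the eviction fence that protects the process' BOs.
+ */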
+int amdgpu_amdkfd_gpuvm_create_process_vm(struct kgd_dev *kgd, void **vm,
+					  void **process_info,
+					  struct dma_fence **ef)
+{
+	int ret;
+	struct amdkfd_vm *new_vm;
+	struct amdkfd_process_info *info;
+	struct amdgpu_device *adev = get_amdgpu_device(kgd);
+
+	new_vm = kzalloc(sizeof(*new_vm), GFP_KERNEL);
+	if (!new_vm)
+		return -ENOMEM;
+
+	/* Initialize the VM context, allocate the page directory and zero it */
+	ret = amdgpu_vm_init(adev, &new_vm->base, AMDGPU_VM_CONTEXT_COMPUTE, 0);
+	if (ret) {
+		pr_err("Failed init vm ret %d\n", ret);
+		goto vm_init_fail;
+	}
+	new_vm->adev = adev;
+
+	if (!*process_info) {
+		info = kzalloc(sizeof(*info), GFP_KERNEL);
+		if (!info) {
+			ret = -ENOMEM;
+			goto alloc_process_info_fail;
+		}
+
+		mutex_init(&info->lock);
+		INIT_LIST_HEAD(&info->vm_list_head);
+		INIT_LIST_HEAD(&info->kfd_bo_list);
+
+		info->eviction_fence =
+			amdgpu_amdkfd_fence_create(dma_fence_context_alloc(1),
+						   current->mm);
+		if (!info->eviction_fence) {
+			pr_err("Failed to create eviction fence\n");
+			ret = -ENOMEM;
+			goto create_evict_fence_fail;
+		}
+
+		*process_info = info;
+		*ef = dma_fence_get(&info->eviction_fence->base);
+	}
+
+	new_vm->process_info = *process_info;
+
+	mutex_lock(&new_vm->process_info->lock);
+	list_add_tail(&new_vm->vm_list_node,
+			&(new_vm->process_info->vm_list_head));
+	new_vm->process_info->n_vms++;
+	mutex_unlock(&new_vm->process_info->lock);
+
+	*vm = (void *) new_vm;
+
+	pr_debug("Created process vm %p\n", *vm);
+
+	return ret;
+
+create_evict_fence_fail:
+	kfree(info);
+alloc_process_info_fail:
+	amdgpu_vm_fini(adev, &new_vm->base);
+vm_init_fail:
+	kfree(new_vm);
+	return ret;
+}
+
+void amdgpu_amdkfd_gpuvm_destroy_process_vm(struct kgd_dev *kgd, void *vm)
+{
+	struct amdgpu_device *adev = get_amdgpu_device(kgd);
+	struct amdkfd_vm *kfd_vm = (struct amdkfd_vm *) vm;
+	struct amdgpu_vm *avm = &kfd_vm->base;
+	struct amdgpu_bo *pd;
+	struct amdkfd_process_info *process_info;
+
+	if (WARN_ON(!kgd || !vm))
+		return;
+
+	pr_debug("Destroying process vm %p\n", vm);
+	/* Release eviction fence from PD */
+	pd = avm->root.base.bo;
+	amdgpu_bo_reserve(pd, false);
+	amdgpu_bo_fence(pd, NULL, false);
+	amdgpu_bo_unreserve(pd);
+
+	process_info = kfd_vm->process_info;
+
+	mutex_lock(&process_info->lock);
+	process_info->n_vms--;
+	list_del(&kfd_vm->vm_list_node);
+	mutex_unlock(&process_info->lock);
+
+	/* Release per-process resources */
+	if (!process_info->n_vms) {
+		WARN_ON(!list_empty(&process_info->kfd_bo_list));
+
+		dma_fence_put(&process_info->eviction_fence->base);
+		kfree(process_info);
+	}
+
+	/* Release the VM context */
+	amdgpu_vm_fini(adev, avm);
+	kfree(vm);
+}
+
+uint32_t amdgpu_amdkfd_gpuvm_get_process_page_dir(void *vm)
+{
+	struct amdkfd_vm *avm = (struct amdkfd_vm *)vm;
+
+	return avm->pd_phys_addr >> AMDGPU_GPU_PAGE_SHIFT;
+}
+
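+/* Allocate a GPUVM buffer in VRAM or GTT according to flags and
+ * prepare it for mapping at virtual address va later. For AQL queues
+ * only half the requested size is allocated, because the buffer will
+ * be mapped twice to work around a queue wraparound bug.
+ */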
+int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
+		struct kgd_dev *kgd, uint64_t va, uint64_t size,
+		void *vm, struct kgd_mem **mem,
+		uint64_t *offset, uint32_t flags)
+{
+	struct amdgpu_device *adev = get_amdgpu_device(kgd);
+	struct amdkfd_vm *kfd_vm = (struct amdkfd_vm *)vm;
+	struct amdgpu_bo *bo;
+	int byte_align;
+	u32 alloc_domain;
+	u64 alloc_flags;
+	uint32_t mapping_flags;
+	int ret;
+
+	/*
+	 * Check on which domain to allocate BO
+	 */
+	if (flags & ALLOC_MEM_FLAGS_VRAM) {
+		alloc_domain = AMDGPU_GEM_DOMAIN_VRAM;
+		alloc_flags = AMDGPU_GEM_CREATE_VRAM_CLEARED;
+		alloc_flags |= (flags & ALLOC_MEM_FLAGS_PUBLIC) ?
+			AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED :
+			AMDGPU_GEM_CREATE_NO_CPU_ACCESS;
+	} else if (flags & ALLOC_MEM_FLAGS_GTT) {
+		alloc_domain = AMDGPU_GEM_DOMAIN_GTT;
+		alloc_flags = 0;
+	} else {
+		return -EINVAL;
+	}
+
+	*mem = kzalloc(sizeof(struct kgd_mem), GFP_KERNEL);
+	if (!*mem)
+		return -ENOMEM;
+	INIT_LIST_HEAD(&(*mem)->bo_va_list);
+	mutex_init(&(*mem)->lock);
+	(*mem)->coherent      = !!(flags & ALLOC_MEM_FLAGS_COHERENT);
+	(*mem)->no_substitute = !!(flags & ALLOC_MEM_FLAGS_NO_SUBSTITUTE);
+	(*mem)->aql_queue     = !!(flags & ALLOC_MEM_FLAGS_AQL_QUEUE_MEM);
+
+	/* Workaround for AQL queue wraparound bug. Map the same
+	 * memory twice. That means we only actually allocate half
+	 * the memory.
+	 */
+	if ((*mem)->aql_queue)
+		size = size >> 1;
+
+	/* Workaround for TLB bug on older VI chips */
+	byte_align = (adev->family == AMDGPU_FAMILY_VI &&
+			adev->asic_type != CHIP_FIJI &&
+			adev->asic_type != CHIP_POLARIS10 &&
+			adev->asic_type != CHIP_POLARIS11) ?
+			VI_BO_SIZE_ALIGN : 1;
+
+	mapping_flags = AMDGPU_VM_PAGE_READABLE;
+	if (flags & ALLOC_MEM_FLAGS_WRITABLE)
+		mapping_flags |= AMDGPU_VM_PAGE_WRITEABLE;
+	if (flags & ALLOC_MEM_FLAGS_EXECUTABLE)
+		mapping_flags |= AMDGPU_VM_PAGE_EXECUTABLE;
+	if ((*mem)->coherent)
+		mapping_flags |= AMDGPU_VM_MTYPE_UC;
+	else
+		mapping_flags |= AMDGPU_VM_MTYPE_NC;
+	(*mem)->mapping_flags = mapping_flags;
+
+	amdgpu_sync_create(&(*mem)->sync);
+
+	ret = amdgpu_amdkfd_reserve_system_mem_limit(adev, size, alloc_domain);
+	if (ret) {
+		pr_debug("Insufficient system memory\n");
+		goto err_bo_create;
+	}
+
+	pr_debug("\tcreate BO VA 0x%llx size 0x%llx domain %s\n",
+			va, size, domain_string(alloc_domain));
+
+	ret = amdgpu_bo_create(adev, size, byte_align, false,
+				alloc_domain, alloc_flags, NULL, NULL, 0, &bo);
+	if (ret) {
+		pr_debug("Failed to create BO on domain %s. ret %d\n",
+				domain_string(alloc_domain), ret);
+		unreserve_system_mem_limit(adev, size, alloc_domain);
+		goto err_bo_create;
+	}
+	bo->kfd_bo = *mem;
+	(*mem)->bo = bo;
+
+	(*mem)->va = va;
+	(*mem)->domain = alloc_domain;
+	(*mem)->mapped_to_gpu_memory = 0;
+	(*mem)->process_info = kfd_vm->process_info;
+	add_kgd_mem_to_kfd_bo_list(*mem, kfd_vm->process_info);
+
+	if (offset)
+		*offset = amdgpu_bo_mmap_offset(bo);
+
+	return 0;
+
+err_bo_create:
+	kfree(*mem);
+	return ret;
+}
+
+int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
+		struct kgd_dev *kgd, struct kgd_mem *mem)
+{
+	struct amdkfd_process_info *process_info = mem->process_info;
+	unsigned long bo_size = mem->bo->tbo.mem.size;
+	struct kfd_bo_va_list *entry, *tmp;
+	struct bo_vm_reservation_context ctx;
+	struct ttm_validate_buffer *bo_list_entry;
+	int ret;
+
+	mutex_lock(&mem->lock);
+
+	if (mem->mapped_to_gpu_memory > 0) {
+		pr_debug("BO VA 0x%llx size 0x%lx is still mapped.\n",
+				mem->va, bo_size);
+		mutex_unlock(&mem->lock);
+		return -EBUSY;
+	}
+
+	mutex_unlock(&mem->lock);
+	/* lock is not needed after this, since mem is unused and will
+	 * be freed anyway
+	 */
+
+	/* Make sure restore workers don't access the BO any more */
+	bo_list_entry = &mem->validate_list;
+	mutex_lock(&process_info->lock);
+	list_del(&bo_list_entry->head);
+	mutex_unlock(&process_info->lock);
+
+	ret = reserve_bo_and_cond_vms(mem, NULL, BO_VM_ALL, &ctx);
+	if (unlikely(ret))
+		return ret;
+
+	/* The eviction fence should be removed by the last unmap.
+	 * TODO: Log an error condition if the bo still has the eviction fence
+	 * attached
+	 */
+	amdgpu_amdkfd_remove_eviction_fence(mem->bo,
+					process_info->eviction_fence,
+					NULL, NULL);
+	pr_debug("Release VA 0x%llx - 0x%llx\n", mem->va,
+		mem->va + bo_size * (1 + mem->aql_queue));
+
+	/* Remove from VM internal data structures */
+	list_for_each_entry_safe(entry, tmp, &mem->bo_va_list, bo_list)
+		remove_bo_from_vm((struct amdgpu_device *)entry->kgd_dev,
+				entry, bo_size);
+
+	ret = unreserve_bo_and_vms(&ctx, false, false);
+
+	/* Free the sync object */
+	amdgpu_sync_free(&mem->sync);
+
+	/* Free the BO */
+	amdgpu_bo_unref(&mem->bo);
+	kfree(mem);
+
+	return ret;
+}
+
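+/* Map a GPUVM allocation into the given VM. The BO is added to the VM
+ * on first use and validated on its first mapping anywhere. Page
+ * table updates may still be pending on return; use sync_memory to
+ * wait for them.
+ */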
+int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
+		struct kgd_dev *kgd, struct kgd_mem *mem, void *vm)
+{
+	struct amdgpu_device *adev = get_amdgpu_device(kgd);
+	struct amdkfd_vm *kfd_vm = (struct amdkfd_vm *)vm;
+	int ret;
+	struct amdgpu_bo *bo;
+	uint32_t domain;
+	struct kfd_bo_va_list *entry;
+	struct bo_vm_reservation_context ctx;
+	struct kfd_bo_va_list *bo_va_entry = NULL;
+	struct kfd_bo_va_list *bo_va_entry_aql = NULL;
+	unsigned long bo_size;
+
+	/* Make sure restore is not running concurrently.
+	 */
+	mutex_lock(&mem->process_info->lock);
+
+	mutex_lock(&mem->lock);
+
+	bo = mem->bo;
+
+	if (!bo) {
+		pr_err("Invalid BO when mapping memory to GPU\n");
+		ret = -EINVAL;
+		goto out;
+	}
+
+	domain = mem->domain;
+	bo_size = bo->tbo.mem.size;
+
+	pr_debug("Map VA 0x%llx - 0x%llx to vm %p domain %s\n",
+			mem->va,
+			mem->va + bo_size * (1 + mem->aql_queue),
+			vm, domain_string(domain));
+
+	ret = reserve_bo_and_vm(mem, vm, &ctx);
+	if (unlikely(ret))
+		goto out;
+
+	if (check_if_add_bo_to_vm((struct amdgpu_vm *)vm, mem)) {
+		ret = add_bo_to_vm(adev, mem, (struct amdgpu_vm *)vm, false,
+				&bo_va_entry);
+		if (ret)
+			goto add_bo_to_vm_failed;
+		if (mem->aql_queue) {
+			ret = add_bo_to_vm(adev, mem, (struct amdgpu_vm *)vm,
+					true, &bo_va_entry_aql);
+			if (ret)
+				goto add_bo_to_vm_failed_aql;
+		}
+	} else {
+		ret = vm_validate_pt_pd_bos((struct amdkfd_vm *)vm);
+		if (unlikely(ret))
+			goto add_bo_to_vm_failed;
+	}
+
+	if (mem->mapped_to_gpu_memory == 0) {
+		/* Validate BO only once. The eviction fence gets added to BO
+		 * the first time it is mapped. Validate will wait for all
+		 * background evictions to complete.
+		 */
+		ret = amdgpu_amdkfd_bo_validate(bo, domain, true);
+		if (ret) {
+			pr_debug("Validate failed\n");
+			goto map_bo_to_gpuvm_failed;
+		}
+	}
+
+	list_for_each_entry(entry, &mem->bo_va_list, bo_list) {
+		if (entry->bo_va->base.vm == vm && !entry->is_mapped) {
+			pr_debug("\t map VA 0x%llx - 0x%llx in entry %p\n",
+					entry->va, entry->va + bo_size,
+					entry);
+
+			ret = map_bo_to_gpuvm(adev, entry, ctx.sync);
+			if (ret) {
+				pr_err("Failed to map radeon bo to gpuvm\n");
+				goto map_bo_to_gpuvm_failed;
+			}
+
+			ret = vm_update_pds(vm, ctx.sync);
+			if (ret) {
+				pr_err("Failed to update page directories\n");
+				goto map_bo_to_gpuvm_failed;
+			}
+
+			entry->is_mapped = true;
+			mem->mapped_to_gpu_memory++;
+			pr_debug("\t INC mapping count %d\n",
+					mem->mapped_to_gpu_memory);
+		}
+	}
+
+	if (!amdgpu_ttm_tt_get_usermm(bo->tbo.ttm) && !bo->pin_count)
+		amdgpu_bo_fence(bo,
+				&kfd_vm->process_info->eviction_fence->base,
+				true);
+	ret = unreserve_bo_and_vms(&ctx, false, false);
+
+	goto out;
+
+map_bo_to_gpuvm_failed:
+	if (bo_va_entry_aql)
+		remove_bo_from_vm(adev, bo_va_entry_aql, bo_size);
+add_bo_to_vm_failed_aql:
+	if (bo_va_entry)
+		remove_bo_from_vm(adev, bo_va_entry, bo_size);
+add_bo_to_vm_failed:
+	unreserve_bo_and_vms(&ctx, false, false);
+
+out:
+	mutex_unlock(&mem->process_info->lock);
+	mutex_unlock(&mem->lock);
+	return ret;
+}
+
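+/* Unmap a GPUVM allocation from the given VM. When the BO is no
+ * longer mapped in any VM, its eviction fence is removed so that it
+ * can be evicted normally.
+ */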
+int amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu(
+		struct kgd_dev *kgd, struct kgd_mem *mem, void *vm)
+{
+	struct amdgpu_device *adev = get_amdgpu_device(kgd);
+	struct amdkfd_process_info *process_info =
+		((struct amdkfd_vm *)vm)->process_info;
+	unsigned long bo_size = mem->bo->tbo.mem.size;
+	struct kfd_bo_va_list *entry;
+	struct bo_vm_reservation_context ctx;
+	int ret;
+
+	mutex_lock(&mem->lock);
+
+	ret = reserve_bo_and_cond_vms(mem, vm, BO_VM_MAPPED, &ctx);
+	if (unlikely(ret))
+		goto out;
+
+	ret = vm_validate_pt_pd_bos((struct amdkfd_vm *)vm);
+	if (unlikely(ret))
+		goto unreserve_out;
+
+	pr_debug("Unmap VA 0x%llx - 0x%llx from vm %p\n",
+		mem->va,
+		mem->va + bo_size * (1 + mem->aql_queue),
+		vm);
+
+	list_for_each_entry(entry, &mem->bo_va_list, bo_list) {
+		if (entry->bo_va->base.vm == vm && entry->is_mapped) {
+			pr_debug("\t unmap VA 0x%llx - 0x%llx from entry %p\n",
+					entry->va,
+					entry->va + bo_size,
+					entry);
+
+			ret = unmap_bo_from_gpuvm(adev, entry, ctx.sync);
+			if (ret == 0) {
+				entry->is_mapped = false;
+			} else {
+				pr_err("failed to unmap VA 0x%llx\n",
+						mem->va);
+				goto unreserve_out;
+			}
+
+			mem->mapped_to_gpu_memory--;
+			pr_debug("\t DEC mapping count %d\n",
+					mem->mapped_to_gpu_memory);
+		}
+	}
+
+	/* If BO is unmapped from all VMs, unfence it. It can be evicted if
+	 * required.
+	 */
+	if (mem->mapped_to_gpu_memory == 0 &&
+	    !amdgpu_ttm_tt_get_usermm(mem->bo->tbo.ttm) && !mem->bo->pin_count)
+		amdgpu_amdkfd_remove_eviction_fence(mem->bo,
+						process_info->eviction_fence,
+						    NULL, NULL);
+
+unreserve_out:
+	unreserve_bo_and_vms(&ctx, false, false);
+out:
+	mutex_unlock(&mem->lock);
+	return ret;
+}
+
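+/* Wait for pending page table updates on this allocation. Waits on a
+ * temporary clone of the sync object so that mem->lock does not need
+ * to be held while waiting.
+ */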
+int amdgpu_amdkfd_gpuvm_sync_memory(
+		struct kgd_dev *kgd, struct kgd_mem *mem, bool intr)
+{
+	struct amdgpu_sync sync;
+	int ret;
+
+	amdgpu_sync_create(&sync);
+
+	mutex_lock(&mem->lock);
+	amdgpu_sync_clone(&mem->sync, &sync);
+	mutex_unlock(&mem->lock);
+
+	ret = amdgpu_sync_wait(&sync, intr);
+	amdgpu_sync_free(&sync);
+	return ret;
+}
+
+int amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel(struct kgd_dev *kgd,
+		struct kgd_mem *mem, void **kptr, uint64_t *size)
+{
+	int ret;
+	struct amdgpu_bo *bo = mem->bo;
+
+	if (amdgpu_ttm_tt_get_usermm(bo->tbo.ttm)) {
+		pr_err("userptr can't be mapped to kernel\n");
+		return -EINVAL;
+	}
+
+	/* Remove kgd_mem from kfd_bo_list so this BO is not re-validated
+	 * by the restore worker after an eviction.
+	 */
+	mutex_lock(&mem->process_info->lock);
+
+	ret = amdgpu_bo_reserve(bo, true);
+	if (ret) {
+		pr_err("Failed to reserve bo. ret %d\n", ret);
+		goto bo_reserve_failed;
+	}
+
+	ret = amdgpu_bo_pin(bo, AMDGPU_GEM_DOMAIN_GTT, NULL);
+	if (ret) {
+		pr_err("Failed to pin bo. ret %d\n", ret);
+		goto pin_failed;
+	}
+
+	ret = amdgpu_bo_kmap(bo, kptr);
+	if (ret) {
+		pr_err("Failed to map bo to kernel. ret %d\n", ret);
+		goto kmap_failed;
+	}
+
+	amdgpu_amdkfd_remove_eviction_fence(
+		bo, mem->process_info->eviction_fence, NULL, NULL);
+	list_del_init(&mem->validate_list.head);
+
+	if (size)
+		*size = amdgpu_bo_size(bo);
+
+	amdgpu_bo_unreserve(bo);
+
+	mutex_unlock(&mem->process_info->lock);
+	return 0;
+
+kmap_failed:
+	amdgpu_bo_unpin(bo);
+pin_failed:
+	amdgpu_bo_unreserve(bo);
+bo_reserve_failed:
+	mutex_unlock(&mem->process_info->lock);
+
+	return ret;
+}
+
+/**
+ * amdgpu_amdkfd_gpuvm_restore_process_bos - Restore all BOs for the given
+ * KFD process identified by process_info
+ *
+ * @process_info: amdkfd_process_info of the KFD process
+ *
+ * After memory eviction, the restore thread calls this function. The
+ * function should be called while the process is still valid. BO restore
+ * involves:
+ *
+ * 1.  Release old eviction fence and create new one
+ * 2.  Get two copies of PD BO list from all the VMs. Keep one copy as
+ *     pd_bo_list.
+ * 3.  Use the second PD list and kfd_bo_list to create a list (ctx.list) of
+ *     BOs that need to be reserved.
+ * 4.  Reserve all the BOs
+ * 5.  Validate PD and PT BOs.
+ * 6.  Validate all KFD BOs using kfd_bo_list, map them and add a new fence
+ * 7.  Add fence to all PD and PT BOs.
+ * 8.  Unreserve all BOs
+ */
+int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence **ef)
+{
+	struct amdgpu_bo_list_entry *pd_bo_list;
+	struct amdkfd_process_info *process_info = info;
+	struct amdkfd_vm *peer_vm;
+	struct kgd_mem *mem;
+	struct bo_vm_reservation_context ctx;
+	struct amdgpu_amdkfd_fence *new_fence;
+	int ret = 0, i;
+	struct list_head duplicate_save;
+	struct amdgpu_sync sync_obj;
+
+	INIT_LIST_HEAD(&duplicate_save);
+	INIT_LIST_HEAD(&ctx.list);
+	INIT_LIST_HEAD(&ctx.duplicates);
+
+	pd_bo_list = kcalloc(process_info->n_vms,
+			     sizeof(struct amdgpu_bo_list_entry),
+			     GFP_KERNEL);
+	if (!pd_bo_list)
+		return -ENOMEM;
+
+	i = 0;
+	mutex_lock(&process_info->lock);
+	list_for_each_entry(peer_vm, &process_info->vm_list_head,
+			vm_list_node)
+		amdgpu_vm_get_pd_bo(&peer_vm->base, &ctx.list,
+				    &pd_bo_list[i++]);
+
+	/* Reserve all BOs and page tables/directory. Add all BOs from
+	 * kfd_bo_list to ctx.list
+	 */
+	list_for_each_entry(mem, &process_info->kfd_bo_list,
+			    validate_list.head) {
+		list_add_tail(&mem->resv_list.head, &ctx.list);
+		mem->resv_list.bo = mem->validate_list.bo;
+		mem->resv_list.shared = mem->validate_list.shared;
+	}
+
+	ret = ttm_eu_reserve_buffers(&ctx.ticket, &ctx.list,
+				     false, &duplicate_save);
+	if (ret) {
+		pr_debug("Memory eviction: TTM Reserve Failed. Try again\n");
+		goto ttm_reserve_fail;
+	}
+
+	amdgpu_sync_create(&sync_obj);
+
+	/* Validate PDs and PTs */
+	ret = process_validate_vms(process_info);
+	if (ret)
+		goto validate_map_fail;
+
+	/* Wait for PD/PTs validate to finish */
+	/* FIXME: I think this isn't needed */
+	list_for_each_entry(peer_vm, &process_info->vm_list_head,
+			    vm_list_node) {
+		struct amdgpu_bo *bo = peer_vm->base.root.base.bo;
+
+		ttm_bo_wait(&bo->tbo, false, false);
+	}
+
+	/* Validate BOs and map them to GPUVM (update VM page tables). */
+	list_for_each_entry(mem, &process_info->kfd_bo_list,
+			    validate_list.head) {
+		struct amdgpu_bo *bo = mem->bo;
+		uint32_t domain = mem->domain;
+		struct kfd_bo_va_list *bo_va_entry;
+
+		ret = amdgpu_amdkfd_bo_validate(bo, domain, false);
+		if (ret) {
+			pr_debug("Memory eviction: Validate BOs failed. Try again\n");
+			goto validate_map_fail;
+		}
+
+		list_for_each_entry(bo_va_entry, &mem->bo_va_list,
+				    bo_list) {
+			ret = update_gpuvm_pte((struct amdgpu_device *)
+					      bo_va_entry->kgd_dev,
+					      bo_va_entry,
+					      &sync_obj);
+			if (ret) {
+				pr_debug("Memory eviction: update PTE failed. Try again\n");
+				goto validate_map_fail;
+			}
+		}
+	}
+
+	/* Update page directories */
+	ret = process_update_pds(process_info, &sync_obj);
+	if (ret) {
+		pr_debug("Memory eviction: update PDs failed. Try again\n");
+		goto validate_map_fail;
+	}
+
+	amdgpu_sync_wait(&sync_obj, false);
+
+	/* Release the old eviction fence and create a new one. Because a
+	 * fence can only go from unsignaled to signaled, it cannot be
+	 * reused. Use the context and mm from the old fence.
+	 */
+	new_fence = amdgpu_amdkfd_fence_create(
+				process_info->eviction_fence->base.context,
+				process_info->eviction_fence->mm);
+	if (!new_fence) {
+		pr_err("Failed to create eviction fence\n");
+		ret = -ENOMEM;
+		goto validate_map_fail;
+	}
+	dma_fence_put(&process_info->eviction_fence->base);
+	process_info->eviction_fence = new_fence;
+	*ef = dma_fence_get(&new_fence->base);
+
+	/* Wait for validate to finish and attach new eviction fence */
+	list_for_each_entry(mem, &process_info->kfd_bo_list,
+		validate_list.head)
+		ttm_bo_wait(&mem->bo->tbo, false, false);
+	list_for_each_entry(mem, &process_info->kfd_bo_list,
+		validate_list.head)
+		amdgpu_bo_fence(mem->bo,
+			&process_info->eviction_fence->base, true);
+
+	/* Attach eviction fence to PD / PT BOs */
+	list_for_each_entry(peer_vm, &process_info->vm_list_head,
+			    vm_list_node) {
+		struct amdgpu_bo *bo = peer_vm->base.root.base.bo;
+
+		amdgpu_bo_fence(bo, &process_info->eviction_fence->base, true);
+	}
+
+validate_map_fail:
+	ttm_eu_backoff_reservation(&ctx.ticket, &ctx.list);
+	amdgpu_sync_free(&sync_obj);
+ttm_reserve_fail:
+	mutex_unlock(&process_info->lock);
+	kfree(pd_bo_list);
+	return ret;
+}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 5c4c3e0..f608ecf 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -36,6 +36,7 @@
 #include <drm/drm_cache.h>
 #include "amdgpu.h"
 #include "amdgpu_trace.h"
+#include "amdgpu_amdkfd.h"
 
 static bool amdgpu_need_backup(struct amdgpu_device *adev)
 {
@@ -54,6 +55,9 @@ static void amdgpu_ttm_bo_destroy(struct ttm_buffer_object *tbo)
 	struct amdgpu_device *adev = amdgpu_ttm_adev(tbo->bdev);
 	struct amdgpu_bo *bo = ttm_to_amdgpu_bo(tbo);
 
+	if (bo->kfd_bo)
+		amdgpu_amdkfd_unreserve_system_memory_limit(bo);
+
 	amdgpu_bo_kunmap(bo);
 
 	drm_gem_object_release(&bo->gem_base);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
index 33615e2..ba5330a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
@@ -92,6 +92,8 @@ struct amdgpu_bo {
 		struct list_head	mn_list;
 		struct list_head	shadow_list;
 	};
+
+	struct kgd_mem                  *kfd_bo;
 };
 
 static inline struct amdgpu_bo *ttm_to_amdgpu_bo(struct ttm_buffer_object *tbo)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index c3f33d3..76ee968 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -261,6 +261,13 @@ static int amdgpu_verify_access(struct ttm_buffer_object *bo, struct file *filp)
 {
 	struct amdgpu_bo *abo = ttm_to_amdgpu_bo(bo);
 
+	/*
+	 * Don't verify access for KFD BOs. They don't have a GEM
+	 * object associated with them.
+	 */
+	if (abo->kfd_bo)
+		return 0;
+
 	if (amdgpu_ttm_tt_get_usermm(bo->ttm))
 		return -EPERM;
 	return drm_vma_node_verify_access(&abo->gem_base.vma_node,
diff --git a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
index 36c706a..5984fec 100644
--- a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
+++ b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
@@ -127,6 +127,25 @@ struct tile_config {
 	uint32_t num_ranks;
 };
 
+/*
+ * Allocation flag domains
+ */
+#define ALLOC_MEM_FLAGS_VRAM		(1 << 0)
+#define ALLOC_MEM_FLAGS_GTT		(1 << 1)
+#define ALLOC_MEM_FLAGS_USERPTR		(1 << 2) /* TODO */
+#define ALLOC_MEM_FLAGS_DOORBELL	(1 << 3) /* TODO */
+
+/*
+ * Allocation flags attributes/access options.
+ */
+#define ALLOC_MEM_FLAGS_WRITABLE	(1 << 31)
+#define ALLOC_MEM_FLAGS_EXECUTABLE	(1 << 30)
+#define ALLOC_MEM_FLAGS_PUBLIC		(1 << 29)
+#define ALLOC_MEM_FLAGS_NO_SUBSTITUTE	(1 << 28) /* TODO */
+#define ALLOC_MEM_FLAGS_AQL_QUEUE_MEM	(1 << 27)
+#define ALLOC_MEM_FLAGS_COHERENT	(1 << 26) /* For GFXv9 or later */
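+
+/* Example: CPU-accessible (public), writable VRAM would be allocated
+ * with ALLOC_MEM_FLAGS_VRAM | ALLOC_MEM_FLAGS_PUBLIC |
+ * ALLOC_MEM_FLAGS_WRITABLE.
+ */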
+
 /**
  * struct kfd2kgd_calls
  *
@@ -186,6 +205,41 @@ struct tile_config {
  *
  * @get_vram_usage: Returns current VRAM usage
  *
+ * @create_process_vm: Create a VM address space for a given process and GPU
+ *
+ * @destroy_process_vm: Destroy a VM
+ *
+ * @get_process_page_dir: Get physical address of a VM page directory
+ *
+ * @set_vm_context_page_table_base: Program page table base for a VMID
+ *
+ * @alloc_memory_of_gpu: Allocate GPUVM memory
+ *
+ * @free_memory_of_gpu: Free GPUVM memory
+ *
+ * @map_memory_to_gpu: Map GPUVM memory into a specific VM address
+ * space. Allocates and updates page tables and page directories as
+ * needed. This function may return before all page table updates have
+ * completed. This allows multiple map operations (on multiple GPUs)
+ * to happen concurrently. Use sync_memory to synchronize with all
+ * pending updates.
+ *
+ * @unmap_memory_to_gpu: Unmap GPUVM memory from a specific VM address space
+ *
+ * @sync_memory: Wait for pending page table updates to complete
+ *
+ * @map_gtt_bo_to_kernel: Map a GTT BO for kernel access
+ * Pins the BO, maps it to kernel address space. Such BOs are never evicted.
+ * The kernel virtual address remains valid until the BO is freed.
+ *
+ * @restore_process_bos: Restore all BOs that belong to the
+ * process. This is intended for restoring memory mappings after a TTM
+ * eviction.
+ *
+ * @invalidate_tlbs: Invalidate TLBs for a specific PASID
+ *
+ * @invalidate_tlbs_vmid: Invalidate TLBs for a specific VMID
+ *
  * This structure contains function pointers to services that the kgd driver
  * provides to amdkfd driver.
  *
@@ -275,6 +329,29 @@ struct kfd2kgd_calls {
 	void (*get_cu_info)(struct kgd_dev *kgd,
 			struct kfd_cu_info *cu_info);
 	uint64_t (*get_vram_usage)(struct kgd_dev *kgd);
+
+	int (*create_process_vm)(struct kgd_dev *kgd, void **vm,
+			void **process_info, struct dma_fence **ef);
+	void (*destroy_process_vm)(struct kgd_dev *kgd, void *vm);
+	uint32_t (*get_process_page_dir)(void *vm);
+	void (*set_vm_context_page_table_base)(struct kgd_dev *kgd,
+			uint32_t vmid, uint32_t page_table_base);
+	int (*alloc_memory_of_gpu)(struct kgd_dev *kgd, uint64_t va,
+			uint64_t size, void *vm,
+			struct kgd_mem **mem, uint64_t *offset,
+			uint32_t flags);
+	int (*free_memory_of_gpu)(struct kgd_dev *kgd, struct kgd_mem *mem);
+	int (*map_memory_to_gpu)(struct kgd_dev *kgd, struct kgd_mem *mem,
+			void *vm);
+	int (*unmap_memory_to_gpu)(struct kgd_dev *kgd, struct kgd_mem *mem,
+			void *vm);
+	int (*sync_memory)(struct kgd_dev *kgd, struct kgd_mem *mem, bool intr);
+	int (*map_gtt_bo_to_kernel)(struct kgd_dev *kgd, struct kgd_mem *mem,
+			void **kptr, uint64_t *size);
+	int (*restore_process_bos)(void *process_info, struct dma_fence **ef);
+
+	int (*invalidate_tlbs)(struct kgd_dev *kgd, uint16_t pasid);
+	int (*invalidate_tlbs_vmid)(struct kgd_dev *kgd, uint16_t vmid);
 };
 
 /**
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

* [PATCH 10/25] drm/amdgpu: Add submit IB function for KFD
       [not found] ` <1517015381-1080-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (8 preceding siblings ...)
  2018-01-27  1:09   ` [PATCH 09/25] drm/amdgpu: Add GPUVM memory management functions for KFD Felix Kuehling
@ 2018-01-27  1:09   ` Felix Kuehling
  2018-01-27  1:09   ` [PATCH 11/25] drm/amdkfd: Add missing #ifdef CONFIG_AMD_IOMMU_V2 guard Felix Kuehling
                     ` (15 subsequent siblings)
  25 siblings, 0 replies; 44+ messages in thread
From: Felix Kuehling @ 2018-01-27  1:09 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling

This can be used for flushing caches when not using the HWS.
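
For illustration, a cache-flush helper on the KFD side could look
roughly like this (a minimal sketch, not part of this patch; the
helper name and the IB contents are assumed, only the submit_ib
interface added here is real):

	/* Hypothetical: execute a small caller-built IB containing
	 * cache flush packets on the first compute ring, with the
	 * queue's VMID. submit_ib() waits for the IB to complete.
	 */
	static int kfd_flush_caches(struct kfd_dev *kdev, uint32_t vmid,
				    uint64_t ib_gpu_addr, uint32_t *ib_cmd,
				    uint32_t ib_len_dw)
	{
		return kdev->kfd2kgd->submit_ib(kdev->kgd, KGD_ENGINE_MEC1,
						vmid, ib_gpu_addr, ib_cmd,
						ib_len_dw);
	}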

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c        | 55 +++++++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h        |  4 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c |  1 +
 drivers/gpu/drm/amd/include/kgd_kfd_interface.h   |  8 ++++
 5 files changed, 69 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index a44b146..3fa636e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -358,6 +358,61 @@ uint64_t amdgpu_amdkfd_get_vram_usage(struct kgd_dev *kgd)
 	return amdgpu_vram_mgr_usage(&adev->mman.bdev.man[TTM_PL_VRAM]);
 }
 
+int amdgpu_amdkfd_submit_ib(struct kgd_dev *kgd, enum kgd_engine_type engine,
+				uint32_t vmid, uint64_t gpu_addr,
+				uint32_t *ib_cmd, uint32_t ib_len)
+{
+	struct amdgpu_device *adev = (struct amdgpu_device *)kgd;
+	struct amdgpu_job *job;
+	struct amdgpu_ib *ib;
+	struct amdgpu_ring *ring;
+	struct dma_fence *f = NULL;
+	int ret;
+
+	switch (engine) {
+	case KGD_ENGINE_MEC1:
+		ring = &adev->gfx.compute_ring[0];
+		break;
+	case KGD_ENGINE_SDMA1:
+		ring = &adev->sdma.instance[0].ring;
+		break;
+	case KGD_ENGINE_SDMA2:
+		ring = &adev->sdma.instance[1].ring;
+		break;
+	default:
+		pr_err("Invalid engine in IB submission: %d\n", engine);
+		ret = -EINVAL;
+		goto err;
+	}
+
+	ret = amdgpu_job_alloc(adev, 1, &job, NULL);
+	if (ret)
+		goto err;
+
+	ib = &job->ibs[0];
+	memset(ib, 0, sizeof(struct amdgpu_ib));
+
+	ib->gpu_addr = gpu_addr;
+	ib->ptr = ib_cmd;
+	ib->length_dw = ib_len;
+	/* This works for NO_HWS. TODO: need to handle without knowing VMID */
+	job->vmid = vmid;
+
+	ret = amdgpu_ib_schedule(ring, 1, ib, job, &f);
+	if (ret) {
+		DRM_ERROR("amdgpu: failed to schedule IB.\n");
+		goto err_ib_sched;
+	}
+
+	ret = dma_fence_wait(f, false);
+
+err_ib_sched:
+	dma_fence_put(f);
+	amdgpu_job_free(job);
+err:
+	return ret;
+}
+
 bool amdgpu_amdkfd_is_kfd_vmid(struct amdgpu_device *adev, u32 vmid)
 {
 	if (adev->kfd) {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index 5e5a9fb..6768c27 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -127,6 +127,10 @@ void amdgpu_amdkfd_device_probe(struct amdgpu_device *adev);
 void amdgpu_amdkfd_device_init(struct amdgpu_device *adev);
 void amdgpu_amdkfd_device_fini(struct amdgpu_device *adev);
 
+int amdgpu_amdkfd_submit_ib(struct kgd_dev *kgd, enum kgd_engine_type engine,
+				uint32_t vmid, uint64_t gpu_addr,
+				uint32_t *ib_cmd, uint32_t ib_len);
+
 struct kfd2kgd_calls *amdgpu_amdkfd_gfx_7_get_functions(void);
 struct kfd2kgd_calls *amdgpu_amdkfd_gfx_8_0_get_functions(void);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
index 65783d1..7485c37 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
@@ -217,6 +217,7 @@ static const struct kfd2kgd_calls kfd2kgd = {
 	.restore_process_bos = amdgpu_amdkfd_gpuvm_restore_process_bos,
 	.invalidate_tlbs = invalidate_tlbs,
 	.invalidate_tlbs_vmid = invalidate_tlbs_vmid,
+	.submit_ib = amdgpu_amdkfd_submit_ib,
 };
 
 struct kfd2kgd_calls *amdgpu_amdkfd_gfx_7_get_functions(void)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
index 1b5bf13..7be4534 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
@@ -177,6 +177,7 @@ static const struct kfd2kgd_calls kfd2kgd = {
 	.restore_process_bos = amdgpu_amdkfd_gpuvm_restore_process_bos,
 	.invalidate_tlbs = invalidate_tlbs,
 	.invalidate_tlbs_vmid = invalidate_tlbs_vmid,
+	.submit_ib = amdgpu_amdkfd_submit_ib,
 };
 
 struct kfd2kgd_calls *amdgpu_amdkfd_gfx_8_0_get_functions(void)
diff --git a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
index 5984fec..b7146e2 100644
--- a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
+++ b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
@@ -240,6 +240,10 @@ struct tile_config {
  *
  * @invalidate_tlbs_vmid: Invalidate TLBs for a specific VMID
  *
+ * @submit_ib: Submits an IB to the engine specified by inserting the
+ * IB into the corresponding ring (ring type). The IB is executed with the
+ * specified VMID in a user mode context.
+ *
  * This structure contains function pointers to services that the kgd driver
  * provides to amdkfd driver.
  *
@@ -352,6 +356,10 @@ struct kfd2kgd_calls {
 
 	int (*invalidate_tlbs)(struct kgd_dev *kgd, uint16_t pasid);
 	int (*invalidate_tlbs_vmid)(struct kgd_dev *kgd, uint16_t vmid);
+
+	int (*submit_ib)(struct kgd_dev *kgd, enum kgd_engine_type engine,
+			uint32_t vmid, uint64_t gpu_addr,
+			uint32_t *ib_cmd, uint32_t ib_len);
 };
 
 /**
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

* [PATCH 11/25] drm/amdkfd: Add missing #ifdef CONFIG_AMD_IOMMU_V2 guard
       [not found] ` <1517015381-1080-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (9 preceding siblings ...)
  2018-01-27  1:09   ` [PATCH 10/25] drm/amdgpu: Add submit IB function " Felix Kuehling
@ 2018-01-27  1:09   ` Felix Kuehling
  2018-01-27  1:09   ` [PATCH 12/25] drm/amdkfd: Use per-device sched_policy Felix Kuehling
                     ` (14 subsequent siblings)
  25 siblings, 0 replies; 44+ messages in thread
From: Felix Kuehling @ 2018-01-27  1:09 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_process.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index 43c89c5..a527c22 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -452,6 +452,7 @@ struct kfd_process_device *kfd_bind_process_to_device(struct kfd_dev *dev,
 	return pdd;
 }
 
+#if defined(CONFIG_AMD_IOMMU_V2_MODULE) || defined(CONFIG_AMD_IOMMU_V2)
 /*
  * Bind processes do the device that have been temporarily unbound
  * (PDD_BOUND_SUSPENDED) in kfd_unbind_processes_from_device.
@@ -562,6 +563,7 @@ void kfd_process_iommu_unbind_callback(struct kfd_dev *dev, unsigned int pasid)
 
 	kfd_unref_process(p);
 }
+#endif /* CONFIG_AMD_IOMMU_V2 */
 
 struct kfd_process_device *kfd_get_first_process_device_data(
 						struct kfd_process *p)
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

* [PATCH 12/25] drm/amdkfd: Use per-device sched_policy
       [not found] ` <1517015381-1080-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (10 preceding siblings ...)
  2018-01-27  1:09   ` [PATCH 11/25] drm/amdkfd: Add missing #ifdef CONFIG_AMD_IOMMU_V2 guard Felix Kuehling
@ 2018-01-27  1:09   ` Felix Kuehling
  2018-01-27  1:09   ` [PATCH 13/25] drm/amdkfd: Remove unaligned memory access Felix Kuehling
                     ` (13 subsequent siblings)
  25 siblings, 0 replies; 44+ messages in thread
From: Felix Kuehling @ 2018-01-27  1:09 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling

This was missed in a previous commit that made the scheduler policy
a per-device setting.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index 5f3d072..47d493e 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -1259,7 +1259,7 @@ struct device_queue_manager *device_queue_manager_init(struct kfd_dev *dev)
 	}
 
 	dqm->dev = dev;
-	switch (sched_policy) {
+	switch (dqm->sched_policy) {
 	case KFD_SCHED_POLICY_HWS:
 	case KFD_SCHED_POLICY_HWS_NO_OVERSUBSCRIPTION:
 		/* initialize dqm for cp scheduling */
@@ -1296,7 +1296,7 @@ struct device_queue_manager *device_queue_manager_init(struct kfd_dev *dev)
 		dqm->ops.process_termination = process_termination_nocpsch;
 		break;
 	default:
-		pr_err("Invalid scheduling policy %d\n", sched_policy);
+		pr_err("Invalid scheduling policy %d\n", dqm->sched_policy);
 		goto out_free;
 	}
 
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 13/25] drm/amdkfd: Remove unaligned memory access
       [not found] ` <1517015381-1080-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (11 preceding siblings ...)
  2018-01-27  1:09   ` [PATCH 12/25] drm/amdkfd: Use per-device sched_policy Felix Kuehling
@ 2018-01-27  1:09   ` Felix Kuehling
  2018-01-27  1:09   ` [PATCH 14/25] drm/amdkfd: Populate DRM render device minor Felix Kuehling
                     ` (12 subsequent siblings)
  25 siblings, 0 replies; 44+ messages in thread
From: Felix Kuehling @ 2018-01-27  1:09 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling, Harish Kasiviswanathan

From: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>

Unaligned atomic operations can cause problems on some CPU
architectures. Use simpler bitmask operations instead. Atomic bit
manipulations are not necessary since dqm->lock is held during these
operations.
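
As an illustration, the resulting allocation pattern looks roughly like
this (a minimal sketch, not the exact driver code; alloc_slot and
free_slot are hypothetical names):

/* Safe without atomics because the caller holds dqm->lock */
static int alloc_slot(unsigned int *bitmap)
{
	int bit;

	if (*bitmap == 0)
		return -ENOMEM;		/* no free slots */

	bit = ffs(*bitmap) - 1;		/* lowest set bit = lowest free slot */
	*bitmap &= ~(1u << bit);
	return bit;
}

static void free_slot(unsigned int *bitmap, int bit)
{
	*bitmap |= (1u << bit);
}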

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c  | 25 ++++++++--------------
 1 file changed, 9 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index 47d493e..1a28dc2 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -118,9 +118,8 @@ static int allocate_vmid(struct device_queue_manager *dqm,
 	if (dqm->vmid_bitmap == 0)
 		return -ENOMEM;
 
-	bit = find_first_bit((unsigned long *)&dqm->vmid_bitmap,
-				dqm->dev->vm_info.vmid_num_kfd);
-	clear_bit(bit, (unsigned long *)&dqm->vmid_bitmap);
+	bit = ffs(dqm->vmid_bitmap) - 1;
+	dqm->vmid_bitmap &= ~(1 << bit);
 
 	allocated_vmid = bit + dqm->dev->vm_info.first_vmid_kfd;
 	pr_debug("vmid allocation %d\n", allocated_vmid);
@@ -142,7 +141,7 @@ static void deallocate_vmid(struct device_queue_manager *dqm,
 	/* Release the vmid mapping */
 	set_pasid_vmid_mapping(dqm, 0, qpd->vmid);
 
-	set_bit(bit, (unsigned long *)&dqm->vmid_bitmap);
+	dqm->vmid_bitmap |= (1 << bit);
 	qpd->vmid = 0;
 	q->properties.vmid = 0;
 }
@@ -223,12 +222,8 @@ static int allocate_hqd(struct device_queue_manager *dqm, struct queue *q)
 			continue;
 
 		if (dqm->allocated_queues[pipe] != 0) {
-			bit = find_first_bit(
-				(unsigned long *)&dqm->allocated_queues[pipe],
-				get_queues_per_pipe(dqm));
-
-			clear_bit(bit,
-				(unsigned long *)&dqm->allocated_queues[pipe]);
+			bit = ffs(dqm->allocated_queues[pipe]) - 1;
+			dqm->allocated_queues[pipe] &= ~(1 << bit);
 			q->pipe = pipe;
 			q->queue = bit;
 			set = true;
@@ -249,7 +244,7 @@ static int allocate_hqd(struct device_queue_manager *dqm, struct queue *q)
 static inline void deallocate_hqd(struct device_queue_manager *dqm,
 				struct queue *q)
 {
-	set_bit(q->queue, (unsigned long *)&dqm->allocated_queues[q->pipe]);
+	dqm->allocated_queues[q->pipe] |= (1 << q->queue);
 }
 
 static int create_compute_queue_nocpsch(struct device_queue_manager *dqm,
@@ -589,10 +584,8 @@ static int allocate_sdma_queue(struct device_queue_manager *dqm,
 	if (dqm->sdma_bitmap == 0)
 		return -ENOMEM;
 
-	bit = find_first_bit((unsigned long *)&dqm->sdma_bitmap,
-				CIK_SDMA_QUEUES);
-
-	clear_bit(bit, (unsigned long *)&dqm->sdma_bitmap);
+	bit = ffs(dqm->sdma_bitmap) - 1;
+	dqm->sdma_bitmap &= ~(1 << bit);
 	*sdma_queue_id = bit;
 
 	return 0;
@@ -603,7 +596,7 @@ static void deallocate_sdma_queue(struct device_queue_manager *dqm,
 {
 	if (sdma_queue_id >= CIK_SDMA_QUEUES)
 		return;
-	set_bit(sdma_queue_id, (unsigned long *)&dqm->sdma_bitmap);
+	dqm->sdma_bitmap |= (1 << sdma_queue_id);
 }
 
 static int create_sdma_queue_nocpsch(struct device_queue_manager *dqm,
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 14/25] drm/amdkfd: Populate DRM render device minor
       [not found] ` <1517015381-1080-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (12 preceding siblings ...)
  2018-01-27  1:09   ` [PATCH 13/25] drm/amdkfd: Remove unaligned memory access Felix Kuehling
@ 2018-01-27  1:09   ` Felix Kuehling
  2018-01-27  1:09   ` [PATCH 15/25] drm/amdkfd: Add GPUVM virtual address space to PDD Felix Kuehling
                     ` (11 subsequent siblings)
  25 siblings, 0 replies; 44+ messages in thread
From: Felix Kuehling @ 2018-01-27  1:09 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Oak Zeng, Felix Kuehling

From: Oak Zeng <Oak.Zeng@amd.com>

Populate DRM render device minor in kfd topology
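
For example, user mode can combine this property with the standard DRM
device naming to open the matching render node (a sketch assuming the
usual /dev/dri/renderD<minor> layout; error handling omitted):

#include <stdio.h>
#include <fcntl.h>

/* drm_render_minor is read from the node properties in sysfs */
static int open_render_node(int drm_render_minor)
{
	char path[64];

	snprintf(path, sizeof(path), "/dev/dri/renderD%d", drm_render_minor);
	return open(path, O_RDWR);
}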

Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 4 ++++
 drivers/gpu/drm/amd/amdkfd/kfd_topology.h | 1 +
 2 files changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index f57c305..af77e42 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -440,6 +440,8 @@ static ssize_t node_show(struct kobject *kobj, struct attribute *attr,
 			dev->node_props.device_id);
 	sysfs_show_32bit_prop(buffer, "location_id",
 			dev->node_props.location_id);
+	sysfs_show_32bit_prop(buffer, "drm_render_minor",
+			dev->node_props.drm_render_minor);
 
 	if (dev->gpu) {
 		log_max_watch_addr =
@@ -1226,6 +1228,8 @@ int kfd_topology_add_device(struct kfd_dev *gpu)
 		dev->gpu->kfd2kgd->get_max_engine_clock_in_mhz(dev->gpu->kgd);
 	dev->node_props.max_engine_clk_ccompute =
 		cpufreq_quick_get_max(0) / 1000;
+	dev->node_props.drm_render_minor =
+		gpu->shared_resources.drm_render_minor;
 
 	kfd_fill_mem_clk_max_info(dev);
 	kfd_fill_iolink_non_crat_info(dev);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
index 111fda2..be812bb 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
@@ -71,6 +71,7 @@ struct kfd_node_properties {
 	uint32_t location_id;
 	uint32_t max_engine_clk_fcompute;
 	uint32_t max_engine_clk_ccompute;
+	int32_t  drm_render_minor;
 	uint16_t marketing_name[KFD_TOPOLOGY_PUBLIC_NAME_SIZE];
 };
 
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 15/25] drm/amdkfd: Add GPUVM virtual address space to PDD
       [not found] ` <1517015381-1080-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (13 preceding siblings ...)
  2018-01-27  1:09   ` [PATCH 14/25] drm/amdkfd: Populate DRM render device minor Felix Kuehling
@ 2018-01-27  1:09   ` Felix Kuehling
  2018-01-27  1:09   ` [PATCH 16/25] drm/amdkfd: Implement KFD process eviction/restore Felix Kuehling
                     ` (10 subsequent siblings)
  25 siblings, 0 replies; 44+ messages in thread
From: Felix Kuehling @ 2018-01-27  1:09 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling

Create/destroy the GPUVM context during PDD creation/destruction.
Get VM page table base and program it during process registration
(HWS) or VMID allocation (non-HWS).
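
The ordering on the non-HWS path can be summarized as follows (an
outline of the call flow added in this patch, not new code):

	/*
	 * register_process():  cache the PD base in qpd->page_table_base
	 * allocate_vmid():     program it into the hardware via
	 *                      set_vm_context_page_table_base()
	 * kfd_flush_tlb():     invalidate stale translations for the VMID
	 */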

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c  | 20 +++++++++++++
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h              | 13 +++++++++
 drivers/gpu/drm/amd/amdkfd/kfd_process.c           | 33 ++++++++++++++++++++++
 3 files changed, 66 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index 1a28dc2..b7d0639 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -129,6 +129,15 @@ static int allocate_vmid(struct device_queue_manager *dqm,
 	set_pasid_vmid_mapping(dqm, q->process->pasid, q->properties.vmid);
 	program_sh_mem_settings(dqm, qpd);
 
+	/* qpd->page_table_base is set earlier when register_process()
+	 * is called, i.e. when the first queue is created.
+	 */
+	dqm->dev->kfd2kgd->set_vm_context_page_table_base(dqm->dev->kgd,
+			qpd->vmid,
+			qpd->page_table_base);
+	/* invalidate the VM context after pasid and vmid mapping is set up */
+	kfd_flush_tlb(qpd_to_pdd(qpd));
+
 	return 0;
 }
 
@@ -138,6 +147,8 @@ static void deallocate_vmid(struct device_queue_manager *dqm,
 {
 	int bit = qpd->vmid - dqm->dev->vm_info.first_vmid_kfd;
 
+	kfd_flush_tlb(qpd_to_pdd(qpd));
+
 	/* Release the vmid mapping */
 	set_pasid_vmid_mapping(dqm, 0, qpd->vmid);
 
@@ -450,6 +461,8 @@ static int register_process(struct device_queue_manager *dqm,
 					struct qcm_process_device *qpd)
 {
 	struct device_process_node *n;
+	struct kfd_process_device *pdd;
+	uint32_t pd_base;
 	int retval;
 
 	n = kzalloc(sizeof(*n), GFP_KERNEL);
@@ -458,9 +471,16 @@ static int register_process(struct device_queue_manager *dqm,
 
 	n->qpd = qpd;
 
+	pdd = qpd_to_pdd(qpd);
+	/* Retrieve PD base */
+	pd_base = dqm->dev->kfd2kgd->get_process_page_dir(pdd->vm);
+
 	mutex_lock(&dqm->lock);
 	list_add(&n->list, &dqm->queues);
 
+	/* Update PD Base in QPD */
+	qpd->page_table_base = pd_base;
+
 	retval = dqm->asic_ops.update_qpd(dqm, qpd);
 
 	dqm->processes_count++;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index daaa114..0687161 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -521,6 +521,9 @@ struct kfd_process_device {
 	/* Is this process/pasid bound to this device? (amd_iommu_bind_pasid) */
 	enum kfd_pdd_bound bound;
 
+	/* VM context for GPUVM allocations */
+	void *vm;
+
 	/* Flag used to tell the pdd has dequeued from the dqm.
 	 * This is used to prevent dev->dqm->ops.process_termination() from
 	 * being called twice when it is already called in IOMMU callback
@@ -589,6 +592,14 @@ struct kfd_process {
 	size_t signal_mapped_size;
 	size_t signal_event_count;
 	bool signal_event_limit_reached;
+
+	/* Information used for memory eviction */
+	void *kgd_process_info;
+	/* Eviction fence that is attached to all the BOs of this process. The
+	 * fence will be triggered during eviction and new one will be created
+	 * during restore
+	 */
+	struct dma_fence *ef;
 };
 
 /**
@@ -805,6 +816,8 @@ int kfd_event_create(struct file *devkfd, struct kfd_process *p,
 		     uint64_t *event_page_offset, uint32_t *event_slot_index);
 int kfd_event_destroy(struct kfd_process *p, uint32_t event_id);
 
+void kfd_flush_tlb(struct kfd_process_device *pdd);
+
 int dbgdev_wave_reset_wavefronts(struct kfd_dev *dev, struct kfd_process *p);
 
 /* Debugfs */
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index a527c22..e82f4ac 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -34,6 +34,7 @@
 struct mm_struct;
 
 #include "kfd_priv.h"
+#include "kfd_device_queue_manager.h"
 #include "kfd_dbgmgr.h"
 
 /*
@@ -154,6 +155,10 @@ static void kfd_process_destroy_pdds(struct kfd_process *p)
 		pr_debug("Releasing pdd (topology id %d) for process (pasid %d)\n",
 				pdd->dev->id, p->pasid);
 
+		if (pdd->vm)
+			pdd->dev->kfd2kgd->destroy_process_vm(
+				pdd->dev->kgd, pdd->vm);
+
 		list_del(&pdd->per_device_list);
 
 		if (pdd->qpd.cwsr_kaddr)
@@ -186,6 +191,7 @@ static void kfd_process_wq_release(struct work_struct *work)
 #endif
 
 	kfd_process_destroy_pdds(p);
+	dma_fence_put(p->ef);
 
 	kfd_event_free_process(p);
 
@@ -410,7 +416,18 @@ struct kfd_process_device *kfd_create_process_device_data(struct kfd_dev *dev,
 	pdd->already_dequeued = false;
 	list_add(&pdd->per_device_list, &p->per_device_data);
 
+	/* Create the GPUVM context for this specific device */
+	if (dev->kfd2kgd->create_process_vm(dev->kgd, &pdd->vm,
+					    &p->kgd_process_info, &p->ef)) {
+		pr_err("Failed to create process VM object\n");
+		goto err_create_pdd;
+	}
 	return pdd;
+
+err_create_pdd:
+	list_del(&pdd->per_device_list);
+	kfree(pdd);
+	return NULL;
 }
 
 /*
@@ -642,6 +659,22 @@ int kfd_reserved_mem_mmap(struct kfd_process *process,
 			       KFD_CWSR_TBA_TMA_SIZE, vma->vm_page_prot);
 }
 
+void kfd_flush_tlb(struct kfd_process_device *pdd)
+{
+	struct kfd_dev *dev = pdd->dev;
+	const struct kfd2kgd_calls *f2g = dev->kfd2kgd;
+
+	if (dev->dqm->sched_policy == KFD_SCHED_POLICY_NO_HWS) {
+		/* Nothing to flush until a VMID is assigned, which
+		 * only happens when the first queue is created.
+		 */
+		if (pdd->qpd.vmid)
+			f2g->invalidate_tlbs_vmid(dev->kgd, pdd->qpd.vmid);
+	} else {
+		f2g->invalidate_tlbs(pdd->dev->kgd, pdd->process->pasid);
+	}
+}
+
 #if defined(CONFIG_DEBUG_FS)
 
 int kfd_debugfs_mqds_by_process(struct seq_file *m, void *data)
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 16/25] drm/amdkfd: Implement KFD process eviction/restore
       [not found] ` <1517015381-1080-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (14 preceding siblings ...)
  2018-01-27  1:09   ` [PATCH 15/25] drm/amdkfd: Add GPUVM virtual address space to PDD Felix Kuehling
@ 2018-01-27  1:09   ` Felix Kuehling
  2018-01-27  1:09   ` [PATCH 17/25] uapi: Fix type used in ioctl parameter structures Felix Kuehling
                     ` (9 subsequent siblings)
  25 siblings, 0 replies; 44+ messages in thread
From: Felix Kuehling @ 2018-01-27  1:09 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling, Harish Kasiviswanathan

When the TTM memory manager in KGD evicts BOs, all user mode queues
potentially accessing these BOs must be evicted temporarily. Once
user mode queues are evicted, the eviction fence is signaled,
allowing the migration of the BO to proceed.

A delayed worker is scheduled to restore all the BOs belonging to
the evicted process and restart its queues.

During suspend/resume of the GPU we also evict all processes to allow
KGD to save BOs in system memory, since VRAM will be lost.
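
The overall protocol looks like this (an outline of the flow
implemented below, simplified):

	/*
	 * 1. TTM decides to move a KFD BO; the attached eviction fence
	 *    blocks the move until it is signaled.
	 * 2. evict_process_worker() evicts the process queues.
	 * 3. The eviction fence is signaled; the BO migration proceeds.
	 * 4. restore_process_worker(), scheduled with a delay, restores
	 *    the BOs, attaches a new fence and restarts the queues.
	 */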

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_device.c            |  65 +++++-
 .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c  | 219 ++++++++++++++++++++-
 .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.h  |   9 +
 drivers/gpu/drm/amd/amdkfd/kfd_module.c            |   2 +
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h              |  32 ++-
 drivers/gpu/drm/amd/amdkfd/kfd_process.c           | 213 ++++++++++++++++++++
 6 files changed, 537 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index 612afaf..9299a91 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -32,6 +32,7 @@
 #include "cwsr_trap_handler_gfx8.asm"
 
 #define MQD_SIZE_ALIGNED 768
+static atomic_t kfd_device_suspended = ATOMIC_INIT(0);
 
 #if defined(CONFIG_AMD_IOMMU_V2_MODULE) || defined(CONFIG_AMD_IOMMU_V2)
 static const struct kfd_device_info kaveri_device_info = {
@@ -545,6 +546,10 @@ void kgd2kfd_suspend(struct kfd_dev *kfd)
 	if (!kfd->init_complete)
 		return;
 
+	/* When the first KFD device is suspended, suspend all KFD processes */
+	if (atomic_inc_return(&kfd_device_suspended) == 1)
+		kfd_suspend_all_processes();
+
 	kfd->dqm->ops.stop(kfd->dqm);
 
 #if defined(CONFIG_AMD_IOMMU_V2_MODULE) || defined(CONFIG_AMD_IOMMU_V2)
@@ -561,11 +566,21 @@ void kgd2kfd_suspend(struct kfd_dev *kfd)
 
 int kgd2kfd_resume(struct kfd_dev *kfd)
 {
+	int ret, count;
+
 	if (!kfd->init_complete)
 		return 0;
 
-	return kfd_resume(kfd);
+	ret = kfd_resume(kfd);
+	if (ret)
+		return ret;
+
+	count = atomic_dec_return(&kfd_device_suspended);
+	WARN_ONCE(count < 0, "KFD suspend / resume ref. error");
+	if (count == 0)
+		ret = kfd_resume_all_processes();
 
+	return ret;
 }
 
 static int kfd_resume(struct kfd_dev *kfd)
@@ -625,6 +640,54 @@ void kgd2kfd_interrupt(struct kfd_dev *kfd, const void *ih_ring_entry)
 	spin_unlock(&kfd->interrupt_lock);
 }
 
+/** kgd2kfd_schedule_evict_and_restore_process - Schedules work queue that will
+ *   prepare for safe eviction of KFD BOs that belong to the specified
+ *   process.
+ *
+ * @mm: mm_struct that identifies the specified KFD process
+ * @fence: eviction fence attached to KFD process BOs
+ *
+ */
+int kgd2kfd_schedule_evict_and_restore_process(struct mm_struct *mm,
+					       struct dma_fence *fence)
+{
+	struct kfd_process *p;
+	unsigned long active_time;
+	unsigned long delay_jiffies = msecs_to_jiffies(PROCESS_ACTIVE_TIME_MS);
+
+	if (!fence)
+		return -EINVAL;
+
+	if (dma_fence_is_signaled(fence))
+		return 0;
+
+	p = kfd_lookup_process_by_mm(mm);
+	if (!p)
+		return -ENODEV;
+
+	if (fence->seqno == p->last_eviction_seqno)
+		goto out;
+
+	p->last_eviction_seqno = fence->seqno;
+
+	/* Avoid KFD process starvation. Wait for at least
+	 * PROCESS_ACTIVE_TIME_MS before evicting the process again
+	 */
+	active_time = get_jiffies_64() - p->last_restore_timestamp;
+	if (delay_jiffies > active_time)
+		delay_jiffies -= active_time;
+	else
+		delay_jiffies = 0;
+
+	/* During process initialization eviction_work.dwork is initialized
+	 * to evict_process_worker
+	 */
+	schedule_delayed_work(&p->eviction_work, delay_jiffies);
+out:
+	kfd_unref_process(p);
+	return 0;
+}
+
 static int kfd_gtt_sa_init(struct kfd_dev *kfd, unsigned int buf_size,
 				unsigned int chunk_size)
 {
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index b7d0639..b3b6dab 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -21,10 +21,11 @@
  *
  */
 
+#include <linux/ratelimit.h>
+#include <linux/printk.h>
 #include <linux/slab.h>
 #include <linux/list.h>
 #include <linux/types.h>
-#include <linux/printk.h>
 #include <linux/bitops.h>
 #include <linux/sched.h>
 #include "kfd_priv.h"
@@ -180,6 +181,14 @@ static int create_queue_nocpsch(struct device_queue_manager *dqm,
 			goto out_unlock;
 	}
 	q->properties.vmid = qpd->vmid;
+	/*
+	 * Eviction state logic: we only mark active queues as evicted
+	 * to avoid the overhead of restoring inactive queues later
+	 */
+	if (qpd->evicted)
+		q->properties.is_evicted = (q->properties.queue_size > 0 &&
+					    q->properties.queue_percent > 0 &&
+					    q->properties.queue_address != 0);
 
 	q->properties.tba_addr = qpd->tba_addr;
 	q->properties.tma_addr = qpd->tma_addr;
@@ -377,15 +386,29 @@ static int update_queue(struct device_queue_manager *dqm, struct queue *q)
 {
 	int retval;
 	struct mqd_manager *mqd;
+	struct kfd_process_device *pdd;
 	bool prev_active = false;
 
 	mutex_lock(&dqm->lock);
+	pdd = kfd_get_process_device_data(q->device, q->process);
+	if (!pdd) {
+		retval = -ENODEV;
+		goto out_unlock;
+	}
 	mqd = dqm->ops.get_mqd_manager(dqm,
 			get_mqd_type_from_queue_type(q->properties.type));
 	if (!mqd) {
 		retval = -ENOMEM;
 		goto out_unlock;
 	}
+	/*
+	 * Eviction state logic: we only mark active queues as evicted
+	 * to avoid the overhead of restoring inactive queues later
+	 */
+	if (pdd->qpd.evicted)
+		q->properties.is_evicted = (q->properties.queue_size > 0 &&
+					    q->properties.queue_percent > 0 &&
+					    q->properties.queue_address != 0);
 
 	/* Save previous activity state for counters */
 	prev_active = q->properties.is_active;
@@ -457,6 +480,187 @@ static struct mqd_manager *get_mqd_manager(
 	return mqd;
 }
 
+static int evict_process_queues_nocpsch(struct device_queue_manager *dqm,
+					struct qcm_process_device *qpd)
+{
+	struct queue *q;
+	struct mqd_manager *mqd;
+	struct kfd_process_device *pdd;
+	int retval = 0;
+
+	mutex_lock(&dqm->lock);
+	if (qpd->evicted++ > 0) /* already evicted, do nothing */
+		goto out;
+
+	pdd = qpd_to_pdd(qpd);
+	pr_info_ratelimited("Evicting PASID %u queues\n",
+			    pdd->process->pasid);
+
+	/* Deactivate all active queues on the qpd */
+	list_for_each_entry(q, &qpd->queues_list, list) {
+		if (!q->properties.is_active)
+			continue;
+		mqd = dqm->ops.get_mqd_manager(dqm,
+			get_mqd_type_from_queue_type(q->properties.type));
+		if (!mqd) { /* should not be here */
+			pr_err("Cannot evict queue, mqd mgr is NULL\n");
+			retval = -ENOMEM;
+			goto out;
+		}
+		q->properties.is_evicted = true;
+		q->properties.is_active = false;
+		retval = mqd->destroy_mqd(mqd, q->mqd,
+				KFD_PREEMPT_TYPE_WAVEFRONT_DRAIN,
+				KFD_UNMAP_LATENCY_MS, q->pipe, q->queue);
+		if (retval)
+			goto out;
+		dqm->queue_count--;
+	}
+
+out:
+	mutex_unlock(&dqm->lock);
+	return retval;
+}
+
+static int evict_process_queues_cpsch(struct device_queue_manager *dqm,
+				      struct qcm_process_device *qpd)
+{
+	struct queue *q;
+	struct kfd_process_device *pdd;
+	int retval = 0;
+
+	mutex_lock(&dqm->lock);
+	if (qpd->evicted++ > 0) /* already evicted, do nothing */
+		goto out;
+
+	pdd = qpd_to_pdd(qpd);
+	pr_info_ratelimited("Evicting PASID %u queues\n",
+			    pdd->process->pasid);
+
+	/* Deactivate all active queues on the qpd */
+	list_for_each_entry(q, &qpd->queues_list, list) {
+		if (!q->properties.is_active)
+			continue;
+		q->properties.is_evicted = true;
+		q->properties.is_active = false;
+		dqm->queue_count--;
+	}
+	retval = execute_queues_cpsch(dqm,
+				qpd->is_debug ?
+				KFD_UNMAP_QUEUES_FILTER_ALL_QUEUES :
+				KFD_UNMAP_QUEUES_FILTER_DYNAMIC_QUEUES, 0);
+
+out:
+	mutex_unlock(&dqm->lock);
+	return retval;
+}
+
+static int restore_process_queues_nocpsch(struct device_queue_manager *dqm,
+					  struct qcm_process_device *qpd)
+{
+	struct queue *q;
+	struct mqd_manager *mqd;
+	struct kfd_process_device *pdd;
+	uint32_t pd_base;
+	int retval = 0;
+
+	pdd = qpd_to_pdd(qpd);
+	/* Retrieve PD base */
+	pd_base = dqm->dev->kfd2kgd->get_process_page_dir(pdd->vm);
+
+	mutex_lock(&dqm->lock);
+	if (WARN_ON_ONCE(!qpd->evicted)) /* already restored, do nothing */
+		goto out;
+	if (qpd->evicted > 1) { /* ref count still > 0, decrement & quit */
+		qpd->evicted--;
+		goto out;
+	}
+
+	pr_info_ratelimited("Restoring PASID %u queues\n",
+			    pdd->process->pasid);
+
+	/* Update PD Base in QPD */
+	qpd->page_table_base = pd_base;
+	pr_debug("Updated PD address to 0x%08x\n", pd_base);
+
+	if (!list_empty(&qpd->queues_list)) {
+		dqm->dev->kfd2kgd->set_vm_context_page_table_base(
+				dqm->dev->kgd,
+				qpd->vmid,
+				qpd->page_table_base);
+		kfd_flush_tlb(pdd);
+	}
+
+	/* Reactivate all evicted queues on the qpd */
+	list_for_each_entry(q, &qpd->queues_list, list) {
+		if (!q->properties.is_evicted)
+			continue;
+		mqd = dqm->ops.get_mqd_manager(dqm,
+			get_mqd_type_from_queue_type(q->properties.type));
+		if (!mqd) { /* should not be here */
+			pr_err("Cannot restore queue, mqd mgr is NULL\n");
+			retval = -ENOMEM;
+			goto out;
+		}
+		q->properties.is_evicted = false;
+		q->properties.is_active = true;
+		retval = mqd->load_mqd(mqd, q->mqd, q->pipe,
+				       q->queue, &q->properties,
+				       q->process->mm);
+		if (retval)
+			goto out;
+		dqm->queue_count++;
+	}
+	qpd->evicted = 0;
+out:
+	mutex_unlock(&dqm->lock);
+	return retval;
+}
+
+static int restore_process_queues_cpsch(struct device_queue_manager *dqm,
+					struct qcm_process_device *qpd)
+{
+	struct queue *q;
+	struct kfd_process_device *pdd;
+	uint32_t pd_base;
+	int retval = 0;
+
+	pdd = qpd_to_pdd(qpd);
+	/* Retrieve PD base */
+	pd_base = dqm->dev->kfd2kgd->get_process_page_dir(pdd->vm);
+
+	mutex_lock(&dqm->lock);
+	if (WARN_ON_ONCE(!qpd->evicted)) /* already restored, do nothing */
+		goto out;
+	if (qpd->evicted > 1) { /* ref count still > 0, decrement & quit */
+		qpd->evicted--;
+		goto out;
+	}
+
+	pr_info_ratelimited("Restoring PASID %u queues\n",
+			    pdd->process->pasid);
+
+	/* Update PD Base in QPD */
+	qpd->page_table_base = pd_base;
+	pr_debug("Updated PD address to 0x%08x\n", pd_base);
+
+	/* Reactivate all evicted queues on the qpd */
+	list_for_each_entry(q, &qpd->queues_list, list) {
+		if (!q->properties.is_evicted)
+			continue;
+		q->properties.is_evicted = false;
+		q->properties.is_active = true;
+		dqm->queue_count++;
+	}
+	retval = execute_queues_cpsch(dqm,
+				KFD_UNMAP_QUEUES_FILTER_DYNAMIC_QUEUES, 0);
+	if (!retval)
+		qpd->evicted = 0;
+out:
+	mutex_unlock(&dqm->lock);
+	return retval;
+}
+
 static int register_process(struct device_queue_manager *dqm,
 					struct qcm_process_device *qpd)
 {
@@ -853,6 +1057,14 @@ static int create_queue_cpsch(struct device_queue_manager *dqm, struct queue *q,
 		retval = -ENOMEM;
 		goto out;
 	}
+	/*
+	 * Eviction state logic: we only mark active queues as evicted
+	 * to avoid the overhead of restoring inactive queues later
+	 */
+	if (qpd->evicted)
+		q->properties.is_evicted = (q->properties.queue_size > 0 &&
+					    q->properties.queue_percent > 0 &&
+					    q->properties.queue_address != 0);
 
 	dqm->asic_ops.init_sdma_vm(dqm, q, qpd);
 
@@ -1291,6 +1503,8 @@ struct device_queue_manager *device_queue_manager_init(struct kfd_dev *dev)
 		dqm->ops.set_cache_memory_policy = set_cache_memory_policy;
 		dqm->ops.set_trap_handler = set_trap_handler;
 		dqm->ops.process_termination = process_termination_cpsch;
+		dqm->ops.evict_process_queues = evict_process_queues_cpsch;
+		dqm->ops.restore_process_queues = restore_process_queues_cpsch;
 		break;
 	case KFD_SCHED_POLICY_NO_HWS:
 		/* initialize dqm for no cp scheduling */
@@ -1307,6 +1521,9 @@ struct device_queue_manager *device_queue_manager_init(struct kfd_dev *dev)
 		dqm->ops.set_cache_memory_policy = set_cache_memory_policy;
 		dqm->ops.set_trap_handler = set_trap_handler;
 		dqm->ops.process_termination = process_termination_nocpsch;
+		dqm->ops.evict_process_queues = evict_process_queues_nocpsch;
+		dqm->ops.restore_process_queues =
+			restore_process_queues_nocpsch;
 		break;
 	default:
 		pr_err("Invalid scheduling policy %d\n", dqm->sched_policy);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
index 68be0aa..412beff 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
@@ -79,6 +79,10 @@ struct device_process_node {
  *
  * @process_termination: Clears all process queues belonging to that device.
  *
+ * @evict_process_queues: Evict all active queues of a process
+ *
+ * @restore_process_queues: Restore all evicted queues of a process
+ *
  */
 
 struct device_queue_manager_ops {
@@ -129,6 +133,11 @@ struct device_queue_manager_ops {
 
 	int (*process_termination)(struct device_queue_manager *dqm,
 			struct qcm_process_device *qpd);
+
+	int (*evict_process_queues)(struct device_queue_manager *dqm,
+				    struct qcm_process_device *qpd);
+	int (*restore_process_queues)(struct device_queue_manager *dqm,
+				      struct qcm_process_device *qpd);
 };
 
 struct device_queue_manager_asic_ops {
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_module.c b/drivers/gpu/drm/amd/amdkfd/kfd_module.c
index 3ac72be..65574c6 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_module.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_module.c
@@ -43,6 +43,8 @@ static const struct kgd2kfd_calls kgd2kfd = {
 	.interrupt	= kgd2kfd_interrupt,
 	.suspend	= kgd2kfd_suspend,
 	.resume		= kgd2kfd_resume,
+	.schedule_evict_and_restore_process =
+			  kgd2kfd_schedule_evict_and_restore_process,
 };
 
 int sched_policy = KFD_SCHED_POLICY_HWS;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 0687161..785161e 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -335,7 +335,11 @@ enum kfd_queue_format {
  * @is_interop: Defines if this is a interop queue. Interop queue means that
  * the queue can access both graphics and compute resources.
  *
- * @is_active: Defines if the queue is active or not.
+ * @is_evicted: Defines if the queue is evicted. Only active queues
+ * are evicted, rendering them inactive.
+ *
+ * @is_active: Defines if the queue is active or not. @is_active and
+ * @is_evicted are protected by the DQM lock.
  *
  * @vmid: If the scheduling mode is no cp scheduling the field defines the vmid
  * of the queue.
@@ -357,6 +361,7 @@ struct queue_properties {
 	uint32_t __iomem *doorbell_ptr;
 	uint32_t doorbell_off;
 	bool is_interop;
+	bool is_evicted;
 	bool is_active;
 	/* Not relevant for user mode queues in cp scheduling */
 	unsigned int vmid;
@@ -460,6 +465,7 @@ struct qcm_process_device {
 	unsigned int queue_count;
 	unsigned int vmid;
 	bool is_debug;
+	unsigned int evicted; /* eviction counter, 0=active */
 
 	/* This flag tells if we should reset all wavefronts on
 	 * process termination
@@ -486,6 +492,17 @@ struct qcm_process_device {
 	uint64_t tma_addr;
 };
 
+/* KFD Memory Eviction */
+
+/* Approx. wait time before attempting to restore evicted BOs */
+#define PROCESS_RESTORE_TIME_MS 100
+/* Approx. back off time if restore fails due to lack of memory */
+#define PROCESS_BACK_OFF_TIME_MS 100
+/* Approx. time before evicting the process again */
+#define PROCESS_ACTIVE_TIME_MS 10
+
+int kgd2kfd_schedule_evict_and_restore_process(struct mm_struct *mm,
+					       struct dma_fence *fence);
 
 enum kfd_pdd_bound {
 	PDD_UNBOUND = 0,
@@ -600,6 +617,16 @@ struct kfd_process {
 	 * during restore
 	 */
 	struct dma_fence *ef;
+
+	/* Work items for evicting and restoring BOs */
+	struct delayed_work eviction_work;
+	struct delayed_work restore_work;
+	/* seqno of the last scheduled eviction */
+	unsigned int last_eviction_seqno;
+	/* Approx. the last timestamp (in jiffies) when the process was
+	 * restored after an eviction
+	 */
+	unsigned long last_restore_timestamp;
 };
 
 /**
@@ -625,7 +652,10 @@ void kfd_process_destroy_wq(void);
 struct kfd_process *kfd_create_process(struct file *filep);
 struct kfd_process *kfd_get_process(const struct task_struct *);
 struct kfd_process *kfd_lookup_process_by_pasid(unsigned int pasid);
+struct kfd_process *kfd_lookup_process_by_mm(const struct mm_struct *mm);
 void kfd_unref_process(struct kfd_process *p);
+void kfd_suspend_all_processes(void);
+int kfd_resume_all_processes(void);
 
 struct kfd_process_device *kfd_bind_process_to_device(struct kfd_dev *dev,
 						struct kfd_process *p);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index e82f4ac..7eeadfe 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -55,6 +55,9 @@ static struct kfd_process *create_process(const struct task_struct *thread,
 					struct file *filep);
 static int kfd_process_init_cwsr(struct kfd_process *p, struct file *filep);
 
+static void evict_process_worker(struct work_struct *work);
+static void restore_process_worker(struct work_struct *work);
+
 
 void kfd_process_create_wq(void)
 {
@@ -239,6 +242,9 @@ static void kfd_process_notifier_release(struct mmu_notifier *mn,
 	mutex_unlock(&kfd_processes_mutex);
 	synchronize_srcu(&kfd_processes_srcu);
 
+	cancel_delayed_work_sync(&p->eviction_work);
+	cancel_delayed_work_sync(&p->restore_work);
+
 	mutex_lock(&p->mutex);
 
 	/* Iterate over all process device data structures and if the
@@ -360,6 +366,10 @@ static struct kfd_process *create_process(const struct task_struct *thread,
 	if (err != 0)
 		goto err_init_apertures;
 
+	INIT_DELAYED_WORK(&process->eviction_work, evict_process_worker);
+	INIT_DELAYED_WORK(&process->restore_work, restore_process_worker);
+	process->last_restore_timestamp = get_jiffies_64();
+
 	err = kfd_process_init_cwsr(process, filep);
 	if (err)
 		goto err_init_cwsr;
@@ -411,6 +421,7 @@ struct kfd_process_device *kfd_create_process_device_data(struct kfd_dev *dev,
 	INIT_LIST_HEAD(&pdd->qpd.priv_queue_list);
 	pdd->qpd.dqm = dev->dqm;
 	pdd->qpd.pqm = &p->pqm;
+	pdd->qpd.evicted = 0;
 	pdd->process = p;
 	pdd->bound = PDD_UNBOUND;
 	pdd->already_dequeued = false;
@@ -625,6 +636,208 @@ struct kfd_process *kfd_lookup_process_by_pasid(unsigned int pasid)
 	return ret_p;
 }
 
+/* This increments the process->ref counter. */
+struct kfd_process *kfd_lookup_process_by_mm(const struct mm_struct *mm)
+{
+	struct kfd_process *p;
+
+	int idx = srcu_read_lock(&kfd_processes_srcu);
+
+	p = find_process_by_mm(mm);
+	if (p)
+		kref_get(&p->ref);
+
+	srcu_read_unlock(&kfd_processes_srcu, idx);
+
+	return p;
+}
+
+/* process_evict_queues - Evict all user queues of a process
+ *
+ * Eviction is reference-counted per process-device. This means multiple
+ * evictions from different sources can be nested safely.
+ */
+static int process_evict_queues(struct kfd_process *p)
+{
+	struct kfd_process_device *pdd;
+	int r = 0;
+	unsigned int n_evicted = 0;
+
+	list_for_each_entry(pdd, &p->per_device_data, per_device_list) {
+		r = pdd->dev->dqm->ops.evict_process_queues(pdd->dev->dqm,
+							    &pdd->qpd);
+		if (r) {
+			pr_err("Failed to evict process queues\n");
+			goto fail;
+		}
+		n_evicted++;
+	}
+
+	return r;
+
+fail:
+	/* To keep state consistent, roll back partial eviction by
+	 * restoring queues
+	 */
+	list_for_each_entry(pdd, &p->per_device_data, per_device_list) {
+		if (n_evicted == 0)
+			break;
+		if (pdd->dev->dqm->ops.restore_process_queues(pdd->dev->dqm,
+							      &pdd->qpd))
+			pr_err("Failed to restore queues\n");
+
+		n_evicted--;
+	}
+
+	return r;
+}
+
+/* process_restore_queues - Restore all user queues of a process */
+static int process_restore_queues(struct kfd_process *p)
+{
+	struct kfd_process_device *pdd;
+	int r, ret = 0;
+
+	list_for_each_entry(pdd, &p->per_device_data, per_device_list) {
+		r = pdd->dev->dqm->ops.restore_process_queues(pdd->dev->dqm,
+							      &pdd->qpd);
+		if (r) {
+			pr_err("Failed to restore process queues\n");
+			if (!ret)
+				ret = r;
+		}
+	}
+
+	return ret;
+}
+
+static void evict_process_worker(struct work_struct *work)
+{
+	int ret;
+	struct kfd_process *p;
+	struct delayed_work *dwork;
+
+	dwork = to_delayed_work(work);
+
+	/* Process termination destroys this worker thread. So during the
+	 * lifetime of this thread, kfd_process p will be valid
+	 */
+	p = container_of(dwork, struct kfd_process, eviction_work);
+	WARN_ONCE(p->last_eviction_seqno != p->ef->seqno,
+		  "Eviction fence mismatch\n");
+
+	/* A narrow window of overlap between restore and evict work
+	 * items is possible. Once amdgpu_amdkfd_gpuvm_restore_process_bos
+	 * unreserves the KFD BOs, the process can be evicted again, but
+	 * restore still has a few more steps to finish. So wait for any
+	 * previous restore work to complete.
+	 */
+	flush_delayed_work(&p->restore_work);
+
+	pr_debug("Started evicting pasid %d\n", p->pasid);
+	ret = process_evict_queues(p);
+	if (!ret) {
+		dma_fence_signal(p->ef);
+		dma_fence_put(p->ef);
+		p->ef = NULL;
+		schedule_delayed_work(&p->restore_work,
+				msecs_to_jiffies(PROCESS_RESTORE_TIME_MS));
+
+		pr_debug("Finished evicting pasid %d\n", p->pasid);
+	} else
+		pr_err("Failed to evict queues of pasid %d\n", p->pasid);
+}
+
+static void restore_process_worker(struct work_struct *work)
+{
+	struct delayed_work *dwork;
+	struct kfd_process *p;
+	struct kfd_process_device *pdd;
+	int ret = 0;
+
+	dwork = to_delayed_work(work);
+
+	/* Process termination destroys this worker thread. So during the
+	 * lifetime of this thread, kfd_process p will be valid
+	 */
+	p = container_of(dwork, struct kfd_process, restore_work);
+
+	/* Call restore_process_bos on the first KGD device. This function
+	 * takes care of restoring the whole process including other devices.
+	 * Restore can fail if not enough memory is available. If so,
+	 * reschedule the work item.
+	 */
+	pdd = list_first_entry(&p->per_device_data,
+			       struct kfd_process_device,
+			       per_device_list);
+
+	pr_debug("Started restoring pasid %d\n", p->pasid);
+
+	/* Set last_restore_timestamp before the restore has actually
+	 * succeeded. Otherwise it would have to be set by KGD
+	 * (restore_process_bos) before the KFD BOs are unreserved. If not,
+	 * the process could be evicted again before the timestamp is set.
+	 * If restore fails, the timestamp will be set again in the next
+	 * attempt. This means the minimum GPU quantum would be
+	 * PROCESS_ACTIVE_TIME_MS - (time to execute the following two
+	 * functions).
+	 */
+
+	p->last_restore_timestamp = get_jiffies_64();
+	ret = pdd->dev->kfd2kgd->restore_process_bos(p->kgd_process_info,
+						     &p->ef);
+	if (ret) {
+		pr_debug("Failed to restore BOs of pasid %d, retry after %d ms\n",
+			 p->pasid, PROCESS_BACK_OFF_TIME_MS);
+		ret = schedule_delayed_work(&p->restore_work,
+				msecs_to_jiffies(PROCESS_BACK_OFF_TIME_MS));
+		WARN(!ret, "reschedule restore work failed\n");
+		return;
+	}
+
+	ret = process_restore_queues(p);
+	if (!ret)
+		pr_debug("Finished restoring pasid %d\n", p->pasid);
+	else
+		pr_err("Failed to restore queues of pasid %d\n", p->pasid);
+}
+
+void kfd_suspend_all_processes(void)
+{
+	struct kfd_process *p;
+	unsigned int temp;
+	int idx = srcu_read_lock(&kfd_processes_srcu);
+
+	hash_for_each_rcu(kfd_processes_table, temp, p, kfd_processes) {
+		cancel_delayed_work_sync(&p->eviction_work);
+		cancel_delayed_work_sync(&p->restore_work);
+
+		if (process_evict_queues(p))
+			pr_err("Failed to suspend process %d\n", p->pasid);
+		dma_fence_signal(p->ef);
+		dma_fence_put(p->ef);
+		p->ef = NULL;
+	}
+	srcu_read_unlock(&kfd_processes_srcu, idx);
+}
+
+int kfd_resume_all_processes(void)
+{
+	struct kfd_process *p;
+	unsigned int temp;
+	int ret = 0, idx = srcu_read_lock(&kfd_processes_srcu);
+
+	hash_for_each_rcu(kfd_processes_table, temp, p, kfd_processes) {
+		if (!schedule_delayed_work(&p->restore_work, 0)) {
+			pr_err("Restore process %d failed during resume\n",
+			       p->pasid);
+			ret = -EFAULT;
+		}
+	}
+	srcu_read_unlock(&kfd_processes_srcu, idx);
+	return ret;
+}
+
 int kfd_reserved_mem_mmap(struct kfd_process *process,
 			  struct vm_area_struct *vma)
 {
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 17/25] uapi: Fix type used in ioctl parameter structures
       [not found] ` <1517015381-1080-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (15 preceding siblings ...)
  2018-01-27  1:09   ` [PATCH 16/25] drm/amdkfd: Implement KFD process eviction/restore Felix Kuehling
@ 2018-01-27  1:09   ` Felix Kuehling
  2018-01-27  1:09   ` [PATCH 18/25] drm/amdkfd: Remove limit on number of GPUs Felix Kuehling
                     ` (8 subsequent siblings)
  25 siblings, 0 replies; 44+ messages in thread
From: Felix Kuehling @ 2018-01-27  1:09 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling

Use __u32 and __u64 instead of POSIX types that may not be defined
in user mode builds.
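
The general pattern for ioctl argument structures is (a hypothetical
example, not from this patch):

#include <linux/types.h>

struct example_ioctl_args {
	__u64 buf_ptr;	/* to KFD; fits a pointer on 32- and 64-bit */
	__u32 flags;	/* to KFD */
	__u32 pad;	/* keep the struct size a multiple of 8 */
};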

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 include/uapi/linux/kfd_ioctl.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
index f4cab5b..111d73b 100644
--- a/include/uapi/linux/kfd_ioctl.h
+++ b/include/uapi/linux/kfd_ioctl.h
@@ -263,10 +263,10 @@ struct kfd_ioctl_get_tile_config_args {
 };
 
 struct kfd_ioctl_set_trap_handler_args {
-	uint64_t tba_addr;		/* to KFD */
-	uint64_t tma_addr;		/* to KFD */
-	uint32_t gpu_id;		/* to KFD */
-	uint32_t pad;
+	__u64 tba_addr;		/* to KFD */
+	__u64 tma_addr;		/* to KFD */
+	__u32 gpu_id;		/* to KFD */
+	__u32 pad;
 };
 
 #define AMDKFD_IOCTL_BASE 'K'
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 18/25] drm/amdkfd: Remove limit on number of GPUs
       [not found] ` <1517015381-1080-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (16 preceding siblings ...)
  2018-01-27  1:09   ` [PATCH 17/25] uapi: Fix type used in ioctl parameter structures Felix Kuehling
@ 2018-01-27  1:09   ` Felix Kuehling
  2018-01-27  1:09   ` [PATCH 19/25] drm/amdkfd: Aperture setup for dGPUs Felix Kuehling
                     ` (7 subsequent siblings)
  25 siblings, 0 replies; 44+ messages in thread
From: Felix Kuehling @ 2018-01-27  1:09 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling

Currently the number of GPUs is limited by aperture placement options
available on GFX7 and GFX8 hardware. This limitation is not necessary.
Scratch and LDS represent per-work-item and per-work-group storage
respectively. Different work-items and work-groups use the same virtual
address to access their own data. Work running on different GPUs is by
definition in different work-groups (different dispatches, in fact).
That means the same virtual addresses can be used for these apertures
on different GPUs.

Add a new AMDKFD_IOC_GET_PROCESS_APERTURES_NEW ioctl that removes the
artificial limitation on the number of GPUs that can be supported. The
new ioctl allows user mode to query the number of GPUs to allocate
enough memory for all GPUs to be reported.

This deprecates AMDKFD_IOC_GET_PROCESS_APERTURES.
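
From user space the new ioctl follows the usual two-call pattern (a
sketch; kfd_fd is assumed to be an open /dev/kfd file descriptor and
error handling is omitted):

	struct kfd_ioctl_get_process_apertures_new_args args = {0};
	struct kfd_process_device_apertures *pa;

	/* First call: num_of_nodes == 0 asks KFD for the GPU count */
	ioctl(kfd_fd, AMDKFD_IOC_GET_PROCESS_APERTURES_NEW, &args);

	pa = calloc(args.num_of_nodes, sizeof(*pa));
	args.kfd_process_device_apertures_ptr = (__u64)(uintptr_t)pa;

	/* Second call: KFD fills in up to num_of_nodes entries */
	ioctl(kfd_fd, AMDKFD_IOC_GET_PROCESS_APERTURES_NEW, &args);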

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c     | 94 ++++++++++++++++++++++++++++
 drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c | 22 +++----
 include/uapi/linux/kfd_ioctl.h               | 27 +++++++-
 3 files changed, 128 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 6fe2496..7d40094 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -825,6 +825,97 @@ static int kfd_ioctl_get_process_apertures(struct file *filp,
 	return 0;
 }
 
+static int kfd_ioctl_get_process_apertures_new(struct file *filp,
+				struct kfd_process *p, void *data)
+{
+	struct kfd_ioctl_get_process_apertures_new_args *args = data;
+	struct kfd_process_device_apertures *pa;
+	struct kfd_process_device *pdd;
+	uint32_t nodes = 0;
+	int ret;
+
+	dev_dbg(kfd_device, "get apertures for PASID %d", p->pasid);
+
+	if (args->num_of_nodes == 0) {
+		/* Return number of nodes, so that user space can allocate
+		 * sufficient memory
+		 */
+		mutex_lock(&p->mutex);
+
+		if (!kfd_has_process_device_data(p))
+			goto out_unlock;
+
+		/* Run over all pdd of the process */
+		pdd = kfd_get_first_process_device_data(p);
+		do {
+			args->num_of_nodes++;
+			pdd = kfd_get_next_process_device_data(p, pdd);
+		} while (pdd);
+
+		goto out_unlock;
+	}
+
+	/* Fill in process-aperture information for all available
+	 * nodes, but not more than args->num_of_nodes as that is
+	 * the amount of memory allocated by user
+	 */
+	pa = kzalloc((sizeof(struct kfd_process_device_apertures) *
+				args->num_of_nodes), GFP_KERNEL);
+	if (!pa)
+		return -ENOMEM;
+
+	mutex_lock(&p->mutex);
+
+	if (!kfd_has_process_device_data(p)) {
+		args->num_of_nodes = 0;
+		kfree(pa);
+		goto out_unlock;
+	}
+
+	/* Run over all pdd of the process */
+	pdd = kfd_get_first_process_device_data(p);
+	do {
+		pa[nodes].gpu_id = pdd->dev->id;
+		pa[nodes].lds_base = pdd->lds_base;
+		pa[nodes].lds_limit = pdd->lds_limit;
+		pa[nodes].gpuvm_base = pdd->gpuvm_base;
+		pa[nodes].gpuvm_limit = pdd->gpuvm_limit;
+		pa[nodes].scratch_base = pdd->scratch_base;
+		pa[nodes].scratch_limit = pdd->scratch_limit;
+
+		dev_dbg(kfd_device,
+			"gpu id %u\n", pdd->dev->id);
+		dev_dbg(kfd_device,
+			"lds_base %llX\n", pdd->lds_base);
+		dev_dbg(kfd_device,
+			"lds_limit %llX\n", pdd->lds_limit);
+		dev_dbg(kfd_device,
+			"gpuvm_base %llX\n", pdd->gpuvm_base);
+		dev_dbg(kfd_device,
+			"gpuvm_limit %llX\n", pdd->gpuvm_limit);
+		dev_dbg(kfd_device,
+			"scratch_base %llX\n", pdd->scratch_base);
+		dev_dbg(kfd_device,
+			"scratch_limit %llX\n", pdd->scratch_limit);
+		nodes++;
+
+		pdd = kfd_get_next_process_device_data(p, pdd);
+	} while (pdd && (nodes < args->num_of_nodes));
+	mutex_unlock(&p->mutex);
+
+	args->num_of_nodes = nodes;
+	ret = copy_to_user(
+			(void __user *)args->kfd_process_device_apertures_ptr,
+			pa,
+			(nodes * sizeof(struct kfd_process_device_apertures)));
+	kfree(pa);
+	return ret ? -EFAULT : 0;
+
+out_unlock:
+	mutex_unlock(&p->mutex);
+	return 0;
+}
+
 static int kfd_ioctl_create_event(struct file *filp, struct kfd_process *p,
 					void *data)
 {
@@ -1017,6 +1108,9 @@ static const struct amdkfd_ioctl_desc amdkfd_ioctls[] = {
 
 	AMDKFD_IOCTL_DEF(AMDKFD_IOC_SET_TRAP_HANDLER,
 			kfd_ioctl_set_trap_handler, 0),
+
+	AMDKFD_IOCTL_DEF(AMDKFD_IOC_GET_PROCESS_APERTURES_NEW,
+			kfd_ioctl_get_process_apertures_new, 0),
 };
 
 #define AMDKFD_CORE_IOCTL_COUNT	ARRAY_SIZE(amdkfd_ioctls)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c b/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
index 7377513..a06b010 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
@@ -282,14 +282,14 @@
 	(((uint64_t)(base) & \
 		0xFFFFFF0000000000UL) | 0xFFFFFFFFFFL)
 
-#define MAKE_SCRATCH_APP_BASE(gpu_num) \
-	(((uint64_t)(gpu_num) << 61) + 0x100000000L)
+#define MAKE_SCRATCH_APP_BASE() \
+	(((uint64_t)(0x1UL) << 61) + 0x100000000L)
 
 #define MAKE_SCRATCH_APP_LIMIT(base) \
 	(((uint64_t)base & 0xFFFFFFFF00000000UL) | 0xFFFFFFFF)
 
-#define MAKE_LDS_APP_BASE(gpu_num) \
-	(((uint64_t)(gpu_num) << 61) + 0x0)
+#define MAKE_LDS_APP_BASE() \
+	(((uint64_t)(0x1UL) << 61) + 0x0)
 #define MAKE_LDS_APP_LIMIT(base) \
 	(((uint64_t)(base) & 0xFFFFFFFF00000000UL) | 0xFFFFFFFF)
 
@@ -314,7 +314,7 @@ int kfd_init_apertures(struct kfd_process *process)
 			return -1;
 		}
 		/*
-		 * For 64 bit process aperture will be statically reserved in
+		 * For 64 bit processes, apertures will be statically reserved in
 		 * the x86_64 non canonical process address space
 		 * amdkfd doesn't currently support apertures for 32 bit process
 		 */
@@ -323,12 +323,11 @@ int kfd_init_apertures(struct kfd_process *process)
 			pdd->gpuvm_base = pdd->gpuvm_limit = 0;
 			pdd->scratch_base = pdd->scratch_limit = 0;
 		} else {
-			/*
-			 * node id couldn't be 0 - the three MSB bits of
-			 * aperture shoudn't be 0
+			/* Same LDS and scratch apertures can be used
+			 * on all GPUs. This allows using more dGPUs
+			 * than placement options for apertures.
 			 */
-			pdd->lds_base = MAKE_LDS_APP_BASE(id + 1);
-
+			pdd->lds_base = MAKE_LDS_APP_BASE();
 			pdd->lds_limit = MAKE_LDS_APP_LIMIT(pdd->lds_base);
 
 			pdd->gpuvm_base = MAKE_GPUVM_APP_BASE(id + 1);
@@ -336,8 +335,7 @@ int kfd_init_apertures(struct kfd_process *process)
 			pdd->gpuvm_limit =
 					MAKE_GPUVM_APP_LIMIT(pdd->gpuvm_base);
 
-			pdd->scratch_base = MAKE_SCRATCH_APP_BASE(id + 1);
-
+			pdd->scratch_base = MAKE_SCRATCH_APP_BASE();
 			pdd->scratch_limit =
 				MAKE_SCRATCH_APP_LIMIT(pdd->scratch_base);
 		}
diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
index 111d73b..5201437 100644
--- a/include/uapi/linux/kfd_ioctl.h
+++ b/include/uapi/linux/kfd_ioctl.h
@@ -107,8 +107,6 @@ struct kfd_ioctl_get_clock_counters_args {
 	__u32 pad;
 };
 
-#define NUM_OF_SUPPORTED_GPUS 7
-
 struct kfd_process_device_apertures {
 	__u64 lds_base;		/* from KFD */
 	__u64 lds_limit;		/* from KFD */
@@ -120,6 +118,12 @@ struct kfd_process_device_apertures {
 	__u32 pad;
 };
 
+/*
+ * AMDKFD_IOC_GET_PROCESS_APERTURES is deprecated. Use
+ * AMDKFD_IOC_GET_PROCESS_APERTURES_NEW instead, which supports an
+ * unlimited number of GPUs.
+ */
+#define NUM_OF_SUPPORTED_GPUS 7
 struct kfd_ioctl_get_process_apertures_args {
 	struct kfd_process_device_apertures
 			process_apertures[NUM_OF_SUPPORTED_GPUS];/* from KFD */
@@ -129,6 +133,19 @@ struct kfd_ioctl_get_process_apertures_args {
 	__u32 pad;
 };
 
+struct kfd_ioctl_get_process_apertures_new_args {
+	/* User allocated. Pointer to struct kfd_process_device_apertures
+	 * filled in by Kernel
+	 */
+	__u64 kfd_process_device_apertures_ptr;
+	/* to KFD - indicates amount of memory present in
+	 *  kfd_process_device_apertures_ptr
+	 * from KFD - Number of entries filled by KFD.
+	 */
+	__u32 num_of_nodes;
+	__u32 pad;
+};
+
 #define MAX_ALLOWED_NUM_POINTS    100
 #define MAX_ALLOWED_AW_BUFF_SIZE 4096
 #define MAX_ALLOWED_WAC_BUFF_SIZE  128
@@ -332,7 +349,11 @@ struct kfd_ioctl_set_trap_handler_args {
 #define AMDKFD_IOC_SET_TRAP_HANDLER		\
 		AMDKFD_IOW(0x13, struct kfd_ioctl_set_trap_handler_args)
 
+#define AMDKFD_IOC_GET_PROCESS_APERTURES_NEW	\
+		AMDKFD_IOWR(0x14,		\
+			struct kfd_ioctl_get_process_apertures_new_args)
+
 #define AMDKFD_COMMAND_START		0x01
-#define AMDKFD_COMMAND_END		0x14
+#define AMDKFD_COMMAND_END		0x15
 
 #endif
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 19/25] drm/amdkfd: Aperture setup for dGPUs
       [not found] ` <1517015381-1080-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (17 preceding siblings ...)
  2018-01-27  1:09   ` [PATCH 18/25] drm/amdkfd: Remove limit on number of GPUs Felix Kuehling
@ 2018-01-27  1:09   ` Felix Kuehling
  2018-01-27  1:09   ` [PATCH 20/25] drm/amdkfd: Add per-process IDR for buffer handles Felix Kuehling
                     ` (6 subsequent siblings)
  25 siblings, 0 replies; 44+ messages in thread
From: Felix Kuehling @ 2018-01-27  1:09 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling

Set up the GPUVM aperture for SVM (shared virtual memory) that allows
sharing a part of virtual address space between GPUs and CPUs.

Accurately report the size of the GPUVM aperture supported by KGD.

The low part of the GPUVM aperture is reserved for kernel use. This is
for kernel-allocated buffers that are only accessed on the GPU:
- CWSR trap handler
- IB for submitting commands in user-mode context from kernel mode
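
The resulting layout at the bottom of the dGPU SVM aperture is (per
the SVM_* definitions in this patch):

	/*
	 * 0             .. SVM_IB_BASE-1    unused
	 * SVM_IB_BASE   .. SVM_CWSR_BASE-1  kernel IB (one page)
	 * SVM_CWSR_BASE .. SVM_USER_BASE-1  CWSR trap handler
	 * SVM_USER_BASE (16MB) and up       managed by user mode
	 */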

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c | 37 ++++++++++++++++++++++------
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h        |  4 +++
 2 files changed, 33 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c b/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
index a06b010..66852de 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
@@ -278,9 +278,8 @@
 #define MAKE_GPUVM_APP_BASE(gpu_num) \
 	(((uint64_t)(gpu_num) << 61) + 0x1000000000000L)
 
-#define MAKE_GPUVM_APP_LIMIT(base) \
-	(((uint64_t)(base) & \
-		0xFFFFFF0000000000UL) | 0xFFFFFFFFFFL)
+#define MAKE_GPUVM_APP_LIMIT(base, size) \
+	(((uint64_t)(base) & 0xFFFFFF0000000000UL) + (size) - 1)
 
 #define MAKE_SCRATCH_APP_BASE() \
 	(((uint64_t)(0x1UL) << 61) + 0x100000000L)
@@ -293,6 +292,14 @@
 #define MAKE_LDS_APP_LIMIT(base) \
 	(((uint64_t)(base) & 0xFFFFFFFF00000000UL) | 0xFFFFFFFF)
 
+/* User mode manages most of the SVM aperture address space. The low
+ * 16MB are reserved for kernel use (CWSR trap handler and kernel IB
+ * for now).
+ */
+#define SVM_USER_BASE 0x1000000ull
+#define SVM_CWSR_BASE (SVM_USER_BASE - KFD_CWSR_TBA_TMA_SIZE)
+#define SVM_IB_BASE   (SVM_CWSR_BASE - PAGE_SIZE)
+
 int kfd_init_apertures(struct kfd_process *process)
 {
 	uint8_t id  = 0;
@@ -330,14 +337,28 @@ int kfd_init_apertures(struct kfd_process *process)
 			pdd->lds_base = MAKE_LDS_APP_BASE();
 			pdd->lds_limit = MAKE_LDS_APP_LIMIT(pdd->lds_base);
 
-			pdd->gpuvm_base = MAKE_GPUVM_APP_BASE(id + 1);
-
-			pdd->gpuvm_limit =
-					MAKE_GPUVM_APP_LIMIT(pdd->gpuvm_base);
-
 			pdd->scratch_base = MAKE_SCRATCH_APP_BASE();
 			pdd->scratch_limit =
 				MAKE_SCRATCH_APP_LIMIT(pdd->scratch_base);
+
+			if (dev->device_info->needs_iommu_device) {
+				/* APUs: GPUVM aperture in
+				 * non-canonical address space
+				 */
+				pdd->gpuvm_base = MAKE_GPUVM_APP_BASE(id + 1);
+				pdd->gpuvm_limit = MAKE_GPUVM_APP_LIMIT(
+					pdd->gpuvm_base,
+					dev->shared_resources.gpuvm_size);
+			} else {
+				/* dGPUs: SVM aperture starting at 0
+				 * with small reserved space for kernel
+				 */
+				pdd->gpuvm_base = SVM_USER_BASE;
+				pdd->gpuvm_limit =
+					dev->shared_resources.gpuvm_size - 1;
+				pdd->qpd.cwsr_base = SVM_CWSR_BASE;
+				pdd->qpd.ib_base = SVM_IB_BASE;
+			}
 		}
 
 		dev_dbg(kfd_device, "node id %u\n", id);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 785161e..4e5adda 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -488,8 +488,12 @@ struct qcm_process_device {
 
 	/* CWSR memory */
 	void *cwsr_kaddr;
+	uint64_t cwsr_base;
 	uint64_t tba_addr;
 	uint64_t tma_addr;
+
+	/* IB memory */
+	uint64_t ib_base;
 };
 
 /* KFD Memory Eviction */
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 20/25] drm/amdkfd: Add per-process IDR for buffer handles
       [not found] ` <1517015381-1080-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (18 preceding siblings ...)
  2018-01-27  1:09   ` [PATCH 19/25] drm/amdkfd: Aperture setup for dGPUs Felix Kuehling
@ 2018-01-27  1:09   ` Felix Kuehling
  2018-01-27  1:09   ` [PATCH 21/25] drm/amdkfd: Allocate CWSR trap handler memory for dGPUs Felix Kuehling
                     ` (5 subsequent siblings)
  25 siblings, 0 replies; 44+ messages in thread
From: Felix Kuehling @ 2018-01-27  1:09 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling

Also used for cleaning up on process termination.
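
The intended usage of the new helpers is roughly (a sketch; callers
are expected to hold the process lock):

	/* allocate a small integer handle that refers to the BO */
	handle = kfd_process_device_create_obj_handle(pdd, mem);

	/* translate a handle from an ioctl back to the BO pointer */
	mem = kfd_process_device_translate_handle(pdd, handle);

	/* drop the mapping when the BO is released */
	kfd_process_device_remove_obj_handle(pdd, handle);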

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h    | 11 ++++++
 drivers/gpu/drm/amd/amdkfd/kfd_process.c | 66 ++++++++++++++++++++++++++++++++
 2 files changed, 77 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 4e5adda..78200ba 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -545,6 +545,9 @@ struct kfd_process_device {
 	/* VM context for GPUVM allocations */
 	void *vm;
 
+	/* GPUVM allocations storage */
+	struct idr alloc_idr;
+
 	/* Flag used to tell the pdd has dequeued from the dqm.
 	 * This is used to prevent dev->dqm->ops.process_termination() from
 	 * being called twice when it is already called in IOMMU callback
@@ -676,6 +679,14 @@ struct kfd_process_device *kfd_create_process_device_data(struct kfd_dev *dev,
 int kfd_reserved_mem_mmap(struct kfd_process *process,
 			  struct vm_area_struct *vma);
 
+/* KFD process API for creating and translating handles */
+int kfd_process_device_create_obj_handle(struct kfd_process_device *pdd,
+					void *mem);
+void *kfd_process_device_translate_handle(struct kfd_process_device *pdd,
+					int handle);
+void kfd_process_device_remove_obj_handle(struct kfd_process_device *pdd,
+					int handle);
+
 /* Process device data iterator */
 struct kfd_process_device *kfd_get_first_process_device_data(
 							struct kfd_process *p);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index 7eeadfe..8584f4a 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -149,6 +149,32 @@ void kfd_unref_process(struct kfd_process *p)
 	kref_put(&p->ref, kfd_process_ref_release);
 }
 
+static void kfd_process_free_outstanding_kfd_bos(struct kfd_process *p)
+{
+	struct kfd_process_device *pdd, *peer_pdd;
+	void *mem;
+	int id;
+
+	list_for_each_entry(pdd, &p->per_device_data, per_device_list) {
+		/*
+		 * Remove all handles from idr and release appropriate
+		 * local memory object
+		 */
+		idr_for_each_entry(&pdd->alloc_idr, mem, id) {
+			list_for_each_entry(peer_pdd, &p->per_device_data,
+					per_device_list) {
+				peer_pdd->dev->kfd2kgd->unmap_memory_to_gpu(
+						peer_pdd->dev->kgd,
+						mem, peer_pdd->vm);
+			}
+
+			pdd->dev->kfd2kgd->free_memory_of_gpu(
+					pdd->dev->kgd, mem);
+			kfd_process_device_remove_obj_handle(pdd, id);
+		}
+	}
+}
+
 static void kfd_process_destroy_pdds(struct kfd_process *p)
 {
 	struct kfd_process_device *pdd, *temp;
@@ -168,6 +194,8 @@ static void kfd_process_destroy_pdds(struct kfd_process *p)
 			free_pages((unsigned long)pdd->qpd.cwsr_kaddr,
 				get_order(KFD_CWSR_TBA_TMA_SIZE));
 
+		idr_destroy(&pdd->alloc_idr);
+
 		kfree(pdd);
 	}
 }
@@ -193,6 +221,8 @@ static void kfd_process_wq_release(struct work_struct *work)
 	}
 #endif
 
+	kfd_process_free_outstanding_kfd_bos(p);
+
 	kfd_process_destroy_pdds(p);
 	dma_fence_put(p->ef);
 
@@ -377,6 +407,7 @@ static struct kfd_process *create_process(const struct task_struct *thread,
 	return process;
 
 err_init_cwsr:
+	kfd_process_free_outstanding_kfd_bos(process);
 	kfd_process_destroy_pdds(process);
 err_init_apertures:
 	pqm_uninit(&process->pqm);
@@ -427,6 +458,9 @@ struct kfd_process_device *kfd_create_process_device_data(struct kfd_dev *dev,
 	pdd->already_dequeued = false;
 	list_add(&pdd->per_device_list, &p->per_device_data);
 
+	/* Init idr used for memory handle translation */
+	idr_init(&pdd->alloc_idr);
+
 	/* Create the GPUVM context for this specific device */
 	if (dev->kfd2kgd->create_process_vm(dev->kgd, &pdd->vm,
 					    &p->kgd_process_info, &p->ef)) {
@@ -436,6 +470,7 @@ struct kfd_process_device *kfd_create_process_device_data(struct kfd_dev *dev,
 	return pdd;
 
 err_create_pdd:
+	idr_destroy(&pdd->alloc_idr);
 	list_del(&pdd->per_device_list);
 	kfree(pdd);
 	return NULL;
@@ -615,6 +650,37 @@ bool kfd_has_process_device_data(struct kfd_process *p)
 	return !(list_empty(&p->per_device_data));
 }
 
+/* Create a handle mapped to mem in the process-local memory IDR.
+ * Assumes that the process lock is held.
+ */
+int kfd_process_device_create_obj_handle(struct kfd_process_device *pdd,
+					void *mem)
+{
+	return idr_alloc(&pdd->alloc_idr, mem, 0, 0, GFP_KERNEL);
+}
+
+/* Translate a handle from the process-local memory IDR to the
+ * memory object. Assumes that the process lock is held.
+ */
+void *kfd_process_device_translate_handle(struct kfd_process_device *pdd,
+					int handle)
+{
+	if (handle < 0)
+		return NULL;
+
+	return idr_find(&pdd->alloc_idr, handle);
+}
+
+/* Remove a handle from the process-local memory IDR.
+ * Assumes that the process lock is held.
+ */
+void kfd_process_device_remove_obj_handle(struct kfd_process_device *pdd,
+					int handle)
+{
+	if (handle >= 0)
+		idr_remove(&pdd->alloc_idr, handle);
+}
+
 /* This increments the process->ref counter. */
 struct kfd_process *kfd_lookup_process_by_pasid(unsigned int pasid)
 {
-- 
2.7.4


* [PATCH 21/25] drm/amdkfd: Allocate CWSR trap handler memory for dGPUs
       [not found] ` <1517015381-1080-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (19 preceding siblings ...)
  2018-01-27  1:09   ` [PATCH 20/25] drm/amdkfd: Add per-process IDR for buffer handles Felix Kuehling
@ 2018-01-27  1:09   ` Felix Kuehling
  2018-01-27  1:09   ` [PATCH 22/25] drm/amdkfd: Add TC flush on VMID deallocation for Hawaii Felix Kuehling
                     ` (4 subsequent siblings)
  25 siblings, 0 replies; 44+ messages in thread
From: Felix Kuehling @ 2018-01-27  1:09 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling

Add helpers for allocating GPUVM memory in kernel mode and use them
to allocate memory for the CWSR trap handler.
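
For orientation, the allocation helper added below boils down to this
call sequence (a condensed sketch of kfd_process_alloc_gpuvm() with
the error unwinding elided):

	static int alloc_gpuvm_sketch(struct kfd_dev *kdev,
				      struct kfd_process_device *pdd,
				      uint64_t gpu_va, uint32_t size,
				      uint32_t flags, void **kptr)
	{
		struct kgd_mem *mem;
		int r;

		/* Reserve the fixed GPU VA and back it with a BO */
		r = kdev->kfd2kgd->alloc_memory_of_gpu(kdev->kgd, gpu_va,
				size, pdd->vm, &mem, NULL, flags);
		if (r)
			return r;

		/* Map it into this process's GPUVM page tables */
		r = kdev->kfd2kgd->map_memory_to_gpu(kdev->kgd, mem, pdd->vm);
		if (r)
			return r;

		/* Wait for the page table update to land */
		r = kdev->kfd2kgd->sync_memory(kdev->kgd, mem, true);
		if (r)
			return r;

		/* Track the BO for cleanup on process termination */
		r = kfd_process_device_create_obj_handle(pdd, mem);
		if (r < 0)
			return r;

		/* Optionally get a CPU kernel address for the BO */
		if (kptr)
			r = kdev->kfd2kgd->map_gtt_bo_to_kernel(kdev->kgd,
					mem, kptr, NULL);
		return r;
	}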

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_process.c | 125 +++++++++++++++++++++++++++----
 1 file changed, 112 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index 8584f4a..12101fb 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -73,6 +73,84 @@ void kfd_process_destroy_wq(void)
 	}
 }
 
+static void kfd_process_free_gpuvm(struct kgd_mem *mem,
+			struct kfd_process_device *pdd)
+{
+	struct kfd_dev *dev = pdd->dev;
+
+	dev->kfd2kgd->unmap_memory_to_gpu(dev->kgd, mem, pdd->vm);
+	dev->kfd2kgd->free_memory_of_gpu(dev->kgd, mem);
+}
+
+/* kfd_process_alloc_gpuvm - Allocate GPUVM memory for the KFD process
+ *	This function should only be called right after the process
+ *	is created and while kfd_processes_mutex is still being held,
+ *	to avoid concurrency. Because of that exclusiveness, we do
+ *	not need to take p->mutex.
+ */
+static int kfd_process_alloc_gpuvm(struct kfd_process *p,
+		struct kfd_dev *kdev, uint64_t gpu_va, uint32_t size,
+		void **kptr, struct kfd_process_device *pdd, uint32_t flags)
+{
+	int err;
+	void *mem = NULL;
+	int handle;
+
+	err = kdev->kfd2kgd->alloc_memory_of_gpu(kdev->kgd, gpu_va, size,
+				pdd->vm,
+				(struct kgd_mem **)&mem, NULL, flags);
+	if (err)
+		goto err_alloc_mem;
+
+	err = kdev->kfd2kgd->map_memory_to_gpu(
+				kdev->kgd, (struct kgd_mem *)mem, pdd->vm);
+	if (err)
+		goto err_map_mem;
+
+	err = kdev->kfd2kgd->sync_memory(kdev->kgd, (struct kgd_mem *) mem,
+				true);
+	if (err) {
+		pr_debug("Sync memory failed, wait interrupted by user signal\n");
+		goto sync_memory_failed;
+	}
+
+	/* Create an obj handle so kfd_process_device_remove_obj_handle
+	 * will take care of the bo removal when the process finishes.
+	 * We do not need to take p->mutex, because the process is just
+	 * created and the ioctls have not had the chance to run.
+	 */
+	handle = kfd_process_device_create_obj_handle(pdd, mem);
+
+	if (handle < 0) {
+		err = handle;
+		goto free_gpuvm;
+	}
+
+	if (kptr) {
+		err = kdev->kfd2kgd->map_gtt_bo_to_kernel(kdev->kgd,
+				(struct kgd_mem *)mem, kptr, NULL);
+		if (err) {
+			pr_debug("Map GTT BO to kernel failed\n");
+			goto free_obj_handle;
+		}
+	}
+
+	return err;
+
+free_obj_handle:
+	kfd_process_device_remove_obj_handle(pdd, handle);
+free_gpuvm:
+sync_memory_failed:
+	kfd_process_free_gpuvm(mem, pdd);
+	return err;
+
+err_map_mem:
+	kdev->kfd2kgd->free_memory_of_gpu(kdev->kgd, mem);
+err_alloc_mem:
+	if (kptr)
+		*kptr = NULL;
+	return err;
+}
+
 struct kfd_process *kfd_create_process(struct file *filep)
 {
 	struct kfd_process *process;
@@ -190,7 +268,7 @@ static void kfd_process_destroy_pdds(struct kfd_process *p)
 
 		list_del(&pdd->per_device_list);
 
-		if (pdd->qpd.cwsr_kaddr)
+		if (pdd->qpd.cwsr_kaddr && !pdd->qpd.cwsr_base)
 			free_pages((unsigned long)pdd->qpd.cwsr_kaddr,
 				get_order(KFD_CWSR_TBA_TMA_SIZE));
 
@@ -316,24 +394,45 @@ static int kfd_process_init_cwsr(struct kfd_process *p, struct file *filep)
 	struct kfd_process_device *pdd = NULL;
 	struct kfd_dev *dev = NULL;
 	struct qcm_process_device *qpd = NULL;
+	void *kaddr;
+	const uint32_t flags = ALLOC_MEM_FLAGS_GTT |
+		ALLOC_MEM_FLAGS_NO_SUBSTITUTE | ALLOC_MEM_FLAGS_EXECUTABLE;
+	int ret;
 
 	list_for_each_entry(pdd, &p->per_device_data, per_device_list) {
 		dev = pdd->dev;
 		qpd = &pdd->qpd;
 		if (!dev->cwsr_enabled || qpd->cwsr_kaddr)
 			continue;
-		offset = (dev->id | KFD_MMAP_RESERVED_MEM_MASK) << PAGE_SHIFT;
-		qpd->tba_addr = (int64_t)vm_mmap(filep, 0,
-			KFD_CWSR_TBA_TMA_SIZE, PROT_READ | PROT_EXEC,
-			MAP_SHARED, offset);
-
-		if (IS_ERR_VALUE(qpd->tba_addr)) {
-			int err = qpd->tba_addr;
-
-			pr_err("Failure to set tba address. error %d.\n", err);
-			qpd->tba_addr = 0;
-			qpd->cwsr_kaddr = NULL;
-			return err;
+		if (qpd->cwsr_base) {
+			/* cwsr_base is only set for dGPU */
+			ret = kfd_process_alloc_gpuvm(p, dev, qpd->cwsr_base,
+				KFD_CWSR_TBA_TMA_SIZE, &kaddr, pdd, flags);
+			if (!ret) {
+				qpd->cwsr_kaddr = kaddr;
+				qpd->tba_addr = qpd->cwsr_base;
+			} else {
+				/* In case of error, the kfd_bos for some pdds
+				 * which are already allocated successfully
+				 * will be freed in the upper level function,
+				 * i.e. create_process().
+				 */
+				return ret;
+			}
+		} else {
+			offset = (dev->id |
+				KFD_MMAP_RESERVED_MEM_MASK) << PAGE_SHIFT;
+			qpd->tba_addr = (int64_t)vm_mmap(filep, 0,
+				KFD_CWSR_TBA_TMA_SIZE, PROT_READ | PROT_EXEC,
+				MAP_SHARED, offset);
+
+			if (IS_ERR_VALUE(qpd->tba_addr)) {
+				ret = qpd->tba_addr;
+				pr_err("Failure to set tba address. error %d.\n",
+				       ret);
+				qpd->tba_addr = 0;
+				qpd->cwsr_kaddr = NULL;
+				return ret;
+			}
 		}
 
 		memcpy(qpd->cwsr_kaddr, dev->cwsr_isa, dev->cwsr_isa_size);
-- 
2.7.4


* [PATCH 22/25] drm/amdkfd: Add TC flush on VMID deallocation for Hawaii
       [not found] ` <1517015381-1080-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (20 preceding siblings ...)
  2018-01-27  1:09   ` [PATCH 21/25] drm/amdkfd: Allocate CWSR trap handler memory for dGPUs Felix Kuehling
@ 2018-01-27  1:09   ` Felix Kuehling
  2018-01-27  1:09   ` [PATCH 23/25] drm/amdkfd: Add ioctls for GPUVM memory management Felix Kuehling
                     ` (3 subsequent siblings)
  25 siblings, 0 replies; 44+ messages in thread
From: Felix Kuehling @ 2018-01-27  1:09 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Amber Lin, Felix Kuehling

On GFX7 the CP does not perform a TC flush when queues are unmapped.
To prevent TC evictions (writebacks) from using an invalid VMID, flush
the TC explicitly before releasing a VMID.
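
In short, the GFX7 dequeue path now does the following (condensed from
flush_texture_cache_nocpsch() below; the per-process IB page comes
from the previous patch):

	/* Build a RELEASE_MEM packet with the TC/TCL1 action bits set in
	 * the per-process IB page, then execute it on MEC1 while the VMID
	 * is still assigned, so the flush runs in the right address space.
	 */
	len = pm_create_release_mem(qpd->ib_base, (uint32_t *)qpd->ib_kaddr);
	kdev->kfd2kgd->submit_ib(kdev->kgd, KGD_ENGINE_MEC1, qpd->vmid,
				 qpd->ib_base, (uint32_t *)qpd->ib_kaddr, len);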

Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c  | 22 +++++++++-
 drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c    | 37 ++++++++++++++++
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h              |  3 ++
 drivers/gpu/drm/amd/amdkfd/kfd_process.c           | 51 ++++++++++++++++++++++
 4 files changed, 112 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index b3b6dab..c18e048 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -142,12 +142,31 @@ static int allocate_vmid(struct device_queue_manager *dqm,
 	return 0;
 }
 
+static int flush_texture_cache_nocpsch(struct kfd_dev *kdev,
+				struct qcm_process_device *qpd)
+{
+	uint32_t len;
+
+	if (!qpd->ib_kaddr)
+		return -ENOMEM;
+
+	len = pm_create_release_mem(qpd->ib_base, (uint32_t *)qpd->ib_kaddr);
+
+	return kdev->kfd2kgd->submit_ib(kdev->kgd, KGD_ENGINE_MEC1, qpd->vmid,
+				qpd->ib_base, (uint32_t *)qpd->ib_kaddr, len);
+}
+
 static void deallocate_vmid(struct device_queue_manager *dqm,
 				struct qcm_process_device *qpd,
 				struct queue *q)
 {
 	int bit = qpd->vmid - dqm->dev->vm_info.first_vmid_kfd;
 
+	/* On GFX v7, CP doesn't flush TC at dequeue */
+	if (q->device->device_info->asic_family == CHIP_HAWAII)
+		if (flush_texture_cache_nocpsch(q->device, qpd))
+			pr_err("Failed to flush TC\n");
+
 	kfd_flush_tlb(qpd_to_pdd(qpd));
 
 	/* Release the vmid mapping */
@@ -792,11 +811,12 @@ static void uninitialize(struct device_queue_manager *dqm)
 static int start_nocpsch(struct device_queue_manager *dqm)
 {
 	init_interrupts(dqm);
-	return 0;
+	return pm_init(&dqm->packets, dqm);
 }
 
 static int stop_nocpsch(struct device_queue_manager *dqm)
 {
+	pm_uninit(&dqm->packets);
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
index 0ecbd1f..7614375 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
@@ -356,6 +356,43 @@ static int pm_create_runlist_ib(struct packet_manager *pm,
 	return retval;
 }
 
+/* pm_create_release_mem - Create a RELEASE_MEM packet and return the size
+ *     of this packet
+ *     @gpu_addr - GPU address of the packet. It's a virtual address.
+ *     @buffer - buffer to fill up with the packet. It's a CPU kernel pointer
+ *     Return - length of the packet
+ */
+uint32_t pm_create_release_mem(uint64_t gpu_addr, uint32_t *buffer)
+{
+	struct pm4_mec_release_mem *packet;
+
+	WARN_ON(!buffer);
+
+	packet = (struct pm4_mec_release_mem *)buffer;
+	memset(buffer, 0, sizeof(*packet));
+
+	packet->header.u32All = build_pm4_header(IT_RELEASE_MEM,
+						 sizeof(*packet));
+
+	packet->bitfields2.event_type = CACHE_FLUSH_AND_INV_TS_EVENT;
+	packet->bitfields2.event_index = event_index___release_mem__end_of_pipe;
+	packet->bitfields2.tcl1_action_ena = 1;
+	packet->bitfields2.tc_action_ena = 1;
+	packet->bitfields2.cache_policy = cache_policy___release_mem__lru;
+	packet->bitfields2.atc = 0;
+
+	packet->bitfields3.data_sel = data_sel___release_mem__send_32_bit_low;
+	packet->bitfields3.int_sel =
+		int_sel___release_mem__send_interrupt_after_write_confirm;
+
+	packet->bitfields4.address_lo_32b = (gpu_addr & 0xffffffff) >> 2;
+	packet->address_hi = upper_32_bits(gpu_addr);
+
+	packet->data_lo = 0;
+
+	return sizeof(*packet) / sizeof(unsigned int);
+}
+
 int pm_init(struct packet_manager *pm, struct device_queue_manager *dqm)
 {
 	pm->dqm = dqm;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 78200ba..050fd00 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -494,6 +494,7 @@ struct qcm_process_device {
 
 	/* IB memory */
 	uint64_t ib_base;
+	void *ib_kaddr;
 };
 
 /* KFD Memory Eviction */
@@ -832,6 +833,8 @@ int pm_send_unmap_queue(struct packet_manager *pm, enum kfd_queue_type type,
 
 void pm_release_ib(struct packet_manager *pm);
 
+uint32_t pm_create_release_mem(uint64_t gpu_addr, uint32_t *buffer);
+
 uint64_t kfd_get_number_elems(struct kfd_dev *kfd);
 
 /* Events */
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index 12101fb..25d7dfe 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -151,6 +151,53 @@ static int kfd_process_alloc_gpuvm(struct kfd_process *p,
 	return err;
 }
 
+/* kfd_process_reserve_ib_mem - Reserve memory inside the process for IB usage
+ *	The memory reserved is for KFD to submit IBs to AMDGPU from
+ *	kernel mode.
+ *	If the memory is reserved successfully, ib_kaddr will have
+ *	the CPU/kernel address. Check ib_kaddr before accessing the
+ *	memory.
+ */
+static int kfd_process_reserve_ib_mem(struct kfd_process *p)
+{
+	int ret = 0;
+	struct kfd_process_device *temp, *pdd = NULL;
+	struct kfd_dev *kdev = NULL;
+	struct qcm_process_device *qpd = NULL;
+	void *kaddr;
+	uint32_t flags = ALLOC_MEM_FLAGS_GTT |
+			 ALLOC_MEM_FLAGS_NO_SUBSTITUTE |
+			 ALLOC_MEM_FLAGS_WRITABLE |
+			 ALLOC_MEM_FLAGS_EXECUTABLE;
+
+	list_for_each_entry_safe(pdd, temp, &p->per_device_data,
+				per_device_list) {
+		kdev = pdd->dev;
+		qpd = &pdd->qpd;
+		if (qpd->ib_kaddr)
+			continue;
+
+		if (qpd->ib_base) { /* is dGPU */
+			ret = kfd_process_alloc_gpuvm(p, kdev,
+				qpd->ib_base, PAGE_SIZE,
+				&kaddr, pdd, flags);
+			if (!ret) {
+				qpd->ib_kaddr = kaddr;
+			} else {
+				/* In case of error, the kfd_bos for some pdds
+				 * which are already allocated successfully
+				 * will be freed in the upper level function,
+				 * i.e. create_process().
+				 */
+				return ret;
+			}
+		} else {
+			/* FIXME: Support APU */
+			continue;
+		}
+	}
+
+	return 0;
+}
+
 struct kfd_process *kfd_create_process(struct file *filep)
 {
 	struct kfd_process *process;
@@ -499,6 +546,9 @@ static struct kfd_process *create_process(const struct task_struct *thread,
 	INIT_DELAYED_WORK(&process->restore_work, restore_process_worker);
 	process->last_restore_timestamp = get_jiffies_64();
 
+	err = kfd_process_reserve_ib_mem(process);
+	if (err)
+		goto err_reserve_ib_mem;
 	err = kfd_process_init_cwsr(process, filep);
 	if (err)
 		goto err_init_cwsr;
@@ -506,6 +556,7 @@ static struct kfd_process *create_process(const struct task_struct *thread,
 	return process;
 
 err_init_cwsr:
+err_reserve_ib_mem:
 	kfd_process_free_outstanding_kfd_bos(process);
 	kfd_process_destroy_pdds(process);
 err_init_apertures:
-- 
2.7.4


* [PATCH 23/25] drm/amdkfd: Add ioctls for GPUVM memory management
       [not found] ` <1517015381-1080-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (21 preceding siblings ...)
  2018-01-27  1:09   ` [PATCH 22/25] drm/amdkfd: Add TC flush on VMID deallocation for Hawaii Felix Kuehling
@ 2018-01-27  1:09   ` Felix Kuehling
  2018-01-27  1:09   ` [PATCH 24/25] drm/amdkfd: Kmap event page for dGPUs Felix Kuehling
                     ` (2 subsequent siblings)
  25 siblings, 0 replies; 44+ messages in thread
From: Felix Kuehling @ 2018-01-27  1:09 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling

Add ioctls to allocate, free, map and unmap GPUVM memory. The map and
unmap ioctls take an array of GPU IDs so that a buffer can be mapped
into the GPUVM address spaces of multiple GPUs.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c        | 329 ++++++++++++++++++++++++
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h           |   8 +
 drivers/gpu/drm/amd/include/kgd_kfd_interface.h |   2 +
 include/uapi/linux/kfd_ioctl.h                  |  54 +++-
 4 files changed, 392 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 7d40094..160a5c8 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1046,6 +1046,323 @@ static int kfd_ioctl_get_tile_config(struct file *filep,
 	return 0;
 }
 
+bool kfd_dev_is_large_bar(struct kfd_dev *dev)
+{
+	struct kfd_local_mem_info mem_info;
+
+	if (dev->device_info->needs_iommu_device)
+		return false;
+
+	dev->kfd2kgd->get_local_mem_info(dev->kgd, &mem_info);
+	if (mem_info.local_mem_size_private == 0 &&
+			mem_info.local_mem_size_public > 0)
+		return true;
+	return false;
+}
+
+static int kfd_ioctl_alloc_memory_of_gpu(struct file *filep,
+					struct kfd_process *p, void *data)
+{
+	struct kfd_ioctl_alloc_memory_of_gpu_args *args = data;
+	struct kfd_process_device *pdd;
+	void *mem;
+	struct kfd_dev *dev;
+	int idr_handle;
+	long err;
+	uint64_t offset = args->mmap_offset;
+	uint32_t flags = args->flags;
+
+	if (args->size == 0)
+		return -EINVAL;
+
+	dev = kfd_device_by_id(args->gpu_id);
+	if (!dev)
+		return -EINVAL;
+
+	if ((flags & KFD_IOC_ALLOC_MEM_FLAGS_PUBLIC) &&
+		(flags & KFD_IOC_ALLOC_MEM_FLAGS_VRAM) &&
+		!kfd_dev_is_large_bar(dev)) {
+		pr_err("Alloc host visible vram on small bar is not allowed\n");
+		return -EINVAL;
+	}
+
+	mutex_lock(&p->mutex);
+
+	pdd = kfd_bind_process_to_device(dev, p);
+	if (IS_ERR(pdd)) {
+		err = PTR_ERR(pdd);
+		goto err_unlock;
+	}
+
+	err = dev->kfd2kgd->alloc_memory_of_gpu(
+		dev->kgd, args->va_addr, args->size,
+		pdd->vm, (struct kgd_mem **) &mem, &offset,
+		flags);
+
+	if (err)
+		goto err_unlock;
+
+	idr_handle = kfd_process_device_create_obj_handle(pdd, mem);
+	if (idr_handle < 0) {
+		err = -EFAULT;
+		goto err_free;
+	}
+
+	mutex_unlock(&p->mutex);
+
+	args->handle = MAKE_HANDLE(args->gpu_id, idr_handle);
+	args->mmap_offset = offset;
+
+	return 0;
+
+err_free:
+	dev->kfd2kgd->free_memory_of_gpu(dev->kgd, (struct kgd_mem *)mem);
+err_unlock:
+	mutex_unlock(&p->mutex);
+	return err;
+}
+
+static int kfd_ioctl_free_memory_of_gpu(struct file *filep,
+					struct kfd_process *p, void *data)
+{
+	struct kfd_ioctl_free_memory_of_gpu_args *args = data;
+	struct kfd_process_device *pdd;
+	void *mem;
+	struct kfd_dev *dev;
+	int ret;
+
+	dev = kfd_device_by_id(GET_GPU_ID(args->handle));
+	if (!dev)
+		return -EINVAL;
+
+	mutex_lock(&p->mutex);
+
+	pdd = kfd_get_process_device_data(dev, p);
+	if (!pdd) {
+		pr_err("Process device data doesn't exist\n");
+		ret = -EINVAL;
+		goto err_unlock;
+	}
+
+	mem = kfd_process_device_translate_handle(
+		pdd, GET_IDR_HANDLE(args->handle));
+	if (!mem) {
+		ret = -EINVAL;
+		goto err_unlock;
+	}
+
+	ret = dev->kfd2kgd->free_memory_of_gpu(dev->kgd, (struct kgd_mem *)mem);
+
+	/* If freeing the buffer failed, leave the handle in place for
+	 * clean-up during process tear-down.
+	 */
+	if (!ret)
+		kfd_process_device_remove_obj_handle(
+			pdd, GET_IDR_HANDLE(args->handle));
+
+err_unlock:
+	mutex_unlock(&p->mutex);
+	return ret;
+}
+
+static int kfd_ioctl_map_memory_to_gpu(struct file *filep,
+					struct kfd_process *p, void *data)
+{
+	struct kfd_ioctl_map_memory_to_gpu_args *args = data;
+	struct kfd_process_device *pdd, *peer_pdd;
+	void *mem;
+	struct kfd_dev *dev, *peer;
+	long err = 0;
+	int i, num_dev = 0;
+	uint32_t *devices_arr = NULL;
+
+	dev = kfd_device_by_id(GET_GPU_ID(args->handle));
+	if (!dev)
+		return -EINVAL;
+
+	if (!args->device_ids_array_size) {
+		pr_debug("Device IDs array empty\n");
+		return -EINVAL;
+	}
+	if (args->device_ids_array_size & 3) {
+		pr_debug("Misaligned device IDs array size %u\n",
+			 args->device_ids_array_size);
+		return -EINVAL;
+	}
+
+	devices_arr = kmalloc(args->device_ids_array_size, GFP_KERNEL);
+	if (!devices_arr)
+		return -ENOMEM;
+
+	err = copy_from_user(devices_arr,
+			     (void __user *)args->device_ids_array_ptr,
+			     args->device_ids_array_size);
+	if (err != 0) {
+		err = -EFAULT;
+		goto copy_from_user_failed;
+	}
+
+	mutex_lock(&p->mutex);
+
+	pdd = kfd_bind_process_to_device(dev, p);
+	if (IS_ERR(pdd)) {
+		err = PTR_ERR(pdd);
+		goto bind_process_to_device_failed;
+	}
+
+	mem = kfd_process_device_translate_handle(pdd,
+						GET_IDR_HANDLE(args->handle));
+	if (!mem) {
+		err = -ENOMEM;
+		goto get_mem_obj_from_handle_failed;
+	}
+
+	num_dev = args->device_ids_array_size / sizeof(uint32_t);
+	for (i = 0 ; i < num_dev; i++) {
+		peer = kfd_device_by_id(devices_arr[i]);
+		if (!peer) {
+			pr_debug("Getting device by id failed for 0x%x\n",
+				 devices_arr[i]);
+			err = -EINVAL;
+			goto get_mem_obj_from_handle_failed;
+		}
+
+		peer_pdd = kfd_bind_process_to_device(peer, p);
+		if (IS_ERR(peer_pdd)) {
+			err = PTR_ERR(peer_pdd);
+			goto get_mem_obj_from_handle_failed;
+		}
+		err = peer->kfd2kgd->map_memory_to_gpu(
+			peer->kgd, (struct kgd_mem *)mem, peer_pdd->vm);
+		if (err) {
+			pr_err("Failed to map to gpu %d/%d\n",
+			       i, num_dev);
+			goto map_memory_to_gpu_failed;
+		}
+	}
+
+	mutex_unlock(&p->mutex);
+
+	err = dev->kfd2kgd->sync_memory(dev->kgd, (struct kgd_mem *) mem, true);
+	if (err) {
+		pr_debug("Sync memory failed, wait interrupted by user signal\n");
+		goto sync_memory_failed;
+	}
+
+	/* Flush TLBs after waiting for the page table updates to complete */
+	for (i = 0; i < num_dev; i++) {
+		peer = kfd_device_by_id(devices_arr[i]);
+		if (WARN_ON_ONCE(!peer))
+			continue;
+		peer_pdd = kfd_get_process_device_data(peer, p);
+		if (WARN_ON_ONCE(!peer_pdd))
+			continue;
+		kfd_flush_tlb(peer_pdd);
+	}
+
+	kfree(devices_arr);
+
+	return err;
+
+bind_process_to_device_failed:
+get_mem_obj_from_handle_failed:
+map_memory_to_gpu_failed:
+	mutex_unlock(&p->mutex);
+copy_from_user_failed:
+sync_memory_failed:
+	kfree(devices_arr);
+
+	return err;
+}
+
+static int kfd_ioctl_unmap_memory_from_gpu(struct file *filep,
+					struct kfd_process *p, void *data)
+{
+	struct kfd_ioctl_unmap_memory_from_gpu_args *args = data;
+	struct kfd_process_device *pdd, *peer_pdd;
+	void *mem;
+	struct kfd_dev *dev, *peer;
+	long err = 0;
+	uint32_t *devices_arr = NULL, num_dev, i;
+
+	dev = kfd_device_by_id(GET_GPU_ID(args->handle));
+	if (!dev)
+		return -EINVAL;
+
+	if (!args->device_ids_array_size) {
+		pr_debug("Device IDs array empty\n");
+		return -EINVAL;
+	}
+	if (args->device_ids_array_size & 3) {
+		pr_debug("Misaligned device IDs array size %u\n",
+			 args->device_ids_array_size);
+		return -EINVAL;
+	}
+
+	devices_arr = kmalloc(args->device_ids_array_size, GFP_KERNEL);
+	if (!devices_arr)
+		return -ENOMEM;
+
+	err = copy_from_user(devices_arr,
+			     (void __user *)args->device_ids_array_ptr,
+			     args->device_ids_array_size);
+	if (err != 0) {
+		err = -EFAULT;
+		goto copy_from_user_failed;
+	}
+
+	mutex_lock(&p->mutex);
+
+	pdd = kfd_get_process_device_data(dev, p);
+	if (!pdd) {
+		pr_debug("Process device data doesn't exist\n");
+		err = -ENODEV;
+		goto bind_process_to_device_failed;
+	}
+
+	mem = kfd_process_device_translate_handle(pdd,
+						GET_IDR_HANDLE(args->handle));
+	if (!mem) {
+		err = -ENOMEM;
+		goto get_mem_obj_from_handle_failed;
+	}
+
+	num_dev = args->device_ids_array_size / sizeof(uint32_t);
+	for (i = 0 ; i < num_dev; i++) {
+		peer = kfd_device_by_id(devices_arr[i]);
+		if (!peer) {
+			err = -EINVAL;
+			goto get_mem_obj_from_handle_failed;
+		}
+
+		peer_pdd = kfd_get_process_device_data(peer, p);
+		if (!peer_pdd) {
+			err = -ENODEV;
+			goto get_mem_obj_from_handle_failed;
+		}
+		err = peer->kfd2kgd->unmap_memory_to_gpu(
+			peer->kgd, (struct kgd_mem *)mem, peer_pdd->vm);
+		if (err) {
+			pr_err("Failed to unmap from gpu %d/%d\n",
+			       i, num_dev);
+			goto unmap_memory_from_gpu_failed;
+		}
+	}
+	kfree(devices_arr);
+
+	mutex_unlock(&p->mutex);
+
+	return 0;
+
+bind_process_to_device_failed:
+get_mem_obj_from_handle_failed:
+unmap_memory_from_gpu_failed:
+	mutex_unlock(&p->mutex);
+copy_from_user_failed:
+	kfree(devices_arr);
+	return err;
+}
+
 #define AMDKFD_IOCTL_DEF(ioctl, _func, _flags) \
 	[_IOC_NR(ioctl)] = {.cmd = ioctl, .func = _func, .flags = _flags, \
 			    .cmd_drv = 0, .name = #ioctl}
@@ -1111,6 +1428,18 @@ static const struct amdkfd_ioctl_desc amdkfd_ioctls[] = {
 
 	AMDKFD_IOCTL_DEF(AMDKFD_IOC_GET_PROCESS_APERTURES_NEW,
 			kfd_ioctl_get_process_apertures_new, 0),
+
+	AMDKFD_IOCTL_DEF(AMDKFD_IOC_ALLOC_MEMORY_OF_GPU,
+			kfd_ioctl_alloc_memory_of_gpu, 0),
+
+	AMDKFD_IOCTL_DEF(AMDKFD_IOC_FREE_MEMORY_OF_GPU,
+			kfd_ioctl_free_memory_of_gpu, 0),
+
+	AMDKFD_IOCTL_DEF(AMDKFD_IOC_MAP_MEMORY_TO_GPU,
+			kfd_ioctl_map_memory_to_gpu, 0),
+
+	AMDKFD_IOCTL_DEF(AMDKFD_IOC_UNMAP_MEMORY_FROM_GPU,
+			kfd_ioctl_unmap_memory_from_gpu, 0),
 };
 
 #define AMDKFD_CORE_IOCTL_COUNT	ARRAY_SIZE(amdkfd_ioctls)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 050fd00..475d19e 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -509,6 +509,14 @@ struct qcm_process_device {
 int kgd2kfd_schedule_evict_and_restore_process(struct mm_struct *mm,
 					       struct dma_fence *fence);
 
+/* 8 byte handle containing GPU ID in the most significant 4 bytes and
+ * idr_handle in the least significant 4 bytes
+ */
+#define MAKE_HANDLE(gpu_id, idr_handle) \
+	(((uint64_t)(gpu_id) << 32) + (idr_handle))
+#define GET_GPU_ID(handle) ((handle) >> 32)
+#define GET_IDR_HANDLE(handle) ((handle) & 0xFFFFFFFF)
+
 enum kfd_pdd_bound {
 	PDD_UNBOUND = 0,
 	PDD_BOUND,
diff --git a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
index b7146e2..9e4d392 100644
--- a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
+++ b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
@@ -130,6 +130,7 @@ struct tile_config {
 
 /*
  * Allocation flag domains
+ * NOTE: This must match the corresponding definitions in kfd_ioctl.h.
  */
 #define ALLOC_MEM_FLAGS_VRAM		(1 << 0)
 #define ALLOC_MEM_FLAGS_GTT		(1 << 1)
@@ -138,6 +139,7 @@ struct tile_config {
 
 /*
  * Allocation flags attributes/access options.
+ * NOTE: This must match the corresponding definitions in kfd_ioctl.h.
  */
 #define ALLOC_MEM_FLAGS_WRITABLE	(1 << 31)
 #define ALLOC_MEM_FLAGS_EXECUTABLE	(1 << 30)
diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
index 5201437..e2ba6bf 100644
--- a/include/uapi/linux/kfd_ioctl.h
+++ b/include/uapi/linux/kfd_ioctl.h
@@ -286,6 +286,46 @@ struct kfd_ioctl_set_trap_handler_args {
 	__u32 pad;
 };
 
+/* Allocation flags: memory types */
+#define KFD_IOC_ALLOC_MEM_FLAGS_VRAM		(1 << 0)
+#define KFD_IOC_ALLOC_MEM_FLAGS_GTT		(1 << 1)
+#define KFD_IOC_ALLOC_MEM_FLAGS_USERPTR		(1 << 2)
+#define KFD_IOC_ALLOC_MEM_FLAGS_DOORBELL	(1 << 3)
+/* Allocation flags: attributes/access options */
+#define KFD_IOC_ALLOC_MEM_FLAGS_WRITABLE	(1 << 31)
+#define KFD_IOC_ALLOC_MEM_FLAGS_EXECUTABLE	(1 << 30)
+#define KFD_IOC_ALLOC_MEM_FLAGS_PUBLIC		(1 << 29)
+#define KFD_IOC_ALLOC_MEM_FLAGS_NO_SUBSTITUTE	(1 << 28)
+#define KFD_IOC_ALLOC_MEM_FLAGS_AQL_QUEUE_MEM	(1 << 27)
+#define KFD_IOC_ALLOC_MEM_FLAGS_COHERENT	(1 << 26)
+
+struct kfd_ioctl_alloc_memory_of_gpu_args {
+	__u64 va_addr;		/* to KFD */
+	__u64 size;		/* to KFD */
+	__u64 handle;		/* from KFD */
+	__u64 mmap_offset;	/* to KFD (userptr), from KFD (mmap offset) */
+	__u32 gpu_id;		/* to KFD */
+	__u32 flags;
+};
+
+struct kfd_ioctl_free_memory_of_gpu_args {
+	__u64 handle;		/* to KFD */
+};
+
+struct kfd_ioctl_map_memory_to_gpu_args {
+	__u64 handle;			/* to KFD */
+	__u64 device_ids_array_ptr;	/* to KFD */
+	__u32 device_ids_array_size;	/* to KFD */
+	__u32 pad;
+};
+
+struct kfd_ioctl_unmap_memory_from_gpu_args {
+	__u64 handle;			/* to KFD */
+	__u64 device_ids_array_ptr;	/* to KFD */
+	__u32 device_ids_array_size;	/* to KFD */
+	__u32 pad;
+};
+
 #define AMDKFD_IOCTL_BASE 'K'
 #define AMDKFD_IO(nr)			_IO(AMDKFD_IOCTL_BASE, nr)
 #define AMDKFD_IOR(nr, type)		_IOR(AMDKFD_IOCTL_BASE, nr, type)
@@ -353,7 +393,19 @@ struct kfd_ioctl_set_trap_handler_args {
 		AMDKFD_IOWR(0x14,		\
 			struct kfd_ioctl_get_process_apertures_new_args)
 
+#define AMDKFD_IOC_ALLOC_MEMORY_OF_GPU		\
+		AMDKFD_IOWR(0x15, struct kfd_ioctl_alloc_memory_of_gpu_args)
+
+#define AMDKFD_IOC_FREE_MEMORY_OF_GPU		\
+		AMDKFD_IOWR(0x16, struct kfd_ioctl_free_memory_of_gpu_args)
+
+#define AMDKFD_IOC_MAP_MEMORY_TO_GPU		\
+		AMDKFD_IOWR(0x17, struct kfd_ioctl_map_memory_to_gpu_args)
+
+#define AMDKFD_IOC_UNMAP_MEMORY_FROM_GPU	\
+		AMDKFD_IOWR(0x18, struct kfd_ioctl_unmap_memory_from_gpu_args)
+
 #define AMDKFD_COMMAND_START		0x01
-#define AMDKFD_COMMAND_END		0x15
+#define AMDKFD_COMMAND_END		0x19
 
 #endif
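
For illustration, a user-mode client (normally the Thunk) would drive
the new ioctls roughly like this. This is a minimal sketch under
stated assumptions: kfd_fd is an open /dev/kfd file descriptor, gpu_id
comes from the topology, va_addr lies inside the process's GPUVM
aperture, and the uapi header from this series is installed.

	#include <stdint.h>
	#include <sys/ioctl.h>
	#include <linux/kfd_ioctl.h>

	/* Allocate a writable GTT buffer at va_addr, map it on one GPU,
	 * then unmap and free it again. Error handling is minimal.
	 */
	static int alloc_map_free(int kfd_fd, uint32_t gpu_id,
				  uint64_t va_addr, uint64_t size)
	{
		struct kfd_ioctl_alloc_memory_of_gpu_args alloc = {
			.va_addr = va_addr,
			.size = size,
			.gpu_id = gpu_id,
			.flags = KFD_IOC_ALLOC_MEM_FLAGS_GTT |
				 KFD_IOC_ALLOC_MEM_FLAGS_WRITABLE,
		};
		struct kfd_ioctl_map_memory_to_gpu_args map = {0};
		struct kfd_ioctl_unmap_memory_from_gpu_args unmap = {0};
		struct kfd_ioctl_free_memory_of_gpu_args free_args = {0};

		if (ioctl(kfd_fd, AMDKFD_IOC_ALLOC_MEMORY_OF_GPU, &alloc))
			return -1;

		/* alloc.handle packs gpu_id in the upper 32 bits and the
		 * per-process IDR handle in the lower 32 bits
		 */
		map.handle = alloc.handle;
		map.device_ids_array_ptr = (uint64_t)(uintptr_t)&gpu_id;
		map.device_ids_array_size = sizeof(gpu_id);
		if (ioctl(kfd_fd, AMDKFD_IOC_MAP_MEMORY_TO_GPU, &map))
			return -1;

		unmap.handle = alloc.handle;
		unmap.device_ids_array_ptr = (uint64_t)(uintptr_t)&gpu_id;
		unmap.device_ids_array_size = sizeof(gpu_id);
		if (ioctl(kfd_fd, AMDKFD_IOC_UNMAP_MEMORY_FROM_GPU, &unmap))
			return -1;

		free_args.handle = alloc.handle;
		return ioctl(kfd_fd, AMDKFD_IOC_FREE_MEMORY_OF_GPU,
			     &free_args);
	}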
-- 
2.7.4


* [PATCH 24/25] drm/amdkfd: Kmap event page for dGPUs
       [not found] ` <1517015381-1080-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (22 preceding siblings ...)
  2018-01-27  1:09   ` [PATCH 23/25] drm/amdkfd: Add ioctls for GPUVM memory management Felix Kuehling
@ 2018-01-27  1:09   ` Felix Kuehling
  2018-01-27  1:09   ` [PATCH 25/25] drm/amdkfd: Add module option for testing large-BAR functionality Felix Kuehling
  2018-01-27  9:08   ` [PATCH 00/25] Add KFD GPUVM support for dGPUs Christian König
  25 siblings, 0 replies; 44+ messages in thread
From: Felix Kuehling @ 2018-01-27  1:09 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling

The events page must be accessible in user mode by the GPU and CPU
as well as in kernel mode by the CPU. On dGPUs user mode virtual
addresses are managed by the Thunk's GPU memory allocation code.
Therefore we can't allocate the memory in kernel mode like we do
on APUs. But KFD still needs to map the memory for kernel access.
To facilitate this, the Thunk provides the buffer handle of the
events page to KFD when creating the first event.
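
On the user-mode side the handshake is small. A sketch, assuming the
events page was allocated through the GPUVM alloc ioctl from the
earlier patches in this series:

	#include <stdint.h>
	#include <sys/ioctl.h>
	#include <linux/kfd_ioctl.h>

	/* Pass the GPUVM buffer handle of the events page (returned by
	 * AMDKFD_IOC_ALLOC_MEMORY_OF_GPU) to KFD on the first event
	 * creation, so KFD can map the same BO for kernel access.
	 */
	static int create_first_event(int kfd_fd, uint64_t event_page_handle,
				      uint32_t *event_id)
	{
		struct kfd_ioctl_create_event_args ev = {
			.event_page_offset = event_page_handle,
			.event_type = KFD_IOC_EVENT_SIGNAL,
		};
		int r = ioctl(kfd_fd, AMDKFD_IOC_CREATE_EVENT, &ev);

		if (!r)
			*event_id = ev.event_id;
		return r;
	}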

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 56 ++++++++++++++++++++++++++++++++
 drivers/gpu/drm/amd/amdkfd/kfd_events.c  | 31 ++++++++++++++++--
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h    |  2 ++
 3 files changed, 87 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 160a5c8..0c9aa07 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -922,6 +922,58 @@ static int kfd_ioctl_create_event(struct file *filp, struct kfd_process *p,
 	struct kfd_ioctl_create_event_args *args = data;
 	int err;
 
+	/* For dGPUs the event page is allocated in user mode. The
+	 * handle is passed to KFD with the first call to this IOCTL
+	 * through the event_page_offset field.
+	 */
+	if (args->event_page_offset) {
+		struct kfd_dev *kfd;
+		struct kfd_process_device *pdd;
+		void *mem, *kern_addr;
+		uint64_t size;
+
+		if (p->signal_page) {
+			pr_err("Event page is already set\n");
+			return -EINVAL;
+		}
+
+		kfd = kfd_device_by_id(GET_GPU_ID(args->event_page_offset));
+		if (!kfd) {
+			pr_err("Getting device by id failed in %s\n", __func__);
+			return -EINVAL;
+		}
+
+		mutex_lock(&p->mutex);
+		pdd = kfd_bind_process_to_device(kfd, p);
+		if (IS_ERR(pdd)) {
+			err = PTR_ERR(pdd);
+			goto out_unlock;
+		}
+
+		mem = kfd_process_device_translate_handle(pdd,
+				GET_IDR_HANDLE(args->event_page_offset));
+		if (!mem) {
+			pr_err("Can't find BO, offset is 0x%llx\n",
+			       args->event_page_offset);
+			err = -EINVAL;
+			goto out_unlock;
+		}
+		mutex_unlock(&p->mutex);
+
+		err = kfd->kfd2kgd->map_gtt_bo_to_kernel(kfd->kgd,
+						mem, &kern_addr, &size);
+		if (err) {
+			pr_err("Failed to map event page to kernel\n");
+			return err;
+		}
+
+		err = kfd_event_page_set(p, kern_addr, size);
+		if (err) {
+			pr_err("Failed to set event page\n");
+			return err;
+		}
+	}
+
 	err = kfd_event_create(filp, p, args->event_type,
 				args->auto_reset != 0, args->node_id,
 				&args->event_id, &args->event_trigger_data,
@@ -929,6 +981,10 @@ static int kfd_ioctl_create_event(struct file *filp, struct kfd_process *p,
 				&args->event_slot_index);
 
 	return err;
+
+out_unlock:
+	mutex_unlock(&p->mutex);
+	return err;
 }
 
 static int kfd_ioctl_destroy_event(struct file *filp, struct kfd_process *p,
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
index f770dc7..56ec74a 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
@@ -51,6 +51,7 @@ struct kfd_event_waiter {
 struct kfd_signal_page {
 	uint64_t *kernel_address;
 	uint64_t __user *user_address;
+	bool need_to_free_pages;
 };
 
 
@@ -78,6 +79,7 @@ static struct kfd_signal_page *allocate_signal_page(struct kfd_process *p)
 	       KFD_SIGNAL_EVENT_LIMIT * 8);
 
 	page->kernel_address = backing_store;
+	page->need_to_free_pages = true;
 	pr_debug("Allocated new event signal page at %p, for process %p\n",
 			page, p);
 
@@ -268,8 +270,9 @@ static void shutdown_signal_page(struct kfd_process *p)
 	struct kfd_signal_page *page = p->signal_page;
 
 	if (page) {
-		free_pages((unsigned long)page->kernel_address,
-				get_order(KFD_SIGNAL_EVENT_LIMIT * 8));
+		if (page->need_to_free_pages)
+			free_pages((unsigned long)page->kernel_address,
+				   get_order(KFD_SIGNAL_EVENT_LIMIT * 8));
 		kfree(page);
 	}
 }
@@ -291,6 +294,30 @@ static bool event_can_be_cpu_signaled(const struct kfd_event *ev)
 	return ev->type == KFD_EVENT_TYPE_SIGNAL;
 }
 
+int kfd_event_page_set(struct kfd_process *p, void *kernel_address,
+		       uint64_t size)
+{
+	struct kfd_signal_page *page;
+
+	if (p->signal_page)
+		return -EBUSY;
+
+	page = kzalloc(sizeof(*page), GFP_KERNEL);
+	if (!page)
+		return -ENOMEM;
+
+	/* Initialize all events to unsignaled */
+	memset(kernel_address, (uint8_t) UNSIGNALED_EVENT_SLOT,
+	       KFD_SIGNAL_EVENT_LIMIT * 8);
+
+	page->kernel_address = kernel_address;
+
+	p->signal_page = page;
+	p->signal_mapped_size = size;
+
+	return 0;
+}
+
 int kfd_event_create(struct file *devkfd, struct kfd_process *p,
 		     uint32_t event_type, bool auto_reset, uint32_t node_id,
 		     uint32_t *event_id, uint32_t *event_trigger_data,
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 475d19e..b478594 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -866,6 +866,8 @@ void kfd_signal_iommu_event(struct kfd_dev *dev,
 void kfd_signal_hw_exception_event(unsigned int pasid);
 int kfd_set_event(struct kfd_process *p, uint32_t event_id);
 int kfd_reset_event(struct kfd_process *p, uint32_t event_id);
+int kfd_event_page_set(struct kfd_process *p, void *kernel_address,
+		       uint64_t size);
 int kfd_event_create(struct file *devkfd, struct kfd_process *p,
 		     uint32_t event_type, bool auto_reset, uint32_t node_id,
 		     uint32_t *event_id, uint32_t *event_trigger_data,
-- 
2.7.4


* [PATCH 25/25] drm/amdkfd: Add module option for testing large-BAR functionality
       [not found] ` <1517015381-1080-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (23 preceding siblings ...)
  2018-01-27  1:09   ` [PATCH 24/25] drm/amdkfd: Kmap event page for dGPUs Felix Kuehling
@ 2018-01-27  1:09   ` Felix Kuehling
  2018-01-27  9:08   ` [PATCH 00/25] Add KFD GPUVM support for dGPUs Christian König
  25 siblings, 0 replies; 44+ messages in thread
From: Felix Kuehling @ 2018-01-27  1:09 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling

Simulate large-BAR system by exporting only visible memory. This
limits the amount of available VRAM to the size of the BAR, but
enables CPU access to VRAM.
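
Usage note: with the option enabled, e.g. by loading amdkfd with
debug_largebar=1, kfd_dev_is_large_bar() reports true and the CRAT
code reports all VRAM as a public heap, regardless of the actual BAR
size.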

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 5 +++++
 drivers/gpu/drm/amd/amdkfd/kfd_crat.c    | 3 +++
 drivers/gpu/drm/amd/amdkfd/kfd_module.c  | 5 +++++
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h    | 6 ++++++
 4 files changed, 19 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 0c9aa07..01cc8eb 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1106,6 +1106,11 @@ bool kfd_dev_is_large_bar(struct kfd_dev *dev)
 {
 	struct kfd_local_mem_info mem_info;
 
+	if (debug_largebar) {
+		pr_debug("Simulate large-bar allocation on non large-bar machine\n");
+		return true;
+	}
+
 	if (dev->device_info->needs_iommu_device)
 		return false;
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
index 3478270..c1981b1 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
@@ -1131,6 +1131,9 @@ static int kfd_create_vcrat_image_gpu(void *pcrat_image,
 	sub_type_hdr = (typeof(sub_type_hdr))((char *)sub_type_hdr +
 			sub_type_hdr->length);
 
+	if (debug_largebar)
+		local_mem_info.local_mem_size_private = 0;
+
 	if (local_mem_info.local_mem_size_private == 0)
 		ret = kfd_fill_gpu_memory_affinity(&avail_size,
 				kdev, HSA_MEM_HEAP_TYPE_FB_PUBLIC,
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_module.c b/drivers/gpu/drm/amd/amdkfd/kfd_module.c
index 65574c6..b0acb06 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_module.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_module.c
@@ -71,6 +71,11 @@ module_param(send_sigterm, int, 0444);
 MODULE_PARM_DESC(send_sigterm,
 	"Send sigterm to HSA process on unhandled exception (0 = disable, 1 = enable)");
 
+int debug_largebar;
+module_param(debug_largebar, int, 0444);
+MODULE_PARM_DESC(debug_largebar,
+	"Debug large-bar flag used to simulate large-bar capability on non-large bar machine (0 = disable, 1 = enable)");
+
 int ignore_crat;
 module_param(ignore_crat, int, 0444);
 MODULE_PARM_DESC(ignore_crat,
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index b478594..2eba853 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -105,6 +105,12 @@ extern int cwsr_enable;
 extern int send_sigterm;
 
 /*
+ * This module parameter is used to simulate a large-BAR machine on
+ * machines without large-BAR capability.
+ */
+extern int debug_largebar;
+
+/*
  * Ignore CRAT table during KFD initialization, can be used to work around
  * broken CRAT tables on some AMD systems
  */
-- 
2.7.4


* Re: [PATCH 00/25] Add KFD GPUVM support for dGPUs
       [not found] ` <1517015381-1080-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (24 preceding siblings ...)
  2018-01-27  1:09   ` [PATCH 25/25] drm/amdkfd: Add module option for testing large-BAR functionality Felix Kuehling
@ 2018-01-27  9:08   ` Christian König
  25 siblings, 0 replies; 44+ messages in thread
From: Christian König @ 2018-01-27  9:08 UTC (permalink / raw)
  To: Felix Kuehling, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w

Am 27.01.2018 um 02:09 schrieb Felix Kuehling:
> I split this into an AMDGPU and AMDKFD part. The bigger patches that
> add lots of new code are not cherry-picked and squashed. Instead I
> copied, reorganized and cleaned up the code by hand and then split it
> into some semblance of a sensible history. I acknowledged major
> contributors with signed-off-by lines but didn't list everyone who
> ever touched that code (would probably be most of the team).
>
> I pushed an updated Thunk (rebased on ROCm 1.7) that works with this
> KFD update. Most testing was done on Fiji with KFDTest (Yong started
> working on open-sourcing it). I was also able to run the OpenCL
> version of SHOC, though most sub-tests still fail.
>
> KFDTest can manage VRAM and system memory, submit shader dispatches,
> receive events. I haven't tested multi-GPU yet, but in theory that
> should also work, with system memory buffers shared between multiple
> GPUs.
>
> The big missing piece at this point is support for userptr memory
> (user-allocated memory mapped for GPU access). That's giong to be my
> next patch series that should enable a much wider range of real-world
> applications.
>
> AMDGPU:
> Patches 1-5 are minor cleanups and fixes

Patches #1-#5 are Reviewed-by: Christian König <christian.koenig@amd.com>.

> Patches 6-10 add and implement KFD->KGD interfaces for GPUVM
>
> AMDKFD:
> Patches 11-13 are minor cleanups and fixes
> Patches 14-25 add all the GPUVM memory management functionality
>
> Felix Kuehling (22):
>    drm/amdgpu: remove useless BUG_ONs
>    drm/amdgpu: Fix header file dependencies
>    drm/amdgpu: Fix wrong mask in get_atc_vmid_pasid_mapping_pasid
>    drm/amdgpu: Remove unused kfd2kgd interface
>    drm/amdgpu: Add KFD eviction fence
>    drm/amdgpu: Update kgd2kfd_shared_resources for dGPU support
>    drm/amdgpu: add amdgpu_sync_clone
>    drm/amdgpu: Add GPUVM memory management functions for KFD
>    drm/amdgpu: Add submit IB function for KFD
>    drm/amdkfd: Add missing #ifdef CONFIG_AMD_IOMMU_V2 guard
>    drm/amdkfd: Use per-device sched_policy
>    drm/amdkfd: Add GPUVM virtual address space to PDD
>    drm/amdkfd: Implement KFD process eviction/restore
>    uapi: Fix type used in ioctl parameter structures
>    drm/amdkfd: Remove limit on number of GPUs
>    drm/amdkfd: Aperture setup for dGPUs
>    drm/amdkfd: Add per-process IDR for buffer handles
>    drm/amdkfd: Allocate CWSR trap handler memory for dGPUs
>    drm/amdkfd: Add TC flush on VMID deallocation for Hawaii
>    drm/amdkfd: Add ioctls for GPUVM memory management
>    drm/amdkfd: Kmap event page for dGPUs
>    drm/amdkfd: Add module option for testing large-BAR functionality
>
> Harish Kasiviswanathan (1):
>    drm/amdkfd: Remove unaligned memory access
>
> Oak Zeng (1):
>    drm/amdkfd: Populate DRM render device minor
>
> Yong Zhao (1):
>    drm/amdgpu: Replace kgd_mem with amdgpu_bo for kernel pinned gtt mem
>
>   drivers/gpu/drm/amd/amdgpu/Makefile                |    2 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c         |  127 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h         |  115 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c   |  196 +++
>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c  |   80 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c  |   82 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c   | 1500 ++++++++++++++++++++
>   drivers/gpu/drm/amd/amdgpu/amdgpu_object.c         |    4 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_object.h         |    2 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h           |    6 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c           |   53 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_sync.h           |    1 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c            |   25 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h             |    1 +
>   drivers/gpu/drm/amd/amdkfd/kfd_chardev.c           |  484 +++++++
>   drivers/gpu/drm/amd/amdkfd/kfd_crat.c              |    3 +
>   drivers/gpu/drm/amd/amdkfd/kfd_device.c            |   65 +-
>   .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c  |  290 +++-
>   .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.h  |    9 +
>   drivers/gpu/drm/amd/amdkfd/kfd_events.c            |   31 +-
>   drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c       |   59 +-
>   drivers/gpu/drm/amd/amdkfd/kfd_module.c            |    7 +
>   drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c    |   37 +
>   drivers/gpu/drm/amd/amdkfd/kfd_priv.h              |   79 +-
>   drivers/gpu/drm/amd/amdkfd/kfd_process.c           |  490 ++++++-
>   drivers/gpu/drm/amd/amdkfd/kfd_topology.c          |    4 +
>   drivers/gpu/drm/amd/amdkfd/kfd_topology.h          |    1 +
>   drivers/gpu/drm/amd/include/kgd_kfd_interface.h    |  101 +-
>   include/uapi/linux/kfd_ioctl.h                     |   87 +-
>   29 files changed, 3811 insertions(+), 130 deletions(-)
>   create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
>   create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
>


* Re: [PATCH 06/25] drm/amdgpu: Add KFD eviction fence
       [not found]     ` <1517015381-1080-7-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
@ 2018-01-27  9:16       ` Christian König
       [not found]         ` <11f5f33b-0c0e-44c2-5be9-5d0d25204c2e-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 44+ messages in thread
From: Christian König @ 2018-01-27  9:16 UTC (permalink / raw)
  To: Felix Kuehling, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w

Am 27.01.2018 um 02:09 schrieb Felix Kuehling:
> This fence is used by KFD to keep memory resident while user mode
> queues are enabled. Trying to evict memory will trigger the
> enable_signaling callback, which starts a KFD eviction, which
> involves preempting user mode queues before signaling the fence.
> There is one such fence per process.
>
> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/Makefile              |   1 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h       |  15 ++
>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c | 196 +++++++++++++++++++++++
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h         |   5 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c         |  18 ++-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c          |  18 +++
>   drivers/gpu/drm/amd/include/kgd_kfd_interface.h  |   6 +
>   7 files changed, 256 insertions(+), 3 deletions(-)
>   create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile
> index d6e5b72..43dc3f9 100644
> --- a/drivers/gpu/drm/amd/amdgpu/Makefile
> +++ b/drivers/gpu/drm/amd/amdgpu/Makefile
> @@ -130,6 +130,7 @@ amdgpu-y += \
>   # add amdkfd interfaces
>   amdgpu-y += \
>   	 amdgpu_amdkfd.o \
> +	 amdgpu_amdkfd_fence.o \
>   	 amdgpu_amdkfd_gfx_v8.o
>   
>   # add cgs
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> index 2a519f9..8d92f5c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> @@ -29,6 +29,8 @@
>   #include <linux/mmu_context.h>
>   #include <kgd_kfd_interface.h>
>   
> +extern const struct kgd2kfd_calls *kgd2kfd;
> +
>   struct amdgpu_device;
>   
>   struct kgd_mem {
> @@ -37,6 +39,19 @@ struct kgd_mem {
>   	void *cpu_ptr;
>   };
>   
> +/* KFD Memory Eviction */
> +struct amdgpu_amdkfd_fence {
> +	struct dma_fence base;
> +	void *mm;
> +	spinlock_t lock;
> +	char timeline_name[TASK_COMM_LEN];
> +};
> +
> +struct amdgpu_amdkfd_fence *amdgpu_amdkfd_fence_create(u64 context,
> +						       void *mm);
> +bool amd_kfd_fence_check_mm(struct dma_fence *f, void *mm);
> +struct amdgpu_amdkfd_fence *to_amdgpu_amdkfd_fence(struct dma_fence *f);
> +
>   int amdgpu_amdkfd_init(void);
>   void amdgpu_amdkfd_fini(void);
>   
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
> new file mode 100644
> index 0000000..252e44e
> --- /dev/null
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
> @@ -0,0 +1,196 @@
> +/*
> + * Copyright 2016-2018 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + */
> +
> +#include <linux/dma-fence.h>
> +#include <linux/spinlock.h>
> +#include <linux/atomic.h>
> +#include <linux/stacktrace.h>
> +#include <linux/sched.h>
> +#include <linux/slab.h>
> +#include "amdgpu_amdkfd.h"
> +
> +const struct dma_fence_ops amd_kfd_fence_ops;
> +static atomic_t fence_seq = ATOMIC_INIT(0);
> +
> +static int amd_kfd_fence_signal(struct dma_fence *f);
> +
> +/* Eviction Fence
> + * Fence helper functions to deal with KFD memory eviction.
> + * Big Idea - Since KFD submissions are done by user queues, a BO cannot be
> + *  evicted unless all the user queues for that process are evicted.
> + *
> + * All the BOs in a process share an eviction fence. When process X wants
> + * to map VRAM memory but TTM can't find enough space, TTM will attempt to
> + * evict BOs from its LRU list. TTM checks if the BO is valuable to evict
> + * by calling ttm_bo_driver->eviction_valuable().
> + *
> + * ttm_bo_driver->eviction_valuable() - will return false if the BO belongs
> + *  to process X. Otherwise, it will return true to indicate BO can be
> + *  evicted by TTM.
> + *
> + * If ttm_bo_driver->eviction_valuable returns true, then TTM will continue
> + * the eviction process for that BO by calling ttm_bo_evict --> amdgpu_bo_move
> + * --> amdgpu_copy_buffer(). This sets up job in GPU scheduler.
> + *
> + * GPU Scheduler (amd_sched_main) - sets up a cb (fence_add_callback) to
> + *  notify when the BO is free to move. fence_add_callback --> enable_signaling
> + *  --> amdgpu_amdkfd_fence.enable_signaling
> + *
> + * amdgpu_amdkfd_fence.enable_signaling - Start a work item that will quiesce
> + * user queues and signal fence. The work item will also start another delayed
> + * work item to restore BOs
> + */
> +
> +struct amdgpu_amdkfd_fence *amdgpu_amdkfd_fence_create(u64 context,
> +						       void *mm)
> +{
> +	struct amdgpu_amdkfd_fence *fence = NULL;
> +
> +	fence = kzalloc(sizeof(*fence), GFP_KERNEL);
> +	if (fence == NULL)
> +		return NULL;
> +
> +	/* mm_struct mm is used as void pointer to identify the parent
> +	 * KFD process. Don't dereference it. The fence and any threads
> +	 * using mm are guaranteed to be released before process termination.
> +	 */
> +	fence->mm = mm;

That won't work. Fences can live much longer than the process that 
created them.

I've already found a fence in a BO still living hours after the process 
was killed and the pid long recycled.

I suggest making fence->mm a real mm_struct pointer with reference 
counting, then setting it to NULL and dropping the reference in 
enable_signaling.
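
Concretely, that suggestion amounts to something like this sketch
(assuming fence->mm becomes a struct mm_struct * and using the
mmgrab()/mmdrop() refcounting helpers from <linux/sched/mm.h>):

	/* in amdgpu_amdkfd_fence_create(): pin the mm_struct itself */
	fence->mm = mm;
	mmgrab(fence->mm);

	/* in amd_kfd_fence_enable_signaling(), once the eviction has
	 * been scheduled: drop the reference and clear the pointer
	 */
	mmdrop(fence->mm);
	fence->mm = NULL;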

> +	get_task_comm(fence->timeline_name, current);
> +	spin_lock_init(&fence->lock);
> +
> +	dma_fence_init(&fence->base, &amd_kfd_fence_ops, &fence->lock,
> +		   context, atomic_inc_return(&fence_seq));
> +
> +	return fence;
> +}
> +
> +struct amdgpu_amdkfd_fence *to_amdgpu_amdkfd_fence(struct dma_fence *f)
> +{
> +	struct amdgpu_amdkfd_fence *fence;
> +
> +	if (!f)
> +		return NULL;
> +
> +	fence = container_of(f, struct amdgpu_amdkfd_fence, base);
> +	if (fence && f->ops == &amd_kfd_fence_ops)
> +		return fence;
> +
> +	return NULL;
> +}
> +
> +static const char *amd_kfd_fence_get_driver_name(struct dma_fence *f)
> +{
> +	return "amdgpu_amdkfd_fence";
> +}
> +
> +static const char *amd_kfd_fence_get_timeline_name(struct dma_fence *f)
> +{
> +	struct amdgpu_amdkfd_fence *fence = to_amdgpu_amdkfd_fence(f);
> +
> +	return fence->timeline_name;
> +}
> +
> +/**
> + * amd_kfd_fence_enable_signaling - This gets called when TTM wants to evict
> + *  a KFD BO and schedules a job to move the BO.
> + *  If fence is already signaled return true.
> + *  If fence is not signaled schedule a evict KFD process work item.
> + */
> +static bool amd_kfd_fence_enable_signaling(struct dma_fence *f)
> +{
> +	struct amdgpu_amdkfd_fence *fence = to_amdgpu_amdkfd_fence(f);
> +
> +	if (!fence)
> +		return false;
> +
> +	if (dma_fence_is_signaled(f))
> +		return true;
> +
> +	if (!kgd2kfd->schedule_evict_and_restore_process(
> +				(struct mm_struct *)fence->mm, f))
> +		return true;
> +
> +	return false;
> +}
> +
> +static int amd_kfd_fence_signal(struct dma_fence *f)
> +{
> +	unsigned long flags;
> +	int ret;
> +
> +	spin_lock_irqsave(f->lock, flags);
> +	/* Set enabled bit so the callback will be called */
> +	set_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &f->flags);

Mhm, why is that necessary?

> +	ret = dma_fence_signal_locked(f);
> +	spin_unlock_irqrestore(f->lock, flags);
> +
> +	return ret;
> +}
> +
> +/**
> + * amd_kfd_fence_release - callback that fence can be freed
> + *
> + * @fence: fence
> + *
> + * This function is called when the reference count becomes zero.
> + * It just RCU schedules freeing up the fence.
> + */
> +static void amd_kfd_fence_release(struct dma_fence *f)
> +{
> +	struct amdgpu_amdkfd_fence *fence = to_amdgpu_amdkfd_fence(f);
> +	/* Unconditionally signal the fence. The process is getting
> +	 * terminated.
> +	 */
> +	if (WARN_ON(!fence))
> +		return; /* Not an amdgpu_amdkfd_fence */
> +
> +	amd_kfd_fence_signal(f);
> +	kfree_rcu(f, rcu);
> +}
> +
> +/**
> + * amd_kfd_fence_check_mm - Check if @mm is the same as that of the fence @f;
> + *  if same return true, else return false.
> + *
> + * @f: [IN] fence
> + * @mm: [IN] mm that needs to be verified
> + */
> +bool amd_kfd_fence_check_mm(struct dma_fence *f, void *mm)
> +{
> +	struct amdgpu_amdkfd_fence *fence = to_amdgpu_amdkfd_fence(f);
> +
> +	if (!fence)
> +		return false;
> +	else if (fence->mm == mm)
> +		return true;
> +
> +	return false;
> +}
> +
> +const struct dma_fence_ops amd_kfd_fence_ops = {
> +	.get_driver_name = amd_kfd_fence_get_driver_name,
> +	.get_timeline_name = amd_kfd_fence_get_timeline_name,
> +	.enable_signaling = amd_kfd_fence_enable_signaling,
> +	.signaled = NULL,
> +	.wait = dma_fence_default_wait,
> +	.release = amd_kfd_fence_release,
> +};
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> index 65d5a4e..ca00dd2 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> @@ -36,8 +36,9 @@
>   #define AMDGPU_MAX_UVD_ENC_RINGS	2
>   
>   /* some special values for the owner field */
> -#define AMDGPU_FENCE_OWNER_UNDEFINED	((void*)0ul)
> -#define AMDGPU_FENCE_OWNER_VM		((void*)1ul)
> +#define AMDGPU_FENCE_OWNER_UNDEFINED	((void *)0ul)
> +#define AMDGPU_FENCE_OWNER_VM		((void *)1ul)
> +#define AMDGPU_FENCE_OWNER_KFD		((void *)2ul)
>   
>   #define AMDGPU_FENCE_FLAG_64BIT         (1 << 0)
>   #define AMDGPU_FENCE_FLAG_INT           (1 << 1)
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
> index df65c66..0cb31d9 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
> @@ -31,6 +31,7 @@
>   #include <drm/drmP.h>
>   #include "amdgpu.h"
>   #include "amdgpu_trace.h"
> +#include "amdgpu_amdkfd.h"
>   
>   struct amdgpu_sync_entry {
>   	struct hlist_node	node;
> @@ -86,10 +87,18 @@ static bool amdgpu_sync_same_dev(struct amdgpu_device *adev,
>   static void *amdgpu_sync_get_owner(struct dma_fence *f)
>   {
>   	struct drm_sched_fence *s_fence = to_drm_sched_fence(f);
> +	struct amdgpu_amdkfd_fence *kfd_fence;
> +
> +	if (!f)
> +		return AMDGPU_FENCE_OWNER_UNDEFINED;

When you add the extra NULL check here then please move the 
to_drm_sched_fence() after it as well.

Christian.

>   
>   	if (s_fence)
>   		return s_fence->owner;
>   
> +	kfd_fence = to_amdgpu_amdkfd_fence(f);
> +	if (kfd_fence)
> +		return AMDGPU_FENCE_OWNER_KFD;
> +
>   	return AMDGPU_FENCE_OWNER_UNDEFINED;
>   }
>   
> @@ -204,11 +213,18 @@ int amdgpu_sync_resv(struct amdgpu_device *adev,
>   	for (i = 0; i < flist->shared_count; ++i) {
>   		f = rcu_dereference_protected(flist->shared[i],
>   					      reservation_object_held(resv));
> +		/* We only want to trigger KFD eviction fences on
> +		 * evict or move jobs. Skip KFD fences otherwise.
> +		 */
> +		fence_owner = amdgpu_sync_get_owner(f);
> +		if (fence_owner == AMDGPU_FENCE_OWNER_KFD &&
> +		    owner != AMDGPU_FENCE_OWNER_UNDEFINED)
> +			continue;
> +
>   		if (amdgpu_sync_same_dev(adev, f)) {
>   			/* VM updates are only interesting
>   			 * for other VM updates and moves.
>   			 */
> -			fence_owner = amdgpu_sync_get_owner(f);
>   			if ((owner != AMDGPU_FENCE_OWNER_UNDEFINED) &&
>   			    (fence_owner != AMDGPU_FENCE_OWNER_UNDEFINED) &&
>   			    ((owner == AMDGPU_FENCE_OWNER_VM) !=
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index e4bb435..c3f33d3 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -46,6 +46,7 @@
>   #include "amdgpu.h"
>   #include "amdgpu_object.h"
>   #include "amdgpu_trace.h"
> +#include "amdgpu_amdkfd.h"
>   #include "bif/bif_4_1_d.h"
>   
>   #define DRM_FILE_PAGE_OFFSET (0x100000000ULL >> PAGE_SHIFT)
> @@ -1170,6 +1171,23 @@ static bool amdgpu_ttm_bo_eviction_valuable(struct ttm_buffer_object *bo,
>   {
>   	unsigned long num_pages = bo->mem.num_pages;
>   	struct drm_mm_node *node = bo->mem.mm_node;
> +	struct reservation_object_list *flist;
> +	struct dma_fence *f;
> +	int i;
> +
> +	/* If bo is a KFD BO, check if the bo belongs to the current process.
> +	 * If true, then return false as any KFD process needs all its BOs to
> +	 * be resident to run successfully
> +	 */
> +	flist = reservation_object_get_list(bo->resv);
> +	if (flist) {
> +		for (i = 0; i < flist->shared_count; ++i) {
> +			f = rcu_dereference_protected(flist->shared[i],
> +				reservation_object_held(bo->resv));
> +			if (amd_kfd_fence_check_mm(f, current->mm))
> +				return false;
> +		}
> +	}
>   
>   	switch (bo->mem.mem_type) {
>   	case TTM_PL_TT:
> diff --git a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
> index 94eab54..9e35249 100644
> --- a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
> +++ b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
> @@ -30,6 +30,7 @@
>   
>   #include <linux/types.h>
>   #include <linux/bitmap.h>
> +#include <linux/dma-fence.h>
>   
>   struct pci_dev;
>   
> @@ -286,6 +287,9 @@ struct kfd2kgd_calls {
>    *
>    * @resume: Notifies amdkfd about a resume action done to a kgd device
>    *
> + * @schedule_evict_and_restore_process: Schedules work queue that will prepare
> + * for safe eviction of KFD BOs that belong to the specified process.
> + *
>    * This structure contains function callback pointers so the kgd driver
>    * will notify to the amdkfd about certain status changes.
>    *
> @@ -300,6 +304,8 @@ struct kgd2kfd_calls {
>   	void (*interrupt)(struct kfd_dev *kfd, const void *ih_ring_entry);
>   	void (*suspend)(struct kfd_dev *kfd);
>   	int (*resume)(struct kfd_dev *kfd);
> +	int (*schedule_evict_and_restore_process)(struct mm_struct *mm,
> +			struct dma_fence *fence);
>   };
>   
>   int kgd2kfd_init(unsigned interface_version,


* Re: [PATCH 07/25] drm/amdgpu: Update kgd2kfd_shared_resources for dGPU support
       [not found]     ` <1517015381-1080-8-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
@ 2018-01-27  9:19       ` Christian König
       [not found]         ` <de92f17a-5278-1b55-2a22-af17a82f7471-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 44+ messages in thread
From: Christian König @ 2018-01-27  9:19 UTC (permalink / raw)
  To: Felix Kuehling, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w

On 27.01.2018 at 02:09, Felix Kuehling wrote:
> Add GPUVM size and DRM render node. Also add function to query the
> VMID mask to avoid hard-coding it in multiple places later.
>
> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c      | 19 +++++++++++++++++--
>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h      |  2 ++
>   drivers/gpu/drm/amd/include/kgd_kfd_interface.h |  6 ++++++
>   3 files changed, 25 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
> index c9f204d..294c467 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
> @@ -30,6 +30,8 @@
>   const struct kgd2kfd_calls *kgd2kfd;
>   bool (*kgd2kfd_init_p)(unsigned int, const struct kgd2kfd_calls**);
>   
> +static const unsigned int compute_vmid_bitmap = 0xFF00;
> +
>   int amdgpu_amdkfd_init(void)
>   {
>   	int ret;
> @@ -137,9 +139,12 @@ void amdgpu_amdkfd_device_init(struct amdgpu_device *adev)
>   	int last_valid_bit;
>   	if (adev->kfd) {
>   		struct kgd2kfd_shared_resources gpu_resources = {
> -			.compute_vmid_bitmap = 0xFF00,
> +			.compute_vmid_bitmap = compute_vmid_bitmap,
>   			.num_pipe_per_mec = adev->gfx.mec.num_pipe_per_mec,
> -			.num_queue_per_pipe = adev->gfx.mec.num_queue_per_pipe
> +			.num_queue_per_pipe = adev->gfx.mec.num_queue_per_pipe,
> +			.gpuvm_size = adev->vm_manager.max_pfn
> +						<< AMDGPU_GPU_PAGE_SHIFT,

That most likely doesn't work as intended on Vega10. The address space 
is divided into an upper and a lower range, but max_pfn includes both.

I suggest to use something like min(adev->vm_manager.max_pfn << 
AMDGPU_GPU_PAGE_SHIFT, AMDGPU_VM_HOLE_START).

Christian.
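
Concretely, the initializer might become something like this (untested
fragment; min_t(u64, ...) may be needed if the two operands differ in
type, and AMDGPU_VM_HOLE_START is assumed to be defined in this tree):

	/* Clamp the reported size to the lower range so the hole in the
	 * middle of the Vega10 address space is never handed out.
	 */
	.gpuvm_size = min(adev->vm_manager.max_pfn
				<< AMDGPU_GPU_PAGE_SHIFT,
			  AMDGPU_VM_HOLE_START),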

> +			.drm_render_minor = adev->ddev->render->index
>   		};
>   
>   		/* this is going to have a few of the MSBs set that we need to
> @@ -351,3 +356,13 @@ uint64_t amdgpu_amdkfd_get_vram_usage(struct kgd_dev *kgd)
>   
>   	return amdgpu_vram_mgr_usage(&adev->mman.bdev.man[TTM_PL_VRAM]);
>   }
> +
> +bool amdgpu_amdkfd_is_kfd_vmid(struct amdgpu_device *adev, u32 vmid)
> +{
> +	if (adev->kfd) {
> +		if ((1 << vmid) & compute_vmid_bitmap)
> +			return true;
> +	}
> +
> +	return false;
> +}
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> index 8d92f5c..cc3aa13 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> @@ -66,6 +66,8 @@ void amdgpu_amdkfd_device_fini(struct amdgpu_device *adev);
>   struct kfd2kgd_calls *amdgpu_amdkfd_gfx_7_get_functions(void);
>   struct kfd2kgd_calls *amdgpu_amdkfd_gfx_8_0_get_functions(void);
>   
> +bool amdgpu_amdkfd_is_kfd_vmid(struct amdgpu_device *adev, u32 vmid);
> +
>   /* Shared API */
>   int alloc_gtt_mem(struct kgd_dev *kgd, size_t size,
>   			void **mem_obj, uint64_t *gpu_addr,
> diff --git a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
> index 9e35249..36c706a 100644
> --- a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
> +++ b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
> @@ -108,6 +108,12 @@ struct kgd2kfd_shared_resources {
>   
>   	/* Number of bytes at start of aperture reserved for KGD. */
>   	size_t doorbell_start_offset;
> +
> +	/* GPUVM address space size in bytes */
> +	uint64_t gpuvm_size;
> +
> +	/* Minor device number of the render node */
> +	int drm_render_minor;
>   };
>   
>   struct tile_config {


* Re: [PATCH 07/25] drm/amdgpu: Update kgd2kfd_shared_resources for dGPU support
       [not found]         ` <de92f17a-5278-1b55-2a22-af17a82f7471-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2018-01-28 23:02           ` Felix Kuehling
       [not found]             ` <7425b235-e354-e9b7-0b83-623d9148c61b-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 44+ messages in thread
From: Felix Kuehling @ 2018-01-28 23:02 UTC (permalink / raw)
  To: christian.koenig-5C7GfCeVMHo,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w

On 2018-01-27 04:19 AM, Christian König wrote:
> On 27.01.2018 at 02:09, Felix Kuehling wrote:
>> Add GPUVM size and DRM render node. Also add function to query the
>> VMID mask to avoid hard-coding it in multiple places later.
>>
>> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c      | 19
>> +++++++++++++++++--
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h      |  2 ++
>>   drivers/gpu/drm/amd/include/kgd_kfd_interface.h |  6 ++++++
>>   3 files changed, 25 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
>> index c9f204d..294c467 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
>> @@ -30,6 +30,8 @@
>>   const struct kgd2kfd_calls *kgd2kfd;
>>   bool (*kgd2kfd_init_p)(unsigned int, const struct kgd2kfd_calls**);
>>   +static const unsigned int compute_vmid_bitmap = 0xFF00;
>> +
>>   int amdgpu_amdkfd_init(void)
>>   {
>>       int ret;
>> @@ -137,9 +139,12 @@ void amdgpu_amdkfd_device_init(struct
>> amdgpu_device *adev)
>>       int last_valid_bit;
>>       if (adev->kfd) {
>>           struct kgd2kfd_shared_resources gpu_resources = {
>> -            .compute_vmid_bitmap = 0xFF00,
>> +            .compute_vmid_bitmap = compute_vmid_bitmap,
>>               .num_pipe_per_mec = adev->gfx.mec.num_pipe_per_mec,
>> -            .num_queue_per_pipe = adev->gfx.mec.num_queue_per_pipe
>> +            .num_queue_per_pipe = adev->gfx.mec.num_queue_per_pipe,
>> +            .gpuvm_size = adev->vm_manager.max_pfn
>> +                        << AMDGPU_GPU_PAGE_SHIFT,
>
> That most likely doesn't work as intended on Vega10. The address space
> is divided into an upper and a lower range, but max_pfn includes both.
>
> I suggest to use something like min(adev->vm_manager.max_pfn <<
> AMDGPU_GPU_PAGE_SHIFT, AMDGPU_VM_HOLE_START).

I think this is fine as it is. This just tells the Thunk the size of the
virtual address space supported by the GPU. Currently the Thunk only
> uses 40 bits for SVM. But eventually it will be able to use the entire
47 bits of user address space. Any excess address space will just go unused.

I'm also wondering how universal the split 48-bit virtual address space 
layout is. Even for x86_64 there seems to be a 5-level page table layout
that supports 56 bits of user mode addresses
(Documentation/x86/x86_64/mm.txt). AArch64 seems to support 48-bit user
mode addresses (Documentation/arm64/memory.txt). I haven't found similar
information for PowerPC yet.

We should avoid coding too much architecture-specific logic into this
driver that's supposed to support other architectures as well. I should
also review the aperture placement with bigger user mode address spaces
in mind.

Regards,
  Felix

>
> Christian.
>
>> +            .drm_render_minor = adev->ddev->render->index
>>           };
>>             /* this is going to have a few of the MSBs set that we
>> need to
>> @@ -351,3 +356,13 @@ uint64_t amdgpu_amdkfd_get_vram_usage(struct
>> kgd_dev *kgd)
>>         return amdgpu_vram_mgr_usage(&adev->mman.bdev.man[TTM_PL_VRAM]);
>>   }
>> +
>> +bool amdgpu_amdkfd_is_kfd_vmid(struct amdgpu_device *adev, u32 vmid)
>> +{
>> +    if (adev->kfd) {
>> +        if ((1 << vmid) & compute_vmid_bitmap)
>> +            return true;
>> +    }
>> +
>> +    return false;
>> +}
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
>> index 8d92f5c..cc3aa13 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
>> @@ -66,6 +66,8 @@ void amdgpu_amdkfd_device_fini(struct amdgpu_device
>> *adev);
>>   struct kfd2kgd_calls *amdgpu_amdkfd_gfx_7_get_functions(void);
>>   struct kfd2kgd_calls *amdgpu_amdkfd_gfx_8_0_get_functions(void);
>>   +bool amdgpu_amdkfd_is_kfd_vmid(struct amdgpu_device *adev, u32 vmid);
>> +
>>   /* Shared API */
>>   int alloc_gtt_mem(struct kgd_dev *kgd, size_t size,
>>               void **mem_obj, uint64_t *gpu_addr,
>> diff --git a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
>> b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
>> index 9e35249..36c706a 100644
>> --- a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
>> +++ b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
>> @@ -108,6 +108,12 @@ struct kgd2kfd_shared_resources {
>>         /* Number of bytes at start of aperture reserved for KGD. */
>>       size_t doorbell_start_offset;
>> +
>> +    /* GPUVM address space size in bytes */
>> +    uint64_t gpuvm_size;
>> +
>> +    /* Minor device number of the render node */
>> +    int drm_render_minor;
>>   };
>>     struct tile_config {
>


* Re: [PATCH 06/25] drm/amdgpu: Add KFD eviction fence
       [not found]         ` <11f5f33b-0c0e-44c2-5be9-5d0d25204c2e-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2018-01-28 23:42           ` Felix Kuehling
       [not found]             ` <05cc2831-a338-ddae-42c5-8be381787a5e-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 44+ messages in thread
From: Felix Kuehling @ 2018-01-28 23:42 UTC (permalink / raw)
  To: christian.koenig-5C7GfCeVMHo,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w


On 2018-01-27 04:16 AM, Christian König wrote:
> On 27.01.2018 at 02:09, Felix Kuehling wrote:
[snip]
>> +struct amdgpu_amdkfd_fence *amdgpu_amdkfd_fence_create(u64 context,
>> +                               void *mm)
>> +{
>> +    struct amdgpu_amdkfd_fence *fence = NULL;
>> +
>> +    fence = kzalloc(sizeof(*fence), GFP_KERNEL);
>> +    if (fence == NULL)
>> +        return NULL;
>> +
>> +    /* mm_struct mm is used as a void pointer to identify the parent
>> +     * KFD process. Don't dereference it. Fence and any threads using
>> +     * mm are guaranteed to be released before process termination.
>> +     */
>> +    fence->mm = mm;
>
> That won't work. Fences can live much longer than any process that
> created them.
>
> I've already found a fence in a BO still living hours after the
> process was killed and the pid long recycled.
>
> I suggest to make fence->mm a real mm_struct pointer with reference
> counting and then set it to NULL and drop the reference in
> enable_signaling.

I agree. But enable_signaling may be too early to drop the reference.
amd_kfd_fence_check_mm could still be called later from
amdgpu_ttm_bo_eviction_valuable, as long as the fence hasn't signaled yet.

The safe place is probably in amd_kfd_fence_release.
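
Roughly (untested sketch; assumes the mmgrab() taken in fence_create is
dropped only here, not in enable_signaling):

	static void amd_kfd_fence_release(struct dma_fence *f)
	{
		struct amdgpu_amdkfd_fence *fence = to_amdgpu_amdkfd_fence(f);

		if (WARN_ON(!fence))
			return; /* Not an amdgpu_amdkfd_fence */

		amd_kfd_fence_signal(f);
		/* Last user of fence->mm; drop the mm_count reference
		 * taken in amdgpu_amdkfd_fence_create.
		 */
		mmdrop(fence->mm);
		kfree_rcu(f, rcu);
	}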

>
>> +    get_task_comm(fence->timeline_name, current);
>> +    spin_lock_init(&fence->lock);
>> +
>> +    dma_fence_init(&fence->base, &amd_kfd_fence_ops, &fence->lock,
>> +           context, atomic_inc_return(&fence_seq));
>> +
>> +    return fence;
>> +}
>> +
>> +struct amdgpu_amdkfd_fence *to_amdgpu_amdkfd_fence(struct dma_fence *f)
>> +{
>> +    struct amdgpu_amdkfd_fence *fence;
>> +
>> +    if (!f)
>> +        return NULL;
>> +
>> +    fence = container_of(f, struct amdgpu_amdkfd_fence, base);
>> +    if (fence && f->ops == &amd_kfd_fence_ops)
>> +        return fence;
>> +
>> +    return NULL;
>> +}
>> +
>> +static const char *amd_kfd_fence_get_driver_name(struct dma_fence *f)
>> +{
>> +    return "amdgpu_amdkfd_fence";
>> +}
>> +
>> +static const char *amd_kfd_fence_get_timeline_name(struct dma_fence *f)
>> +{
>> +    struct amdgpu_amdkfd_fence *fence = to_amdgpu_amdkfd_fence(f);
>> +
>> +    return fence->timeline_name;
>> +}
>> +
>> +/**
>> + * amd_kfd_fence_enable_signaling - This gets called when TTM wants
>> to evict
>> + *  a KFD BO and schedules a job to move the BO.
>> + *  If fence is already signaled, return true.
>> + *  If fence is not signaled, schedule an evict KFD process work item.
>> + */
>> +static bool amd_kfd_fence_enable_signaling(struct dma_fence *f)
>> +{
>> +    struct amdgpu_amdkfd_fence *fence = to_amdgpu_amdkfd_fence(f);
>> +
>> +    if (!fence)
>> +        return false;
>> +
>> +    if (dma_fence_is_signaled(f))
>> +        return true;
>> +
>> +    if (!kgd2kfd->schedule_evict_and_restore_process(
>> +                (struct mm_struct *)fence->mm, f))
>> +        return true;
>> +
>> +    return false;
>> +}
>> +
>> +static int amd_kfd_fence_signal(struct dma_fence *f)
>> +{
>> +    unsigned long flags;
>> +    int ret;
>> +
>> +    spin_lock_irqsave(f->lock, flags);
>> +    /* Set enabled bit so cb will be called */
>> +    set_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &f->flags);
>
> Mhm, why is that necessary?

This only gets called from fence_release below. I think this is to avoid
needlessly scheduling an eviction/restore cycle when an eviction fence
gets destroyed that hasn't been triggered before, probably during
process termination.

Harish, do you remember any other reason for this?

>
>> +    ret = dma_fence_signal_locked(f);
>> +    spin_unlock_irqrestore(f->lock, flags);
>> +
>> +    return ret;
>> +}
>> +
>> +/**
>> + * amd_kfd_fence_release - callback invoked when the fence can be freed
>> + *
>> + * @fence: fence
>> + *
>> + * This function is called when the reference count becomes zero.
>> + * It just RCU schedules freeing up the fence.
>> + */
>> +static void amd_kfd_fence_release(struct dma_fence *f)
>> +{
>> +    struct amdgpu_amdkfd_fence *fence = to_amdgpu_amdkfd_fence(f);
>> +    /* Unconditionally signal the fence. The process is getting
>> +     * terminated.
>> +     */
>> +    if (WARN_ON(!fence))
>> +        return; /* Not an amdgpu_amdkfd_fence */
>> +
>> +    amd_kfd_fence_signal(f);
>> +    kfree_rcu(f, rcu);
>> +}
>> +
>> +/**
>> + * amd_kfd_fence_check_mm - Check if @mm is the same as that of the
>> fence @f;
>> + *  if same return true, else return false.
>> + *
>> + * @f: [IN] fence
>> + * @mm: [IN] mm that needs to be verified
>> + */
>> +bool amd_kfd_fence_check_mm(struct dma_fence *f, void *mm)
>> +{
>> +    struct amdgpu_amdkfd_fence *fence = to_amdgpu_amdkfd_fence(f);
>> +
>> +    if (!fence)
>> +        return false;
>> +    else if (fence->mm == mm)
>> +        return true;
>> +
>> +    return false;
>> +}
>> +
>> +const struct dma_fence_ops amd_kfd_fence_ops = {
>> +    .get_driver_name = amd_kfd_fence_get_driver_name,
>> +    .get_timeline_name = amd_kfd_fence_get_timeline_name,
>> +    .enable_signaling = amd_kfd_fence_enable_signaling,
>> +    .signaled = NULL,
>> +    .wait = dma_fence_default_wait,
>> +    .release = amd_kfd_fence_release,
>> +};
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>> index 65d5a4e..ca00dd2 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>> @@ -36,8 +36,9 @@
>>   #define AMDGPU_MAX_UVD_ENC_RINGS    2
>>     /* some special values for the owner field */
>> -#define AMDGPU_FENCE_OWNER_UNDEFINED    ((void*)0ul)
>> -#define AMDGPU_FENCE_OWNER_VM        ((void*)1ul)
>> +#define AMDGPU_FENCE_OWNER_UNDEFINED    ((void *)0ul)
>> +#define AMDGPU_FENCE_OWNER_VM        ((void *)1ul)
>> +#define AMDGPU_FENCE_OWNER_KFD        ((void *)2ul)
>>     #define AMDGPU_FENCE_FLAG_64BIT         (1 << 0)
>>   #define AMDGPU_FENCE_FLAG_INT           (1 << 1)
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
>> index df65c66..0cb31d9 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
>> @@ -31,6 +31,7 @@
>>   #include <drm/drmP.h>
>>   #include "amdgpu.h"
>>   #include "amdgpu_trace.h"
>> +#include "amdgpu_amdkfd.h"
>>     struct amdgpu_sync_entry {
>>       struct hlist_node    node;
>> @@ -86,10 +87,18 @@ static bool amdgpu_sync_same_dev(struct
>> amdgpu_device *adev,
>>   static void *amdgpu_sync_get_owner(struct dma_fence *f)
>>   {
>>       struct drm_sched_fence *s_fence = to_drm_sched_fence(f);
>> +    struct amdgpu_amdkfd_fence *kfd_fence;
>> +
>> +    if (!f)
>> +        return AMDGPU_FENCE_OWNER_UNDEFINED;
>
> When you add the extra NULL check here then please move the
> to_drm_sched_fence() after it as well.

Yeah, makes sense.

Regards,
  Felix

>
> Christian.
>
>>         if (s_fence)
>>           return s_fence->owner;
>>   +    kfd_fence = to_amdgpu_amdkfd_fence(f);
>> +    if (kfd_fence)
>> +        return AMDGPU_FENCE_OWNER_KFD;
>> +
>>       return AMDGPU_FENCE_OWNER_UNDEFINED;
>>   }
>>   @@ -204,11 +213,18 @@ int amdgpu_sync_resv(struct amdgpu_device *adev,
>>       for (i = 0; i < flist->shared_count; ++i) {
>>           f = rcu_dereference_protected(flist->shared[i],
>>                             reservation_object_held(resv));
>> +        /* We only want to trigger KFD eviction fences on
>> +         * evict or move jobs. Skip KFD fences otherwise.
>> +         */
>> +        fence_owner = amdgpu_sync_get_owner(f);
>> +        if (fence_owner == AMDGPU_FENCE_OWNER_KFD &&
>> +            owner != AMDGPU_FENCE_OWNER_UNDEFINED)
>> +            continue;
>> +
>>           if (amdgpu_sync_same_dev(adev, f)) {
>>               /* VM updates are only interesting
>>                * for other VM updates and moves.
>>                */
>> -            fence_owner = amdgpu_sync_get_owner(f);
>>               if ((owner != AMDGPU_FENCE_OWNER_UNDEFINED) &&
>>                   (fence_owner != AMDGPU_FENCE_OWNER_UNDEFINED) &&
>>                   ((owner == AMDGPU_FENCE_OWNER_VM) !=
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>> index e4bb435..c3f33d3 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>> @@ -46,6 +46,7 @@
>>   #include "amdgpu.h"
>>   #include "amdgpu_object.h"
>>   #include "amdgpu_trace.h"
>> +#include "amdgpu_amdkfd.h"
>>   #include "bif/bif_4_1_d.h"
>>     #define DRM_FILE_PAGE_OFFSET (0x100000000ULL >> PAGE_SHIFT)
>> @@ -1170,6 +1171,23 @@ static bool
>> amdgpu_ttm_bo_eviction_valuable(struct ttm_buffer_object *bo,
>>   {
>>       unsigned long num_pages = bo->mem.num_pages;
>>       struct drm_mm_node *node = bo->mem.mm_node;
>> +    struct reservation_object_list *flist;
>> +    struct dma_fence *f;
>> +    int i;
>> +
>> +    /* If bo is a KFD BO, check if the bo belongs to the current
>> process.
>> +     * If true, then return false as any KFD process needs all its
>> BOs to
>> +     * be resident to run successfully
>> +     */
>> +    flist = reservation_object_get_list(bo->resv);
>> +    if (flist) {
>> +        for (i = 0; i < flist->shared_count; ++i) {
>> +            f = rcu_dereference_protected(flist->shared[i],
>> +                reservation_object_held(bo->resv));
>> +            if (amd_kfd_fence_check_mm(f, current->mm))
>> +                return false;
>> +        }
>> +    }
>>         switch (bo->mem.mem_type) {
>>       case TTM_PL_TT:
>> diff --git a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
>> b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
>> index 94eab54..9e35249 100644
>> --- a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
>> +++ b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
>> @@ -30,6 +30,7 @@
>>     #include <linux/types.h>
>>   #include <linux/bitmap.h>
>> +#include <linux/dma-fence.h>
>>     struct pci_dev;
>>   @@ -286,6 +287,9 @@ struct kfd2kgd_calls {
>>    *
>>    * @resume: Notifies amdkfd about a resume action done to a kgd device
>>    *
>> + * @schedule_evict_and_restore_process: Schedules work queue that
>> will prepare
>> + * for safe eviction of KFD BOs that belong to the specified process.
>> + *
>>    * This structure contains function callback pointers so the kgd
>> driver
>>    * will notify to the amdkfd about certain status changes.
>>    *
>> @@ -300,6 +304,8 @@ struct kgd2kfd_calls {
>>       void (*interrupt)(struct kfd_dev *kfd, const void *ih_ring_entry);
>>       void (*suspend)(struct kfd_dev *kfd);
>>       int (*resume)(struct kfd_dev *kfd);
>> +    int (*schedule_evict_and_restore_process)(struct mm_struct *mm,
>> +            struct dma_fence *fence);
>>   };
>>     int kgd2kfd_init(unsigned interface_version,
>


* Re: [PATCH 06/25] drm/amdgpu: Add KFD eviction fence
       [not found]             ` <05cc2831-a338-ddae-42c5-8be381787a5e-5C7GfCeVMHo@public.gmane.org>
@ 2018-01-28 23:55               ` Felix Kuehling
       [not found]                 ` <9697c103-f6cd-b7c9-a0a1-5f9ff080f789-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 44+ messages in thread
From: Felix Kuehling @ 2018-01-28 23:55 UTC (permalink / raw)
  To: christian.koenig-5C7GfCeVMHo,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Harish Kasiviswanathan
  Cc: oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w

[+Harish, forgot to acknowledge him in the commit description, will fix
that in v2]

Harish, please see Christian's question below in amd_kfd_fence_signal.
Did I understand this correctly?

Regards,
  Felix

On 2018-01-28 06:42 PM, Felix Kuehling wrote:
> On 2018-01-27 04:16 AM, Christian König wrote:
>> On 27.01.2018 at 02:09, Felix Kuehling wrote:
> [snip]
>>> +struct amdgpu_amdkfd_fence *amdgpu_amdkfd_fence_create(u64 context,
>>> +                               void *mm)
>>> +{
>>> +    struct amdgpu_amdkfd_fence *fence = NULL;
>>> +
>>> +    fence = kzalloc(sizeof(*fence), GFP_KERNEL);
>>> +    if (fence == NULL)
>>> +        return NULL;
>>> +
>>> +    /* mm_struct mm is used as a void pointer to identify the parent
>>> +     * KFD process. Don't dereference it. Fence and any threads using
>>> +     * mm are guaranteed to be released before process termination.
>>> +     */
>>> +    fence->mm = mm;
>> That won't work. Fences can live much longer than any process that
>> created them.
>>
>> I've already found a fence in a BO still living hours after the
>> process was killed and the pid long recycled.
>>
>> I suggest to make fence->mm a real mm_struct pointer with reference
>> counting and then set it to NULL and drop the reference in
>> enable_signaling.
> I agree. But enable_signaling may be too early to drop the reference.
> amd_kfd_fence_check_mm could still be called later from
> amdgpu_ttm_bo_eviction_valuable, as long as the fence hasn't signaled yet.
>
> The safe place is probably in amd_kfd_fence_release.
>
>>> +    get_task_comm(fence->timeline_name, current);
>>> +    spin_lock_init(&fence->lock);
>>> +
>>> +    dma_fence_init(&fence->base, &amd_kfd_fence_ops, &fence->lock,
>>> +           context, atomic_inc_return(&fence_seq));
>>> +
>>> +    return fence;
>>> +}
>>> +
>>> +struct amdgpu_amdkfd_fence *to_amdgpu_amdkfd_fence(struct dma_fence *f)
>>> +{
>>> +    struct amdgpu_amdkfd_fence *fence;
>>> +
>>> +    if (!f)
>>> +        return NULL;
>>> +
>>> +    fence = container_of(f, struct amdgpu_amdkfd_fence, base);
>>> +    if (fence && f->ops == &amd_kfd_fence_ops)
>>> +        return fence;
>>> +
>>> +    return NULL;
>>> +}
>>> +
>>> +static const char *amd_kfd_fence_get_driver_name(struct dma_fence *f)
>>> +{
>>> +    return "amdgpu_amdkfd_fence";
>>> +}
>>> +
>>> +static const char *amd_kfd_fence_get_timeline_name(struct dma_fence *f)
>>> +{
>>> +    struct amdgpu_amdkfd_fence *fence = to_amdgpu_amdkfd_fence(f);
>>> +
>>> +    return fence->timeline_name;
>>> +}
>>> +
>>> +/**
>>> + * amd_kfd_fence_enable_signaling - This gets called when TTM wants
>>> to evict
>>> + *  a KFD BO and schedules a job to move the BO.
>>> + *  If fence is already signaled, return true.
>>> + *  If fence is not signaled, schedule an evict KFD process work item.
>>> + */
>>> +static bool amd_kfd_fence_enable_signaling(struct dma_fence *f)
>>> +{
>>> +    struct amdgpu_amdkfd_fence *fence = to_amdgpu_amdkfd_fence(f);
>>> +
>>> +    if (!fence)
>>> +        return false;
>>> +
>>> +    if (dma_fence_is_signaled(f))
>>> +        return true;
>>> +
>>> +    if (!kgd2kfd->schedule_evict_and_restore_process(
>>> +                (struct mm_struct *)fence->mm, f))
>>> +        return true;
>>> +
>>> +    return false;
>>> +}
>>> +
>>> +static int amd_kfd_fence_signal(struct dma_fence *f)
>>> +{
>>> +    unsigned long flags;
>>> +    int ret;
>>> +
>>> +    spin_lock_irqsave(f->lock, flags);
>>> +    /* Set enabled bit so cb will be called */
>>> +    set_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &f->flags);
>> Mhm, why is that necessary?
> This only gets called from fence_release below. I think this is to avoid
> needlessly scheduling an eviction/restore cycle when an eviction fence
> gets destroyed that hasn't been triggered before, probably during
> process termination.
>
> Harish, do you remember any other reason for this?
>
>>> +    ret = dma_fence_signal_locked(f);
>>> +    spin_unlock_irqrestore(f->lock, flags);
>>> +
>>> +    return ret;
>>> +}
>>> +
>>> +/**
>>> + * amd_kfd_fence_release - callback invoked when the fence can be freed
>>> + *
>>> + * @fence: fence
>>> + *
>>> + * This function is called when the reference count becomes zero.
>>> + * It just RCU schedules freeing up the fence.
>>> + */
>>> +static void amd_kfd_fence_release(struct dma_fence *f)
>>> +{
>>> +    struct amdgpu_amdkfd_fence *fence = to_amdgpu_amdkfd_fence(f);
>>> +    /* Unconditionally signal the fence. The process is getting
>>> +     * terminated.
>>> +     */
>>> +    if (WARN_ON(!fence))
>>> +        return; /* Not an amdgpu_amdkfd_fence */
>>> +
>>> +    amd_kfd_fence_signal(f);
>>> +    kfree_rcu(f, rcu);
>>> +}
>>> +
>>> +/**
>>> + * amd_kfd_fence_check_mm - Check if @mm is the same as that of the
>>> fence @f;
>>> + *  if same return true, else return false.
>>> + *
>>> + * @f: [IN] fence
>>> + * @mm: [IN] mm that needs to be verified
>>> + */
>>> +bool amd_kfd_fence_check_mm(struct dma_fence *f, void *mm)
>>> +{
>>> +    struct amdgpu_amdkfd_fence *fence = to_amdgpu_amdkfd_fence(f);
>>> +
>>> +    if (!fence)
>>> +        return false;
>>> +    else if (fence->mm == mm)
>>> +        return true;
>>> +
>>> +    return false;
>>> +}
>>> +
>>> +const struct dma_fence_ops amd_kfd_fence_ops = {
>>> +    .get_driver_name = amd_kfd_fence_get_driver_name,
>>> +    .get_timeline_name = amd_kfd_fence_get_timeline_name,
>>> +    .enable_signaling = amd_kfd_fence_enable_signaling,
>>> +    .signaled = NULL,
>>> +    .wait = dma_fence_default_wait,
>>> +    .release = amd_kfd_fence_release,
>>> +};
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>>> index 65d5a4e..ca00dd2 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>>> @@ -36,8 +36,9 @@
>>>   #define AMDGPU_MAX_UVD_ENC_RINGS    2
>>>     /* some special values for the owner field */
>>> -#define AMDGPU_FENCE_OWNER_UNDEFINED    ((void*)0ul)
>>> -#define AMDGPU_FENCE_OWNER_VM        ((void*)1ul)
>>> +#define AMDGPU_FENCE_OWNER_UNDEFINED    ((void *)0ul)
>>> +#define AMDGPU_FENCE_OWNER_VM        ((void *)1ul)
>>> +#define AMDGPU_FENCE_OWNER_KFD        ((void *)2ul)
>>>     #define AMDGPU_FENCE_FLAG_64BIT         (1 << 0)
>>>   #define AMDGPU_FENCE_FLAG_INT           (1 << 1)
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
>>> index df65c66..0cb31d9 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
>>> @@ -31,6 +31,7 @@
>>>   #include <drm/drmP.h>
>>>   #include "amdgpu.h"
>>>   #include "amdgpu_trace.h"
>>> +#include "amdgpu_amdkfd.h"
>>>     struct amdgpu_sync_entry {
>>>       struct hlist_node    node;
>>> @@ -86,10 +87,18 @@ static bool amdgpu_sync_same_dev(struct
>>> amdgpu_device *adev,
>>>   static void *amdgpu_sync_get_owner(struct dma_fence *f)
>>>   {
>>>       struct drm_sched_fence *s_fence = to_drm_sched_fence(f);
>>> +    struct amdgpu_amdkfd_fence *kfd_fence;
>>> +
>>> +    if (!f)
>>> +        return AMDGPU_FENCE_OWNER_UNDEFINED;
>> When you add the extra NULL check here then please move the
>> to_drm_sched_fence() after it as well.
> Yeah, makes sense.
>
> Regards,
>   Felix
>
>> Christian.
>>
>>>         if (s_fence)
>>>           return s_fence->owner;
>>>   +    kfd_fence = to_amdgpu_amdkfd_fence(f);
>>> +    if (kfd_fence)
>>> +        return AMDGPU_FENCE_OWNER_KFD;
>>> +
>>>       return AMDGPU_FENCE_OWNER_UNDEFINED;
>>>   }
>>>   @@ -204,11 +213,18 @@ int amdgpu_sync_resv(struct amdgpu_device *adev,
>>>       for (i = 0; i < flist->shared_count; ++i) {
>>>           f = rcu_dereference_protected(flist->shared[i],
>>>                             reservation_object_held(resv));
>>> +        /* We only want to trigger KFD eviction fences on
>>> +         * evict or move jobs. Skip KFD fences otherwise.
>>> +         */
>>> +        fence_owner = amdgpu_sync_get_owner(f);
>>> +        if (fence_owner == AMDGPU_FENCE_OWNER_KFD &&
>>> +            owner != AMDGPU_FENCE_OWNER_UNDEFINED)
>>> +            continue;
>>> +
>>>           if (amdgpu_sync_same_dev(adev, f)) {
>>>               /* VM updates are only interesting
>>>                * for other VM updates and moves.
>>>                */
>>> -            fence_owner = amdgpu_sync_get_owner(f);
>>>               if ((owner != AMDGPU_FENCE_OWNER_UNDEFINED) &&
>>>                   (fence_owner != AMDGPU_FENCE_OWNER_UNDEFINED) &&
>>>                   ((owner == AMDGPU_FENCE_OWNER_VM) !=
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>> index e4bb435..c3f33d3 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>> @@ -46,6 +46,7 @@
>>>   #include "amdgpu.h"
>>>   #include "amdgpu_object.h"
>>>   #include "amdgpu_trace.h"
>>> +#include "amdgpu_amdkfd.h"
>>>   #include "bif/bif_4_1_d.h"
>>>     #define DRM_FILE_PAGE_OFFSET (0x100000000ULL >> PAGE_SHIFT)
>>> @@ -1170,6 +1171,23 @@ static bool
>>> amdgpu_ttm_bo_eviction_valuable(struct ttm_buffer_object *bo,
>>>   {
>>>       unsigned long num_pages = bo->mem.num_pages;
>>>       struct drm_mm_node *node = bo->mem.mm_node;
>>> +    struct reservation_object_list *flist;
>>> +    struct dma_fence *f;
>>> +    int i;
>>> +
>>> +    /* If bo is a KFD BO, check if the bo belongs to the current
>>> process.
>>> +     * If true, then return false as any KFD process needs all its
>>> BOs to
>>> +     * be resident to run successfully
>>> +     */
>>> +    flist = reservation_object_get_list(bo->resv);
>>> +    if (flist) {
>>> +        for (i = 0; i < flist->shared_count; ++i) {
>>> +            f = rcu_dereference_protected(flist->shared[i],
>>> +                reservation_object_held(bo->resv));
>>> +            if (amd_kfd_fence_check_mm(f, current->mm))
>>> +                return false;
>>> +        }
>>> +    }
>>>         switch (bo->mem.mem_type) {
>>>       case TTM_PL_TT:
>>> diff --git a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
>>> b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
>>> index 94eab54..9e35249 100644
>>> --- a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
>>> +++ b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
>>> @@ -30,6 +30,7 @@
>>>     #include <linux/types.h>
>>>   #include <linux/bitmap.h>
>>> +#include <linux/dma-fence.h>
>>>     struct pci_dev;
>>>   @@ -286,6 +287,9 @@ struct kfd2kgd_calls {
>>>    *
>>>    * @resume: Notifies amdkfd about a resume action done to a kgd device
>>>    *
>>> + * @schedule_evict_and_restore_process: Schedules work queue that
>>> will prepare
>>> + * for safe eviction of KFD BOs that belong to the specified process.
>>> + *
>>>    * This structure contains function callback pointers so the kgd
>>> driver
>>>    * will notify to the amdkfd about certain status changes.
>>>    *
>>> @@ -300,6 +304,8 @@ struct kgd2kfd_calls {
>>>       void (*interrupt)(struct kfd_dev *kfd, const void *ih_ring_entry);
>>>       void (*suspend)(struct kfd_dev *kfd);
>>>       int (*resume)(struct kfd_dev *kfd);
>>> +    int (*schedule_evict_and_restore_process)(struct mm_struct *mm,
>>> +            struct dma_fence *fence);
>>>   };
>>>     int kgd2kfd_init(unsigned interface_version,


* Re: [PATCH 07/25] drm/amdgpu: Update kgd2kfd_shared_resources for dGPU support
       [not found]             ` <7425b235-e354-e9b7-0b83-623d9148c61b-5C7GfCeVMHo@public.gmane.org>
@ 2018-01-29 11:42               ` Christian König
       [not found]                 ` <37bf2205-ca7c-f441-1759-48f2d854dea5-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 44+ messages in thread
From: Christian König @ 2018-01-29 11:42 UTC (permalink / raw)
  To: Felix Kuehling, christian.koenig-5C7GfCeVMHo,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w

On 29.01.2018 at 00:02, Felix Kuehling wrote:
> On 2018-01-27 04:19 AM, Christian König wrote:
>> On 27.01.2018 at 02:09, Felix Kuehling wrote:
>>> Add GPUVM size and DRM render node. Also add function to query the
>>> VMID mask to avoid hard-coding it in multiple places later.
>>>
>>> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
>>> ---
>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c      | 19
>>> +++++++++++++++++--
>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h      |  2 ++
>>>    drivers/gpu/drm/amd/include/kgd_kfd_interface.h |  6 ++++++
>>>    3 files changed, 25 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
>>> index c9f204d..294c467 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
>>> @@ -30,6 +30,8 @@
>>>    const struct kgd2kfd_calls *kgd2kfd;
>>>    bool (*kgd2kfd_init_p)(unsigned int, const struct kgd2kfd_calls**);
>>>    +static const unsigned int compute_vmid_bitmap = 0xFF00;
>>> +
>>>    int amdgpu_amdkfd_init(void)
>>>    {
>>>        int ret;
>>> @@ -137,9 +139,12 @@ void amdgpu_amdkfd_device_init(struct
>>> amdgpu_device *adev)
>>>        int last_valid_bit;
>>>        if (adev->kfd) {
>>>            struct kgd2kfd_shared_resources gpu_resources = {
>>> -            .compute_vmid_bitmap = 0xFF00,
>>> +            .compute_vmid_bitmap = compute_vmid_bitmap,
>>>                .num_pipe_per_mec = adev->gfx.mec.num_pipe_per_mec,
>>> -            .num_queue_per_pipe = adev->gfx.mec.num_queue_per_pipe
>>> +            .num_queue_per_pipe = adev->gfx.mec.num_queue_per_pipe,
>>> +            .gpuvm_size = adev->vm_manager.max_pfn
>>> +                        << AMDGPU_GPU_PAGE_SHIFT,
>> That most likely doesn't work as intended on Vega10. The address space
>> is divided into an upper and a lower range, but max_pfn includes both.
>>
>> I suggest to use something like min(adev->vm_manager.max_pfn <<
>> AMDGPU_GPU_PAGE_SHIFT, AMDGPU_VM_HOLE_START).
> I think this is fine as it is. This just tells the Thunk the size of the
> virtual address space supported by the GPU. Currently the Thunk only
> uses 40 bits for SVM. But eventually it will be able to use the entire
> 47 bits of user address space. Any excess address space will just go unused.
>
> I'm also wondering how universal the split 48-bit virtual address space
> layout is. Even for x86_64 there seems to be a 5-level page table layout
> that supports 56 bits of user mode addresses
> (Documentation/x86/x86_64/mm.txt). AArch64 seems to support 48-bit user
> mode addresses (Documentation/arm64/memory.txt). I haven't found similar
> information for PowerPC yet.
>
> We should avoid coding too much architecture-specific logic into this
> driver that's supposed to support other architectures as well. I should
> also review the aperture placement with bigger user mode address spaces
> in mind.

And that is exactly the reason why I've suggested to change that.

See, the split between the lower and upper range is not related to the
CPU configuration; instead it is a property of our GPU setup and
hardware generation.

So we should either split it into gpuvm_size_low/gpuvm_size_high or
just use the lower range and limit the value as suggested above.

This should avoid having any GPU-generation or CPU-configuration
specific logic in the common interface.

Regards,
Christian.
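
For the first variant, the new fields in kgd2kfd_shared_resources might
look roughly like this (untested fragment; the field names and the
convention for unsplit address spaces are hypothetical):

	/* GPUVM address space sizes in bytes. Hardware generations that
	 * split the address range report both halves; others set
	 * gpuvm_size_high to 0.
	 */
	uint64_t gpuvm_size_low;
	uint64_t gpuvm_size_high;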

>
> Regards,
>    Felix


* Re: [PATCH 06/25] drm/amdgpu: Add KFD eviction fence
       [not found]                 ` <9697c103-f6cd-b7c9-a0a1-5f9ff080f789-5C7GfCeVMHo@public.gmane.org>
@ 2018-01-29 13:43                   ` Christian König
       [not found]                     ` <fa409dd6-6a4e-ea4b-6570-9b16ed4cb4a4-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 44+ messages in thread
From: Christian König @ 2018-01-29 13:43 UTC (permalink / raw)
  To: Felix Kuehling, christian.koenig-5C7GfCeVMHo,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Harish Kasiviswanathan
  Cc: oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w

Hi Felix & Harish,

Maybe I should explain why I found that odd: dma_fence_add_callback() sets
the DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT flag before adding the callback.

So the flag should always be set when there are callbacks.

Did I miss anything?

Regards,
Christian.
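
For reference, the relevant part of dma_fence_add_callback() looks
roughly like this (condensed from drivers/dma-buf/dma-fence.c of this
era, with unrelated parts elided; treat it as a sketch, not a quote):

	int dma_fence_add_callback(struct dma_fence *fence,
				   struct dma_fence_cb *cb,
				   dma_fence_func_t func)
	{
		...
		spin_lock_irqsave(fence->lock, flags);

		/* Signaling is enabled *before* the callback is queued on
		 * fence->cb_list, so the bit is already set whenever
		 * callbacks exist.
		 */
		was_set = test_and_set_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
					   &fence->flags);
		...
		if (!ret) {
			cb->func = func;
			list_add_tail(&cb->node, &fence->cb_list);
		}
		spin_unlock_irqrestore(fence->lock, flags);

		return ret;
	}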

On 29.01.2018 at 00:55, Felix Kuehling wrote:
> [+Harish, forgot to acknowledge him in the commit description, will fix
> that in v2]
>
> Harish, please see Christian's question below in amd_kfd_fence_signal.
> Did I understand this correctly?
>
> Regards,
>    Felix
>
> On 2018-01-28 06:42 PM, Felix Kuehling wrote:
>> On 2018-01-27 04:16 AM, Christian König wrote:
>>> On 27.01.2018 at 02:09, Felix Kuehling wrote:
>> [snip]
>>>> +struct amdgpu_amdkfd_fence *amdgpu_amdkfd_fence_create(u64 context,
>>>> +                               void *mm)
>>>> +{
>>>> +    struct amdgpu_amdkfd_fence *fence = NULL;
>>>> +
>>>> +    fence = kzalloc(sizeof(*fence), GFP_KERNEL);
>>>> +    if (fence == NULL)
>>>> +        return NULL;
>>>> +
>>>> +    /* mm_struct mm is used as a void pointer to identify the parent
>>>> +     * KFD process. Don't dereference it. Fence and any threads using
>>>> +     * mm are guaranteed to be released before process termination.
>>>> +     */
>>>> +    fence->mm = mm;
>>> That won't work. Fences can live much longer than any process that
>>> created them.
>>>
>>> I've already found a fence in a BO still living hours after the
>>> process was killed and the pid long recycled.
>>>
>>> I suggest to make fence->mm a real mm_struct pointer with reference
>>> counting and then set it to NULL and drop the reference in
>>> enable_signaling.
>> I agree. But enable_signaling may be too early to drop the reference.
>> amd_kfd_fence_check_mm could still be called later from
>> amdgpu_ttm_bo_eviction_valuable, as long as the fence hasn't signaled yet.
>>
>> The safe place is probably in amd_kfd_fence_release.
>>
>>>> +    get_task_comm(fence->timeline_name, current);
>>>> +    spin_lock_init(&fence->lock);
>>>> +
>>>> +    dma_fence_init(&fence->base, &amd_kfd_fence_ops, &fence->lock,
>>>> +           context, atomic_inc_return(&fence_seq));
>>>> +
>>>> +    return fence;
>>>> +}
>>>> +
>>>> +struct amdgpu_amdkfd_fence *to_amdgpu_amdkfd_fence(struct dma_fence *f)
>>>> +{
>>>> +    struct amdgpu_amdkfd_fence *fence;
>>>> +
>>>> +    if (!f)
>>>> +        return NULL;
>>>> +
>>>> +    fence = container_of(f, struct amdgpu_amdkfd_fence, base);
>>>> +    if (fence && f->ops == &amd_kfd_fence_ops)
>>>> +        return fence;
>>>> +
>>>> +    return NULL;
>>>> +}
>>>> +
>>>> +static const char *amd_kfd_fence_get_driver_name(struct dma_fence *f)
>>>> +{
>>>> +    return "amdgpu_amdkfd_fence";
>>>> +}
>>>> +
>>>> +static const char *amd_kfd_fence_get_timeline_name(struct dma_fence *f)
>>>> +{
>>>> +    struct amdgpu_amdkfd_fence *fence = to_amdgpu_amdkfd_fence(f);
>>>> +
>>>> +    return fence->timeline_name;
>>>> +}
>>>> +
>>>> +/**
>>>> + * amd_kfd_fence_enable_signaling - This gets called when TTM wants
>>>> to evict
>>>> + *  a KFD BO and schedules a job to move the BO.
>>>> + *  If fence is already signaled, return true.
>>>> + *  If fence is not signaled, schedule an evict KFD process work item.
>>>> + */
>>>> +static bool amd_kfd_fence_enable_signaling(struct dma_fence *f)
>>>> +{
>>>> +    struct amdgpu_amdkfd_fence *fence = to_amdgpu_amdkfd_fence(f);
>>>> +
>>>> +    if (!fence)
>>>> +        return false;
>>>> +
>>>> +    if (dma_fence_is_signaled(f))
>>>> +        return true;
>>>> +
>>>> +    if (!kgd2kfd->schedule_evict_and_restore_process(
>>>> +                (struct mm_struct *)fence->mm, f))
>>>> +        return true;
>>>> +
>>>> +    return false;
>>>> +}
>>>> +
>>>> +static int amd_kfd_fence_signal(struct dma_fence *f)
>>>> +{
>>>> +    unsigned long flags;
>>>> +    int ret;
>>>> +
>>>> +    spin_lock_irqsave(f->lock, flags);
>>>> +    /* Set enabled bit so cb will be called */
>>>> +    set_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &f->flags);
>>> Mhm, why is that necessary?
>> This only gets called from fence_release below. I think this is to avoid
>> needlessly scheduling an eviction/restore cycle when an eviction fence
>> gets destroyed that hasn't been triggered before, probably during
>> process termination.
>>
>> Harish, do you remember any other reason for this?
>>
>>>> +    ret = dma_fence_signal_locked(f);
>>>> +    spin_unlock_irqrestore(f->lock, flags);
>>>> +
>>>> +    return ret;
>>>> +}
>>>> +
>>>> +/**
>>>> + * amd_kfd_fence_release - callback invoked when the fence can be freed
>>>> + *
>>>> + * @fence: fence
>>>> + *
>>>> + * This function is called when the reference count becomes zero.
>>>> + * It just RCU schedules freeing up the fence.
>>>> + */
>>>> +static void amd_kfd_fence_release(struct dma_fence *f)
>>>> +{
>>>> +    struct amdgpu_amdkfd_fence *fence = to_amdgpu_amdkfd_fence(f);
>>>> +    /* Unconditionally signal the fence. The process is getting
>>>> +     * terminated.
>>>> +     */
>>>> +    if (WARN_ON(!fence))
>>>> +        return; /* Not an amdgpu_amdkfd_fence */
>>>> +
>>>> +    amd_kfd_fence_signal(f);
>>>> +    kfree_rcu(f, rcu);
>>>> +}
>>>> +
>>>> +/**
>>>> + * amd_kfd_fence_check_mm - Check if @mm is the same as that of the
>>>> fence @f;
>>>> + *  if same return true, else return false.
>>>> + *
>>>> + * @f: [IN] fence
>>>> + * @mm: [IN] mm that needs to be verified
>>>> + */
>>>> +bool amd_kfd_fence_check_mm(struct dma_fence *f, void *mm)
>>>> +{
>>>> +    struct amdgpu_amdkfd_fence *fence = to_amdgpu_amdkfd_fence(f);
>>>> +
>>>> +    if (!fence)
>>>> +        return false;
>>>> +    else if (fence->mm == mm)
>>>> +        return true;
>>>> +
>>>> +    return false;
>>>> +}
>>>> +
>>>> +const struct dma_fence_ops amd_kfd_fence_ops = {
>>>> +    .get_driver_name = amd_kfd_fence_get_driver_name,
>>>> +    .get_timeline_name = amd_kfd_fence_get_timeline_name,
>>>> +    .enable_signaling = amd_kfd_fence_enable_signaling,
>>>> +    .signaled = NULL,
>>>> +    .wait = dma_fence_default_wait,
>>>> +    .release = amd_kfd_fence_release,
>>>> +};
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>>>> index 65d5a4e..ca00dd2 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>>>> @@ -36,8 +36,9 @@
>>>>    #define AMDGPU_MAX_UVD_ENC_RINGS    2
>>>>      /* some special values for the owner field */
>>>> -#define AMDGPU_FENCE_OWNER_UNDEFINED    ((void*)0ul)
>>>> -#define AMDGPU_FENCE_OWNER_VM        ((void*)1ul)
>>>> +#define AMDGPU_FENCE_OWNER_UNDEFINED    ((void *)0ul)
>>>> +#define AMDGPU_FENCE_OWNER_VM        ((void *)1ul)
>>>> +#define AMDGPU_FENCE_OWNER_KFD        ((void *)2ul)
>>>>      #define AMDGPU_FENCE_FLAG_64BIT         (1 << 0)
>>>>    #define AMDGPU_FENCE_FLAG_INT           (1 << 1)
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
>>>> index df65c66..0cb31d9 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
>>>> @@ -31,6 +31,7 @@
>>>>    #include <drm/drmP.h>
>>>>    #include "amdgpu.h"
>>>>    #include "amdgpu_trace.h"
>>>> +#include "amdgpu_amdkfd.h"
>>>>      struct amdgpu_sync_entry {
>>>>        struct hlist_node    node;
>>>> @@ -86,10 +87,18 @@ static bool amdgpu_sync_same_dev(struct
>>>> amdgpu_device *adev,
>>>>    static void *amdgpu_sync_get_owner(struct dma_fence *f)
>>>>    {
>>>>        struct drm_sched_fence *s_fence = to_drm_sched_fence(f);
>>>> +    struct amdgpu_amdkfd_fence *kfd_fence;
>>>> +
>>>> +    if (!f)
>>>> +        return AMDGPU_FENCE_OWNER_UNDEFINED;
>>> When you add the extra NULL check here then please move the
>>> to_drm_sched_fence() after it as well.
>> Yeah, makes sense.
>>
>> Regards,
>>    Felix
>>
>>> Christian.
>>>
>>>>          if (s_fence)
>>>>            return s_fence->owner;
>>>>    +    kfd_fence = to_amdgpu_amdkfd_fence(f);
>>>> +    if (kfd_fence)
>>>> +        return AMDGPU_FENCE_OWNER_KFD;
>>>> +
>>>>        return AMDGPU_FENCE_OWNER_UNDEFINED;
>>>>    }
>>>>    @@ -204,11 +213,18 @@ int amdgpu_sync_resv(struct amdgpu_device *adev,
>>>>        for (i = 0; i < flist->shared_count; ++i) {
>>>>            f = rcu_dereference_protected(flist->shared[i],
>>>>                              reservation_object_held(resv));
>>>> +        /* We only want to trigger KFD eviction fences on
>>>> +         * evict or move jobs. Skip KFD fences otherwise.
>>>> +         */
>>>> +        fence_owner = amdgpu_sync_get_owner(f);
>>>> +        if (fence_owner == AMDGPU_FENCE_OWNER_KFD &&
>>>> +            owner != AMDGPU_FENCE_OWNER_UNDEFINED)
>>>> +            continue;
>>>> +
>>>>            if (amdgpu_sync_same_dev(adev, f)) {
>>>>                /* VM updates are only interesting
>>>>                 * for other VM updates and moves.
>>>>                 */
>>>> -            fence_owner = amdgpu_sync_get_owner(f);
>>>>                if ((owner != AMDGPU_FENCE_OWNER_UNDEFINED) &&
>>>>                    (fence_owner != AMDGPU_FENCE_OWNER_UNDEFINED) &&
>>>>                    ((owner == AMDGPU_FENCE_OWNER_VM) !=
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>>> index e4bb435..c3f33d3 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>>> @@ -46,6 +46,7 @@
>>>>    #include "amdgpu.h"
>>>>    #include "amdgpu_object.h"
>>>>    #include "amdgpu_trace.h"
>>>> +#include "amdgpu_amdkfd.h"
>>>>    #include "bif/bif_4_1_d.h"
>>>>      #define DRM_FILE_PAGE_OFFSET (0x100000000ULL >> PAGE_SHIFT)
>>>> @@ -1170,6 +1171,23 @@ static bool
>>>> amdgpu_ttm_bo_eviction_valuable(struct ttm_buffer_object *bo,
>>>>    {
>>>>        unsigned long num_pages = bo->mem.num_pages;
>>>>        struct drm_mm_node *node = bo->mem.mm_node;
>>>> +    struct reservation_object_list *flist;
>>>> +    struct dma_fence *f;
>>>> +    int i;
>>>> +
>>>> +    /* If bo is a KFD BO, check if the bo belongs to the current
>>>> process.
>>>> +     * If true, then return false as any KFD process needs all its
>>>> BOs to
>>>> +     * be resident to run successfully
>>>> +     */
>>>> +    flist = reservation_object_get_list(bo->resv);
>>>> +    if (flist) {
>>>> +        for (i = 0; i < flist->shared_count; ++i) {
>>>> +            f = rcu_dereference_protected(flist->shared[i],
>>>> +                reservation_object_held(bo->resv));
>>>> +            if (amd_kfd_fence_check_mm(f, current->mm))
>>>> +                return false;
>>>> +        }
>>>> +    }
>>>>          switch (bo->mem.mem_type) {
>>>>        case TTM_PL_TT:
>>>> diff --git a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
>>>> b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
>>>> index 94eab54..9e35249 100644
>>>> --- a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
>>>> +++ b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
>>>> @@ -30,6 +30,7 @@
>>>>      #include <linux/types.h>
>>>>    #include <linux/bitmap.h>
>>>> +#include <linux/dma-fence.h>
>>>>      struct pci_dev;
>>>>    @@ -286,6 +287,9 @@ struct kfd2kgd_calls {
>>>>     *
>>>>     * @resume: Notifies amdkfd about a resume action done to a kgd device
>>>>     *
>>>> + * @schedule_evict_and_restore_process: Schedules work queue that
>>>> will prepare
>>>> + * for safe eviction of KFD BOs that belong to the specified process.
>>>> + *
>>>>     * This structure contains function callback pointers so the kgd
>>>> driver
>>>>     * will notify to the amdkfd about certain status changes.
>>>>     *
>>>> @@ -300,6 +304,8 @@ struct kgd2kfd_calls {
>>>>        void (*interrupt)(struct kfd_dev *kfd, const void *ih_ring_entry);
>>>>        void (*suspend)(struct kfd_dev *kfd);
>>>>        int (*resume)(struct kfd_dev *kfd);
>>>> +    int (*schedule_evict_and_restore_process)(struct mm_struct *mm,
>>>> +            struct dma_fence *fence);
>>>>    };
>>>>      int kgd2kfd_init(unsigned interface_version,
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


* Re: [PATCH 06/25] drm/amdgpu: Add KFD eviction fence
       [not found]                     ` <fa409dd6-6a4e-ea4b-6570-9b16ed4cb4a4-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2018-01-29 19:39                       ` Felix Kuehling
       [not found]                         ` <d864b662-2212-4aa6-2dac-f0ee3157681e-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 44+ messages in thread
From: Felix Kuehling @ 2018-01-29 19:39 UTC (permalink / raw)
  To: christian.koenig-5C7GfCeVMHo,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Harish Kasiviswanathan
  Cc: oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w

On 2018-01-29 08:43 AM, Christian König wrote:
> Hi Felix & Harish,
>
> maybe explain why I found that odd: dma_fence_add_callback() sets the
> DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT flag before adding the callback.
>
> So the flag should always be set when there are callbacks.
> Did I miss anything?

I don't think we add any callbacks to our eviction fences.
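
For reference, dma_fence_add_callback() does roughly the following (a
condensed sketch of drivers/dma-buf/dma-fence.c from around this time,
error paths trimmed; not the verbatim upstream code). The
enable-signaling bit is set before the callback is queued, so the bit
can be on with an empty cb_list, but never the other way around:

int dma_fence_add_callback(struct dma_fence *fence, struct dma_fence_cb *cb,
                           dma_fence_func_t func)
{
    unsigned long flags;
    bool was_set;
    int ret = 0;

    spin_lock_irqsave(fence->lock, flags);

    /* the bit is set *before* the callback is queued */
    was_set = test_and_set_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
                               &fence->flags);

    if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
        ret = -ENOENT;
    else if (!was_set && !fence->ops->enable_signaling(fence)) {
        dma_fence_signal_locked(fence);
        ret = -ENOENT;
    }

    if (!ret) {
        cb->func = func;
        list_add_tail(&cb->node, &fence->cb_list);
    }

    spin_unlock_irqrestore(fence->lock, flags);
    return ret;
}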

Regards,
  Felix

>
> Regards,
> Christian.
>
> On 2018-01-29 00:55, Felix Kuehling wrote:
>> [+Harish, forgot to acknowledge him in the commit description, will fix
>> that in v2]
>>
>> Harish, please see Christian's question below in amd_kfd_fence_signal.
>> Did I understand this correctly?
>>
>> Regards,
>>    Felix
>>
>> On 2018-01-28 06:42 PM, Felix Kuehling wrote:
>>> On 2018-01-27 04:16 AM, Christian König wrote:
>>>> On 2018-01-27 02:09, Felix Kuehling wrote:
>>> [snip]
>>>>> +struct amdgpu_amdkfd_fence *amdgpu_amdkfd_fence_create(u64 context,
>>>>> +                               void *mm)
>>>>> +{
>>>>> +    struct amdgpu_amdkfd_fence *fence = NULL;
>>>>> +
>>>>> +    fence = kzalloc(sizeof(*fence), GFP_KERNEL);
>>>>> +    if (fence == NULL)
>>>>> +        return NULL;
>>>>> +
>>>>> +    /* mm_struct mm is used as void pointer to identify the parent
>>>>> +     * KFD process. Don't dereference it. Fence and any threads
>>>>> using
>>>>> +     * mm are guaranteed to be released before process termination.
>>>>> +     */
>>>>> +    fence->mm = mm;
>>>> That won't work. Fences can live much longer than any process that
>>>> created them.
>>>>
>>>> I've already found a fence in a BO still living hours after the
>>>> process was killed and the pid long recycled.
>>>>
>>>> I suggest to make fence->mm a real mm_struct pointer with reference
>>>> counting and then set it to NULL and drop the reference in
>>>> enable_signaling.
>>> I agree. But enable_signaling may be too early to drop the reference.
>>> amd_kfd_fence_check_mm could still be called later from
>>> amdgpu_ttm_bo_eviction_valuable, as long as the fence hasn't
>>> signaled yet.
>>>
>>> The safe place is probably in amd_kfd_fence_release.
>>>
>>>>> +    get_task_comm(fence->timeline_name, current);
>>>>> +    spin_lock_init(&fence->lock);
>>>>> +
>>>>> +    dma_fence_init(&fence->base, &amd_kfd_fence_ops, &fence->lock,
>>>>> +           context, atomic_inc_return(&fence_seq));
>>>>> +
>>>>> +    return fence;
>>>>> +}
>>>>> +
>>>>> +struct amdgpu_amdkfd_fence *to_amdgpu_amdkfd_fence(struct
>>>>> dma_fence *f)
>>>>> +{
>>>>> +    struct amdgpu_amdkfd_fence *fence;
>>>>> +
>>>>> +    if (!f)
>>>>> +        return NULL;
>>>>> +
>>>>> +    fence = container_of(f, struct amdgpu_amdkfd_fence, base);
>>>>> +    if (fence && f->ops == &amd_kfd_fence_ops)
>>>>> +        return fence;
>>>>> +
>>>>> +    return NULL;
>>>>> +}
>>>>> +
>>>>> +static const char *amd_kfd_fence_get_driver_name(struct dma_fence
>>>>> *f)
>>>>> +{
>>>>> +    return "amdgpu_amdkfd_fence";
>>>>> +}
>>>>> +
>>>>> +static const char *amd_kfd_fence_get_timeline_name(struct
>>>>> dma_fence *f)
>>>>> +{
>>>>> +    struct amdgpu_amdkfd_fence *fence = to_amdgpu_amdkfd_fence(f);
>>>>> +
>>>>> +    return fence->timeline_name;
>>>>> +}
>>>>> +
>>>>> +/**
>>>>> + * amd_kfd_fence_enable_signaling - This gets called when TTM wants
>>>>> to evict
>>>>> + *  a KFD BO and schedules a job to move the BO.
>>>>> + *  If fence is already signaled return true.
>>>>> + *  If fence is not signaled schedule an evict KFD process work item.
>>>>> + */
>>>>> +static bool amd_kfd_fence_enable_signaling(struct dma_fence *f)
>>>>> +{
>>>>> +    struct amdgpu_amdkfd_fence *fence = to_amdgpu_amdkfd_fence(f);
>>>>> +
>>>>> +    if (!fence)
>>>>> +        return false;
>>>>> +
>>>>> +    if (dma_fence_is_signaled(f))
>>>>> +        return true;
>>>>> +
>>>>> +    if (!kgd2kfd->schedule_evict_and_restore_process(
>>>>> +                (struct mm_struct *)fence->mm, f))
>>>>> +        return true;
>>>>> +
>>>>> +    return false;
>>>>> +}
>>>>> +
>>>>> +static int amd_kfd_fence_signal(struct dma_fence *f)
>>>>> +{
>>>>> +    unsigned long flags;
>>>>> +    int ret;
>>>>> +
>>>>> +    spin_lock_irqsave(f->lock, flags);
>>>>> +    /* Set enabled bit so callbacks will be called */
>>>>> +    set_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &f->flags);
>>>> Mhm, why is that necessary?
>>> This only gets called from fence_release below. I think this is to
>>> avoid
>>> needlessly scheduling an eviction/restore cycle when an eviction fence
>>> gets destroyed that hasn't been triggered before, probably during
>>> process termination.
>>>
>>> Harish, do you remember any other reason for this?
>>>
>>>>> +    ret = dma_fence_signal_locked(f);
>>>>> +    spin_unlock_irqrestore(f->lock, flags);
>>>>> +
>>>>> +    return ret;
>>>>> +}
>>>>> +
>>>>> +/**
>>>>> + * amd_kfd_fence_release - callback invoked when the fence can be freed
>>>>> + *
>>>>> + * @fence: fence
>>>>> + *
>>>>> + * This function is called when the reference count becomes zero.
>>>>> + * It just RCU schedules freeing up the fence.
>>>>> + */
>>>>> +static void amd_kfd_fence_release(struct dma_fence *f)
>>>>> +{
>>>>> +    struct amdgpu_amdkfd_fence *fence = to_amdgpu_amdkfd_fence(f);
>>>>> +    /* Unconditionally signal the fence. The process is getting
>>>>> +     * terminated.
>>>>> +     */
>>>>> +    if (WARN_ON(!fence))
>>>>> +        return; /* Not an amdgpu_amdkfd_fence */
>>>>> +
>>>>> +    amd_kfd_fence_signal(f);
>>>>> +    kfree_rcu(f, rcu);
>>>>> +}
>>>>> +
>>>>> +/**
>>>>> + * amd_kfd_fence_check_mm - Check if @mm is same as that of the
>>>>> fence @f
>>>>> + *  Returns true if they match, false otherwise.
>>>>> + *
>>>>> + * @f: [IN] fence
>>>>> + * @mm: [IN] mm that needs to be verified
>>>>> + */
>>>>> +bool amd_kfd_fence_check_mm(struct dma_fence *f, void *mm)
>>>>> +{
>>>>> +    struct amdgpu_amdkfd_fence *fence = to_amdgpu_amdkfd_fence(f);
>>>>> +
>>>>> +    if (!fence)
>>>>> +        return false;
>>>>> +    else if (fence->mm == mm)
>>>>> +        return true;
>>>>> +
>>>>> +    return false;
>>>>> +}
>>>>> +
>>>>> +const struct dma_fence_ops amd_kfd_fence_ops = {
>>>>> +    .get_driver_name = amd_kfd_fence_get_driver_name,
>>>>> +    .get_timeline_name = amd_kfd_fence_get_timeline_name,
>>>>> +    .enable_signaling = amd_kfd_fence_enable_signaling,
>>>>> +    .signaled = NULL,
>>>>> +    .wait = dma_fence_default_wait,
>>>>> +    .release = amd_kfd_fence_release,
>>>>> +};
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>>>>> index 65d5a4e..ca00dd2 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>>>>> @@ -36,8 +36,9 @@
>>>>>    #define AMDGPU_MAX_UVD_ENC_RINGS    2
>>>>>      /* some special values for the owner field */
>>>>> -#define AMDGPU_FENCE_OWNER_UNDEFINED    ((void*)0ul)
>>>>> -#define AMDGPU_FENCE_OWNER_VM        ((void*)1ul)
>>>>> +#define AMDGPU_FENCE_OWNER_UNDEFINED    ((void *)0ul)
>>>>> +#define AMDGPU_FENCE_OWNER_VM        ((void *)1ul)
>>>>> +#define AMDGPU_FENCE_OWNER_KFD        ((void *)2ul)
>>>>>      #define AMDGPU_FENCE_FLAG_64BIT         (1 << 0)
>>>>>    #define AMDGPU_FENCE_FLAG_INT           (1 << 1)
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
>>>>> index df65c66..0cb31d9 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
>>>>> @@ -31,6 +31,7 @@
>>>>>    #include <drm/drmP.h>
>>>>>    #include "amdgpu.h"
>>>>>    #include "amdgpu_trace.h"
>>>>> +#include "amdgpu_amdkfd.h"
>>>>>      struct amdgpu_sync_entry {
>>>>>        struct hlist_node    node;
>>>>> @@ -86,10 +87,18 @@ static bool amdgpu_sync_same_dev(struct
>>>>> amdgpu_device *adev,
>>>>>    static void *amdgpu_sync_get_owner(struct dma_fence *f)
>>>>>    {
>>>>>        struct drm_sched_fence *s_fence = to_drm_sched_fence(f);
>>>>> +    struct amdgpu_amdkfd_fence *kfd_fence;
>>>>> +
>>>>> +    if (!f)
>>>>> +        return AMDGPU_FENCE_OWNER_UNDEFINED;
>>>> When you add the extra NULL check here then please move the
>>>> to_drm_sched_fence() after it as well.
>>> Yeah, makes sense.
>>>
>>> Regards,
>>>    Felix
>>>
>>>> Christian.
>>>>
>>>>>          if (s_fence)
>>>>>            return s_fence->owner;
>>>>>    +    kfd_fence = to_amdgpu_amdkfd_fence(f);
>>>>> +    if (kfd_fence)
>>>>> +        return AMDGPU_FENCE_OWNER_KFD;
>>>>> +
>>>>>        return AMDGPU_FENCE_OWNER_UNDEFINED;
>>>>>    }
>>>>>    @@ -204,11 +213,18 @@ int amdgpu_sync_resv(struct amdgpu_device
>>>>> *adev,
>>>>>        for (i = 0; i < flist->shared_count; ++i) {
>>>>>            f = rcu_dereference_protected(flist->shared[i],
>>>>>                              reservation_object_held(resv));
>>>>> +        /* We only want to trigger KFD eviction fences on
>>>>> +         * evict or move jobs. Skip KFD fences otherwise.
>>>>> +         */
>>>>> +        fence_owner = amdgpu_sync_get_owner(f);
>>>>> +        if (fence_owner == AMDGPU_FENCE_OWNER_KFD &&
>>>>> +            owner != AMDGPU_FENCE_OWNER_UNDEFINED)
>>>>> +            continue;
>>>>> +
>>>>>            if (amdgpu_sync_same_dev(adev, f)) {
>>>>>                /* VM updates are only interesting
>>>>>                 * for other VM updates and moves.
>>>>>                 */
>>>>> -            fence_owner = amdgpu_sync_get_owner(f);
>>>>>                if ((owner != AMDGPU_FENCE_OWNER_UNDEFINED) &&
>>>>>                    (fence_owner != AMDGPU_FENCE_OWNER_UNDEFINED) &&
>>>>>                    ((owner == AMDGPU_FENCE_OWNER_VM) !=
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>>>> index e4bb435..c3f33d3 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>>>> @@ -46,6 +46,7 @@
>>>>>    #include "amdgpu.h"
>>>>>    #include "amdgpu_object.h"
>>>>>    #include "amdgpu_trace.h"
>>>>> +#include "amdgpu_amdkfd.h"
>>>>>    #include "bif/bif_4_1_d.h"
>>>>>      #define DRM_FILE_PAGE_OFFSET (0x100000000ULL >> PAGE_SHIFT)
>>>>> @@ -1170,6 +1171,23 @@ static bool
>>>>> amdgpu_ttm_bo_eviction_valuable(struct ttm_buffer_object *bo,
>>>>>    {
>>>>>        unsigned long num_pages = bo->mem.num_pages;
>>>>>        struct drm_mm_node *node = bo->mem.mm_node;
>>>>> +    struct reservation_object_list *flist;
>>>>> +    struct dma_fence *f;
>>>>> +    int i;
>>>>> +
>>>>> +    /* If bo is a KFD BO, check if the bo belongs to the current
>>>>> process.
>>>>> +     * If true, then return false as any KFD process needs all its
>>>>> BOs to
>>>>> +     * be resident to run successfully
>>>>> +     */
>>>>> +    flist = reservation_object_get_list(bo->resv);
>>>>> +    if (flist) {
>>>>> +        for (i = 0; i < flist->shared_count; ++i) {
>>>>> +            f = rcu_dereference_protected(flist->shared[i],
>>>>> +                reservation_object_held(bo->resv));
>>>>> +            if (amd_kfd_fence_check_mm(f, current->mm))
>>>>> +                return false;
>>>>> +        }
>>>>> +    }
>>>>>          switch (bo->mem.mem_type) {
>>>>>        case TTM_PL_TT:
>>>>> diff --git a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
>>>>> b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
>>>>> index 94eab54..9e35249 100644
>>>>> --- a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
>>>>> +++ b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
>>>>> @@ -30,6 +30,7 @@
>>>>>      #include <linux/types.h>
>>>>>    #include <linux/bitmap.h>
>>>>> +#include <linux/dma-fence.h>
>>>>>      struct pci_dev;
>>>>>    @@ -286,6 +287,9 @@ struct kfd2kgd_calls {
>>>>>     *
>>>>>     * @resume: Notifies amdkfd about a resume action done to a kgd
>>>>> device
>>>>>     *
>>>>> + * @schedule_evict_and_restore_process: Schedules work queue that
>>>>> will prepare
>>>>> + * for safe eviction of KFD BOs that belong to the specified
>>>>> process.
>>>>> + *
>>>>>     * This structure contains function callback pointers so the kgd
>>>>> driver
>>>>>     * will notify to the amdkfd about certain status changes.
>>>>>     *
>>>>> @@ -300,6 +304,8 @@ struct kgd2kfd_calls {
>>>>>        void (*interrupt)(struct kfd_dev *kfd, const void
>>>>> *ih_ring_entry);
>>>>>        void (*suspend)(struct kfd_dev *kfd);
>>>>>        int (*resume)(struct kfd_dev *kfd);
>>>>> +    int (*schedule_evict_and_restore_process)(struct mm_struct *mm,
>>>>> +            struct dma_fence *fence);
>>>>>    };
>>>>>      int kgd2kfd_init(unsigned interface_version,
>

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


* Re: [PATCH 07/25] drm/amdgpu: Update kgd2kfd_shared_resources for dGPU support
       [not found]                 ` <37bf2205-ca7c-f441-1759-48f2d854dea5-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2018-01-29 20:25                   ` Felix Kuehling
       [not found]                     ` <4aefa3bc-66b3-5e39-26e6-cc7c1e66adbd-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 44+ messages in thread
From: Felix Kuehling @ 2018-01-29 20:25 UTC (permalink / raw)
  To: christian.koenig-5C7GfCeVMHo,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w

On 2018-01-29 06:42 AM, Christian König wrote:
> On 2018-01-29 00:02, Felix Kuehling wrote:
>> On 2018-01-27 04:19 AM, Christian König wrote:
>>> On 2018-01-27 02:09, Felix Kuehling wrote:
>>>> Add GPUVM size and DRM render node. Also add function to query the
>>>> VMID mask to avoid hard-coding it in multiple places later.
>>>>
>>>> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
>>>> ---
>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c      | 19
>>>> +++++++++++++++++--
>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h      |  2 ++
>>>>    drivers/gpu/drm/amd/include/kgd_kfd_interface.h |  6 ++++++
>>>>    3 files changed, 25 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
>>>> index c9f204d..294c467 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
>>>> @@ -30,6 +30,8 @@
>>>>    const struct kgd2kfd_calls *kgd2kfd;
>>>>    bool (*kgd2kfd_init_p)(unsigned int, const struct kgd2kfd_calls**);
>>>>    +static const unsigned int compute_vmid_bitmap = 0xFF00;
>>>> +
>>>>    int amdgpu_amdkfd_init(void)
>>>>    {
>>>>        int ret;
>>>> @@ -137,9 +139,12 @@ void amdgpu_amdkfd_device_init(struct
>>>> amdgpu_device *adev)
>>>>        int last_valid_bit;
>>>>        if (adev->kfd) {
>>>>            struct kgd2kfd_shared_resources gpu_resources = {
>>>> -            .compute_vmid_bitmap = 0xFF00,
>>>> +            .compute_vmid_bitmap = compute_vmid_bitmap,
>>>>                .num_pipe_per_mec = adev->gfx.mec.num_pipe_per_mec,
>>>> -            .num_queue_per_pipe = adev->gfx.mec.num_queue_per_pipe
>>>> +            .num_queue_per_pipe = adev->gfx.mec.num_queue_per_pipe,
>>>> +            .gpuvm_size = adev->vm_manager.max_pfn
>>>> +                        << AMDGPU_GPU_PAGE_SHIFT,
>>> That most likely doesn't work as intended on Vega10. The address space
>>> is divided into an upper and a lower range, but max_pfn includes both.
>>>
>>> I suggest to use something like min(adev->vm_manager.max_pfn <<
>>> AMDGPU_GPU_PAGE_SHIFT, AMDGPU_VM_HOLE_START).
>> I think this is fine as it is. This just tells the Thunk the size of the
>> virtual address space supported by the GPU. Currently the Thunk only
>> uses 40-bits for SVM. But eventually it will be able to use the entire
>> 47 bits of user address space. Any excess address space will just go
>> unused.
>>
>> I'm also wondering how universal the split 48-bit virtual address space
>> layout is. Even for x86_64 there seems to be a 5-level page table layout
>> that supports 56 bits of user mode addresses
>> (Documentation/x86/x86_64/mm.txt). AArch64 seems to support 48-bit user
>> mode addresses (Documentation/arm64/memory.txt). I haven't found similar
>> information for PowerPC yet.
>>
>> We should avoid coding too much architecture-specific logic into this
>> driver that's supposed to support other architectures as well. I should
>> also review the aperture placement with bigger user mode address spaces
>> in mind.
>
> And that is exactly the reason why I've suggested to change that.
>
> See the split between lower and upper range is not related to the CPU
> configuration, instead it is a property of our GPU setup and hardware
> generation.

How exactly does the GPUVM hardware behave when it sees a virtual
address "in the hole"? Does it generate a VM fault? Or does it simply
ignore the high bits?

If it ignored the high bits, it would work OK for both x86_64 (with a
split address space) and ARM64 (with a full 48-bit user mode virtual
address space).

On the other hand, if addresses in the hole generate a VM fault, then I
agree with you, and we should only report 47-bits of virtual address
space to user mode for SVM purposes.
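
For reference, the clamp Christian is suggesting would look roughly
like this (a sketch only; the macro name AMDGPU_VM_HOLE_START is taken
from his mail above, and min_t() is used here on the assumption that
the operand types need coercing):

        struct kgd2kfd_shared_resources gpu_resources = {
            .compute_vmid_bitmap = compute_vmid_bitmap,
            .num_pipe_per_mec = adev->gfx.mec.num_pipe_per_mec,
            .num_queue_per_pipe = adev->gfx.mec.num_queue_per_pipe,
            /* clamp to the lower range so the hole is never
             * reported to the Thunk as usable GPUVM space
             */
            .gpuvm_size = min_t(u64,
                                adev->vm_manager.max_pfn
                                    << AMDGPU_GPU_PAGE_SHIFT,
                                AMDGPU_VM_HOLE_START),
        };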

Regards,
  Felix

>
> So we should either split it in gpuvm_size_low/gpuvm_size_high or to
> just use the lower range and limit the value as suggested above.
>
> This should avoid having any GPU generation and CPU configuration
> specific logic in the common interface.
>
> Regards,
> Christian.
>
>>
>> Regards,
>>    Felix
>

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


* Re: [PATCH 07/25] drm/amdgpu: Update kgd2kfd_shared_resources for dGPU support
       [not found]                     ` <4aefa3bc-66b3-5e39-26e6-cc7c1e66adbd-5C7GfCeVMHo@public.gmane.org>
@ 2018-01-30  9:13                       ` Christian König
  0 siblings, 0 replies; 44+ messages in thread
From: Christian König @ 2018-01-30  9:13 UTC (permalink / raw)
  To: Felix Kuehling, christian.koenig-5C7GfCeVMHo,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w

On 2018-01-29 21:25, Felix Kuehling wrote:
> On 2018-01-29 06:42 AM, Christian König wrote:
>> On 2018-01-29 00:02, Felix Kuehling wrote:
>>> On 2018-01-27 04:19 AM, Christian König wrote:
>>>> On 2018-01-27 02:09, Felix Kuehling wrote:
>>>>> Add GPUVM size and DRM render node. Also add function to query the
>>>>> VMID mask to avoid hard-coding it in multiple places later.
>>>>>
>>>>> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
>>>>> ---
>>>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c      | 19
>>>>> +++++++++++++++++--
>>>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h      |  2 ++
>>>>>     drivers/gpu/drm/amd/include/kgd_kfd_interface.h |  6 ++++++
>>>>>     3 files changed, 25 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
>>>>> index c9f204d..294c467 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
>>>>> @@ -30,6 +30,8 @@
>>>>>     const struct kgd2kfd_calls *kgd2kfd;
>>>>>     bool (*kgd2kfd_init_p)(unsigned int, const struct kgd2kfd_calls**);
>>>>>     +static const unsigned int compute_vmid_bitmap = 0xFF00;
>>>>> +
>>>>>     int amdgpu_amdkfd_init(void)
>>>>>     {
>>>>>         int ret;
>>>>> @@ -137,9 +139,12 @@ void amdgpu_amdkfd_device_init(struct
>>>>> amdgpu_device *adev)
>>>>>         int last_valid_bit;
>>>>>         if (adev->kfd) {
>>>>>             struct kgd2kfd_shared_resources gpu_resources = {
>>>>> -            .compute_vmid_bitmap = 0xFF00,
>>>>> +            .compute_vmid_bitmap = compute_vmid_bitmap,
>>>>>                 .num_pipe_per_mec = adev->gfx.mec.num_pipe_per_mec,
>>>>> -            .num_queue_per_pipe = adev->gfx.mec.num_queue_per_pipe
>>>>> +            .num_queue_per_pipe = adev->gfx.mec.num_queue_per_pipe,
>>>>> +            .gpuvm_size = adev->vm_manager.max_pfn
>>>>> +                        << AMDGPU_GPU_PAGE_SHIFT,
>>>> That most likely doesn't work as intended on Vega10. The address space
>>>> is divided into an upper and a lower range, but max_pfn includes both.
>>>>
>>>> I suggest to use something like min(adev->vm_manager.max_pfn <<
>>>> AMDGPU_GPU_PAGE_SHIFT, AMDGPU_VM_HOLE_START).
>>> I think this is fine as it is. This just tells the Thunk the size of the
>>> virtual address space supported by the GPU. Currently the Thunk only
>>> uses 40-bits for SVM. But eventually it will be able to use the entire
>>> 47 bits of user address space. Any excess address space will just go
>>> unused.
>>>
>>> I'm also wondering how universal the split 48-bit virtual address space
>>> layout is. Even for x86_64 there seems to be a 5-level page table layout
>>> that supports 56 bits of user mode addresses
>>> (Documentation/x86/x86_64/mm.txt). AArch64 seems to support 48-bit user
>>> mode addresses (Documentation/arm64/memory.txt). I haven't found similar
>>> information for PowerPC yet.
>>>
>>> We should avoid coding too much architecture-specific logic into this
>>> driver that's supposed to support other architectures as well. I should
>>> also review the aperture placement with bigger user mode address spaces
>>> in mind.
>> And that is exactly the reason why I've suggested to change that.
>>
>> See the split between lower and upper range is not related to the CPU
>> configuration, instead it is a property of our GPU setup and hardware
>> generation.
> How exactly does the GPUVM hardware behave when it sees a virtual
> address "in the hole". Does it generate a VM fault? Or does it simply
> ignore the high bits?

At least on the Vega10 system I've tested you get a range fault when you 
try to access the hole.

> If it ignored the high bits, it would work OK for both x86_64 (with a
> split address space) and ARM64 (with a full 48-bit user mode virtual
> address space).
>
> On the other hand, if addresses in the hole generate a VM fault, then I
> agree with you, and we should only report 47-bits of virtual address
> space to user mode for SVM purposes.

Yes, according to my testing exactly that is the case here.

The hardware documentation is a bit ambiguous. I initially assumed as
well that the high bits are simply ignored, but that doesn't seem to be
the case.
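
To illustrate the layout under discussion (my own sketch, assuming a
48-bit sign-extended address space; the exact Vega10 boundaries would
need to be confirmed against the hardware documentation):

/*
 * 0x0000_0000_0000_0000 - 0x0000_7FFF_FFFF_FFFF  low range (valid)
 * 0x0000_8000_0000_0000 - 0xFFFF_7FFF_FFFF_FFFF  hole (range fault)
 * 0xFFFF_8000_0000_0000 - 0xFFFF_FFFF_FFFF_FFFF  high range (valid)
 */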

Regards,
Christian.

>
> Regards,
>    Felix
>
>> So we should either split it in gpuvm_size_low/gpuvm_size_high or to
>> just use the lower range and limit the value as suggested above.
>>
>> This should avoid having any GPU generation and CPU configuration
>> specific logic in the common interface.
>>
>> Regards,
>> Christian.
>>
>>> Regards,
>>>     Felix

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


* RE: [PATCH 06/25] drm/amdgpu: Add KFD eviction fence
       [not found]                         ` <d864b662-2212-4aa6-2dac-f0ee3157681e-5C7GfCeVMHo@public.gmane.org>
@ 2018-01-30 15:28                           ` Kasiviswanathan, Harish
       [not found]                             ` <DM3PR1201MB103814597E6C7DB4632E278E8CE40-BBcFnVpqZhWjUUTFdQAMQmrFom/aUZj6nBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
  0 siblings, 1 reply; 44+ messages in thread
From: Kasiviswanathan, Harish @ 2018-01-30 15:28 UTC (permalink / raw)
  To: Kuehling, Felix, Koenig, Christian,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w

>> [+Harish, forgot to acknowledge him in the commit description, will fix
>> that in v2]
>>
>> Harish, please see Christian's question below in amd_kfd_fence_signal.
>> Did I understand this correctly?

[HK]: Yes, the lifetime of an eviction fence is tied to the lifetime of the process associated with it. When the process terminates, the fence is signaled and released. For each BO that belongs to this process, the eviction fence should be detached when the BO is released. However, the eviction fence could still be attached to shared BOs, so signaling it frees those BOs.
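
A minimal sketch of the v2 rework Christian suggested earlier (my
illustration of the idea, not the actual patch): fence->mm becomes a
counted struct mm_struct pointer, and mmgrab()/mmdrop() from
<linux/sched/mm.h> pin the mm_struct itself rather than the address
space, so the pointer stays valid for comparison even after the
process has exited:

    /* at fence creation: */
    fence->mm = mm;
    mmgrab(mm);        /* paired with mmdrop() at release */

    /* in amd_kfd_fence_release(), the safe point identified above,
     * once no amd_kfd_fence_check_mm() lookup can race with the free:
     */
    mmdrop(fence->mm);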


On 2018-01-29 08:43 AM, Christian König wrote:
> Hi Felix & Harish,
>
> maybe explain why I found that odd: dma_fence_add_callback() sets the
> DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT flag before adding the callback.
>
> So the flag should always be set when there are callbacks.
> Did I miss anything?

I don't think we add any callbacks to our eviction fences.

[HK] Setting DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT is not required; it was my oversight. Since the dma_fence_signal() function calls the cb_list callbacks only if DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT is set, I thought it was safe to set it. However, the cb_list is empty if no callbacks were ever added, so setting DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT is redundant.
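
For reference, the relevant part of dma_fence_signal() (a condensed
sketch of drivers/dma-buf/dma-fence.c, not the verbatim code):

int dma_fence_signal(struct dma_fence *fence)
{
    unsigned long flags;

    if (!fence)
        return -EINVAL;

    if (test_and_set_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
        return -EINVAL;

    /* the cb_list walk is guarded by the enable-signaling bit, but
     * with no callbacks ever added the list is empty anyway, so
     * forcing the bit on gains nothing
     */
    if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &fence->flags)) {
        struct dma_fence_cb *cur, *tmp;

        spin_lock_irqsave(fence->lock, flags);
        list_for_each_entry_safe(cur, tmp, &fence->cb_list, node) {
            list_del_init(&cur->node);
            cur->func(fence, cur);
        }
        spin_unlock_irqrestore(fence->lock, flags);
    }
    return 0;
}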

Best Regards,
Harish

 

Regards,
  Felix

>
> Regards,
> Christian.
>
> On 2018-01-29 00:55, Felix Kuehling wrote:
>> [+Harish, forgot to acknowledge him in the commit description, will fix
>> that in v2]
>>
>> Harish, please see Christian's question below in amd_kfd_fence_signal.
>> Did I understand this correctly?
>>
>> Regards,
>>    Felix
>>
>> On 2018-01-28 06:42 PM, Felix Kuehling wrote:
>>> On 2018-01-27 04:16 AM, Christian König wrote:
>>>> On 2018-01-27 02:09, Felix Kuehling wrote:
>>> [snip]
>>>>> +struct amdgpu_amdkfd_fence *amdgpu_amdkfd_fence_create(u64 context,
>>>>> +                               void *mm)
>>>>> +{
>>>>> +    struct amdgpu_amdkfd_fence *fence = NULL;
>>>>> +
>>>>> +    fence = kzalloc(sizeof(*fence), GFP_KERNEL);
>>>>> +    if (fence == NULL)
>>>>> +        return NULL;
>>>>> +
>>>>> +    /* mm_struct mm is used as void pointer to identify the parent
>>>>> +     * KFD process. Don't dereference it. Fence and any threads
>>>>> using
>>>>> +     * mm are guaranteed to be released before process termination.
>>>>> +     */
>>>>> +    fence->mm = mm;
>>>> That won't work. Fences can live much longer than any process that
>>>> created them.
>>>>
>>>> I've already found a fence in a BO still living hours after the
>>>> process was killed and the pid long recycled.
>>>>
>>>> I suggest to make fence->mm a real mm_struct pointer with reference
>>>> counting and then set it to NULL and drop the reference in
>>>> enable_signaling.
>>> I agree. But enable_signaling may be too early to drop the reference.
>>> amd_kfd_fence_check_mm could still be called later from
>>> amdgpu_ttm_bo_eviction_valuable, as long as the fence hasn't
>>> signaled yet.
>>>
>>> The safe place is probably in amd_kfd_fence_release.
>>>
>>>>> +    get_task_comm(fence->timeline_name, current);
>>>>> +    spin_lock_init(&fence->lock);
>>>>> +
>>>>> +    dma_fence_init(&fence->base, &amd_kfd_fence_ops, &fence->lock,
>>>>> +           context, atomic_inc_return(&fence_seq));
>>>>> +
>>>>> +    return fence;
>>>>> +}
>>>>> +
>>>>> +struct amdgpu_amdkfd_fence *to_amdgpu_amdkfd_fence(struct
>>>>> dma_fence *f)
>>>>> +{
>>>>> +    struct amdgpu_amdkfd_fence *fence;
>>>>> +
>>>>> +    if (!f)
>>>>> +        return NULL;
>>>>> +
>>>>> +    fence = container_of(f, struct amdgpu_amdkfd_fence, base);
>>>>> +    if (fence && f->ops == &amd_kfd_fence_ops)
>>>>> +        return fence;
>>>>> +
>>>>> +    return NULL;
>>>>> +}
>>>>> +
>>>>> +static const char *amd_kfd_fence_get_driver_name(struct dma_fence
>>>>> *f)
>>>>> +{
>>>>> +    return "amdgpu_amdkfd_fence";
>>>>> +}
>>>>> +
>>>>> +static const char *amd_kfd_fence_get_timeline_name(struct
>>>>> dma_fence *f)
>>>>> +{
>>>>> +    struct amdgpu_amdkfd_fence *fence = to_amdgpu_amdkfd_fence(f);
>>>>> +
>>>>> +    return fence->timeline_name;
>>>>> +}
>>>>> +
>>>>> +/**
>>>>> + * amd_kfd_fence_enable_signaling - This gets called when TTM wants
>>>>> to evict
>>>>> + *  a KFD BO and schedules a job to move the BO.
>>>>> + *  If fence is already signaled return true.
>>>>> + *  If fence is not signaled schedule an evict KFD process work item.
>>>>> + */
>>>>> +static bool amd_kfd_fence_enable_signaling(struct dma_fence *f)
>>>>> +{
>>>>> +    struct amdgpu_amdkfd_fence *fence = to_amdgpu_amdkfd_fence(f);
>>>>> +
>>>>> +    if (!fence)
>>>>> +        return false;
>>>>> +
>>>>> +    if (dma_fence_is_signaled(f))
>>>>> +        return true;
>>>>> +
>>>>> +    if (!kgd2kfd->schedule_evict_and_restore_process(
>>>>> +                (struct mm_struct *)fence->mm, f))
>>>>> +        return true;
>>>>> +
>>>>> +    return false;
>>>>> +}
>>>>> +
>>>>> +static int amd_kfd_fence_signal(struct dma_fence *f)
>>>>> +{
>>>>> +    unsigned long flags;
>>>>> +    int ret;
>>>>> +
>>>>> +    spin_lock_irqsave(f->lock, flags);
>>>>> +    /* Set enabled bit so callbacks will be called */
>>>>> +    set_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &f->flags);
>>>> Mhm, why is that necessary?
>>> This only gets called from fence_release below. I think this is to
>>> avoid
>>> needlessly scheduling an eviction/restore cycle when an eviction fence
>>> gets destroyed that hasn't been triggered before, probably during
>>> process termination.
>>>
>>> Harish, do you remember any other reason for this?
>>>
>>>>> +    ret = dma_fence_signal_locked(f);
>>>>> +    spin_unlock_irqrestore(f->lock, flags);
>>>>> +
>>>>> +    return ret;
>>>>> +}
>>>>> +
>>>>> +/**
>>>>> + * amd_kfd_fence_release - callback invoked when the fence can be freed
>>>>> + *
>>>>> + * @fence: fence
>>>>> + *
>>>>> + * This function is called when the reference count becomes zero.
>>>>> + * It just RCU schedules freeing up the fence.
>>>>> + */
>>>>> +static void amd_kfd_fence_release(struct dma_fence *f)
>>>>> +{
>>>>> +    struct amdgpu_amdkfd_fence *fence = to_amdgpu_amdkfd_fence(f);
>>>>> +    /* Unconditionally signal the fence. The process is getting
>>>>> +     * terminated.
>>>>> +     */
>>>>> +    if (WARN_ON(!fence))
>>>>> +        return; /* Not an amdgpu_amdkfd_fence */
>>>>> +
>>>>> +    amd_kfd_fence_signal(f);
>>>>> +    kfree_rcu(f, rcu);
>>>>> +}
>>>>> +
>>>>> +/**
>>>>> + * amd_kfd_fence_check_mm - Check if @mm is same as that of the
>>>>> fence @f
>>>>> + *  Returns true if they match, false otherwise.
>>>>> + *
>>>>> + * @f: [IN] fence
>>>>> + * @mm: [IN] mm that needs to be verified
>>>>> + */
>>>>> +bool amd_kfd_fence_check_mm(struct dma_fence *f, void *mm)
>>>>> +{
>>>>> +    struct amdgpu_amdkfd_fence *fence = to_amdgpu_amdkfd_fence(f);
>>>>> +
>>>>> +    if (!fence)
>>>>> +        return false;
>>>>> +    else if (fence->mm == mm)
>>>>> +        return true;
>>>>> +
>>>>> +    return false;
>>>>> +}
>>>>> +
>>>>> +const struct dma_fence_ops amd_kfd_fence_ops = {
>>>>> +    .get_driver_name = amd_kfd_fence_get_driver_name,
>>>>> +    .get_timeline_name = amd_kfd_fence_get_timeline_name,
>>>>> +    .enable_signaling = amd_kfd_fence_enable_signaling,
>>>>> +    .signaled = NULL,
>>>>> +    .wait = dma_fence_default_wait,
>>>>> +    .release = amd_kfd_fence_release,
>>>>> +};
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>>>>> index 65d5a4e..ca00dd2 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>>>>> @@ -36,8 +36,9 @@
>>>>>    #define AMDGPU_MAX_UVD_ENC_RINGS    2
>>>>>      /* some special values for the owner field */
>>>>> -#define AMDGPU_FENCE_OWNER_UNDEFINED    ((void*)0ul)
>>>>> -#define AMDGPU_FENCE_OWNER_VM        ((void*)1ul)
>>>>> +#define AMDGPU_FENCE_OWNER_UNDEFINED    ((void *)0ul)
>>>>> +#define AMDGPU_FENCE_OWNER_VM        ((void *)1ul)
>>>>> +#define AMDGPU_FENCE_OWNER_KFD        ((void *)2ul)
>>>>>      #define AMDGPU_FENCE_FLAG_64BIT         (1 << 0)
>>>>>    #define AMDGPU_FENCE_FLAG_INT           (1 << 1)
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
>>>>> index df65c66..0cb31d9 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
>>>>> @@ -31,6 +31,7 @@
>>>>>    #include <drm/drmP.h>
>>>>>    #include "amdgpu.h"
>>>>>    #include "amdgpu_trace.h"
>>>>> +#include "amdgpu_amdkfd.h"
>>>>>      struct amdgpu_sync_entry {
>>>>>        struct hlist_node    node;
>>>>> @@ -86,10 +87,18 @@ static bool amdgpu_sync_same_dev(struct
>>>>> amdgpu_device *adev,
>>>>>    static void *amdgpu_sync_get_owner(struct dma_fence *f)
>>>>>    {
>>>>>        struct drm_sched_fence *s_fence = to_drm_sched_fence(f);
>>>>> +    struct amdgpu_amdkfd_fence *kfd_fence;
>>>>> +
>>>>> +    if (!f)
>>>>> +        return AMDGPU_FENCE_OWNER_UNDEFINED;
>>>> When you add the extra NULL check here then please move the
>>>> to_drm_sched_fence() after it as well.
>>> Yeah, makes sense.
>>>
>>> Regards,
>>>    Felix
>>>
>>>> Christian.
>>>>
>>>>>          if (s_fence)
>>>>>            return s_fence->owner;
>>>>>    +    kfd_fence = to_amdgpu_amdkfd_fence(f);
>>>>> +    if (kfd_fence)
>>>>> +        return AMDGPU_FENCE_OWNER_KFD;
>>>>> +
>>>>>        return AMDGPU_FENCE_OWNER_UNDEFINED;
>>>>>    }
>>>>>    @@ -204,11 +213,18 @@ int amdgpu_sync_resv(struct amdgpu_device
>>>>> *adev,
>>>>>        for (i = 0; i < flist->shared_count; ++i) {
>>>>>            f = rcu_dereference_protected(flist->shared[i],
>>>>>                              reservation_object_held(resv));
>>>>> +        /* We only want to trigger KFD eviction fences on
>>>>> +         * evict or move jobs. Skip KFD fences otherwise.
>>>>> +         */
>>>>> +        fence_owner = amdgpu_sync_get_owner(f);
>>>>> +        if (fence_owner == AMDGPU_FENCE_OWNER_KFD &&
>>>>> +            owner != AMDGPU_FENCE_OWNER_UNDEFINED)
>>>>> +            continue;
>>>>> +
>>>>>            if (amdgpu_sync_same_dev(adev, f)) {
>>>>>                /* VM updates are only interesting
>>>>>                 * for other VM updates and moves.
>>>>>                 */
>>>>> -            fence_owner = amdgpu_sync_get_owner(f);
>>>>>                if ((owner != AMDGPU_FENCE_OWNER_UNDEFINED) &&
>>>>>                    (fence_owner != AMDGPU_FENCE_OWNER_UNDEFINED) &&
>>>>>                    ((owner == AMDGPU_FENCE_OWNER_VM) !=
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>>>> index e4bb435..c3f33d3 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>>>> @@ -46,6 +46,7 @@
>>>>>    #include "amdgpu.h"
>>>>>    #include "amdgpu_object.h"
>>>>>    #include "amdgpu_trace.h"
>>>>> +#include "amdgpu_amdkfd.h"
>>>>>    #include "bif/bif_4_1_d.h"
>>>>>      #define DRM_FILE_PAGE_OFFSET (0x100000000ULL >> PAGE_SHIFT)
>>>>> @@ -1170,6 +1171,23 @@ static bool
>>>>> amdgpu_ttm_bo_eviction_valuable(struct ttm_buffer_object *bo,
>>>>>    {
>>>>>        unsigned long num_pages = bo->mem.num_pages;
>>>>>        struct drm_mm_node *node = bo->mem.mm_node;
>>>>> +    struct reservation_object_list *flist;
>>>>> +    struct dma_fence *f;
>>>>> +    int i;
>>>>> +
>>>>> +    /* If bo is a KFD BO, check if the bo belongs to the current
>>>>> process.
>>>>> +     * If true, then return false as any KFD process needs all its
>>>>> BOs to
>>>>> +     * be resident to run successfully
>>>>> +     */
>>>>> +    flist = reservation_object_get_list(bo->resv);
>>>>> +    if (flist) {
>>>>> +        for (i = 0; i < flist->shared_count; ++i) {
>>>>> +            f = rcu_dereference_protected(flist->shared[i],
>>>>> +                reservation_object_held(bo->resv));
>>>>> +            if (amd_kfd_fence_check_mm(f, current->mm))
>>>>> +                return false;
>>>>> +        }
>>>>> +    }
>>>>>          switch (bo->mem.mem_type) {
>>>>>        case TTM_PL_TT:
>>>>> diff --git a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
>>>>> b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
>>>>> index 94eab54..9e35249 100644
>>>>> --- a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
>>>>> +++ b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
>>>>> @@ -30,6 +30,7 @@
>>>>>      #include <linux/types.h>
>>>>>    #include <linux/bitmap.h>
>>>>> +#include <linux/dma-fence.h>
>>>>>      struct pci_dev;
>>>>>    @@ -286,6 +287,9 @@ struct kfd2kgd_calls {
>>>>>     *
>>>>>     * @resume: Notifies amdkfd about a resume action done to a kgd
>>>>> device
>>>>>     *
>>>>> + * @schedule_evict_and_restore_process: Schedules work queue that
>>>>> will prepare
>>>>> + * for safe eviction of KFD BOs that belong to the specified
>>>>> process.
>>>>> + *
>>>>>     * This structure contains function callback pointers so the kgd
>>>>> driver
>>>>>     * will notify to the amdkfd about certain status changes.
>>>>>     *
>>>>> @@ -300,6 +304,8 @@ struct kgd2kfd_calls {
>>>>>        void (*interrupt)(struct kfd_dev *kfd, const void
>>>>> *ih_ring_entry);
>>>>>        void (*suspend)(struct kfd_dev *kfd);
>>>>>        int (*resume)(struct kfd_dev *kfd);
>>>>> +    int (*schedule_evict_and_restore_process)(struct mm_struct *mm,
>>>>> +            struct dma_fence *fence);
>>>>>    };
>>>>>      int kgd2kfd_init(unsigned interface_version,
>

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


* Re: [PATCH 06/25] drm/amdgpu: Add KFD eviction fence
       [not found]                             ` <DM3PR1201MB103814597E6C7DB4632E278E8CE40-BBcFnVpqZhWjUUTFdQAMQmrFom/aUZj6nBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
@ 2018-01-30 15:35                               ` Christian König
       [not found]                                 ` <a1c8d096-4cf5-0f36-b0d1-8ed705ba7fb2-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 44+ messages in thread
From: Christian König @ 2018-01-30 15:35 UTC (permalink / raw)
  To: Kasiviswanathan, Harish, Kuehling, Felix, Koenig, Christian,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w

On 2018-01-30 16:28, Kasiviswanathan, Harish wrote:
>>> [+Harish, forgot to acknowledge him in the commit description, will fix
>>> that in v2]
>>>
>>> Harish, please see Christian's question below in amd_kfd_fence_signal.
>>> Did I understand this correctly?
> [HK]: Yes, the lifetime of an eviction fence is tied to the lifetime of the process associated with it. When the process terminates, the fence is signaled and released. For each BO that belongs to this process, the eviction fence should be detached when the BO is released. However, the eviction fence could still be attached to shared BOs, so signaling it frees those BOs.
>
>
> On 2018-01-29 08:43 AM, Christian König wrote:
>> Hi Felix & Harish,
>>
>> maybe explain why I found that odd: dma_fence_add_callback() sets the
>> DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT flag before adding the callback.
>>
>> So the flag should always be set when there are callbacks.
>> Did I miss anything?
> I don't think we add any callbacks to our eviction fences.
>
> [HK] Setting DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT is not required; it was my oversight. Since the dma_fence_signal() function calls the cb_list callbacks only if DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT is set, I thought it was safe to set it. However, the cb_list is empty if no callbacks were ever added, so setting DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT is redundant.

Ok in this case let's just remove that and also use the 
dma_fence_signal() function (not the _locked variant) for signaling the 
DMA fence.
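
Something like this, presumably (my sketch of the simplified release
path, not the final v2 patch):

static void amd_kfd_fence_release(struct dma_fence *f)
{
    struct amdgpu_amdkfd_fence *fence = to_amdgpu_amdkfd_fence(f);

    if (WARN_ON(!fence))
        return; /* not an amdgpu_amdkfd_fence */

    /* unconditionally signal; the process is getting terminated */
    dma_fence_signal(f);
    kfree_rcu(f, rcu);
}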

Thanks,
Christian.

>
> Best Regards,
> Harish
>
>   
>
> Regards,
>    Felix
>
>> Regards,
>> Christian.
>>
>> On 2018-01-29 00:55, Felix Kuehling wrote:
>>> [+Harish, forgot to acknowledge him in the commit description, will fix
>>> that in v2]
>>>
>>> Harish, please see Christian's question below in amd_kfd_fence_signal.
>>> Did I understand this correctly?
>>>
>>> Regards,
>>>     Felix
>>>
>>> On 2018-01-28 06:42 PM, Felix Kuehling wrote:
>>>> On 2018-01-27 04:16 AM, Christian König wrote:
>>>>> On 2018-01-27 02:09, Felix Kuehling wrote:
>>>> [snip]
>>>>>> +struct amdgpu_amdkfd_fence *amdgpu_amdkfd_fence_create(u64 context,
>>>>>> +                               void *mm)
>>>>>> +{
>>>>>> +    struct amdgpu_amdkfd_fence *fence = NULL;
>>>>>> +
>>>>>> +    fence = kzalloc(sizeof(*fence), GFP_KERNEL);
>>>>>> +    if (fence == NULL)
>>>>>> +        return NULL;
>>>>>> +
>>>>>> +    /* mm_struct mm is used as void pointer to identify the parent
>>>>>> +     * KFD process. Don't dereference it. Fence and any threads
>>>>>> using
>>>>>> +     * mm are guaranteed to be released before process termination.
>>>>>> +     */
>>>>>> +    fence->mm = mm;
>>>>> That won't work. Fences can live much longer than any process that
>>>>> created them.
>>>>>
>>>>> I've already found a fence in a BO still living hours after the
>>>>> process was killed and the pid long recycled.
>>>>>
>>>>> I suggest to make fence->mm a real mm_struct pointer with reference
>>>>> counting and then set it to NULL and drop the reference in
>>>>> enable_signaling.
>>>> I agree. But enable_signaling may be too early to drop the reference.
>>>> amd_kfd_fence_check_mm could still be called later from
>>>> amdgpu_ttm_bo_eviction_valuable, as long as the fence hasn't
>>>> signaled yet.
>>>>
>>>> The safe place is probably in amd_kfd_fence_release.
>>>>
>>>>>> +    get_task_comm(fence->timeline_name, current);
>>>>>> +    spin_lock_init(&fence->lock);
>>>>>> +
>>>>>> +    dma_fence_init(&fence->base, &amd_kfd_fence_ops, &fence->lock,
>>>>>> +           context, atomic_inc_return(&fence_seq));
>>>>>> +
>>>>>> +    return fence;
>>>>>> +}
>>>>>> +
>>>>>> +struct amdgpu_amdkfd_fence *to_amdgpu_amdkfd_fence(struct
>>>>>> dma_fence *f)
>>>>>> +{
>>>>>> +    struct amdgpu_amdkfd_fence *fence;
>>>>>> +
>>>>>> +    if (!f)
>>>>>> +        return NULL;
>>>>>> +
>>>>>> +    fence = container_of(f, struct amdgpu_amdkfd_fence, base);
>>>>>> +    if (fence && f->ops == &amd_kfd_fence_ops)
>>>>>> +        return fence;
>>>>>> +
>>>>>> +    return NULL;
>>>>>> +}
>>>>>> +
>>>>>> +static const char *amd_kfd_fence_get_driver_name(struct dma_fence
>>>>>> *f)
>>>>>> +{
>>>>>> +    return "amdgpu_amdkfd_fence";
>>>>>> +}
>>>>>> +
>>>>>> +static const char *amd_kfd_fence_get_timeline_name(struct
>>>>>> dma_fence *f)
>>>>>> +{
>>>>>> +    struct amdgpu_amdkfd_fence *fence = to_amdgpu_amdkfd_fence(f);
>>>>>> +
>>>>>> +    return fence->timeline_name;
>>>>>> +}
>>>>>> +
>>>>>> +/**
>>>>>> + * amd_kfd_fence_enable_signaling - This gets called when TTM wants
>>>>>> to evict
>>>>>> + *  a KFD BO and schedules a job to move the BO.
>>>>>> + *  If fence is already signaled return true.
>>>>>> + *  If fence is not signaled schedule an evict KFD process work item.
>>>>>> + */
>>>>>> +static bool amd_kfd_fence_enable_signaling(struct dma_fence *f)
>>>>>> +{
>>>>>> +    struct amdgpu_amdkfd_fence *fence = to_amdgpu_amdkfd_fence(f);
>>>>>> +
>>>>>> +    if (!fence)
>>>>>> +        return false;
>>>>>> +
>>>>>> +    if (dma_fence_is_signaled(f))
>>>>>> +        return true;
>>>>>> +
>>>>>> +    if (!kgd2kfd->schedule_evict_and_restore_process(
>>>>>> +                (struct mm_struct *)fence->mm, f))
>>>>>> +        return true;
>>>>>> +
>>>>>> +    return false;
>>>>>> +}
>>>>>> +
>>>>>> +static int amd_kfd_fence_signal(struct dma_fence *f)
>>>>>> +{
>>>>>> +    unsigned long flags;
>>>>>> +    int ret;
>>>>>> +
>>>>>> +    spin_lock_irqsave(f->lock, flags);
>>>>>> +    /* Set enabled bit so callbacks will be called */
>>>>>> +    set_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &f->flags);
>>>>> Mhm, why is that necessary?
>>>> This only gets called from fence_release below. I think this is to
>>>> avoid
>>>> needlessly scheduling an eviction/restore cycle when an eviction fence
>>>> gets destroyed that hasn't been triggered before, probably during
>>>> process termination.
>>>>
>>>> Harish, do you remember any other reason for this?
>>>>
>>>>>> +    ret = dma_fence_signal_locked(f);
>>>>>> +    spin_unlock_irqrestore(f->lock, flags);
>>>>>> +
>>>>>> +    return ret;
>>>>>> +}
>>>>>> +
>>>>>> +/**
>>>>>> + * amd_kfd_fence_release - callback invoked when the fence can be freed
>>>>>> + *
>>>>>> + * @fence: fence
>>>>>> + *
>>>>>> + * This function is called when the reference count becomes zero.
>>>>>> + * It just RCU schedules freeing up the fence.
>>>>>> + */
>>>>>> +static void amd_kfd_fence_release(struct dma_fence *f)
>>>>>> +{
>>>>>> +    struct amdgpu_amdkfd_fence *fence = to_amdgpu_amdkfd_fence(f);
>>>>>> +    /* Unconditionally signal the fence. The process is getting
>>>>>> +     * terminated.
>>>>>> +     */
>>>>>> +    if (WARN_ON(!fence))
>>>>>> +        return; /* Not an amdgpu_amdkfd_fence */
>>>>>> +
>>>>>> +    amd_kfd_fence_signal(f);
>>>>>> +    kfree_rcu(f, rcu);
>>>>>> +}
>>>>>> +
>>>>>> +/**
>>>>>> + * amd_kfd_fence_check_mm - Check if @mm is same as that of the
>>>>>> fence @f
>>>>>> + *  Returns true if they match, false otherwise.
>>>>>> + *
>>>>>> + * @f: [IN] fence
>>>>>> + * @mm: [IN] mm that needs to be verified
>>>>>> + */
>>>>>> +bool amd_kfd_fence_check_mm(struct dma_fence *f, void *mm)
>>>>>> +{
>>>>>> +    struct amdgpu_amdkfd_fence *fence = to_amdgpu_amdkfd_fence(f);
>>>>>> +
>>>>>> +    if (!fence)
>>>>>> +        return false;
>>>>>> +    else if (fence->mm == mm)
>>>>>> +        return true;
>>>>>> +
>>>>>> +    return false;
>>>>>> +}
>>>>>> +
>>>>>> +const struct dma_fence_ops amd_kfd_fence_ops = {
>>>>>> +    .get_driver_name = amd_kfd_fence_get_driver_name,
>>>>>> +    .get_timeline_name = amd_kfd_fence_get_timeline_name,
>>>>>> +    .enable_signaling = amd_kfd_fence_enable_signaling,
>>>>>> +    .signaled = NULL,
>>>>>> +    .wait = dma_fence_default_wait,
>>>>>> +    .release = amd_kfd_fence_release,
>>>>>> +};
>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>>>>>> index 65d5a4e..ca00dd2 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>>>>>> @@ -36,8 +36,9 @@
>>>>>>     #define AMDGPU_MAX_UVD_ENC_RINGS    2
>>>>>>       /* some special values for the owner field */
>>>>>> -#define AMDGPU_FENCE_OWNER_UNDEFINED    ((void*)0ul)
>>>>>> -#define AMDGPU_FENCE_OWNER_VM        ((void*)1ul)
>>>>>> +#define AMDGPU_FENCE_OWNER_UNDEFINED    ((void *)0ul)
>>>>>> +#define AMDGPU_FENCE_OWNER_VM        ((void *)1ul)
>>>>>> +#define AMDGPU_FENCE_OWNER_KFD        ((void *)2ul)
>>>>>>       #define AMDGPU_FENCE_FLAG_64BIT         (1 << 0)
>>>>>>     #define AMDGPU_FENCE_FLAG_INT           (1 << 1)
>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
>>>>>> index df65c66..0cb31d9 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
>>>>>> @@ -31,6 +31,7 @@
>>>>>>     #include <drm/drmP.h>
>>>>>>     #include "amdgpu.h"
>>>>>>     #include "amdgpu_trace.h"
>>>>>> +#include "amdgpu_amdkfd.h"
>>>>>>       struct amdgpu_sync_entry {
>>>>>>         struct hlist_node    node;
>>>>>> @@ -86,10 +87,18 @@ static bool amdgpu_sync_same_dev(struct
>>>>>> amdgpu_device *adev,
>>>>>>     static void *amdgpu_sync_get_owner(struct dma_fence *f)
>>>>>>     {
>>>>>>         struct drm_sched_fence *s_fence = to_drm_sched_fence(f);
>>>>>> +    struct amdgpu_amdkfd_fence *kfd_fence;
>>>>>> +
>>>>>> +    if (!f)
>>>>>> +        return AMDGPU_FENCE_OWNER_UNDEFINED;
>>>>> When you add the extra NULL check here then please move the
>>>>> to_drm_sched_fence() after it as well.
>>>> Yeah, makes sense.
>>>>
>>>> Regards,
>>>>     Felix
>>>>
>>>>> Christian.
>>>>>
>>>>>>           if (s_fence)
>>>>>>             return s_fence->owner;
>>>>>>     +    kfd_fence = to_amdgpu_amdkfd_fence(f);
>>>>>> +    if (kfd_fence)
>>>>>> +        return AMDGPU_FENCE_OWNER_KFD;
>>>>>> +
>>>>>>         return AMDGPU_FENCE_OWNER_UNDEFINED;
>>>>>>     }
>>>>>>     @@ -204,11 +213,18 @@ int amdgpu_sync_resv(struct amdgpu_device
>>>>>> *adev,
>>>>>>         for (i = 0; i < flist->shared_count; ++i) {
>>>>>>             f = rcu_dereference_protected(flist->shared[i],
>>>>>>                               reservation_object_held(resv));
>>>>>> +        /* We only want to trigger KFD eviction fences on
>>>>>> +         * evict or move jobs. Skip KFD fences otherwise.
>>>>>> +         */
>>>>>> +        fence_owner = amdgpu_sync_get_owner(f);
>>>>>> +        if (fence_owner == AMDGPU_FENCE_OWNER_KFD &&
>>>>>> +            owner != AMDGPU_FENCE_OWNER_UNDEFINED)
>>>>>> +            continue;
>>>>>> +
>>>>>>             if (amdgpu_sync_same_dev(adev, f)) {
>>>>>>                 /* VM updates are only interesting
>>>>>>                  * for other VM updates and moves.
>>>>>>                  */
>>>>>> -            fence_owner = amdgpu_sync_get_owner(f);
>>>>>>                 if ((owner != AMDGPU_FENCE_OWNER_UNDEFINED) &&
>>>>>>                     (fence_owner != AMDGPU_FENCE_OWNER_UNDEFINED) &&
>>>>>>                     ((owner == AMDGPU_FENCE_OWNER_VM) !=
>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>>>>> index e4bb435..c3f33d3 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>>>>> @@ -46,6 +46,7 @@
>>>>>>     #include "amdgpu.h"
>>>>>>     #include "amdgpu_object.h"
>>>>>>     #include "amdgpu_trace.h"
>>>>>> +#include "amdgpu_amdkfd.h"
>>>>>>     #include "bif/bif_4_1_d.h"
>>>>>>       #define DRM_FILE_PAGE_OFFSET (0x100000000ULL >> PAGE_SHIFT)
>>>>>> @@ -1170,6 +1171,23 @@ static bool
>>>>>> amdgpu_ttm_bo_eviction_valuable(struct ttm_buffer_object *bo,
>>>>>>     {
>>>>>>         unsigned long num_pages = bo->mem.num_pages;
>>>>>>         struct drm_mm_node *node = bo->mem.mm_node;
>>>>>> +    struct reservation_object_list *flist;
>>>>>> +    struct dma_fence *f;
>>>>>> +    int i;
>>>>>> +
>>>>>> +    /* If bo is a KFD BO, check if the bo belongs to the current
>>>>>> process.
>>>>>> +     * If true, then return false as any KFD process needs all its
>>>>>> BOs to
>>>>>> +     * be resident to run successfully
>>>>>> +     */
>>>>>> +    flist = reservation_object_get_list(bo->resv);
>>>>>> +    if (flist) {
>>>>>> +        for (i = 0; i < flist->shared_count; ++i) {
>>>>>> +            f = rcu_dereference_protected(flist->shared[i],
>>>>>> +                reservation_object_held(bo->resv));
>>>>>> +            if (amd_kfd_fence_check_mm(f, current->mm))
>>>>>> +                return false;
>>>>>> +        }
>>>>>> +    }
>>>>>>           switch (bo->mem.mem_type) {
>>>>>>         case TTM_PL_TT:
>>>>>> diff --git a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
>>>>>> b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
>>>>>> index 94eab54..9e35249 100644
>>>>>> --- a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
>>>>>> +++ b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
>>>>>> @@ -30,6 +30,7 @@
>>>>>>       #include <linux/types.h>
>>>>>>     #include <linux/bitmap.h>
>>>>>> +#include <linux/dma-fence.h>
>>>>>>       struct pci_dev;
>>>>>>     @@ -286,6 +287,9 @@ struct kfd2kgd_calls {
>>>>>>      *
>>>>>>      * @resume: Notifies amdkfd about a resume action done to a kgd
>>>>>> device
>>>>>>      *
>>>>>> + * @schedule_evict_and_restore_process: Schedules work queue that
>>>>>> will prepare
>>>>>> + * for safe eviction of KFD BOs that belong to the specified
>>>>>> process.
>>>>>> + *
>>>>>>      * This structure contains function callback pointers so the kgd
>>>>>> driver
>>>>>>      * will notify to the amdkfd about certain status changes.
>>>>>>      *
>>>>>> @@ -300,6 +304,8 @@ struct kgd2kfd_calls {
>>>>>>         void (*interrupt)(struct kfd_dev *kfd, const void
>>>>>> *ih_ring_entry);
>>>>>>         void (*suspend)(struct kfd_dev *kfd);
>>>>>>         int (*resume)(struct kfd_dev *kfd);
>>>>>> +    int (*schedule_evict_and_restore_process)(struct mm_struct *mm,
>>>>>> +            struct dma_fence *fence);
>>>>>>     };
>>>>>>       int kgd2kfd_init(unsigned interface_version,

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


* Re: [PATCH 06/25] drm/amdgpu: Add KFD eviction fence
       [not found]                                 ` <a1c8d096-4cf5-0f36-b0d1-8ed705ba7fb2-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2018-01-30 23:21                                   ` Felix Kuehling
       [not found]                                     ` <cffd64a9-7222-7f9f-4fe8-e37972de9fd9-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 44+ messages in thread
From: Felix Kuehling @ 2018-01-30 23:21 UTC (permalink / raw)
  To: christian.koenig-5C7GfCeVMHo, Kasiviswanathan, Harish,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w

On 2018-01-30 10:35 AM, Christian König wrote:
> On 2018-01-30 16:28, Kasiviswanathan, Harish wrote:
>>>> [+Harish, forgot to acknowledge him in the commit description, will
>>>> fix
>>>> that in v2]
>>>>
>>>> Harish, please see Christian's question below in amd_kfd_fence_signal.
>>>> Did I understand this correctly?
>> [HK]: Yes, the lifetime of an eviction fence is tied to the lifetime
>> of the process associated with it. When the process terminates, the
>> fence is signaled and released. For each BO that belongs to this
>> process, the eviction fence should be detached when the BO is
>> released. However, the eviction fence could still be attached to
>> shared BOs, so signaling it frees those BOs.
>>
>>
>> On 2018-01-29 08:43 AM, Christian König wrote:
>>> Hi Felix & Harish,
>>>
>>> maybe explain why I found that odd: dma_fence_add_callback() sets the
>>> DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT flag before adding the callback.
>>>
>>> So the flag should always be set when there are callbacks.
>>> Did I miss anything?
>> I don't think we add any callbacks to our eviction fences.
>>
>> [HK] Setting DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT is not required; it was
>> my oversight. Since the dma_fence_signal() function calls the cb_list
>> callbacks only if DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT is set, I thought
>> it was safe to set it. However, the cb_list is empty if no callbacks
>> were ever added, so setting DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT is
>> redundant.
>
> Ok in this case let's just remove that and also use the
> dma_fence_signal() function (not the _locked variant) for signaling
> the DMA fence.

Sure. Though it makes me wonder why we need to signal the fence at all.
Release only runs when the reference count of the fence is 0. Doesn't
that imply that no one is left waiting for the fence?
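
For reference, the put path is simply (condensed from
include/linux/dma-fence.h):

static inline void dma_fence_put(struct dma_fence *fence)
{
    if (fence)
        kref_put(&fence->refcount, dma_fence_release);
}

dma_fence_release() then invokes fence->ops->release(), so by the time
amd_kfd_fence_release() runs, every waiter must already have dropped
its reference.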

Regards,
  Felix

>
> Thanks,
> Christian.
>
>>
>> Best Regards,
>> Harish
>>
>>  
>> Regards,
>>    Felix
>>
>>> Regards,
>>> Christian.
>>>
>>> On 2018-01-29 00:55, Felix Kuehling wrote:
>>>> [+Harish, forgot to acknowledge him in the commit description, will
>>>> fix
>>>> that in v2]
>>>>
>>>> Harish, please see Christian's question below in amd_kfd_fence_signal.
>>>> Did I understand this correctly?
>>>>
>>>> Regards,
>>>>     Felix
>>>>
>>>> On 2018-01-28 06:42 PM, Felix Kuehling wrote:
>>>>> On 2018-01-27 04:16 AM, Christian König wrote:
>>>>>> Am 27.01.2018 um 02:09 schrieb Felix Kuehling:
>>>>> [snip]
>>>>>>> +struct amdgpu_amdkfd_fence *amdgpu_amdkfd_fence_create(u64 context,
>>>>>>> +                               void *mm)
>>>>>>> +{
>>>>>>> +    struct amdgpu_amdkfd_fence *fence = NULL;
>>>>>>> +
>>>>>>> +    fence = kzalloc(sizeof(*fence), GFP_KERNEL);
>>>>>>> +    if (fence == NULL)
>>>>>>> +        return NULL;
>>>>>>> +
>>>>>>> +    /* mm_struct mm is used as void pointer to identify the parent
>>>>>>> +     * KFD process. Don't dereference it. The fence and any threads using
>>>>>>> +     * mm are guaranteed to be released before process termination.
>>>>>>> +     */
>>>>>>> +    fence->mm = mm;
>>>>>> That won't work. Fences can live much longer than any process who
>>>>>> created them.
>>>>>>
>>>>>> I've already found a fence in a BO still living hours after the
>>>>>> process was killed and the pid long recycled.
>>>>>>
>>>>>> I suggest to make fence->mm a real mm_struct pointer with reference
>>>>>> counting and then set it to NULL and drop the reference in
>>>>>> enable_signaling.
>>>>> I agree. But enable_signaling may be too early to drop the reference.
>>>>> amd_kfd_fence_check_mm could still be called later from
>>>>> amdgpu_ttm_bo_eviction_valuable, as long as the fence hasn't
>>>>> signaled yet.
>>>>>
>>>>> The safe place is probably in amd_kfd_fence_release.
>>>>>
>>>>>>> +    get_task_comm(fence->timeline_name, current);
>>>>>>> +    spin_lock_init(&fence->lock);
>>>>>>> +
>>>>>>> +    dma_fence_init(&fence->base, &amd_kfd_fence_ops, &fence->lock,
>>>>>>> +           context, atomic_inc_return(&fence_seq));
>>>>>>> +
>>>>>>> +    return fence;
>>>>>>> +}
>>>>>>> +
>>>>>>> +struct amdgpu_amdkfd_fence *to_amdgpu_amdkfd_fence(struct dma_fence *f)
>>>>>>> +{
>>>>>>> +    struct amdgpu_amdkfd_fence *fence;
>>>>>>> +
>>>>>>> +    if (!f)
>>>>>>> +        return NULL;
>>>>>>> +
>>>>>>> +    fence = container_of(f, struct amdgpu_amdkfd_fence, base);
>>>>>>> +    if (fence && f->ops == &amd_kfd_fence_ops)
>>>>>>> +        return fence;
>>>>>>> +
>>>>>>> +    return NULL;
>>>>>>> +}
>>>>>>> +
>>>>>>> +static const char *amd_kfd_fence_get_driver_name(struct dma_fence *f)
>>>>>>> +{
>>>>>>> +    return "amdgpu_amdkfd_fence";
>>>>>>> +}
>>>>>>> +
>>>>>>> +static const char *amd_kfd_fence_get_timeline_name(struct dma_fence *f)
>>>>>>> +{
>>>>>>> +    struct amdgpu_amdkfd_fence *fence = to_amdgpu_amdkfd_fence(f);
>>>>>>> +
>>>>>>> +    return fence->timeline_name;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/**
>>>>>>> + * amd_kfd_fence_enable_signaling - This gets called when TTM wants to evict
>>>>>>> + *  a KFD BO and schedules a job to move the BO.
>>>>>>> + *  If fence is already signaled return true.
>>>>>>> + *  If fence is not signaled schedule an evict KFD process work item.
>>>>>>> + */
>>>>>>> +static bool amd_kfd_fence_enable_signaling(struct dma_fence *f)
>>>>>>> +{
>>>>>>> +    struct amdgpu_amdkfd_fence *fence = to_amdgpu_amdkfd_fence(f);
>>>>>>> +
>>>>>>> +    if (!fence)
>>>>>>> +        return false;
>>>>>>> +
>>>>>>> +    if (dma_fence_is_signaled(f))
>>>>>>> +        return true;
>>>>>>> +
>>>>>>> +    if (!kgd2kfd->schedule_evict_and_restore_process(
>>>>>>> +                (struct mm_struct *)fence->mm, f))
>>>>>>> +        return true;
>>>>>>> +
>>>>>>> +    return false;
>>>>>>> +}
>>>>>>> +
>>>>>>> +static int amd_kfd_fence_signal(struct dma_fence *f)
>>>>>>> +{
>>>>>>> +    unsigned long flags;
>>>>>>> +    int ret;
>>>>>>> +
>>>>>>> +    spin_lock_irqsave(f->lock, flags);
>>>>>>> +    /* Set enabled bit so cb will be called */
>>>>>>> +    set_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &f->flags);
>>>>>> Mhm, why is that necessary?
>>>>> This only gets called from fence_release below. I think this is to
>>>>> avoid
>>>>> needlessly scheduling an eviction/restore cycle when an eviction
>>>>> fence
>>>>> gets destroyed that hasn't been triggered before, probably during
>>>>> process termination.
>>>>>
>>>>> Harish, do you remember any other reason for this?
>>>>>
>>>>>>> +    ret = dma_fence_signal_locked(f);
>>>>>>> +    spin_unlock_irqrestore(f->lock, flags);
>>>>>>> +
>>>>>>> +    return ret;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/**
>>>>>>> + * amd_kfd_fence_release - callback invoked when the fence can be freed
>>>>>>> + *
>>>>>>> + * @fence: fence
>>>>>>> + *
>>>>>>> + * This function is called when the reference count becomes zero.
>>>>>>> + * It just RCU schedules freeing up the fence.
>>>>>>> + */
>>>>>>> +static void amd_kfd_fence_release(struct dma_fence *f)
>>>>>>> +{
>>>>>>> +    struct amdgpu_amdkfd_fence *fence = to_amdgpu_amdkfd_fence(f);
>>>>>>> +    /* Unconditionally signal the fence. The process is getting
>>>>>>> +     * terminated.
>>>>>>> +     */
>>>>>>> +    if (WARN_ON(!fence))
>>>>>>> +        return; /* Not an amdgpu_amdkfd_fence */
>>>>>>> +
>>>>>>> +    amd_kfd_fence_signal(f);
>>>>>>> +    kfree_rcu(f, rcu);
>>>>>>> +}
>>>>>>> +
>>>>>>> +/**
>>>>>>> + * amd_kfd_fence_check_mm - Check if @mm is same as that of the fence @f
>>>>>>> + *  if same return TRUE else return FALSE.
>>>>>>> + *
>>>>>>> + * @f: [IN] fence
>>>>>>> + * @mm: [IN] mm that needs to be verified
>>>>>>> + */
>>>>>>> +bool amd_kfd_fence_check_mm(struct dma_fence *f, void *mm)
>>>>>>> +{
>>>>>>> +    struct amdgpu_amdkfd_fence *fence = to_amdgpu_amdkfd_fence(f);
>>>>>>> +
>>>>>>> +    if (!fence)
>>>>>>> +        return false;
>>>>>>> +    else if (fence->mm == mm)
>>>>>>> +        return true;
>>>>>>> +
>>>>>>> +    return false;
>>>>>>> +}
>>>>>>> +
>>>>>>> +const struct dma_fence_ops amd_kfd_fence_ops = {
>>>>>>> +    .get_driver_name = amd_kfd_fence_get_driver_name,
>>>>>>> +    .get_timeline_name = amd_kfd_fence_get_timeline_name,
>>>>>>> +    .enable_signaling = amd_kfd_fence_enable_signaling,
>>>>>>> +    .signaled = NULL,
>>>>>>> +    .wait = dma_fence_default_wait,
>>>>>>> +    .release = amd_kfd_fence_release,
>>>>>>> +};
>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>>>>>>> index 65d5a4e..ca00dd2 100644
>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>>>>>>> @@ -36,8 +36,9 @@
>>>>>>>     #define AMDGPU_MAX_UVD_ENC_RINGS    2
>>>>>>>       /* some special values for the owner field */
>>>>>>> -#define AMDGPU_FENCE_OWNER_UNDEFINED    ((void*)0ul)
>>>>>>> -#define AMDGPU_FENCE_OWNER_VM        ((void*)1ul)
>>>>>>> +#define AMDGPU_FENCE_OWNER_UNDEFINED    ((void *)0ul)
>>>>>>> +#define AMDGPU_FENCE_OWNER_VM        ((void *)1ul)
>>>>>>> +#define AMDGPU_FENCE_OWNER_KFD        ((void *)2ul)
>>>>>>>       #define AMDGPU_FENCE_FLAG_64BIT         (1 << 0)
>>>>>>>     #define AMDGPU_FENCE_FLAG_INT           (1 << 1)
>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
>>>>>>> index df65c66..0cb31d9 100644
>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
>>>>>>> @@ -31,6 +31,7 @@
>>>>>>>     #include <drm/drmP.h>
>>>>>>>     #include "amdgpu.h"
>>>>>>>     #include "amdgpu_trace.h"
>>>>>>> +#include "amdgpu_amdkfd.h"
>>>>>>>       struct amdgpu_sync_entry {
>>>>>>>         struct hlist_node    node;
>>>>>>> @@ -86,10 +87,18 @@ static bool amdgpu_sync_same_dev(struct amdgpu_device *adev,
>>>>>>>     static void *amdgpu_sync_get_owner(struct dma_fence *f)
>>>>>>>     {
>>>>>>>         struct drm_sched_fence *s_fence = to_drm_sched_fence(f);
>>>>>>> +    struct amdgpu_amdkfd_fence *kfd_fence;
>>>>>>> +
>>>>>>> +    if (!f)
>>>>>>> +        return AMDGPU_FENCE_OWNER_UNDEFINED;
>>>>>> When you add the extra NULL check here then please move the
>>>>>> to_drm_sched_fence() after it as well.
>>>>> Yeah, makes sense.
>>>>>
>>>>> Regards,
>>>>>     Felix
>>>>>
>>>>>> Christian.
>>>>>>
>>>>>>>           if (s_fence)
>>>>>>>             return s_fence->owner;
>>>>>>>     +    kfd_fence = to_amdgpu_amdkfd_fence(f);
>>>>>>> +    if (kfd_fence)
>>>>>>> +        return AMDGPU_FENCE_OWNER_KFD;
>>>>>>> +
>>>>>>>         return AMDGPU_FENCE_OWNER_UNDEFINED;
>>>>>>>     }
>>>>>>>     @@ -204,11 +213,18 @@ int amdgpu_sync_resv(struct amdgpu_device *adev,
>>>>>>>         for (i = 0; i < flist->shared_count; ++i) {
>>>>>>>             f = rcu_dereference_protected(flist->shared[i],
>>>>>>>                               reservation_object_held(resv));
>>>>>>> +        /* We only want to trigger KFD eviction fences on
>>>>>>> +         * evict or move jobs. Skip KFD fences otherwise.
>>>>>>> +         */
>>>>>>> +        fence_owner = amdgpu_sync_get_owner(f);
>>>>>>> +        if (fence_owner == AMDGPU_FENCE_OWNER_KFD &&
>>>>>>> +            owner != AMDGPU_FENCE_OWNER_UNDEFINED)
>>>>>>> +            continue;
>>>>>>> +
>>>>>>>             if (amdgpu_sync_same_dev(adev, f)) {
>>>>>>>                 /* VM updates are only interesting
>>>>>>>                  * for other VM updates and moves.
>>>>>>>                  */
>>>>>>> -            fence_owner = amdgpu_sync_get_owner(f);
>>>>>>>                 if ((owner != AMDGPU_FENCE_OWNER_UNDEFINED) &&
>>>>>>>                     (fence_owner != AMDGPU_FENCE_OWNER_UNDEFINED) &&
>>>>>>>                     ((owner == AMDGPU_FENCE_OWNER_VM) !=
>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>>>>>> index e4bb435..c3f33d3 100644
>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>>>>>> @@ -46,6 +46,7 @@
>>>>>>>     #include "amdgpu.h"
>>>>>>>     #include "amdgpu_object.h"
>>>>>>>     #include "amdgpu_trace.h"
>>>>>>> +#include "amdgpu_amdkfd.h"
>>>>>>>     #include "bif/bif_4_1_d.h"
>>>>>>>       #define DRM_FILE_PAGE_OFFSET (0x100000000ULL >> PAGE_SHIFT)
>>>>>>> @@ -1170,6 +1171,23 @@ static bool amdgpu_ttm_bo_eviction_valuable(struct ttm_buffer_object *bo,
>>>>>>>     {
>>>>>>>         unsigned long num_pages = bo->mem.num_pages;
>>>>>>>         struct drm_mm_node *node = bo->mem.mm_node;
>>>>>>> +    struct reservation_object_list *flist;
>>>>>>> +    struct dma_fence *f;
>>>>>>> +    int i;
>>>>>>> +
>>>>>>> +    /* If bo is a KFD BO, check if the bo belongs to the current process.
>>>>>>> +     * If true, then return false as any KFD process needs all its BOs to
>>>>>>> +     * be resident to run successfully
>>>>>>> +     */
>>>>>>> +    flist = reservation_object_get_list(bo->resv);
>>>>>>> +    if (flist) {
>>>>>>> +        for (i = 0; i < flist->shared_count; ++i) {
>>>>>>> +            f = rcu_dereference_protected(flist->shared[i],
>>>>>>> +                reservation_object_held(bo->resv));
>>>>>>> +            if (amd_kfd_fence_check_mm(f, current->mm))
>>>>>>> +                return false;
>>>>>>> +        }
>>>>>>> +    }
>>>>>>>           switch (bo->mem.mem_type) {
>>>>>>>         case TTM_PL_TT:
>>>>>>> diff --git a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
>>>>>>> index 94eab54..9e35249 100644
>>>>>>> --- a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
>>>>>>> +++ b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
>>>>>>> @@ -30,6 +30,7 @@
>>>>>>>       #include <linux/types.h>
>>>>>>>     #include <linux/bitmap.h>
>>>>>>> +#include <linux/dma-fence.h>
>>>>>>>       struct pci_dev;
>>>>>>>     @@ -286,6 +287,9 @@ struct kfd2kgd_calls {
>>>>>>>      *
>>>>>>>      * @resume: Notifies amdkfd about a resume action done to a kgd
>>>>>>> device
>>>>>>>      *
>>>>>>> + * @schedule_evict_and_restore_process: Schedules work queue that will prepare
>>>>>>> + * for safe eviction of KFD BOs that belong to the specified process.
>>>>>>> + *
>>>>>>>      * This structure contains function callback pointers so the kgd driver
>>>>>>>      * will notify to the amdkfd about certain status changes.
>>>>>>>      *
>>>>>>> @@ -300,6 +304,8 @@ struct kgd2kfd_calls {
>>>>>>>         void (*interrupt)(struct kfd_dev *kfd, const void *ih_ring_entry);
>>>>>>>         void (*suspend)(struct kfd_dev *kfd);
>>>>>>>         int (*resume)(struct kfd_dev *kfd);
>>>>>>> +    int (*schedule_evict_and_restore_process)(struct mm_struct *mm,
>>>>>>> +            struct dma_fence *fence);
>>>>>>>     };
>>>>>>>       int kgd2kfd_init(unsigned interface_version,
>

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
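
For context on the DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT discussion above, a
condensed sketch of the signaling path as described in this thread --
simplified from the dma-fence code of that era, with locking omitted,
not a verbatim copy:

/* Sketch only: dma_fence_add_callback() sets
 * DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT itself before queueing a callback,
 * so the bit is already set whenever cb_list is non-empty. Setting it
 * manually before signaling a fence that never had a callback added
 * changes nothing, which is why the set_bit() in amd_kfd_fence_signal()
 * is redundant.
 */
static int dma_fence_signal_sketch(struct dma_fence *fence)
{
	struct dma_fence_cb *cur, *tmp;

	if (test_and_set_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
		return -EINVAL;	/* already signaled */

	if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &fence->flags)) {
		/* Only reached if someone enabled signaling; an empty
		 * cb_list makes this loop a no-op anyway.
		 */
		list_for_each_entry_safe(cur, tmp, &fence->cb_list, node) {
			list_del_init(&cur->node);
			cur->func(fence, cur);	/* run queued callback */
		}
	}
	return 0;
}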

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 06/25] drm/amdgpu: Add KFD eviction fence
       [not found]                                     ` <cffd64a9-7222-7f9f-4fe8-e37972de9fd9-5C7GfCeVMHo@public.gmane.org>
@ 2018-01-31  8:09                                       ` Christian König
  0 siblings, 0 replies; 44+ messages in thread
From: Christian König @ 2018-01-31  8:09 UTC (permalink / raw)
  To: Felix Kuehling, christian.koenig-5C7GfCeVMHo, Kasiviswanathan,
	Harish, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w

On 2018-01-31 00:21, Felix Kuehling wrote:
> On 2018-01-30 10:35 AM, Christian König wrote:
>> On 2018-01-30 16:28, Kasiviswanathan, Harish wrote:
>>>>> [+Harish, forgot to acknowledge him in the commit description, will
>>>>> fix
>>>>> that in v2]
>>>>>
>>>>> Harish, please see Christian's question below in amd_kfd_fence_signal.
>>>>> Did I understand this correctly?
>>> [HK]: Yes, the lifetime of eviction fences is tied to the lifetime of
>>> the process associated with them. When the process terminates, the
>>> fence is signaled and released. For all the BOs that belong to this
>>> process, the eviction fence should be detached when the BO is
>>> released. However, this eviction fence could still be attached to
>>> shared BOs, so signaling it frees those BOs.
>>>
>>>
>>> On 2018-01-29 08:43 AM, Christian König wrote:
>>>> Hi Felix & Harish,
>>>>
>>>> maybe explain why I found that odd: dma_fence_add_callback() sets the
>>>> DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT flag before adding the callback.
>>>>
>>>> So the flag should always be set when there are callbacks.
>>>> Did I miss anything?
>>> I don't think we add any callbacks to our eviction fences.
>>>
>>> [HK] Setting DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT is not required. It was
>>> my oversight. Since the dma_fence_signal() function calls the cb_list
>>> functions only if DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT is set, I thought
>>> it was safe to set it. However, the cb_list would be empty if no
>>> callbacks are added, so setting DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT is
>>> redundant.
>> Ok in this case let's just remove that and also use the
>> dma_fence_signal() function (not the _locked variant) for signaling
>> the DMA fence.
> Sure. Though it makes me wonder why we need to signal the fence at all.
> This happens when the reference count of the fence is 0. Doesn't that
> imply that no one is left waiting for the fence?

Good point as well. Yeah, when the fence's reference count becomes zero
there is no point in signaling it.

So the whole function can be removed.

Regards,
Christian.

>
> Regards,
>    Felix
>

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
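
The v2 patch later in this thread follows this conclusion: with no one
left holding a reference, nothing can be waiting on the fence, so the
release callback shrinks to dropping the mm reference and freeing the
fence, with no signaling (taken from the v2 patch below):

static void amd_kfd_fence_release(struct dma_fence *f)
{
	struct amdgpu_amdkfd_fence *fence = to_amdgpu_amdkfd_fence(f);

	if (WARN_ON(!fence))
		return; /* Not an amdgpu_amdkfd_fence */

	/* Drop the mm_struct reference taken in fence_create and
	 * RCU-schedule freeing of the fence itself.
	 */
	mmdrop(fence->mm);
	kfree_rcu(f, rcu);
}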

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 06/25] drm/amdgpu: Add KFD eviction fence
  2018-02-11 12:42       ` Oded Gabbay
@ 2018-02-12 19:19         ` Felix Kuehling
  0 siblings, 0 replies; 44+ messages in thread
From: Felix Kuehling @ 2018-02-12 19:19 UTC (permalink / raw)
  To: Oded Gabbay; +Cc: Harish Kasiviswanathan, amd-gfx list

On 2018-02-11 07:42 AM, Oded Gabbay wrote:
> Hi Felix,
> Do you object to me changing amd_kfd_ to amdkfd_ in the various
> structures and functions?
> So far, we don't have anything with an amd_kfd_ prefix, so I would
> like to keep things consistent.

We use the prefix amdgpu_amdkfd_ throughout the KFD-related amdgpu code.
Not sure why we did something different here. I'm OK with changing it
for consistency.

Thanks,
  Felix

>
> Other than that, this patch is:
> Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
>
>
> Oded

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 06/25] drm/amdgpu: Add KFD eviction fence
       [not found]     ` <1517967174-21709-7-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
@ 2018-02-11 12:42       ` Oded Gabbay
  2018-02-12 19:19         ` Felix Kuehling
  0 siblings, 1 reply; 44+ messages in thread
From: Oded Gabbay @ 2018-02-11 12:42 UTC (permalink / raw)
  To: Felix Kuehling; +Cc: Harish Kasiviswanathan, amd-gfx list

On Wed, Feb 7, 2018 at 3:32 AM, Felix Kuehling <Felix.Kuehling@amd.com> wrote:
> This fence is used by KFD to keep memory resident while user mode
> queues are enabled. Trying to evict memory will trigger the
> enable_signaling callback, which starts a KFD eviction, which
> involves preempting user mode queues before signaling the fence.
> There is one such fence per process.
>
> v2:
> * Grab a reference to mm_struct
> * Dereference fence after NULL check
> * Simplify fence release, no need to signal without anyone waiting
> * Added signed-off-by Harish, who is the original author of this code
>
> Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/Makefile              |   1 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h       |  15 ++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c | 179 +++++++++++++++++++++++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h         |   5 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c         |  21 ++-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c          |  18 +++
>  drivers/gpu/drm/amd/include/kgd_kfd_interface.h  |   6 +
>  7 files changed, 241 insertions(+), 4 deletions(-)
>  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile
> index d6e5b72..43dc3f9 100644
> --- a/drivers/gpu/drm/amd/amdgpu/Makefile
> +++ b/drivers/gpu/drm/amd/amdgpu/Makefile
> @@ -130,6 +130,7 @@ amdgpu-y += \
>  # add amdkfd interfaces
>  amdgpu-y += \
>          amdgpu_amdkfd.o \
> +        amdgpu_amdkfd_fence.o \
>          amdgpu_amdkfd_gfx_v8.o
>
>  # add cgs
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> index 2a519f9..492c7af 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> @@ -29,6 +29,8 @@
>  #include <linux/mmu_context.h>
>  #include <kgd_kfd_interface.h>
>
> +extern const struct kgd2kfd_calls *kgd2kfd;
> +
>  struct amdgpu_device;
>
>  struct kgd_mem {
> @@ -37,6 +39,19 @@ struct kgd_mem {
>         void *cpu_ptr;
>  };
>
> +/* KFD Memory Eviction */
> +struct amdgpu_amdkfd_fence {
> +       struct dma_fence base;
> +       struct mm_struct *mm;
> +       spinlock_t lock;
> +       char timeline_name[TASK_COMM_LEN];
> +};
> +
> +struct amdgpu_amdkfd_fence *amdgpu_amdkfd_fence_create(u64 context,
> +                                                      struct mm_struct *mm);
> +bool amd_kfd_fence_check_mm(struct dma_fence *f, struct mm_struct *mm);
> +struct amdgpu_amdkfd_fence *to_amdgpu_amdkfd_fence(struct dma_fence *f);
> +
>  int amdgpu_amdkfd_init(void);
>  void amdgpu_amdkfd_fini(void);
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
> new file mode 100644
> index 0000000..cf2f1e9
> --- /dev/null
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
> @@ -0,0 +1,179 @@
> +/*
> + * Copyright 2016-2018 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + */
> +
> +#include <linux/dma-fence.h>
> +#include <linux/spinlock.h>
> +#include <linux/atomic.h>
> +#include <linux/stacktrace.h>
> +#include <linux/sched.h>
> +#include <linux/slab.h>
> +#include <linux/sched/mm.h>
> +#include "amdgpu_amdkfd.h"
> +
> +const struct dma_fence_ops amd_kfd_fence_ops;
> +static atomic_t fence_seq = ATOMIC_INIT(0);
> +
> +/* Eviction Fence
> + * Fence helper functions to deal with KFD memory eviction.
> + * Big Idea - Since KFD submissions are done by user queues, a BO cannot be
> + *  evicted unless all the user queues for that process are evicted.
> + *
> + * All the BOs in a process share an eviction fence. When process X wants
> + * to map VRAM memory but TTM can't find enough space, TTM will attempt to
> + * evict BOs from its LRU list. TTM checks if the BO is valuable to evict
> + * by calling ttm_bo_driver->eviction_valuable().
> + *
> + * ttm_bo_driver->eviction_valuable() - will return false if the BO belongs
> + *  to process X. Otherwise, it will return true to indicate BO can be
> + *  evicted by TTM.
> + *
> + * If ttm_bo_driver->eviction_valuable returns true, then TTM will continue
> + * the eviction process for that BO by calling ttm_bo_evict --> amdgpu_bo_move
> + * --> amdgpu_copy_buffer(). This sets up a job in the GPU scheduler.
> + *
> + * GPU Scheduler (amd_sched_main) - sets up a cb (fence_add_callback) to
> + *  notify when the BO is free to move. fence_add_callback --> enable_signaling
> + *  --> amdgpu_amdkfd_fence.enable_signaling
> + *
> + * amdgpu_amdkfd_fence.enable_signaling - Starts a work item that will
> + * quiesce user queues and signal the fence. The work item will also start
> + * another delayed work item to restore BOs
> + */
> +
> +struct amdgpu_amdkfd_fence *amdgpu_amdkfd_fence_create(u64 context,
> +                                                      struct mm_struct *mm)
> +{
> +       struct amdgpu_amdkfd_fence *fence = NULL;
> +
> +       fence = kzalloc(sizeof(*fence), GFP_KERNEL);
> +       if (fence == NULL)
> +               return NULL;
> +
> +       /* This reference gets released in amd_kfd_fence_release */
> +       mmgrab(mm);
> +       fence->mm = mm;
> +       get_task_comm(fence->timeline_name, current);
> +       spin_lock_init(&fence->lock);
> +
> +       dma_fence_init(&fence->base, &amd_kfd_fence_ops, &fence->lock,
> +                  context, atomic_inc_return(&fence_seq));
> +
> +       return fence;
> +}
> +
> +struct amdgpu_amdkfd_fence *to_amdgpu_amdkfd_fence(struct dma_fence *f)
> +{
> +       struct amdgpu_amdkfd_fence *fence;
> +
> +       if (!f)
> +               return NULL;
> +
> +       fence = container_of(f, struct amdgpu_amdkfd_fence, base);
> +       if (fence && f->ops == &amd_kfd_fence_ops)
> +               return fence;
> +
> +       return NULL;
> +}
> +
> +static const char *amd_kfd_fence_get_driver_name(struct dma_fence *f)
> +{
> +       return "amdgpu_amdkfd_fence";
> +}
> +
> +static const char *amd_kfd_fence_get_timeline_name(struct dma_fence *f)
> +{
> +       struct amdgpu_amdkfd_fence *fence = to_amdgpu_amdkfd_fence(f);
> +
> +       return fence->timeline_name;
> +}
> +
> +/**
> + * amd_kfd_fence_enable_signaling - This gets called when TTM wants to evict
> + *  a KFD BO and schedules a job to move the BO.
> + *  If fence is already signaled return true.
> + *  If fence is not signaled schedule an evict KFD process work item.
> + */
> +static bool amd_kfd_fence_enable_signaling(struct dma_fence *f)
> +{
> +       struct amdgpu_amdkfd_fence *fence = to_amdgpu_amdkfd_fence(f);
> +
> +       if (!fence)
> +               return false;
> +
> +       if (dma_fence_is_signaled(f))
> +               return true;
> +
> +       if (!kgd2kfd->schedule_evict_and_restore_process(fence->mm, f))
> +               return true;
> +
> +       return false;
> +}
> +
> +/**
> + * amd_kfd_fence_release - callback invoked when the fence can be freed
> + *
> + * @fence: fence
> + *
> + * This function is called when the reference count becomes zero.
> + * Drops the mm_struct reference and RCU schedules freeing up the fence.
> + */
> +static void amd_kfd_fence_release(struct dma_fence *f)
> +{
> +       struct amdgpu_amdkfd_fence *fence = to_amdgpu_amdkfd_fence(f);
> +
> +       /* The process is getting terminated. No need to signal here;
> +        * just drop the mm reference and free the fence.
> +        */
> +       if (WARN_ON(!fence))
> +               return; /* Not an amdgpu_amdkfd_fence */
> +
> +       mmdrop(fence->mm);
> +       kfree_rcu(f, rcu);
> +}
> +
> +/**
> + * amd_kfd_fence_check_mm - Check if @mm is same as that of the fence @f
> + *  if same return TRUE else return FALSE.
> + *
> + * @f: [IN] fence
> + * @mm: [IN] mm that needs to be verified
> + */
> +bool amd_kfd_fence_check_mm(struct dma_fence *f, struct mm_struct *mm)
> +{
> +       struct amdgpu_amdkfd_fence *fence = to_amdgpu_amdkfd_fence(f);
> +
> +       if (!fence)
> +               return false;
> +       else if (fence->mm == mm)
> +               return true;
> +
> +       return false;
> +}
> +
> +const struct dma_fence_ops amd_kfd_fence_ops = {
> +       .get_driver_name = amd_kfd_fence_get_driver_name,
> +       .get_timeline_name = amd_kfd_fence_get_timeline_name,
> +       .enable_signaling = amd_kfd_fence_enable_signaling,
> +       .signaled = NULL,
> +       .wait = dma_fence_default_wait,
> +       .release = amd_kfd_fence_release,
> +};
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> index 65d5a4e..ca00dd2 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> @@ -36,8 +36,9 @@
>  #define AMDGPU_MAX_UVD_ENC_RINGS       2
>
>  /* some special values for the owner field */
> -#define AMDGPU_FENCE_OWNER_UNDEFINED   ((void*)0ul)
> -#define AMDGPU_FENCE_OWNER_VM          ((void*)1ul)
> +#define AMDGPU_FENCE_OWNER_UNDEFINED   ((void *)0ul)
> +#define AMDGPU_FENCE_OWNER_VM          ((void *)1ul)
> +#define AMDGPU_FENCE_OWNER_KFD         ((void *)2ul)
>
>  #define AMDGPU_FENCE_FLAG_64BIT         (1 << 0)
>  #define AMDGPU_FENCE_FLAG_INT           (1 << 1)
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
> index df65c66..b8d3b87 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
> @@ -31,6 +31,7 @@
>  #include <drm/drmP.h>
>  #include "amdgpu.h"
>  #include "amdgpu_trace.h"
> +#include "amdgpu_amdkfd.h"
>
>  struct amdgpu_sync_entry {
>         struct hlist_node       node;
> @@ -85,11 +86,20 @@ static bool amdgpu_sync_same_dev(struct amdgpu_device *adev,
>   */
>  static void *amdgpu_sync_get_owner(struct dma_fence *f)
>  {
> -       struct drm_sched_fence *s_fence = to_drm_sched_fence(f);
> +       struct drm_sched_fence *s_fence;
> +       struct amdgpu_amdkfd_fence *kfd_fence;
> +
> +       if (!f)
> +               return AMDGPU_FENCE_OWNER_UNDEFINED;
>
> +       s_fence = to_drm_sched_fence(f);
>         if (s_fence)
>                 return s_fence->owner;
>
> +       kfd_fence = to_amdgpu_amdkfd_fence(f);
> +       if (kfd_fence)
> +               return AMDGPU_FENCE_OWNER_KFD;
> +
>         return AMDGPU_FENCE_OWNER_UNDEFINED;
>  }
>
> @@ -204,11 +214,18 @@ int amdgpu_sync_resv(struct amdgpu_device *adev,
>         for (i = 0; i < flist->shared_count; ++i) {
>                 f = rcu_dereference_protected(flist->shared[i],
>                                               reservation_object_held(resv));
> +               /* We only want to trigger KFD eviction fences on
> +                * evict or move jobs. Skip KFD fences otherwise.
> +                */
> +               fence_owner = amdgpu_sync_get_owner(f);
> +               if (fence_owner == AMDGPU_FENCE_OWNER_KFD &&
> +                   owner != AMDGPU_FENCE_OWNER_UNDEFINED)
> +                       continue;
> +
>                 if (amdgpu_sync_same_dev(adev, f)) {
>                         /* VM updates are only interesting
>                          * for other VM updates and moves.
>                          */
> -                       fence_owner = amdgpu_sync_get_owner(f);
>                         if ((owner != AMDGPU_FENCE_OWNER_UNDEFINED) &&
>                             (fence_owner != AMDGPU_FENCE_OWNER_UNDEFINED) &&
>                             ((owner == AMDGPU_FENCE_OWNER_VM) !=
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index e4bb435..c3f33d3 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -46,6 +46,7 @@
>  #include "amdgpu.h"
>  #include "amdgpu_object.h"
>  #include "amdgpu_trace.h"
> +#include "amdgpu_amdkfd.h"
>  #include "bif/bif_4_1_d.h"
>
>  #define DRM_FILE_PAGE_OFFSET (0x100000000ULL >> PAGE_SHIFT)
> @@ -1170,6 +1171,23 @@ static bool amdgpu_ttm_bo_eviction_valuable(struct ttm_buffer_object *bo,
>  {
>         unsigned long num_pages = bo->mem.num_pages;
>         struct drm_mm_node *node = bo->mem.mm_node;
> +       struct reservation_object_list *flist;
> +       struct dma_fence *f;
> +       int i;
> +
> +       /* If bo is a KFD BO, check if the bo belongs to the current process.
> +        * If true, then return false as any KFD process needs all its BOs to
> +        * be resident to run successfully
> +        */
> +       flist = reservation_object_get_list(bo->resv);
> +       if (flist) {
> +               for (i = 0; i < flist->shared_count; ++i) {
> +                       f = rcu_dereference_protected(flist->shared[i],
> +                               reservation_object_held(bo->resv));
> +                       if (amd_kfd_fence_check_mm(f, current->mm))
> +                               return false;
> +               }
> +       }
>
>         switch (bo->mem.mem_type) {
>         case TTM_PL_TT:
> diff --git a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
> index 94eab548..9e35249 100644
> --- a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
> +++ b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
> @@ -30,6 +30,7 @@
>
>  #include <linux/types.h>
>  #include <linux/bitmap.h>
> +#include <linux/dma-fence.h>
>
>  struct pci_dev;
>
> @@ -286,6 +287,9 @@ struct kfd2kgd_calls {
>   *
>   * @resume: Notifies amdkfd about a resume action done to a kgd device
>   *
> + * @schedule_evict_and_restore_process: Schedules work queue that will prepare
> + * for safe eviction of KFD BOs that belong to the specified process.
> + *
>   * This structure contains function callback pointers so the kgd driver
>   * will notify to the amdkfd about certain status changes.
>   *
> @@ -300,6 +304,8 @@ struct kgd2kfd_calls {
>         void (*interrupt)(struct kfd_dev *kfd, const void *ih_ring_entry);
>         void (*suspend)(struct kfd_dev *kfd);
>         int (*resume)(struct kfd_dev *kfd);
> +       int (*schedule_evict_and_restore_process)(struct mm_struct *mm,
> +                       struct dma_fence *fence);
>  };
>
>  int kgd2kfd_init(unsigned interface_version,
> --
> 2.7.4
>

Hi Felix,
Do you object to me changing amd_kfd_ to amdkfd_ in the various
structures and functions?
So far, we don't have anything with an amd_kfd_ prefix, so I would
like to keep things consistent.

Other than that, this patch is:
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>


Oded
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
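
The patch above defines the fence itself but not the code that attaches
it to buffers; that arrives with the GPUVM memory management patches
later in the series. As a rough illustration of the intended use -- the
helper name and calling context below are assumptions, not taken from
this series -- the per-process fence would be added to each KFD BO's
shared fence list so that any TTM eviction attempt reaches
amd_kfd_fence_enable_signaling():

/* Illustrative sketch only, not part of this patch series. */
static int kfd_bo_attach_eviction_fence(struct amdgpu_bo *bo,
					struct amdgpu_amdkfd_fence *ef)
{
	int ret;

	/* Caller must hold the BO's reservation object */
	ret = reservation_object_reserve_shared(bo->tbo.resv);
	if (ret)
		return ret;

	/* From here on, evicting this BO triggers enable_signaling()
	 * on the process's eviction fence.
	 */
	reservation_object_add_shared_fence(bo->tbo.resv, &ef->base);
	return 0;
}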

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH 06/25] drm/amdgpu: Add KFD eviction fence
       [not found] ` <1517967174-21709-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
@ 2018-02-07  1:32   ` Felix Kuehling
       [not found]     ` <1517967174-21709-7-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 44+ messages in thread
From: Felix Kuehling @ 2018-02-07  1:32 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling, Harish Kasiviswanathan

This fence is used by KFD to keep memory resident while user mode
queues are enabled. Trying to evict memory will trigger the
enable_signaling callback, which starts a KFD eviction, which
involves preempting user mode queues before signaling the fence.
There is one such fence per process.

v2:
* Grab a reference to mm_struct
* Dereference fence after NULL check
* Simplify fence release, no need to signal without anyone waiting
* Added signed-off-by Harish, who is the original author of this code

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/Makefile              |   1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h       |  15 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c | 179 +++++++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h         |   5 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c         |  21 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c          |  18 +++
 drivers/gpu/drm/amd/include/kgd_kfd_interface.h  |   6 +
 7 files changed, 241 insertions(+), 4 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c

diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile
index d6e5b72..43dc3f9 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -130,6 +130,7 @@ amdgpu-y += \
 # add amdkfd interfaces
 amdgpu-y += \
 	 amdgpu_amdkfd.o \
+	 amdgpu_amdkfd_fence.o \
 	 amdgpu_amdkfd_gfx_v8.o
 
 # add cgs
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index 2a519f9..492c7af 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -29,6 +29,8 @@
 #include <linux/mmu_context.h>
 #include <kgd_kfd_interface.h>
 
+extern const struct kgd2kfd_calls *kgd2kfd;
+
 struct amdgpu_device;
 
 struct kgd_mem {
@@ -37,6 +39,19 @@ struct kgd_mem {
 	void *cpu_ptr;
 };
 
+/* KFD Memory Eviction */
+struct amdgpu_amdkfd_fence {
+	struct dma_fence base;
+	struct mm_struct *mm;
+	spinlock_t lock;
+	char timeline_name[TASK_COMM_LEN];
+};
+
+struct amdgpu_amdkfd_fence *amdgpu_amdkfd_fence_create(u64 context,
+						       struct mm_struct *mm);
+bool amd_kfd_fence_check_mm(struct dma_fence *f, struct mm_struct *mm);
+struct amdgpu_amdkfd_fence *to_amdgpu_amdkfd_fence(struct dma_fence *f);
+
 int amdgpu_amdkfd_init(void);
 void amdgpu_amdkfd_fini(void);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
new file mode 100644
index 0000000..cf2f1e9
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_fence.c
@@ -0,0 +1,179 @@
+/*
+ * Copyright 2016-2018 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#include <linux/dma-fence.h>
+#include <linux/spinlock.h>
+#include <linux/atomic.h>
+#include <linux/stacktrace.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/sched/mm.h>
+#include "amdgpu_amdkfd.h"
+
+const struct dma_fence_ops amd_kfd_fence_ops;
+static atomic_t fence_seq = ATOMIC_INIT(0);
+
+/* Eviction Fence
+ * Fence helper functions to deal with KFD memory eviction.
+ * Big Idea - Since KFD submissions are done by user queues, a BO cannot be
+ *  evicted unless all the user queues for that process are evicted.
+ *
+ * All the BOs in a process share an eviction fence. When process X wants
+ * to map VRAM memory but TTM can't find enough space, TTM will attempt to
+ * evict BOs from its LRU list. TTM checks if the BO is valuable to evict
+ * by calling ttm_bo_driver->eviction_valuable().
+ *
+ * ttm_bo_driver->eviction_valuable() - will return false if the BO belongs
+ *  to process X. Otherwise, it will return true to indicate BO can be
+ *  evicted by TTM.
+ *
+ * If ttm_bo_driver->eviction_valuable returns true, then TTM will continue
+ * the eviction process for that BO by calling ttm_bo_evict --> amdgpu_bo_move
+ * --> amdgpu_copy_buffer(). This sets up a job in the GPU scheduler.
+ *
+ * GPU Scheduler (amd_sched_main) - sets up a cb (fence_add_callback) to
+ *  notify when the BO is free to move. fence_add_callback --> enable_signaling
+ *  --> amdgpu_amdkfd_fence.enable_signaling
+ *
+ * amdgpu_amdkfd_fence.enable_signaling - Starts a work item that will
+ * quiesce user queues and signal the fence. The work item will also start
+ * another delayed work item to restore BOs
+ */
+
+struct amdgpu_amdkfd_fence *amdgpu_amdkfd_fence_create(u64 context,
+						       struct mm_struct *mm)
+{
+	struct amdgpu_amdkfd_fence *fence = NULL;
+
+	fence = kzalloc(sizeof(*fence), GFP_KERNEL);
+	if (fence == NULL)
+		return NULL;
+
+	/* This reference gets released in amd_kfd_fence_release */
+	mmgrab(mm);
+	fence->mm = mm;
+	get_task_comm(fence->timeline_name, current);
+	spin_lock_init(&fence->lock);
+
+	dma_fence_init(&fence->base, &amd_kfd_fence_ops, &fence->lock,
+		   context, atomic_inc_return(&fence_seq));
+
+	return fence;
+}
+
+struct amdgpu_amdkfd_fence *to_amdgpu_amdkfd_fence(struct dma_fence *f)
+{
+	struct amdgpu_amdkfd_fence *fence;
+
+	if (!f)
+		return NULL;
+
+	fence = container_of(f, struct amdgpu_amdkfd_fence, base);
+	if (fence && f->ops == &amd_kfd_fence_ops)
+		return fence;
+
+	return NULL;
+}
+
+static const char *amd_kfd_fence_get_driver_name(struct dma_fence *f)
+{
+	return "amdgpu_amdkfd_fence";
+}
+
+static const char *amd_kfd_fence_get_timeline_name(struct dma_fence *f)
+{
+	struct amdgpu_amdkfd_fence *fence = to_amdgpu_amdkfd_fence(f);
+
+	return fence->timeline_name;
+}
+
+/**
+ * amd_kfd_fence_enable_signaling - This gets called when TTM wants to evict
+ *  a KFD BO and schedules a job to move the BO.
+ *  If fence is already signaled return true.
+ *  If fence is not signaled schedule an evict KFD process work item.
+ */
+static bool amd_kfd_fence_enable_signaling(struct dma_fence *f)
+{
+	struct amdgpu_amdkfd_fence *fence = to_amdgpu_amdkfd_fence(f);
+
+	if (!fence)
+		return false;
+
+	if (dma_fence_is_signaled(f))
+		return true;
+
+	if (!kgd2kfd->schedule_evict_and_restore_process(fence->mm, f))
+		return true;
+
+	return false;
+}
+
+/**
+ * amd_kfd_fence_release - callback invoked when the fence can be freed
+ *
+ * @fence: fence
+ *
+ * This function is called when the reference count becomes zero.
+ * Drops the mm_struct reference and RCU schedules freeing up the fence.
+ */
+static void amd_kfd_fence_release(struct dma_fence *f)
+{
+	struct amdgpu_amdkfd_fence *fence = to_amdgpu_amdkfd_fence(f);
+
+	/* The process is getting terminated. No need to signal here;
+	 * just drop the mm reference and free the fence.
+	 */
+	if (WARN_ON(!fence))
+		return; /* Not an amdgpu_amdkfd_fence */
+
+	mmdrop(fence->mm);
+	kfree_rcu(f, rcu);
+}
+
+/**
+ * amd_kfd_fence_check_mm - Check if @mm is same as that of the fence @f
+ *  if same return TRUE else return FALSE.
+ *
+ * @f: [IN] fence
+ * @mm: [IN] mm that needs to be verified
+ */
+bool amd_kfd_fence_check_mm(struct dma_fence *f, struct mm_struct *mm)
+{
+	struct amdgpu_amdkfd_fence *fence = to_amdgpu_amdkfd_fence(f);
+
+	if (!fence)
+		return false;
+	else if (fence->mm == mm)
+		return true;
+
+	return false;
+}
+
+const struct dma_fence_ops amd_kfd_fence_ops = {
+	.get_driver_name = amd_kfd_fence_get_driver_name,
+	.get_timeline_name = amd_kfd_fence_get_timeline_name,
+	.enable_signaling = amd_kfd_fence_enable_signaling,
+	.signaled = NULL,
+	.wait = dma_fence_default_wait,
+	.release = amd_kfd_fence_release,
+};
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
index 65d5a4e..ca00dd2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -36,8 +36,9 @@
 #define AMDGPU_MAX_UVD_ENC_RINGS	2
 
 /* some special values for the owner field */
-#define AMDGPU_FENCE_OWNER_UNDEFINED	((void*)0ul)
-#define AMDGPU_FENCE_OWNER_VM		((void*)1ul)
+#define AMDGPU_FENCE_OWNER_UNDEFINED	((void *)0ul)
+#define AMDGPU_FENCE_OWNER_VM		((void *)1ul)
+#define AMDGPU_FENCE_OWNER_KFD		((void *)2ul)
 
 #define AMDGPU_FENCE_FLAG_64BIT         (1 << 0)
 #define AMDGPU_FENCE_FLAG_INT           (1 << 1)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
index df65c66..b8d3b87 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
@@ -31,6 +31,7 @@
 #include <drm/drmP.h>
 #include "amdgpu.h"
 #include "amdgpu_trace.h"
+#include "amdgpu_amdkfd.h"
 
 struct amdgpu_sync_entry {
 	struct hlist_node	node;
@@ -85,11 +86,20 @@ static bool amdgpu_sync_same_dev(struct amdgpu_device *adev,
  */
 static void *amdgpu_sync_get_owner(struct dma_fence *f)
 {
-	struct drm_sched_fence *s_fence = to_drm_sched_fence(f);
+	struct drm_sched_fence *s_fence;
+	struct amdgpu_amdkfd_fence *kfd_fence;
+
+	if (!f)
+		return AMDGPU_FENCE_OWNER_UNDEFINED;
 
+	s_fence = to_drm_sched_fence(f);
 	if (s_fence)
 		return s_fence->owner;
 
+	kfd_fence = to_amdgpu_amdkfd_fence(f);
+	if (kfd_fence)
+		return AMDGPU_FENCE_OWNER_KFD;
+
 	return AMDGPU_FENCE_OWNER_UNDEFINED;
 }
 
@@ -204,11 +214,18 @@ int amdgpu_sync_resv(struct amdgpu_device *adev,
 	for (i = 0; i < flist->shared_count; ++i) {
 		f = rcu_dereference_protected(flist->shared[i],
 					      reservation_object_held(resv));
+		/* We only want to trigger KFD eviction fences on
+		 * evict or move jobs. Skip KFD fences otherwise.
+		 */
+		fence_owner = amdgpu_sync_get_owner(f);
+		if (fence_owner == AMDGPU_FENCE_OWNER_KFD &&
+		    owner != AMDGPU_FENCE_OWNER_UNDEFINED)
+			continue;
+
 		if (amdgpu_sync_same_dev(adev, f)) {
 			/* VM updates are only interesting
 			 * for other VM updates and moves.
 			 */
-			fence_owner = amdgpu_sync_get_owner(f);
 			if ((owner != AMDGPU_FENCE_OWNER_UNDEFINED) &&
 			    (fence_owner != AMDGPU_FENCE_OWNER_UNDEFINED) &&
 			    ((owner == AMDGPU_FENCE_OWNER_VM) !=
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index e4bb435..c3f33d3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -46,6 +46,7 @@
 #include "amdgpu.h"
 #include "amdgpu_object.h"
 #include "amdgpu_trace.h"
+#include "amdgpu_amdkfd.h"
 #include "bif/bif_4_1_d.h"
 
 #define DRM_FILE_PAGE_OFFSET (0x100000000ULL >> PAGE_SHIFT)
@@ -1170,6 +1171,23 @@ static bool amdgpu_ttm_bo_eviction_valuable(struct ttm_buffer_object *bo,
 {
 	unsigned long num_pages = bo->mem.num_pages;
 	struct drm_mm_node *node = bo->mem.mm_node;
+	struct reservation_object_list *flist;
+	struct dma_fence *f;
+	int i;
+
+	/* If bo is a KFD BO, check if the bo belongs to the current process.
+	 * If true, then return false as any KFD process needs all its BOs to
+	 * be resident to run successfully
+	 */
+	flist = reservation_object_get_list(bo->resv);
+	if (flist) {
+		for (i = 0; i < flist->shared_count; ++i) {
+			f = rcu_dereference_protected(flist->shared[i],
+				reservation_object_held(bo->resv));
+			if (amd_kfd_fence_check_mm(f, current->mm))
+				return false;
+		}
+	}
 
 	switch (bo->mem.mem_type) {
 	case TTM_PL_TT:
diff --git a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
index 94eab548..9e35249 100644
--- a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
+++ b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
@@ -30,6 +30,7 @@
 
 #include <linux/types.h>
 #include <linux/bitmap.h>
+#include <linux/dma-fence.h>
 
 struct pci_dev;
 
@@ -286,6 +287,9 @@ struct kfd2kgd_calls {
  *
  * @resume: Notifies amdkfd about a resume action done to a kgd device
  *
+ * @schedule_evict_and_restore_process: Schedules work queue that will prepare
+ * for safe eviction of KFD BOs that belong to the specified process.
+ *
  * This structure contains function callback pointers so the kgd driver
  * will notify to the amdkfd about certain status changes.
  *
@@ -300,6 +304,8 @@ struct kgd2kfd_calls {
 	void (*interrupt)(struct kfd_dev *kfd, const void *ih_ring_entry);
 	void (*suspend)(struct kfd_dev *kfd);
 	int (*resume)(struct kfd_dev *kfd);
+	int (*schedule_evict_and_restore_process)(struct mm_struct *mm,
+			struct dma_fence *fence);
 };
 
 int kgd2kfd_init(unsigned interface_version,
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
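
The amdkfd side of the new schedule_evict_and_restore_process callback
is implemented in the "Implement KFD process eviction/restore" patch of
this series. Schematically -- the body below is an illustration, not
the actual implementation -- it looks up the KFD process by its mm and
queues the eviction work, which quiesces the user queues, signals the
fence and schedules a delayed restore:

/* Schematic sketch of the callback's consumer side; helper and field
 * names are assumptions for illustration.
 */
static int kfd_schedule_evict_and_restore_process(struct mm_struct *mm,
						  struct dma_fence *fence)
{
	struct kfd_process *p;

	if (!fence || dma_fence_is_signaled(fence))
		return -EINVAL;

	p = kfd_lookup_process_by_mm(mm);	/* assumed helper */
	if (!p)
		return -ENODEV;

	/* Quiesce user queues soon; the work item schedules the
	 * delayed restore itself.
	 */
	schedule_delayed_work(&p->eviction_work, 0);
	return 0;
}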

^ permalink raw reply related	[flat|nested] 44+ messages in thread

end of thread, other threads:[~2018-02-12 19:19 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-01-27  1:09 [PATCH 00/25] Add KFD GPUVM support for dGPUs Felix Kuehling
     [not found] ` <1517015381-1080-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
2018-01-27  1:09   ` [PATCH 01/25] drm/amdgpu: remove useless BUG_ONs Felix Kuehling
2018-01-27  1:09   ` [PATCH 02/25] drm/amdgpu: Replace kgd_mem with amdgpu_bo for kernel pinned gtt mem Felix Kuehling
2018-01-27  1:09   ` [PATCH 03/25] drm/amdgpu: Fix header file dependencies Felix Kuehling
2018-01-27  1:09   ` [PATCH 04/25] drm/amdgpu: Fix wrong mask in get_atc_vmid_pasid_mapping_pasid Felix Kuehling
2018-01-27  1:09   ` [PATCH 05/25] drm/amdgpu: Remove unused kfd2kgd interface Felix Kuehling
2018-01-27  1:09   ` [PATCH 06/25] drm/amdgpu: Add KFD eviction fence Felix Kuehling
     [not found]     ` <1517015381-1080-7-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
2018-01-27  9:16       ` Christian König
     [not found]         ` <11f5f33b-0c0e-44c2-5be9-5d0d25204c2e-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2018-01-28 23:42           ` Felix Kuehling
     [not found]             ` <05cc2831-a338-ddae-42c5-8be381787a5e-5C7GfCeVMHo@public.gmane.org>
2018-01-28 23:55               ` Felix Kuehling
     [not found]                 ` <9697c103-f6cd-b7c9-a0a1-5f9ff080f789-5C7GfCeVMHo@public.gmane.org>
2018-01-29 13:43                   ` Christian König
     [not found]                     ` <fa409dd6-6a4e-ea4b-6570-9b16ed4cb4a4-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2018-01-29 19:39                       ` Felix Kuehling
     [not found]                         ` <d864b662-2212-4aa6-2dac-f0ee3157681e-5C7GfCeVMHo@public.gmane.org>
2018-01-30 15:28                           ` Kasiviswanathan, Harish
     [not found]                             ` <DM3PR1201MB103814597E6C7DB4632E278E8CE40-BBcFnVpqZhWjUUTFdQAMQmrFom/aUZj6nBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>
2018-01-30 15:35                               ` Christian König
     [not found]                                 ` <a1c8d096-4cf5-0f36-b0d1-8ed705ba7fb2-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2018-01-30 23:21                                   ` Felix Kuehling
     [not found]                                     ` <cffd64a9-7222-7f9f-4fe8-e37972de9fd9-5C7GfCeVMHo@public.gmane.org>
2018-01-31  8:09                                       ` Christian König
2018-01-27  1:09   ` [PATCH 07/25] drm/amdgpu: Update kgd2kfd_shared_resources for dGPU support Felix Kuehling
     [not found]     ` <1517015381-1080-8-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
2018-01-27  9:19       ` Christian König
     [not found]         ` <de92f17a-5278-1b55-2a22-af17a82f7471-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2018-01-28 23:02           ` Felix Kuehling
     [not found]             ` <7425b235-e354-e9b7-0b83-623d9148c61b-5C7GfCeVMHo@public.gmane.org>
2018-01-29 11:42               ` Christian König
     [not found]                 ` <37bf2205-ca7c-f441-1759-48f2d854dea5-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2018-01-29 20:25                   ` Felix Kuehling
     [not found]                     ` <4aefa3bc-66b3-5e39-26e6-cc7c1e66adbd-5C7GfCeVMHo@public.gmane.org>
2018-01-30  9:13                       ` Christian König
2018-01-27  1:09   ` [PATCH 08/25] drm/amdgpu: add amdgpu_sync_clone Felix Kuehling
2018-01-27  1:09   ` [PATCH 09/25] drm/amdgpu: Add GPUVM memory management functions for KFD Felix Kuehling
2018-01-27  1:09   ` [PATCH 10/25] drm/amdgpu: Add submit IB function " Felix Kuehling
2018-01-27  1:09   ` [PATCH 11/25] drm/amdkfd: Add missing #ifdef CONFIG_AMD_IOMMU_V2 guard Felix Kuehling
2018-01-27  1:09   ` [PATCH 12/25] drm/amdkfd: Use per-device sched_policy Felix Kuehling
2018-01-27  1:09   ` [PATCH 13/25] drm/amdkfd: Remove unaligned memory access Felix Kuehling
2018-01-27  1:09   ` [PATCH 14/25] drm/amdkfd: Populate DRM render device minor Felix Kuehling
2018-01-27  1:09   ` [PATCH 15/25] drm/amdkfd: Add GPUVM virtual address space to PDD Felix Kuehling
2018-01-27  1:09   ` [PATCH 16/25] drm/amdkfd: Implement KFD process eviction/restore Felix Kuehling
2018-01-27  1:09   ` [PATCH 17/25] uapi: Fix type used in ioctl parameter structures Felix Kuehling
2018-01-27  1:09   ` [PATCH 18/25] drm/amdkfd: Remove limit on number of GPUs Felix Kuehling
2018-01-27  1:09   ` [PATCH 19/25] drm/amdkfd: Aperture setup for dGPUs Felix Kuehling
2018-01-27  1:09   ` [PATCH 20/25] drm/amdkfd: Add per-process IDR for buffer handles Felix Kuehling
2018-01-27  1:09   ` [PATCH 21/25] drm/amdkfd: Allocate CWSR trap handler memory for dGPUs Felix Kuehling
2018-01-27  1:09   ` [PATCH 22/25] drm/amdkfd: Add TC flush on VMID deallocation for Hawaii Felix Kuehling
2018-01-27  1:09   ` [PATCH 23/25] drm/amdkfd: Add ioctls for GPUVM memory management Felix Kuehling
2018-01-27  1:09   ` [PATCH 24/25] drm/amdkfd: Kmap event page for dGPUs Felix Kuehling
2018-01-27  1:09   ` [PATCH 25/25] drm/amdkfd: Add module option for testing large-BAR functionality Felix Kuehling
2018-01-27  9:08   ` [PATCH 00/25] Add KFD GPUVM support for dGPUs Christian König
2018-02-07  1:32 [PATCH 00/25] Add KFD GPUVM support for dGPUs v2 Felix Kuehling
     [not found] ` <1517967174-21709-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
2018-02-07  1:32   ` [PATCH 06/25] drm/amdgpu: Add KFD eviction fence Felix Kuehling
     [not found]     ` <1517967174-21709-7-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
2018-02-11 12:42       ` Oded Gabbay
2018-02-12 19:19         ` Felix Kuehling
