All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 00/20] Add KFD GPUVM support for dGPUs v4
@ 2018-03-15 21:27 Felix Kuehling
       [not found] ` <1521149279-31892-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 25+ messages in thread
From: Felix Kuehling @ 2018-03-15 21:27 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling

Rebased and integrated review feedback from v3:
* Removed vm->vm_context field
* Use uninterruptible waiting in initial PD validation to avoid ERESTARTSYS
* Return number of successful map/unmap operations in failure cases
* Facilitate partial retry after failed map/unmap
* Added comments with parameter descriptions to new APIs
* Defined AMDKFD_IOC_FREE_MEMORY_OF_GPU write-only

This patch series also adds Userptr support in patches 15-20.

Felix Kuehling (19):
  drm/amdgpu: Move KFD-specific fields into struct amdgpu_vm
  drm/amdgpu: Fix initial validation of PD BO for KFD VMs
  drm/amdgpu: Add helper to turn an existing VM into a compute VM
  drm/amdgpu: Add kfd2kgd interface to acquire an existing VM
  drm/amdkfd: Create KFD VMs on demand
  drm/amdkfd: Remove limit on number of GPUs
  drm/amdkfd: Aperture setup for dGPUs
  drm/amdkfd: Add per-process IDR for buffer handles
  drm/amdkfd: Allocate CWSR trap handler memory for dGPUs
  drm/amdkfd: Add TC flush on VMID deallocation for Hawaii
  drm/amdkfd: Add ioctls for GPUVM memory management
  drm/amdkfd: Kmap event page for dGPUs
  drm/amdkfd: Add module option for testing large-BAR functionality
  drm/amdgpu: Add MMU notifier type for KFD userptr
  drm/amdgpu: Enable amdgpu_ttm_tt_get_user_pages in worker threads
  drm/amdgpu: GFP_NOIO while holding locks taken in MMU notifier
  drm/amdkfd: GFP_NOIO while holding locks taken in MMU notifier
  drm/amdkfd: Add quiesce_mm and resume_mm to kgd2kfd_calls
  drm/amdgpu: Add userptr support for KFD

Oak Zeng (1):
  drm/amdkfd: Populate DRM render device minor

 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h         |  37 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c  |   1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c  |   1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c   | 818 ++++++++++++++++++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c             |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c             |  96 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_mn.h             |  11 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c            |  30 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c             |  70 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h             |  10 +
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c           | 532 ++++++++++++++
 drivers/gpu/drm/amd/amdkfd/kfd_crat.c              |   3 +
 drivers/gpu/drm/amd/amdkfd/kfd_device.c            |  40 +-
 .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c  |  22 +-
 drivers/gpu/drm/amd/amdkfd/kfd_events.c            |  31 +-
 drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c       |  59 +-
 drivers/gpu/drm/amd/amdkfd/kfd_module.c            |   7 +
 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c   |   2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c    |   2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c    |  37 +
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h              |  41 ++
 drivers/gpu/drm/amd/amdkfd/kfd_process.c           | 314 +++++++-
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c          |   4 +
 drivers/gpu/drm/amd/amdkfd/kfd_topology.h          |   1 +
 drivers/gpu/drm/amd/include/kgd_kfd_interface.h    |  10 +
 include/uapi/linux/kfd_ioctl.h                     | 122 ++-
 26 files changed, 2090 insertions(+), 213 deletions(-)

-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 01/20] drm/amdgpu: Move KFD-specific fields into struct amdgpu_vm
       [not found] ` <1521149279-31892-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
@ 2018-03-15 21:27   ` Felix Kuehling
  2018-03-15 21:27   ` [PATCH 02/20] drm/amdgpu: Fix initial validation of PD BO for KFD VMs Felix Kuehling
                     ` (19 subsequent siblings)
  20 siblings, 0 replies; 25+ messages in thread
From: Felix Kuehling @ 2018-03-15 21:27 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling

Remove struct amdkfd_vm and move the fields into struct amdgpu_vm.
This will allow turning a VM created by a DRM render node into a
KFD VM.

v2: Removed vm_context field

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h       | 21 ------
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 88 +++++++++++-------------
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h           |  9 +++
 3 files changed, 50 insertions(+), 68 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index ad7292f..6ba3dd6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -93,27 +93,6 @@ struct amdkfd_process_info {
 	struct amdgpu_amdkfd_fence *eviction_fence;
 };
 
-/* struct amdkfd_vm -
- * For Memory Eviction KGD requires a mechanism to keep track of all KFD BOs
- * belonging to a KFD process. All the VMs belonging to the same process point
- * to the same amdkfd_process_info.
- */
-struct amdkfd_vm {
-	/* Keep base as the first parameter for pointer compatibility between
-	 * amdkfd_vm and amdgpu_vm.
-	 */
-	struct amdgpu_vm base;
-
-	/* List node in amdkfd_process_info.vm_list_head*/
-	struct list_head vm_list_node;
-
-	struct amdgpu_device *adev;
-	/* Points to the KFD process VM info*/
-	struct amdkfd_process_info *process_info;
-
-	uint64_t pd_phys_addr;
-};
-
 int amdgpu_amdkfd_init(void);
 void amdgpu_amdkfd_fini(void);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index e0371a9..d712809 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -333,9 +333,9 @@ static int amdgpu_amdkfd_validate(void *param, struct amdgpu_bo *bo)
  * again. Page directories are only updated after updating page
  * tables.
  */
-static int vm_validate_pt_pd_bos(struct amdkfd_vm *vm)
+static int vm_validate_pt_pd_bos(struct amdgpu_vm *vm)
 {
-	struct amdgpu_bo *pd = vm->base.root.base.bo;
+	struct amdgpu_bo *pd = vm->root.base.bo;
 	struct amdgpu_device *adev = amdgpu_ttm_adev(pd->tbo.bdev);
 	struct amdgpu_vm_parser param;
 	uint64_t addr, flags = AMDGPU_PTE_VALID;
@@ -344,7 +344,7 @@ static int vm_validate_pt_pd_bos(struct amdkfd_vm *vm)
 	param.domain = AMDGPU_GEM_DOMAIN_VRAM;
 	param.wait = false;
 
-	ret = amdgpu_vm_validate_pt_bos(adev, &vm->base, amdgpu_amdkfd_validate,
+	ret = amdgpu_vm_validate_pt_bos(adev, vm, amdgpu_amdkfd_validate,
 					&param);
 	if (ret) {
 		pr_err("amdgpu: failed to validate PT BOs\n");
@@ -357,11 +357,11 @@ static int vm_validate_pt_pd_bos(struct amdkfd_vm *vm)
 		return ret;
 	}
 
-	addr = amdgpu_bo_gpu_offset(vm->base.root.base.bo);
+	addr = amdgpu_bo_gpu_offset(vm->root.base.bo);
 	amdgpu_gmc_get_vm_pde(adev, -1, &addr, &flags);
 	vm->pd_phys_addr = addr;
 
-	if (vm->base.use_cpu_for_update) {
+	if (vm->use_cpu_for_update) {
 		ret = amdgpu_bo_kmap(pd, NULL);
 		if (ret) {
 			pr_err("amdgpu: failed to kmap PD, ret=%d\n", ret);
@@ -415,14 +415,12 @@ static int vm_update_pds(struct amdgpu_vm *vm, struct amdgpu_sync *sync)
  * 4a.  Validate new page tables and directories
  */
 static int add_bo_to_vm(struct amdgpu_device *adev, struct kgd_mem *mem,
-		struct amdgpu_vm *avm, bool is_aql,
+		struct amdgpu_vm *vm, bool is_aql,
 		struct kfd_bo_va_list **p_bo_va_entry)
 {
 	int ret;
 	struct kfd_bo_va_list *bo_va_entry;
-	struct amdkfd_vm *kvm = container_of(avm,
-					     struct amdkfd_vm, base);
-	struct amdgpu_bo *pd = avm->root.base.bo;
+	struct amdgpu_bo *pd = vm->root.base.bo;
 	struct amdgpu_bo *bo = mem->bo;
 	uint64_t va = mem->va;
 	struct list_head *list_bo_va = &mem->bo_va_list;
@@ -441,10 +439,10 @@ static int add_bo_to_vm(struct amdgpu_device *adev, struct kgd_mem *mem,
 		return -ENOMEM;
 
 	pr_debug("\t add VA 0x%llx - 0x%llx to vm %p\n", va,
-			va + bo_size, avm);
+			va + bo_size, vm);
 
 	/* Add BO to VM internal data structures*/
-	bo_va_entry->bo_va = amdgpu_vm_bo_add(adev, avm, bo);
+	bo_va_entry->bo_va = amdgpu_vm_bo_add(adev, vm, bo);
 	if (!bo_va_entry->bo_va) {
 		ret = -EINVAL;
 		pr_err("Failed to add BO object to VM. ret == %d\n",
@@ -467,28 +465,28 @@ static int add_bo_to_vm(struct amdgpu_device *adev, struct kgd_mem *mem,
 	 * fence, so remove it temporarily.
 	 */
 	amdgpu_amdkfd_remove_eviction_fence(pd,
-					kvm->process_info->eviction_fence,
+					vm->process_info->eviction_fence,
 					NULL, NULL);
 
-	ret = amdgpu_vm_alloc_pts(adev, avm, va, amdgpu_bo_size(bo));
+	ret = amdgpu_vm_alloc_pts(adev, vm, va, amdgpu_bo_size(bo));
 	if (ret) {
 		pr_err("Failed to allocate pts, err=%d\n", ret);
 		goto err_alloc_pts;
 	}
 
-	ret = vm_validate_pt_pd_bos(kvm);
+	ret = vm_validate_pt_pd_bos(vm);
 	if (ret) {
 		pr_err("validate_pt_pd_bos() failed\n");
 		goto err_alloc_pts;
 	}
 
 	/* Add the eviction fence back */
-	amdgpu_bo_fence(pd, &kvm->process_info->eviction_fence->base, true);
+	amdgpu_bo_fence(pd, &vm->process_info->eviction_fence->base, true);
 
 	return 0;
 
 err_alloc_pts:
-	amdgpu_bo_fence(pd, &kvm->process_info->eviction_fence->base, true);
+	amdgpu_bo_fence(pd, &vm->process_info->eviction_fence->base, true);
 	amdgpu_vm_bo_rmv(adev, bo_va_entry->bo_va);
 	list_del(&bo_va_entry->bo_list);
 err_vmadd:
@@ -703,7 +701,6 @@ static int unmap_bo_from_gpuvm(struct amdgpu_device *adev,
 {
 	struct amdgpu_bo_va *bo_va = entry->bo_va;
 	struct amdgpu_vm *vm = bo_va->base.vm;
-	struct amdkfd_vm *kvm = container_of(vm, struct amdkfd_vm, base);
 	struct amdgpu_bo *pd = vm->root.base.bo;
 
 	/* Remove eviction fence from PD (and thereby from PTs too as
@@ -713,14 +710,14 @@ static int unmap_bo_from_gpuvm(struct amdgpu_device *adev,
 	 * trigger the eviction fence.
 	 */
 	amdgpu_amdkfd_remove_eviction_fence(pd,
-					    kvm->process_info->eviction_fence,
+					    vm->process_info->eviction_fence,
 					    NULL, NULL);
 	amdgpu_vm_bo_unmap(adev, bo_va, entry->va);
 
 	amdgpu_vm_clear_freed(adev, vm, &bo_va->last_pt_update);
 
 	/* Add the eviction fence back */
-	amdgpu_bo_fence(pd, &kvm->process_info->eviction_fence->base, true);
+	amdgpu_bo_fence(pd, &vm->process_info->eviction_fence->base, true);
 
 	sync_vm_fence(adev, sync, bo_va->last_pt_update);
 
@@ -780,7 +777,7 @@ static int map_bo_to_gpuvm(struct amdgpu_device *adev,
 
 static int process_validate_vms(struct amdkfd_process_info *process_info)
 {
-	struct amdkfd_vm *peer_vm;
+	struct amdgpu_vm *peer_vm;
 	int ret;
 
 	list_for_each_entry(peer_vm, &process_info->vm_list_head,
@@ -796,12 +793,12 @@ static int process_validate_vms(struct amdkfd_process_info *process_info)
 static int process_update_pds(struct amdkfd_process_info *process_info,
 			      struct amdgpu_sync *sync)
 {
-	struct amdkfd_vm *peer_vm;
+	struct amdgpu_vm *peer_vm;
 	int ret;
 
 	list_for_each_entry(peer_vm, &process_info->vm_list_head,
 			    vm_list_node) {
-		ret = vm_update_pds(&peer_vm->base, sync);
+		ret = vm_update_pds(peer_vm, sync);
 		if (ret)
 			return ret;
 	}
@@ -814,7 +811,7 @@ int amdgpu_amdkfd_gpuvm_create_process_vm(struct kgd_dev *kgd, void **vm,
 					  struct dma_fence **ef)
 {
 	int ret;
-	struct amdkfd_vm *new_vm;
+	struct amdgpu_vm *new_vm;
 	struct amdkfd_process_info *info;
 	struct amdgpu_device *adev = get_amdgpu_device(kgd);
 
@@ -823,12 +820,11 @@ int amdgpu_amdkfd_gpuvm_create_process_vm(struct kgd_dev *kgd, void **vm,
 		return -ENOMEM;
 
 	/* Initialize the VM context, allocate the page directory and zero it */
-	ret = amdgpu_vm_init(adev, &new_vm->base, AMDGPU_VM_CONTEXT_COMPUTE, 0);
+	ret = amdgpu_vm_init(adev, new_vm, AMDGPU_VM_CONTEXT_COMPUTE, 0);
 	if (ret) {
 		pr_err("Failed init vm ret %d\n", ret);
 		goto vm_init_fail;
 	}
-	new_vm->adev = adev;
 
 	if (!*process_info) {
 		info = kzalloc(sizeof(*info), GFP_KERNEL);
@@ -871,7 +867,7 @@ int amdgpu_amdkfd_gpuvm_create_process_vm(struct kgd_dev *kgd, void **vm,
 	mutex_destroy(&info->lock);
 	kfree(info);
 alloc_process_info_fail:
-	amdgpu_vm_fini(adev, &new_vm->base);
+	amdgpu_vm_fini(adev, new_vm);
 vm_init_fail:
 	kfree(new_vm);
 	return ret;
@@ -881,8 +877,7 @@ int amdgpu_amdkfd_gpuvm_create_process_vm(struct kgd_dev *kgd, void **vm,
 void amdgpu_amdkfd_gpuvm_destroy_process_vm(struct kgd_dev *kgd, void *vm)
 {
 	struct amdgpu_device *adev = get_amdgpu_device(kgd);
-	struct amdkfd_vm *kfd_vm = (struct amdkfd_vm *) vm;
-	struct amdgpu_vm *avm = &kfd_vm->base;
+	struct amdgpu_vm *avm = (struct amdgpu_vm *)vm;
 	struct amdgpu_bo *pd;
 	struct amdkfd_process_info *process_info;
 
@@ -896,11 +891,11 @@ void amdgpu_amdkfd_gpuvm_destroy_process_vm(struct kgd_dev *kgd, void *vm)
 	amdgpu_bo_fence(pd, NULL, false);
 	amdgpu_bo_unreserve(pd);
 
-	process_info = kfd_vm->process_info;
+	process_info = avm->process_info;
 
 	mutex_lock(&process_info->lock);
 	process_info->n_vms--;
-	list_del(&kfd_vm->vm_list_node);
+	list_del(&avm->vm_list_node);
 	mutex_unlock(&process_info->lock);
 
 	/* Release per-process resources */
@@ -919,7 +914,7 @@ void amdgpu_amdkfd_gpuvm_destroy_process_vm(struct kgd_dev *kgd, void *vm)
 
 uint32_t amdgpu_amdkfd_gpuvm_get_process_page_dir(void *vm)
 {
-	struct amdkfd_vm *avm = (struct amdkfd_vm *)vm;
+	struct amdgpu_vm *avm = (struct amdgpu_vm *)vm;
 
 	return avm->pd_phys_addr >> AMDGPU_GPU_PAGE_SHIFT;
 }
@@ -930,7 +925,7 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
 		uint64_t *offset, uint32_t flags)
 {
 	struct amdgpu_device *adev = get_amdgpu_device(kgd);
-	struct amdkfd_vm *kfd_vm = (struct amdkfd_vm *)vm;
+	struct amdgpu_vm *avm = (struct amdgpu_vm *)vm;
 	struct amdgpu_bo *bo;
 	int byte_align;
 	u32 alloc_domain;
@@ -1010,8 +1005,8 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
 	(*mem)->va = va;
 	(*mem)->domain = alloc_domain;
 	(*mem)->mapped_to_gpu_memory = 0;
-	(*mem)->process_info = kfd_vm->process_info;
-	add_kgd_mem_to_kfd_bo_list(*mem, kfd_vm->process_info);
+	(*mem)->process_info = avm->process_info;
+	add_kgd_mem_to_kfd_bo_list(*mem, avm->process_info);
 
 	if (offset)
 		*offset = amdgpu_bo_mmap_offset(bo);
@@ -1092,7 +1087,7 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
 		struct kgd_dev *kgd, struct kgd_mem *mem, void *vm)
 {
 	struct amdgpu_device *adev = get_amdgpu_device(kgd);
-	struct amdkfd_vm *kfd_vm = (struct amdkfd_vm *)vm;
+	struct amdgpu_vm *avm = (struct amdgpu_vm *)vm;
 	int ret;
 	struct amdgpu_bo *bo;
 	uint32_t domain;
@@ -1128,19 +1123,19 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
 	if (unlikely(ret))
 		goto out;
 
-	if (check_if_add_bo_to_vm((struct amdgpu_vm *)vm, mem)) {
-		ret = add_bo_to_vm(adev, mem, (struct amdgpu_vm *)vm, false,
+	if (check_if_add_bo_to_vm(avm, mem)) {
+		ret = add_bo_to_vm(adev, mem, avm, false,
 				&bo_va_entry);
 		if (ret)
 			goto add_bo_to_vm_failed;
 		if (mem->aql_queue) {
-			ret = add_bo_to_vm(adev, mem, (struct amdgpu_vm *)vm,
+			ret = add_bo_to_vm(adev, mem, avm,
 					true, &bo_va_entry_aql);
 			if (ret)
 				goto add_bo_to_vm_failed_aql;
 		}
 	} else {
-		ret = vm_validate_pt_pd_bos((struct amdkfd_vm *)vm);
+		ret = vm_validate_pt_pd_bos(avm);
 		if (unlikely(ret))
 			goto add_bo_to_vm_failed;
 	}
@@ -1184,7 +1179,7 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
 
 	if (!amdgpu_ttm_tt_get_usermm(bo->tbo.ttm) && !bo->pin_count)
 		amdgpu_bo_fence(bo,
-				&kfd_vm->process_info->eviction_fence->base,
+				&avm->process_info->eviction_fence->base,
 				true);
 	ret = unreserve_bo_and_vms(&ctx, false, false);
 
@@ -1209,7 +1204,7 @@ int amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu(
 {
 	struct amdgpu_device *adev = get_amdgpu_device(kgd);
 	struct amdkfd_process_info *process_info =
-		((struct amdkfd_vm *)vm)->process_info;
+		((struct amdgpu_vm *)vm)->process_info;
 	unsigned long bo_size = mem->bo->tbo.mem.size;
 	struct kfd_bo_va_list *entry;
 	struct bo_vm_reservation_context ctx;
@@ -1226,7 +1221,7 @@ int amdgpu_amdkfd_gpuvm_unmap_memory_from_gpu(
 		goto unreserve_out;
 	}
 
-	ret = vm_validate_pt_pd_bos((struct amdkfd_vm *)vm);
+	ret = vm_validate_pt_pd_bos((struct amdgpu_vm *)vm);
 	if (unlikely(ret))
 		goto unreserve_out;
 
@@ -1368,7 +1363,7 @@ int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence **ef)
 {
 	struct amdgpu_bo_list_entry *pd_bo_list;
 	struct amdkfd_process_info *process_info = info;
-	struct amdkfd_vm *peer_vm;
+	struct amdgpu_vm *peer_vm;
 	struct kgd_mem *mem;
 	struct bo_vm_reservation_context ctx;
 	struct amdgpu_amdkfd_fence *new_fence;
@@ -1390,8 +1385,7 @@ int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence **ef)
 	mutex_lock(&process_info->lock);
 	list_for_each_entry(peer_vm, &process_info->vm_list_head,
 			vm_list_node)
-		amdgpu_vm_get_pd_bo(&peer_vm->base, &ctx.list,
-				    &pd_bo_list[i++]);
+		amdgpu_vm_get_pd_bo(peer_vm, &ctx.list, &pd_bo_list[i++]);
 
 	/* Reserve all BOs and page tables/directory. Add all BOs from
 	 * kfd_bo_list to ctx.list
@@ -1422,7 +1416,7 @@ int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence **ef)
 	/* FIXME: I think this isn't needed */
 	list_for_each_entry(peer_vm, &process_info->vm_list_head,
 			    vm_list_node) {
-		struct amdgpu_bo *bo = peer_vm->base.root.base.bo;
+		struct amdgpu_bo *bo = peer_vm->root.base.bo;
 
 		ttm_bo_wait(&bo->tbo, false, false);
 	}
@@ -1491,7 +1485,7 @@ int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence **ef)
 	/* Attach eviction fence to PD / PT BOs */
 	list_for_each_entry(peer_vm, &process_info->vm_list_head,
 			    vm_list_node) {
-		struct amdgpu_bo *bo = peer_vm->base.root.base.bo;
+		struct amdgpu_bo *bo = peer_vm->root.base.bo;
 
 		amdgpu_bo_fence(bo, &process_info->eviction_fence->base, true);
 	}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index e984151..4dfdefb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -207,6 +207,15 @@ struct amdgpu_vm {
 
 	/* Limit non-retry fault storms */
 	unsigned int		fault_credit;
+
+	/* Points to the KFD process VM info */
+	struct amdkfd_process_info *process_info;
+
+	/* List node in amdkfd_process_info.vm_list_head */
+	struct list_head	vm_list_node;
+
+	/* Valid while the PD is reserved or fenced */
+	uint64_t		pd_phys_addr;
 };
 
 struct amdgpu_vm_manager {
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 02/20] drm/amdgpu: Fix initial validation of PD BO for KFD VMs
       [not found] ` <1521149279-31892-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
  2018-03-15 21:27   ` [PATCH 01/20] drm/amdgpu: Move KFD-specific fields into struct amdgpu_vm Felix Kuehling
@ 2018-03-15 21:27   ` Felix Kuehling
  2018-03-15 21:27   ` [PATCH 03/20] drm/amdgpu: Add helper to turn an existing VM into a compute VM Felix Kuehling
                     ` (18 subsequent siblings)
  20 siblings, 0 replies; 25+ messages in thread
From: Felix Kuehling @ 2018-03-15 21:27 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling

Make sure the PD BO is valid and attach the eviction fence during VM
creation. This ensures that the pd_phys_address is actually valid
and an eviction that would invalidate it triggers a KFD process
eviction like it should.

v2: Use uninterruptible waiting in initial PD validation

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index d712809..c578107 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -812,7 +812,7 @@ int amdgpu_amdkfd_gpuvm_create_process_vm(struct kgd_dev *kgd, void **vm,
 {
 	int ret;
 	struct amdgpu_vm *new_vm;
-	struct amdkfd_process_info *info;
+	struct amdkfd_process_info *info = NULL;
 	struct amdgpu_device *adev = get_amdgpu_device(kgd);
 
 	new_vm = kzalloc(sizeof(*new_vm), GFP_KERNEL);
@@ -851,6 +851,23 @@ int amdgpu_amdkfd_gpuvm_create_process_vm(struct kgd_dev *kgd, void **vm,
 
 	new_vm->process_info = *process_info;
 
+	/* Validate page directory and attach eviction fence */
+	ret = amdgpu_bo_reserve(new_vm->root.base.bo, true);
+	if (ret)
+		goto reserve_pd_fail;
+	ret = vm_validate_pt_pd_bos(new_vm);
+	if (ret) {
+		pr_err("validate_pt_pd_bos() failed\n");
+		goto validate_pd_fail;
+	}
+	ret = ttm_bo_wait(&new_vm->root.base.bo->tbo, false, false);
+	if (ret)
+		goto wait_pd_fail;
+	amdgpu_bo_fence(new_vm->root.base.bo,
+			&new_vm->process_info->eviction_fence->base, true);
+	amdgpu_bo_unreserve(new_vm->root.base.bo);
+
+	/* Update process info */
 	mutex_lock(&new_vm->process_info->lock);
 	list_add_tail(&new_vm->vm_list_node,
 			&(new_vm->process_info->vm_list_head));
@@ -863,6 +880,10 @@ int amdgpu_amdkfd_gpuvm_create_process_vm(struct kgd_dev *kgd, void **vm,
 
 	return ret;
 
+wait_pd_fail:
+validate_pd_fail:
+	amdgpu_bo_unreserve(new_vm->root.base.bo);
+reserve_pd_fail:
 create_evict_fence_fail:
 	mutex_destroy(&info->lock);
 	kfree(info);
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 03/20] drm/amdgpu: Add helper to turn an existing VM into a compute VM
       [not found] ` <1521149279-31892-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
  2018-03-15 21:27   ` [PATCH 01/20] drm/amdgpu: Move KFD-specific fields into struct amdgpu_vm Felix Kuehling
  2018-03-15 21:27   ` [PATCH 02/20] drm/amdgpu: Fix initial validation of PD BO for KFD VMs Felix Kuehling
@ 2018-03-15 21:27   ` Felix Kuehling
  2018-03-15 21:27   ` [PATCH 04/20] drm/amdgpu: Add kfd2kgd interface to acquire an existing VM Felix Kuehling
                     ` (17 subsequent siblings)
  20 siblings, 0 replies; 25+ messages in thread
From: Felix Kuehling @ 2018-03-15 21:27 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling

v2: Removed updating and checking of vm->vm_context
v3: Enable amdgpu_vm_clear_bo in amdgpu_vm_make_compute

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 67 ++++++++++++++++++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h |  1 +
 2 files changed, 68 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 0b237e0..bdf961a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -2462,6 +2462,73 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 }
 
 /**
+ * amdgpu_vm_make_compute - Turn a GFX VM into a compute VM
+ *
+ * This only works on GFX VMs that don't have any BOs added and no
+ * page tables allocated yet.
+ *
+ * Changes the following VM parameters:
+ * - use_cpu_for_update
+ * - pte_supports_ats
+ * - pasid (old PASID is released, because compute manages its own PASIDs)
+ *
+ * Reinitializes the page directory to reflect the changed ATS
+ * setting. May leave behind an unused shadow BO for the page
+ * directory when switching from SDMA updates to CPU updates.
+ *
+ * Returns 0 for success, -errno for errors.
+ */
+int amdgpu_vm_make_compute(struct amdgpu_device *adev, struct amdgpu_vm *vm)
+{
+	bool pte_support_ats = (adev->asic_type == CHIP_RAVEN);
+	int r;
+
+	r = amdgpu_bo_reserve(vm->root.base.bo, true);
+	if (r)
+		return r;
+
+	/* Sanity checks */
+	if (!RB_EMPTY_ROOT(&vm->va.rb_root) || vm->root.entries) {
+		r = -EINVAL;
+		goto error;
+	}
+
+	/* Check if PD needs to be reinitialized and do it before
+	 * changing any other state, in case it fails.
+	 */
+	if (pte_support_ats != vm->pte_support_ats) {
+		r = amdgpu_vm_clear_bo(adev, vm, vm->root.base.bo,
+			       adev->vm_manager.root_level,
+			       pte_support_ats);
+		if (r)
+			goto error;
+	}
+
+	/* Update VM state */
+	vm->use_cpu_for_update = !!(adev->vm_manager.vm_update_mode &
+				    AMDGPU_VM_USE_CPU_FOR_COMPUTE);
+	vm->pte_support_ats = pte_support_ats;
+	DRM_DEBUG_DRIVER("VM update mode is %s\n",
+			 vm->use_cpu_for_update ? "CPU" : "SDMA");
+	WARN_ONCE((vm->use_cpu_for_update & !amdgpu_vm_is_large_bar(adev)),
+		  "CPU update of VM recommended only for large BAR system\n");
+
+	if (vm->pasid) {
+		unsigned long flags;
+
+		spin_lock_irqsave(&adev->vm_manager.pasid_lock, flags);
+		idr_remove(&adev->vm_manager.pasid_idr, vm->pasid);
+		spin_unlock_irqrestore(&adev->vm_manager.pasid_lock, flags);
+
+		vm->pasid = 0;
+	}
+
+error:
+	amdgpu_bo_unreserve(vm->root.base.bo);
+	return r;
+}
+
+/**
  * amdgpu_vm_free_levels - free PD/PT levels
  *
  * @adev: amdgpu device structure
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index 4dfdefb..30f0803 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -260,6 +260,7 @@ void amdgpu_vm_manager_init(struct amdgpu_device *adev);
 void amdgpu_vm_manager_fini(struct amdgpu_device *adev);
 int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 		   int vm_context, unsigned int pasid);
+int amdgpu_vm_make_compute(struct amdgpu_device *adev, struct amdgpu_vm *vm);
 void amdgpu_vm_fini(struct amdgpu_device *adev, struct amdgpu_vm *vm);
 bool amdgpu_vm_pasid_fault_credit(struct amdgpu_device *adev,
 				  unsigned int pasid);
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 04/20] drm/amdgpu: Add kfd2kgd interface to acquire an existing VM
       [not found] ` <1521149279-31892-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (2 preceding siblings ...)
  2018-03-15 21:27   ` [PATCH 03/20] drm/amdgpu: Add helper to turn an existing VM into a compute VM Felix Kuehling
@ 2018-03-15 21:27   ` Felix Kuehling
  2018-03-15 21:27   ` [PATCH 05/20] drm/amdkfd: Create KFD VMs on demand Felix Kuehling
                     ` (16 subsequent siblings)
  20 siblings, 0 replies; 25+ messages in thread
From: Felix Kuehling @ 2018-03-15 21:27 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling

This allows acquiring an existing VM from a render node FD to use it
for a compute process.

Such VMs get destroyed when the original file descriptor is released.
Added a callback from amdgpu_vm_fini to handle KFD VM destruction
correctly in this case.

v2:
* Removed vm->vm_context check in amdgpu_amdkfd_gpuvm_destroy_cb,
  check vm->process_info earlier instead

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h        |   6 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c |   1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c |   1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 166 +++++++++++++++-------
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c            |   3 +
 drivers/gpu/drm/amd/include/kgd_kfd_interface.h   |   2 +
 6 files changed, 124 insertions(+), 55 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index 6ba3dd6..c2c2bea 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -145,6 +145,12 @@ uint64_t amdgpu_amdkfd_get_vram_usage(struct kgd_dev *kgd);
 int amdgpu_amdkfd_gpuvm_create_process_vm(struct kgd_dev *kgd, void **vm,
 					  void **process_info,
 					  struct dma_fence **ef);
+int amdgpu_amdkfd_gpuvm_acquire_process_vm(struct kgd_dev *kgd,
+					   struct file *filp,
+					   void **vm, void **process_info,
+					   struct dma_fence **ef);
+void amdgpu_amdkfd_gpuvm_destroy_cb(struct amdgpu_device *adev,
+				    struct amdgpu_vm *vm);
 void amdgpu_amdkfd_gpuvm_destroy_process_vm(struct kgd_dev *kgd, void *vm);
 uint32_t amdgpu_amdkfd_gpuvm_get_process_page_dir(void *vm);
 int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
index 7485c37..ea54e53 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
@@ -205,6 +205,7 @@ static const struct kfd2kgd_calls kfd2kgd = {
 	.get_cu_info = get_cu_info,
 	.get_vram_usage = amdgpu_amdkfd_get_vram_usage,
 	.create_process_vm = amdgpu_amdkfd_gpuvm_create_process_vm,
+	.acquire_process_vm = amdgpu_amdkfd_gpuvm_acquire_process_vm,
 	.destroy_process_vm = amdgpu_amdkfd_gpuvm_destroy_process_vm,
 	.get_process_page_dir = amdgpu_amdkfd_gpuvm_get_process_page_dir,
 	.set_vm_context_page_table_base = set_vm_context_page_table_base,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
index 7be4534..89264c9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
@@ -165,6 +165,7 @@ static const struct kfd2kgd_calls kfd2kgd = {
 	.get_cu_info = get_cu_info,
 	.get_vram_usage = amdgpu_amdkfd_get_vram_usage,
 	.create_process_vm = amdgpu_amdkfd_gpuvm_create_process_vm,
+	.acquire_process_vm = amdgpu_amdkfd_gpuvm_acquire_process_vm,
 	.destroy_process_vm = amdgpu_amdkfd_gpuvm_destroy_process_vm,
 	.get_process_page_dir = amdgpu_amdkfd_gpuvm_get_process_page_dir,
 	.set_vm_context_page_table_base = set_vm_context_page_table_base,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index c578107..2f42c60 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -806,32 +806,16 @@ static int process_update_pds(struct amdkfd_process_info *process_info,
 	return 0;
 }
 
-int amdgpu_amdkfd_gpuvm_create_process_vm(struct kgd_dev *kgd, void **vm,
-					  void **process_info,
-					  struct dma_fence **ef)
+static int init_kfd_vm(struct amdgpu_vm *vm, void **process_info,
+		       struct dma_fence **ef)
 {
-	int ret;
-	struct amdgpu_vm *new_vm;
 	struct amdkfd_process_info *info = NULL;
-	struct amdgpu_device *adev = get_amdgpu_device(kgd);
-
-	new_vm = kzalloc(sizeof(*new_vm), GFP_KERNEL);
-	if (!new_vm)
-		return -ENOMEM;
-
-	/* Initialize the VM context, allocate the page directory and zero it */
-	ret = amdgpu_vm_init(adev, new_vm, AMDGPU_VM_CONTEXT_COMPUTE, 0);
-	if (ret) {
-		pr_err("Failed init vm ret %d\n", ret);
-		goto vm_init_fail;
-	}
+	int ret;
 
 	if (!*process_info) {
 		info = kzalloc(sizeof(*info), GFP_KERNEL);
-		if (!info) {
-			ret = -ENOMEM;
-			goto alloc_process_info_fail;
-		}
+		if (!info)
+			return -ENOMEM;
 
 		mutex_init(&info->lock);
 		INIT_LIST_HEAD(&info->vm_list_head);
@@ -842,6 +826,7 @@ int amdgpu_amdkfd_gpuvm_create_process_vm(struct kgd_dev *kgd, void **vm,
 						   current->mm);
 		if (!info->eviction_fence) {
 			pr_err("Failed to create eviction fence\n");
+			ret = -ENOMEM;
 			goto create_evict_fence_fail;
 		}
 
@@ -849,77 +834,137 @@ int amdgpu_amdkfd_gpuvm_create_process_vm(struct kgd_dev *kgd, void **vm,
 		*ef = dma_fence_get(&info->eviction_fence->base);
 	}
 
-	new_vm->process_info = *process_info;
+	vm->process_info = *process_info;
 
 	/* Validate page directory and attach eviction fence */
-	ret = amdgpu_bo_reserve(new_vm->root.base.bo, true);
+	ret = amdgpu_bo_reserve(vm->root.base.bo, true);
 	if (ret)
 		goto reserve_pd_fail;
-	ret = vm_validate_pt_pd_bos(new_vm);
+	ret = vm_validate_pt_pd_bos(vm);
 	if (ret) {
 		pr_err("validate_pt_pd_bos() failed\n");
 		goto validate_pd_fail;
 	}
-	ret = ttm_bo_wait(&new_vm->root.base.bo->tbo, false, false);
+	ret = ttm_bo_wait(&vm->root.base.bo->tbo, false, false);
 	if (ret)
 		goto wait_pd_fail;
-	amdgpu_bo_fence(new_vm->root.base.bo,
-			&new_vm->process_info->eviction_fence->base, true);
-	amdgpu_bo_unreserve(new_vm->root.base.bo);
+	amdgpu_bo_fence(vm->root.base.bo,
+			&vm->process_info->eviction_fence->base, true);
+	amdgpu_bo_unreserve(vm->root.base.bo);
 
 	/* Update process info */
-	mutex_lock(&new_vm->process_info->lock);
-	list_add_tail(&new_vm->vm_list_node,
-			&(new_vm->process_info->vm_list_head));
-	new_vm->process_info->n_vms++;
-	mutex_unlock(&new_vm->process_info->lock);
-
-	*vm = (void *) new_vm;
-
-	pr_debug("Created process vm %p\n", *vm);
+	mutex_lock(&vm->process_info->lock);
+	list_add_tail(&vm->vm_list_node,
+			&(vm->process_info->vm_list_head));
+	vm->process_info->n_vms++;
+	mutex_unlock(&vm->process_info->lock);
 
-	return ret;
+	return 0;
 
 wait_pd_fail:
 validate_pd_fail:
-	amdgpu_bo_unreserve(new_vm->root.base.bo);
+	amdgpu_bo_unreserve(vm->root.base.bo);
 reserve_pd_fail:
+	vm->process_info = NULL;
+	if (info) {
+		/* Two fence references: one in info and one in *ef */
+		dma_fence_put(&info->eviction_fence->base);
+		dma_fence_put(*ef);
+		*ef = NULL;
+		*process_info = NULL;
 create_evict_fence_fail:
-	mutex_destroy(&info->lock);
-	kfree(info);
-alloc_process_info_fail:
+		mutex_destroy(&info->lock);
+		kfree(info);
+	}
+	return ret;
+}
+
+int amdgpu_amdkfd_gpuvm_create_process_vm(struct kgd_dev *kgd, void **vm,
+					  void **process_info,
+					  struct dma_fence **ef)
+{
+	struct amdgpu_device *adev = get_amdgpu_device(kgd);
+	struct amdgpu_vm *new_vm;
+	int ret;
+
+	new_vm = kzalloc(sizeof(*new_vm), GFP_KERNEL);
+	if (!new_vm)
+		return -ENOMEM;
+
+	/* Initialize AMDGPU part of the VM */
+	ret = amdgpu_vm_init(adev, new_vm, AMDGPU_VM_CONTEXT_COMPUTE, 0);
+	if (ret) {
+		pr_err("Failed init vm ret %d\n", ret);
+		goto amdgpu_vm_init_fail;
+	}
+
+	/* Initialize KFD part of the VM and process info */
+	ret = init_kfd_vm(new_vm, process_info, ef);
+	if (ret)
+		goto init_kfd_vm_fail;
+
+	*vm = (void *) new_vm;
+
+	return 0;
+
+init_kfd_vm_fail:
 	amdgpu_vm_fini(adev, new_vm);
-vm_init_fail:
+amdgpu_vm_init_fail:
 	kfree(new_vm);
 	return ret;
-
 }
 
-void amdgpu_amdkfd_gpuvm_destroy_process_vm(struct kgd_dev *kgd, void *vm)
+int amdgpu_amdkfd_gpuvm_acquire_process_vm(struct kgd_dev *kgd,
+					   struct file *filp,
+					   void **vm, void **process_info,
+					   struct dma_fence **ef)
 {
 	struct amdgpu_device *adev = get_amdgpu_device(kgd);
-	struct amdgpu_vm *avm = (struct amdgpu_vm *)vm;
-	struct amdgpu_bo *pd;
-	struct amdkfd_process_info *process_info;
+	struct drm_file *drm_priv = filp->private_data;
+	struct amdgpu_fpriv *drv_priv = drm_priv->driver_priv;
+	struct amdgpu_vm *avm = &drv_priv->vm;
+	int ret;
 
-	if (WARN_ON(!kgd || !vm))
+	/* Already a compute VM? */
+	if (avm->process_info)
+		return -EINVAL;
+
+	/* Convert VM into a compute VM */
+	ret = amdgpu_vm_make_compute(adev, avm);
+	if (ret)
+		return ret;
+
+	/* Initialize KFD part of the VM and process info */
+	ret = init_kfd_vm(avm, process_info, ef);
+	if (ret)
+		return ret;
+
+	*vm = (void *)avm;
+
+	return 0;
+}
+
+void amdgpu_amdkfd_gpuvm_destroy_cb(struct amdgpu_device *adev,
+				    struct amdgpu_vm *vm)
+{
+	struct amdkfd_process_info *process_info = vm->process_info;
+	struct amdgpu_bo *pd = vm->root.base.bo;
+
+	if (!process_info)
 		return;
 
-	pr_debug("Destroying process vm %p\n", vm);
 	/* Release eviction fence from PD */
-	pd = avm->root.base.bo;
 	amdgpu_bo_reserve(pd, false);
 	amdgpu_bo_fence(pd, NULL, false);
 	amdgpu_bo_unreserve(pd);
 
-	process_info = avm->process_info;
-
+	/* Update process info */
 	mutex_lock(&process_info->lock);
 	process_info->n_vms--;
-	list_del(&avm->vm_list_node);
+	list_del(&vm->vm_list_node);
 	mutex_unlock(&process_info->lock);
 
-	/* Release per-process resources */
+	/* Release per-process resources when last compute VM is destroyed */
 	if (!process_info->n_vms) {
 		WARN_ON(!list_empty(&process_info->kfd_bo_list));
 
@@ -927,6 +972,17 @@ void amdgpu_amdkfd_gpuvm_destroy_process_vm(struct kgd_dev *kgd, void *vm)
 		mutex_destroy(&process_info->lock);
 		kfree(process_info);
 	}
+}
+
+void amdgpu_amdkfd_gpuvm_destroy_process_vm(struct kgd_dev *kgd, void *vm)
+{
+	struct amdgpu_device *adev = get_amdgpu_device(kgd);
+	struct amdgpu_vm *avm = (struct amdgpu_vm *)vm;
+
+	if (WARN_ON(!kgd || !vm))
+		return;
+
+	pr_debug("Destroying process vm %p\n", vm);
 
 	/* Release the VM context */
 	amdgpu_vm_fini(adev, avm);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index bdf961a..90ff79e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -32,6 +32,7 @@
 #include <drm/amdgpu_drm.h>
 #include "amdgpu.h"
 #include "amdgpu_trace.h"
+#include "amdgpu_amdkfd.h"
 
 /*
  * GPUVM
@@ -2575,6 +2576,8 @@ void amdgpu_vm_fini(struct amdgpu_device *adev, struct amdgpu_vm *vm)
 	u64 fault;
 	int i, r;
 
+	amdgpu_amdkfd_gpuvm_destroy_cb(adev, vm);
+
 	/* Clear pending page faults from IH when the VM is destroyed */
 	while (kfifo_get(&vm->faults, &fault))
 		amdgpu_ih_clear_fault(adev, fault);
diff --git a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
index 1e5c22ce..b1f35c8 100644
--- a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
+++ b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
@@ -336,6 +336,8 @@ struct kfd2kgd_calls {
 
 	int (*create_process_vm)(struct kgd_dev *kgd, void **vm,
 			void **process_info, struct dma_fence **ef);
+	int (*acquire_process_vm)(struct kgd_dev *kgd, struct file *filp,
+			void **vm, void **process_info, struct dma_fence **ef);
 	void (*destroy_process_vm)(struct kgd_dev *kgd, void *vm);
 	uint32_t (*get_process_page_dir)(void *vm);
 	void (*set_vm_context_page_table_base)(struct kgd_dev *kgd,
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 05/20] drm/amdkfd: Create KFD VMs on demand
       [not found] ` <1521149279-31892-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (3 preceding siblings ...)
  2018-03-15 21:27   ` [PATCH 04/20] drm/amdgpu: Add kfd2kgd interface to acquire an existing VM Felix Kuehling
@ 2018-03-15 21:27   ` Felix Kuehling
  2018-03-15 21:27   ` [PATCH 06/20] drm/amdkfd: Populate DRM render device minor Felix Kuehling
                     ` (15 subsequent siblings)
  20 siblings, 0 replies; 25+ messages in thread
From: Felix Kuehling @ 2018-03-15 21:27 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling

Instead of creating all VMs on process creation, create them when
a process is bound to a device. This will later allow registering
an existing VM from a DRM render node FD at runtime, before the
process is bound to the device. This way the render node VM can be
used for KFD instead of creating our own redundant VM.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h    |  3 ++
 drivers/gpu/drm/amd/amdkfd/kfd_process.c | 60 ++++++++++++++++++++++++++------
 2 files changed, 53 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index cac7aa2..014d608 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -536,6 +536,7 @@ struct kfd_process_device {
 	uint64_t scratch_limit;
 
 	/* VM context for GPUVM allocations */
+	struct file *drm_file;
 	void *vm;
 
 	/* Flag used to tell the pdd has dequeued from the dqm.
@@ -661,6 +662,8 @@ void kfd_unref_process(struct kfd_process *p);
 void kfd_suspend_all_processes(void);
 int kfd_resume_all_processes(void);
 
+int kfd_process_device_init_vm(struct kfd_process_device *pdd,
+			       struct file *drm_file);
 struct kfd_process_device *kfd_bind_process_to_device(struct kfd_dev *dev,
 						struct kfd_process *p);
 struct kfd_process_device *kfd_get_process_device_data(struct kfd_dev *dev,
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index 18b2b86..6618aaa 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -30,6 +30,7 @@
 #include <linux/notifier.h>
 #include <linux/compat.h>
 #include <linux/mman.h>
+#include <linux/file.h>
 
 struct mm_struct;
 
@@ -158,7 +159,9 @@ static void kfd_process_destroy_pdds(struct kfd_process *p)
 		pr_debug("Releasing pdd (topology id %d) for process (pasid %d)\n",
 				pdd->dev->id, p->pasid);
 
-		if (pdd->vm)
+		if (pdd->drm_file)
+			fput(pdd->drm_file);
+		else if (pdd->vm)
 			pdd->dev->kfd2kgd->destroy_process_vm(
 				pdd->dev->kgd, pdd->vm);
 
@@ -418,18 +421,51 @@ struct kfd_process_device *kfd_create_process_device_data(struct kfd_dev *dev,
 	pdd->already_dequeued = false;
 	list_add(&pdd->per_device_list, &p->per_device_data);
 
-	/* Create the GPUVM context for this specific device */
-	if (dev->kfd2kgd->create_process_vm(dev->kgd, &pdd->vm,
-					    &p->kgd_process_info, &p->ef)) {
+	return pdd;
+}
+
+/**
+ * kfd_process_device_init_vm - Initialize a VM for a process-device
+ *
+ * @pdd: The process-device
+ * @drm_file: Optional pointer to a DRM file descriptor
+ *
+ * If @drm_file is specified, it will be used to acquire the VM from
+ * that file descriptor. If successful, the @pdd takes ownership of
+ * the file descriptor.
+ *
+ * If @drm_file is NULL, a new VM is created.
+ *
+ * Returns 0 on success, -errno on failure.
+ */
+int kfd_process_device_init_vm(struct kfd_process_device *pdd,
+			       struct file *drm_file)
+{
+	struct kfd_process *p;
+	struct kfd_dev *dev;
+	int ret;
+
+	if (pdd->vm)
+		return drm_file ? -EBUSY : 0;
+
+	p = pdd->process;
+	dev = pdd->dev;
+
+	if (drm_file)
+		ret = dev->kfd2kgd->acquire_process_vm(
+			dev->kgd, drm_file,
+			&pdd->vm, &p->kgd_process_info, &p->ef);
+	else
+		ret = dev->kfd2kgd->create_process_vm(
+			dev->kgd, &pdd->vm, &p->kgd_process_info, &p->ef);
+	if (ret) {
 		pr_err("Failed to create process VM object\n");
-		goto err_create_pdd;
+		return ret;
 	}
-	return pdd;
 
-err_create_pdd:
-	list_del(&pdd->per_device_list);
-	kfree(pdd);
-	return NULL;
+	pdd->drm_file = drm_file;
+
+	return 0;
 }
 
 /*
@@ -455,6 +491,10 @@ struct kfd_process_device *kfd_bind_process_to_device(struct kfd_dev *dev,
 	if (err)
 		return ERR_PTR(err);
 
+	err = kfd_process_device_init_vm(pdd, NULL);
+	if (err)
+		return ERR_PTR(err);
+
 	return pdd;
 }
 
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 06/20] drm/amdkfd: Populate DRM render device minor
       [not found] ` <1521149279-31892-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (4 preceding siblings ...)
  2018-03-15 21:27   ` [PATCH 05/20] drm/amdkfd: Create KFD VMs on demand Felix Kuehling
@ 2018-03-15 21:27   ` Felix Kuehling
  2018-03-15 21:27   ` [PATCH 07/20] drm/amdkfd: Remove limit on number of GPUs Felix Kuehling
                     ` (14 subsequent siblings)
  20 siblings, 0 replies; 25+ messages in thread
From: Felix Kuehling @ 2018-03-15 21:27 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Oak Zeng, Felix Kuehling

From: Oak Zeng <Oak.Zeng@amd.com>

Populate DRM render device minor in kfd topology

Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 4 ++++
 drivers/gpu/drm/amd/amdkfd/kfd_topology.h | 1 +
 2 files changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index 2506155..ac28abc 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -441,6 +441,8 @@ static ssize_t node_show(struct kobject *kobj, struct attribute *attr,
 			dev->node_props.device_id);
 	sysfs_show_32bit_prop(buffer, "location_id",
 			dev->node_props.location_id);
+	sysfs_show_32bit_prop(buffer, "drm_render_minor",
+			dev->node_props.drm_render_minor);
 
 	if (dev->gpu) {
 		log_max_watch_addr =
@@ -1214,6 +1216,8 @@ int kfd_topology_add_device(struct kfd_dev *gpu)
 		dev->gpu->kfd2kgd->get_max_engine_clock_in_mhz(dev->gpu->kgd);
 	dev->node_props.max_engine_clk_ccompute =
 		cpufreq_quick_get_max(0) / 1000;
+	dev->node_props.drm_render_minor =
+		gpu->shared_resources.drm_render_minor;
 
 	kfd_fill_mem_clk_max_info(dev);
 	kfd_fill_iolink_non_crat_info(dev);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
index c0be2be..eb54cfc 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.h
@@ -71,6 +71,7 @@ struct kfd_node_properties {
 	uint32_t location_id;
 	uint32_t max_engine_clk_fcompute;
 	uint32_t max_engine_clk_ccompute;
+	int32_t  drm_render_minor;
 	uint16_t marketing_name[KFD_TOPOLOGY_PUBLIC_NAME_SIZE];
 };
 
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 07/20] drm/amdkfd: Remove limit on number of GPUs
       [not found] ` <1521149279-31892-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (5 preceding siblings ...)
  2018-03-15 21:27   ` [PATCH 06/20] drm/amdkfd: Populate DRM render device minor Felix Kuehling
@ 2018-03-15 21:27   ` Felix Kuehling
  2018-03-15 21:27   ` [PATCH 08/20] drm/amdkfd: Aperture setup for dGPUs Felix Kuehling
                     ` (13 subsequent siblings)
  20 siblings, 0 replies; 25+ messages in thread
From: Felix Kuehling @ 2018-03-15 21:27 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling

Currently the number of GPUs is limited by aperture placement options
available on GFX7 and GFX8 hardware. This limitation is not necessary.
Scratch and LDS represent per-work-item and per-work-group storage
respectively. Different work-items and work-groups use the same virtual
address to access their own data. Work running on different GPUs is by
definition in different work-groups (different dispatches, in fact).
That means the same virtual addresses can be used for these apertures
on different GPUs.

Add a new AMDKFD_IOC_GET_PROCESS_APERTURES_NEW ioctl that removes the
artificial limitation on the number of GPUs that can be supported. The
new ioctl allows user mode to query the number of GPUs to allocate
enough memory for all GPUs to be reported.

This deprecates AMDKFD_IOC_GET_PROCESS_APERTURES.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c     | 94 ++++++++++++++++++++++++++++
 drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c | 22 +++----
 include/uapi/linux/kfd_ioctl.h               | 27 +++++++-
 3 files changed, 128 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 6fe2496..7d40094 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -825,6 +825,97 @@ static int kfd_ioctl_get_process_apertures(struct file *filp,
 	return 0;
 }
 
+static int kfd_ioctl_get_process_apertures_new(struct file *filp,
+				struct kfd_process *p, void *data)
+{
+	struct kfd_ioctl_get_process_apertures_new_args *args = data;
+	struct kfd_process_device_apertures *pa;
+	struct kfd_process_device *pdd;
+	uint32_t nodes = 0;
+	int ret;
+
+	dev_dbg(kfd_device, "get apertures for PASID %d", p->pasid);
+
+	if (args->num_of_nodes == 0) {
+		/* Return number of nodes, so that user space can alloacate
+		 * sufficient memory
+		 */
+		mutex_lock(&p->mutex);
+
+		if (!kfd_has_process_device_data(p))
+			goto out_unlock;
+
+		/* Run over all pdd of the process */
+		pdd = kfd_get_first_process_device_data(p);
+		do {
+			args->num_of_nodes++;
+			pdd = kfd_get_next_process_device_data(p, pdd);
+		} while (pdd);
+
+		goto out_unlock;
+	}
+
+	/* Fill in process-aperture information for all available
+	 * nodes, but not more than args->num_of_nodes as that is
+	 * the amount of memory allocated by user
+	 */
+	pa = kzalloc((sizeof(struct kfd_process_device_apertures) *
+				args->num_of_nodes), GFP_KERNEL);
+	if (!pa)
+		return -ENOMEM;
+
+	mutex_lock(&p->mutex);
+
+	if (!kfd_has_process_device_data(p)) {
+		args->num_of_nodes = 0;
+		kfree(pa);
+		goto out_unlock;
+	}
+
+	/* Run over all pdd of the process */
+	pdd = kfd_get_first_process_device_data(p);
+	do {
+		pa[nodes].gpu_id = pdd->dev->id;
+		pa[nodes].lds_base = pdd->lds_base;
+		pa[nodes].lds_limit = pdd->lds_limit;
+		pa[nodes].gpuvm_base = pdd->gpuvm_base;
+		pa[nodes].gpuvm_limit = pdd->gpuvm_limit;
+		pa[nodes].scratch_base = pdd->scratch_base;
+		pa[nodes].scratch_limit = pdd->scratch_limit;
+
+		dev_dbg(kfd_device,
+			"gpu id %u\n", pdd->dev->id);
+		dev_dbg(kfd_device,
+			"lds_base %llX\n", pdd->lds_base);
+		dev_dbg(kfd_device,
+			"lds_limit %llX\n", pdd->lds_limit);
+		dev_dbg(kfd_device,
+			"gpuvm_base %llX\n", pdd->gpuvm_base);
+		dev_dbg(kfd_device,
+			"gpuvm_limit %llX\n", pdd->gpuvm_limit);
+		dev_dbg(kfd_device,
+			"scratch_base %llX\n", pdd->scratch_base);
+		dev_dbg(kfd_device,
+			"scratch_limit %llX\n", pdd->scratch_limit);
+		nodes++;
+
+		pdd = kfd_get_next_process_device_data(p, pdd);
+	} while (pdd && (nodes < args->num_of_nodes));
+	mutex_unlock(&p->mutex);
+
+	args->num_of_nodes = nodes;
+	ret = copy_to_user(
+			(void __user *)args->kfd_process_device_apertures_ptr,
+			pa,
+			(nodes * sizeof(struct kfd_process_device_apertures)));
+	kfree(pa);
+	return ret ? -EFAULT : 0;
+
+out_unlock:
+	mutex_unlock(&p->mutex);
+	return 0;
+}
+
 static int kfd_ioctl_create_event(struct file *filp, struct kfd_process *p,
 					void *data)
 {
@@ -1017,6 +1108,9 @@ static const struct amdkfd_ioctl_desc amdkfd_ioctls[] = {
 
 	AMDKFD_IOCTL_DEF(AMDKFD_IOC_SET_TRAP_HANDLER,
 			kfd_ioctl_set_trap_handler, 0),
+
+	AMDKFD_IOCTL_DEF(AMDKFD_IOC_GET_PROCESS_APERTURES_NEW,
+			kfd_ioctl_get_process_apertures_new, 0),
 };
 
 #define AMDKFD_CORE_IOCTL_COUNT	ARRAY_SIZE(amdkfd_ioctls)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c b/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
index 7377513..a06b010 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
@@ -282,14 +282,14 @@
 	(((uint64_t)(base) & \
 		0xFFFFFF0000000000UL) | 0xFFFFFFFFFFL)
 
-#define MAKE_SCRATCH_APP_BASE(gpu_num) \
-	(((uint64_t)(gpu_num) << 61) + 0x100000000L)
+#define MAKE_SCRATCH_APP_BASE() \
+	(((uint64_t)(0x1UL) << 61) + 0x100000000L)
 
 #define MAKE_SCRATCH_APP_LIMIT(base) \
 	(((uint64_t)base & 0xFFFFFFFF00000000UL) | 0xFFFFFFFF)
 
-#define MAKE_LDS_APP_BASE(gpu_num) \
-	(((uint64_t)(gpu_num) << 61) + 0x0)
+#define MAKE_LDS_APP_BASE() \
+	(((uint64_t)(0x1UL) << 61) + 0x0)
 #define MAKE_LDS_APP_LIMIT(base) \
 	(((uint64_t)(base) & 0xFFFFFFFF00000000UL) | 0xFFFFFFFF)
 
@@ -314,7 +314,7 @@ int kfd_init_apertures(struct kfd_process *process)
 			return -1;
 		}
 		/*
-		 * For 64 bit process aperture will be statically reserved in
+		 * For 64 bit process apertures will be statically reserved in
 		 * the x86_64 non canonical process address space
 		 * amdkfd doesn't currently support apertures for 32 bit process
 		 */
@@ -323,12 +323,11 @@ int kfd_init_apertures(struct kfd_process *process)
 			pdd->gpuvm_base = pdd->gpuvm_limit = 0;
 			pdd->scratch_base = pdd->scratch_limit = 0;
 		} else {
-			/*
-			 * node id couldn't be 0 - the three MSB bits of
-			 * aperture shoudn't be 0
+			/* Same LDS and scratch apertures can be used
+			 * on all GPUs. This allows using more dGPUs
+			 * than placement options for apertures.
 			 */
-			pdd->lds_base = MAKE_LDS_APP_BASE(id + 1);
-
+			pdd->lds_base = MAKE_LDS_APP_BASE();
 			pdd->lds_limit = MAKE_LDS_APP_LIMIT(pdd->lds_base);
 
 			pdd->gpuvm_base = MAKE_GPUVM_APP_BASE(id + 1);
@@ -336,8 +335,7 @@ int kfd_init_apertures(struct kfd_process *process)
 			pdd->gpuvm_limit =
 					MAKE_GPUVM_APP_LIMIT(pdd->gpuvm_base);
 
-			pdd->scratch_base = MAKE_SCRATCH_APP_BASE(id + 1);
-
+			pdd->scratch_base = MAKE_SCRATCH_APP_BASE();
 			pdd->scratch_limit =
 				MAKE_SCRATCH_APP_LIMIT(pdd->scratch_base);
 		}
diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
index 111d73b..5201437 100644
--- a/include/uapi/linux/kfd_ioctl.h
+++ b/include/uapi/linux/kfd_ioctl.h
@@ -107,8 +107,6 @@ struct kfd_ioctl_get_clock_counters_args {
 	__u32 pad;
 };
 
-#define NUM_OF_SUPPORTED_GPUS 7
-
 struct kfd_process_device_apertures {
 	__u64 lds_base;		/* from KFD */
 	__u64 lds_limit;		/* from KFD */
@@ -120,6 +118,12 @@ struct kfd_process_device_apertures {
 	__u32 pad;
 };
 
+/*
+ * AMDKFD_IOC_GET_PROCESS_APERTURES is deprecated. Use
+ * AMDKFD_IOC_GET_PROCESS_APERTURES_NEW instead, which supports an
+ * unlimited number of GPUs.
+ */
+#define NUM_OF_SUPPORTED_GPUS 7
 struct kfd_ioctl_get_process_apertures_args {
 	struct kfd_process_device_apertures
 			process_apertures[NUM_OF_SUPPORTED_GPUS];/* from KFD */
@@ -129,6 +133,19 @@ struct kfd_ioctl_get_process_apertures_args {
 	__u32 pad;
 };
 
+struct kfd_ioctl_get_process_apertures_new_args {
+	/* User allocated. Pointer to struct kfd_process_device_apertures
+	 * filled in by Kernel
+	 */
+	__u64 kfd_process_device_apertures_ptr;
+	/* to KFD - indicates amount of memory present in
+	 *  kfd_process_device_apertures_ptr
+	 * from KFD - Number of entries filled by KFD.
+	 */
+	__u32 num_of_nodes;
+	__u32 pad;
+};
+
 #define MAX_ALLOWED_NUM_POINTS    100
 #define MAX_ALLOWED_AW_BUFF_SIZE 4096
 #define MAX_ALLOWED_WAC_BUFF_SIZE  128
@@ -332,7 +349,11 @@ struct kfd_ioctl_set_trap_handler_args {
 #define AMDKFD_IOC_SET_TRAP_HANDLER		\
 		AMDKFD_IOW(0x13, struct kfd_ioctl_set_trap_handler_args)
 
+#define AMDKFD_IOC_GET_PROCESS_APERTURES_NEW	\
+		AMDKFD_IOWR(0x14,		\
+			struct kfd_ioctl_get_process_apertures_new_args)
+
 #define AMDKFD_COMMAND_START		0x01
-#define AMDKFD_COMMAND_END		0x14
+#define AMDKFD_COMMAND_END		0x15
 
 #endif
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 08/20] drm/amdkfd: Aperture setup for dGPUs
       [not found] ` <1521149279-31892-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (6 preceding siblings ...)
  2018-03-15 21:27   ` [PATCH 07/20] drm/amdkfd: Remove limit on number of GPUs Felix Kuehling
@ 2018-03-15 21:27   ` Felix Kuehling
  2018-03-15 21:27   ` [PATCH 09/20] drm/amdkfd: Add per-process IDR for buffer handles Felix Kuehling
                     ` (12 subsequent siblings)
  20 siblings, 0 replies; 25+ messages in thread
From: Felix Kuehling @ 2018-03-15 21:27 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling

Set up the GPUVM aperture for SVM (shared virtual memory) that allows
sharing a part of virtual address space between GPUs and CPUs.

Report the size of the the GPUVM size supported by KGD accurately.

The low part of the GPUVM aperture is reserved for kernel use. This is
for kernel-allocated buffers that are only accessed on the GPU:
- CWSR trap handler
- IB for submitting commands in user-mode context from kernel mode

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c | 37 ++++++++++++++++++++++------
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h        |  4 +++
 2 files changed, 33 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c b/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
index a06b010..66852de 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c
@@ -278,9 +278,8 @@
 #define MAKE_GPUVM_APP_BASE(gpu_num) \
 	(((uint64_t)(gpu_num) << 61) + 0x1000000000000L)
 
-#define MAKE_GPUVM_APP_LIMIT(base) \
-	(((uint64_t)(base) & \
-		0xFFFFFF0000000000UL) | 0xFFFFFFFFFFL)
+#define MAKE_GPUVM_APP_LIMIT(base, size) \
+	(((uint64_t)(base) & 0xFFFFFF0000000000UL) + (size) - 1)
 
 #define MAKE_SCRATCH_APP_BASE() \
 	(((uint64_t)(0x1UL) << 61) + 0x100000000L)
@@ -293,6 +292,14 @@
 #define MAKE_LDS_APP_LIMIT(base) \
 	(((uint64_t)(base) & 0xFFFFFFFF00000000UL) | 0xFFFFFFFF)
 
+/* User mode manages most of the SVM aperture address space. The low
+ * 16MB are reserved for kernel use (CWSR trap handler and kernel IB
+ * for now).
+ */
+#define SVM_USER_BASE 0x1000000ull
+#define SVM_CWSR_BASE (SVM_USER_BASE - KFD_CWSR_TBA_TMA_SIZE)
+#define SVM_IB_BASE   (SVM_CWSR_BASE - PAGE_SIZE)
+
 int kfd_init_apertures(struct kfd_process *process)
 {
 	uint8_t id  = 0;
@@ -330,14 +337,28 @@ int kfd_init_apertures(struct kfd_process *process)
 			pdd->lds_base = MAKE_LDS_APP_BASE();
 			pdd->lds_limit = MAKE_LDS_APP_LIMIT(pdd->lds_base);
 
-			pdd->gpuvm_base = MAKE_GPUVM_APP_BASE(id + 1);
-
-			pdd->gpuvm_limit =
-					MAKE_GPUVM_APP_LIMIT(pdd->gpuvm_base);
-
 			pdd->scratch_base = MAKE_SCRATCH_APP_BASE();
 			pdd->scratch_limit =
 				MAKE_SCRATCH_APP_LIMIT(pdd->scratch_base);
+
+			if (dev->device_info->needs_iommu_device) {
+				/* APUs: GPUVM aperture in
+				 * non-canonical address space
+				 */
+				pdd->gpuvm_base = MAKE_GPUVM_APP_BASE(id + 1);
+				pdd->gpuvm_limit = MAKE_GPUVM_APP_LIMIT(
+					pdd->gpuvm_base,
+					dev->shared_resources.gpuvm_size);
+			} else {
+				/* dGPUs: SVM aperture starting at 0
+				 * with small reserved space for kernel
+				 */
+				pdd->gpuvm_base = SVM_USER_BASE;
+				pdd->gpuvm_limit =
+					dev->shared_resources.gpuvm_size - 1;
+				pdd->qpd.cwsr_base = SVM_CWSR_BASE;
+				pdd->qpd.ib_base = SVM_IB_BASE;
+			}
 		}
 
 		dev_dbg(kfd_device, "node id %u\n", id);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 014d608..0d5d924 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -488,8 +488,12 @@ struct qcm_process_device {
 
 	/* CWSR memory */
 	void *cwsr_kaddr;
+	uint64_t cwsr_base;
 	uint64_t tba_addr;
 	uint64_t tma_addr;
+
+	/* IB memory */
+	uint64_t ib_base;
 };
 
 /* KFD Memory Eviction */
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 09/20] drm/amdkfd: Add per-process IDR for buffer handles
       [not found] ` <1521149279-31892-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (7 preceding siblings ...)
  2018-03-15 21:27   ` [PATCH 08/20] drm/amdkfd: Aperture setup for dGPUs Felix Kuehling
@ 2018-03-15 21:27   ` Felix Kuehling
  2018-03-15 21:27   ` [PATCH 10/20] drm/amdkfd: Allocate CWSR trap handler memory for dGPUs Felix Kuehling
                     ` (11 subsequent siblings)
  20 siblings, 0 replies; 25+ messages in thread
From: Felix Kuehling @ 2018-03-15 21:27 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling

Also used for cleaning up on process termination.

v2: Refactored cleanup on process termination

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h    | 11 +++++
 drivers/gpu/drm/amd/amdkfd/kfd_process.c | 73 ++++++++++++++++++++++++++++++++
 2 files changed, 84 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 0d5d924..b2b5ef8 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -543,6 +543,9 @@ struct kfd_process_device {
 	struct file *drm_file;
 	void *vm;
 
+	/* GPUVM allocations storage */
+	struct idr alloc_idr;
+
 	/* Flag used to tell the pdd has dequeued from the dqm.
 	 * This is used to prevent dev->dqm->ops.process_termination() from
 	 * being called twice when it is already called in IOMMU callback
@@ -678,6 +681,14 @@ struct kfd_process_device *kfd_create_process_device_data(struct kfd_dev *dev,
 int kfd_reserved_mem_mmap(struct kfd_process *process,
 			  struct vm_area_struct *vma);
 
+/* KFD process API for creating and translating handles */
+int kfd_process_device_create_obj_handle(struct kfd_process_device *pdd,
+					void *mem);
+void *kfd_process_device_translate_handle(struct kfd_process_device *p,
+					int handle);
+void kfd_process_device_remove_obj_handle(struct kfd_process_device *pdd,
+					int handle);
+
 /* Process device data iterator */
 struct kfd_process_device *kfd_get_first_process_device_data(
 							struct kfd_process *p);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index 6618aaa..a2ae023 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -150,6 +150,40 @@ void kfd_unref_process(struct kfd_process *p)
 	kref_put(&p->ref, kfd_process_ref_release);
 }
 
+static void kfd_process_device_free_bos(struct kfd_process_device *pdd)
+{
+	struct kfd_process *p = pdd->process;
+	void *mem;
+	int id;
+
+	/*
+	 * Remove all handles from idr and release appropriate
+	 * local memory object
+	 */
+	idr_for_each_entry(&pdd->alloc_idr, mem, id) {
+		struct kfd_process_device *peer_pdd;
+
+		list_for_each_entry(peer_pdd, &p->per_device_data,
+				    per_device_list) {
+			if (!peer_pdd->vm)
+				continue;
+			peer_pdd->dev->kfd2kgd->unmap_memory_to_gpu(
+				peer_pdd->dev->kgd, mem, peer_pdd->vm);
+		}
+
+		pdd->dev->kfd2kgd->free_memory_of_gpu(pdd->dev->kgd, mem);
+		kfd_process_device_remove_obj_handle(pdd, id);
+	}
+}
+
+static void kfd_process_free_outstanding_kfd_bos(struct kfd_process *p)
+{
+	struct kfd_process_device *pdd;
+
+	list_for_each_entry(pdd, &p->per_device_data, per_device_list)
+		kfd_process_device_free_bos(pdd);
+}
+
 static void kfd_process_destroy_pdds(struct kfd_process *p)
 {
 	struct kfd_process_device *pdd, *temp;
@@ -171,6 +205,8 @@ static void kfd_process_destroy_pdds(struct kfd_process *p)
 			free_pages((unsigned long)pdd->qpd.cwsr_kaddr,
 				get_order(KFD_CWSR_TBA_TMA_SIZE));
 
+		idr_destroy(&pdd->alloc_idr);
+
 		kfree(pdd);
 	}
 }
@@ -187,6 +223,8 @@ static void kfd_process_wq_release(struct work_struct *work)
 
 	kfd_iommu_unbind_process(p);
 
+	kfd_process_free_outstanding_kfd_bos(p);
+
 	kfd_process_destroy_pdds(p);
 	dma_fence_put(p->ef);
 
@@ -371,6 +409,7 @@ static struct kfd_process *create_process(const struct task_struct *thread,
 	return process;
 
 err_init_cwsr:
+	kfd_process_free_outstanding_kfd_bos(process);
 	kfd_process_destroy_pdds(process);
 err_init_apertures:
 	pqm_uninit(&process->pqm);
@@ -421,6 +460,9 @@ struct kfd_process_device *kfd_create_process_device_data(struct kfd_dev *dev,
 	pdd->already_dequeued = false;
 	list_add(&pdd->per_device_list, &p->per_device_data);
 
+	/* Init idr used for memory handle translation */
+	idr_init(&pdd->alloc_idr);
+
 	return pdd;
 }
 
@@ -520,6 +562,37 @@ bool kfd_has_process_device_data(struct kfd_process *p)
 	return !(list_empty(&p->per_device_data));
 }
 
+/* Create specific handle mapped to mem from process local memory idr
+ * Assumes that the process lock is held.
+ */
+int kfd_process_device_create_obj_handle(struct kfd_process_device *pdd,
+					void *mem)
+{
+	return idr_alloc(&pdd->alloc_idr, mem, 0, 0, GFP_KERNEL);
+}
+
+/* Translate specific handle from process local memory idr
+ * Assumes that the process lock is held.
+ */
+void *kfd_process_device_translate_handle(struct kfd_process_device *pdd,
+					int handle)
+{
+	if (handle < 0)
+		return NULL;
+
+	return idr_find(&pdd->alloc_idr, handle);
+}
+
+/* Remove specific handle from process local memory idr
+ * Assumes that the process lock is held.
+ */
+void kfd_process_device_remove_obj_handle(struct kfd_process_device *pdd,
+					int handle)
+{
+	if (handle >= 0)
+		idr_remove(&pdd->alloc_idr, handle);
+}
+
 /* This increments the process->ref counter. */
 struct kfd_process *kfd_lookup_process_by_pasid(unsigned int pasid)
 {
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 10/20] drm/amdkfd: Allocate CWSR trap handler memory for dGPUs
       [not found] ` <1521149279-31892-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (8 preceding siblings ...)
  2018-03-15 21:27   ` [PATCH 09/20] drm/amdkfd: Add per-process IDR for buffer handles Felix Kuehling
@ 2018-03-15 21:27   ` Felix Kuehling
  2018-03-15 21:27   ` [PATCH 11/20] drm/amdkfd: Add TC flush on VMID deallocation for Hawaii Felix Kuehling
                     ` (10 subsequent siblings)
  20 siblings, 0 replies; 25+ messages in thread
From: Felix Kuehling @ 2018-03-15 21:27 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling

Add helpers for allocating GPUVM memory in kernel mode and use them
to allocate memory for the CWSR trap handler.

v2: Use dev instead of pdd->dev in kfd_process_free_gpuvm
v3:
* Cleaned up and simplified kfd_process_alloc_gpuvm
* Moved allocation for dGPU to kfd_process_device_init_vm

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_process.c | 137 ++++++++++++++++++++++++++++---
 1 file changed, 127 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index a2ae023..aeb339d 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -54,7 +54,6 @@ static struct kfd_process *find_process(const struct task_struct *thread);
 static void kfd_process_ref_release(struct kref *ref);
 static struct kfd_process *create_process(const struct task_struct *thread,
 					struct file *filep);
-static int kfd_process_init_cwsr(struct kfd_process *p, struct file *filep);
 
 static void evict_process_worker(struct work_struct *work);
 static void restore_process_worker(struct work_struct *work);
@@ -74,6 +73,82 @@ void kfd_process_destroy_wq(void)
 	}
 }
 
+static void kfd_process_free_gpuvm(struct kgd_mem *mem,
+			struct kfd_process_device *pdd)
+{
+	struct kfd_dev *dev = pdd->dev;
+
+	dev->kfd2kgd->unmap_memory_to_gpu(dev->kgd, mem, pdd->vm);
+	dev->kfd2kgd->free_memory_of_gpu(dev->kgd, mem);
+}
+
+/* kfd_process_alloc_gpuvm - Allocate GPU VM for the KFD process
+ *	This function should be only called right after the process
+ *	is created and when kfd_processes_mutex is still being held
+ *	to avoid concurrency. Because of that exclusiveness, we do
+ *	not need to take p->mutex.
+ */
+static int kfd_process_alloc_gpuvm(struct kfd_process_device *pdd,
+				   uint64_t gpu_va, uint32_t size,
+				   uint32_t flags, void **kptr)
+{
+	struct kfd_dev *kdev = pdd->dev;
+	struct kgd_mem *mem = NULL;
+	int handle;
+	int err;
+
+	err = kdev->kfd2kgd->alloc_memory_of_gpu(kdev->kgd, gpu_va, size,
+						 pdd->vm, &mem, NULL, flags);
+	if (err)
+		goto err_alloc_mem;
+
+	err = kdev->kfd2kgd->map_memory_to_gpu(kdev->kgd, mem, pdd->vm);
+	if (err)
+		goto err_map_mem;
+
+	err = kdev->kfd2kgd->sync_memory(kdev->kgd, mem, true);
+	if (err) {
+		pr_debug("Sync memory failed, wait interrupted by user signal\n");
+		goto sync_memory_failed;
+	}
+
+	/* Create an obj handle so kfd_process_device_remove_obj_handle
+	 * will take care of the bo removal when the process finishes.
+	 * We do not need to take p->mutex, because the process is just
+	 * created and the ioctls have not had the chance to run.
+	 */
+	handle = kfd_process_device_create_obj_handle(pdd, mem);
+
+	if (handle < 0) {
+		err = handle;
+		goto free_gpuvm;
+	}
+
+	if (kptr) {
+		err = kdev->kfd2kgd->map_gtt_bo_to_kernel(kdev->kgd,
+				(struct kgd_mem *)mem, kptr, NULL);
+		if (err) {
+			pr_debug("Map GTT BO to kernel failed\n");
+			goto free_obj_handle;
+		}
+	}
+
+	return err;
+
+free_obj_handle:
+	kfd_process_device_remove_obj_handle(pdd, handle);
+free_gpuvm:
+sync_memory_failed:
+	kfd_process_free_gpuvm(mem, pdd);
+	return err;
+
+err_map_mem:
+	kdev->kfd2kgd->free_memory_of_gpu(kdev->kgd, mem);
+err_alloc_mem:
+	*kptr = NULL;
+	return err;
+}
+
 struct kfd_process *kfd_create_process(struct file *filep)
 {
 	struct kfd_process *process;
@@ -201,7 +276,7 @@ static void kfd_process_destroy_pdds(struct kfd_process *p)
 
 		list_del(&pdd->per_device_list);
 
-		if (pdd->qpd.cwsr_kaddr)
+		if (pdd->qpd.cwsr_kaddr && !pdd->qpd.cwsr_base)
 			free_pages((unsigned long)pdd->qpd.cwsr_kaddr,
 				get_order(KFD_CWSR_TBA_TMA_SIZE));
 
@@ -312,18 +387,18 @@ static const struct mmu_notifier_ops kfd_process_mmu_notifier_ops = {
 	.release = kfd_process_notifier_release,
 };
 
-static int kfd_process_init_cwsr(struct kfd_process *p, struct file *filep)
+static int kfd_process_init_cwsr_apu(struct kfd_process *p, struct file *filep)
 {
 	unsigned long  offset;
-	struct kfd_process_device *pdd = NULL;
-	struct kfd_dev *dev = NULL;
-	struct qcm_process_device *qpd = NULL;
+	struct kfd_process_device *pdd;
 
 	list_for_each_entry(pdd, &p->per_device_data, per_device_list) {
-		dev = pdd->dev;
-		qpd = &pdd->qpd;
-		if (!dev->cwsr_enabled || qpd->cwsr_kaddr)
+		struct kfd_dev *dev = pdd->dev;
+		struct qcm_process_device *qpd = &pdd->qpd;
+
+		if (!dev->cwsr_enabled || qpd->cwsr_kaddr || qpd->cwsr_base)
 			continue;
+
 		offset = (dev->id | KFD_MMAP_RESERVED_MEM_MASK) << PAGE_SHIFT;
 		qpd->tba_addr = (int64_t)vm_mmap(filep, 0,
 			KFD_CWSR_TBA_TMA_SIZE, PROT_READ | PROT_EXEC,
@@ -348,6 +423,36 @@ static int kfd_process_init_cwsr(struct kfd_process *p, struct file *filep)
 	return 0;
 }
 
+static int kfd_process_device_init_cwsr_dgpu(struct kfd_process_device *pdd)
+{
+	struct kfd_dev *dev = pdd->dev;
+	struct qcm_process_device *qpd = &pdd->qpd;
+	uint32_t flags = ALLOC_MEM_FLAGS_GTT |
+		ALLOC_MEM_FLAGS_NO_SUBSTITUTE | ALLOC_MEM_FLAGS_EXECUTABLE;
+	void *kaddr;
+	int ret;
+
+	if (!dev->cwsr_enabled || qpd->cwsr_kaddr || !qpd->cwsr_base)
+		return 0;
+
+	/* cwsr_base is only set for dGPU */
+	ret = kfd_process_alloc_gpuvm(pdd, qpd->cwsr_base,
+				      KFD_CWSR_TBA_TMA_SIZE, flags, &kaddr);
+	if (ret)
+		return ret;
+
+	qpd->cwsr_kaddr = kaddr;
+	qpd->tba_addr = qpd->cwsr_base;
+
+	memcpy(qpd->cwsr_kaddr, dev->cwsr_isa, dev->cwsr_isa_size);
+
+	qpd->tma_addr = qpd->tba_addr + KFD_CWSR_TMA_OFFSET;
+	pr_debug("set tba :0x%llx, tma:0x%llx, cwsr_kaddr:%p for pqm.\n",
+		 qpd->tba_addr, qpd->tma_addr, qpd->cwsr_kaddr);
+
+	return 0;
+}
+
 static struct kfd_process *create_process(const struct task_struct *thread,
 					struct file *filep)
 {
@@ -402,7 +507,7 @@ static struct kfd_process *create_process(const struct task_struct *thread,
 	INIT_DELAYED_WORK(&process->restore_work, restore_process_worker);
 	process->last_restore_timestamp = get_jiffies_64();
 
-	err = kfd_process_init_cwsr(process, filep);
+	err = kfd_process_init_cwsr_apu(process, filep);
 	if (err)
 		goto err_init_cwsr;
 
@@ -505,9 +610,21 @@ int kfd_process_device_init_vm(struct kfd_process_device *pdd,
 		return ret;
 	}
 
+	ret = kfd_process_device_init_cwsr_dgpu(pdd);
+	if (ret)
+		goto err_init_cwsr;
+
 	pdd->drm_file = drm_file;
 
 	return 0;
+
+err_init_cwsr:
+	kfd_process_device_free_bos(pdd);
+	if (!drm_file)
+		dev->kfd2kgd->destroy_process_vm(dev->kgd, pdd->vm);
+	pdd->vm = NULL;
+
+	return ret;
 }
 
 /*
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 11/20] drm/amdkfd: Add TC flush on VMID deallocation for Hawaii
       [not found] ` <1521149279-31892-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (9 preceding siblings ...)
  2018-03-15 21:27   ` [PATCH 10/20] drm/amdkfd: Allocate CWSR trap handler memory for dGPUs Felix Kuehling
@ 2018-03-15 21:27   ` Felix Kuehling
  2018-03-15 21:27   ` [PATCH 12/20] drm/amdkfd: Add ioctls for GPUVM memory management Felix Kuehling
                     ` (9 subsequent siblings)
  20 siblings, 0 replies; 25+ messages in thread
From: Felix Kuehling @ 2018-03-15 21:27 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Amber Lin, Felix Kuehling

On GFX7 the CP does not perform a TC flush when queues are unmapped.
To avoid TC eviction from accessing an invalid VMID, flush it
explicitly before releasing a VMID.

v2: Fix unnecessary list_for_each_entry_safe
v3: Moved allocation to kfd_process_device_init_vm

Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c  | 22 ++++++++++++-
 drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c    | 37 ++++++++++++++++++++++
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h              |  3 ++
 drivers/gpu/drm/amd/amdkfd/kfd_process.c           | 34 ++++++++++++++++++++
 4 files changed, 95 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index b3b6dab..c18e048 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -142,12 +142,31 @@ static int allocate_vmid(struct device_queue_manager *dqm,
 	return 0;
 }
 
+static int flush_texture_cache_nocpsch(struct kfd_dev *kdev,
+				struct qcm_process_device *qpd)
+{
+	uint32_t len;
+
+	if (!qpd->ib_kaddr)
+		return -ENOMEM;
+
+	len = pm_create_release_mem(qpd->ib_base, (uint32_t *)qpd->ib_kaddr);
+
+	return kdev->kfd2kgd->submit_ib(kdev->kgd, KGD_ENGINE_MEC1, qpd->vmid,
+				qpd->ib_base, (uint32_t *)qpd->ib_kaddr, len);
+}
+
 static void deallocate_vmid(struct device_queue_manager *dqm,
 				struct qcm_process_device *qpd,
 				struct queue *q)
 {
 	int bit = qpd->vmid - dqm->dev->vm_info.first_vmid_kfd;
 
+	/* On GFX v7, CP doesn't flush TC at dequeue */
+	if (q->device->device_info->asic_family == CHIP_HAWAII)
+		if (flush_texture_cache_nocpsch(q->device, qpd))
+			pr_err("Failed to flush TC\n");
+
 	kfd_flush_tlb(qpd_to_pdd(qpd));
 
 	/* Release the vmid mapping */
@@ -792,11 +811,12 @@ static void uninitialize(struct device_queue_manager *dqm)
 static int start_nocpsch(struct device_queue_manager *dqm)
 {
 	init_interrupts(dqm);
-	return 0;
+	return pm_init(&dqm->packets, dqm);
 }
 
 static int stop_nocpsch(struct device_queue_manager *dqm)
 {
+	pm_uninit(&dqm->packets);
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
index 0ecbd1f..7614375 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
@@ -356,6 +356,43 @@ static int pm_create_runlist_ib(struct packet_manager *pm,
 	return retval;
 }
 
+/* pm_create_release_mem - Create a RELEASE_MEM packet and return the size
+ *     of this packet
+ *     @gpu_addr - GPU address of the packet. It's a virtual address.
+ *     @buffer - buffer to fill up with the packet. It's a CPU kernel pointer
+ *     Return - length of the packet
+ */
+uint32_t pm_create_release_mem(uint64_t gpu_addr, uint32_t *buffer)
+{
+	struct pm4_mec_release_mem *packet;
+
+	WARN_ON(!buffer);
+
+	packet = (struct pm4_mec_release_mem *)buffer;
+	memset(buffer, 0, sizeof(*packet));
+
+	packet->header.u32All = build_pm4_header(IT_RELEASE_MEM,
+						 sizeof(*packet));
+
+	packet->bitfields2.event_type = CACHE_FLUSH_AND_INV_TS_EVENT;
+	packet->bitfields2.event_index = event_index___release_mem__end_of_pipe;
+	packet->bitfields2.tcl1_action_ena = 1;
+	packet->bitfields2.tc_action_ena = 1;
+	packet->bitfields2.cache_policy = cache_policy___release_mem__lru;
+	packet->bitfields2.atc = 0;
+
+	packet->bitfields3.data_sel = data_sel___release_mem__send_32_bit_low;
+	packet->bitfields3.int_sel =
+		int_sel___release_mem__send_interrupt_after_write_confirm;
+
+	packet->bitfields4.address_lo_32b = (gpu_addr & 0xffffffff) >> 2;
+	packet->address_hi = upper_32_bits(gpu_addr);
+
+	packet->data_lo = 0;
+
+	return sizeof(*packet) / sizeof(unsigned int);
+}
+
 int pm_init(struct packet_manager *pm, struct device_queue_manager *dqm)
 {
 	pm->dqm = dqm;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index b2b5ef8..aaed005 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -494,6 +494,7 @@ struct qcm_process_device {
 
 	/* IB memory */
 	uint64_t ib_base;
+	void *ib_kaddr;
 };
 
 /* KFD Memory Eviction */
@@ -834,6 +835,8 @@ int pm_send_unmap_queue(struct packet_manager *pm, enum kfd_queue_type type,
 
 void pm_release_ib(struct packet_manager *pm);
 
+uint32_t pm_create_release_mem(uint64_t gpu_addr, uint32_t *buffer);
+
 uint64_t kfd_get_number_elems(struct kfd_dev *kfd);
 
 /* Events */
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index aeb339d..fbf58e9 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -149,6 +149,36 @@ static int kfd_process_alloc_gpuvm(struct kfd_process_device *pdd,
 	return err;
 }
 
+/* kfd_process_device_reserve_ib_mem - Reserve memory inside the
+ *	process for IB usage The memory reserved is for KFD to submit
+ *	IB to AMDGPU from kernel.  If the memory is reserved
+ *	successfully, ib_kaddr_assigned will have the CPU/kernel
+ *	address. Check ib_kaddr_assigned before accessing the memory.
+ */
+static int kfd_process_device_reserve_ib_mem(struct kfd_process_device *pdd)
+{
+	struct qcm_process_device *qpd = &pdd->qpd;
+	uint32_t flags = ALLOC_MEM_FLAGS_GTT |
+			 ALLOC_MEM_FLAGS_NO_SUBSTITUTE |
+			 ALLOC_MEM_FLAGS_WRITABLE |
+			 ALLOC_MEM_FLAGS_EXECUTABLE;
+	void *kaddr;
+	int ret;
+
+	if (qpd->ib_kaddr || !qpd->ib_base)
+		return 0;
+
+	/* ib_base is only set for dGPU */
+	ret = kfd_process_alloc_gpuvm(pdd, qpd->ib_base, PAGE_SIZE, flags,
+				      &kaddr);
+	if (ret)
+		return ret;
+
+	qpd->ib_kaddr = kaddr;
+
+	return 0;
+}
+
 struct kfd_process *kfd_create_process(struct file *filep)
 {
 	struct kfd_process *process;
@@ -610,6 +640,9 @@ int kfd_process_device_init_vm(struct kfd_process_device *pdd,
 		return ret;
 	}
 
+	ret = kfd_process_device_reserve_ib_mem(pdd);
+	if (ret)
+		goto err_reserve_ib_mem;
 	ret = kfd_process_device_init_cwsr_dgpu(pdd);
 	if (ret)
 		goto err_init_cwsr;
@@ -619,6 +652,7 @@ int kfd_process_device_init_vm(struct kfd_process_device *pdd,
 	return 0;
 
 err_init_cwsr:
+err_reserve_ib_mem:
 	kfd_process_device_free_bos(pdd);
 	if (!drm_file)
 		dev->kfd2kgd->destroy_process_vm(dev->kgd, pdd->vm);
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 12/20] drm/amdkfd: Add ioctls for GPUVM memory management
       [not found] ` <1521149279-31892-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (10 preceding siblings ...)
  2018-03-15 21:27   ` [PATCH 11/20] drm/amdkfd: Add TC flush on VMID deallocation for Hawaii Felix Kuehling
@ 2018-03-15 21:27   ` Felix Kuehling
  2018-03-15 21:27   ` [PATCH 13/20] drm/amdkfd: Kmap event page for dGPUs Felix Kuehling
                     ` (8 subsequent siblings)
  20 siblings, 0 replies; 25+ messages in thread
From: Felix Kuehling @ 2018-03-15 21:27 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling

v2:
* Fix error handling after kfd_bind_process_to_device in
  kfd_ioctl_map_memory_to_gpu
v3:
* Add ioctl to acquire VM from a DRM FD
v4:
* Return number of successful map/unmap operations in failure cases
* Facilitate partial retry after failed map/unmap
* Added comments with parameter descriptions to new APIs
* Defined AMDKFD_IOC_FREE_MEMORY_OF_GPU write-only

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c        | 377 ++++++++++++++++++++++++
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h           |   8 +
 drivers/gpu/drm/amd/include/kgd_kfd_interface.h |   2 +
 include/uapi/linux/kfd_ioctl.h                  |  97 +++++-
 4 files changed, 483 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 7d40094..a563ff2 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -24,6 +24,7 @@
 #include <linux/export.h>
 #include <linux/err.h>
 #include <linux/fs.h>
+#include <linux/file.h>
 #include <linux/sched.h>
 #include <linux/slab.h>
 #include <linux/uaccess.h>
@@ -1046,6 +1047,366 @@ static int kfd_ioctl_get_tile_config(struct file *filep,
 	return 0;
 }
 
+static int kfd_ioctl_acquire_vm(struct file *filep, struct kfd_process *p,
+				void *data)
+{
+	struct kfd_ioctl_acquire_vm_args *args = data;
+	struct kfd_process_device *pdd;
+	struct kfd_dev *dev;
+	struct file *drm_file;
+	int ret;
+
+	dev = kfd_device_by_id(args->gpu_id);
+	if (!dev)
+		return -EINVAL;
+
+	drm_file = fget(args->drm_fd);
+	if (!drm_file)
+		return -EINVAL;
+
+	mutex_lock(&p->mutex);
+
+	pdd = kfd_get_process_device_data(dev, p);
+	if (!pdd) {
+		ret = -EINVAL;
+		goto err_unlock;
+	}
+
+	if (pdd->drm_file) {
+		ret = pdd->drm_file == drm_file ? 0 : -EBUSY;
+		goto err_unlock;
+	}
+
+	ret = kfd_process_device_init_vm(pdd, drm_file);
+	if (ret)
+		goto err_unlock;
+	/* On success, the PDD keeps the drm_file reference */
+	mutex_unlock(&p->mutex);
+
+	return 0;
+
+err_unlock:
+	mutex_unlock(&p->mutex);
+	fput(drm_file);
+	return ret;
+}
+
+bool kfd_dev_is_large_bar(struct kfd_dev *dev)
+{
+	struct kfd_local_mem_info mem_info;
+
+	if (dev->device_info->needs_iommu_device)
+		return false;
+
+	dev->kfd2kgd->get_local_mem_info(dev->kgd, &mem_info);
+	if (mem_info.local_mem_size_private == 0 &&
+			mem_info.local_mem_size_public > 0)
+		return true;
+	return false;
+}
+
+static int kfd_ioctl_alloc_memory_of_gpu(struct file *filep,
+					struct kfd_process *p, void *data)
+{
+	struct kfd_ioctl_alloc_memory_of_gpu_args *args = data;
+	struct kfd_process_device *pdd;
+	void *mem;
+	struct kfd_dev *dev;
+	int idr_handle;
+	long err;
+	uint64_t offset = args->mmap_offset;
+	uint32_t flags = args->flags;
+
+	if (args->size == 0)
+		return -EINVAL;
+
+	dev = kfd_device_by_id(args->gpu_id);
+	if (!dev)
+		return -EINVAL;
+
+	if ((flags & KFD_IOC_ALLOC_MEM_FLAGS_PUBLIC) &&
+		(flags & KFD_IOC_ALLOC_MEM_FLAGS_VRAM) &&
+		!kfd_dev_is_large_bar(dev)) {
+		pr_err("Alloc host visible vram on small bar is not allowed\n");
+		return -EINVAL;
+	}
+
+	mutex_lock(&p->mutex);
+
+	pdd = kfd_bind_process_to_device(dev, p);
+	if (IS_ERR(pdd)) {
+		err = PTR_ERR(pdd);
+		goto err_unlock;
+	}
+
+	err = dev->kfd2kgd->alloc_memory_of_gpu(
+		dev->kgd, args->va_addr, args->size,
+		pdd->vm, (struct kgd_mem **) &mem, &offset,
+		flags);
+
+	if (err)
+		goto err_unlock;
+
+	idr_handle = kfd_process_device_create_obj_handle(pdd, mem);
+	if (idr_handle < 0) {
+		err = -EFAULT;
+		goto err_free;
+	}
+
+	mutex_unlock(&p->mutex);
+
+	args->handle = MAKE_HANDLE(args->gpu_id, idr_handle);
+	args->mmap_offset = offset;
+
+	return 0;
+
+err_free:
+	dev->kfd2kgd->free_memory_of_gpu(dev->kgd, (struct kgd_mem *)mem);
+err_unlock:
+	mutex_unlock(&p->mutex);
+	return err;
+}
+
+static int kfd_ioctl_free_memory_of_gpu(struct file *filep,
+					struct kfd_process *p, void *data)
+{
+	struct kfd_ioctl_free_memory_of_gpu_args *args = data;
+	struct kfd_process_device *pdd;
+	void *mem;
+	struct kfd_dev *dev;
+	int ret;
+
+	dev = kfd_device_by_id(GET_GPU_ID(args->handle));
+	if (!dev)
+		return -EINVAL;
+
+	mutex_lock(&p->mutex);
+
+	pdd = kfd_get_process_device_data(dev, p);
+	if (!pdd) {
+		pr_err("Process device data doesn't exist\n");
+		ret = -EINVAL;
+		goto err_unlock;
+	}
+
+	mem = kfd_process_device_translate_handle(
+		pdd, GET_IDR_HANDLE(args->handle));
+	if (!mem) {
+		ret = -EINVAL;
+		goto err_unlock;
+	}
+
+	ret = dev->kfd2kgd->free_memory_of_gpu(dev->kgd, (struct kgd_mem *)mem);
+
+	/* If freeing the buffer failed, leave the handle in place for
+	 * clean-up during process tear-down.
+	 */
+	if (!ret)
+		kfd_process_device_remove_obj_handle(
+			pdd, GET_IDR_HANDLE(args->handle));
+
+err_unlock:
+	mutex_unlock(&p->mutex);
+	return ret;
+}
+
+static int kfd_ioctl_map_memory_to_gpu(struct file *filep,
+					struct kfd_process *p, void *data)
+{
+	struct kfd_ioctl_map_memory_to_gpu_args *args = data;
+	struct kfd_process_device *pdd, *peer_pdd;
+	void *mem;
+	struct kfd_dev *dev, *peer;
+	long err = 0;
+	int i;
+	uint32_t *devices_arr = NULL;
+
+	dev = kfd_device_by_id(GET_GPU_ID(args->handle));
+	if (!dev)
+		return -EINVAL;
+
+	if (!args->n_devices) {
+		pr_debug("Device IDs array empty\n");
+		return -EINVAL;
+	}
+	if (args->n_success > args->n_devices) {
+		pr_debug("n_success exceeds n_devices\n");
+		return -EINVAL;
+	}
+
+	devices_arr = kmalloc(args->n_devices * sizeof(*devices_arr),
+			      GFP_KERNEL);
+	if (!devices_arr)
+		return -ENOMEM;
+
+	err = copy_from_user(devices_arr,
+			     (void __user *)args->device_ids_array_ptr,
+			     args->n_devices * sizeof(*devices_arr));
+	if (err != 0) {
+		err = -EFAULT;
+		goto copy_from_user_failed;
+	}
+
+	mutex_lock(&p->mutex);
+
+	pdd = kfd_bind_process_to_device(dev, p);
+	if (IS_ERR(pdd)) {
+		err = PTR_ERR(pdd);
+		goto bind_process_to_device_failed;
+	}
+
+	mem = kfd_process_device_translate_handle(pdd,
+						GET_IDR_HANDLE(args->handle));
+	if (!mem) {
+		err = -ENOMEM;
+		goto get_mem_obj_from_handle_failed;
+	}
+
+	for (i = args->n_success; i < args->n_devices; i++) {
+		peer = kfd_device_by_id(devices_arr[i]);
+		if (!peer) {
+			pr_debug("Getting device by id failed for 0x%x\n",
+				 devices_arr[i]);
+			err = -EINVAL;
+			goto get_mem_obj_from_handle_failed;
+		}
+
+		peer_pdd = kfd_bind_process_to_device(peer, p);
+		if (IS_ERR(peer_pdd)) {
+			err = PTR_ERR(peer_pdd);
+			goto get_mem_obj_from_handle_failed;
+		}
+		err = peer->kfd2kgd->map_memory_to_gpu(
+			peer->kgd, (struct kgd_mem *)mem, peer_pdd->vm);
+		if (err) {
+			pr_err("Failed to map to gpu %d/%d\n",
+			       i, args->n_devices);
+			goto map_memory_to_gpu_failed;
+		}
+		args->n_success = i+1;
+	}
+
+	mutex_unlock(&p->mutex);
+
+	err = dev->kfd2kgd->sync_memory(dev->kgd, (struct kgd_mem *) mem, true);
+	if (err) {
+		pr_debug("Sync memory failed, wait interrupted by user signal\n");
+		goto sync_memory_failed;
+	}
+
+	/* Flush TLBs after waiting for the page table updates to complete */
+	for (i = 0; i < args->n_devices; i++) {
+		peer = kfd_device_by_id(devices_arr[i]);
+		if (WARN_ON_ONCE(!peer))
+			continue;
+		peer_pdd = kfd_get_process_device_data(peer, p);
+		if (WARN_ON_ONCE(!peer_pdd))
+			continue;
+		kfd_flush_tlb(peer_pdd);
+	}
+
+	kfree(devices_arr);
+
+	return err;
+
+bind_process_to_device_failed:
+get_mem_obj_from_handle_failed:
+map_memory_to_gpu_failed:
+	mutex_unlock(&p->mutex);
+copy_from_user_failed:
+sync_memory_failed:
+	kfree(devices_arr);
+
+	return err;
+}
+
+static int kfd_ioctl_unmap_memory_from_gpu(struct file *filep,
+					struct kfd_process *p, void *data)
+{
+	struct kfd_ioctl_unmap_memory_from_gpu_args *args = data;
+	struct kfd_process_device *pdd, *peer_pdd;
+	void *mem;
+	struct kfd_dev *dev, *peer;
+	long err = 0;
+	uint32_t *devices_arr = NULL, i;
+
+	dev = kfd_device_by_id(GET_GPU_ID(args->handle));
+	if (!dev)
+		return -EINVAL;
+
+	if (!args->n_devices) {
+		pr_debug("Device IDs array empty\n");
+		return -EINVAL;
+	}
+	if (args->n_success > args->n_devices) {
+		pr_debug("n_success exceeds n_devices\n");
+		return -EINVAL;
+	}
+
+	devices_arr = kmalloc(args->n_devices * sizeof(*devices_arr),
+			      GFP_KERNEL);
+	if (!devices_arr)
+		return -ENOMEM;
+
+	err = copy_from_user(devices_arr,
+			     (void __user *)args->device_ids_array_ptr,
+			     args->n_devices * sizeof(*devices_arr));
+	if (err != 0) {
+		err = -EFAULT;
+		goto copy_from_user_failed;
+	}
+
+	mutex_lock(&p->mutex);
+
+	pdd = kfd_get_process_device_data(dev, p);
+	if (!pdd) {
+		err = PTR_ERR(pdd);
+		goto bind_process_to_device_failed;
+	}
+
+	mem = kfd_process_device_translate_handle(pdd,
+						GET_IDR_HANDLE(args->handle));
+	if (!mem) {
+		err = -ENOMEM;
+		goto get_mem_obj_from_handle_failed;
+	}
+
+	for (i = args->n_success; i < args->n_devices; i++) {
+		peer = kfd_device_by_id(devices_arr[i]);
+		if (!peer) {
+			err = -EINVAL;
+			goto get_mem_obj_from_handle_failed;
+		}
+
+		peer_pdd = kfd_get_process_device_data(peer, p);
+		if (!peer_pdd) {
+			err = -ENODEV;
+			goto get_mem_obj_from_handle_failed;
+		}
+		err = dev->kfd2kgd->unmap_memory_to_gpu(
+			peer->kgd, (struct kgd_mem *)mem, peer_pdd->vm);
+		if (err) {
+			pr_err("Failed to unmap from gpu %d/%d\n",
+			       i, args->n_devices);
+			goto unmap_memory_from_gpu_failed;
+		}
+		args->n_success = i+1;
+	}
+	kfree(devices_arr);
+
+	mutex_unlock(&p->mutex);
+
+	return 0;
+
+bind_process_to_device_failed:
+get_mem_obj_from_handle_failed:
+unmap_memory_from_gpu_failed:
+	mutex_unlock(&p->mutex);
+copy_from_user_failed:
+	kfree(devices_arr);
+	return err;
+}
+
 #define AMDKFD_IOCTL_DEF(ioctl, _func, _flags) \
 	[_IOC_NR(ioctl)] = {.cmd = ioctl, .func = _func, .flags = _flags, \
 			    .cmd_drv = 0, .name = #ioctl}
@@ -1111,6 +1472,22 @@ static const struct amdkfd_ioctl_desc amdkfd_ioctls[] = {
 
 	AMDKFD_IOCTL_DEF(AMDKFD_IOC_GET_PROCESS_APERTURES_NEW,
 			kfd_ioctl_get_process_apertures_new, 0),
+
+	AMDKFD_IOCTL_DEF(AMDKFD_IOC_ACQUIRE_VM,
+			kfd_ioctl_acquire_vm, 0),
+
+	AMDKFD_IOCTL_DEF(AMDKFD_IOC_ALLOC_MEMORY_OF_GPU,
+			kfd_ioctl_alloc_memory_of_gpu, 0),
+
+	AMDKFD_IOCTL_DEF(AMDKFD_IOC_FREE_MEMORY_OF_GPU,
+			kfd_ioctl_free_memory_of_gpu, 0),
+
+	AMDKFD_IOCTL_DEF(AMDKFD_IOC_MAP_MEMORY_TO_GPU,
+			kfd_ioctl_map_memory_to_gpu, 0),
+
+	AMDKFD_IOCTL_DEF(AMDKFD_IOC_UNMAP_MEMORY_FROM_GPU,
+			kfd_ioctl_unmap_memory_from_gpu, 0),
+
 };
 
 #define AMDKFD_CORE_IOCTL_COUNT	ARRAY_SIZE(amdkfd_ioctls)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index aaed005..1542807 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -509,6 +509,14 @@ struct qcm_process_device {
 int kgd2kfd_schedule_evict_and_restore_process(struct mm_struct *mm,
 					       struct dma_fence *fence);
 
+/* 8 byte handle containing GPU ID in the most significant 4 bytes and
+ * idr_handle in the least significant 4 bytes
+ */
+#define MAKE_HANDLE(gpu_id, idr_handle) \
+	(((uint64_t)(gpu_id) << 32) + idr_handle)
+#define GET_GPU_ID(handle) (handle >> 32)
+#define GET_IDR_HANDLE(handle) (handle & 0xFFFFFFFF)
+
 enum kfd_pdd_bound {
 	PDD_UNBOUND = 0,
 	PDD_BOUND,
diff --git a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
index b1f35c8..237289a 100644
--- a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
+++ b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
@@ -130,6 +130,7 @@ struct tile_config {
 
 /*
  * Allocation flag domains
+ * NOTE: This must match the corresponding definitions in kfd_ioctl.h.
  */
 #define ALLOC_MEM_FLAGS_VRAM		(1 << 0)
 #define ALLOC_MEM_FLAGS_GTT		(1 << 1)
@@ -138,6 +139,7 @@ struct tile_config {
 
 /*
  * Allocation flags attributes/access options.
+ * NOTE: This must match the corresponding definitions in kfd_ioctl.h.
  */
 #define ALLOC_MEM_FLAGS_WRITABLE	(1 << 31)
 #define ALLOC_MEM_FLAGS_EXECUTABLE	(1 << 30)
diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
index 5201437..b4f5073 100644
--- a/include/uapi/linux/kfd_ioctl.h
+++ b/include/uapi/linux/kfd_ioctl.h
@@ -286,6 +286,86 @@ struct kfd_ioctl_set_trap_handler_args {
 	__u32 pad;
 };
 
+struct kfd_ioctl_acquire_vm_args {
+	__u32 drm_fd;	/* to KFD */
+	__u32 gpu_id;	/* to KFD */
+};
+
+/* Allocation flags: memory types */
+#define KFD_IOC_ALLOC_MEM_FLAGS_VRAM		(1 << 0)
+#define KFD_IOC_ALLOC_MEM_FLAGS_GTT		(1 << 1)
+#define KFD_IOC_ALLOC_MEM_FLAGS_USERPTR		(1 << 2)
+#define KFD_IOC_ALLOC_MEM_FLAGS_DOORBELL	(1 << 3)
+/* Allocation flags: attributes/access options */
+#define KFD_IOC_ALLOC_MEM_FLAGS_WRITABLE	(1 << 31)
+#define KFD_IOC_ALLOC_MEM_FLAGS_EXECUTABLE	(1 << 30)
+#define KFD_IOC_ALLOC_MEM_FLAGS_PUBLIC		(1 << 29)
+#define KFD_IOC_ALLOC_MEM_FLAGS_NO_SUBSTITUTE	(1 << 28)
+#define KFD_IOC_ALLOC_MEM_FLAGS_AQL_QUEUE_MEM	(1 << 27)
+#define KFD_IOC_ALLOC_MEM_FLAGS_COHERENT	(1 << 26)
+
+/* Allocate memory for later SVM (shared virtual memory) mapping.
+ *
+ * @va_addr:     virtual address of the memory to be allocated
+ *               all later mappings on all GPUs will use this address
+ * @size:        size in bytes
+ * @handle:      buffer handle returned to user mode, used to refer to
+ *               this allocation for mapping, unmapping and freeing
+ * @mmap_offset: for CPU-mapping the allocation by mmapping a render node
+ *               for userptrs this is overloaded to specify the CPU address
+ * @gpu_id:      device identifier
+ * @flags:       memory type and attributes. See KFD_IOC_ALLOC_MEM_FLAGS above
+ */
+struct kfd_ioctl_alloc_memory_of_gpu_args {
+	__u64 va_addr;		/* to KFD */
+	__u64 size;		/* to KFD */
+	__u64 handle;		/* from KFD */
+	__u64 mmap_offset;	/* to KFD (userptr), from KFD (mmap offset) */
+	__u32 gpu_id;		/* to KFD */
+	__u32 flags;
+};
+
+/* Free memory allocated with kfd_ioctl_alloc_memory_of_gpu
+ *
+ * @handle: memory handle returned by alloc
+ */
+struct kfd_ioctl_free_memory_of_gpu_args {
+	__u64 handle;		/* to KFD */
+};
+
+/* Map memory to one or more GPUs
+ *
+ * @handle:                memory handle returned by alloc
+ * @device_ids_array_ptr:  array of gpu_ids (__u32 per device)
+ * @n_devices:             number of devices in the array
+ * @n_success:             number of devices mapped successfully
+ *
+ * @n_success returns information to the caller how many devices from
+ * the start of the array have mapped the buffer successfully. It can
+ * be passed into a subsequent retry call to skip those devices. For
+ * the first call the caller should initialize it to 0.
+ *
+ * If the ioctl completes with return code 0 (success), n_success ==
+ * n_devices.
+ */
+struct kfd_ioctl_map_memory_to_gpu_args {
+	__u64 handle;			/* to KFD */
+	__u64 device_ids_array_ptr;	/* to KFD */
+	__u32 n_devices;		/* to KFD */
+	__u32 n_success;		/* to/from KFD */
+};
+
+/* Unmap memory from one or more GPUs
+ *
+ * same arguments as for mapping
+ */
+struct kfd_ioctl_unmap_memory_from_gpu_args {
+	__u64 handle;			/* to KFD */
+	__u64 device_ids_array_ptr;	/* to KFD */
+	__u32 n_devices;		/* to KFD */
+	__u32 n_success;		/* to/from KFD */
+};
+
 #define AMDKFD_IOCTL_BASE 'K'
 #define AMDKFD_IO(nr)			_IO(AMDKFD_IOCTL_BASE, nr)
 #define AMDKFD_IOR(nr, type)		_IOR(AMDKFD_IOCTL_BASE, nr, type)
@@ -353,7 +433,22 @@ struct kfd_ioctl_set_trap_handler_args {
 		AMDKFD_IOWR(0x14,		\
 			struct kfd_ioctl_get_process_apertures_new_args)
 
+#define AMDKFD_IOC_ACQUIRE_VM			\
+		AMDKFD_IOW(0x15, struct kfd_ioctl_acquire_vm_args)
+
+#define AMDKFD_IOC_ALLOC_MEMORY_OF_GPU		\
+		AMDKFD_IOWR(0x16, struct kfd_ioctl_alloc_memory_of_gpu_args)
+
+#define AMDKFD_IOC_FREE_MEMORY_OF_GPU		\
+		AMDKFD_IOW(0x17, struct kfd_ioctl_free_memory_of_gpu_args)
+
+#define AMDKFD_IOC_MAP_MEMORY_TO_GPU		\
+		AMDKFD_IOWR(0x18, struct kfd_ioctl_map_memory_to_gpu_args)
+
+#define AMDKFD_IOC_UNMAP_MEMORY_FROM_GPU	\
+		AMDKFD_IOWR(0x19, struct kfd_ioctl_unmap_memory_from_gpu_args)
+
 #define AMDKFD_COMMAND_START		0x01
-#define AMDKFD_COMMAND_END		0x15
+#define AMDKFD_COMMAND_END		0x1A
 
 #endif
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 13/20] drm/amdkfd: Kmap event page for dGPUs
       [not found] ` <1521149279-31892-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (11 preceding siblings ...)
  2018-03-15 21:27   ` [PATCH 12/20] drm/amdkfd: Add ioctls for GPUVM memory management Felix Kuehling
@ 2018-03-15 21:27   ` Felix Kuehling
  2018-03-15 21:27   ` [PATCH 14/20] drm/amdkfd: Add module option for testing large-BAR functionality Felix Kuehling
                     ` (7 subsequent siblings)
  20 siblings, 0 replies; 25+ messages in thread
From: Felix Kuehling @ 2018-03-15 21:27 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling

The events page must be accessible in user mode by the GPU and CPU
as well as in kernel mode by the CPU. On dGPUs user mode virtual
addresses are managed by the Thunk's GPU memory allocation code.
Therefore we can't allocate the memory in kernel mode like we do
on APUs. But KFD still needs to map the memory for kernel access.
To facilitate this, the Thunk provides the buffer handle of the
events page to KFD when creating the first event.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 56 ++++++++++++++++++++++++++++++++
 drivers/gpu/drm/amd/amdkfd/kfd_events.c  | 31 ++++++++++++++++--
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h    |  2 ++
 3 files changed, 87 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index a563ff2..ec0b2c0 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -923,6 +923,58 @@ static int kfd_ioctl_create_event(struct file *filp, struct kfd_process *p,
 	struct kfd_ioctl_create_event_args *args = data;
 	int err;
 
+	/* For dGPUs the event page is allocated in user mode. The
+	 * handle is passed to KFD with the first call to this IOCTL
+	 * through the event_page_offset field.
+	 */
+	if (args->event_page_offset) {
+		struct kfd_dev *kfd;
+		struct kfd_process_device *pdd;
+		void *mem, *kern_addr;
+		uint64_t size;
+
+		if (p->signal_page) {
+			pr_err("Event page is already set\n");
+			return -EINVAL;
+		}
+
+		kfd = kfd_device_by_id(GET_GPU_ID(args->event_page_offset));
+		if (!kfd) {
+			pr_err("Getting device by id failed in %s\n", __func__);
+			return -EINVAL;
+		}
+
+		mutex_lock(&p->mutex);
+		pdd = kfd_bind_process_to_device(kfd, p);
+		if (IS_ERR(pdd)) {
+			err = PTR_ERR(pdd);
+			goto out_unlock;
+		}
+
+		mem = kfd_process_device_translate_handle(pdd,
+				GET_IDR_HANDLE(args->event_page_offset));
+		if (!mem) {
+			pr_err("Can't find BO, offset is 0x%llx\n",
+			       args->event_page_offset);
+			err = -EINVAL;
+			goto out_unlock;
+		}
+		mutex_unlock(&p->mutex);
+
+		err = kfd->kfd2kgd->map_gtt_bo_to_kernel(kfd->kgd,
+						mem, &kern_addr, &size);
+		if (err) {
+			pr_err("Failed to map event page to kernel\n");
+			return err;
+		}
+
+		err = kfd_event_page_set(p, kern_addr, size);
+		if (err) {
+			pr_err("Failed to set event page\n");
+			return err;
+		}
+	}
+
 	err = kfd_event_create(filp, p, args->event_type,
 				args->auto_reset != 0, args->node_id,
 				&args->event_id, &args->event_trigger_data,
@@ -930,6 +982,10 @@ static int kfd_ioctl_create_event(struct file *filp, struct kfd_process *p,
 				&args->event_slot_index);
 
 	return err;
+
+out_unlock:
+	mutex_unlock(&p->mutex);
+	return err;
 }
 
 static int kfd_ioctl_destroy_event(struct file *filp, struct kfd_process *p,
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
index 6fb9c0d..4890a90 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
@@ -52,6 +52,7 @@ struct kfd_event_waiter {
 struct kfd_signal_page {
 	uint64_t *kernel_address;
 	uint64_t __user *user_address;
+	bool need_to_free_pages;
 };
 
 
@@ -79,6 +80,7 @@ static struct kfd_signal_page *allocate_signal_page(struct kfd_process *p)
 	       KFD_SIGNAL_EVENT_LIMIT * 8);
 
 	page->kernel_address = backing_store;
+	page->need_to_free_pages = true;
 	pr_debug("Allocated new event signal page at %p, for process %p\n",
 			page, p);
 
@@ -269,8 +271,9 @@ static void shutdown_signal_page(struct kfd_process *p)
 	struct kfd_signal_page *page = p->signal_page;
 
 	if (page) {
-		free_pages((unsigned long)page->kernel_address,
-				get_order(KFD_SIGNAL_EVENT_LIMIT * 8));
+		if (page->need_to_free_pages)
+			free_pages((unsigned long)page->kernel_address,
+				   get_order(KFD_SIGNAL_EVENT_LIMIT * 8));
 		kfree(page);
 	}
 }
@@ -292,6 +295,30 @@ static bool event_can_be_cpu_signaled(const struct kfd_event *ev)
 	return ev->type == KFD_EVENT_TYPE_SIGNAL;
 }
 
+int kfd_event_page_set(struct kfd_process *p, void *kernel_address,
+		       uint64_t size)
+{
+	struct kfd_signal_page *page;
+
+	if (p->signal_page)
+		return -EBUSY;
+
+	page = kzalloc(sizeof(*page), GFP_KERNEL);
+	if (!page)
+		return -ENOMEM;
+
+	/* Initialize all events to unsignaled */
+	memset(kernel_address, (uint8_t) UNSIGNALED_EVENT_SLOT,
+	       KFD_SIGNAL_EVENT_LIMIT * 8);
+
+	page->kernel_address = kernel_address;
+
+	p->signal_page = page;
+	p->signal_mapped_size = size;
+
+	return 0;
+}
+
 int kfd_event_create(struct file *devkfd, struct kfd_process *p,
 		     uint32_t event_type, bool auto_reset, uint32_t node_id,
 		     uint32_t *event_id, uint32_t *event_trigger_data,
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 1542807..aa93863 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -866,6 +866,8 @@ void kfd_signal_iommu_event(struct kfd_dev *dev,
 void kfd_signal_hw_exception_event(unsigned int pasid);
 int kfd_set_event(struct kfd_process *p, uint32_t event_id);
 int kfd_reset_event(struct kfd_process *p, uint32_t event_id);
+int kfd_event_page_set(struct kfd_process *p, void *kernel_address,
+		       uint64_t size);
 int kfd_event_create(struct file *devkfd, struct kfd_process *p,
 		     uint32_t event_type, bool auto_reset, uint32_t node_id,
 		     uint32_t *event_id, uint32_t *event_trigger_data,
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 14/20] drm/amdkfd: Add module option for testing large-BAR functionality
       [not found] ` <1521149279-31892-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (12 preceding siblings ...)
  2018-03-15 21:27   ` [PATCH 13/20] drm/amdkfd: Kmap event page for dGPUs Felix Kuehling
@ 2018-03-15 21:27   ` Felix Kuehling
  2018-03-15 21:27   ` [PATCH 15/20] drm/amdgpu: Add MMU notifier type for KFD userptr Felix Kuehling
                     ` (6 subsequent siblings)
  20 siblings, 0 replies; 25+ messages in thread
From: Felix Kuehling @ 2018-03-15 21:27 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling

Simulate large-BAR system by exporting only visible memory. This
limits the amount of available VRAM to the size of the BAR, but
enables CPU access to VRAM.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 5 +++++
 drivers/gpu/drm/amd/amdkfd/kfd_crat.c    | 3 +++
 drivers/gpu/drm/amd/amdkfd/kfd_module.c  | 5 +++++
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h    | 6 ++++++
 4 files changed, 19 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index ec0b2c0..cd679cf 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1151,6 +1151,11 @@ bool kfd_dev_is_large_bar(struct kfd_dev *dev)
 {
 	struct kfd_local_mem_info mem_info;
 
+	if (debug_largebar) {
+		pr_debug("Simulate large-bar allocation on non large-bar machine\n");
+		return true;
+	}
+
 	if (dev->device_info->needs_iommu_device)
 		return false;
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
index 7493f47..3c6c4cdd 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
@@ -1117,6 +1117,9 @@ static int kfd_create_vcrat_image_gpu(void *pcrat_image,
 	sub_type_hdr = (typeof(sub_type_hdr))((char *)sub_type_hdr +
 			sub_type_hdr->length);
 
+	if (debug_largebar)
+		local_mem_info.local_mem_size_private = 0;
+
 	if (local_mem_info.local_mem_size_private == 0)
 		ret = kfd_fill_gpu_memory_affinity(&avail_size,
 				kdev, HSA_MEM_HEAP_TYPE_FB_PUBLIC,
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_module.c b/drivers/gpu/drm/amd/amdkfd/kfd_module.c
index 65574c6..b0acb06 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_module.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_module.c
@@ -71,6 +71,11 @@ module_param(send_sigterm, int, 0444);
 MODULE_PARM_DESC(send_sigterm,
 	"Send sigterm to HSA process on unhandled exception (0 = disable, 1 = enable)");
 
+int debug_largebar;
+module_param(debug_largebar, int, 0444);
+MODULE_PARM_DESC(debug_largebar,
+	"Debug large-bar flag used to simulate large-bar capability on non-large bar machine (0 = disable, 1 = enable)");
+
 int ignore_crat;
 module_param(ignore_crat, int, 0444);
 MODULE_PARM_DESC(ignore_crat,
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index aa93863..db27f9f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -105,6 +105,12 @@ extern int cwsr_enable;
 extern int send_sigterm;
 
 /*
+ * This kernel module is used to simulate large bar machine on non-large bar
+ * enabled machines.
+ */
+extern int debug_largebar;
+
+/*
  * Ignore CRAT table during KFD initialization, can be used to work around
  * broken CRAT tables on some AMD systems
  */
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 15/20] drm/amdgpu: Add MMU notifier type for KFD userptr
       [not found] ` <1521149279-31892-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (13 preceding siblings ...)
  2018-03-15 21:27   ` [PATCH 14/20] drm/amdkfd: Add module option for testing large-BAR functionality Felix Kuehling
@ 2018-03-15 21:27   ` Felix Kuehling
  2018-03-15 21:27   ` [PATCH 16/20] drm/amdgpu: Enable amdgpu_ttm_tt_get_user_pages in worker threads Felix Kuehling
                     ` (5 subsequent siblings)
  20 siblings, 0 replies; 25+ messages in thread
From: Felix Kuehling @ 2018-03-15 21:27 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling

This commit adds the notion of MMU notifier types GFX and HSA. GFX
continues to work like MMU notifiers did before. HSA adds support for
KFD userptr BOs. The implementation of KFD userptr eviction is a stub
for now.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h       |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c |  7 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c           |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c           | 94 ++++++++++++++++++++----
 drivers/gpu/drm/amd/amdgpu/amdgpu_mn.h           | 11 ++-
 5 files changed, 97 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index c2c2bea..83e0c5c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -104,6 +104,7 @@ void amdgpu_amdkfd_device_probe(struct amdgpu_device *adev);
 void amdgpu_amdkfd_device_init(struct amdgpu_device *adev);
 void amdgpu_amdkfd_device_fini(struct amdgpu_device *adev);
 
+int amdgpu_amdkfd_evict_userptr(struct kgd_mem *mem, struct mm_struct *mm);
 int amdgpu_amdkfd_submit_ib(struct kgd_dev *kgd, enum kgd_engine_type engine,
 				uint32_t vmid, uint64_t gpu_addr,
 				uint32_t *ib_cmd, uint32_t ib_len);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 2f42c60..2d6f13a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -1418,6 +1418,13 @@ int amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel(struct kgd_dev *kgd,
 	return ret;
 }
 
+int amdgpu_amdkfd_evict_userptr(struct kgd_mem *mem,
+				struct mm_struct *mm)
+{
+	/* TODO */
+	return 0;
+}
+
 /** amdgpu_amdkfd_gpuvm_restore_process_bos - Restore all BOs for the given
  *   KFD process identified by process_info
  *
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index dc34b50..8e66f37 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -536,7 +536,7 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
 	if (p->bo_list) {
 		amdgpu_bo_list_get_list(p->bo_list, &p->validated);
 		if (p->bo_list->first_userptr != p->bo_list->num_entries)
-			p->mn = amdgpu_mn_get(p->adev);
+			p->mn = amdgpu_mn_get(p->adev, AMDGPU_MN_TYPE_GFX);
 	}
 
 	INIT_LIST_HEAD(&duplicates);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
index bd67f4c..f2ed18e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
@@ -36,12 +36,14 @@
 #include <drm/drm.h>
 
 #include "amdgpu.h"
+#include "amdgpu_amdkfd.h"
 
 struct amdgpu_mn {
 	/* constant after initialisation */
 	struct amdgpu_device	*adev;
 	struct mm_struct	*mm;
 	struct mmu_notifier	mn;
+	enum amdgpu_mn_type	type;
 
 	/* only used on destruction */
 	struct work_struct	work;
@@ -185,7 +187,7 @@ static void amdgpu_mn_invalidate_node(struct amdgpu_mn_node *node,
 }
 
 /**
- * amdgpu_mn_invalidate_range_start - callback to notify about mm change
+ * amdgpu_mn_invalidate_range_start_gfx - callback to notify about mm change
  *
  * @mn: our notifier
  * @mn: the mm this callback is about
@@ -195,10 +197,10 @@ static void amdgpu_mn_invalidate_node(struct amdgpu_mn_node *node,
  * We block for all BOs between start and end to be idle and
  * unmap them by move them into system domain again.
  */
-static void amdgpu_mn_invalidate_range_start(struct mmu_notifier *mn,
-					     struct mm_struct *mm,
-					     unsigned long start,
-					     unsigned long end)
+static void amdgpu_mn_invalidate_range_start_gfx(struct mmu_notifier *mn,
+						 struct mm_struct *mm,
+						 unsigned long start,
+						 unsigned long end)
 {
 	struct amdgpu_mn *rmn = container_of(mn, struct amdgpu_mn, mn);
 	struct interval_tree_node *it;
@@ -220,6 +222,49 @@ static void amdgpu_mn_invalidate_range_start(struct mmu_notifier *mn,
 }
 
 /**
+ * amdgpu_mn_invalidate_range_start_hsa - callback to notify about mm change
+ *
+ * @mn: our notifier
+ * @mn: the mm this callback is about
+ * @start: start of updated range
+ * @end: end of updated range
+ *
+ * We temporarily evict all BOs between start and end. This
+ * necessitates evicting all user-mode queues of the process. The BOs
+ * are restorted in amdgpu_mn_invalidate_range_end_hsa.
+ */
+static void amdgpu_mn_invalidate_range_start_hsa(struct mmu_notifier *mn,
+						 struct mm_struct *mm,
+						 unsigned long start,
+						 unsigned long end)
+{
+	struct amdgpu_mn *rmn = container_of(mn, struct amdgpu_mn, mn);
+	struct interval_tree_node *it;
+
+	/* notification is exclusive, but interval is inclusive */
+	end -= 1;
+
+	amdgpu_mn_read_lock(rmn);
+
+	it = interval_tree_iter_first(&rmn->objects, start, end);
+	while (it) {
+		struct amdgpu_mn_node *node;
+		struct amdgpu_bo *bo;
+
+		node = container_of(it, struct amdgpu_mn_node, it);
+		it = interval_tree_iter_next(it, start, end);
+
+		list_for_each_entry(bo, &node->bos, mn_list) {
+			struct kgd_mem *mem = bo->kfd_bo;
+
+			if (amdgpu_ttm_tt_affect_userptr(bo->tbo.ttm,
+							 start, end))
+				amdgpu_amdkfd_evict_userptr(mem, mm);
+		}
+	}
+}
+
+/**
  * amdgpu_mn_invalidate_range_end - callback to notify about mm change
  *
  * @mn: our notifier
@@ -239,23 +284,39 @@ static void amdgpu_mn_invalidate_range_end(struct mmu_notifier *mn,
 	amdgpu_mn_read_unlock(rmn);
 }
 
-static const struct mmu_notifier_ops amdgpu_mn_ops = {
-	.release = amdgpu_mn_release,
-	.invalidate_range_start = amdgpu_mn_invalidate_range_start,
-	.invalidate_range_end = amdgpu_mn_invalidate_range_end,
+static const struct mmu_notifier_ops amdgpu_mn_ops[] = {
+	[AMDGPU_MN_TYPE_GFX] = {
+		.release = amdgpu_mn_release,
+		.invalidate_range_start = amdgpu_mn_invalidate_range_start_gfx,
+		.invalidate_range_end = amdgpu_mn_invalidate_range_end,
+	},
+	[AMDGPU_MN_TYPE_HSA] = {
+		.release = amdgpu_mn_release,
+		.invalidate_range_start = amdgpu_mn_invalidate_range_start_hsa,
+		.invalidate_range_end = amdgpu_mn_invalidate_range_end,
+	},
 };
 
+/* Low bits of any reasonable mm pointer will be unused due to struct
+ * alignment. Use these bits to make a unique key from the mm pointer
+ * and notifier type.
+ */
+#define AMDGPU_MN_KEY(mm, type) ((unsigned long)(mm) + (type))
+
 /**
  * amdgpu_mn_get - create notifier context
  *
  * @adev: amdgpu device pointer
+ * @type: type of MMU notifier context
  *
  * Creates a notifier context for current->mm.
  */
-struct amdgpu_mn *amdgpu_mn_get(struct amdgpu_device *adev)
+struct amdgpu_mn *amdgpu_mn_get(struct amdgpu_device *adev,
+				enum amdgpu_mn_type type)
 {
 	struct mm_struct *mm = current->mm;
 	struct amdgpu_mn *rmn;
+	unsigned long key = AMDGPU_MN_KEY(mm, type);
 	int r;
 
 	mutex_lock(&adev->mn_lock);
@@ -264,8 +325,8 @@ struct amdgpu_mn *amdgpu_mn_get(struct amdgpu_device *adev)
 		return ERR_PTR(-EINTR);
 	}
 
-	hash_for_each_possible(adev->mn_hash, rmn, node, (unsigned long)mm)
-		if (rmn->mm == mm)
+	hash_for_each_possible(adev->mn_hash, rmn, node, key)
+		if (AMDGPU_MN_KEY(rmn->mm, rmn->type) == key)
 			goto release_locks;
 
 	rmn = kzalloc(sizeof(*rmn), GFP_KERNEL);
@@ -276,8 +337,9 @@ struct amdgpu_mn *amdgpu_mn_get(struct amdgpu_device *adev)
 
 	rmn->adev = adev;
 	rmn->mm = mm;
-	rmn->mn.ops = &amdgpu_mn_ops;
 	init_rwsem(&rmn->lock);
+	rmn->type = type;
+	rmn->mn.ops = &amdgpu_mn_ops[type];
 	rmn->objects = RB_ROOT_CACHED;
 	mutex_init(&rmn->read_lock);
 	atomic_set(&rmn->recursion, 0);
@@ -286,7 +348,7 @@ struct amdgpu_mn *amdgpu_mn_get(struct amdgpu_device *adev)
 	if (r)
 		goto free_rmn;
 
-	hash_add(adev->mn_hash, &rmn->node, (unsigned long)mm);
+	hash_add(adev->mn_hash, &rmn->node, AMDGPU_MN_KEY(mm, type));
 
 release_locks:
 	up_write(&mm->mmap_sem);
@@ -315,12 +377,14 @@ int amdgpu_mn_register(struct amdgpu_bo *bo, unsigned long addr)
 {
 	unsigned long end = addr + amdgpu_bo_size(bo) - 1;
 	struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
+	enum amdgpu_mn_type type =
+		bo->kfd_bo ? AMDGPU_MN_TYPE_HSA : AMDGPU_MN_TYPE_GFX;
 	struct amdgpu_mn *rmn;
 	struct amdgpu_mn_node *node = NULL;
 	struct list_head bos;
 	struct interval_tree_node *it;
 
-	rmn = amdgpu_mn_get(adev);
+	rmn = amdgpu_mn_get(adev, type);
 	if (IS_ERR(rmn))
 		return PTR_ERR(rmn);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.h
index d0095a3..eb0f432 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.h
@@ -29,16 +29,23 @@
  */
 struct amdgpu_mn;
 
+enum amdgpu_mn_type {
+	AMDGPU_MN_TYPE_GFX,
+	AMDGPU_MN_TYPE_HSA,
+};
+
 #if defined(CONFIG_MMU_NOTIFIER)
 void amdgpu_mn_lock(struct amdgpu_mn *mn);
 void amdgpu_mn_unlock(struct amdgpu_mn *mn);
-struct amdgpu_mn *amdgpu_mn_get(struct amdgpu_device *adev);
+struct amdgpu_mn *amdgpu_mn_get(struct amdgpu_device *adev,
+				enum amdgpu_mn_type type);
 int amdgpu_mn_register(struct amdgpu_bo *bo, unsigned long addr);
 void amdgpu_mn_unregister(struct amdgpu_bo *bo);
 #else
 static inline void amdgpu_mn_lock(struct amdgpu_mn *mn) {}
 static inline void amdgpu_mn_unlock(struct amdgpu_mn *mn) {}
-static inline struct amdgpu_mn *amdgpu_mn_get(struct amdgpu_device *adev)
+static inline struct amdgpu_mn *amdgpu_mn_get(struct amdgpu_device *adev,
+					      enum amdgpu_mn_type type)
 {
 	return NULL;
 }
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 16/20] drm/amdgpu: Enable amdgpu_ttm_tt_get_user_pages in worker threads
       [not found] ` <1521149279-31892-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (14 preceding siblings ...)
  2018-03-15 21:27   ` [PATCH 15/20] drm/amdgpu: Add MMU notifier type for KFD userptr Felix Kuehling
@ 2018-03-15 21:27   ` Felix Kuehling
  2018-03-15 21:27   ` [PATCH 17/20] drm/amdgpu: GFP_NOIO while holding locks taken in MMU notifier Felix Kuehling
                     ` (4 subsequent siblings)
  20 siblings, 0 replies; 25+ messages in thread
From: Felix Kuehling @ 2018-03-15 21:27 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling

This commit allows amdgpu_ttm_tt_get_user_pages to work in a worker
thread rather than regular process context. This will be used when
KFD userptr BOs are restored after an MMU-notifier eviction.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 30 +++++++++++++++++++++---------
 1 file changed, 21 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index c2fae04..f1b893c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -689,7 +689,7 @@ struct amdgpu_ttm_tt {
 	struct ttm_dma_tt	ttm;
 	u64			offset;
 	uint64_t		userptr;
-	struct mm_struct	*usermm;
+	struct task_struct	*usertask;
 	uint32_t		userflags;
 	spinlock_t              guptasklock;
 	struct list_head        guptasks;
@@ -700,14 +700,18 @@ struct amdgpu_ttm_tt {
 int amdgpu_ttm_tt_get_user_pages(struct ttm_tt *ttm, struct page **pages)
 {
 	struct amdgpu_ttm_tt *gtt = (void *)ttm;
+	struct mm_struct *mm = gtt->usertask->mm;
 	unsigned int flags = 0;
 	unsigned pinned = 0;
 	int r;
 
+	if (!mm) /* Happens during process shutdown */
+		return -ESRCH;
+
 	if (!(gtt->userflags & AMDGPU_GEM_USERPTR_READONLY))
 		flags |= FOLL_WRITE;
 
-	down_read(&current->mm->mmap_sem);
+	down_read(&mm->mmap_sem);
 
 	if (gtt->userflags & AMDGPU_GEM_USERPTR_ANONONLY) {
 		/* check that we only use anonymous memory
@@ -715,9 +719,9 @@ int amdgpu_ttm_tt_get_user_pages(struct ttm_tt *ttm, struct page **pages)
 		unsigned long end = gtt->userptr + ttm->num_pages * PAGE_SIZE;
 		struct vm_area_struct *vma;
 
-		vma = find_vma(gtt->usermm, gtt->userptr);
+		vma = find_vma(mm, gtt->userptr);
 		if (!vma || vma->vm_file || vma->vm_end < end) {
-			up_read(&current->mm->mmap_sem);
+			up_read(&mm->mmap_sem);
 			return -EPERM;
 		}
 	}
@@ -733,7 +737,12 @@ int amdgpu_ttm_tt_get_user_pages(struct ttm_tt *ttm, struct page **pages)
 		list_add(&guptask.list, &gtt->guptasks);
 		spin_unlock(&gtt->guptasklock);
 
-		r = get_user_pages(userptr, num_pages, flags, p, NULL);
+		if (mm == current->mm)
+			r = get_user_pages(userptr, num_pages, flags, p, NULL);
+		else
+			r = get_user_pages_remote(gtt->usertask,
+					mm, userptr, num_pages,
+					flags, p, NULL, NULL);
 
 		spin_lock(&gtt->guptasklock);
 		list_del(&guptask.list);
@@ -746,12 +755,12 @@ int amdgpu_ttm_tt_get_user_pages(struct ttm_tt *ttm, struct page **pages)
 
 	} while (pinned < ttm->num_pages);
 
-	up_read(&current->mm->mmap_sem);
+	up_read(&mm->mmap_sem);
 	return 0;
 
 release_pages:
 	release_pages(pages, pinned);
-	up_read(&current->mm->mmap_sem);
+	up_read(&mm->mmap_sem);
 	return r;
 }
 
@@ -1072,7 +1081,7 @@ int amdgpu_ttm_tt_set_userptr(struct ttm_tt *ttm, uint64_t addr,
 		return -EINVAL;
 
 	gtt->userptr = addr;
-	gtt->usermm = current->mm;
+	gtt->usertask = current->group_leader;
 	gtt->userflags = flags;
 	spin_lock_init(&gtt->guptasklock);
 	INIT_LIST_HEAD(&gtt->guptasks);
@@ -1089,7 +1098,10 @@ struct mm_struct *amdgpu_ttm_tt_get_usermm(struct ttm_tt *ttm)
 	if (gtt == NULL)
 		return NULL;
 
-	return gtt->usermm;
+	if (gtt->usertask == NULL)
+		return NULL;
+
+	return gtt->usertask->mm;
 }
 
 bool amdgpu_ttm_tt_affect_userptr(struct ttm_tt *ttm, unsigned long start,
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 17/20] drm/amdgpu: GFP_NOIO while holding locks taken in MMU notifier
       [not found] ` <1521149279-31892-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (15 preceding siblings ...)
  2018-03-15 21:27   ` [PATCH 16/20] drm/amdgpu: Enable amdgpu_ttm_tt_get_user_pages in worker threads Felix Kuehling
@ 2018-03-15 21:27   ` Felix Kuehling
  2018-03-15 21:27   ` [PATCH 18/20] drm/amdkfd: " Felix Kuehling
                     ` (3 subsequent siblings)
  20 siblings, 0 replies; 25+ messages in thread
From: Felix Kuehling @ 2018-03-15 21:27 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling

When an MMU notifier runs in memory reclaim context, it can deadlock
trying to take locks that are already held in the thread causing the
memory reclaim. The solution is to avoid memory reclaim while holding
locks that are taken in MMU notifiers by using GFP_NOIO.

This commit fixes kmalloc while holding rmn->lock. The GFX MMU
notifier also locks reservation objects. I have no good solution for
avoiding reclaim while holding reservation objects. The HSA MMU
notifier will not lock any reservation objects.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
index f2ed18e..b4a5c03 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
@@ -402,7 +402,7 @@ int amdgpu_mn_register(struct amdgpu_bo *bo, unsigned long addr)
 	}
 
 	if (!node) {
-		node = kmalloc(sizeof(struct amdgpu_mn_node), GFP_KERNEL);
+		node = kmalloc(sizeof(struct amdgpu_mn_node), GFP_NOIO);
 		if (!node) {
 			up_write(&rmn->lock);
 			return -ENOMEM;
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 18/20] drm/amdkfd: GFP_NOIO while holding locks taken in MMU notifier
       [not found] ` <1521149279-31892-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (16 preceding siblings ...)
  2018-03-15 21:27   ` [PATCH 17/20] drm/amdgpu: GFP_NOIO while holding locks taken in MMU notifier Felix Kuehling
@ 2018-03-15 21:27   ` Felix Kuehling
  2018-03-15 21:27   ` [PATCH 19/20] drm/amdkfd: Add quiesce_mm and resume_mm to kgd2kfd_calls Felix Kuehling
                     ` (2 subsequent siblings)
  20 siblings, 0 replies; 25+ messages in thread
From: Felix Kuehling @ 2018-03-15 21:27 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling

When an MMU notifier runs in memory reclaim context, it can deadlock
trying to take locks that are already held in the thread causing the
memory reclaim. The solution is to avoid memory reclaim while holding
locks that are taken in MMU notifiers by using GFP_NOIO.

This commit fixes memory allocations done while holding the dqm->lock
which is needed in the MMU notifier (dqm->ops.evict_process_queues).

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_device.c          | 2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c | 2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c  | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index 334669996..0434f65 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -652,7 +652,7 @@ int kfd_gtt_sa_allocate(struct kfd_dev *kfd, unsigned int size,
 	if (size > kfd->gtt_sa_num_of_chunks * kfd->gtt_sa_chunk_size)
 		return -ENOMEM;
 
-	*mem_obj = kmalloc(sizeof(struct kfd_mem_obj), GFP_KERNEL);
+	*mem_obj = kmalloc(sizeof(struct kfd_mem_obj), GFP_NOIO);
 	if ((*mem_obj) == NULL)
 		return -ENOMEM;
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c
index c00c325..2bc49c6 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c
@@ -412,7 +412,7 @@ struct mqd_manager *mqd_manager_init_cik(enum KFD_MQD_TYPE type,
 	if (WARN_ON(type >= KFD_MQD_TYPE_MAX))
 		return NULL;
 
-	mqd = kzalloc(sizeof(*mqd), GFP_KERNEL);
+	mqd = kzalloc(sizeof(*mqd), GFP_NOIO);
 	if (!mqd)
 		return NULL;
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c
index 89e4242..481307b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c
@@ -394,7 +394,7 @@ struct mqd_manager *mqd_manager_init_vi(enum KFD_MQD_TYPE type,
 	if (WARN_ON(type >= KFD_MQD_TYPE_MAX))
 		return NULL;
 
-	mqd = kzalloc(sizeof(*mqd), GFP_KERNEL);
+	mqd = kzalloc(sizeof(*mqd), GFP_NOIO);
 	if (!mqd)
 		return NULL;
 
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 19/20] drm/amdkfd: Add quiesce_mm and resume_mm to kgd2kfd_calls
       [not found] ` <1521149279-31892-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (17 preceding siblings ...)
  2018-03-15 21:27   ` [PATCH 18/20] drm/amdkfd: " Felix Kuehling
@ 2018-03-15 21:27   ` Felix Kuehling
  2018-03-15 21:27   ` [PATCH 20/20] drm/amdgpu: Add userptr support for KFD Felix Kuehling
  2018-03-19 16:39   ` [PATCH 00/20] Add KFD GPUVM support for dGPUs v4 Christian König
  20 siblings, 0 replies; 25+ messages in thread
From: Felix Kuehling @ 2018-03-15 21:27 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling

These interfaces allow KGD to stop and resume all GPU user mode queue
access to a process address space. This is needed for handling MMU
notifiers of userptrs mapped for GPU access in KFD VMs.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdkfd/kfd_device.c         | 38 +++++++++++++++++++++++++
 drivers/gpu/drm/amd/amdkfd/kfd_module.c         |  2 ++
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h           |  4 +++
 drivers/gpu/drm/amd/amdkfd/kfd_process.c        | 10 +++----
 drivers/gpu/drm/amd/include/kgd_kfd_interface.h |  6 ++++
 5 files changed, 55 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index 0434f65..7b57995 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -541,6 +541,44 @@ void kgd2kfd_interrupt(struct kfd_dev *kfd, const void *ih_ring_entry)
 	spin_unlock(&kfd->interrupt_lock);
 }
 
+int kgd2kfd_quiesce_mm(struct mm_struct *mm)
+{
+	struct kfd_process *p;
+	int r;
+
+	/* Because we are called from arbitrary context (workqueue) as opposed
+	 * to process context, kfd_process could attempt to exit while we are
+	 * running so the lookup function increments the process ref count.
+	 */
+	p = kfd_lookup_process_by_mm(mm);
+	if (!p)
+		return -ESRCH;
+
+	r = kfd_process_evict_queues(p);
+
+	kfd_unref_process(p);
+	return r;
+}
+
+int kgd2kfd_resume_mm(struct mm_struct *mm)
+{
+	struct kfd_process *p;
+	int r;
+
+	/* Because we are called from arbitrary context (workqueue) as opposed
+	 * to process context, kfd_process could attempt to exit while we are
+	 * running so the lookup function increments the process ref count.
+	 */
+	p = kfd_lookup_process_by_mm(mm);
+	if (!p)
+		return -ESRCH;
+
+	r = kfd_process_restore_queues(p);
+
+	kfd_unref_process(p);
+	return r;
+}
+
 /** kgd2kfd_schedule_evict_and_restore_process - Schedules work queue that will
  *   prepare for safe eviction of KFD BOs that belong to the specified
  *   process.
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_module.c b/drivers/gpu/drm/amd/amdkfd/kfd_module.c
index b0acb06..d5caa4a5 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_module.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_module.c
@@ -43,6 +43,8 @@ static const struct kgd2kfd_calls kgd2kfd = {
 	.interrupt	= kgd2kfd_interrupt,
 	.suspend	= kgd2kfd_suspend,
 	.resume		= kgd2kfd_resume,
+	.quiesce_mm	= kgd2kfd_quiesce_mm,
+	.resume_mm	= kgd2kfd_resume_mm,
 	.schedule_evict_and_restore_process =
 			  kgd2kfd_schedule_evict_and_restore_process,
 };
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index db27f9f..33c188c 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -512,6 +512,8 @@ struct qcm_process_device {
 /* Approx. time before evicting the process again */
 #define PROCESS_ACTIVE_TIME_MS 10
 
+int kgd2kfd_quiesce_mm(struct mm_struct *mm);
+int kgd2kfd_resume_mm(struct mm_struct *mm);
 int kgd2kfd_schedule_evict_and_restore_process(struct mm_struct *mm,
 					       struct dma_fence *fence);
 
@@ -681,6 +683,8 @@ struct kfd_process *kfd_get_process(const struct task_struct *);
 struct kfd_process *kfd_lookup_process_by_pasid(unsigned int pasid);
 struct kfd_process *kfd_lookup_process_by_mm(const struct mm_struct *mm);
 void kfd_unref_process(struct kfd_process *p);
+int kfd_process_evict_queues(struct kfd_process *p);
+int kfd_process_restore_queues(struct kfd_process *p);
 void kfd_suspend_all_processes(void);
 int kfd_resume_all_processes(void);
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index fbf58e9..42e5b41 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -786,7 +786,7 @@ struct kfd_process *kfd_lookup_process_by_mm(const struct mm_struct *mm)
  * Eviction is reference-counted per process-device. This means multiple
  * evictions from different sources can be nested safely.
  */
-static int process_evict_queues(struct kfd_process *p)
+int kfd_process_evict_queues(struct kfd_process *p)
 {
 	struct kfd_process_device *pdd;
 	int r = 0;
@@ -822,7 +822,7 @@ static int process_evict_queues(struct kfd_process *p)
 }
 
 /* process_restore_queues - Restore all user queues of a process */
-static  int process_restore_queues(struct kfd_process *p)
+int kfd_process_restore_queues(struct kfd_process *p)
 {
 	struct kfd_process_device *pdd;
 	int r, ret = 0;
@@ -864,7 +864,7 @@ static void evict_process_worker(struct work_struct *work)
 	flush_delayed_work(&p->restore_work);
 
 	pr_debug("Started evicting pasid %d\n", p->pasid);
-	ret = process_evict_queues(p);
+	ret = kfd_process_evict_queues(p);
 	if (!ret) {
 		dma_fence_signal(p->ef);
 		dma_fence_put(p->ef);
@@ -924,7 +924,7 @@ static void restore_process_worker(struct work_struct *work)
 		return;
 	}
 
-	ret = process_restore_queues(p);
+	ret = kfd_process_restore_queues(p);
 	if (!ret)
 		pr_debug("Finished restoring pasid %d\n", p->pasid);
 	else
@@ -941,7 +941,7 @@ void kfd_suspend_all_processes(void)
 		cancel_delayed_work_sync(&p->eviction_work);
 		cancel_delayed_work_sync(&p->restore_work);
 
-		if (process_evict_queues(p))
+		if (kfd_process_evict_queues(p))
 			pr_err("Failed to suspend process %d\n", p->pasid);
 		dma_fence_signal(p->ef);
 		dma_fence_put(p->ef);
diff --git a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
index 237289a..286cfe7 100644
--- a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
+++ b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
@@ -382,6 +382,10 @@ struct kfd2kgd_calls {
  *
  * @resume: Notifies amdkfd about a resume action done to a kgd device
  *
+ * @quiesce_mm: Quiesce all user queue access to specified MM address space
+ *
+ * @resume_mm: Resume user queue access to specified MM address space
+ *
  * @schedule_evict_and_restore_process: Schedules work queue that will prepare
  * for safe eviction of KFD BOs that belong to the specified process.
  *
@@ -399,6 +403,8 @@ struct kgd2kfd_calls {
 	void (*interrupt)(struct kfd_dev *kfd, const void *ih_ring_entry);
 	void (*suspend)(struct kfd_dev *kfd);
 	int (*resume)(struct kfd_dev *kfd);
+	int (*quiesce_mm)(struct mm_struct *mm);
+	int (*resume_mm)(struct mm_struct *mm);
 	int (*schedule_evict_and_restore_process)(struct mm_struct *mm,
 			struct dma_fence *fence);
 };
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 20/20] drm/amdgpu: Add userptr support for KFD
       [not found] ` <1521149279-31892-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (18 preceding siblings ...)
  2018-03-15 21:27   ` [PATCH 19/20] drm/amdkfd: Add quiesce_mm and resume_mm to kgd2kfd_calls Felix Kuehling
@ 2018-03-15 21:27   ` Felix Kuehling
  2018-03-19 16:39   ` [PATCH 00/20] Add KFD GPUVM support for dGPUs v4 Christian König
  20 siblings, 0 replies; 25+ messages in thread
From: Felix Kuehling @ 2018-03-15 21:27 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w
  Cc: Felix Kuehling

This adds support for allocating, mapping, unmapping and freeing
userptr BOs, and for handling MMU notifiers.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h       |  11 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 566 ++++++++++++++++++++++-
 2 files changed, 553 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index 83e0c5c..c3024b1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -28,6 +28,7 @@
 #include <linux/types.h>
 #include <linux/mm.h>
 #include <linux/mmu_context.h>
+#include <linux/workqueue.h>
 #include <kgd_kfd_interface.h>
 #include <drm/ttm/ttm_execbuf_util.h>
 #include "amdgpu_sync.h"
@@ -59,7 +60,9 @@ struct kgd_mem {
 
 	uint32_t mapping_flags;
 
+	atomic_t invalid;
 	struct amdkfd_process_info *process_info;
+	struct page **user_pages;
 
 	struct amdgpu_sync sync;
 
@@ -84,6 +87,9 @@ struct amdkfd_process_info {
 	struct list_head vm_list_head;
 	/* List head for all KFD BOs that belong to a KFD process. */
 	struct list_head kfd_bo_list;
+	/* List of userptr BOs that are valid or invalid */
+	struct list_head userptr_valid_list;
+	struct list_head userptr_inval_list;
 	/* Lock to protect kfd_bo_list */
 	struct mutex lock;
 
@@ -91,6 +97,11 @@ struct amdkfd_process_info {
 	unsigned int n_vms;
 	/* Eviction Fence */
 	struct amdgpu_amdkfd_fence *eviction_fence;
+
+	/* MMU-notifier related fields */
+	atomic_t evicted_bos;
+	struct delayed_work restore_userptr_work;
+	struct pid *pid;
 };
 
 int amdgpu_amdkfd_init(void);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 2d6f13a..5057e64 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -23,6 +23,7 @@
 #define pr_fmt(fmt) "kfd2kgd: " fmt
 
 #include <linux/list.h>
+#include <linux/sched/mm.h>
 #include <drm/drmP.h>
 #include "amdgpu_object.h"
 #include "amdgpu_vm.h"
@@ -33,10 +34,20 @@
  */
 #define VI_BO_SIZE_ALIGN (0x8000)
 
+/* BO flag to indicate a KFD userptr BO */
+#define AMDGPU_AMDKFD_USERPTR_BO (1ULL << 63)
+
+/* Userptr restore delay, just long enough to allow consecutive VM
+ * changes to accumulate
+ */
+#define AMDGPU_USERPTR_RESTORE_DELAY_MS 1
+
 /* Impose limit on how much memory KFD can use */
 static struct {
 	uint64_t max_system_mem_limit;
+	uint64_t max_userptr_mem_limit;
 	int64_t system_mem_used;
+	int64_t userptr_mem_used;
 	spinlock_t mem_limit_lock;
 } kfd_mem_limit;
 
@@ -57,6 +68,7 @@ static const char * const domain_bit_to_string[] = {
 
 #define domain_string(domain) domain_bit_to_string[ffs(domain)-1]
 
+static void amdgpu_amdkfd_restore_userptr_worker(struct work_struct *work);
 
 
 static inline struct amdgpu_device *get_amdgpu_device(struct kgd_dev *kgd)
@@ -90,8 +102,10 @@ void amdgpu_amdkfd_gpuvm_init_mem_limits(void)
 
 	spin_lock_init(&kfd_mem_limit.mem_limit_lock);
 	kfd_mem_limit.max_system_mem_limit = (mem >> 1) - (mem >> 3);
-	pr_debug("Kernel memory limit %lluM\n",
-		(kfd_mem_limit.max_system_mem_limit >> 20));
+	kfd_mem_limit.max_userptr_mem_limit = mem - (mem >> 2);
+	pr_debug("Kernel memory limit %lluM, userptr limit %lluM\n",
+		(kfd_mem_limit.max_system_mem_limit >> 20),
+		(kfd_mem_limit.max_userptr_mem_limit >> 20));
 }
 
 static int amdgpu_amdkfd_reserve_system_mem_limit(struct amdgpu_device *adev,
@@ -111,6 +125,16 @@ static int amdgpu_amdkfd_reserve_system_mem_limit(struct amdgpu_device *adev,
 			goto err_no_mem;
 		}
 		kfd_mem_limit.system_mem_used += (acc_size + size);
+	} else if (domain == AMDGPU_GEM_DOMAIN_CPU) {
+		if ((kfd_mem_limit.system_mem_used + acc_size >
+			kfd_mem_limit.max_system_mem_limit) ||
+			(kfd_mem_limit.userptr_mem_used + (size + acc_size) >
+			kfd_mem_limit.max_userptr_mem_limit)) {
+			ret = -ENOMEM;
+			goto err_no_mem;
+		}
+		kfd_mem_limit.system_mem_used += acc_size;
+		kfd_mem_limit.userptr_mem_used += size;
 	}
 err_no_mem:
 	spin_unlock(&kfd_mem_limit.mem_limit_lock);
@@ -126,10 +150,16 @@ static void unreserve_system_mem_limit(struct amdgpu_device *adev,
 				       sizeof(struct amdgpu_bo));
 
 	spin_lock(&kfd_mem_limit.mem_limit_lock);
-	if (domain == AMDGPU_GEM_DOMAIN_GTT)
+	if (domain == AMDGPU_GEM_DOMAIN_GTT) {
 		kfd_mem_limit.system_mem_used -= (acc_size + size);
+	} else if (domain == AMDGPU_GEM_DOMAIN_CPU) {
+		kfd_mem_limit.system_mem_used -= acc_size;
+		kfd_mem_limit.userptr_mem_used -= size;
+	}
 	WARN_ONCE(kfd_mem_limit.system_mem_used < 0,
 		  "kfd system memory accounting unbalanced");
+	WARN_ONCE(kfd_mem_limit.userptr_mem_used < 0,
+		  "kfd userptr memory accounting unbalanced");
 
 	spin_unlock(&kfd_mem_limit.mem_limit_lock);
 }
@@ -138,12 +168,17 @@ void amdgpu_amdkfd_unreserve_system_memory_limit(struct amdgpu_bo *bo)
 {
 	spin_lock(&kfd_mem_limit.mem_limit_lock);
 
-	if (bo->preferred_domains == AMDGPU_GEM_DOMAIN_GTT) {
+	if (bo->flags & AMDGPU_AMDKFD_USERPTR_BO) {
+		kfd_mem_limit.system_mem_used -= bo->tbo.acc_size;
+		kfd_mem_limit.userptr_mem_used -= amdgpu_bo_size(bo);
+	} else if (bo->preferred_domains == AMDGPU_GEM_DOMAIN_GTT) {
 		kfd_mem_limit.system_mem_used -=
 			(bo->tbo.acc_size + amdgpu_bo_size(bo));
 	}
 	WARN_ONCE(kfd_mem_limit.system_mem_used < 0,
 		  "kfd system memory accounting unbalanced");
+	WARN_ONCE(kfd_mem_limit.userptr_mem_used < 0,
+		  "kfd userptr memory accounting unbalanced");
 
 	spin_unlock(&kfd_mem_limit.mem_limit_lock);
 }
@@ -506,7 +541,8 @@ static void remove_bo_from_vm(struct amdgpu_device *adev,
 }
 
 static void add_kgd_mem_to_kfd_bo_list(struct kgd_mem *mem,
-				struct amdkfd_process_info *process_info)
+				struct amdkfd_process_info *process_info,
+				bool userptr)
 {
 	struct ttm_validate_buffer *entry = &mem->validate_list;
 	struct amdgpu_bo *bo = mem->bo;
@@ -515,8 +551,93 @@ static void add_kgd_mem_to_kfd_bo_list(struct kgd_mem *mem,
 	entry->shared = true;
 	entry->bo = &bo->tbo;
 	mutex_lock(&process_info->lock);
-	list_add_tail(&entry->head, &process_info->kfd_bo_list);
+	if (userptr)
+		list_add_tail(&entry->head, &process_info->userptr_valid_list);
+	else
+		list_add_tail(&entry->head, &process_info->kfd_bo_list);
+	mutex_unlock(&process_info->lock);
+}
+
+/* Initializes user pages. It registers the MMU notifier and validates
+ * the userptr BO in the GTT domain.
+ *
+ * The BO must already be on the userptr_valid_list. Otherwise an
+ * eviction and restore may happen that leaves the new BO unmapped
+ * with the user mode queues running.
+ *
+ * Takes the process_info->lock to protect against concurrent restore
+ * workers.
+ *
+ * Returns 0 for success, negative errno for errors.
+ */
+static int init_user_pages(struct kgd_mem *mem, struct mm_struct *mm,
+			   uint64_t user_addr)
+{
+	struct amdkfd_process_info *process_info = mem->process_info;
+	struct amdgpu_bo *bo = mem->bo;
+	struct ttm_operation_ctx ctx = { true, false };
+	int ret = 0;
+
+	mutex_lock(&process_info->lock);
+
+	ret = amdgpu_ttm_tt_set_userptr(bo->tbo.ttm, user_addr, 0);
+	if (ret) {
+		pr_err("%s: Failed to set userptr: %d\n", __func__, ret);
+		goto out;
+	}
+
+	ret = amdgpu_mn_register(bo, user_addr);
+	if (ret) {
+		pr_err("%s: Failed to register MMU notifier: %d\n",
+		       __func__, ret);
+		goto out;
+	}
+
+	/* If no restore worker is running concurrently, user_pages
+	 * should not be allocated
+	 */
+	WARN(mem->user_pages, "Leaking user_pages array");
+
+	mem->user_pages = kvmalloc_array(bo->tbo.ttm->num_pages,
+					   sizeof(struct page *),
+					   GFP_KERNEL | __GFP_ZERO);
+	if (!mem->user_pages) {
+		pr_err("%s: Failed to allocate pages array\n", __func__);
+		ret = -ENOMEM;
+		goto unregister_out;
+	}
+
+	ret = amdgpu_ttm_tt_get_user_pages(bo->tbo.ttm, mem->user_pages);
+	if (ret) {
+		pr_err("%s: Failed to get user pages: %d\n", __func__, ret);
+		goto free_out;
+	}
+
+	amdgpu_ttm_tt_set_user_pages(bo->tbo.ttm, mem->user_pages);
+
+	ret = amdgpu_bo_reserve(bo, true);
+	if (ret) {
+		pr_err("%s: Failed to reserve BO\n", __func__);
+		goto release_out;
+	}
+	amdgpu_ttm_placement_from_domain(bo, mem->domain);
+	ret = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
+	if (ret)
+		pr_err("%s: failed to validate BO\n", __func__);
+	amdgpu_bo_unreserve(bo);
+
+release_out:
+	if (ret)
+		release_pages(mem->user_pages, bo->tbo.ttm->num_pages);
+free_out:
+	kvfree(mem->user_pages);
+	mem->user_pages = NULL;
+unregister_out:
+	if (ret)
+		amdgpu_mn_unregister(bo);
+out:
 	mutex_unlock(&process_info->lock);
+	return ret;
 }
 
 /* Reserving a BO and its page table BOs must happen atomically to
@@ -748,7 +869,8 @@ static int update_gpuvm_pte(struct amdgpu_device *adev,
 }
 
 static int map_bo_to_gpuvm(struct amdgpu_device *adev,
-		struct kfd_bo_va_list *entry, struct amdgpu_sync *sync)
+		struct kfd_bo_va_list *entry, struct amdgpu_sync *sync,
+		bool no_update_pte)
 {
 	int ret;
 
@@ -762,6 +884,9 @@ static int map_bo_to_gpuvm(struct amdgpu_device *adev,
 		return ret;
 	}
 
+	if (no_update_pte)
+		return 0;
+
 	ret = update_gpuvm_pte(adev, entry, sync);
 	if (ret) {
 		pr_err("update_gpuvm_pte() failed\n");
@@ -820,6 +945,8 @@ static int init_kfd_vm(struct amdgpu_vm *vm, void **process_info,
 		mutex_init(&info->lock);
 		INIT_LIST_HEAD(&info->vm_list_head);
 		INIT_LIST_HEAD(&info->kfd_bo_list);
+		INIT_LIST_HEAD(&info->userptr_valid_list);
+		INIT_LIST_HEAD(&info->userptr_inval_list);
 
 		info->eviction_fence =
 			amdgpu_amdkfd_fence_create(dma_fence_context_alloc(1),
@@ -830,6 +957,11 @@ static int init_kfd_vm(struct amdgpu_vm *vm, void **process_info,
 			goto create_evict_fence_fail;
 		}
 
+		info->pid = get_task_pid(current->group_leader, PIDTYPE_PID);
+		atomic_set(&info->evicted_bos, 0);
+		INIT_DELAYED_WORK(&info->restore_userptr_work,
+				  amdgpu_amdkfd_restore_userptr_worker);
+
 		*process_info = info;
 		*ef = dma_fence_get(&info->eviction_fence->base);
 	}
@@ -872,6 +1004,7 @@ static int init_kfd_vm(struct amdgpu_vm *vm, void **process_info,
 		dma_fence_put(*ef);
 		*ef = NULL;
 		*process_info = NULL;
+		put_pid(info->pid);
 create_evict_fence_fail:
 		mutex_destroy(&info->lock);
 		kfree(info);
@@ -967,8 +1100,12 @@ void amdgpu_amdkfd_gpuvm_destroy_cb(struct amdgpu_device *adev,
 	/* Release per-process resources when last compute VM is destroyed */
 	if (!process_info->n_vms) {
 		WARN_ON(!list_empty(&process_info->kfd_bo_list));
+		WARN_ON(!list_empty(&process_info->userptr_valid_list));
+		WARN_ON(!list_empty(&process_info->userptr_inval_list));
 
 		dma_fence_put(&process_info->eviction_fence->base);
+		cancel_delayed_work_sync(&process_info->restore_userptr_work);
+		put_pid(process_info->pid);
 		mutex_destroy(&process_info->lock);
 		kfree(process_info);
 	}
@@ -1003,9 +1140,10 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
 {
 	struct amdgpu_device *adev = get_amdgpu_device(kgd);
 	struct amdgpu_vm *avm = (struct amdgpu_vm *)vm;
+	uint64_t user_addr = 0;
 	struct amdgpu_bo *bo;
 	int byte_align;
-	u32 alloc_domain;
+	u32 domain, alloc_domain;
 	u64 alloc_flags;
 	uint32_t mapping_flags;
 	int ret;
@@ -1014,14 +1152,21 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
 	 * Check on which domain to allocate BO
 	 */
 	if (flags & ALLOC_MEM_FLAGS_VRAM) {
-		alloc_domain = AMDGPU_GEM_DOMAIN_VRAM;
+		domain = alloc_domain = AMDGPU_GEM_DOMAIN_VRAM;
 		alloc_flags = AMDGPU_GEM_CREATE_VRAM_CLEARED;
 		alloc_flags |= (flags & ALLOC_MEM_FLAGS_PUBLIC) ?
 			AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED :
 			AMDGPU_GEM_CREATE_NO_CPU_ACCESS;
 	} else if (flags & ALLOC_MEM_FLAGS_GTT) {
-		alloc_domain = AMDGPU_GEM_DOMAIN_GTT;
+		domain = alloc_domain = AMDGPU_GEM_DOMAIN_GTT;
 		alloc_flags = 0;
+	} else if (flags & ALLOC_MEM_FLAGS_USERPTR) {
+		domain = AMDGPU_GEM_DOMAIN_GTT;
+		alloc_domain = AMDGPU_GEM_DOMAIN_CPU;
+		alloc_flags = 0;
+		if (!offset || !*offset)
+			return -EINVAL;
+		user_addr = *offset;
 	} else {
 		return -EINVAL;
 	}
@@ -1078,18 +1223,34 @@ int amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu(
 	}
 	bo->kfd_bo = *mem;
 	(*mem)->bo = bo;
+	if (user_addr)
+		bo->flags |= AMDGPU_AMDKFD_USERPTR_BO;
 
 	(*mem)->va = va;
-	(*mem)->domain = alloc_domain;
+	(*mem)->domain = domain;
 	(*mem)->mapped_to_gpu_memory = 0;
 	(*mem)->process_info = avm->process_info;
-	add_kgd_mem_to_kfd_bo_list(*mem, avm->process_info);
+	add_kgd_mem_to_kfd_bo_list(*mem, avm->process_info, user_addr);
+
+	if (user_addr) {
+		ret = init_user_pages(*mem, current->mm, user_addr);
+		if (ret) {
+			mutex_lock(&avm->process_info->lock);
+			list_del(&(*mem)->validate_list.head);
+			mutex_unlock(&avm->process_info->lock);
+			goto allocate_init_user_pages_failed;
+		}
+	}
 
 	if (offset)
 		*offset = amdgpu_bo_mmap_offset(bo);
 
 	return 0;
 
+allocate_init_user_pages_failed:
+	amdgpu_bo_unref(&bo);
+	/* Don't unreserve system mem limit twice */
+	goto err_reserve_system_mem;
 err_bo_create:
 	unreserve_system_mem_limit(adev, size, alloc_domain);
 err_reserve_system_mem:
@@ -1122,12 +1283,24 @@ int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
 	 * be freed anyway
 	 */
 
+	/* No more MMU notifiers */
+	amdgpu_mn_unregister(mem->bo);
+
 	/* Make sure restore workers don't access the BO any more */
 	bo_list_entry = &mem->validate_list;
 	mutex_lock(&process_info->lock);
 	list_del(&bo_list_entry->head);
 	mutex_unlock(&process_info->lock);
 
+	/* Free user pages if necessary */
+	if (mem->user_pages) {
+		pr_debug("%s: Freeing user_pages array\n", __func__);
+		if (mem->user_pages[0])
+			release_pages(mem->user_pages,
+					mem->bo->tbo.ttm->num_pages);
+		kvfree(mem->user_pages);
+	}
+
 	ret = reserve_bo_and_cond_vms(mem, NULL, BO_VM_ALL, &ctx);
 	if (unlikely(ret))
 		return ret;
@@ -1173,21 +1346,32 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
 	struct kfd_bo_va_list *bo_va_entry = NULL;
 	struct kfd_bo_va_list *bo_va_entry_aql = NULL;
 	unsigned long bo_size;
-
-	/* Make sure restore is not running concurrently.
-	 */
-	mutex_lock(&mem->process_info->lock);
-
-	mutex_lock(&mem->lock);
+	bool is_invalid_userptr = false;
 
 	bo = mem->bo;
-
 	if (!bo) {
 		pr_err("Invalid BO when mapping memory to GPU\n");
-		ret = -EINVAL;
-		goto out;
+		return -EINVAL;
+	}
+
+	/* Make sure restore is not running concurrently. Since we
+	 * don't map invalid userptr BOs, we rely on the next restore
+	 * worker to do the mapping
+	 */
+	mutex_lock(&mem->process_info->lock);
+
+	/* Lock mmap-sem. If we find an invalid userptr BO, we can be
+	 * sure that the MMU notifier is no longer running
+	 * concurrently and the queues are actually stopped
+	 */
+	if (amdgpu_ttm_tt_get_usermm(bo->tbo.ttm)) {
+		down_write(&current->mm->mmap_sem);
+		is_invalid_userptr = atomic_read(&mem->invalid);
+		up_write(&current->mm->mmap_sem);
 	}
 
+	mutex_lock(&mem->lock);
+
 	domain = mem->domain;
 	bo_size = bo->tbo.mem.size;
 
@@ -1200,6 +1384,14 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
 	if (unlikely(ret))
 		goto out;
 
+	/* Userptr can be marked as "not invalid", but not actually be
+	 * validated yet (still in the system domain). In that case
+	 * the queues are still stopped and we can leave mapping for
+	 * the next restore worker
+	 */
+	if (bo->tbo.mem.mem_type == TTM_PL_SYSTEM)
+		is_invalid_userptr = true;
+
 	if (check_if_add_bo_to_vm(avm, mem)) {
 		ret = add_bo_to_vm(adev, mem, avm, false,
 				&bo_va_entry);
@@ -1217,7 +1409,8 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
 			goto add_bo_to_vm_failed;
 	}
 
-	if (mem->mapped_to_gpu_memory == 0) {
+	if (mem->mapped_to_gpu_memory == 0 &&
+	    !amdgpu_ttm_tt_get_usermm(bo->tbo.ttm)) {
 		/* Validate BO only once. The eviction fence gets added to BO
 		 * the first time it is mapped. Validate will wait for all
 		 * background evictions to complete.
@@ -1235,7 +1428,8 @@ int amdgpu_amdkfd_gpuvm_map_memory_to_gpu(
 					entry->va, entry->va + bo_size,
 					entry);
 
-			ret = map_bo_to_gpuvm(adev, entry, ctx.sync);
+			ret = map_bo_to_gpuvm(adev, entry, ctx.sync,
+					      is_invalid_userptr);
 			if (ret) {
 				pr_err("Failed to map radeon bo to gpuvm\n");
 				goto map_bo_to_gpuvm_failed;
@@ -1418,13 +1612,337 @@ int amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel(struct kgd_dev *kgd,
 	return ret;
 }
 
+/* Evict a userptr BO by stopping the queues if necessary
+ *
+ * Runs in MMU notifier, may be in RECLAIM_FS context. This means it
+ * cannot do any memory allocations, and cannot take any locks that
+ * are held elsewhere while allocating memory. Therefore this is as
+ * simple as possible, using atomic counters.
+ *
+ * It doesn't do anything to the BO itself. The real work happens in
+ * restore, where we get updated page addresses. This function only
+ * ensures that GPU access to the BO is stopped.
+ */
 int amdgpu_amdkfd_evict_userptr(struct kgd_mem *mem,
 				struct mm_struct *mm)
 {
-	/* TODO */
+	struct amdkfd_process_info *process_info = mem->process_info;
+	int invalid, evicted_bos;
+	int r = 0;
+
+	invalid = atomic_inc_return(&mem->invalid);
+	evicted_bos = atomic_inc_return(&process_info->evicted_bos);
+	if (evicted_bos == 1) {
+		/* First eviction, stop the queues */
+		r = kgd2kfd->quiesce_mm(mm);
+		if (r)
+			pr_err("Failed to quiesce KFD\n");
+		schedule_delayed_work(&process_info->restore_userptr_work,
+			msecs_to_jiffies(AMDGPU_USERPTR_RESTORE_DELAY_MS));
+	}
+
+	return r;
+}
+
+/* Update invalid userptr BOs
+ *
+ * Moves invalidated (evicted) userptr BOs from userptr_valid_list to
+ * userptr_inval_list and updates user pages for all BOs that have
+ * been invalidated since their last update.
+ */
+static int update_invalid_user_pages(struct amdkfd_process_info *process_info,
+				     struct mm_struct *mm)
+{
+	struct kgd_mem *mem, *tmp_mem;
+	struct amdgpu_bo *bo;
+	struct ttm_operation_ctx ctx = { false, false };
+	int invalid, ret;
+
+	/* Move all invalidated BOs to the userptr_inval_list and
+	 * release their user pages by migration to the CPU domain
+	 */
+	list_for_each_entry_safe(mem, tmp_mem,
+				 &process_info->userptr_valid_list,
+				 validate_list.head) {
+		if (!atomic_read(&mem->invalid))
+			continue; /* BO is still valid */
+
+		bo = mem->bo;
+
+		if (amdgpu_bo_reserve(bo, true))
+			return -EAGAIN;
+		amdgpu_ttm_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_CPU);
+		ret = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
+		amdgpu_bo_unreserve(bo);
+		if (ret) {
+			pr_err("%s: Failed to invalidate userptr BO\n",
+			       __func__);
+			return -EAGAIN;
+		}
+
+		list_move_tail(&mem->validate_list.head,
+			       &process_info->userptr_inval_list);
+	}
+
+	if (list_empty(&process_info->userptr_inval_list))
+		return 0; /* All evicted userptr BOs were freed */
+
+	/* Go through userptr_inval_list and update any invalid user_pages */
+	list_for_each_entry(mem, &process_info->userptr_inval_list,
+			    validate_list.head) {
+		invalid = atomic_read(&mem->invalid);
+		if (!invalid)
+			/* BO hasn't been invalidated since the last
+			 * revalidation attempt. Keep its BO list.
+			 */
+			continue;
+
+		bo = mem->bo;
+
+		if (!mem->user_pages) {
+			mem->user_pages =
+				kvmalloc_array(bo->tbo.ttm->num_pages,
+						 sizeof(struct page *),
+						 GFP_KERNEL | __GFP_ZERO);
+			if (!mem->user_pages) {
+				pr_err("%s: Failed to allocate pages array\n",
+				       __func__);
+				return -ENOMEM;
+			}
+		} else if (mem->user_pages[0]) {
+			release_pages(mem->user_pages, bo->tbo.ttm->num_pages);
+		}
+
+		/* Get updated user pages */
+		ret = amdgpu_ttm_tt_get_user_pages(bo->tbo.ttm,
+						   mem->user_pages);
+		if (ret) {
+			mem->user_pages[0] = NULL;
+			pr_info("%s: Failed to get user pages: %d\n",
+				__func__, ret);
+			/* Pretend it succeeded. It will fail later
+			 * with a VM fault if the GPU tries to access
+			 * it. Better than hanging indefinitely with
+			 * stalled user mode queues.
+			 */
+		}
+
+		/* Mark the BO as valid unless it was invalidated
+		 * again concurrently
+		 */
+		if (atomic_cmpxchg(&mem->invalid, invalid, 0) != invalid)
+			return -EAGAIN;
+	}
+
 	return 0;
 }
 
+/* Validate invalid userptr BOs
+ *
+ * Validates BOs on the userptr_inval_list, and moves them back to the
+ * userptr_valid_list. Also updates GPUVM page tables with new page
+ * addresses and waits for the page table updates to complete.
+ */
+static int validate_invalid_user_pages(struct amdkfd_process_info *process_info)
+{
+	struct amdgpu_bo_list_entry *pd_bo_list_entries;
+	struct list_head resv_list, duplicates;
+	struct ww_acquire_ctx ticket;
+	struct amdgpu_sync sync;
+
+	struct amdgpu_vm *peer_vm;
+	struct kgd_mem *mem, *tmp_mem;
+	struct amdgpu_bo *bo;
+	struct ttm_operation_ctx ctx = { false, false };
+	int i, ret;
+
+	pd_bo_list_entries = kcalloc(process_info->n_vms,
+				     sizeof(struct amdgpu_bo_list_entry),
+				     GFP_KERNEL);
+	if (!pd_bo_list_entries) {
+		pr_err("%s: Failed to allocate PD BO list entries\n", __func__);
+		return -ENOMEM;
+	}
+
+	INIT_LIST_HEAD(&resv_list);
+	INIT_LIST_HEAD(&duplicates);
+
+	/* Get all the page directory BOs that need to be reserved */
+	i = 0;
+	list_for_each_entry(peer_vm, &process_info->vm_list_head,
+			    vm_list_node)
+		amdgpu_vm_get_pd_bo(peer_vm, &resv_list,
+				    &pd_bo_list_entries[i++]);
+	/* Add the userptr_inval_list entries to resv_list */
+	list_for_each_entry(mem, &process_info->userptr_inval_list,
+			    validate_list.head) {
+		list_add_tail(&mem->resv_list.head, &resv_list);
+		mem->resv_list.bo = mem->validate_list.bo;
+		mem->resv_list.shared = mem->validate_list.shared;
+	}
+
+	/* Reserve all BOs and page tables for validation */
+	ret = ttm_eu_reserve_buffers(&ticket, &resv_list, false, &duplicates);
+	WARN(!list_empty(&duplicates), "Duplicates should be empty");
+	if (ret)
+		goto out;
+
+	amdgpu_sync_create(&sync);
+
+	/* Avoid triggering eviction fences when unmapping invalid
+	 * userptr BOs (waits for all fences, doesn't use
+	 * FENCE_OWNER_VM)
+	 */
+	list_for_each_entry(peer_vm, &process_info->vm_list_head,
+			    vm_list_node)
+		amdgpu_amdkfd_remove_eviction_fence(peer_vm->root.base.bo,
+						process_info->eviction_fence,
+						NULL, NULL);
+
+	ret = process_validate_vms(process_info);
+	if (ret)
+		goto unreserve_out;
+
+	/* Validate BOs and update GPUVM page tables */
+	list_for_each_entry_safe(mem, tmp_mem,
+				 &process_info->userptr_inval_list,
+				 validate_list.head) {
+		struct kfd_bo_va_list *bo_va_entry;
+
+		bo = mem->bo;
+
+		/* Copy pages array and validate the BO if we got user pages */
+		if (mem->user_pages[0]) {
+			amdgpu_ttm_tt_set_user_pages(bo->tbo.ttm,
+						     mem->user_pages);
+			amdgpu_ttm_placement_from_domain(bo, mem->domain);
+			ret = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
+			if (ret) {
+				pr_err("%s: failed to validate BO\n", __func__);
+				goto unreserve_out;
+			}
+		}
+
+		/* Validate succeeded, now the BO owns the pages, free
+		 * our copy of the pointer array. Put this BO back on
+		 * the userptr_valid_list. If we need to revalidate
+		 * it, we need to start from scratch.
+		 */
+		kvfree(mem->user_pages);
+		mem->user_pages = NULL;
+		list_move_tail(&mem->validate_list.head,
+			       &process_info->userptr_valid_list);
+
+		/* Update mapping. If the BO was not validated
+		 * (because we couldn't get user pages), this will
+		 * clear the page table entries, which will result in
+		 * VM faults if the GPU tries to access the invalid
+		 * memory.
+		 */
+		list_for_each_entry(bo_va_entry, &mem->bo_va_list, bo_list) {
+			if (!bo_va_entry->is_mapped)
+				continue;
+
+			ret = update_gpuvm_pte((struct amdgpu_device *)
+					       bo_va_entry->kgd_dev,
+					       bo_va_entry, &sync);
+			if (ret) {
+				pr_err("%s: update PTE failed\n", __func__);
+				/* make sure this gets validated again */
+				atomic_inc(&mem->invalid);
+				goto unreserve_out;
+			}
+		}
+	}
+
+	/* Update page directories */
+	ret = process_update_pds(process_info, &sync);
+
+unreserve_out:
+	list_for_each_entry(peer_vm, &process_info->vm_list_head,
+			    vm_list_node)
+		amdgpu_bo_fence(peer_vm->root.base.bo,
+				&process_info->eviction_fence->base, true);
+	ttm_eu_backoff_reservation(&ticket, &resv_list);
+	amdgpu_sync_wait(&sync, false);
+	amdgpu_sync_free(&sync);
+out:
+	kfree(pd_bo_list_entries);
+
+	return ret;
+}
+
+/* Worker callback to restore evicted userptr BOs
+ *
+ * Tries to update and validate all userptr BOs. If successful and no
+ * concurrent evictions happened, the queues are restarted. Otherwise,
+ * reschedule for another attempt later.
+ */
+static void amdgpu_amdkfd_restore_userptr_worker(struct work_struct *work)
+{
+	struct delayed_work *dwork = to_delayed_work(work);
+	struct amdkfd_process_info *process_info =
+		container_of(dwork, struct amdkfd_process_info,
+			     restore_userptr_work);
+	struct task_struct *usertask;
+	struct mm_struct *mm;
+	int evicted_bos;
+
+	evicted_bos = atomic_read(&process_info->evicted_bos);
+	if (!evicted_bos)
+		return;
+
+	/* Reference task and mm in case of concurrent process termination */
+	usertask = get_pid_task(process_info->pid, PIDTYPE_PID);
+	if (!usertask)
+		return;
+	mm = get_task_mm(usertask);
+	if (!mm) {
+		put_task_struct(usertask);
+		return;
+	}
+
+	mutex_lock(&process_info->lock);
+
+	if (update_invalid_user_pages(process_info, mm))
+		goto unlock_out;
+	/* userptr_inval_list can be empty if all evicted userptr BOs
+	 * have been freed. In that case there is nothing to validate
+	 * and we can just restart the queues.
+	 */
+	if (!list_empty(&process_info->userptr_inval_list)) {
+		if (atomic_read(&process_info->evicted_bos) != evicted_bos)
+			goto unlock_out; /* Concurrent eviction, try again */
+
+		if (validate_invalid_user_pages(process_info))
+			goto unlock_out;
+	}
+	/* Final check for concurrent evicton and atomic update. If
+	 * another eviction happens after successful update, it will
+	 * be a first eviction that calls quiesce_mm. The eviction
+	 * reference counting inside KFD will handle this case.
+	 */
+	if (atomic_cmpxchg(&process_info->evicted_bos, evicted_bos, 0) !=
+	    evicted_bos)
+		goto unlock_out;
+	evicted_bos = 0;
+	if (kgd2kfd->resume_mm(mm)) {
+		pr_err("%s: Failed to resume KFD\n", __func__);
+		/* No recovery from this failure. Probably the CP is
+		 * hanging. No point trying again.
+		 */
+	}
+unlock_out:
+	mutex_unlock(&process_info->lock);
+	mmput(mm);
+	put_task_struct(usertask);
+
+	/* If validation failed, reschedule another attempt */
+	if (evicted_bos)
+		schedule_delayed_work(&process_info->restore_userptr_work,
+			msecs_to_jiffies(AMDGPU_USERPTR_RESTORE_DELAY_MS));
+}
+
 /** amdgpu_amdkfd_gpuvm_restore_process_bos - Restore all BOs for the given
  *   KFD process identified by process_info
  *
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH 00/20] Add KFD GPUVM support for dGPUs v4
       [not found] ` <1521149279-31892-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
                     ` (19 preceding siblings ...)
  2018-03-15 21:27   ` [PATCH 20/20] drm/amdgpu: Add userptr support for KFD Felix Kuehling
@ 2018-03-19 16:39   ` Christian König
       [not found]     ` <41ef23e5-53a0-676f-3726-54fa6fc34126-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  20 siblings, 1 reply; 25+ messages in thread
From: Christian König @ 2018-03-19 16:39 UTC (permalink / raw)
  To: Felix Kuehling, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w

So coming back to this series once more.

Patch #1, #3 are Reviewed-by: Christian König <christian.koenig@amd.com>.

Patch #2, #4 - #13 and #18-#19 are Acked-by: Christian König 
<christian.koenig@amd.com>.

Patch #14: What's the difference to setting vramlimit=$size_of_bar ?

Patch #15 & #20: Why is that actually still needed? I thought we have 
fixed all dependencies and can now use the "standard" way of attaching 
fences to reservation objects to do this.

Saying so I still need to take a closer look at patch #20.

Patch #16: Looks good to me in general, but I think it would be safer if 
we grab a reference to the task structure. Otherwise grabbing pages from 
a mm_struct sounds a bit scary to me.

Patch #17: I think it would be better to allocate the node when the 
locks are not held and free it when we find that it isn't used, but no 
big deal.

Regards,
Christian.

Am 15.03.2018 um 22:27 schrieb Felix Kuehling:
> Rebased and integrated review feedback from v3:
> * Removed vm->vm_context field
> * Use uninterruptible waiting in initial PD validation to avoid ERESTARTSYS
> * Return number of successful map/unmap operations in failure cases
> * Facilitate partial retry after failed map/unmap
> * Added comments with parameter descriptions to new APIs
> * Defined AMDKFD_IOC_FREE_MEMORY_OF_GPU write-only
>
> This patch series also adds Userptr support in patches 15-20.
>
> Felix Kuehling (19):
>    drm/amdgpu: Move KFD-specific fields into struct amdgpu_vm
>    drm/amdgpu: Fix initial validation of PD BO for KFD VMs
>    drm/amdgpu: Add helper to turn an existing VM into a compute VM
>    drm/amdgpu: Add kfd2kgd interface to acquire an existing VM
>    drm/amdkfd: Create KFD VMs on demand
>    drm/amdkfd: Remove limit on number of GPUs
>    drm/amdkfd: Aperture setup for dGPUs
>    drm/amdkfd: Add per-process IDR for buffer handles
>    drm/amdkfd: Allocate CWSR trap handler memory for dGPUs
>    drm/amdkfd: Add TC flush on VMID deallocation for Hawaii
>    drm/amdkfd: Add ioctls for GPUVM memory management
>    drm/amdkfd: Kmap event page for dGPUs
>    drm/amdkfd: Add module option for testing large-BAR functionality
>    drm/amdgpu: Add MMU notifier type for KFD userptr
>    drm/amdgpu: Enable amdgpu_ttm_tt_get_user_pages in worker threads
>    drm/amdgpu: GFP_NOIO while holding locks taken in MMU notifier
>    drm/amdkfd: GFP_NOIO while holding locks taken in MMU notifier
>    drm/amdkfd: Add quiesce_mm and resume_mm to kgd2kfd_calls
>    drm/amdgpu: Add userptr support for KFD
>
> Oak Zeng (1):
>    drm/amdkfd: Populate DRM render device minor
>
>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h         |  37 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c  |   1 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c  |   1 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c   | 818 ++++++++++++++++++---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c             |   2 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c             |  96 ++-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_mn.h             |  11 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c            |  30 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c             |  70 ++
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h             |  10 +
>   drivers/gpu/drm/amd/amdkfd/kfd_chardev.c           | 532 ++++++++++++++
>   drivers/gpu/drm/amd/amdkfd/kfd_crat.c              |   3 +
>   drivers/gpu/drm/amd/amdkfd/kfd_device.c            |  40 +-
>   .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c  |  22 +-
>   drivers/gpu/drm/amd/amdkfd/kfd_events.c            |  31 +-
>   drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c       |  59 +-
>   drivers/gpu/drm/amd/amdkfd/kfd_module.c            |   7 +
>   drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c   |   2 +-
>   drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c    |   2 +-
>   drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c    |  37 +
>   drivers/gpu/drm/amd/amdkfd/kfd_priv.h              |  41 ++
>   drivers/gpu/drm/amd/amdkfd/kfd_process.c           | 314 +++++++-
>   drivers/gpu/drm/amd/amdkfd/kfd_topology.c          |   4 +
>   drivers/gpu/drm/amd/amdkfd/kfd_topology.h          |   1 +
>   drivers/gpu/drm/amd/include/kgd_kfd_interface.h    |  10 +
>   include/uapi/linux/kfd_ioctl.h                     | 122 ++-
>   26 files changed, 2090 insertions(+), 213 deletions(-)
>

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 00/20] Add KFD GPUVM support for dGPUs v4
       [not found]     ` <41ef23e5-53a0-676f-3726-54fa6fc34126-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2018-03-19 19:05       ` Felix Kuehling
       [not found]         ` <cc84ba71-96fd-1763-fd13-8e04b0b7df1f-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 25+ messages in thread
From: Felix Kuehling @ 2018-03-19 19:05 UTC (permalink / raw)
  To: christian.koenig-5C7GfCeVMHo,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	oded.gabbay-Re5JQEeQqe8AvxtiuMwx3w

On 2018-03-19 12:39 PM, Christian König wrote:
> So coming back to this series once more.
>
> Patch #1, #3 are Reviewed-by: Christian König <christian.koenig@amd.com>.
>
> Patch #2, #4 - #13 and #18-#19 are Acked-by: Christian König
> <christian.koenig@amd.com>.
>
> Patch #14: What's the difference to setting vramlimit=$size_of_bar ?

The debug_largebar option only affects KFD. Graphics can still use all
memory.

>
> Patch #15 & #20: Why is that actually still needed? I thought we have
> fixed all dependencies and can now use the "standard" way of attaching
> fences to reservation objects to do this.

Patch 15 adds a KFD-specific MMU notifier, because the graphics MMU
notifier deals with amdgpu_cs command submission. We need a completely
different treatment of MMU notifiers for KFD. We need to stop user mode
queues.

Patch 20 implements the user mode queue preemption mechanism, and the
corresponding restore function that re-validates userptr BOs before
restarting user mode queues.

I think you're implying that the graphics MMU notifier would wait for
the eviction fence and trigger a regular eviction in KFD. I haven't
tried that. The MMU notifier and userptr eviction mechanism was
implemented in KFD before we had the TTM evictions. We didn't go back
and revisit it after that. There are a few major differences that make
me want to keep the two types of evictions separate, though:

  * TTM evictions are the result of memory pressure (triggered by
    another process)
      o Most MMU notifiers are triggered by something in the same
        process (fork, mprotect, etc.)
      o Thus the restore delay after MMU notifiers can be much shorter
  * TTM evictions are done asynchronously in a delayed worker
      o MMU notifiers are synchronous, queues need to be stopped before
        returning
  * KFD's TTM eviction/restore code doesn't handle userptrs
    (get_user_pages, etc)
      o MMU notifier restore worker is specialized to handle just userptrs


>
> Saying so I still need to take a closer look at patch #20.
>
> Patch #16: Looks good to me in general, but I think it would be safer
> if we grab a reference to the task structure. Otherwise grabbing pages
> from a mm_struct sounds a bit scary to me.

You're right. I've never seen it cause problems when testing process
termination, probably because during process termination KFD cancels the
delayed worker that calls this function. But I'll fix this and take a
proper reference.

>
> Patch #17: I think it would be better to allocate the node when the
> locks are not held and free it when we find that it isn't used, but no
> big deal.

OK. I'll change that.

Thanks,
  Felix

>
> Regards,
> Christian.
>
> Am 15.03.2018 um 22:27 schrieb Felix Kuehling:
>> Rebased and integrated review feedback from v3:
>> * Removed vm->vm_context field
>> * Use uninterruptible waiting in initial PD validation to avoid
>> ERESTARTSYS
>> * Return number of successful map/unmap operations in failure cases
>> * Facilitate partial retry after failed map/unmap
>> * Added comments with parameter descriptions to new APIs
>> * Defined AMDKFD_IOC_FREE_MEMORY_OF_GPU write-only
>>
>> This patch series also adds Userptr support in patches 15-20.
>>
>> Felix Kuehling (19):
>>    drm/amdgpu: Move KFD-specific fields into struct amdgpu_vm
>>    drm/amdgpu: Fix initial validation of PD BO for KFD VMs
>>    drm/amdgpu: Add helper to turn an existing VM into a compute VM
>>    drm/amdgpu: Add kfd2kgd interface to acquire an existing VM
>>    drm/amdkfd: Create KFD VMs on demand
>>    drm/amdkfd: Remove limit on number of GPUs
>>    drm/amdkfd: Aperture setup for dGPUs
>>    drm/amdkfd: Add per-process IDR for buffer handles
>>    drm/amdkfd: Allocate CWSR trap handler memory for dGPUs
>>    drm/amdkfd: Add TC flush on VMID deallocation for Hawaii
>>    drm/amdkfd: Add ioctls for GPUVM memory management
>>    drm/amdkfd: Kmap event page for dGPUs
>>    drm/amdkfd: Add module option for testing large-BAR functionality
>>    drm/amdgpu: Add MMU notifier type for KFD userptr
>>    drm/amdgpu: Enable amdgpu_ttm_tt_get_user_pages in worker threads
>>    drm/amdgpu: GFP_NOIO while holding locks taken in MMU notifier
>>    drm/amdkfd: GFP_NOIO while holding locks taken in MMU notifier
>>    drm/amdkfd: Add quiesce_mm and resume_mm to kgd2kfd_calls
>>    drm/amdgpu: Add userptr support for KFD
>>
>> Oak Zeng (1):
>>    drm/amdkfd: Populate DRM render device minor
>>
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h         |  37 +-
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c  |   1 +
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c  |   1 +
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c   | 818
>> ++++++++++++++++++---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c             |   2 +-
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c             |  96 ++-
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_mn.h             |  11 +-
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c            |  30 +-
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c             |  70 ++
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h             |  10 +
>>   drivers/gpu/drm/amd/amdkfd/kfd_chardev.c           | 532
>> ++++++++++++++
>>   drivers/gpu/drm/amd/amdkfd/kfd_crat.c              |   3 +
>>   drivers/gpu/drm/amd/amdkfd/kfd_device.c            |  40 +-
>>   .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c  |  22 +-
>>   drivers/gpu/drm/amd/amdkfd/kfd_events.c            |  31 +-
>>   drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c       |  59 +-
>>   drivers/gpu/drm/amd/amdkfd/kfd_module.c            |   7 +
>>   drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c   |   2 +-
>>   drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c    |   2 +-
>>   drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c    |  37 +
>>   drivers/gpu/drm/amd/amdkfd/kfd_priv.h              |  41 ++
>>   drivers/gpu/drm/amd/amdkfd/kfd_process.c           | 314 +++++++-
>>   drivers/gpu/drm/amd/amdkfd/kfd_topology.c          |   4 +
>>   drivers/gpu/drm/amd/amdkfd/kfd_topology.h          |   1 +
>>   drivers/gpu/drm/amd/include/kgd_kfd_interface.h    |  10 +
>>   include/uapi/linux/kfd_ioctl.h                     | 122 ++-
>>   26 files changed, 2090 insertions(+), 213 deletions(-)
>>
>

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 00/20] Add KFD GPUVM support for dGPUs v4
       [not found]         ` <cc84ba71-96fd-1763-fd13-8e04b0b7df1f-5C7GfCeVMHo@public.gmane.org>
@ 2018-03-21  7:52           ` Oded Gabbay
       [not found]             ` <CAFCwf10KtqTzC47iKU=MLwXbLU6C5qa7PqMmFXWbPah0U23xQw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 25+ messages in thread
From: Oded Gabbay @ 2018-03-21  7:52 UTC (permalink / raw)
  To: Felix Kuehling; +Cc: Christian König, amd-gfx list

On Mon, Mar 19, 2018 at 9:05 PM, Felix Kuehling <felix.kuehling@amd.com> wrote:
> On 2018-03-19 12:39 PM, Christian König wrote:
>> So coming back to this series once more.
>>
>> Patch #1, #3 are Reviewed-by: Christian König <christian.koenig@amd.com>.
>>
>> Patch #2, #4 - #13 and #18-#19 are Acked-by: Christian König
>> <christian.koenig@amd.com>.
>>
>> Patch #14: What's the difference to setting vramlimit=$size_of_bar ?
>
> The debug_largebar option only affects KFD. Graphics can still use all
> memory.
>
>>
>> Patch #15 & #20: Why is that actually still needed? I thought we have
>> fixed all dependencies and can now use the "standard" way of attaching
>> fences to reservation objects to do this.
>
> Patch 15 adds a KFD-specific MMU notifier, because the graphics MMU
> notifier deals with amdgpu_cs command submission. We need a completely
> different treatment of MMU notifiers for KFD. We need to stop user mode
> queues.
>
> Patch 20 implements the user mode queue preemption mechanism, and the
> corresponding restore function that re-validates userptr BOs before
> restarting user mode queues.
>
> I think you're implying that the graphics MMU notifier would wait for
> the eviction fence and trigger a regular eviction in KFD. I haven't
> tried that. The MMU notifier and userptr eviction mechanism was
> implemented in KFD before we had the TTM evictions. We didn't go back
> and revisit it after that. There are a few major differences that make
> me want to keep the two types of evictions separate, though:
>
>   * TTM evictions are the result of memory pressure (triggered by
>     another process)
>       o Most MMU notifiers are triggered by something in the same
>         process (fork, mprotect, etc.)
>       o Thus the restore delay after MMU notifiers can be much shorter
>   * TTM evictions are done asynchronously in a delayed worker
>       o MMU notifiers are synchronous, queues need to be stopped before
>         returning
>   * KFD's TTM eviction/restore code doesn't handle userptrs
>     (get_user_pages, etc)
>       o MMU notifier restore worker is specialized to handle just userptrs
>
>
>>
>> Saying so I still need to take a closer look at patch #20.
>>
>> Patch #16: Looks good to me in general, but I think it would be safer
>> if we grab a reference to the task structure. Otherwise grabbing pages
>> from a mm_struct sounds a bit scary to me.
>
> You're right. I've never seen it cause problems when testing process
> termination, probably because during process termination KFD cancels the
> delayed worker that calls this function. But I'll fix this and take a
> proper reference.
>
>>
>> Patch #17: I think it would be better to allocate the node when the
>> locks are not held and free it when we find that it isn't used, but no
>> big deal.
>
> OK. I'll change that.
>
> Thanks,
>   Felix
>
>>
>> Regards,
>> Christian.
>>
>> Am 15.03.2018 um 22:27 schrieb Felix Kuehling:
>>> Rebased and integrated review feedback from v3:
>>> * Removed vm->vm_context field
>>> * Use uninterruptible waiting in initial PD validation to avoid
>>> ERESTARTSYS
>>> * Return number of successful map/unmap operations in failure cases
>>> * Facilitate partial retry after failed map/unmap
>>> * Added comments with parameter descriptions to new APIs
>>> * Defined AMDKFD_IOC_FREE_MEMORY_OF_GPU write-only
>>>
>>> This patch series also adds Userptr support in patches 15-20.
>>>
>>> Felix Kuehling (19):
>>>    drm/amdgpu: Move KFD-specific fields into struct amdgpu_vm
>>>    drm/amdgpu: Fix initial validation of PD BO for KFD VMs
>>>    drm/amdgpu: Add helper to turn an existing VM into a compute VM
>>>    drm/amdgpu: Add kfd2kgd interface to acquire an existing VM
>>>    drm/amdkfd: Create KFD VMs on demand
>>>    drm/amdkfd: Remove limit on number of GPUs
>>>    drm/amdkfd: Aperture setup for dGPUs
>>>    drm/amdkfd: Add per-process IDR for buffer handles
>>>    drm/amdkfd: Allocate CWSR trap handler memory for dGPUs
>>>    drm/amdkfd: Add TC flush on VMID deallocation for Hawaii
>>>    drm/amdkfd: Add ioctls for GPUVM memory management
>>>    drm/amdkfd: Kmap event page for dGPUs
>>>    drm/amdkfd: Add module option for testing large-BAR functionality
>>>    drm/amdgpu: Add MMU notifier type for KFD userptr
>>>    drm/amdgpu: Enable amdgpu_ttm_tt_get_user_pages in worker threads
>>>    drm/amdgpu: GFP_NOIO while holding locks taken in MMU notifier
>>>    drm/amdkfd: GFP_NOIO while holding locks taken in MMU notifier
>>>    drm/amdkfd: Add quiesce_mm and resume_mm to kgd2kfd_calls
>>>    drm/amdgpu: Add userptr support for KFD
>>>
>>> Oak Zeng (1):
>>>    drm/amdkfd: Populate DRM render device minor
>>>
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h         |  37 +-
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c  |   1 +
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c  |   1 +
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c   | 818
>>> ++++++++++++++++++---
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c             |   2 +-
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c             |  96 ++-
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_mn.h             |  11 +-
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c            |  30 +-
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c             |  70 ++
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h             |  10 +
>>>   drivers/gpu/drm/amd/amdkfd/kfd_chardev.c           | 532
>>> ++++++++++++++
>>>   drivers/gpu/drm/amd/amdkfd/kfd_crat.c              |   3 +
>>>   drivers/gpu/drm/amd/amdkfd/kfd_device.c            |  40 +-
>>>   .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c  |  22 +-
>>>   drivers/gpu/drm/amd/amdkfd/kfd_events.c            |  31 +-
>>>   drivers/gpu/drm/amd/amdkfd/kfd_flat_memory.c       |  59 +-
>>>   drivers/gpu/drm/amd/amdkfd/kfd_module.c            |   7 +
>>>   drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c   |   2 +-
>>>   drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_vi.c    |   2 +-
>>>   drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c    |  37 +
>>>   drivers/gpu/drm/amd/amdkfd/kfd_priv.h              |  41 ++
>>>   drivers/gpu/drm/amd/amdkfd/kfd_process.c           | 314 +++++++-
>>>   drivers/gpu/drm/amd/amdkfd/kfd_topology.c          |   4 +
>>>   drivers/gpu/drm/amd/amdkfd/kfd_topology.h          |   1 +
>>>   drivers/gpu/drm/amd/include/kgd_kfd_interface.h    |  10 +
>>>   include/uapi/linux/kfd_ioctl.h                     | 122 ++-
>>>   26 files changed, 2090 insertions(+), 213 deletions(-)
>>>
>>
>

Hi Felix,
I did a quick pass on the patch-set and didn't see anything scary.
Patches 1-14 are already applied to my -next tree. If I will send it
now to Dave I believe we would be ok from schedule POV.
I suggest we delay userptr support for the next kernel release because
you need to address what Christian said and I also want to take a
closer look at it.
What do you think ?

Thanks,
Oded
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 00/20] Add KFD GPUVM support for dGPUs v4
       [not found]             ` <CAFCwf10KtqTzC47iKU=MLwXbLU6C5qa7PqMmFXWbPah0U23xQw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2018-03-21 15:36               ` Felix Kuehling
  0 siblings, 0 replies; 25+ messages in thread
From: Felix Kuehling @ 2018-03-21 15:36 UTC (permalink / raw)
  To: Oded Gabbay; +Cc: Christian König, amd-gfx list

On 2018-03-21 03:52 AM, Oded Gabbay wrote:
>
> Hi Felix,
> I did a quick pass on the patch-set and didn't see anything scary.
> Patches 1-14 are already applied to my -next tree. If I will send it
> now to Dave I believe we would be ok from schedule POV.
> I suggest we delay userptr support for the next kernel release because
> you need to address what Christian said and I also want to take a
> closer look at it.
> What do you think ?

OK. I have changes to address Christian's comments that I'm testing
internally. I'll send out an update this week.

Thanks,
  Felix

>
> Thanks,
> Oded

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2018-03-21 15:36 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-15 21:27 [PATCH 00/20] Add KFD GPUVM support for dGPUs v4 Felix Kuehling
     [not found] ` <1521149279-31892-1-git-send-email-Felix.Kuehling-5C7GfCeVMHo@public.gmane.org>
2018-03-15 21:27   ` [PATCH 01/20] drm/amdgpu: Move KFD-specific fields into struct amdgpu_vm Felix Kuehling
2018-03-15 21:27   ` [PATCH 02/20] drm/amdgpu: Fix initial validation of PD BO for KFD VMs Felix Kuehling
2018-03-15 21:27   ` [PATCH 03/20] drm/amdgpu: Add helper to turn an existing VM into a compute VM Felix Kuehling
2018-03-15 21:27   ` [PATCH 04/20] drm/amdgpu: Add kfd2kgd interface to acquire an existing VM Felix Kuehling
2018-03-15 21:27   ` [PATCH 05/20] drm/amdkfd: Create KFD VMs on demand Felix Kuehling
2018-03-15 21:27   ` [PATCH 06/20] drm/amdkfd: Populate DRM render device minor Felix Kuehling
2018-03-15 21:27   ` [PATCH 07/20] drm/amdkfd: Remove limit on number of GPUs Felix Kuehling
2018-03-15 21:27   ` [PATCH 08/20] drm/amdkfd: Aperture setup for dGPUs Felix Kuehling
2018-03-15 21:27   ` [PATCH 09/20] drm/amdkfd: Add per-process IDR for buffer handles Felix Kuehling
2018-03-15 21:27   ` [PATCH 10/20] drm/amdkfd: Allocate CWSR trap handler memory for dGPUs Felix Kuehling
2018-03-15 21:27   ` [PATCH 11/20] drm/amdkfd: Add TC flush on VMID deallocation for Hawaii Felix Kuehling
2018-03-15 21:27   ` [PATCH 12/20] drm/amdkfd: Add ioctls for GPUVM memory management Felix Kuehling
2018-03-15 21:27   ` [PATCH 13/20] drm/amdkfd: Kmap event page for dGPUs Felix Kuehling
2018-03-15 21:27   ` [PATCH 14/20] drm/amdkfd: Add module option for testing large-BAR functionality Felix Kuehling
2018-03-15 21:27   ` [PATCH 15/20] drm/amdgpu: Add MMU notifier type for KFD userptr Felix Kuehling
2018-03-15 21:27   ` [PATCH 16/20] drm/amdgpu: Enable amdgpu_ttm_tt_get_user_pages in worker threads Felix Kuehling
2018-03-15 21:27   ` [PATCH 17/20] drm/amdgpu: GFP_NOIO while holding locks taken in MMU notifier Felix Kuehling
2018-03-15 21:27   ` [PATCH 18/20] drm/amdkfd: " Felix Kuehling
2018-03-15 21:27   ` [PATCH 19/20] drm/amdkfd: Add quiesce_mm and resume_mm to kgd2kfd_calls Felix Kuehling
2018-03-15 21:27   ` [PATCH 20/20] drm/amdgpu: Add userptr support for KFD Felix Kuehling
2018-03-19 16:39   ` [PATCH 00/20] Add KFD GPUVM support for dGPUs v4 Christian König
     [not found]     ` <41ef23e5-53a0-676f-3726-54fa6fc34126-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2018-03-19 19:05       ` Felix Kuehling
     [not found]         ` <cc84ba71-96fd-1763-fd13-8e04b0b7df1f-5C7GfCeVMHo@public.gmane.org>
2018-03-21  7:52           ` Oded Gabbay
     [not found]             ` <CAFCwf10KtqTzC47iKU=MLwXbLU6C5qa7PqMmFXWbPah0U23xQw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2018-03-21 15:36               ` Felix Kuehling

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.