From: Christian König
Subject: Re: [PATCH 1/3] drm/radeon: stop poisoning the GART TLB
Date: Tue, 24 Jun 2014 12:14:44 +0200
Message-ID: <53A94F94.6040603@amd.com>
In-Reply-To: <53A91F89.7090504@daenzer.net>
To: Michel Dänzer, Alex Deucher
Cc: dri-devel

On 24.06.2014 08:49, Michel Dänzer wrote:
> On 23.06.2014 18:56, Christian König wrote:
>> On 23.06.2014 10:15, Michel Dänzer wrote:
>>> On 19.06.2014 18:45, Christian König wrote:
>>>
>>>> I think even when we revert to the old code we have a couple of unsolved
>>>> problems with the VM support or in the driver in general, where we should
>>>> try to understand the underlying reason instead of applying more
>>>> workarounds.
>>> I'm not suggesting applying more workarounds but going back to a known,
>>> more stable state. It seems like we've maneuvered ourselves into a rather
>>> uncomfortable position from there, with no clear way to a better place.
>>> But if we basically started from the 3.14 state again, we have a few
>>> known hurdles like mine and Marek's Bonaire etc. which we know any
>>> further improvements will have to pass before they can be considered for
>>> general consumption.
>> Yeah, agree, especially on the uncomfortable position.
>>
>> Please try with the two attached patches applied on top of 3.15 and
>> retest. They should revert back to the old implementation.
> Unfortunately, X fails to start with these, see the attached excerpt
> from dmesg.

My fault: I resolved a merge conflict incorrectly and then failed to test
the right kernel.

BTW: wasn't there an option to tell GRUB to use the most recently installed
kernel instead of the one with the highest version number? I can't seem to
find that any more.

Please try the attached patches instead,
Christian.

[Attachment: 0001-drm-radeon-Revert-drop-non-blocking-allocations-from.patch]

From 17300436e5598357cb9396d3d52c8c40064adc16 Mon Sep 17 00:00:00 2001
From: Christian König
Date: Mon, 23 Jun 2014 11:07:29 +0200
Subject: [PATCH 1/2] drm/radeon: Revert drop non blocking allocations from sub allocator

The next revert needs this functionality.

This reverts commit 4d1526466296360f56f93c195848c1202b0cc10b.
---
 drivers/gpu/drm/radeon/radeon_object.h    | 2 +-
 drivers/gpu/drm/radeon/radeon_ring.c      | 2 +-
 drivers/gpu/drm/radeon/radeon_sa.c        | 7 +++++--
 drivers/gpu/drm/radeon/radeon_semaphore.c | 2 +-
 4 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_object.h b/drivers/gpu/drm/radeon/radeon_object.h
index 9e7b25a..7dff64d 100644
--- a/drivers/gpu/drm/radeon/radeon_object.h
+++ b/drivers/gpu/drm/radeon/radeon_object.h
@@ -180,7 +180,7 @@ extern int radeon_sa_bo_manager_suspend(struct radeon_device *rdev,
 extern int radeon_sa_bo_new(struct radeon_device *rdev,
			    struct radeon_sa_manager *sa_manager,
			    struct radeon_sa_bo **sa_bo,
-			    unsigned size, unsigned align);
+			    unsigned size, unsigned align, bool block);
 extern void radeon_sa_bo_free(struct radeon_device *rdev,
			      struct radeon_sa_bo **sa_bo,
			      struct radeon_fence *fence);
diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c
index f8050f5..62201db 100644
--- a/drivers/gpu/drm/radeon/radeon_ring.c
+++ b/drivers/gpu/drm/radeon/radeon_ring.c
@@ -63,7 +63,7 @@ int radeon_ib_get(struct radeon_device *rdev, int ring,
 {
 	int r;
 
-	r = radeon_sa_bo_new(rdev, &rdev->ring_tmp_bo, &ib->sa_bo, size, 256);
+	r = radeon_sa_bo_new(rdev, &rdev->ring_tmp_bo, &ib->sa_bo, size, 256, true);
 	if (r) {
 		dev_err(rdev->dev, "failed to get a new IB (%d)\n", r);
 		return r;
diff --git a/drivers/gpu/drm/radeon/radeon_sa.c b/drivers/gpu/drm/radeon/radeon_sa.c
index adcf3e2..c062580 100644
--- a/drivers/gpu/drm/radeon/radeon_sa.c
+++ b/drivers/gpu/drm/radeon/radeon_sa.c
@@ -312,7 +312,7 @@ static bool radeon_sa_bo_next_hole(struct radeon_sa_manager *sa_manager,
 int radeon_sa_bo_new(struct radeon_device *rdev,
		     struct radeon_sa_manager *sa_manager,
		     struct radeon_sa_bo **sa_bo,
-		     unsigned size, unsigned align)
+		     unsigned size, unsigned align, bool block)
 {
 	struct radeon_fence *fences[RADEON_NUM_RINGS];
 	unsigned tries[RADEON_NUM_RINGS];
@@ -353,11 +353,14 @@ int radeon_sa_bo_new(struct radeon_device *rdev,
 		r = radeon_fence_wait_any(rdev, fences, false);
 		spin_lock(&sa_manager->wq.lock);
 		/* if we have nothing to wait for block */
-		if (r == -ENOENT) {
+		if (r == -ENOENT && block) {
 			r = wait_event_interruptible_locked(
 				sa_manager->wq,
 				radeon_sa_event(sa_manager, size, align)
 			);
+
+		} else if (r == -ENOENT) {
+			r = -ENOMEM;
 		}
 	} while (!r);
diff --git a/drivers/gpu/drm/radeon/radeon_semaphore.c b/drivers/gpu/drm/radeon/radeon_semaphore.c
index dbd6bcd..6140af6 100644
--- a/drivers/gpu/drm/radeon/radeon_semaphore.c
+++ b/drivers/gpu/drm/radeon/radeon_semaphore.c
@@ -42,7 +42,7 @@ int radeon_semaphore_create(struct radeon_device *rdev,
 		return -ENOMEM;
 	}
 	r = radeon_sa_bo_new(rdev, &rdev->ring_tmp_bo, &(*semaphore)->sa_bo,
-			     8 * RADEON_NUM_SYNCS, 8);
+			     8 * RADEON_NUM_SYNCS, 8, true);
 	if (r) {
 		kfree(*semaphore);
 		*semaphore = NULL;
-- 
1.9.1

[Attachment: 0002-drm-radeon-Revert-use-normal-BOs-for-the-page-tables.patch]

From 28cd0733ff4b91b917962e964255f0a12278b29a Mon Sep 17 00:00:00 2001
From: Christian König
Date: Mon, 23 Jun 2014 11:08:24 +0200
Subject: [PATCH 2/2] drm/radeon: Revert use normal BOs for the page tables v2
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This reverts the commit "use normal BOs for the page tables v4" and the
following dependent bug fixes:

drm/radeon: sync page table updates
drm/radeon: fix vm buffer size estimation
drm/radeon: only allocate necessary size for vm bo list
drm/radeon: fix page directory update size estimation
drm/radeon: remove global vm lock

v2: fix incorrect merge conflict solving

Signed-off-by: Christian König
---
 drivers/gpu/drm/radeon/radeon.h        |  24 +-
 drivers/gpu/drm/radeon/radeon_cs.c     |  48 ++-
 drivers/gpu/drm/radeon/radeon_device.c |   4 +-
 drivers/gpu/drm/radeon/radeon_kms.c    |   7 +-
 drivers/gpu/drm/radeon/radeon_ring.c   |   7 -
 drivers/gpu/drm/radeon/radeon_vm.c     | 513 ++++++++++++++++++---------------
 6 files changed, 309 insertions(+), 294 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 8149e7c..b390d79 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -854,22 +854,17 @@ struct radeon_mec {
 #define R600_PTE_READABLE	(1 << 5)
 #define R600_PTE_WRITEABLE	(1 << 6)
 
-struct radeon_vm_pt {
-	struct radeon_bo		*bo;
-	uint64_t			addr;
-};
-
 struct radeon_vm {
+	struct list_head		list;
 	struct list_head		va;
 	unsigned			id;
 
 	/* contains the page directory */
-	struct radeon_bo		*page_directory;
+	struct radeon_sa_bo		*page_directory;
 	uint64_t			pd_gpu_addr;
-	unsigned			max_pde_used;
 
 	/* array of page tables, one for each page directory entry */
-	struct radeon_vm_pt		*page_tables;
+	struct radeon_sa_bo		**page_tables;
 
 	struct mutex			mutex;
 	/* last fence for cs using this vm */
@@ -881,7 +876,10 @@ struct radeon_vm {
 };
 
 struct radeon_vm_manager {
+	struct mutex			lock;
+	struct list_head		lru_vm;
 	struct radeon_fence		*active[RADEON_NUM_VM];
+	struct radeon_sa_manager	sa_manager;
 	uint32_t			max_pfn;
 	/* number of VMIDs */
 	unsigned			nvm;
@@ -1013,7 +1011,6 @@ struct radeon_cs_parser {
 	unsigned		nrelocs;
 	struct radeon_cs_reloc	*relocs;
 	struct radeon_cs_reloc	**relocs_ptr;
-	struct radeon_cs_reloc	*vm_bos;
 	struct list_head	validated;
 	unsigned		dma_reloc_idx;
 	/* indices of various chunks */
@@ -2807,11 +2804,10 @@ extern void radeon_program_register_sequence(struct radeon_device *rdev,
  */
 int radeon_vm_manager_init(struct radeon_device *rdev);
 void radeon_vm_manager_fini(struct radeon_device *rdev);
-int radeon_vm_init(struct radeon_device *rdev, struct radeon_vm *vm);
+void radeon_vm_init(struct radeon_device *rdev, struct radeon_vm *vm);
 void radeon_vm_fini(struct radeon_device *rdev, struct radeon_vm *vm);
-struct radeon_cs_reloc *radeon_vm_get_bos(struct radeon_device *rdev,
-					  struct radeon_vm *vm,
-					  struct list_head *head);
+int radeon_vm_alloc_pt(struct radeon_device *rdev, struct radeon_vm *vm);
+void radeon_vm_add_to_lru(struct radeon_device *rdev, struct radeon_vm *vm);
 struct radeon_fence *radeon_vm_grab_id(struct radeon_device *rdev,
				       struct radeon_vm *vm, int ring);
 void radeon_vm_flush(struct radeon_device *rdev,
@@ -2821,8 +2817,6 @@ void radeon_vm_fence(struct radeon_device *rdev,
		     struct radeon_vm *vm,
		     struct radeon_fence *fence);
 uint64_t radeon_vm_map_gart(struct radeon_device *rdev, uint64_t addr);
-int radeon_vm_update_page_directory(struct radeon_device *rdev,
-				    struct radeon_vm *vm);
 int radeon_vm_bo_update(struct radeon_device *rdev,
			struct radeon_vm *vm,
			struct radeon_bo *bo,
diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c
index 41ecf8a..06a00a1 100644
--- a/drivers/gpu/drm/radeon/radeon_cs.c
+++ b/drivers/gpu/drm/radeon/radeon_cs.c
@@ -173,10 +173,6 @@ static int radeon_cs_parser_relocs(struct radeon_cs_parser *p)
 
 	radeon_cs_buckets_get_list(&buckets, &p->validated);
 
-	if (p->cs_flags & RADEON_CS_USE_VM)
-		p->vm_bos = radeon_vm_get_bos(p->rdev, p->ib.vm,
-					      &p->validated);
-
 	return radeon_bo_list_validate(p->rdev, &p->ticket, &p->validated, p->ring);
 }
 
@@ -417,7 +413,6 @@ static void radeon_cs_parser_fini(struct radeon_cs_parser *parser, int error, bo
 	kfree(parser->track);
 	kfree(parser->relocs);
 	kfree(parser->relocs_ptr);
-	kfree(parser->vm_bos);
 	for (i = 0; i < parser->nchunks; i++)
 		drm_free_large(parser->chunks[i].kdata);
 	kfree(parser->chunks);
@@ -457,32 +452,24 @@ static int radeon_cs_ib_chunk(struct radeon_device *rdev,
 	return r;
 }
 
-static int radeon_bo_vm_update_pte(struct radeon_cs_parser *p,
+static int radeon_bo_vm_update_pte(struct radeon_cs_parser *parser,
				   struct radeon_vm *vm)
 {
-	struct radeon_device *rdev = p->rdev;
-	int i, r;
-
-	r = radeon_vm_update_page_directory(rdev, vm);
-	if (r)
-		return r;
+	struct radeon_device *rdev = parser->rdev;
+	struct radeon_cs_reloc *lobj;
+	struct radeon_bo *bo;
+	int r;
 
-	r = radeon_vm_bo_update(rdev, vm, rdev->ring_tmp_bo.bo,
-				&rdev->ring_tmp_bo.bo->tbo.mem);
-	if (r)
+	r = radeon_vm_bo_update(rdev, vm, rdev->ring_tmp_bo.bo, &rdev->ring_tmp_bo.bo->tbo.mem);
+	if (r) {
 		return r;
-
-	for (i = 0; i < p->nrelocs; i++) {
-		struct radeon_bo *bo;
-
-		/* ignore duplicates */
-		if (p->relocs_ptr[i] != &p->relocs[i])
-			continue;
-
-		bo = p->relocs[i].robj;
-		r = radeon_vm_bo_update(rdev, vm, bo, &bo->tbo.mem);
-		if (r)
+	}
+	list_for_each_entry(lobj, &parser->validated, tv.head) {
+		bo = lobj->robj;
+		r = radeon_vm_bo_update(parser->rdev, vm, bo, &bo->tbo.mem);
+		if (r) {
			return r;
+		}
 	}
 	return 0;
 }
@@ -514,13 +501,20 @@ static int radeon_cs_ib_vm_chunk(struct radeon_device *rdev,
 	if (parser->ring == R600_RING_TYPE_UVD_INDEX)
 		radeon_uvd_note_usage(rdev);
 
+	mutex_lock(&rdev->vm_manager.lock);
 	mutex_lock(&vm->mutex);
+	r = radeon_vm_alloc_pt(rdev, vm);
+	if (r) {
+		goto out;
+	}
 	r = radeon_bo_vm_update_pte(parser, vm);
 	if (r) {
 		goto out;
 	}
 	radeon_cs_sync_rings(parser);
 	radeon_semaphore_sync_to(parser->ib.semaphore, vm->fence);
+	radeon_semaphore_sync_to(parser->ib.semaphore,
+				 radeon_vm_grab_id(rdev, vm, parser->ring));
 
 	if ((rdev->family >= CHIP_TAHITI) &&
	    (parser->chunk_const_ib_idx != -1)) {
@@ -530,7 +524,9 @@ static int radeon_cs_ib_vm_chunk(struct radeon_device *rdev,
 	}
 
 out:
+	radeon_vm_add_to_lru(rdev, vm);
 	mutex_unlock(&vm->mutex);
+	mutex_unlock(&rdev->vm_manager.lock);
 	return r;
 }
 
diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
index 2cd144c..9ebd035 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -1195,12 +1195,14 @@ int radeon_device_init(struct radeon_device *rdev,
 	r = radeon_gem_init(rdev);
 	if (r)
 		return r;
-
+	/* initialize vm here */
+	mutex_init(&rdev->vm_manager.lock);
 	/* Adjust VM size here.
 	 * Currently set to 4GB ((1 << 20) 4k pages).
 	 * Max GPUVM size for cayman and SI is 40 bits.
 	 */
 	rdev->vm_manager.max_pfn = 1 << 20;
+	INIT_LIST_HEAD(&rdev->vm_manager.lru_vm);
 
 	/* Set asic functions */
 	r = radeon_asic_init(rdev);
diff --git a/drivers/gpu/drm/radeon/radeon_kms.c b/drivers/gpu/drm/radeon/radeon_kms.c
index eaaedba..cc47fa1 100644
--- a/drivers/gpu/drm/radeon/radeon_kms.c
+++ b/drivers/gpu/drm/radeon/radeon_kms.c
@@ -571,12 +571,7 @@ int radeon_driver_open_kms(struct drm_device *dev, struct drm_file *file_priv)
 		return -ENOMEM;
 	}
 
-	r = radeon_vm_init(rdev, &fpriv->vm);
-	if (r) {
-		kfree(fpriv);
-		return r;
-	}
-
+	radeon_vm_init(rdev, &fpriv->vm);
 	if (rdev->accel_working) {
 		r = radeon_bo_reserve(rdev->ring_tmp_bo.bo, false);
 		if (r) {
diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c
index 62201db..4ddc6d77 100644
--- a/drivers/gpu/drm/radeon/radeon_ring.c
+++ b/drivers/gpu/drm/radeon/radeon_ring.c
@@ -145,13 +145,6 @@ int radeon_ib_schedule(struct radeon_device *rdev, struct radeon_ib *ib,
 		return r;
 	}
 
-	/* grab a vm id if necessary */
-	if (ib->vm) {
-		struct radeon_fence *vm_id_fence;
-		vm_id_fence = radeon_vm_grab_id(rdev, ib->vm, ib->ring);
-		radeon_semaphore_sync_to(ib->semaphore, vm_id_fence);
-	}
-
 	/* sync with other rings */
 	r = radeon_semaphore_sync_rings(rdev, ib->semaphore, ib->ring);
 	if (r) {
diff --git a/drivers/gpu/drm/radeon/radeon_vm.c b/drivers/gpu/drm/radeon/radeon_vm.c
index c11b71d..5160176 100644
--- a/drivers/gpu/drm/radeon/radeon_vm.c
+++ b/drivers/gpu/drm/radeon/radeon_vm.c
@@ -84,19 +84,85 @@ static unsigned radeon_vm_directory_size(struct radeon_device *rdev)
  */
 int radeon_vm_manager_init(struct radeon_device *rdev)
 {
+	struct radeon_vm *vm;
+	struct radeon_bo_va *bo_va;
 	int r;
+	unsigned size;
 
 	if (!rdev->vm_manager.enabled) {
+		/* allocate enough for 2 full VM pts */
+		size = radeon_vm_directory_size(rdev);
+		size += rdev->vm_manager.max_pfn * 8;
+		size *= 2;
+		r = radeon_sa_bo_manager_init(rdev, &rdev->vm_manager.sa_manager,
+					      RADEON_GPU_PAGE_ALIGN(size),
+					      RADEON_VM_PTB_ALIGN_SIZE,
+					      RADEON_GEM_DOMAIN_VRAM);
+		if (r) {
+			dev_err(rdev->dev, "failed to allocate vm bo (%dKB)\n",
+				(rdev->vm_manager.max_pfn * 8) >> 10);
+			return r;
+		}
+
 		r = radeon_asic_vm_init(rdev);
 		if (r)
			return r;
 
 		rdev->vm_manager.enabled = true;
+
+		r = radeon_sa_bo_manager_start(rdev, &rdev->vm_manager.sa_manager);
+		if (r)
+			return r;
+	}
+
+	/* restore page table */
+	list_for_each_entry(vm, &rdev->vm_manager.lru_vm, list) {
+		if (vm->page_directory == NULL)
+			continue;
+
+		list_for_each_entry(bo_va, &vm->va, vm_list) {
+			bo_va->valid = false;
+		}
 	}
 	return 0;
 }
 
 /**
+ * radeon_vm_free_pt - free the page table for a specific vm
+ *
+ * @rdev: radeon_device pointer
+ * @vm: vm to unbind
+ *
+ * Free the page table of a specific vm (cayman+).
+ *
+ * Global and local mutex must be lock!
+ */
+static void radeon_vm_free_pt(struct radeon_device *rdev,
+			      struct radeon_vm *vm)
+{
+	struct radeon_bo_va *bo_va;
+	int i;
+
+	if (!vm->page_directory)
+		return;
+
+	list_del_init(&vm->list);
+	radeon_sa_bo_free(rdev, &vm->page_directory, vm->fence);
+
+	list_for_each_entry(bo_va, &vm->va, vm_list) {
+		bo_va->valid = false;
+	}
+
+	if (vm->page_tables == NULL)
+		return;
+
+	for (i = 0; i < radeon_vm_num_pdes(rdev); i++)
+		radeon_sa_bo_free(rdev, &vm->page_tables[i], vm->fence);
+
+	kfree(vm->page_tables);
+}
+
+/**
  * radeon_vm_manager_fini - tear down the vm manager
  *
  * @rdev: radeon_device pointer
@@ -105,63 +171,155 @@ int radeon_vm_manager_init(struct radeon_device *rdev)
  */
 void radeon_vm_manager_fini(struct radeon_device *rdev)
 {
+	struct radeon_vm *vm, *tmp;
 	int i;
 
 	if (!rdev->vm_manager.enabled)
 		return;
 
-	for (i = 0; i < RADEON_NUM_VM; ++i)
+	mutex_lock(&rdev->vm_manager.lock);
+	/* free all allocated page tables */
+	list_for_each_entry_safe(vm, tmp, &rdev->vm_manager.lru_vm, list) {
+		mutex_lock(&vm->mutex);
+		radeon_vm_free_pt(rdev, vm);
+		mutex_unlock(&vm->mutex);
+	}
+	for (i = 0; i < RADEON_NUM_VM; ++i) {
 		radeon_fence_unref(&rdev->vm_manager.active[i]);
+	}
 	radeon_asic_vm_fini(rdev);
+	mutex_unlock(&rdev->vm_manager.lock);
+
+	radeon_sa_bo_manager_suspend(rdev, &rdev->vm_manager.sa_manager);
+	radeon_sa_bo_manager_fini(rdev, &rdev->vm_manager.sa_manager);
 	rdev->vm_manager.enabled = false;
 }
 
 /**
- * radeon_vm_get_bos - add the vm BOs to a validation list
+ * radeon_vm_evict - evict page table to make room for new one
+ *
+ * @rdev: radeon_device pointer
+ * @vm: VM we want to allocate something for
  *
- * @vm: vm providing the BOs
- * @head: head of validation list
+ * Evict a VM from the lru, making sure that it isn't @vm. (cayman+).
+ * Returns 0 for success, -ENOMEM for failure.
  *
- * Add the page directory to the list of BOs to
- * validate for command submission (cayman+).
+ * Global and local mutex must be locked!
  */
-struct radeon_cs_reloc *radeon_vm_get_bos(struct radeon_device *rdev,
-					  struct radeon_vm *vm,
-					  struct list_head *head)
+static int radeon_vm_evict(struct radeon_device *rdev, struct radeon_vm *vm)
 {
-	struct radeon_cs_reloc *list;
-	unsigned i, idx;
+	struct radeon_vm *vm_evict;
 
-	list = kmalloc_array(vm->max_pde_used + 2,
-			     sizeof(struct radeon_cs_reloc), GFP_KERNEL);
-	if (!list)
-		return NULL;
+	if (list_empty(&rdev->vm_manager.lru_vm))
+		return -ENOMEM;
 
-	/* add the vm page table to the list */
-	list[0].gobj = NULL;
-	list[0].robj = vm->page_directory;
-	list[0].domain = RADEON_GEM_DOMAIN_VRAM;
-	list[0].alt_domain = RADEON_GEM_DOMAIN_VRAM;
-	list[0].tv.bo = &vm->page_directory->tbo;
-	list[0].tiling_flags = 0;
-	list[0].handle = 0;
-	list_add(&list[0].tv.head, head);
-
-	for (i = 0, idx = 1; i <= vm->max_pde_used; i++) {
-		if (!vm->page_tables[i].bo)
-			continue;
+	vm_evict = list_first_entry(&rdev->vm_manager.lru_vm,
+				    struct radeon_vm, list);
+	if (vm_evict == vm)
+		return -ENOMEM;
+
+	mutex_lock(&vm_evict->mutex);
+	radeon_vm_free_pt(rdev, vm_evict);
+	mutex_unlock(&vm_evict->mutex);
+	return 0;
+}
 
-		list[idx].gobj = NULL;
-		list[idx].robj = vm->page_tables[i].bo;
-		list[idx].domain = RADEON_GEM_DOMAIN_VRAM;
-		list[idx].alt_domain = RADEON_GEM_DOMAIN_VRAM;
-		list[idx].tv.bo = &list[idx].robj->tbo;
-		list[idx].tiling_flags = 0;
-		list[idx].handle = 0;
-		list_add(&list[idx++].tv.head, head);
+/**
+ * radeon_vm_alloc_pt - allocates a page table for a VM
+ *
+ * @rdev: radeon_device pointer
+ * @vm: vm to bind
+ *
+ * Allocate a page table for the requested vm (cayman+).
+ * Returns 0 for success, error for failure.
+ *
+ * Global and local mutex must be locked!
+ */
+int radeon_vm_alloc_pt(struct radeon_device *rdev, struct radeon_vm *vm)
+{
+	unsigned pd_size, pd_entries, pts_size;
+	struct radeon_ib ib;
+	int r;
+
+	if (vm == NULL) {
+		return -EINVAL;
+	}
+
+	if (vm->page_directory != NULL) {
+		return 0;
+	}
+
+	pd_size = radeon_vm_directory_size(rdev);
+	pd_entries = radeon_vm_num_pdes(rdev);
+
+retry:
+	r = radeon_sa_bo_new(rdev, &rdev->vm_manager.sa_manager,
+			     &vm->page_directory, pd_size,
+			     RADEON_VM_PTB_ALIGN_SIZE, false);
+	if (r == -ENOMEM) {
+		r = radeon_vm_evict(rdev, vm);
+		if (r)
+			return r;
+		goto retry;
+
+	} else if (r) {
+		return r;
 	}
 
-	return list;
+	vm->pd_gpu_addr = radeon_sa_bo_gpu_addr(vm->page_directory);
+
+	/* Initially clear the page directory */
+	r = radeon_ib_get(rdev, R600_RING_TYPE_DMA_INDEX, &ib,
+			  NULL, pd_entries * 2 + 64);
+	if (r) {
+		radeon_sa_bo_free(rdev, &vm->page_directory, vm->fence);
+		return r;
+	}
+
+	ib.length_dw = 0;
+
+	radeon_asic_vm_set_page(rdev, &ib, vm->pd_gpu_addr,
+				0, pd_entries, 0, 0);
+
+	radeon_semaphore_sync_to(ib.semaphore, vm->fence);
+	r = radeon_ib_schedule(rdev, &ib, NULL);
+	if (r) {
+		radeon_ib_free(rdev, &ib);
+		radeon_sa_bo_free(rdev, &vm->page_directory, vm->fence);
+		return r;
+	}
+	radeon_fence_unref(&vm->fence);
+	vm->fence = radeon_fence_ref(ib.fence);
+	radeon_ib_free(rdev, &ib);
+	radeon_fence_unref(&vm->last_flush);
+
+	/* allocate page table array */
+	pts_size = radeon_vm_num_pdes(rdev) * sizeof(struct radeon_sa_bo *);
+	vm->page_tables = kzalloc(pts_size, GFP_KERNEL);
+
+	if (vm->page_tables == NULL) {
+		DRM_ERROR("Cannot allocate memory for page table array\n");
+		radeon_sa_bo_free(rdev, &vm->page_directory, vm->fence);
+		return -ENOMEM;
+	}
+
+	return 0;
+}
+
+/**
+ * radeon_vm_add_to_lru - add VMs page table to LRU list
+ *
+ * @rdev: radeon_device pointer
+ * @vm: vm to add to LRU
+ *
+ * Add the allocated page table to the LRU list (cayman+).
+ *
+ * Global mutex must be locked!
+ */
+void radeon_vm_add_to_lru(struct radeon_device *rdev, struct radeon_vm *vm)
+{
+	list_del_init(&vm->list);
+	list_add_tail(&vm->list, &rdev->vm_manager.lru_vm);
 }
 
 /**
@@ -235,14 +393,10 @@ void radeon_vm_flush(struct radeon_device *rdev,
		     struct radeon_vm *vm, int ring)
 {
-	uint64_t pd_addr = radeon_bo_gpu_offset(vm->page_directory);
-
 	/* if we can't remember our last VM flush then flush now! */
 	/* XXX figure out why we have to flush all the time */
-	if (!vm->last_flush || true || pd_addr != vm->pd_gpu_addr) {
-		vm->pd_gpu_addr = pd_addr;
+	if (!vm->last_flush || true)
 		radeon_ring_vm_flush(rdev, ring, vm);
-	}
 }
 
 /**
@@ -342,63 +496,6 @@ struct radeon_bo_va *radeon_vm_bo_add(struct radeon_device *rdev,
 }
 
 /**
- * radeon_vm_clear_bo - initially clear the page dir/table
- *
- * @rdev: radeon_device pointer
- * @bo: bo to clear
- */
-static int radeon_vm_clear_bo(struct radeon_device *rdev,
-			      struct radeon_bo *bo)
-{
-	struct ttm_validate_buffer tv;
-	struct ww_acquire_ctx ticket;
-	struct list_head head;
-	struct radeon_ib ib;
-	unsigned entries;
-	uint64_t addr;
-	int r;
-
-	memset(&tv, 0, sizeof(tv));
-	tv.bo = &bo->tbo;
-
-	INIT_LIST_HEAD(&head);
-	list_add(&tv.head, &head);
-
-	r = ttm_eu_reserve_buffers(&ticket, &head);
-	if (r)
-		return r;
-
-	r = ttm_bo_validate(&bo->tbo, &bo->placement, true, false);
-	if (r)
-		goto error;
-
-	addr = radeon_bo_gpu_offset(bo);
-	entries = radeon_bo_size(bo) / 8;
-
-	r = radeon_ib_get(rdev, R600_RING_TYPE_DMA_INDEX, &ib,
-			  NULL, entries * 2 + 64);
-	if (r)
-		goto error;
-
-	ib.length_dw = 0;
-
-	radeon_asic_vm_set_page(rdev, &ib, addr, 0, entries, 0, 0);
-
-	r = radeon_ib_schedule(rdev, &ib, NULL);
-	if (r)
-		goto error;
-
-	ttm_eu_fence_buffer_objects(&ticket, &head, ib.fence);
-	radeon_ib_free(rdev, &ib);
-
-	return 0;
-
-error:
-	ttm_eu_backoff_reservation(&ticket, &head);
-	return r;
-}
-
-/**
  * radeon_vm_bo_set_addr - set bos virtual address inside a vm
  *
  * @rdev: radeon_device pointer
@@ -422,8 +519,7 @@ int radeon_vm_bo_set_addr(struct radeon_device *rdev,
 	struct radeon_vm *vm = bo_va->vm;
 	struct radeon_bo_va *tmp;
 	struct list_head *head;
-	unsigned last_pfn, pt_idx;
-	int r;
+	unsigned last_pfn;
 
 	if (soffset) {
 		/* make sure object fit at this offset */
@@ -474,53 +570,8 @@ int radeon_vm_bo_set_addr(struct radeon_device *rdev,
 	bo_va->valid = false;
 	list_move(&bo_va->vm_list, head);
 
-	soffset = (soffset / RADEON_GPU_PAGE_SIZE) >> RADEON_VM_BLOCK_SIZE;
-	eoffset = (eoffset / RADEON_GPU_PAGE_SIZE) >> RADEON_VM_BLOCK_SIZE;
-
-	if (eoffset > vm->max_pde_used)
-		vm->max_pde_used = eoffset;
-
-	radeon_bo_unreserve(bo_va->bo);
-
-	/* walk over the address space and allocate the page tables */
-	for (pt_idx = soffset; pt_idx <= eoffset; ++pt_idx) {
-		struct radeon_bo *pt;
-
-		if (vm->page_tables[pt_idx].bo)
-			continue;
-
-		/* drop mutex to allocate and clear page table */
-		mutex_unlock(&vm->mutex);
-
-		r = radeon_bo_create(rdev, RADEON_VM_PTE_COUNT * 8,
-				     RADEON_GPU_PAGE_SIZE, false,
-				     RADEON_GEM_DOMAIN_VRAM, NULL, &pt);
-		if (r)
-			return r;
-
-		r = radeon_vm_clear_bo(rdev, pt);
-		if (r) {
-			radeon_bo_unref(&pt);
-			radeon_bo_reserve(bo_va->bo, false);
-			return r;
-		}
-
-		/* aquire mutex again */
-		mutex_lock(&vm->mutex);
-		if (vm->page_tables[pt_idx].bo) {
-			/* someone else allocated the pt in the meantime */
-			mutex_unlock(&vm->mutex);
-			radeon_bo_unref(&pt);
-			mutex_lock(&vm->mutex);
-			continue;
-		}
-
-		vm->page_tables[pt_idx].addr = 0;
-		vm->page_tables[pt_idx].bo = pt;
-	}
-	mutex_unlock(&vm->mutex);
-	return radeon_bo_reserve(bo_va->bo, false);
+	return 0;
 }
 
 /**
@@ -580,54 +631,58 @@ static uint32_t radeon_vm_page_flags(uint32_t flags)
  *
  * Global and local mutex must be locked!
  */
-int radeon_vm_update_page_directory(struct radeon_device *rdev,
-				    struct radeon_vm *vm)
+static int radeon_vm_update_pdes(struct radeon_device *rdev,
+				 struct radeon_vm *vm,
+				 struct radeon_ib *ib,
+				 uint64_t start, uint64_t end)
 {
 	static const uint32_t incr = RADEON_VM_PTE_COUNT * 8;
 
-	struct radeon_bo *pd = vm->page_directory;
-	uint64_t pd_addr = radeon_bo_gpu_offset(pd);
 	uint64_t last_pde = ~0, last_pt = ~0;
-	unsigned count = 0, pt_idx, ndw;
-	struct radeon_ib ib;
+	unsigned count = 0;
+	uint64_t pt_idx;
 	int r;
 
-	/* padding, etc. */
-	ndw = 64;
-
-	/* assume the worst case */
-	ndw += vm->max_pde_used * 16;
-
-	/* update too big for an IB */
-	if (ndw > 0xfffff)
-		return -ENOMEM;
-
-	r = radeon_ib_get(rdev, R600_RING_TYPE_DMA_INDEX, &ib, NULL, ndw * 4);
-	if (r)
-		return r;
-	ib.length_dw = 0;
+	start = (start / RADEON_GPU_PAGE_SIZE) >> RADEON_VM_BLOCK_SIZE;
+	end = (end / RADEON_GPU_PAGE_SIZE) >> RADEON_VM_BLOCK_SIZE;
 
 	/* walk over the address space and update the page directory */
-	for (pt_idx = 0; pt_idx <= vm->max_pde_used; ++pt_idx) {
-		struct radeon_bo *bo = vm->page_tables[pt_idx].bo;
+	for (pt_idx = start; pt_idx <= end; ++pt_idx) {
 		uint64_t pde, pt;
 
-		if (bo == NULL)
+		if (vm->page_tables[pt_idx])
			continue;
 
-		pt = radeon_bo_gpu_offset(bo);
-		if (vm->page_tables[pt_idx].addr == pt)
-			continue;
-		vm->page_tables[pt_idx].addr = pt;
+retry:
+		r = radeon_sa_bo_new(rdev, &rdev->vm_manager.sa_manager,
+				     &vm->page_tables[pt_idx],
+				     RADEON_VM_PTE_COUNT * 8,
+				     RADEON_GPU_PAGE_SIZE, false);
+
+		if (r == -ENOMEM) {
+			r = radeon_vm_evict(rdev, vm);
+			if (r)
+				return r;
+			goto retry;
+		} else if (r) {
+			return r;
+		}
+
+		pde = vm->pd_gpu_addr + pt_idx * 8;
+
+		pt = radeon_sa_bo_gpu_addr(vm->page_tables[pt_idx]);
 
-		pde = pd_addr + pt_idx * 8;
 		if (((last_pde + 8 * count) != pde) ||
		    ((last_pt + incr * count) != pt)) {
 
			if (count) {
-				radeon_asic_vm_set_page(rdev, &ib, last_pde,
+				radeon_asic_vm_set_page(rdev, ib, last_pde,
							last_pt, count, incr,
							R600_PTE_VALID);
+
+				count *= RADEON_VM_PTE_COUNT;
+				radeon_asic_vm_set_page(rdev, ib, last_pt, 0,
+							count, 0, 0);
			}
 
			count = 1;
@@ -638,23 +693,14 @@ int radeon_vm_update_page_directory(struct radeon_device *rdev,
 		}
 	}
 
-	if (count)
-		radeon_asic_vm_set_page(rdev, &ib, last_pde, last_pt, count,
+	if (count) {
+		radeon_asic_vm_set_page(rdev, ib, last_pde, last_pt, count,
					incr, R600_PTE_VALID);
 
-	if (ib.length_dw != 0) {
-		radeon_semaphore_sync_to(ib.semaphore, pd->tbo.sync_obj);
-		radeon_semaphore_sync_to(ib.semaphore, vm->last_id_use);
-		r = radeon_ib_schedule(rdev, &ib, NULL);
-		if (r) {
-			radeon_ib_free(rdev, &ib);
-			return r;
-		}
-		radeon_fence_unref(&vm->fence);
-		vm->fence = radeon_fence_ref(ib.fence);
-		radeon_fence_unref(&vm->last_flush);
+		count *= RADEON_VM_PTE_COUNT;
+		radeon_asic_vm_set_page(rdev, ib, last_pt, 0,
+					count, 0, 0);
 	}
-	radeon_ib_free(rdev, &ib);
 
 	return 0;
 }
@@ -691,18 +737,15 @@ static void radeon_vm_update_ptes(struct radeon_device *rdev,
 	/* walk over the address space and update the page tables */
 	for (addr = start; addr < end; ) {
 		uint64_t pt_idx = addr >> RADEON_VM_BLOCK_SIZE;
-		struct radeon_bo *pt = vm->page_tables[pt_idx].bo;
 		unsigned nptes;
 		uint64_t pte;
 
-		radeon_semaphore_sync_to(ib->semaphore, pt->tbo.sync_obj);
-
 		if ((addr & ~mask) == (end & ~mask))
			nptes = end - addr;
 		else
			nptes = RADEON_VM_PTE_COUNT - (addr & mask);
 
-		pte = radeon_bo_gpu_offset(pt);
+		pte = radeon_sa_bo_gpu_addr(vm->page_tables[pt_idx]);
 		pte += (addr & mask) * 8;
 
 		if ((last_pte + 8 * count) != pte) {
@@ -743,7 +786,7 @@ static void radeon_vm_update_ptes(struct radeon_device *rdev,
  * Fill in the page table entries for @bo (cayman+).
  * Returns 0 for success, -EINVAL for failure.
  *
- * Object have to be reserved and mutex must be locked!
+ * Object have to be reserved & global and local mutex must be locked!
  */
 int radeon_vm_bo_update(struct radeon_device *rdev,
			struct radeon_vm *vm,
@@ -752,10 +795,14 @@ int radeon_vm_bo_update(struct radeon_device *rdev,
 {
 	struct radeon_ib ib;
 	struct radeon_bo_va *bo_va;
-	unsigned nptes, ndw;
+	unsigned nptes, npdes, ndw;
 	uint64_t addr;
 	int r;
 
+	/* nothing to do if vm isn't bound */
+	if (vm->page_directory == NULL)
+		return 0;
+
 	bo_va = radeon_vm_bo_find(vm, bo);
 	if (bo_va == NULL) {
 		dev_err(rdev->dev, "bo %p not in vm %p\n", bo, vm);
@@ -793,6 +840,9 @@ int radeon_vm_bo_update(struct radeon_device *rdev,
 
 	nptes = radeon_bo_ngpu_pages(bo);
 
+	/* assume two extra pdes in case the mapping overlaps the borders */
+	npdes = (nptes >> RADEON_VM_BLOCK_SIZE) + 2;
+
 	/* padding, etc. */
 	ndw = 64;
 
@@ -807,6 +857,15 @@ int radeon_vm_bo_update(struct radeon_device *rdev,
 	/* reserve space for pte addresses */
 	ndw += nptes * 2;
 
+	/* reserve space for one header for every 2k dwords */
+	ndw += (npdes >> 11) * 4;
+
+	/* reserve space for pde addresses */
+	ndw += npdes * 2;
+
+	/* reserve space for clearing new page tables */
+	ndw += npdes * 2 * RADEON_VM_PTE_COUNT;
+
 	/* update too big for an IB */
 	if (ndw > 0xfffff)
 		return -ENOMEM;
@@ -816,6 +875,12 @@ int radeon_vm_bo_update(struct radeon_device *rdev,
 		return r;
 	ib.length_dw = 0;
 
+	r = radeon_vm_update_pdes(rdev, vm, &ib, bo_va->soffset, bo_va->eoffset);
+	if (r) {
+		radeon_ib_free(rdev, &ib);
+		return r;
+	}
+
 	radeon_vm_update_ptes(rdev, vm, &ib, bo_va->soffset, bo_va->eoffset,
			      addr, radeon_vm_page_flags(bo_va->flags));
 
@@ -851,10 +916,12 @@ int radeon_vm_bo_rmv(struct radeon_device *rdev,
 {
 	int r = 0;
 
+	mutex_lock(&rdev->vm_manager.lock);
 	mutex_lock(&bo_va->vm->mutex);
-	if (bo_va->soffset)
+	if (bo_va->soffset) {
 		r = radeon_vm_bo_update(rdev, bo_va->vm, bo_va->bo, NULL);
-
+	}
+	mutex_unlock(&rdev->vm_manager.lock);
 	list_del(&bo_va->vm_list);
 	mutex_unlock(&bo_va->vm->mutex);
 	list_del(&bo_va->bo_list);
@@ -890,43 +957,15 @@ void radeon_vm_bo_invalidate(struct radeon_device *rdev,
  *
  * Init @vm fields (cayman+).
  */
-int radeon_vm_init(struct radeon_device *rdev, struct radeon_vm *vm)
+void radeon_vm_init(struct radeon_device *rdev, struct radeon_vm *vm)
 {
-	unsigned pd_size, pd_entries, pts_size;
-	int r;
-
 	vm->id = 0;
 	vm->fence = NULL;
 	vm->last_flush = NULL;
 	vm->last_id_use = NULL;
 	mutex_init(&vm->mutex);
+	INIT_LIST_HEAD(&vm->list);
 	INIT_LIST_HEAD(&vm->va);
-
-	pd_size = radeon_vm_directory_size(rdev);
-	pd_entries = radeon_vm_num_pdes(rdev);
-
-	/* allocate page table array */
-	pts_size = pd_entries * sizeof(struct radeon_vm_pt);
-	vm->page_tables = kzalloc(pts_size, GFP_KERNEL);
-	if (vm->page_tables == NULL) {
-		DRM_ERROR("Cannot allocate memory for page table array\n");
-		return -ENOMEM;
-	}
-
-	r = radeon_bo_create(rdev, pd_size, RADEON_VM_PTB_ALIGN_SIZE, false,
-			     RADEON_GEM_DOMAIN_VRAM, NULL,
-			     &vm->page_directory);
-	if (r)
-		return r;
-
-	r = radeon_vm_clear_bo(rdev, vm->page_directory);
-	if (r) {
-		radeon_bo_unref(&vm->page_directory);
-		vm->page_directory = NULL;
-		return r;
-	}
-
-	return 0;
 }
 
 /**
@@ -941,7 +980,12 @@ int radeon_vm_init(struct radeon_device *rdev, struct radeon_vm *vm)
 void radeon_vm_fini(struct radeon_device *rdev, struct radeon_vm *vm)
 {
 	struct radeon_bo_va *bo_va, *tmp;
-	int i, r;
+	int r;
+
+	mutex_lock(&rdev->vm_manager.lock);
+	mutex_lock(&vm->mutex);
+	radeon_vm_free_pt(rdev, vm);
+	mutex_unlock(&rdev->vm_manager.lock);
 
 	if (!list_empty(&vm->va)) {
 		dev_err(rdev->dev, "still active bo inside vm\n");
@@ -955,17 +999,8 @@ void radeon_vm_fini(struct radeon_device *rdev, struct radeon_vm *vm)
			kfree(bo_va);
 		}
 	}
-
-
-	for (i = 0; i < radeon_vm_num_pdes(rdev); i++)
-		radeon_bo_unref(&vm->page_tables[i].bo);
-	kfree(vm->page_tables);
-
-	radeon_bo_unref(&vm->page_directory);
-
 	radeon_fence_unref(&vm->fence);
 	radeon_fence_unref(&vm->last_flush);
 	radeon_fence_unref(&vm->last_id_use);
-
-	mutex_destroy(&vm->mutex);
+	mutex_unlock(&vm->mutex);
 }
-- 
1.9.1

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel