* [PATCH v2] Visible VRAM Management Improvements
@ 2017-06-28  2:33 John Brooks
  2017-06-28  2:33 ` [PATCH 1/5] drm/amdgpu: Add vis_vramlimit module parameter John Brooks
  ` (4 more replies)
  0 siblings, 5 replies; 25+ messages in thread
From: John Brooks @ 2017-06-28  2:33 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Michel Dänzer, Marek Olšák, Christian König
  Cc: David Airlie, dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Changes in this version:

- Dropped former patch 1 ("Separate placements and busy placements") as it was
  no longer necessary
- Dropped former patch 4 ("Don't force BOs into visible VRAM if they can go to
  GTT instead") as it was unnecessary and had too many side effects
  (Thanks Christian)
- Dropped former patches 8 and 9 ("Asynchronously move BOs to visible VRAM" and
  an associated optimization), as there may be better ways to address the root
  of the problem
- Added a u64 cast to patch 1 (Thanks Christian)
- Optimized patch 2 in the case where visible VRAM is not smaller than total
  VRAM (Thanks Michel)

The series as it now stands fixes Dying Light and improves DiRT Rally frame
times. Without the asynchronous moves, there is still occasional stalling in
DiRT Rally.

The "Set/clear CPU_ACCESS_REQUIRED flag on page fault and CS" patch received
objections, but I have not removed it yet as there seems to be some support for
the idea. The precise details of when to set and clear the flag may need
further discussion. I note that if the patch is removed (while retaining the
preventative measures against repeated moves back and forth between GTT and
invisible VRAM), there are very few evictions but the average framerate drops
from ~89 to ~82. Presumably we'll see either higher memory pressure or lower
framerates depending on whether userspace over- or under-estimates when to set
this flag. What we do in the kernel may depend on which side of this tradeoff
we want to end up on for now. 
-- John Brooks (Frogging101) _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH 1/5] drm/amdgpu: Add vis_vramlimit module parameter 2017-06-28 2:33 [PATCH v2] Visible VRAM Management Improvements John Brooks @ 2017-06-28 2:33 ` John Brooks 2017-06-28 2:33 ` [PATCH 2/5] drm/amdgpu: Throttle visible VRAM moves separately John Brooks ` (3 subsequent siblings) 4 siblings, 0 replies; 25+ messages in thread From: John Brooks @ 2017-06-28 2:33 UTC (permalink / raw) To: amd-gfx, Michel Dänzer, Marek Olšák, Christian König Cc: dri-devel Allow specifying a limit on visible VRAM via a module parameter. This is helpful for testing performance under visible VRAM pressure. v2: Add cast to 64-bit (Christian König) Signed-off-by: John Brooks <john@fastquake.com> Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 4 ++++ drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 8 ++++++++ 3 files changed, 13 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index 12d61ed..06ed45c 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -74,6 +74,7 @@ */ extern int amdgpu_modeset; extern int amdgpu_vram_limit; +extern int amdgpu_vis_vram_limit; extern int amdgpu_gart_size; extern int amdgpu_moverate; extern int amdgpu_benchmarking; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c index 4c7c262..a14f458 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c @@ -73,6 +73,7 @@ #define KMS_DRIVER_PATCHLEVEL 0 int amdgpu_vram_limit = 0; +int amdgpu_vis_vram_limit = 0; int amdgpu_gart_size = -1; /* auto */ int amdgpu_moverate = -1; /* auto */ int amdgpu_benchmarking = 0; @@ -119,6 +120,9 @@ int amdgpu_lbpw = -1; MODULE_PARM_DESC(vramlimit, "Restrict VRAM for testing, in megabytes"); module_param_named(vramlimit, amdgpu_vram_limit, int, 0600); 
+MODULE_PARM_DESC(vis_vramlimit, "Restrict visible VRAM for testing, in megabytes"); +module_param_named(vis_vramlimit, amdgpu_vis_vram_limit, int, 0444); + MODULE_PARM_DESC(gartsize, "Size of PCIE/IGP gart to setup in megabytes (32, 64, etc., -1 = auto)"); module_param_named(gartsize, amdgpu_gart_size, int, 0600); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c index c9b131b..7d14ae1 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c @@ -1095,6 +1095,7 @@ static struct ttm_bo_driver amdgpu_bo_driver = { int amdgpu_ttm_init(struct amdgpu_device *adev) { int r; + u64 vis_vram_limit; r = amdgpu_ttm_global_init(adev); if (r) { @@ -1118,6 +1119,13 @@ int amdgpu_ttm_init(struct amdgpu_device *adev) DRM_ERROR("Failed initializing VRAM heap.\n"); return r; } + + /* Reduce size of CPU-visible VRAM if requested */ + vis_vram_limit = (u64)amdgpu_vis_vram_limit * 1024 * 1024; + if (amdgpu_vis_vram_limit > 0 && + vis_vram_limit <= adev->mc.visible_vram_size) + adev->mc.visible_vram_size = vis_vram_limit; + /* Change the size here instead of the init above so only lpfn is affected */ amdgpu_ttm_set_active_vram_size(adev, adev->mc.visible_vram_size); -- 2.7.4 _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH 2/5] drm/amdgpu: Throttle visible VRAM moves separately 2017-06-28 2:33 [PATCH v2] Visible VRAM Management Improvements John Brooks 2017-06-28 2:33 ` [PATCH 1/5] drm/amdgpu: Add vis_vramlimit module parameter John Brooks @ 2017-06-28 2:33 ` John Brooks 2017-06-28 2:33 ` [PATCH 3/5] drm/amdgpu: Track time of last page fault and last CS move in struct amdgpu_bo John Brooks ` (2 subsequent siblings) 4 siblings, 0 replies; 25+ messages in thread From: John Brooks @ 2017-06-28 2:33 UTC (permalink / raw) To: amd-gfx, Michel Dänzer, Marek Olšák, Christian König Cc: dri-devel The BO move throttling code is designed to allow VRAM to fill quickly if it is relatively empty. However, this does not take into account situations where the visible VRAM is smaller than total VRAM, and total VRAM may not be close to full but the visible VRAM segment is under pressure. In such situations, visible VRAM would experience unrestricted swapping and performance would drop. Add a separate counter specifically for moves involving visible VRAM, and check it before moving BOs there. v2: Only perform calculations for separate counter if visible VRAM is smaller than total VRAM. 
(Michel Dänzer) Fixes: 95844d20ae02 (drm/amdgpu: throttle buffer migrations at CS using a fixed MBps limit (v2)) Signed-off-by: John Brooks <john@fastquake.com> --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 6 ++- drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 84 +++++++++++++++++++++++------- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 10 ++-- 3 files changed, 78 insertions(+), 22 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index 06ed45c..7366115 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -1157,7 +1157,9 @@ struct amdgpu_cs_parser { struct list_head validated; struct dma_fence *fence; uint64_t bytes_moved_threshold; + uint64_t bytes_moved_vis_threshold; uint64_t bytes_moved; + uint64_t bytes_moved_vis; struct amdgpu_bo_list_entry *evictable; /* user fence */ @@ -1591,6 +1593,7 @@ struct amdgpu_device { spinlock_t lock; s64 last_update_us; s64 accum_us; /* accumulated microseconds */ + s64 accum_us_vis; /* for visible VRAM */ u32 log2_max_MBps; } mm_stats; @@ -1926,7 +1929,8 @@ bool amdgpu_need_post(struct amdgpu_device *adev); void amdgpu_update_display_priority(struct amdgpu_device *adev); int amdgpu_cs_parser_init(struct amdgpu_cs_parser *p, void *data); -void amdgpu_cs_report_moved_bytes(struct amdgpu_device *adev, u64 num_bytes); +void amdgpu_cs_report_moved_bytes(struct amdgpu_device *adev, u64 num_bytes, + u64 num_vis_bytes); void amdgpu_ttm_placement_from_domain(struct amdgpu_bo *abo, u32 domain); bool amdgpu_ttm_bo_is_amdgpu_bo(struct ttm_buffer_object *bo); int amdgpu_ttm_tt_get_user_pages(struct ttm_tt *ttm, struct page **pages); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c index aeee684..1dfa847 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c @@ -223,11 +223,13 @@ static s64 bytes_to_us(struct amdgpu_device *adev, u64 bytes) * ticks. 
The accumulated microseconds (us) are converted to bytes and * returned. */ -static u64 amdgpu_cs_get_threshold_for_moves(struct amdgpu_device *adev) +static void amdgpu_cs_get_threshold_for_moves(struct amdgpu_device *adev, + u64 *max_bytes, + u64 *max_vis_bytes) { s64 time_us, increment_us; - u64 max_bytes; u64 free_vram, total_vram, used_vram; + u64 free_vis_vram = 0, total_vis_vram = 0, used_vis_vram = 0; /* Allow a maximum of 200 accumulated ms. This is basically per-IB * throttling. @@ -238,13 +240,23 @@ static u64 amdgpu_cs_get_threshold_for_moves(struct amdgpu_device *adev) */ const s64 us_upper_bound = 200000; - if (!adev->mm_stats.log2_max_MBps) - return 0; + if (!adev->mm_stats.log2_max_MBps) { + *max_bytes = 0; + *max_vis_bytes = 0; + return; + } total_vram = adev->mc.real_vram_size - adev->vram_pin_size; used_vram = atomic64_read(&adev->vram_usage); free_vram = used_vram >= total_vram ? 0 : total_vram - used_vram; + if (adev->mc.visible_vram_size < adev->mc.real_vram_size) { + total_vis_vram = adev->mc.visible_vram_size; + used_vis_vram = atomic64_read(&adev->vram_vis_usage); + free_vis_vram = used_vis_vram >= total_vis_vram ? + 0 : total_vis_vram - used_vis_vram; + } + spin_lock(&adev->mm_stats.lock); /* Increase the amount of accumulated us. 
*/ @@ -252,7 +264,11 @@ static u64 amdgpu_cs_get_threshold_for_moves(struct amdgpu_device *adev) increment_us = time_us - adev->mm_stats.last_update_us; adev->mm_stats.last_update_us = time_us; adev->mm_stats.accum_us = min(adev->mm_stats.accum_us + increment_us, - us_upper_bound); + us_upper_bound); + if (adev->mc.visible_vram_size < adev->mc.real_vram_size) { + adev->mm_stats.accum_us_vis = min(adev->mm_stats.accum_us_vis + + increment_us, us_upper_bound); + } /* This prevents the short period of low performance when the VRAM * usage is low and the driver is in debt or doesn't have enough @@ -280,23 +296,36 @@ static u64 amdgpu_cs_get_threshold_for_moves(struct amdgpu_device *adev) adev->mm_stats.accum_us = max(min_us, adev->mm_stats.accum_us); } - /* This returns 0 if the driver is in debt to disallow (optional) + /* Do the same for visible VRAM if half of it is free */ + if (adev->mc.visible_vram_size < adev->mc.real_vram_size && + free_vis_vram >= total_vis_vram / 2) { + adev->mm_stats.accum_us_vis = max(bytes_to_us(adev, + free_vis_vram / 2), + adev->mm_stats.accum_us_vis); + } + + /* This is set to 0 if the driver is in debt to disallow (optional) * buffer moves. */ - max_bytes = us_to_bytes(adev, adev->mm_stats.accum_us); + *max_bytes = us_to_bytes(adev, adev->mm_stats.accum_us); + if (adev->mc.visible_vram_size < adev->mc.real_vram_size) + *max_vis_bytes = us_to_bytes(adev, adev->mm_stats.accum_us_vis); + else + *max_vis_bytes = 0; spin_unlock(&adev->mm_stats.lock); - return max_bytes; } /* Report how many bytes have really been moved for the last command * submission. This can result in a debt that can stop buffer migrations * temporarily. 
*/ -void amdgpu_cs_report_moved_bytes(struct amdgpu_device *adev, u64 num_bytes) +void amdgpu_cs_report_moved_bytes(struct amdgpu_device *adev, u64 num_bytes, + u64 num_vis_bytes) { spin_lock(&adev->mm_stats.lock); adev->mm_stats.accum_us -= bytes_to_us(adev, num_bytes); + adev->mm_stats.accum_us_vis -= bytes_to_us(adev, num_vis_bytes); spin_unlock(&adev->mm_stats.lock); } @@ -304,7 +333,7 @@ static int amdgpu_cs_bo_validate(struct amdgpu_cs_parser *p, struct amdgpu_bo *bo) { struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev); - u64 initial_bytes_moved; + u64 initial_bytes_moved, bytes_moved; uint32_t domain; int r; @@ -314,17 +343,34 @@ static int amdgpu_cs_bo_validate(struct amdgpu_cs_parser *p, /* Don't move this buffer if we have depleted our allowance * to move it. Don't move anything if the threshold is zero. */ - if (p->bytes_moved < p->bytes_moved_threshold) - domain = bo->prefered_domains; - else + if (p->bytes_moved < p->bytes_moved_threshold) { + if (adev->mc.visible_vram_size < adev->mc.real_vram_size && + (bo->flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED)) { + /* And don't move a CPU_ACCESS_REQUIRED BO to limited + * visible VRAM if we've depleted our allowance to do + * that. 
+ */ + if (p->bytes_moved_vis < p->bytes_moved_vis_threshold) + domain = bo->prefered_domains; + else + domain = bo->allowed_domains; + } else { + domain = bo->prefered_domains; + } + } else { domain = bo->allowed_domains; + } retry: amdgpu_ttm_placement_from_domain(bo, domain); initial_bytes_moved = atomic64_read(&adev->num_bytes_moved); r = ttm_bo_validate(&bo->tbo, &bo->placement, true, false); - p->bytes_moved += atomic64_read(&adev->num_bytes_moved) - - initial_bytes_moved; + bytes_moved = atomic64_read(&adev->num_bytes_moved) - + initial_bytes_moved; + p->bytes_moved += bytes_moved; + if (adev->mc.visible_vram_size < adev->mc.real_vram_size && + bo->flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED) + p->bytes_moved_vis += bytes_moved; if (unlikely(r == -ENOMEM) && domain != bo->allowed_domains) { domain = bo->allowed_domains; @@ -554,8 +600,10 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p, list_splice(&need_pages, &p->validated); } - p->bytes_moved_threshold = amdgpu_cs_get_threshold_for_moves(p->adev); + amdgpu_cs_get_threshold_for_moves(p->adev, &p->bytes_moved_threshold, + &p->bytes_moved_vis_threshold); p->bytes_moved = 0; + p->bytes_moved_vis = 0; p->evictable = list_last_entry(&p->validated, struct amdgpu_bo_list_entry, tv.head); @@ -579,8 +627,8 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p, goto error_validate; } - amdgpu_cs_report_moved_bytes(p->adev, p->bytes_moved); - + amdgpu_cs_report_moved_bytes(p->adev, p->bytes_moved, + p->bytes_moved_vis); fpriv->vm.last_eviction_counter = atomic64_read(&p->adev->num_evictions); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index 8ee6965..dcf1ddb 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -322,7 +322,7 @@ int amdgpu_bo_create_restricted(struct amdgpu_device *adev, struct amdgpu_bo *bo; enum ttm_bo_type type; unsigned long page_align; - u64 initial_bytes_moved; + 
u64 initial_bytes_moved, bytes_moved; size_t acc_size; int r; @@ -398,8 +398,12 @@ int amdgpu_bo_create_restricted(struct amdgpu_device *adev, r = ttm_bo_init_reserved(&adev->mman.bdev, &bo->tbo, size, type, &bo->placement, page_align, !kernel, NULL, acc_size, sg, resv, &amdgpu_ttm_bo_destroy); - amdgpu_cs_report_moved_bytes(adev, - atomic64_read(&adev->num_bytes_moved) - initial_bytes_moved); + bytes_moved = atomic64_read(&adev->num_bytes_moved) - + initial_bytes_moved; + if (bo->flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED) + amdgpu_cs_report_moved_bytes(adev, bytes_moved, bytes_moved); + else + amdgpu_cs_report_moved_bytes(adev, bytes_moved, 0); if (unlikely(r != 0)) return r; -- 2.7.4 _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH 3/5] drm/amdgpu: Track time of last page fault and last CS move in struct amdgpu_bo 2017-06-28 2:33 [PATCH v2] Visible VRAM Management Improvements John Brooks 2017-06-28 2:33 ` [PATCH 1/5] drm/amdgpu: Add vis_vramlimit module parameter John Brooks 2017-06-28 2:33 ` [PATCH 2/5] drm/amdgpu: Throttle visible VRAM moves separately John Brooks @ 2017-06-28 2:33 ` John Brooks 2017-06-28 13:06 ` Christian König [not found] ` <1498617201-24557-1-git-send-email-john-xq/Ko7C6e2Bl57MIdRCFDg@public.gmane.org> 2017-06-29 2:33 ` [PATCH v2] Visible VRAM Management Improvements Michel Dänzer 4 siblings, 1 reply; 25+ messages in thread From: John Brooks @ 2017-06-28 2:33 UTC (permalink / raw) To: amd-gfx, Michel Dänzer, Marek Olšák, Christian König Cc: dri-devel Signed-off-by: John Brooks <john@fastquake.com> --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 3 +++ drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 5 +++++ drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 2 ++ 3 files changed, 10 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index 7366115..34c293a 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -428,6 +428,9 @@ struct amdgpu_bo { void *metadata; u32 metadata_size; unsigned prime_shared_count; + unsigned long last_page_fault_jiffies; + unsigned long last_cs_move_jiffies; + /* list of all virtual address to which this bo * is associated to */ diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c index 1dfa847..071b592 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c @@ -335,6 +335,7 @@ static int amdgpu_cs_bo_validate(struct amdgpu_cs_parser *p, struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev); u64 initial_bytes_moved, bytes_moved; uint32_t domain; + uint32_t old_mem; int r; if (bo->pin_count) @@ -364,6 +365,7 @@ static int amdgpu_cs_bo_validate(struct amdgpu_cs_parser *p, retry: 
amdgpu_ttm_placement_from_domain(bo, domain); initial_bytes_moved = atomic64_read(&adev->num_bytes_moved); + old_mem = bo->tbo.mem.mem_type; r = ttm_bo_validate(&bo->tbo, &bo->placement, true, false); bytes_moved = atomic64_read(&adev->num_bytes_moved) - initial_bytes_moved; @@ -377,6 +379,9 @@ static int amdgpu_cs_bo_validate(struct amdgpu_cs_parser *p, goto retry; } + if (bo->tbo.mem.mem_type != old_mem) + bo->last_cs_move_jiffies = jiffies; + return r; } diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index dcf1ddb..b71775c 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -953,6 +953,8 @@ int amdgpu_bo_fault_reserve_notify(struct ttm_buffer_object *bo) if (bo->mem.mem_type != TTM_PL_VRAM) return 0; + abo->last_page_fault_jiffies = jiffies; + size = bo->mem.num_pages << PAGE_SHIFT; offset = bo->mem.start << PAGE_SHIFT; /* TODO: figure out how to map scattered VRAM to the CPU */ -- 2.7.4 _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH 3/5] drm/amdgpu: Track time of last page fault and last CS move in struct amdgpu_bo 2017-06-28 2:33 ` [PATCH 3/5] drm/amdgpu: Track time of last page fault and last CS move in struct amdgpu_bo John Brooks @ 2017-06-28 13:06 ` Christian König [not found] ` <e3d7be62-e63b-f370-f159-147aaf8d7c50-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org> 0 siblings, 1 reply; 25+ messages in thread From: Christian König @ 2017-06-28 13:06 UTC (permalink / raw) To: John Brooks, amd-gfx, Michel Dänzer, Marek Olšák; +Cc: dri-devel Am 28.06.2017 um 04:33 schrieb John Brooks: > Signed-off-by: John Brooks <john@fastquake.com> > --- > drivers/gpu/drm/amd/amdgpu/amdgpu.h | 3 +++ > drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 5 +++++ > drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 2 ++ > 3 files changed, 10 insertions(+) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h > index 7366115..34c293a 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h > @@ -428,6 +428,9 @@ struct amdgpu_bo { > void *metadata; > u32 metadata_size; > unsigned prime_shared_count; > + unsigned long last_page_fault_jiffies; > + unsigned long last_cs_move_jiffies; Please use jiffies64 here, apart from that the patch looks good to me. Christian. 
> + > /* list of all virtual address to which this bo > * is associated to > */ > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c > index 1dfa847..071b592 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c > @@ -335,6 +335,7 @@ static int amdgpu_cs_bo_validate(struct amdgpu_cs_parser *p, > struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev); > u64 initial_bytes_moved, bytes_moved; > uint32_t domain; > + uint32_t old_mem; > int r; > > if (bo->pin_count) > @@ -364,6 +365,7 @@ static int amdgpu_cs_bo_validate(struct amdgpu_cs_parser *p, > retry: > amdgpu_ttm_placement_from_domain(bo, domain); > initial_bytes_moved = atomic64_read(&adev->num_bytes_moved); > + old_mem = bo->tbo.mem.mem_type; > r = ttm_bo_validate(&bo->tbo, &bo->placement, true, false); > bytes_moved = atomic64_read(&adev->num_bytes_moved) - > initial_bytes_moved; > @@ -377,6 +379,9 @@ static int amdgpu_cs_bo_validate(struct amdgpu_cs_parser *p, > goto retry; > } > > + if (bo->tbo.mem.mem_type != old_mem) > + bo->last_cs_move_jiffies = jiffies; > + > return r; > } > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c > index dcf1ddb..b71775c 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c > @@ -953,6 +953,8 @@ int amdgpu_bo_fault_reserve_notify(struct ttm_buffer_object *bo) > if (bo->mem.mem_type != TTM_PL_VRAM) > return 0; > > + abo->last_page_fault_jiffies = jiffies; > + > size = bo->mem.num_pages << PAGE_SHIFT; > offset = bo->mem.start << PAGE_SHIFT; > /* TODO: figure out how to map scattered VRAM to the CPU */ _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 3/5] drm/amdgpu: Track time of last page fault and last CS move in struct amdgpu_bo [not found] ` <e3d7be62-e63b-f370-f159-147aaf8d7c50-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org> @ 2017-06-28 22:59 ` John Brooks 2017-06-29 8:11 ` Christian König 0 siblings, 1 reply; 25+ messages in thread From: John Brooks @ 2017-06-28 22:59 UTC (permalink / raw) To: Christian König Cc: David Airlie, Michel Dänzer, dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Marek Olšák, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW On Wed, Jun 28, 2017 at 03:06:47PM +0200, Christian König wrote: > Am 28.06.2017 um 04:33 schrieb John Brooks: > >Signed-off-by: John Brooks <john@fastquake.com> > >--- > > drivers/gpu/drm/amd/amdgpu/amdgpu.h | 3 +++ > > drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 5 +++++ > > drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 2 ++ > > 3 files changed, 10 insertions(+) > > > >diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h > >index 7366115..34c293a 100644 > >--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h > >+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h > >@@ -428,6 +428,9 @@ struct amdgpu_bo { > > void *metadata; > > u32 metadata_size; > > unsigned prime_shared_count; > >+ unsigned long last_page_fault_jiffies; > >+ unsigned long last_cs_move_jiffies; > > Please use jiffies64 here, apart from that the patch looks good to me. > > Christian. > I'm not sure I understand. Do you mean change these variables to u64 and use get_jiffies_64() instead of the plain jiffies variable below? I believe jiffies_64 can be slower than jiffies also. 
John > >+ > > /* list of all virtual address to which this bo > > * is associated to > > */ > >diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c > >index 1dfa847..071b592 100644 > >--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c > >+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c > >@@ -335,6 +335,7 @@ static int amdgpu_cs_bo_validate(struct amdgpu_cs_parser *p, > > struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev); > > u64 initial_bytes_moved, bytes_moved; > > uint32_t domain; > >+ uint32_t old_mem; > > int r; > > if (bo->pin_count) > >@@ -364,6 +365,7 @@ static int amdgpu_cs_bo_validate(struct amdgpu_cs_parser *p, > > retry: > > amdgpu_ttm_placement_from_domain(bo, domain); > > initial_bytes_moved = atomic64_read(&adev->num_bytes_moved); > >+ old_mem = bo->tbo.mem.mem_type; > > r = ttm_bo_validate(&bo->tbo, &bo->placement, true, false); > > bytes_moved = atomic64_read(&adev->num_bytes_moved) - > > initial_bytes_moved; > >@@ -377,6 +379,9 @@ static int amdgpu_cs_bo_validate(struct amdgpu_cs_parser *p, > > goto retry; > > } > >+ if (bo->tbo.mem.mem_type != old_mem) > >+ bo->last_cs_move_jiffies = jiffies; > >+ > > return r; > > } > >diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c > >index dcf1ddb..b71775c 100644 > >--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c > >+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c > >@@ -953,6 +953,8 @@ int amdgpu_bo_fault_reserve_notify(struct ttm_buffer_object *bo) > > if (bo->mem.mem_type != TTM_PL_VRAM) > > return 0; > >+ abo->last_page_fault_jiffies = jiffies; > >+ > > size = bo->mem.num_pages << PAGE_SHIFT; > > offset = bo->mem.start << PAGE_SHIFT; > > /* TODO: figure out how to map scattered VRAM to the CPU */ > > _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 3/5] drm/amdgpu: Track time of last page fault and last CS move in struct amdgpu_bo 2017-06-28 22:59 ` John Brooks @ 2017-06-29 8:11 ` Christian König 0 siblings, 0 replies; 25+ messages in thread From: Christian König @ 2017-06-29 8:11 UTC (permalink / raw) To: John Brooks; +Cc: Michel Dänzer, dri-devel, amd-gfx Am 29.06.2017 um 00:59 schrieb John Brooks: > On Wed, Jun 28, 2017 at 03:06:47PM +0200, Christian König wrote: >> Am 28.06.2017 um 04:33 schrieb John Brooks: >>> Signed-off-by: John Brooks <john@fastquake.com> >>> --- >>> drivers/gpu/drm/amd/amdgpu/amdgpu.h | 3 +++ >>> drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 5 +++++ >>> drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 2 ++ >>> 3 files changed, 10 insertions(+) >>> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h >>> index 7366115..34c293a 100644 >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h >>> @@ -428,6 +428,9 @@ struct amdgpu_bo { >>> void *metadata; >>> u32 metadata_size; >>> unsigned prime_shared_count; >>> + unsigned long last_page_fault_jiffies; >>> + unsigned long last_cs_move_jiffies; >> Please use jiffies64 here, apart from that the patch looks good to me. >> >> Christian. >> > I'm not sure I understand. Do you mean change these variables to u64 and use > get_jiffies_64() instead of the plain jiffies variable below? Yes, exactly. > I believe jiffies_64 can be slower than jiffies also. Yeah, but it doesn't matter on 64bit systems and they don't wrap around every 49 days on 32bit systems :) Christian. 
> > John > >>> + >>> /* list of all virtual address to which this bo >>> * is associated to >>> */ >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c >>> index 1dfa847..071b592 100644 >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c >>> @@ -335,6 +335,7 @@ static int amdgpu_cs_bo_validate(struct amdgpu_cs_parser *p, >>> struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev); >>> u64 initial_bytes_moved, bytes_moved; >>> uint32_t domain; >>> + uint32_t old_mem; >>> int r; >>> if (bo->pin_count) >>> @@ -364,6 +365,7 @@ static int amdgpu_cs_bo_validate(struct amdgpu_cs_parser *p, >>> retry: >>> amdgpu_ttm_placement_from_domain(bo, domain); >>> initial_bytes_moved = atomic64_read(&adev->num_bytes_moved); >>> + old_mem = bo->tbo.mem.mem_type; >>> r = ttm_bo_validate(&bo->tbo, &bo->placement, true, false); >>> bytes_moved = atomic64_read(&adev->num_bytes_moved) - >>> initial_bytes_moved; >>> @@ -377,6 +379,9 @@ static int amdgpu_cs_bo_validate(struct amdgpu_cs_parser *p, >>> goto retry; >>> } >>> + if (bo->tbo.mem.mem_type != old_mem) >>> + bo->last_cs_move_jiffies = jiffies; >>> + >>> return r; >>> } >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c >>> index dcf1ddb..b71775c 100644 >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c >>> @@ -953,6 +953,8 @@ int amdgpu_bo_fault_reserve_notify(struct ttm_buffer_object *bo) >>> if (bo->mem.mem_type != TTM_PL_VRAM) >>> return 0; >>> + abo->last_page_fault_jiffies = jiffies; >>> + >>> size = bo->mem.num_pages << PAGE_SHIFT; >>> offset = bo->mem.start << PAGE_SHIFT; >>> /* TODO: figure out how to map scattered VRAM to the CPU */ >> _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply 
[flat|nested] 25+ messages in thread
* [PATCH 4/5] drm/amdgpu: Set/clear CPU_ACCESS_REQUIRED flag on page fault and CS [not found] ` <1498617201-24557-1-git-send-email-john-xq/Ko7C6e2Bl57MIdRCFDg@public.gmane.org> @ 2017-06-28 2:33 ` John Brooks [not found] ` <1498617201-24557-5-git-send-email-john-xq/Ko7C6e2Bl57MIdRCFDg@public.gmane.org> 2017-06-28 2:33 ` [PATCH 5/5] drm/amdgpu: Don't force BOs into visible VRAM for page faults John Brooks 1 sibling, 1 reply; 25+ messages in thread From: John Brooks @ 2017-06-28 2:33 UTC (permalink / raw) To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Michel Dänzer, Marek Olšák, Christian König Cc: David Airlie, dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW When the AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED flag is given by userspace, it should only be treated as a hint to initially place a BO somewhere CPU accessible, rather than having a permanent effect on BO placement. Instead of the flag being set in stone at BO creation, set the flag when a page fault occurs so that it goes somewhere CPU-visible, and clear it when the BO is requested by the GPU. However, clearing the CPU_ACCESS_REQUIRED flag may move BOs in GTT to invisible VRAM, where they may promptly generate another page fault. When BOs are constantly moved back and forth like this, it is highly detrimental to performance. Only clear the flag on CS if: - The BO wasn't page faulted for a certain amount of time (currently 10 seconds), and - its last page fault didn't occur too soon (currently 500ms) after its last CS request, or vice versa. Setting the flag in amdgpu_fault_reserve_notify() also means that we can remove the loop to restrict lpfn to the end of visible VRAM, because amdgpu_ttm_placement_init() will do it for us. 
Signed-off-by: John Brooks <john@fastquake.com> --- drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 45 +++++++++++++++++++++++------- drivers/gpu/drm/amd/amdgpu/amdgpu_object.h | 1 + 3 files changed, 38 insertions(+), 10 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c index 071b592..1ac3c2e 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c @@ -341,6 +341,8 @@ static int amdgpu_cs_bo_validate(struct amdgpu_cs_parser *p, if (bo->pin_count) return 0; + amdgpu_bo_clear_cpu_access_required(bo); + /* Don't move this buffer if we have depleted our allowance * to move it. Don't move anything if the threshold is zero. */ diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index b71775c..658d7b1 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -943,8 +943,8 @@ int amdgpu_bo_fault_reserve_notify(struct ttm_buffer_object *bo) { struct amdgpu_device *adev = amdgpu_ttm_adev(bo->bdev); struct amdgpu_bo *abo; - unsigned long offset, size, lpfn; - int i, r; + unsigned long offset, size; + int r; if (!amdgpu_ttm_bo_is_amdgpu_bo(bo)) return 0; @@ -967,15 +967,9 @@ int amdgpu_bo_fault_reserve_notify(struct ttm_buffer_object *bo) /* hurrah the memory is not visible ! 
*/ atomic64_inc(&adev->num_vram_cpu_page_faults); + abo->flags |= AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED; amdgpu_ttm_placement_from_domain(abo, AMDGPU_GEM_DOMAIN_VRAM); - lpfn = adev->mc.visible_vram_size >> PAGE_SHIFT; - for (i = 0; i < abo->placement.num_placement; i++) { - /* Force into visible VRAM */ - if ((abo->placements[i].flags & TTM_PL_FLAG_VRAM) && - (!abo->placements[i].lpfn || - abo->placements[i].lpfn > lpfn)) - abo->placements[i].lpfn = lpfn; - } + r = ttm_bo_validate(bo, &abo->placement, false, false); if (unlikely(r == -ENOMEM)) { amdgpu_ttm_placement_from_domain(abo, AMDGPU_GEM_DOMAIN_GTT); @@ -1033,3 +1027,34 @@ u64 amdgpu_bo_gpu_offset(struct amdgpu_bo *bo) return bo->tbo.offset; } + +/** + * amdgpu_bo_clear_cpu_access_required + * @bo: BO to update + * + * Clears CPU_ACCESS_REQUIRED flag if the BO hasn't had a page fault in a while + * and it didn't have a page fault too soon after the last time it was moved to + * VRAM. + * + * Caller should have bo reserved. + * + */ +void amdgpu_bo_clear_cpu_access_required(struct amdgpu_bo *bo) +{ + const unsigned int page_fault_timeout_ms = 10000; + const unsigned int min_period_ms = 500; + unsigned int ms_since_pf, period_ms; + + ms_since_pf = jiffies_to_msecs(jiffies - bo->last_page_fault_jiffies); + period_ms = jiffies_to_msecs(abs(bo->last_page_fault_jiffies - + bo->last_cs_move_jiffies)); + + /* + * Try to avoid a revolving door between GTT and VRAM. Clearing the + * flag may move this BO back to VRAM, so don't clear it if it's likely + * to page fault and go right back to GTT. 
+ */ + if ((!bo->last_page_fault_jiffies || !bo->last_cs_move_jiffies) || + (ms_since_pf > page_fault_timeout_ms && period_ms > min_period_ms)) + bo->flags &= ~AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED; +} diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h index 3824851..b0cb137 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h @@ -182,6 +182,7 @@ int amdgpu_bo_restore_from_shadow(struct amdgpu_device *adev, struct reservation_object *resv, struct dma_fence **fence, bool direct); +void amdgpu_bo_clear_cpu_access_required(struct amdgpu_bo *bo); /* -- 2.7.4 _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply related [flat|nested] 25+ messages in thread
[parent not found: <1498617201-24557-5-git-send-email-john-xq/Ko7C6e2Bl57MIdRCFDg@public.gmane.org>]
* Re: [PATCH 4/5] drm/amdgpu: Set/clear CPU_ACCESS_REQUIRED flag on page fault and CS [not found] ` <1498617201-24557-5-git-send-email-john-xq/Ko7C6e2Bl57MIdRCFDg@public.gmane.org> @ 2017-06-28 13:05 ` Christian König 2017-06-28 23:26 ` John Brooks 2017-06-29 3:18 ` Michel Dänzer 1 sibling, 1 reply; 25+ messages in thread From: Christian König @ 2017-06-28 13:05 UTC (permalink / raw) To: John Brooks, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Michel Dänzer, Marek Olšák Cc: David Airlie, dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW Am 28.06.2017 um 04:33 schrieb John Brooks: > When the AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED flag is given by userspace, > it should only be treated as a hint to initially place a BO somewhere CPU > accessible, rather than having a permanent effect on BO placement. > > Instead of the flag being set in stone at BO creation, set the flag when a > page fault occurs so that it goes somewhere CPU-visible, and clear it when > the BO is requested by the GPU. > > However, clearing the CPU_ACCESS_REQUIRED flag may move BOs in GTT to > invisible VRAM, where they may promptly generate another page fault. When > BOs are constantly moved back and forth like this, it is highly detrimental > to performance. Only clear the flag on CS if: > > - The BO wasn't page faulted for a certain amount of time (currently 10 > seconds), and > - its last page fault didn't occur too soon (currently 500ms) after its > last CS request, or vice versa. > > Setting the flag in amdgpu_fault_reserve_notify() also means that we can > remove the loop to restrict lpfn to the end of visible VRAM, because > amdgpu_ttm_placement_init() will do it for us. I'm fine with the general approach, but I'm still absolutely not keen about clearing the flag when userspace has originally specified it. Please add a new flag something like AMDGPU_GEM_CREATE_CPU_ACCESS_HINT or something like this. Regards, Christian. 
> > Signed-off-by: John Brooks <john@fastquake.com> > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 ++ > drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 45 +++++++++++++++++++++++------- > drivers/gpu/drm/amd/amdgpu/amdgpu_object.h | 1 + > 3 files changed, 38 insertions(+), 10 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c > index 071b592..1ac3c2e 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c > @@ -341,6 +341,8 @@ static int amdgpu_cs_bo_validate(struct amdgpu_cs_parser *p, > if (bo->pin_count) > return 0; > > + amdgpu_bo_clear_cpu_access_required(bo); > + > /* Don't move this buffer if we have depleted our allowance > * to move it. Don't move anything if the threshold is zero. > */ > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c > index b71775c..658d7b1 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c > @@ -943,8 +943,8 @@ int amdgpu_bo_fault_reserve_notify(struct ttm_buffer_object *bo) > { > struct amdgpu_device *adev = amdgpu_ttm_adev(bo->bdev); > struct amdgpu_bo *abo; > - unsigned long offset, size, lpfn; > - int i, r; > + unsigned long offset, size; > + int r; > > if (!amdgpu_ttm_bo_is_amdgpu_bo(bo)) > return 0; > @@ -967,15 +967,9 @@ int amdgpu_bo_fault_reserve_notify(struct ttm_buffer_object *bo) > > /* hurrah the memory is not visible ! 
*/ > atomic64_inc(&adev->num_vram_cpu_page_faults); > + abo->flags |= AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED; > amdgpu_ttm_placement_from_domain(abo, AMDGPU_GEM_DOMAIN_VRAM); > - lpfn = adev->mc.visible_vram_size >> PAGE_SHIFT; > - for (i = 0; i < abo->placement.num_placement; i++) { > - /* Force into visible VRAM */ > - if ((abo->placements[i].flags & TTM_PL_FLAG_VRAM) && > - (!abo->placements[i].lpfn || > - abo->placements[i].lpfn > lpfn)) > - abo->placements[i].lpfn = lpfn; > - } > + > r = ttm_bo_validate(bo, &abo->placement, false, false); > if (unlikely(r == -ENOMEM)) { > amdgpu_ttm_placement_from_domain(abo, AMDGPU_GEM_DOMAIN_GTT); > @@ -1033,3 +1027,34 @@ u64 amdgpu_bo_gpu_offset(struct amdgpu_bo *bo) > > return bo->tbo.offset; > } > + > +/** > + * amdgpu_bo_clear_cpu_access_required > + * @bo: BO to update > + * > + * Clears CPU_ACCESS_REQUIRED flag if the BO hasn't had a page fault in a while > + * and it didn't have a page fault too soon after the last time it was moved to > + * VRAM. > + * > + * Caller should have bo reserved. > + * > + */ > +void amdgpu_bo_clear_cpu_access_required(struct amdgpu_bo *bo) > +{ > + const unsigned int page_fault_timeout_ms = 10000; > + const unsigned int min_period_ms = 500; > + unsigned int ms_since_pf, period_ms; > + > + ms_since_pf = jiffies_to_msecs(jiffies - bo->last_page_fault_jiffies); > + period_ms = jiffies_to_msecs(abs(bo->last_page_fault_jiffies - > + bo->last_cs_move_jiffies)); > + > + /* > + * Try to avoid a revolving door between GTT and VRAM. Clearing the > + * flag may move this BO back to VRAM, so don't clear it if it's likely > + * to page fault and go right back to GTT. 
> + */ > + if ((!bo->last_page_fault_jiffies || !bo->last_cs_move_jiffies) || > + (ms_since_pf > page_fault_timeout_ms && period_ms > min_period_ms)) > + bo->flags &= ~AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED; > +} > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h > index 3824851..b0cb137 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h > @@ -182,6 +182,7 @@ int amdgpu_bo_restore_from_shadow(struct amdgpu_device *adev, > struct reservation_object *resv, > struct dma_fence **fence, > bool direct); > +void amdgpu_bo_clear_cpu_access_required(struct amdgpu_bo *bo); > > > /* _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 4/5] drm/amdgpu: Set/clear CPU_ACCESS_REQUIRED flag on page fault and CS 2017-06-28 13:05 ` Christian König @ 2017-06-28 23:26 ` John Brooks [not found] ` <20170628232639.GB11762-6hIufAJW0g7Gr8qjsLp7YGXnswh1EIUO@public.gmane.org> 0 siblings, 1 reply; 25+ messages in thread From: John Brooks @ 2017-06-28 23:26 UTC (permalink / raw) To: Christian König; +Cc: Michel Dänzer, dri-devel, amd-gfx On Wed, Jun 28, 2017 at 03:05:32PM +0200, Christian König wrote: > Am 28.06.2017 um 04:33 schrieb John Brooks: > >When the AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED flag is given by userspace, > >it should only be treated as a hint to initially place a BO somewhere CPU > >accessible, rather than having a permanent effect on BO placement. > > > >Instead of the flag being set in stone at BO creation, set the flag when a > >page fault occurs so that it goes somewhere CPU-visible, and clear it when > >the BO is requested by the GPU. > > > >However, clearing the CPU_ACCESS_REQUIRED flag may move BOs in GTT to > >invisible VRAM, where they may promptly generate another page fault. When > >BOs are constantly moved back and forth like this, it is highly detrimental > >to performance. Only clear the flag on CS if: > > > >- The BO wasn't page faulted for a certain amount of time (currently 10 > > seconds), and > >- its last page fault didn't occur too soon (currently 500ms) after its > > last CS request, or vice versa. > > > >Setting the flag in amdgpu_fault_reserve_notify() also means that we can > >remove the loop to restrict lpfn to the end of visible VRAM, because > >amdgpu_ttm_placement_init() will do it for us. > > I'm fine with the general approach, but I'm still absolutely not keen about > clearing the flag when userspace has originally specified it. > > Please add a new flag something like AMDGPU_GEM_CREATE_CPU_ACCESS_HINT or > something like this. > > Regards, > Christian. 
Is it the fact that we clear a flag that userspace expects not to have changed if it queries it later? I think that's the only effect of this that's directly visible to userspace code. Would it be permissible to change the handling of the flag without clearing it? As for a new "hint" flag, I assume this new flag would be an alternative to the existing CPU_ACCESS_REQUIRED flag, and we'd change Mesa et al to use it in situations where the kernel *should* place a BO somewhere CPU-accessible, but is permitted to move it elsewhere. Is that correct? John > > > > >Signed-off-by: John Brooks <john@fastquake.com> > >--- > > drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 ++ > > drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 45 +++++++++++++++++++++++------- > > drivers/gpu/drm/amd/amdgpu/amdgpu_object.h | 1 + > > 3 files changed, 38 insertions(+), 10 deletions(-) > > > >diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c > >index 071b592..1ac3c2e 100644 > >--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c > >+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c > >@@ -341,6 +341,8 @@ static int amdgpu_cs_bo_validate(struct amdgpu_cs_parser *p, > > if (bo->pin_count) > > return 0; > >+ amdgpu_bo_clear_cpu_access_required(bo); > >+ > > /* Don't move this buffer if we have depleted our allowance > > * to move it. Don't move anything if the threshold is zero. 
> > */ > >diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c > >index b71775c..658d7b1 100644 > >--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c > >+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c > >@@ -943,8 +943,8 @@ int amdgpu_bo_fault_reserve_notify(struct ttm_buffer_object *bo) > > { > > struct amdgpu_device *adev = amdgpu_ttm_adev(bo->bdev); > > struct amdgpu_bo *abo; > >- unsigned long offset, size, lpfn; > >- int i, r; > >+ unsigned long offset, size; > >+ int r; > > if (!amdgpu_ttm_bo_is_amdgpu_bo(bo)) > > return 0; > >@@ -967,15 +967,9 @@ int amdgpu_bo_fault_reserve_notify(struct ttm_buffer_object *bo) > > /* hurrah the memory is not visible ! */ > > atomic64_inc(&adev->num_vram_cpu_page_faults); > >+ abo->flags |= AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED; > > amdgpu_ttm_placement_from_domain(abo, AMDGPU_GEM_DOMAIN_VRAM); > >- lpfn = adev->mc.visible_vram_size >> PAGE_SHIFT; > >- for (i = 0; i < abo->placement.num_placement; i++) { > >- /* Force into visible VRAM */ > >- if ((abo->placements[i].flags & TTM_PL_FLAG_VRAM) && > >- (!abo->placements[i].lpfn || > >- abo->placements[i].lpfn > lpfn)) > >- abo->placements[i].lpfn = lpfn; > >- } > >+ > > r = ttm_bo_validate(bo, &abo->placement, false, false); > > if (unlikely(r == -ENOMEM)) { > > amdgpu_ttm_placement_from_domain(abo, AMDGPU_GEM_DOMAIN_GTT); > >@@ -1033,3 +1027,34 @@ u64 amdgpu_bo_gpu_offset(struct amdgpu_bo *bo) > > return bo->tbo.offset; > > } > >+ > >+/** > >+ * amdgpu_bo_clear_cpu_access_required > >+ * @bo: BO to update > >+ * > >+ * Clears CPU_ACCESS_REQUIRED flag if the BO hasn't had a page fault in a while > >+ * and it didn't have a page fault too soon after the last time it was moved to > >+ * VRAM. > >+ * > >+ * Caller should have bo reserved. 
> >+ * > >+ */ > >+void amdgpu_bo_clear_cpu_access_required(struct amdgpu_bo *bo) > >+{ > >+ const unsigned int page_fault_timeout_ms = 10000; > >+ const unsigned int min_period_ms = 500; > >+ unsigned int ms_since_pf, period_ms; > >+ > >+ ms_since_pf = jiffies_to_msecs(jiffies - bo->last_page_fault_jiffies); > >+ period_ms = jiffies_to_msecs(abs(bo->last_page_fault_jiffies - > >+ bo->last_cs_move_jiffies)); > >+ > >+ /* > >+ * Try to avoid a revolving door between GTT and VRAM. Clearing the > >+ * flag may move this BO back to VRAM, so don't clear it if it's likely > >+ * to page fault and go right back to GTT. > >+ */ > >+ if ((!bo->last_page_fault_jiffies || !bo->last_cs_move_jiffies) || > >+ (ms_since_pf > page_fault_timeout_ms && period_ms > min_period_ms)) > >+ bo->flags &= ~AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED; > >+} > >diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h > >index 3824851..b0cb137 100644 > >--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h > >+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h > >@@ -182,6 +182,7 @@ int amdgpu_bo_restore_from_shadow(struct amdgpu_device *adev, > > struct reservation_object *resv, > > struct dma_fence **fence, > > bool direct); > >+void amdgpu_bo_clear_cpu_access_required(struct amdgpu_bo *bo); > > /* > > _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply [flat|nested] 25+ messages in thread
[parent not found: <20170628232639.GB11762-6hIufAJW0g7Gr8qjsLp7YGXnswh1EIUO@public.gmane.org>]
* Re: [PATCH 4/5] drm/amdgpu: Set/clear CPU_ACCESS_REQUIRED flag on page fault and CS [not found] ` <20170628232639.GB11762-6hIufAJW0g7Gr8qjsLp7YGXnswh1EIUO@public.gmane.org> @ 2017-06-29 2:35 ` Michel Dänzer 2017-06-29 8:23 ` Christian König 2017-06-30 1:56 ` John Brooks 0 siblings, 2 replies; 25+ messages in thread From: Michel Dänzer @ 2017-06-29 2:35 UTC (permalink / raw) To: John Brooks, Christian König Cc: David Airlie, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Marek Olšák, dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW On 29/06/17 08:26 AM, John Brooks wrote: > On Wed, Jun 28, 2017 at 03:05:32PM +0200, Christian König wrote: >> Am 28.06.2017 um 04:33 schrieb John Brooks: >>> When the AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED flag is given by userspace, >>> it should only be treated as a hint to initially place a BO somewhere CPU >>> accessible, rather than having a permanent effect on BO placement. >>> >>> Instead of the flag being set in stone at BO creation, set the flag when a >>> page fault occurs so that it goes somewhere CPU-visible, and clear it when >>> the BO is requested by the GPU. >>> >>> However, clearing the CPU_ACCESS_REQUIRED flag may move BOs in GTT to >>> invisible VRAM, where they may promptly generate another page fault. When >>> BOs are constantly moved back and forth like this, it is highly detrimental >>> to performance. Only clear the flag on CS if: >>> >>> - The BO wasn't page faulted for a certain amount of time (currently 10 >>> seconds), and >>> - its last page fault didn't occur too soon (currently 500ms) after its >>> last CS request, or vice versa. >>> >>> Setting the flag in amdgpu_fault_reserve_notify() also means that we can >>> remove the loop to restrict lpfn to the end of visible VRAM, because >>> amdgpu_ttm_placement_init() will do it for us. >> >> I'm fine with the general approach, but I'm still absolutely not keen about >> clearing the flag when userspace has originally specified it. 
Is there any specific concern you have about that? >> Please add a new flag something like AMDGPU_GEM_CREATE_CPU_ACCESS_HINT or >> something like this. > > Is it the fact that we clear a flag that userspace expects not to have changed > if it queries it later? I think that's the only effect of this that's directly > visible to userspace code. I don't see any way for userspace to query that. > As for a new "hint" flag, I assume this new flag would be an alternative to the > existing CPU_ACCESS_REQUIRED flag, and we'd change Mesa et al to use it in > situations where the kernel *should* place a BO somewhere CPU-accessible, but > is permitted to move it elsewhere. Is that correct? That seems silly. The userspace flag could never be more than a hint. Unfortunately we named it to suggest differently, but we have to live with that. If we do need a new hint flag internally in the driver, we should simply translate AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED to the new flag in amdgpu_gem_create_ioctl, and not expose the new flag to userspace. But other than the question in my followup to the cover letter, this patch looks good to me as is. -- Earthling Michel Dänzer | http://www.amd.com Libre software enthusiast | Mesa and X developer _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 4/5] drm/amdgpu: Set/clear CPU_ACCESS_REQUIRED flag on page fault and CS 2017-06-29 2:35 ` Michel Dänzer @ 2017-06-29 8:23 ` Christian König 2017-06-29 9:58 ` Michel Dänzer 2017-06-30 1:56 ` John Brooks 1 sibling, 1 reply; 25+ messages in thread From: Christian König @ 2017-06-29 8:23 UTC (permalink / raw) To: Michel Dänzer, John Brooks; +Cc: amd-gfx, dri-devel Am 29.06.2017 um 04:35 schrieb Michel Dänzer: > On 29/06/17 08:26 AM, John Brooks wrote: >> On Wed, Jun 28, 2017 at 03:05:32PM +0200, Christian König wrote: >>> Am 28.06.2017 um 04:33 schrieb John Brooks: >>>> When the AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED flag is given by userspace, >>>> it should only be treated as a hint to initially place a BO somewhere CPU >>>> accessible, rather than having a permanent effect on BO placement. >>>> >>>> Instead of the flag being set in stone at BO creation, set the flag when a >>>> page fault occurs so that it goes somewhere CPU-visible, and clear it when >>>> the BO is requested by the GPU. >>>> >>>> However, clearing the CPU_ACCESS_REQUIRED flag may move BOs in GTT to >>>> invisible VRAM, where they may promptly generate another page fault. When >>>> BOs are constantly moved back and forth like this, it is highly detrimental >>>> to performance. Only clear the flag on CS if: >>>> >>>> - The BO wasn't page faulted for a certain amount of time (currently 10 >>>> seconds), and >>>> - its last page fault didn't occur too soon (currently 500ms) after its >>>> last CS request, or vice versa. >>>> >>>> Setting the flag in amdgpu_fault_reserve_notify() also means that we can >>>> remove the loop to restrict lpfn to the end of visible VRAM, because >>>> amdgpu_ttm_placement_init() will do it for us. >>> I'm fine with the general approach, but I'm still absolutely not keen about >>> clearing the flag when userspace has originally specified it. > Is there any specific concern you have about that? Yeah, quite a bunch actually. 
We want to use this flag for P2P buffer sharing in the future as well and I don't intend to add another one like CPU_ACCESS_REALLY_REQUIRED or something like this. >>> Please add a new flag something like AMDGPU_GEM_CREATE_CPU_ACCESS_HINT or >>> something like this. >> Is it the fact that we clear a flag that userspace expects not to have changed >> if it queries it later? I think that's the only effect of this that's directly >> visible to userspace code. > I don't see any way for userspace to query that. > > >> As for a new "hint" flag, I assume this new flag would be an alternative to the >> existing CPU_ACCESS_REQUIRED flag, and we'd change Mesa et al to use it in >> situations where the kernel *should* place a BO somewhere CPU-accessible, but >> is permitted to move it elsewhere. Is that correct? > That seems silly. The userspace flag could never be more than a hint. > Unfortunately we named it to suggest differently, but we have to live > with that. No, just the other way around. The CPU_ACCESS_REQUIRED flag was introduced to note that it is MANDATORY to always have CPU access to the buffer. It's just mesa which uses the flag as a hint that we could get CPU access. Regards, Christian. > If we do need a new hint flag internally in the driver, we should simply > translate AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED to the new flag in > amdgpu_gem_create_ioctl, and not expose the new flag to userspace. > > > But other than the question in my followup to the cover letter, this > patch looks good to me as is. _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 4/5] drm/amdgpu: Set/clear CPU_ACCESS_REQUIRED flag on page fault and CS 2017-06-29 8:23 ` Christian König @ 2017-06-29 9:58 ` Michel Dänzer 2017-06-29 10:05 ` Daniel Vetter 0 siblings, 1 reply; 25+ messages in thread From: Michel Dänzer @ 2017-06-29 9:58 UTC (permalink / raw) To: Christian König, John Brooks; +Cc: dri-devel, amd-gfx On 29/06/17 05:23 PM, Christian König wrote: > Am 29.06.2017 um 04:35 schrieb Michel Dänzer: >> On 29/06/17 08:26 AM, John Brooks wrote: >>> On Wed, Jun 28, 2017 at 03:05:32PM +0200, Christian König wrote: >>>>> >>>>> Instead of the flag being set in stone at BO creation, set the flag >>>>> when a >>>>> page fault occurs so that it goes somewhere CPU-visible, and clear >>>>> it when >>>>> the BO is requested by the GPU. >>>>> >>>>> However, clearing the CPU_ACCESS_REQUIRED flag may move BOs in GTT to >>>>> invisible VRAM, where they may promptly generate another page >>>>> fault. When >>>>> BOs are constantly moved back and forth like this, it is highly >>>>> detrimental >>>>> to performance. Only clear the flag on CS if: >>>>> >>>>> - The BO wasn't page faulted for a certain amount of time >>>>> (currently 10 >>>>> seconds), and >>>>> - its last page fault didn't occur too soon (currently 500ms) after >>>>> its >>>>> last CS request, or vice versa. >>>>> >>>>> Setting the flag in amdgpu_fault_reserve_notify() also means that >>>>> we can >>>>> remove the loop to restrict lpfn to the end of visible VRAM, because >>>>> amdgpu_ttm_placement_init() will do it for us. >>>> I'm fine with the general approach, but I'm still absolutely not >>>> keen about >>>> clearing the flag when userspace has originally specified it. >> Is there any specific concern you have about that? > > Yeah, quite a bunch actually. We want to use this flag for P2P buffer > sharing in the future as well and I don't intent to add another one like > CPU_ACCESS_REALLY_REQUIRED or something like this. 
Won't a BO need to be pinned while it's being shared with another device? >>>> Please add a new flag something like >>>> AMDGPU_GEM_CREATE_CPU_ACCESS_HINT or >>>> something like this. >>> Is it the fact that we clear a flag that userspace expects not to >>> have changed >>> if it queries it later? I think that's the only effect of this that's >>> directly >>> visible to userspace code. >> I don't see any way for userspace to query that. >> >> >>> As for a new "hint" flag, I assume this new flag would be an >>> alternative to the >>> existing CPU_ACCESS_REQUIRED flag, and we'd change Mesa et al to use >>> it in >>> situations where the kernel *should* place a BO somewhere >>> CPU-accessible, but >>> is permitted to move it elsewhere. Is that correct? >> That seems silly. The userspace flag could never be more than a hint. >> Unfortunately we named it to suggest differently, but we have to live >> with that. > > No, just the other way around. The CPU_ACCESS_REQUIRED flag was > introduced to note that it is MANDATORY to always have CPU access to the > buffer. > > It's just mesa which uses the flag as a hint to we could get CPU access. Can you describe a specific scenario where userspace would actually need the strict semantics? Otherwise I'm afraid what you're demanding boils down to userspace churn for no benefit. -- Earthling Michel Dänzer | http://www.amd.com Libre software enthusiast | Mesa and X developer _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 4/5] drm/amdgpu: Set/clear CPU_ACCESS_REQUIRED flag on page fault and CS 2017-06-29 9:58 ` Michel Dänzer @ 2017-06-29 10:05 ` Daniel Vetter [not found] ` <20170629100523.khsozocltct7tnfu-dv86pmgwkMBes7Z6vYuT8azUEOm+Xw19@public.gmane.org> 0 siblings, 1 reply; 25+ messages in thread From: Daniel Vetter @ 2017-06-29 10:05 UTC (permalink / raw) To: Michel Dänzer; +Cc: John Brooks, amd-gfx, dri-devel On Thu, Jun 29, 2017 at 06:58:05PM +0900, Michel Dänzer wrote: > On 29/06/17 05:23 PM, Christian König wrote: > > Am 29.06.2017 um 04:35 schrieb Michel Dänzer: > >> On 29/06/17 08:26 AM, John Brooks wrote: > >>> On Wed, Jun 28, 2017 at 03:05:32PM +0200, Christian König wrote: > >>>>> > >>>>> Instead of the flag being set in stone at BO creation, set the flag > >>>>> when a > >>>>> page fault occurs so that it goes somewhere CPU-visible, and clear > >>>>> it when > >>>>> the BO is requested by the GPU. > >>>>> > >>>>> However, clearing the CPU_ACCESS_REQUIRED flag may move BOs in GTT to > >>>>> invisible VRAM, where they may promptly generate another page > >>>>> fault. When > >>>>> BOs are constantly moved back and forth like this, it is highly > >>>>> detrimental > >>>>> to performance. Only clear the flag on CS if: > >>>>> > >>>>> - The BO wasn't page faulted for a certain amount of time > >>>>> (currently 10 > >>>>> seconds), and > >>>>> - its last page fault didn't occur too soon (currently 500ms) after > >>>>> its > >>>>> last CS request, or vice versa. > >>>>> > >>>>> Setting the flag in amdgpu_fault_reserve_notify() also means that > >>>>> we can > >>>>> remove the loop to restrict lpfn to the end of visible VRAM, because > >>>>> amdgpu_ttm_placement_init() will do it for us. > >>>> I'm fine with the general approach, but I'm still absolutely not > >>>> keen about > >>>> clearing the flag when userspace has originally specified it. > >> Is there any specific concern you have about that? > > > > Yeah, quite a bunch actually. 
We want to use this flag for P2P buffer > > sharing in the future as well and I don't intent to add another one like > > CPU_ACCESS_REALLY_REQUIRED or something like this. > > Won't a BO need to be pinned while it's being shared with another device? That's an artifact of the current kernel implementation, I think we could do better (but for current use-cases where we share a bunch of scanouts and maybe a few pixmaps it's pointless). I wouldn't bet uapi on this never changing. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply [flat|nested] 25+ messages in thread
[parent not found: <20170629100523.khsozocltct7tnfu-dv86pmgwkMBes7Z6vYuT8azUEOm+Xw19@public.gmane.org>]
* Re: [PATCH 4/5] drm/amdgpu: Set/clear CPU_ACCESS_REQUIRED flag on page fault and CS [not found] ` <20170629100523.khsozocltct7tnfu-dv86pmgwkMBes7Z6vYuT8azUEOm+Xw19@public.gmane.org> @ 2017-06-30 2:24 ` Michel Dänzer [not found] ` <e1568d15-42da-4720-dff8-3a6e373f51d8-otUistvHUpPR7s880joybQ@public.gmane.org> 0 siblings, 1 reply; 25+ messages in thread From: Michel Dänzer @ 2017-06-30 2:24 UTC (permalink / raw) To: Daniel Vetter Cc: John Brooks, Christian König, dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW On 29/06/17 07:05 PM, Daniel Vetter wrote: > On Thu, Jun 29, 2017 at 06:58:05PM +0900, Michel Dänzer wrote: >> On 29/06/17 05:23 PM, Christian König wrote: >>> Am 29.06.2017 um 04:35 schrieb Michel Dänzer: >>>> On 29/06/17 08:26 AM, John Brooks wrote: >>>>> On Wed, Jun 28, 2017 at 03:05:32PM +0200, Christian König wrote: >>>>>>> >>>>>>> Instead of the flag being set in stone at BO creation, set the flag >>>>>>> when a >>>>>>> page fault occurs so that it goes somewhere CPU-visible, and clear >>>>>>> it when >>>>>>> the BO is requested by the GPU. >>>>>>> >>>>>>> However, clearing the CPU_ACCESS_REQUIRED flag may move BOs in GTT to >>>>>>> invisible VRAM, where they may promptly generate another page >>>>>>> fault. When >>>>>>> BOs are constantly moved back and forth like this, it is highly >>>>>>> detrimental >>>>>>> to performance. Only clear the flag on CS if: >>>>>>> >>>>>>> - The BO wasn't page faulted for a certain amount of time >>>>>>> (currently 10 >>>>>>> seconds), and >>>>>>> - its last page fault didn't occur too soon (currently 500ms) after >>>>>>> its >>>>>>> last CS request, or vice versa. >>>>>>> >>>>>>> Setting the flag in amdgpu_fault_reserve_notify() also means that >>>>>>> we can >>>>>>> remove the loop to restrict lpfn to the end of visible VRAM, because >>>>>>> amdgpu_ttm_placement_init() will do it for us. 
>>>>>> I'm fine with the general approach, but I'm still absolutely not >>>>>> keen about >>>>>> clearing the flag when userspace has originally specified it. >>>> Is there any specific concern you have about that? >>> >>> Yeah, quite a bunch actually. We want to use this flag for P2P buffer >>> sharing in the future as well and I don't intent to add another one like >>> CPU_ACCESS_REALLY_REQUIRED or something like this. >> >> Won't a BO need to be pinned while it's being shared with another device? > > That's an artifact of the current kernel implementation, I think we could > do better (but for current use-cases where we share a bunch of scanouts > and maybe a few pixmaps it's pointless). I wouldn't bet uapi on this never > changing. Surely there will need to be some kind of transaction though to let the driver know when a BO starts/stops being shared with another device? Either via the existing dma-buf callbacks, or something similar. We can't rely on userspace setting a "CPU access" flag to make sure a BO can be shared with other devices? -- Earthling Michel Dänzer | http://www.amd.com Libre software enthusiast | Mesa and X developer _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply [flat|nested] 25+ messages in thread
[parent not found: <e1568d15-42da-4720-dff8-3a6e373f51d8-otUistvHUpPR7s880joybQ@public.gmane.org>]
* Re: [PATCH 4/5] drm/amdgpu: Set/clear CPU_ACCESS_REQUIRED flag on page fault and CS [not found] ` <e1568d15-42da-4720-dff8-3a6e373f51d8-otUistvHUpPR7s880joybQ@public.gmane.org> @ 2017-06-30 6:47 ` Christian König [not found] ` <c9b732c1-4bdd-a2ce-1dc2-6abbaf89ce5a-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org> 2017-07-07 14:47 ` Marek Olšák 0 siblings, 2 replies; 25+ messages in thread From: Christian König @ 2017-06-30 6:47 UTC (permalink / raw) To: Michel Dänzer, Daniel Vetter Cc: John Brooks, dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW Am 30.06.2017 um 04:24 schrieb Michel Dänzer: > On 29/06/17 07:05 PM, Daniel Vetter wrote: >> On Thu, Jun 29, 2017 at 06:58:05PM +0900, Michel Dänzer wrote: >>> On 29/06/17 05:23 PM, Christian König wrote: >>>> Am 29.06.2017 um 04:35 schrieb Michel Dänzer: >>>>> On 29/06/17 08:26 AM, John Brooks wrote: >>>>>> On Wed, Jun 28, 2017 at 03:05:32PM +0200, Christian König wrote: >>>>>>>> Instead of the flag being set in stone at BO creation, set the flag >>>>>>>> when a >>>>>>>> page fault occurs so that it goes somewhere CPU-visible, and clear >>>>>>>> it when >>>>>>>> the BO is requested by the GPU. >>>>>>>> >>>>>>>> However, clearing the CPU_ACCESS_REQUIRED flag may move BOs in GTT to >>>>>>>> invisible VRAM, where they may promptly generate another page >>>>>>>> fault. When >>>>>>>> BOs are constantly moved back and forth like this, it is highly >>>>>>>> detrimental >>>>>>>> to performance. Only clear the flag on CS if: >>>>>>>> >>>>>>>> - The BO wasn't page faulted for a certain amount of time >>>>>>>> (currently 10 >>>>>>>> seconds), and >>>>>>>> - its last page fault didn't occur too soon (currently 500ms) after >>>>>>>> its >>>>>>>> last CS request, or vice versa. 
>>>>>>>> >>>>>>>> Setting the flag in amdgpu_fault_reserve_notify() also means that >>>>>>>> we can >>>>>>>> remove the loop to restrict lpfn to the end of visible VRAM, because >>>>>>>> amdgpu_ttm_placement_init() will do it for us. >>>>>>> I'm fine with the general approach, but I'm still absolutely not >>>>>>> keen about >>>>>>> clearing the flag when userspace has originally specified it. >>>>> Is there any specific concern you have about that? >>>> Yeah, quite a bunch actually. We want to use this flag for P2P buffer >>>> sharing in the future as well and I don't intent to add another one like >>>> CPU_ACCESS_REALLY_REQUIRED or something like this. >>> Won't a BO need to be pinned while it's being shared with another device? >> That's an artifact of the current kernel implementation, I think we could >> do better (but for current use-cases where we share a bunch of scanouts >> and maybe a few pixmaps it's pointless). I wouldn't bet uapi on this never >> changing. > Surely there will need to be some kind of transaction though to let the > driver know when a BO starts/stops being shared with another device? > Either via the existing dma-buf callbacks, or something similar. We > can't rely on userspace setting a "CPU access" flag to make sure a BO > can be shared with other devices? Well, the flag was never intended to be used by userspace. See, the history was more like: we needed something in the kernel to place the BO in CPU-accessible VRAM. Then the closed source UMD came along and said hey, we have the concept of two different heaps for visible and invisible VRAM, how does that map to amdgpu? I unfortunately was too tired to push back hard enough on this.... Christian. _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply [flat|nested] 25+ messages in thread
[parent not found: <c9b732c1-4bdd-a2ce-1dc2-6abbaf89ce5a-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org>]
* Re: [PATCH 4/5] drm/amdgpu: Set/clear CPU_ACCESS_REQUIRED flag on page fault and CS [not found] ` <c9b732c1-4bdd-a2ce-1dc2-6abbaf89ce5a-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org> @ 2017-06-30 12:39 ` Daniel Vetter [not found] ` <20170630123904.afbsnmxkxxzuydr2-dv86pmgwkMBes7Z6vYuT8azUEOm+Xw19@public.gmane.org> 0 siblings, 1 reply; 25+ messages in thread From: Daniel Vetter @ 2017-06-30 12:39 UTC (permalink / raw) To: Christian König Cc: dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, John Brooks, Michel Dänzer, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Daniel Vetter On Fri, Jun 30, 2017 at 08:47:27AM +0200, Christian König wrote: > Am 30.06.2017 um 04:24 schrieb Michel Dänzer: > > On 29/06/17 07:05 PM, Daniel Vetter wrote: > > > On Thu, Jun 29, 2017 at 06:58:05PM +0900, Michel Dänzer wrote: > > > > On 29/06/17 05:23 PM, Christian König wrote: > > > > > Am 29.06.2017 um 04:35 schrieb Michel Dänzer: > > > > > > On 29/06/17 08:26 AM, John Brooks wrote: > > > > > > > On Wed, Jun 28, 2017 at 03:05:32PM +0200, Christian König wrote: > > > > > > > > > Instead of the flag being set in stone at BO creation, set the flag > > > > > > > > > when a > > > > > > > > > page fault occurs so that it goes somewhere CPU-visible, and clear > > > > > > > > > it when > > > > > > > > > the BO is requested by the GPU. > > > > > > > > > > > > > > > > > > However, clearing the CPU_ACCESS_REQUIRED flag may move BOs in GTT to > > > > > > > > > invisible VRAM, where they may promptly generate another page > > > > > > > > > fault. When > > > > > > > > > BOs are constantly moved back and forth like this, it is highly > > > > > > > > > detrimental > > > > > > > > > to performance. 
Only clear the flag on CS if: > > > > > > > > > > > > > > > > > > - The BO wasn't page faulted for a certain amount of time > > > > > > > > > (currently 10 > > > > > > > > > seconds), and > > > > > > > > > - its last page fault didn't occur too soon (currently 500ms) after > > > > > > > > > its > > > > > > > > > last CS request, or vice versa. > > > > > > > > > > > > > > > > > > Setting the flag in amdgpu_fault_reserve_notify() also means that > > > > > > > > > we can > > > > > > > > > remove the loop to restrict lpfn to the end of visible VRAM, because > > > > > > > > > amdgpu_ttm_placement_init() will do it for us. > > > > > > > > I'm fine with the general approach, but I'm still absolutely not > > > > > > > > keen about > > > > > > > > clearing the flag when userspace has originally specified it. > > > > > > Is there any specific concern you have about that? > > > > > Yeah, quite a bunch actually. We want to use this flag for P2P buffer > > > > > sharing in the future as well and I don't intent to add another one like > > > > > CPU_ACCESS_REALLY_REQUIRED or something like this. > > > > Won't a BO need to be pinned while it's being shared with another device? > > > That's an artifact of the current kernel implementation, I think we could > > > do better (but for current use-cases where we share a bunch of scanouts > > > and maybe a few pixmaps it's pointless). I wouldn't bet uapi on this never > > > changing. > > Surely there will need to be some kind of transaction though to let the > > driver know when a BO starts/stops being shared with another device? > > Either via the existing dma-buf callbacks, or something similar. We > > can't rely on userspace setting a "CPU access" flag to make sure a BO > > can be shared with other devices? 
Well, I just jumped into the middle of this, but it's not entirely out of the question as an idea; we'd need to rework the dma-buf stuff, though, probably with a callback to evict mappings/stall for outstanding rendering or something like that. > Well, the flag was never intended to be used by userspace. > > See the history was more like we need something in the kernel to place the > BO in CPU accessible VRAM. > > Then the closed source UMD came along and said hey we have the concept of > two different heaps for visible and invisible VRAM, how does that maps to > amdgpu? > > I unfortunately was to tired to push back hard enough on this.... Ehrm, are you saying you have uapi for the closed source stack only? I can help with the push back on this with a revert, no problem :-) -Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply [flat|nested] 25+ messages in thread
[parent not found: <20170630123904.afbsnmxkxxzuydr2-dv86pmgwkMBes7Z6vYuT8azUEOm+Xw19@public.gmane.org>]
* Re: [PATCH 4/5] drm/amdgpu: Set/clear CPU_ACCESS_REQUIRED flag on page fault and CS [not found] ` <20170630123904.afbsnmxkxxzuydr2-dv86pmgwkMBes7Z6vYuT8azUEOm+Xw19@public.gmane.org> @ 2017-06-30 12:46 ` Christian König 0 siblings, 0 replies; 25+ messages in thread From: Christian König @ 2017-06-30 12:46 UTC (permalink / raw) To: Daniel Vetter Cc: John Brooks, Michel Dänzer, dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW Am 30.06.2017 um 14:39 schrieb Daniel Vetter: > On Fri, Jun 30, 2017 at 08:47:27AM +0200, Christian König wrote: >> Am 30.06.2017 um 04:24 schrieb Michel Dänzer: >>> On 29/06/17 07:05 PM, Daniel Vetter wrote: >>>> On Thu, Jun 29, 2017 at 06:58:05PM +0900, Michel Dänzer wrote: >>>>> On 29/06/17 05:23 PM, Christian König wrote: >>>>>> Am 29.06.2017 um 04:35 schrieb Michel Dänzer: >>>>>>> On 29/06/17 08:26 AM, John Brooks wrote: >>>>>>>> On Wed, Jun 28, 2017 at 03:05:32PM +0200, Christian König wrote: >>>>>>>>>> Instead of the flag being set in stone at BO creation, set the flag >>>>>>>>>> when a >>>>>>>>>> page fault occurs so that it goes somewhere CPU-visible, and clear >>>>>>>>>> it when >>>>>>>>>> the BO is requested by the GPU. >>>>>>>>>> >>>>>>>>>> However, clearing the CPU_ACCESS_REQUIRED flag may move BOs in GTT to >>>>>>>>>> invisible VRAM, where they may promptly generate another page >>>>>>>>>> fault. When >>>>>>>>>> BOs are constantly moved back and forth like this, it is highly >>>>>>>>>> detrimental >>>>>>>>>> to performance. Only clear the flag on CS if: >>>>>>>>>> >>>>>>>>>> - The BO wasn't page faulted for a certain amount of time >>>>>>>>>> (currently 10 >>>>>>>>>> seconds), and >>>>>>>>>> - its last page fault didn't occur too soon (currently 500ms) after >>>>>>>>>> its >>>>>>>>>> last CS request, or vice versa. 
>>>>>>>>>> >>>>>>>>>> Setting the flag in amdgpu_fault_reserve_notify() also means that >>>>>>>>>> we can >>>>>>>>>> remove the loop to restrict lpfn to the end of visible VRAM, because >>>>>>>>>> amdgpu_ttm_placement_init() will do it for us. >>>>>>>>> I'm fine with the general approach, but I'm still absolutely not >>>>>>>>> keen about >>>>>>>>> clearing the flag when userspace has originally specified it. >>>>>>> Is there any specific concern you have about that? >>>>>> Yeah, quite a bunch actually. We want to use this flag for P2P buffer >>>>>> sharing in the future as well and I don't intent to add another one like >>>>>> CPU_ACCESS_REALLY_REQUIRED or something like this. >>>>> Won't a BO need to be pinned while it's being shared with another device? >>>> That's an artifact of the current kernel implementation, I think we could >>>> do better (but for current use-cases where we share a bunch of scanouts >>>> and maybe a few pixmaps it's pointless). I wouldn't bet uapi on this never >>>> changing. >>> Surely there will need to be some kind of transaction though to let the >>> driver know when a BO starts/stops being shared with another device? >>> Either via the existing dma-buf callbacks, or something similar. We >>> can't rely on userspace setting a "CPU access" flag to make sure a BO >>> can be shared with other devices? > Well I just jumped into the middle of this that it's not entirely out of > the question as an idea, but yeah we'd need to rework the dma-buf stuff > with probably a callback to evict mappings/stall for outstanding rendering > or something like that. > >> Well, the flag was never intended to be used by userspace. >> >> See the history was more like we need something in the kernel to place the >> BO in CPU accessible VRAM. >> >> Then the closed source UMD came along and said hey we have the concept of >> two different heaps for visible and invisible VRAM, how does that maps to >> amdgpu? 
>> >> I unfortunately was to tired to push back hard enough on this.... > Ehrm, are you saying you have uapi for the closed source stack only? No, Mesa is using that flag as well. What I'm saying is that we have a flag which became uapi because I was too lazy to distinguish between uapi and kernel-internal flags. > I can help with the push back on this with a revert, no problem :-) That would break Mesa and is not an option (unfortunately :). Christian. _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 4/5] drm/amdgpu: Set/clear CPU_ACCESS_REQUIRED flag on page fault and CS 2017-06-30 6:47 ` Christian König [not found] ` <c9b732c1-4bdd-a2ce-1dc2-6abbaf89ce5a-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org> @ 2017-07-07 14:47 ` Marek Olšák 1 sibling, 0 replies; 25+ messages in thread From: Marek Olšák @ 2017-07-07 14:47 UTC (permalink / raw) To: Christian König Cc: John Brooks, Michel Dänzer, amd-gfx mailing list, dri-devel On Fri, Jun 30, 2017 at 8:47 AM, Christian König <deathsimple@vodafone.de> wrote: > Am 30.06.2017 um 04:24 schrieb Michel Dänzer: >> >> On 29/06/17 07:05 PM, Daniel Vetter wrote: >>> >>> On Thu, Jun 29, 2017 at 06:58:05PM +0900, Michel Dänzer wrote: >>>> >>>> On 29/06/17 05:23 PM, Christian König wrote: >>>>> >>>>> Am 29.06.2017 um 04:35 schrieb Michel Dänzer: >>>>>> >>>>>> On 29/06/17 08:26 AM, John Brooks wrote: >>>>>>> >>>>>>> On Wed, Jun 28, 2017 at 03:05:32PM +0200, Christian König wrote: >>>>>>>>> >>>>>>>>> Instead of the flag being set in stone at BO creation, set the flag >>>>>>>>> when a >>>>>>>>> page fault occurs so that it goes somewhere CPU-visible, and clear >>>>>>>>> it when >>>>>>>>> the BO is requested by the GPU. >>>>>>>>> >>>>>>>>> However, clearing the CPU_ACCESS_REQUIRED flag may move BOs in GTT >>>>>>>>> to >>>>>>>>> invisible VRAM, where they may promptly generate another page >>>>>>>>> fault. When >>>>>>>>> BOs are constantly moved back and forth like this, it is highly >>>>>>>>> detrimental >>>>>>>>> to performance. Only clear the flag on CS if: >>>>>>>>> >>>>>>>>> - The BO wasn't page faulted for a certain amount of time >>>>>>>>> (currently 10 >>>>>>>>> seconds), and >>>>>>>>> - its last page fault didn't occur too soon (currently 500ms) after >>>>>>>>> its >>>>>>>>> last CS request, or vice versa. 
>>>>>>>>> >>>>>>>>> Setting the flag in amdgpu_fault_reserve_notify() also means that >>>>>>>>> we can >>>>>>>>> remove the loop to restrict lpfn to the end of visible VRAM, >>>>>>>>> because >>>>>>>>> amdgpu_ttm_placement_init() will do it for us. >>>>>>>> >>>>>>>> I'm fine with the general approach, but I'm still absolutely not >>>>>>>> keen about >>>>>>>> clearing the flag when userspace has originally specified it. >>>>>> >>>>>> Is there any specific concern you have about that? >>>>> >>>>> Yeah, quite a bunch actually. We want to use this flag for P2P buffer >>>>> sharing in the future as well and I don't intent to add another one >>>>> like >>>>> CPU_ACCESS_REALLY_REQUIRED or something like this. >>>> >>>> Won't a BO need to be pinned while it's being shared with another >>>> device? >>> >>> That's an artifact of the current kernel implementation, I think we could >>> do better (but for current use-cases where we share a bunch of scanouts >>> and maybe a few pixmaps it's pointless). I wouldn't bet uapi on this >>> never >>> changing. >> >> Surely there will need to be some kind of transaction though to let the >> driver know when a BO starts/stops being shared with another device? >> Either via the existing dma-buf callbacks, or something similar. We >> can't rely on userspace setting a "CPU access" flag to make sure a BO >> can be shared with other devices? > > > Well, the flag was never intended to be used by userspace. > > See the history was more like we need something in the kernel to place the > BO in CPU accessible VRAM. > > Then the closed source UMD came along and said hey we have the concept of > two different heaps for visible and invisible VRAM, how does that maps to > amdgpu? Mesa stopped using CPU_ACCESS_REQUIRED a couple of days ago. 
Marek _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 4/5] drm/amdgpu: Set/clear CPU_ACCESS_REQUIRED flag on page fault and CS 2017-06-29 2:35 ` Michel Dänzer 2017-06-29 8:23 ` Christian König @ 2017-06-30 1:56 ` John Brooks [not found] ` <20170630015638.GA735-6hIufAJW0g7Gr8qjsLp7YGXnswh1EIUO@public.gmane.org> 1 sibling, 1 reply; 25+ messages in thread From: John Brooks @ 2017-06-30 1:56 UTC (permalink / raw) To: Michel Dänzer; +Cc: amd-gfx, dri-devel On Thu, Jun 29, 2017 at 11:35:53AM +0900, Michel Dänzer wrote: > On 29/06/17 08:26 AM, John Brooks wrote: > > On Wed, Jun 28, 2017 at 03:05:32PM +0200, Christian König wrote: > >> Am 28.06.2017 um 04:33 schrieb John Brooks: > >>> When the AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED flag is given by userspace, > >>> it should only be treated as a hint to initially place a BO somewhere CPU > >>> accessible, rather than having a permanent effect on BO placement. > >>> > >>> Instead of the flag being set in stone at BO creation, set the flag when a > >>> page fault occurs so that it goes somewhere CPU-visible, and clear it when > >>> the BO is requested by the GPU. > >>> > >>> However, clearing the CPU_ACCESS_REQUIRED flag may move BOs in GTT to > >>> invisible VRAM, where they may promptly generate another page fault. When > >>> BOs are constantly moved back and forth like this, it is highly detrimental > >>> to performance. Only clear the flag on CS if: > >>> > >>> - The BO wasn't page faulted for a certain amount of time (currently 10 > >>> seconds), and > >>> - its last page fault didn't occur too soon (currently 500ms) after its > >>> last CS request, or vice versa. > >>> > >>> Setting the flag in amdgpu_fault_reserve_notify() also means that we can > >>> remove the loop to restrict lpfn to the end of visible VRAM, because > >>> amdgpu_ttm_placement_init() will do it for us. > >> > >> I'm fine with the general approach, but I'm still absolutely not keen about > >> clearing the flag when userspace has originally specified it. 
> > Is there any specific concern you have about that? > > > >> Please add a new flag something like AMDGPU_GEM_CREATE_CPU_ACCESS_HINT or > >> something like this. > > > > Is it the fact that we clear a flag that userspace expects not to have changed > > if it queries it later? I think that's the only effect of this that's directly > > visible to userspace code. > > I don't see any way for userspace to query that. I was looking at amdgpu_gem_metadata_ioctl(). It looked like it was possible to query a BO's flags through that method. I don't know if it actually matters; it was just a stab in the dark for any possibly tangible effect on userspace that might arise from the kernel changing the flag. John > > > > As for a new "hint" flag, I assume this new flag would be an alternative to the > > existing CPU_ACCESS_REQUIRED flag, and we'd change Mesa et al to use it in > > situations where the kernel *should* place a BO somewhere CPU-accessible, but > > is permitted to move it elsewhere. Is that correct? > > That seems silly. The userspace flag could never be more than a hint. > Unfortunately we named it to suggest differently, but we have to live > with that. > > If we do need a new hint flag internally in the driver, we should simply > translate AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED to the new flag in > amdgpu_gem_create_ioctl, and not expose the new flag to userspace. > > > But other than the question in my followup to the cover letter, this > patch looks good to me as is. > > > -- > Earthling Michel Dänzer | http://www.amd.com > Libre software enthusiast | Mesa and X developer _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply [flat|nested] 25+ messages in thread
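[Editor's note: Michel's suggestion above, translating the uapi flag into a kernel-internal one at the ioctl boundary, can be sketched as below. AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED is the real uapi flag; the internal flag name, its bit value, and the function are invented for illustration, since no such internal flag existed at the time of this thread.]

```c
#include <assert.h>
#include <stdint.h>

/* Real uapi bit from amdgpu_drm.h; value reproduced here for a
 * self-contained example. */
#define AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED	(1u << 0)
/* Hypothetical kernel-internal hint bit, never exposed to userspace. */
#define AMDGPU_BO_CPU_ACCESS_HINT_INTERNAL	(1u << 30)

/* Sketch of what amdgpu_gem_create_ioctl() could do: strip the uapi bit
 * and carry the intent on an internal bit, which the driver is then free
 * to set and clear as placement heuristics demand. */
static uint32_t translate_create_flags(uint32_t uapi_flags)
{
	uint32_t kflags = uapi_flags;

	if (kflags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED) {
		kflags &= ~AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED;
		kflags |= AMDGPU_BO_CPU_ACCESS_HINT_INTERNAL;
	}
	return kflags;
}
```

With this split, clearing the internal hint never changes anything userspace could observe, sidestepping the objection about mutating a flag userspace specified.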
[parent not found: <20170630015638.GA735-6hIufAJW0g7Gr8qjsLp7YGXnswh1EIUO@public.gmane.org>]
* Re: [PATCH 4/5] drm/amdgpu: Set/clear CPU_ACCESS_REQUIRED flag on page fault and CS [not found] ` <20170630015638.GA735-6hIufAJW0g7Gr8qjsLp7YGXnswh1EIUO@public.gmane.org> @ 2017-06-30 2:16 ` John Brooks 0 siblings, 0 replies; 25+ messages in thread From: John Brooks @ 2017-06-30 2:16 UTC (permalink / raw) To: Michel Dänzer Cc: David Airlie, Christian König, dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Marek Olšák, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW On 2017-06-29 09:56 PM, John Brooks wrote: > On Thu, Jun 29, 2017 at 11:35:53AM +0900, Michel Dänzer wrote: >> On 29/06/17 08:26 AM, John Brooks wrote: >>> On Wed, Jun 28, 2017 at 03:05:32PM +0200, Christian König wrote: >>>> Am 28.06.2017 um 04:33 schrieb John Brooks: >>>>> When the AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED flag is given by userspace, >>>>> it should only be treated as a hint to initially place a BO somewhere CPU >>>>> accessible, rather than having a permanent effect on BO placement. >>>>> >>>>> Instead of the flag being set in stone at BO creation, set the flag when a >>>>> page fault occurs so that it goes somewhere CPU-visible, and clear it when >>>>> the BO is requested by the GPU. >>>>> >>>>> However, clearing the CPU_ACCESS_REQUIRED flag may move BOs in GTT to >>>>> invisible VRAM, where they may promptly generate another page fault. When >>>>> BOs are constantly moved back and forth like this, it is highly detrimental >>>>> to performance. Only clear the flag on CS if: >>>>> >>>>> - The BO wasn't page faulted for a certain amount of time (currently 10 >>>>> seconds), and >>>>> - its last page fault didn't occur too soon (currently 500ms) after its >>>>> last CS request, or vice versa. >>>>> >>>>> Setting the flag in amdgpu_fault_reserve_notify() also means that we can >>>>> remove the loop to restrict lpfn to the end of visible VRAM, because >>>>> amdgpu_ttm_placement_init() will do it for us. 
>>>> I'm fine with the general approach, but I'm still absolutely not keen about >>>> clearing the flag when userspace has originally specified it. >> Is there any specific concern you have about that? >> >> >>>> Please add a new flag something like AMDGPU_GEM_CREATE_CPU_ACCESS_HINT or >>>> something like this. >>> Is it the fact that we clear a flag that userspace expects not to have changed >>> if it queries it later? I think that's the only effect of this that's directly >>> visible to userspace code. >> I don't see any way for userspace to query that. > I was looking at amdgpu_gem_metadata_ioctl(). It looked like it was possible to > query a BO's flags through that method. I don't know if it actually matters; it > was just a stab in the dark for any possibly tangible effect on userspace that > might arise from the kernel changing the flag. > > John And it looks like I am not correct about this as I misread it. It exposes bo->metadata_flags, not bo->flags. John >> >>> As for a new "hint" flag, I assume this new flag would be an alternative to the >>> existing CPU_ACCESS_REQUIRED flag, and we'd change Mesa et al to use it in >>> situations where the kernel *should* place a BO somewhere CPU-accessible, but >>> is permitted to move it elsewhere. Is that correct? >> That seems silly. The userspace flag could never be more than a hint. >> Unfortunately we named it to suggest differently, but we have to live >> with that. >> >> If we do need a new hint flag internally in the driver, we should simply >> translate AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED to the new flag in >> amdgpu_gem_create_ioctl, and not expose the new flag to userspace. >> >> >> But other than the question in my followup to the cover letter, this >> patch looks good to me as is. 
>> >> >> -- >> Earthling Michel Dänzer | http://www.amd.com >> Libre software enthusiast | Mesa and X developer > _______________________________________________ > amd-gfx mailing list > amd-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/amd-gfx _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH 4/5] drm/amdgpu: Set/clear CPU_ACCESS_REQUIRED flag on page fault and CS [not found] ` <1498617201-24557-5-git-send-email-john-xq/Ko7C6e2Bl57MIdRCFDg@public.gmane.org> 2017-06-28 13:05 ` Christian König @ 2017-06-29 3:18 ` Michel Dänzer 1 sibling, 0 replies; 25+ messages in thread From: Michel Dänzer @ 2017-06-29 3:18 UTC (permalink / raw) To: John Brooks, Marek Olšák, Christian König Cc: David Airlie, dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW On 28/06/17 11:33 AM, John Brooks wrote: > When the AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED flag is given by userspace, > it should only be treated as a hint to initially place a BO somewhere CPU > accessible, rather than having a permanent effect on BO placement. > > Instead of the flag being set in stone at BO creation, set the flag when a > page fault occurs so that it goes somewhere CPU-visible, and clear it when > the BO is requested by the GPU. > > However, clearing the CPU_ACCESS_REQUIRED flag may move BOs in GTT to > invisible VRAM, where they may promptly generate another page fault. When > BOs are constantly moved back and forth like this, it is highly detrimental > to performance. Only clear the flag on CS if: > > - The BO wasn't page faulted for a certain amount of time (currently 10 > seconds), and > - its last page fault didn't occur too soon (currently 500ms) after its > last CS request, or vice versa. > > Setting the flag in amdgpu_fault_reserve_notify() also means that we can > remove the loop to restrict lpfn to the end of visible VRAM, because > amdgpu_ttm_placement_init() will do it for us. > > Signed-off-by: John Brooks <john@fastquake.com> [...] > @@ -967,15 +967,9 @@ int amdgpu_bo_fault_reserve_notify(struct ttm_buffer_object *bo) > > /* hurrah the memory is not visible ! 
*/ > atomic64_inc(&adev->num_vram_cpu_page_faults); > + abo->flags |= AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED; Talking about this on IRC reminded me that I think we should set the flag in this function regardless of where the BO is currently located. -- Earthling Michel Dänzer | http://www.amd.com Libre software enthusiast | Mesa and X developer _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH 5/5] drm/amdgpu: Don't force BOs into visible VRAM for page faults [not found] ` <1498617201-24557-1-git-send-email-john-xq/Ko7C6e2Bl57MIdRCFDg@public.gmane.org> 2017-06-28 2:33 ` [PATCH 4/5] drm/amdgpu: Set/clear CPU_ACCESS_REQUIRED flag on page fault and CS John Brooks @ 2017-06-28 2:33 ` John Brooks 2017-06-29 2:30 ` Michel Dänzer 1 sibling, 1 reply; 25+ messages in thread From: John Brooks @ 2017-06-28 2:33 UTC (permalink / raw) To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Michel Dänzer, Marek Olšák, Christian König Cc: David Airlie, dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW There is no need for page faults to force BOs into visible VRAM if it's full, and the time it takes to do so is great enough to cause noticeable stuttering. Add GTT as a possible placement so that if visible VRAM is full, page faults move BOs to GTT instead of evicting other BOs from VRAM. Signed-off-by: John Brooks <john@fastquake.com> --- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 16 +++++++++------- 1 file changed, 9 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index 658d7b1..a215d8c 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -968,19 +968,21 @@ int amdgpu_bo_fault_reserve_notify(struct ttm_buffer_object *bo) /* hurrah the memory is not visible ! 
*/ atomic64_inc(&adev->num_vram_cpu_page_faults); abo->flags |= AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED; - amdgpu_ttm_placement_from_domain(abo, AMDGPU_GEM_DOMAIN_VRAM); + amdgpu_ttm_placement_from_domain(abo, AMDGPU_GEM_DOMAIN_VRAM | + AMDGPU_GEM_DOMAIN_GTT); + + /* Avoid costly evictions; only set GTT as a busy placement */ + abo->placement.num_busy_placement = 1; + abo->placement.busy_placement = &abo->placements[1]; r = ttm_bo_validate(bo, &abo->placement, false, false); - if (unlikely(r == -ENOMEM)) { - amdgpu_ttm_placement_from_domain(abo, AMDGPU_GEM_DOMAIN_GTT); - return ttm_bo_validate(bo, &abo->placement, false, false); - } else if (unlikely(r != 0)) { + if (unlikely(r != 0)) return r; - } offset = bo->mem.start << PAGE_SHIFT; /* this should never happen */ - if ((offset + size) > adev->mc.visible_vram_size) + if (bo->mem.mem_type == TTM_PL_VRAM && + (offset + size) > adev->mc.visible_vram_size) return -EINVAL; return 0; -- 2.7.4 _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply related [flat|nested] 25+ messages in thread
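[Editor's note: the patch above relies on TTM's distinction between preferred placements (tried without evicting anything) and "busy" placements (the fallback when eviction would be required). The toy model below illustrates that semantics only; it is not the real TTM API, and the types and function are invented.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum domain { DOMAIN_VRAM, DOMAIN_GTT };

/* Simplified stand-in for struct ttm_placement: preferred placements are
 * tried first; busy placements are used when all preferred ones are full. */
struct toy_placement {
	enum domain placements[2];
	size_t num_placement;
	enum domain busy_placements[2];
	size_t num_busy_placement;
};

/* has_free[d] says whether domain d has room without evicting anything. */
static enum domain toy_validate(const struct toy_placement *p,
				const bool has_free[2])
{
	for (size_t i = 0; i < p->num_placement; i++)
		if (has_free[p->placements[i]])
			return p->placements[i];
	/* Every preferred placement is full: fall back to the busy list,
	 * where eviction is allowed. Listing only GTT here is how the
	 * patch keeps page faults from ever evicting VRAM residents. */
	return p->busy_placements[0];
}
```

So with placements {VRAM, GTT} and the single busy placement GTT, a faulting BO lands in visible VRAM if there is room, otherwise in GTT, and in the worst case evicts from GTT rather than kicking other BOs out of VRAM.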
* Re: [PATCH 5/5] drm/amdgpu: Don't force BOs into visible VRAM for page faults 2017-06-28 2:33 ` [PATCH 5/5] drm/amdgpu: Don't force BOs into visible VRAM for page faults John Brooks @ 2017-06-29 2:30 ` Michel Dänzer 0 siblings, 0 replies; 25+ messages in thread From: Michel Dänzer @ 2017-06-29 2:30 UTC (permalink / raw) To: John Brooks, Marek Olšák, Christian König Cc: dri-devel, amd-gfx On 28/06/17 11:33 AM, John Brooks wrote: > There is no need for page faults to force BOs into visible VRAM if it's > full, and the time it takes to do so is great enough to cause noticeable > stuttering. Add GTT as a possible placement so that if visible VRAM is > full, page faults move BOs to GTT instead of evicting other BOs from VRAM. > > Signed-off-by: John Brooks <john@fastquake.com> > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 16 +++++++++------- > 1 file changed, 9 insertions(+), 7 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c > index 658d7b1..a215d8c 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c > @@ -968,19 +968,21 @@ int amdgpu_bo_fault_reserve_notify(struct ttm_buffer_object *bo) > /* hurrah the memory is not visible ! 
*/ > atomic64_inc(&adev->num_vram_cpu_page_faults); > abo->flags |= AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED; > - amdgpu_ttm_placement_from_domain(abo, AMDGPU_GEM_DOMAIN_VRAM); > + amdgpu_ttm_placement_from_domain(abo, AMDGPU_GEM_DOMAIN_VRAM | > + AMDGPU_GEM_DOMAIN_GTT); > + > + /* Avoid costly evictions; only set GTT as a busy placement */ > + abo->placement.num_busy_placement = 1; > + abo->placement.busy_placement = &abo->placements[1]; > > r = ttm_bo_validate(bo, &abo->placement, false, false); > - if (unlikely(r == -ENOMEM)) { > - amdgpu_ttm_placement_from_domain(abo, AMDGPU_GEM_DOMAIN_GTT); > - return ttm_bo_validate(bo, &abo->placement, false, false); > - } else if (unlikely(r != 0)) { > + if (unlikely(r != 0)) > return r; > - } > > offset = bo->mem.start << PAGE_SHIFT; > /* this should never happen */ > - if ((offset + size) > adev->mc.visible_vram_size) > + if (bo->mem.mem_type == TTM_PL_VRAM && > + (offset + size) > adev->mc.visible_vram_size) > return -EINVAL; > > return 0; > Reviewed-by: Michel Dänzer <michel.daenzer@amd.com> -- Earthling Michel Dänzer | http://www.amd.com Libre software enthusiast | Mesa and X developer _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2] Visible VRAM Management Improvements
  2017-06-28  2:33 [PATCH v2] Visible VRAM Management Improvements John Brooks
                   ` (3 preceding siblings ...)
  [not found] ` <1498617201-24557-1-git-send-email-john-xq/Ko7C6e2Bl57MIdRCFDg@public.gmane.org>
@ 2017-06-29  2:33 ` Michel Dänzer
  4 siblings, 0 replies; 25+ messages in thread
From: Michel Dänzer @ 2017-06-29  2:33 UTC (permalink / raw)
  To: John Brooks, Marek Olšák, Christian König
  Cc: dri-devel, amd-gfx

On 28/06/17 11:33 AM, John Brooks wrote:
> Changes in this version:
> - Dropped former patch 1 ("Separate placements and busy placements") as it was
>   no longer necessary
> - Dropped former patch 4 ("Don't force BOs into visible VRAM if they can go to
>   GTT instead") as it was unnecessary and had too many side effects (Thanks
>   Christian)
> - Dropped former patches 8 and 9 ("Asynchronously move BOs to visible VRAM" and
>   an associated optimization), as there may be better ways to address the root
>   of the problem
> - Added a u64 cast to patch 1 (Thanks Christian)
> - Optimized patch 2 in the case where visible VRAM is not smaller than total
>   VRAM (Thanks Michel)
>
> The series as it is now fixes Dying Light and improves DiRT Rally frame times.
> Without the asynchronous moves, there is still occasional stalling in DiRT
> Rally.
>
>
> The "Set/clear CPU_ACCESS_REQUIRED flag on page fault and CS" patch received
> objections, but I have not removed it yet as there seems to be some support for
> the idea. The precise details of when to set and clear the flag may need further
> discussion.

Have you tried my idea of only clearing the flag when a BO is moved to
VRAM? If that works well enough, patch 3 should not be necessary either.
--
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer

^ permalink raw reply	[flat|nested] 25+ messages in thread
end of thread, other threads:[~2017-07-07 14:47 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-28  2:33 [PATCH v2] Visible VRAM Management Improvements John Brooks
2017-06-28  2:33 ` [PATCH 1/5] drm/amdgpu: Add vis_vramlimit module parameter John Brooks
2017-06-28  2:33 ` [PATCH 2/5] drm/amdgpu: Throttle visible VRAM moves separately John Brooks
2017-06-28  2:33 ` [PATCH 3/5] drm/amdgpu: Track time of last page fault and last CS move in struct amdgpu_bo John Brooks
2017-06-28 13:06   ` Christian König
     [not found]     ` <e3d7be62-e63b-f370-f159-147aaf8d7c50-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org>
2017-06-28 22:59       ` John Brooks
2017-06-29  8:11         ` Christian König
     [not found] ` <1498617201-24557-1-git-send-email-john-xq/Ko7C6e2Bl57MIdRCFDg@public.gmane.org>
2017-06-28  2:33   ` [PATCH 4/5] drm/amdgpu: Set/clear CPU_ACCESS_REQUIRED flag on page fault and CS John Brooks
     [not found]     ` <1498617201-24557-5-git-send-email-john-xq/Ko7C6e2Bl57MIdRCFDg@public.gmane.org>
2017-06-28 13:05       ` Christian König
2017-06-28 23:26         ` John Brooks
     [not found]           ` <20170628232639.GB11762-6hIufAJW0g7Gr8qjsLp7YGXnswh1EIUO@public.gmane.org>
2017-06-29  2:35             ` Michel Dänzer
2017-06-29  8:23               ` Christian König
2017-06-29  9:58                 ` Michel Dänzer
2017-06-29 10:05                   ` Daniel Vetter
     [not found]                     ` <20170629100523.khsozocltct7tnfu-dv86pmgwkMBes7Z6vYuT8azUEOm+Xw19@public.gmane.org>
2017-06-30  2:24                       ` Michel Dänzer
     [not found]                         ` <e1568d15-42da-4720-dff8-3a6e373f51d8-otUistvHUpPR7s880joybQ@public.gmane.org>
2017-06-30  6:47                           ` Christian König
     [not found]                             ` <c9b732c1-4bdd-a2ce-1dc2-6abbaf89ce5a-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org>
2017-06-30 12:39                               ` Daniel Vetter
     [not found]                                 ` <20170630123904.afbsnmxkxxzuydr2-dv86pmgwkMBes7Z6vYuT8azUEOm+Xw19@public.gmane.org>
2017-06-30 12:46                                   ` Christian König
2017-07-07 14:47                                     ` Marek Olšák
2017-06-30  1:56                   ` John Brooks
     [not found]                     ` <20170630015638.GA735-6hIufAJW0g7Gr8qjsLp7YGXnswh1EIUO@public.gmane.org>
2017-06-30  2:16                       ` John Brooks
2017-06-29  3:18         ` Michel Dänzer
2017-06-28  2:33   ` [PATCH 5/5] drm/amdgpu: Don't force BOs into visible VRAM for page faults John Brooks
2017-06-29  2:30     ` Michel Dänzer
2017-06-29  2:33 ` [PATCH v2] Visible VRAM Management Improvements Michel Dänzer