All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] drm/amdgpu: Tweaks for high pressure on CPU visible VRAM
@ 2017-05-18  9:08 Michel Dänzer
       [not found] ` <20170518090809.15916-1-michel-otUistvHUpPR7s880joybQ@public.gmane.org>
  0 siblings, 1 reply; 39+ messages in thread
From: Michel Dänzer @ 2017-05-18  9:08 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

From: Michel Dänzer <michel.daenzer@amd.com>

This series was developed and tested under the following scenario:

Running the PTS dirt-rally benchmark (1920x1080, Ultra) on Tonga with
2G, with CPU visible VRAM artificially restricted to 64 MB.

Without this series, there's a lot of stutter during about the first
minute of a benchmark run. During this time there are significant amounts
of buffer moves (starting from about 500 MB on the HUD) and evictions,
gradually declining until the buffer moves settle around 8 MB on the HUD.

With this series, there's only slight stutter during the first seconds
after the car launches, even though the buffer move volume is about the
same as without the series. Buffer evictions are eliminated almost
completely, except for a few at the beginning. Buffer moves still settle
around 8 MB on the HUD, but with less variance than before.

Note that there's only little if any improvement of the average framerate
reported, but the minimum framerate as seen on the HUD goes from ~10 fps
to ~17.


Patch 1 is a cleanup that I noticed along the way.

Patch 2 makes the main difference for the above scenario.

Patch 3 doesn't make as much difference, I'm fine with it not landing at
least for now.

Michel Dänzer (3):
  drm/amdgpu: Drop useless loops for placement restrictions
  drm/amdgpu: Don't evict other BOs from VRAM for page faults
  drm/amdgpu: Try evicting from CPU visible to invisible VRAM first

 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 42 ++++++++++++---------------
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c    | 46 ++++++++++++++++++++----------
 drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c    |  6 +---
 3 files changed, 51 insertions(+), 43 deletions(-)

-- 
2.11.0

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH 1/3] drm/amdgpu: Drop useless loops for placement restrictions
       [not found] ` <20170518090809.15916-1-michel-otUistvHUpPR7s880joybQ@public.gmane.org>
@ 2017-05-18  9:08   ` Michel Dänzer
       [not found]     ` <20170518090809.15916-2-michel-otUistvHUpPR7s880joybQ@public.gmane.org>
  2017-05-18  9:08   ` [PATCH 2/3] drm/amdgpu: Don't evict other BOs from VRAM for page faults Michel Dänzer
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 39+ messages in thread
From: Michel Dänzer @ 2017-05-18  9:08 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

From: Michel Dänzer <michel.daenzer@amd.com>

We know how the placements were initialized in these cases, so we can
set the restrictions directly without a loop.

Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 13 +++----------
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c    | 19 ++++---------------
 drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c    |  6 +-----
 3 files changed, 8 insertions(+), 30 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 365883d7948d..b3252bc8fe51 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -939,8 +939,8 @@ int amdgpu_bo_fault_reserve_notify(struct ttm_buffer_object *bo)
 {
 	struct amdgpu_device *adev = amdgpu_ttm_adev(bo->bdev);
 	struct amdgpu_bo *abo;
-	unsigned long offset, size, lpfn;
-	int i, r;
+	unsigned long offset, size;
+	int r;
 
 	if (!amdgpu_ttm_bo_is_amdgpu_bo(bo))
 		return 0;
@@ -961,14 +961,7 @@ int amdgpu_bo_fault_reserve_notify(struct ttm_buffer_object *bo)
 
 	/* hurrah the memory is not visible ! */
 	amdgpu_ttm_placement_from_domain(abo, AMDGPU_GEM_DOMAIN_VRAM);
-	lpfn =	adev->mc.visible_vram_size >> PAGE_SHIFT;
-	for (i = 0; i < abo->placement.num_placement; i++) {
-		/* Force into visible VRAM */
-		if ((abo->placements[i].flags & TTM_PL_FLAG_VRAM) &&
-		    (!abo->placements[i].lpfn ||
-		     abo->placements[i].lpfn > lpfn))
-			abo->placements[i].lpfn = lpfn;
-	}
+	abo->placements[0].lpfn = adev->mc.visible_vram_size >> PAGE_SHIFT;
 	r = ttm_bo_validate(bo, &abo->placement, false, false);
 	if (unlikely(r == -ENOMEM)) {
 		amdgpu_ttm_placement_from_domain(abo, AMDGPU_GEM_DOMAIN_GTT);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 5db0230e45c6..57789b860768 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -191,7 +191,6 @@ static void amdgpu_evict_flags(struct ttm_buffer_object *bo,
 		.lpfn = 0,
 		.flags = TTM_PL_MASK_CACHING | TTM_PL_FLAG_SYSTEM
 	};
-	unsigned i;
 
 	if (!amdgpu_ttm_bo_is_amdgpu_bo(bo)) {
 		placement->placement = &placements;
@@ -209,20 +208,10 @@ static void amdgpu_evict_flags(struct ttm_buffer_object *bo,
 			amdgpu_ttm_placement_from_domain(abo, AMDGPU_GEM_DOMAIN_CPU);
 		} else {
 			amdgpu_ttm_placement_from_domain(abo, AMDGPU_GEM_DOMAIN_GTT);
-			for (i = 0; i < abo->placement.num_placement; ++i) {
-				if (!(abo->placements[i].flags &
-				      TTM_PL_FLAG_TT))
-					continue;
-
-				if (abo->placements[i].lpfn)
-					continue;
-
-				/* set an upper limit to force directly
-				 * allocating address space for the BO.
-				 */
-				abo->placements[i].lpfn =
-					adev->mc.gtt_size >> PAGE_SHIFT;
-			}
+			/* Set an upper limit to force directly allocating
+			 * address space for the BO.
+			 */
+			abo->placements[0].lpfn = adev->mc.gtt_size >> PAGE_SHIFT;
 		}
 		break;
 	case TTM_PL_TT:
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
index 2ca09f111f08..60688fa5ef98 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
@@ -375,11 +375,7 @@ void amdgpu_uvd_free_handles(struct amdgpu_device *adev, struct drm_file *filp)
 
 static void amdgpu_uvd_force_into_uvd_segment(struct amdgpu_bo *abo)
 {
-	int i;
-	for (i = 0; i < abo->placement.num_placement; ++i) {
-		abo->placements[i].fpfn = 0 >> PAGE_SHIFT;
-		abo->placements[i].lpfn = (256 * 1024 * 1024) >> PAGE_SHIFT;
-	}
+	abo->placements[0].lpfn = (256 * 1024 * 1024) >> PAGE_SHIFT;
 }
 
 static u64 amdgpu_uvd_get_addr_from_ctx(struct amdgpu_uvd_cs_ctx *ctx)
-- 
2.11.0

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 2/3] drm/amdgpu: Don't evict other BOs from VRAM for page faults
       [not found] ` <20170518090809.15916-1-michel-otUistvHUpPR7s880joybQ@public.gmane.org>
  2017-05-18  9:08   ` [PATCH 1/3] drm/amdgpu: Drop useless loops for placement restrictions Michel Dänzer
@ 2017-05-18  9:08   ` Michel Dänzer
  2017-05-18  9:08   ` [PATCH 3/3] drm/amdgpu: Try evicting from CPU visible to invisible VRAM first Michel Dänzer
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 39+ messages in thread
From: Michel Dänzer @ 2017-05-18  9:08 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

From: Michel Dänzer <michel.daenzer@amd.com>

If there is no free space in CPU visible VRAM for the faulting BO, move
it to GTT instead of evicting other BOs from CPU visible VRAM.

Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 22 ++++++++--------------
 1 file changed, 8 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index b3252bc8fe51..41ee353b22c8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -940,7 +940,6 @@ int amdgpu_bo_fault_reserve_notify(struct ttm_buffer_object *bo)
 	struct amdgpu_device *adev = amdgpu_ttm_adev(bo->bdev);
 	struct amdgpu_bo *abo;
 	unsigned long offset, size;
-	int r;
 
 	if (!amdgpu_ttm_bo_is_amdgpu_bo(bo))
 		return 0;
@@ -960,22 +959,17 @@ int amdgpu_bo_fault_reserve_notify(struct ttm_buffer_object *bo)
 		return -EINVAL;
 
 	/* hurrah the memory is not visible ! */
-	amdgpu_ttm_placement_from_domain(abo, AMDGPU_GEM_DOMAIN_VRAM);
+	amdgpu_ttm_placement_from_domain(abo, AMDGPU_GEM_DOMAIN_VRAM |
+					 AMDGPU_GEM_DOMAIN_GTT);
 	abo->placements[0].lpfn = adev->mc.visible_vram_size >> PAGE_SHIFT;
-	r = ttm_bo_validate(bo, &abo->placement, false, false);
-	if (unlikely(r == -ENOMEM)) {
-		amdgpu_ttm_placement_from_domain(abo, AMDGPU_GEM_DOMAIN_GTT);
-		return ttm_bo_validate(bo, &abo->placement, false, false);
-	} else if (unlikely(r != 0)) {
-		return r;
-	}
 
-	offset = bo->mem.start << PAGE_SHIFT;
-	/* this should never happen */
-	if ((offset + size) > adev->mc.visible_vram_size)
-		return -EINVAL;
+	/* Only set GTT as busy placement; if there is no space in CPU visible
+	 * VRAM, move this BO to GTT instead of evicting other BOs
+	 */
+	abo->placement.busy_placement = &abo->placements[1];
+	abo->placement.num_busy_placement = 1;
 
-	return 0;
+	return ttm_bo_validate(bo, &abo->placement, false, false);
 }
 
 /**
-- 
2.11.0

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 3/3] drm/amdgpu: Try evicting from CPU visible to invisible VRAM first
       [not found] ` <20170518090809.15916-1-michel-otUistvHUpPR7s880joybQ@public.gmane.org>
  2017-05-18  9:08   ` [PATCH 1/3] drm/amdgpu: Drop useless loops for placement restrictions Michel Dänzer
  2017-05-18  9:08   ` [PATCH 2/3] drm/amdgpu: Don't evict other BOs from VRAM for page faults Michel Dänzer
@ 2017-05-18  9:08   ` Michel Dänzer
       [not found]     ` <20170518090809.15916-4-michel-otUistvHUpPR7s880joybQ@public.gmane.org>
  2017-05-19  3:04   ` [PATCH 0/3] drm/amdgpu: Tweaks for high pressure on CPU visible VRAM John Brooks
  2017-05-19 16:14   ` Marek Olšák
  4 siblings, 1 reply; 39+ messages in thread
From: Michel Dänzer @ 2017-05-18  9:08 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

From: Michel Dänzer <michel.daenzer@amd.com>

In exchange, move BOs with the AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED
flag set to CPU visible VRAM with more force.

For other BOs, this gives another chance to stay in VRAM if they
happened to lie in the CPU visible part and another BO needs to go
there.

This should allow BOs to stay in VRAM longer in some cases.

Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 19 ++++++++++++++-----
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c    | 27 +++++++++++++++++++++++++++
 2 files changed, 41 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 41ee353b22c8..d45c2325c61a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -963,11 +963,20 @@ int amdgpu_bo_fault_reserve_notify(struct ttm_buffer_object *bo)
 					 AMDGPU_GEM_DOMAIN_GTT);
 	abo->placements[0].lpfn = adev->mc.visible_vram_size >> PAGE_SHIFT;
 
-	/* Only set GTT as busy placement; if there is no space in CPU visible
-	 * VRAM, move this BO to GTT instead of evicting other BOs
-	 */
-	abo->placement.busy_placement = &abo->placements[1];
-	abo->placement.num_busy_placement = 1;
+	if (abo->flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED) {
+		/* Only set VRAM as normal placement; if there is no space in
+		 * CPU visible VRAM, evict other BOs, only fall back to GTT as
+		 * last resort
+		 */
+		abo->placement.num_placement = 1;
+	} else {
+		/* Only set GTT as busy placement; if there is no space in CPU
+		 * visible VRAM, move this BO to GTT instead of evicting other
+		 * BOs
+		 */
+		abo->placement.busy_placement = &abo->placements[1];
+		abo->placement.num_busy_placement = 1;
+	}
 
 	return ttm_bo_validate(bo, &abo->placement, false, false);
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 57789b860768..d5ed85026542 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -206,7 +206,34 @@ static void amdgpu_evict_flags(struct ttm_buffer_object *bo,
 		    adev->mman.buffer_funcs_ring &&
 		    adev->mman.buffer_funcs_ring->ready == false) {
 			amdgpu_ttm_placement_from_domain(abo, AMDGPU_GEM_DOMAIN_CPU);
+		} else if (adev->mc.visible_vram_size < adev->mc.real_vram_size) {
+			unsigned fpfn = adev->mc.visible_vram_size >> PAGE_SHIFT;
+			struct drm_mm_node *node = bo->mem.mm_node;
+			unsigned long pages_left;
+
+			for (pages_left = bo->mem.num_pages;
+			     pages_left;
+			     pages_left -= node->size, node++) {
+				if (node->start < fpfn)
+					break;
+			}
+
+			if (!pages_left)
+				goto gtt;
+
+			/* Try evicting to the CPU inaccessible part of VRAM
+			 * first, but only set GTT as busy placement, so this
+			 * BO will be evicted to GTT rather than causing other
+			 * BOs to be evicted from VRAM
+			 */
+			amdgpu_ttm_placement_from_domain(abo, AMDGPU_GEM_DOMAIN_VRAM |
+							 AMDGPU_GEM_DOMAIN_GTT);
+			abo->placements[0].fpfn = fpfn;
+			abo->placements[0].lpfn = 0;
+			abo->placement.busy_placement = &abo->placements[1];
+			abo->placement.num_busy_placement = 1;
 		} else {
+gtt:
 			amdgpu_ttm_placement_from_domain(abo, AMDGPU_GEM_DOMAIN_GTT);
 			/* Set an upper limit to force directly allocating
 			 * address space for the BO.
-- 
2.11.0

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/3] drm/amdgpu: Drop useless loops for placement restrictions
       [not found]     ` <20170518090809.15916-2-michel-otUistvHUpPR7s880joybQ@public.gmane.org>
@ 2017-05-18  9:17       ` Christian König
       [not found]         ` <6c3e7a55-d3b0-fc3f-39f4-e85934119e29-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org>
  0 siblings, 1 reply; 39+ messages in thread
From: Christian König @ 2017-05-18  9:17 UTC (permalink / raw)
  To: Michel Dänzer, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Am 18.05.2017 um 11:08 schrieb Michel Dänzer:
> From: Michel Dänzer <michel.daenzer@amd.com>
>
> We know how the placements were initialized in these cases, so we can
> set the restrictions directly without a loop.
>
> Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 13 +++----------
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c    | 19 ++++---------------
>   drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c    |  6 +-----
>   3 files changed, 8 insertions(+), 30 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> index 365883d7948d..b3252bc8fe51 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> @@ -939,8 +939,8 @@ int amdgpu_bo_fault_reserve_notify(struct ttm_buffer_object *bo)
>   {
>   	struct amdgpu_device *adev = amdgpu_ttm_adev(bo->bdev);
>   	struct amdgpu_bo *abo;
> -	unsigned long offset, size, lpfn;
> -	int i, r;
> +	unsigned long offset, size;
> +	int r;
>   
>   	if (!amdgpu_ttm_bo_is_amdgpu_bo(bo))
>   		return 0;
> @@ -961,14 +961,7 @@ int amdgpu_bo_fault_reserve_notify(struct ttm_buffer_object *bo)
>   
>   	/* hurrah the memory is not visible ! */
>   	amdgpu_ttm_placement_from_domain(abo, AMDGPU_GEM_DOMAIN_VRAM);
> -	lpfn =	adev->mc.visible_vram_size >> PAGE_SHIFT;
> -	for (i = 0; i < abo->placement.num_placement; i++) {
> -		/* Force into visible VRAM */
> -		if ((abo->placements[i].flags & TTM_PL_FLAG_VRAM) &&
> -		    (!abo->placements[i].lpfn ||
> -		     abo->placements[i].lpfn > lpfn))
> -			abo->placements[i].lpfn = lpfn;
> -	}
> +	abo->placements[0].lpfn = adev->mc.visible_vram_size >> PAGE_SHIFT;
>   	r = ttm_bo_validate(bo, &abo->placement, false, false);
>   	if (unlikely(r == -ENOMEM)) {
>   		amdgpu_ttm_placement_from_domain(abo, AMDGPU_GEM_DOMAIN_GTT);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index 5db0230e45c6..57789b860768 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -191,7 +191,6 @@ static void amdgpu_evict_flags(struct ttm_buffer_object *bo,
>   		.lpfn = 0,
>   		.flags = TTM_PL_MASK_CACHING | TTM_PL_FLAG_SYSTEM
>   	};
> -	unsigned i;
>   
>   	if (!amdgpu_ttm_bo_is_amdgpu_bo(bo)) {
>   		placement->placement = &placements;
> @@ -209,20 +208,10 @@ static void amdgpu_evict_flags(struct ttm_buffer_object *bo,
>   			amdgpu_ttm_placement_from_domain(abo, AMDGPU_GEM_DOMAIN_CPU);
>   		} else {
>   			amdgpu_ttm_placement_from_domain(abo, AMDGPU_GEM_DOMAIN_GTT);
> -			for (i = 0; i < abo->placement.num_placement; ++i) {
> -				if (!(abo->placements[i].flags &
> -				      TTM_PL_FLAG_TT))
> -					continue;
> -
> -				if (abo->placements[i].lpfn)
> -					continue;
> -
> -				/* set an upper limit to force directly
> -				 * allocating address space for the BO.
> -				 */
> -				abo->placements[i].lpfn =
> -					adev->mc.gtt_size >> PAGE_SHIFT;
> -			}
> +			/* Set an upper limit to force directly allocating
> +			 * address space for the BO.
> +			 */
> +			abo->placements[0].lpfn = adev->mc.gtt_size >> PAGE_SHIFT;
>   		}
>   		break;
>   	case TTM_PL_TT:
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
> index 2ca09f111f08..60688fa5ef98 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
> @@ -375,11 +375,7 @@ void amdgpu_uvd_free_handles(struct amdgpu_device *adev, struct drm_file *filp)
>   
>   static void amdgpu_uvd_force_into_uvd_segment(struct amdgpu_bo *abo)
>   {
> -	int i;
> -	for (i = 0; i < abo->placement.num_placement; ++i) {
> -		abo->placements[i].fpfn = 0 >> PAGE_SHIFT;
> -		abo->placements[i].lpfn = (256 * 1024 * 1024) >> PAGE_SHIFT;
> -	}
> +	abo->placements[0].lpfn = (256 * 1024 * 1024) >> PAGE_SHIFT;

This is not correct. The restriction applies to all placements, not only 
the first one.

Christian.

>   }
>   
>   static u64 amdgpu_uvd_get_addr_from_ctx(struct amdgpu_uvd_cs_ctx *ctx)


_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/3] drm/amdgpu: Drop useless loops for placement restrictions
       [not found]         ` <6c3e7a55-d3b0-fc3f-39f4-e85934119e29-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org>
@ 2017-05-18  9:47           ` Michel Dänzer
  0 siblings, 0 replies; 39+ messages in thread
From: Michel Dänzer @ 2017-05-18  9:47 UTC (permalink / raw)
  To: Christian König; +Cc: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On 18/05/17 06:17 PM, Christian König wrote:
> Am 18.05.2017 um 11:08 schrieb Michel Dänzer:
>> From: Michel Dänzer <michel.daenzer@amd.com>
>>
>> We know how the placements were initialized in these cases, so we can
>> set the restrictions directly without a loop.
>>
>> Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>

[...]

>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
>> index 2ca09f111f08..60688fa5ef98 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c
>> @@ -375,11 +375,7 @@ void amdgpu_uvd_free_handles(struct amdgpu_device
>> *adev, struct drm_file *filp)
>>     static void amdgpu_uvd_force_into_uvd_segment(struct amdgpu_bo *abo)
>>   {
>> -    int i;
>> -    for (i = 0; i < abo->placement.num_placement; ++i) {
>> -        abo->placements[i].fpfn = 0 >> PAGE_SHIFT;
>> -        abo->placements[i].lpfn = (256 * 1024 * 1024) >> PAGE_SHIFT;
>> -    }
>> +    abo->placements[0].lpfn = (256 * 1024 * 1024) >> PAGE_SHIFT;
> 
> This is not correct. The restriction applies to all placements, not only
> the first one.

I see, there can be multiple placements in amdgpu_uvd_cs_pass1, if cmd
!= 0x0 and != 0x3? I'll drop this hunk then, thanks.


-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 3/3] drm/amdgpu: Try evicting from CPU visible to invisible VRAM first
       [not found]     ` <20170518090809.15916-4-michel-otUistvHUpPR7s880joybQ@public.gmane.org>
@ 2017-05-18 15:43       ` John Brooks
       [not found]         ` <20170518154346.GA5730-6hIufAJW0g7Gr8qjsLp7YGXnswh1EIUO@public.gmane.org>
  0 siblings, 1 reply; 39+ messages in thread
From: John Brooks @ 2017-05-18 15:43 UTC (permalink / raw)
  To: Michel Dänzer; +Cc: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On Thu, May 18, 2017 at 06:08:09PM +0900, Michel Dänzer wrote:
> From: Michel Dänzer <michel.daenzer@amd.com>
> 
> In exchange, move BOs with the AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED
> flag set to CPU visible VRAM with more force.
> 
> For other BOs, this gives another chance to stay in VRAM if they
> happened to lie in the CPU visible part and another BO needs to go
> there.
> 
> This should allow BOs to stay in VRAM longer in some cases.
> 
> Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 19 ++++++++++++++-----
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c    | 27 +++++++++++++++++++++++++++
>  2 files changed, 41 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> index 41ee353b22c8..d45c2325c61a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> @@ -963,11 +963,20 @@ int amdgpu_bo_fault_reserve_notify(struct ttm_buffer_object *bo)
>  					 AMDGPU_GEM_DOMAIN_GTT);
>  	abo->placements[0].lpfn = adev->mc.visible_vram_size >> PAGE_SHIFT;
>  
> -	/* Only set GTT as busy placement; if there is no space in CPU visible
> -	 * VRAM, move this BO to GTT instead of evicting other BOs
> -	 */
> -	abo->placement.busy_placement = &abo->placements[1];
> -	abo->placement.num_busy_placement = 1;
> +	if (abo->flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED) {
> +		/* Only set VRAM as normal placement; if there is no space in
> +		 * CPU visible VRAM, evict other BOs, only fall back to GTT as
> +		 * last resort
> +		 */
> +		abo->placement.num_placement = 1;
> +	} else {
> +		/* Only set GTT as busy placement; if there is no space in CPU
> +		 * visible VRAM, move this BO to GTT instead of evicting other
> +		 * BOs
> +		 */
> +		abo->placement.busy_placement = &abo->placements[1];
> +		abo->placement.num_busy_placement = 1;
> +	}
>  
>  	return ttm_bo_validate(bo, &abo->placement, false, false);
>  }
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index 57789b860768..d5ed85026542 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -206,7 +206,34 @@ static void amdgpu_evict_flags(struct ttm_buffer_object *bo,
>  		    adev->mman.buffer_funcs_ring &&
>  		    adev->mman.buffer_funcs_ring->ready == false) {
>  			amdgpu_ttm_placement_from_domain(abo, AMDGPU_GEM_DOMAIN_CPU);
> +		} else if (adev->mc.visible_vram_size < adev->mc.real_vram_size) {
> +			unsigned fpfn = adev->mc.visible_vram_size >> PAGE_SHIFT;
> +			struct drm_mm_node *node = bo->mem.mm_node;
> +			unsigned long pages_left;
> +
> +			for (pages_left = bo->mem.num_pages;
> +			     pages_left;
> +			     pages_left -= node->size, node++) {
> +				if (node->start < fpfn)
> +					break;
> +			}
> +
> +			if (!pages_left)
> +				goto gtt;
> +
> +			/* Try evicting to the CPU inaccessible part of VRAM
> +			 * first, but only set GTT as busy placement, so this
> +			 * BO will be evicted to GTT rather than causing other
> +			 * BOs to be evicted from VRAM
> +			 */
> +			amdgpu_ttm_placement_from_domain(abo, AMDGPU_GEM_DOMAIN_VRAM |
> +							 AMDGPU_GEM_DOMAIN_GTT);
> +			abo->placements[0].fpfn = fpfn;
> +			abo->placements[0].lpfn = 0;
> +			abo->placement.busy_placement = &abo->placements[1];

Are you sure you want to hardcode the placements index? It'll be dependent on
the order set up in amdgpu_ttm_placement_init.

> +			abo->placement.num_busy_placement = 1;
>  		} else {
> +gtt:
>  			amdgpu_ttm_placement_from_domain(abo, AMDGPU_GEM_DOMAIN_GTT);
>  			/* Set an upper limit to force directly allocating
>  			 * address space for the BO.
> -- 
> 2.11.0
> 
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx

--
John Brooks
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 3/3] drm/amdgpu: Try evicting from CPU visible to invisible VRAM first
       [not found]         ` <20170518154346.GA5730-6hIufAJW0g7Gr8qjsLp7YGXnswh1EIUO@public.gmane.org>
@ 2017-05-19  1:20           ` Michel Dänzer
       [not found]             ` <b65d8565-47e6-39d7-12f7-f7b60cf9787e-otUistvHUpPR7s880joybQ@public.gmane.org>
  0 siblings, 1 reply; 39+ messages in thread
From: Michel Dänzer @ 2017-05-19  1:20 UTC (permalink / raw)
  To: John Brooks; +Cc: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On 19/05/17 12:43 AM, John Brooks wrote:
> On Thu, May 18, 2017 at 06:08:09PM +0900, Michel Dänzer wrote:
>> From: Michel Dänzer <michel.daenzer@amd.com>
>>
>> In exchange, move BOs with the AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED
>> flag set to CPU visible VRAM with more force.
>>
>> For other BOs, this gives another chance to stay in VRAM if they
>> happened to lie in the CPU visible part and another BO needs to go
>> there.
>>
>> This should allow BOs to stay in VRAM longer in some cases.
>>
>> Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>

[...]

>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>> index 57789b860768..d5ed85026542 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>> @@ -206,7 +206,34 @@ static void amdgpu_evict_flags(struct ttm_buffer_object *bo,
>>  		    adev->mman.buffer_funcs_ring &&
>>  		    adev->mman.buffer_funcs_ring->ready == false) {
>>  			amdgpu_ttm_placement_from_domain(abo, AMDGPU_GEM_DOMAIN_CPU);
>> +		} else if (adev->mc.visible_vram_size < adev->mc.real_vram_size) {
>> +			unsigned fpfn = adev->mc.visible_vram_size >> PAGE_SHIFT;
>> +			struct drm_mm_node *node = bo->mem.mm_node;
>> +			unsigned long pages_left;
>> +
>> +			for (pages_left = bo->mem.num_pages;
>> +			     pages_left;
>> +			     pages_left -= node->size, node++) {
>> +				if (node->start < fpfn)
>> +					break;
>> +			}
>> +
>> +			if (!pages_left)
>> +				goto gtt;
>> +
>> +			/* Try evicting to the CPU inaccessible part of VRAM
>> +			 * first, but only set GTT as busy placement, so this
>> +			 * BO will be evicted to GTT rather than causing other
>> +			 * BOs to be evicted from VRAM
>> +			 */
>> +			amdgpu_ttm_placement_from_domain(abo, AMDGPU_GEM_DOMAIN_VRAM |
>> +							 AMDGPU_GEM_DOMAIN_GTT);
>> +			abo->placements[0].fpfn = fpfn;
>> +			abo->placements[0].lpfn = 0;
>> +			abo->placement.busy_placement = &abo->placements[1];
> 
> Are you sure you want to hardcode the placements index? It'll be dependent on
> the order set up in amdgpu_ttm_placement_init.

Yes, see patch 1. Looping over the placements and testing their contents
is silly when we know exactly how they were set up. Or do you mean this
code shouldn't call amdgpu_ttm_placement_from_domain at all and just set
up the placements itself?


-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 3/3] drm/amdgpu: Try evicting from CPU visible to invisible VRAM first
       [not found]             ` <b65d8565-47e6-39d7-12f7-f7b60cf9787e-otUistvHUpPR7s880joybQ@public.gmane.org>
@ 2017-05-19  1:43               ` John Brooks
       [not found]                 ` <20170519014314.GA29857-6hIufAJW0g7Gr8qjsLp7YGXnswh1EIUO@public.gmane.org>
  0 siblings, 1 reply; 39+ messages in thread
From: John Brooks @ 2017-05-19  1:43 UTC (permalink / raw)
  To: Michel Dänzer; +Cc: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On Fri, May 19, 2017 at 10:20:37AM +0900, Michel Dänzer wrote:
> On 19/05/17 12:43 AM, John Brooks wrote:
> > On Thu, May 18, 2017 at 06:08:09PM +0900, Michel Dänzer wrote:
> >> From: Michel Dänzer <michel.daenzer@amd.com>
> >>
> >> In exchange, move BOs with the AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED
> >> flag set to CPU visible VRAM with more force.
> >>
> >> For other BOs, this gives another chance to stay in VRAM if they
> >> happened to lie in the CPU visible part and another BO needs to go
> >> there.
> >>
> >> This should allow BOs to stay in VRAM longer in some cases.
> >>
> >> Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
> 
> [...]
> 
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> >> index 57789b860768..d5ed85026542 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> >> @@ -206,7 +206,34 @@ static void amdgpu_evict_flags(struct ttm_buffer_object *bo,
> >>  		    adev->mman.buffer_funcs_ring &&
> >>  		    adev->mman.buffer_funcs_ring->ready == false) {
> >>  			amdgpu_ttm_placement_from_domain(abo, AMDGPU_GEM_DOMAIN_CPU);
> >> +		} else if (adev->mc.visible_vram_size < adev->mc.real_vram_size) {
> >> +			unsigned fpfn = adev->mc.visible_vram_size >> PAGE_SHIFT;
> >> +			struct drm_mm_node *node = bo->mem.mm_node;
> >> +			unsigned long pages_left;
> >> +
> >> +			for (pages_left = bo->mem.num_pages;
> >> +			     pages_left;
> >> +			     pages_left -= node->size, node++) {
> >> +				if (node->start < fpfn)
> >> +					break;
> >> +			}
> >> +
> >> +			if (!pages_left)
> >> +				goto gtt;
> >> +
> >> +			/* Try evicting to the CPU inaccessible part of VRAM
> >> +			 * first, but only set GTT as busy placement, so this
> >> +			 * BO will be evicted to GTT rather than causing other
> >> +			 * BOs to be evicted from VRAM
> >> +			 */
> >> +			amdgpu_ttm_placement_from_domain(abo, AMDGPU_GEM_DOMAIN_VRAM |
> >> +							 AMDGPU_GEM_DOMAIN_GTT);
> >> +			abo->placements[0].fpfn = fpfn;
> >> +			abo->placements[0].lpfn = 0;
> >> +			abo->placement.busy_placement = &abo->placements[1];
> > 
> > Are you sure you want to hardcode the placements index? It'll be dependent on
> > the order set up in amdgpu_ttm_placement_init.
> 
> Yes, see patch 1. Looping over the placements and testing their contents
> is silly when we know exactly how they were set up. 

Agreed

> Or do you mean this code shouldn't call amdgpu_ttm_placement_from_domain at
> all and just set up the placements itself?

Calling amdgpu_ttm_placement_from_domain makes sense. I was just imagining a
scenario where code like this that makes assumptions about the ordering of
placements in the array would break silently if that order were changed, and
you'd have to go about finding the places where integer literals were used to
address specific placements.

> 
> -- 
> Earthling Michel Dänzer               |               http://www.amd.com
> Libre software enthusiast             |             Mesa and X developer

--
John Brooks
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 3/3] drm/amdgpu: Try evicting from CPU visible to invisible VRAM first
       [not found]                 ` <20170519014314.GA29857-6hIufAJW0g7Gr8qjsLp7YGXnswh1EIUO@public.gmane.org>
@ 2017-05-19  1:50                   ` Michel Dänzer
  0 siblings, 0 replies; 39+ messages in thread
From: Michel Dänzer @ 2017-05-19  1:50 UTC (permalink / raw)
  To: John Brooks; +Cc: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On 19/05/17 10:43 AM, John Brooks wrote:
> On Fri, May 19, 2017 at 10:20:37AM +0900, Michel Dänzer wrote:
>> On 19/05/17 12:43 AM, John Brooks wrote:
>>> On Thu, May 18, 2017 at 06:08:09PM +0900, Michel Dänzer wrote:
>>>> From: Michel Dänzer <michel.daenzer@amd.com>
>>>>
>>>> In exchange, move BOs with the AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED
>>>> flag set to CPU visible VRAM with more force.
>>>>
>>>> For other BOs, this gives another chance to stay in VRAM if they
>>>> happened to lie in the CPU visible part and another BO needs to go
>>>> there.
>>>>
>>>> This should allow BOs to stay in VRAM longer in some cases.
>>>>
>>>> Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
>>
>> [...]
>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>>> index 57789b860768..d5ed85026542 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>>> @@ -206,7 +206,34 @@ static void amdgpu_evict_flags(struct ttm_buffer_object *bo,
>>>>  		    adev->mman.buffer_funcs_ring &&
>>>>  		    adev->mman.buffer_funcs_ring->ready == false) {
>>>>  			amdgpu_ttm_placement_from_domain(abo, AMDGPU_GEM_DOMAIN_CPU);
>>>> +		} else if (adev->mc.visible_vram_size < adev->mc.real_vram_size) {
>>>> +			unsigned fpfn = adev->mc.visible_vram_size >> PAGE_SHIFT;
>>>> +			struct drm_mm_node *node = bo->mem.mm_node;
>>>> +			unsigned long pages_left;
>>>> +
>>>> +			for (pages_left = bo->mem.num_pages;
>>>> +			     pages_left;
>>>> +			     pages_left -= node->size, node++) {
>>>> +				if (node->start < fpfn)
>>>> +					break;
>>>> +			}
>>>> +
>>>> +			if (!pages_left)
>>>> +				goto gtt;
>>>> +
>>>> +			/* Try evicting to the CPU inaccessible part of VRAM
>>>> +			 * first, but only set GTT as busy placement, so this
>>>> +			 * BO will be evicted to GTT rather than causing other
>>>> +			 * BOs to be evicted from VRAM
>>>> +			 */
>>>> +			amdgpu_ttm_placement_from_domain(abo, AMDGPU_GEM_DOMAIN_VRAM |
>>>> +							 AMDGPU_GEM_DOMAIN_GTT);
>>>> +			abo->placements[0].fpfn = fpfn;
>>>> +			abo->placements[0].lpfn = 0;
>>>> +			abo->placement.busy_placement = &abo->placements[1];
>>>
>>> Are you sure you want to hardcode the placements index? It'll be dependent on
>>> the order set up in amdgpu_ttm_placement_init.
>>
>> Yes, see patch 1. Looping over the placements and testing their contents
>> is silly when we know exactly how they were set up. 
> 
> Agreed
> 
>> Or do you mean this code shouldn't call amdgpu_ttm_placement_from_domain at
>> all and just set up the placements itself?
> 
> Calling amdgpu_ttm_placement_from_domain makes sense. I was just imagining a
> scenario where code like this that makes assumptions about the ordering of
> placements in the array would break silently if that order were changed, and
> you'd have to go about finding the places where integer literals were used to
> address specific placements.

Right, if we make changes to amdgpu_ttm_placement_init, we'll need to
audit and possibly update its callers. I think that's reasonable, though
others might disagree. :)

Thanks for your feedback.


-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/3] drm/amdgpu: Tweaks for high pressure on CPU visible VRAM
       [not found] ` <20170518090809.15916-1-michel-otUistvHUpPR7s880joybQ@public.gmane.org>
                     ` (2 preceding siblings ...)
  2017-05-18  9:08   ` [PATCH 3/3] drm/amdgpu: Try evicting from CPU visible to invisible VRAM first Michel Dänzer
@ 2017-05-19  3:04   ` John Brooks
       [not found]     ` <1495163086-4747-1-git-send-email-john-xq/Ko7C6e2Bl57MIdRCFDg@public.gmane.org>
  2017-05-19 16:14   ` Marek Olšák
  4 siblings, 1 reply; 39+ messages in thread
From: John Brooks @ 2017-05-19  3:04 UTC (permalink / raw)
  To: Michel Dänzer, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

I'm glad this is being worked on. However, somewhat to my surprise, this patch
series didn't help Dying Light's BO eviction problem. For those who don't know,
that game performs very badly in certain areas, and it is correlated with
increased TTM eviction rates. Relevant screenshots of gallium HUD and sysprof:

http://www.fastquake.com/images/screen-dlgalliumhud1-20170513-171241.png
http://www.fastquake.com/images/screen-dlsysprof-20170515-225919.png

I noticed last week that adding RADEON_DOMAIN_GTT to the domains in radeonsi
(patch: http://www.fastquake.com/files/text/radeon-gtt.txt ) greatly improved
performance in these areas, to the tune of about a 30fps increase. Obviously,
putting GTT in every buffer's domain is not a proper solution. But it lead me
to believe that perhaps the problem wasn't just the swapping of resident BOs,
but the creation of new ones that only have VRAM in their domain, and they
cause existing BOs to be evicted from visible VRAM unconditionally.

The attached patch assigns GTT as the busy placement for newly created BOs that
have the AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED flag, so that they will go to
GTT if visible VRAM is full, instead of evicting established BOs. Since there
is no way to know what the usage patterns of a new BO will be, we shouldn't
evict established BOs (for which we have hypothetically had the opportunity to
gather usage data) from visible VRAM for new, unknown BOs.

With this patch I get hugely improved performance in Dying Light just like with
the Mesa patch: I observed 30-40fps where I got 14 before, and 60fps where I
got 40 before. TTM evictions and bytes moved have dropped to zero where they
were exceedingly high before. Buffer evictions no longer dominate the prof
trace. Screenshots:

http://www.fastquake.com/images/screen-dl-gtt_busy_only-20170518-192602.png
http://www.fastquake.com/images/screen-dlsysprof-gttpatch-20170518-223200.png

--
John Brooks (Frogging101)

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH] drm/amdgpu: Place new CPU-accessbile BOs in GTT if visible VRAM is full
       [not found]     ` <1495163086-4747-1-git-send-email-john-xq/Ko7C6e2Bl57MIdRCFDg@public.gmane.org>
@ 2017-05-19  3:04       ` John Brooks
       [not found]         ` <1495163086-4747-2-git-send-email-john-xq/Ko7C6e2Bl57MIdRCFDg@public.gmane.org>
  2017-05-19 15:24       ` [PATCH 0/3] drm/amdgpu: Tweaks for high pressure on CPU visible VRAM Marek Olšák
  1 sibling, 1 reply; 39+ messages in thread
From: John Brooks @ 2017-05-19  3:04 UTC (permalink / raw)
  To: Michel Dänzer, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Set GTT as the busy placement for newly created BOs that have the
AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED flag, so that they don't cause
established BOs to be evicted from visible VRAM.

Signed-off-by: John Brooks <john@fastquake.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 365883d..655718a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -392,6 +392,17 @@ int amdgpu_bo_create_restricted(struct amdgpu_device *adev,
 #endif
 
 	amdgpu_fill_placement_to_bo(bo, placement);
+
+	/* This is a new BO that wants to be CPU-visible; set GTT as the busy
+	 * placement so that it does not evict established BOs from visible VRAM.
+	 */
+	if (domain & (AMDGPU_GEM_DOMAIN_VRAM | AMDGPU_GEM_DOMAIN_GTT) &&
+	    flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED) {
+		bo->placement.num_placement = 1;
+		bo->placement.num_busy_placement = 1;
+		bo->placement.busy_placement = &bo->placement.placement[1];
+	}
+
 	/* Kernel allocation are uninterruptible */
 
 	initial_bytes_moved = atomic64_read(&adev->num_bytes_moved);
@@ -484,6 +495,13 @@ int amdgpu_bo_create(struct amdgpu_device *adev,
 	memset(&placements, 0,
 	       (AMDGPU_GEM_DOMAIN_MAX + 1) * sizeof(struct ttm_place));
 
+
+	/* New CPU-visible BOs will have GTT set as their busy placement */
+	if (domain & AMDGPU_GEM_DOMAIN_VRAM &&
+	    flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED) {
+		domain |= AMDGPU_GEM_DOMAIN_GTT;
+	}
+
 	amdgpu_ttm_placement_init(adev, &placement,
 				  placements, domain, flags);
 
-- 
2.7.4

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH] drm/amdgpu: Place new CPU-accessbile BOs in GTT if visible VRAM is full
       [not found]         ` <1495163086-4747-2-git-send-email-john-xq/Ko7C6e2Bl57MIdRCFDg@public.gmane.org>
@ 2017-05-19  5:37           ` zhoucm1
  2017-05-19  7:03           ` Michel Dänzer
  1 sibling, 0 replies; 39+ messages in thread
From: zhoucm1 @ 2017-05-19  5:37 UTC (permalink / raw)
  To: John Brooks, Michel Dänzer,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW



On 2017年05月19日 11:04, John Brooks wrote:
> Set GTT as the busy placement for newly created BOs that have the
> AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED flag, so that they don't cause
> established BOs to be evicted from visible VRAM.
You this patch is consistent with my opinion I mentioned in Marek thread 
(Plan: BO move throttling for visible VRAM evictions).
"I agree on this opinion, from our performance tuning experiment, this 
case indeed often happen especially under vram memory pressure. 
redirecting to GTT is better than heavy eviction between VRAM and GTT.
But we should get a condition for redirecting (eviction counter?), 
otherwise, BO have no change back to prefer domain."
You're using busy placement to solve it, then BO is trying its prefer 
domain first, so it has chance back to prefer domain when memory is free.
Although it cannot prevent evil program occupying memory, it still looks 
reasonable, since we can gain more from heavy eviction under vram 
pressure in most cases.

Above all, I vote 'Yes' to your idea, but I also want to listen what 
others' voice is.

Thanks,
David Zhou
>
> Signed-off-by: John Brooks <john@fastquake.com>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 18 ++++++++++++++++++
>   1 file changed, 18 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> index 365883d..655718a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> @@ -392,6 +392,17 @@ int amdgpu_bo_create_restricted(struct amdgpu_device *adev,
>   #endif
>   
>   	amdgpu_fill_placement_to_bo(bo, placement);
> +
> +	/* This is a new BO that wants to be CPU-visible; set GTT as the busy
> +	 * placement so that it does not evict established BOs from visible VRAM.
> +	 */
> +	if (domain & (AMDGPU_GEM_DOMAIN_VRAM | AMDGPU_GEM_DOMAIN_GTT) &&
> +	    flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED) {
> +		bo->placement.num_placement = 1;
> +		bo->placement.num_busy_placement = 1;
> +		bo->placement.busy_placement = &bo->placement.placement[1];
> +	}
> +
>   	/* Kernel allocation are uninterruptible */
>   
>   	initial_bytes_moved = atomic64_read(&adev->num_bytes_moved);
> @@ -484,6 +495,13 @@ int amdgpu_bo_create(struct amdgpu_device *adev,
>   	memset(&placements, 0,
>   	       (AMDGPU_GEM_DOMAIN_MAX + 1) * sizeof(struct ttm_place));
>   
> +
> +	/* New CPU-visible BOs will have GTT set as their busy placement */
> +	if (domain & AMDGPU_GEM_DOMAIN_VRAM &&
> +	    flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED) {
> +		domain |= AMDGPU_GEM_DOMAIN_GTT;
> +	}
> +
>   	amdgpu_ttm_placement_init(adev, &placement,
>   				  placements, domain, flags);
>   

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] drm/amdgpu: Place new CPU-accessbile BOs in GTT if visible VRAM is full
       [not found]         ` <1495163086-4747-2-git-send-email-john-xq/Ko7C6e2Bl57MIdRCFDg@public.gmane.org>
  2017-05-19  5:37           ` zhoucm1
@ 2017-05-19  7:03           ` Michel Dänzer
       [not found]             ` <b2ecb0f8-cd1b-9a00-6ba6-082674fde776-otUistvHUpPR7s880joybQ@public.gmane.org>
  1 sibling, 1 reply; 39+ messages in thread
From: Michel Dänzer @ 2017-05-19  7:03 UTC (permalink / raw)
  To: John Brooks; +Cc: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On 19/05/17 12:04 PM, John Brooks wrote:
> Set GTT as the busy placement for newly created BOs that have the
> AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED flag, so that they don't cause
> established BOs to be evicted from visible VRAM.

This is an interesting idea, but there are some issues with this patch.


> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> index 365883d..655718a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> @@ -392,6 +392,17 @@ int amdgpu_bo_create_restricted(struct amdgpu_device *adev,
>  #endif
>  
>  	amdgpu_fill_placement_to_bo(bo, placement);
> +
> +	/* This is a new BO that wants to be CPU-visible; set GTT as the busy
> +	 * placement so that it does not evict established BOs from visible VRAM.
> +	 */
> +	if (domain & (AMDGPU_GEM_DOMAIN_VRAM | AMDGPU_GEM_DOMAIN_GTT) &&

Should be something like

	if (domain == (AMDGPU_GEM_DOMAIN_VRAM | AMDGPU_GEM_DOMAIN_GTT) &&

otherwise it would also match e.g. BOs with domain ==
AMDGPU_GEM_DOMAIN_GTT and the AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED
flag set (not that this makes sense, but there's nothing to prevent it).


> +	    flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED) {
> +		bo->placement.num_placement = 1;
> +		bo->placement.num_busy_placement = 1;
> +		bo->placement.busy_placement = &bo->placement.placement[1];
> +	}

The original placements set by amdgpu_fill_placement_to_bo need to be
restored before returning from this function, otherwise I suspect such
BOs which end up being created in GTT will stay there permanently.

Does the patch still help for Dying Light if you fix this?

The patch as is doesn't seem to make any difference for my dirt-rally
test case.


> @@ -484,6 +495,13 @@ int amdgpu_bo_create(struct amdgpu_device *adev,
>  	memset(&placements, 0,
>  	       (AMDGPU_GEM_DOMAIN_MAX + 1) * sizeof(struct ttm_place));
>  
> +
> +	/* New CPU-visible BOs will have GTT set as their busy placement */
> +	if (domain & AMDGPU_GEM_DOMAIN_VRAM &&
> +	    flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED) {

Make this

	if ((domain & AMDGPU_GEM_DOMAIN_VRAM) &&
	    (flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED)) {

to match the coding style.


-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/3] drm/amdgpu: Tweaks for high pressure on CPU visible VRAM
       [not found]     ` <1495163086-4747-1-git-send-email-john-xq/Ko7C6e2Bl57MIdRCFDg@public.gmane.org>
  2017-05-19  3:04       ` [PATCH] drm/amdgpu: Place new CPU-accessbile BOs in GTT if visible VRAM is full John Brooks
@ 2017-05-19 15:24       ` Marek Olšák
       [not found]         ` <CAAxE2A6pcm8EiXDRO2kO64TWJRR4BVrzbpiVrLdzhh3uERzTUg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  1 sibling, 1 reply; 39+ messages in thread
From: Marek Olšák @ 2017-05-19 15:24 UTC (permalink / raw)
  To: John Brooks; +Cc: Michel Dänzer, amd-gfx mailing list

Where is your "attached" patch?

Marek

On Fri, May 19, 2017 at 5:04 AM, John Brooks <john@fastquake.com> wrote:
> I'm glad this is being worked on. However, somewhat to my surprise, this patch
> series didn't help Dying Light's BO eviction problem. For those who don't know,
> that game performs very badly in certain areas, and it is correlated with
> increased TTM eviction rates. Relevant screenshots of gallium HUD and sysprof:
>
> http://www.fastquake.com/images/screen-dlgalliumhud1-20170513-171241.png
> http://www.fastquake.com/images/screen-dlsysprof-20170515-225919.png
>
> I noticed last week that adding RADEON_DOMAIN_GTT to the domains in radeonsi
> (patch: http://www.fastquake.com/files/text/radeon-gtt.txt ) greatly improved
> performance in these areas, to the tune of about a 30fps increase. Obviously,
> putting GTT in every buffer's domain is not a proper solution. But it lead me
> to believe that perhaps the problem wasn't just the swapping of resident BOs,
> but the creation of new ones that only have VRAM in their domain, and they
> cause existing BOs to be evicted from visible VRAM unconditionally.
>
> The attached patch assigns GTT as the busy placement for newly created BOs that
> have the AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED flag, so that they will go to
> GTT if visible VRAM is full, instead of evicting established BOs. Since there
> is no way to know what the usage patterns of a new BO will be, we shouldn't
> evict established BOs (for which we have hypothetically had the opportunity to
> gather usage data) from visible VRAM for new, unknown BOs.
>
> With this patch I get hugely improved performance in Dying Light just like with
> the Mesa patch: I observed 30-40fps where I got 14 before, and 60fps where I
> got 40 before. TTM evictions and bytes moved have dropped to zero where they
> were exceedingly high before. Buffer evictions no longer dominate the prof
> trace. Screenshots:
>
> http://www.fastquake.com/images/screen-dl-gtt_busy_only-20170518-192602.png
> http://www.fastquake.com/images/screen-dlsysprof-gttpatch-20170518-223200.png
>
> --
> John Brooks (Frogging101)
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/3] drm/amdgpu: Tweaks for high pressure on CPU visible VRAM
       [not found]         ` <CAAxE2A6pcm8EiXDRO2kO64TWJRR4BVrzbpiVrLdzhh3uERzTUg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-05-19 15:27           ` John Brooks
       [not found]             ` <20170519152732.GA11484-6hIufAJW0g7Gr8qjsLp7YGXnswh1EIUO@public.gmane.org>
  0 siblings, 1 reply; 39+ messages in thread
From: John Brooks @ 2017-05-19 15:27 UTC (permalink / raw)
  To: Marek Olšák; +Cc: Michel Dänzer, amd-gfx mailing list

On Fri, May 19, 2017 at 05:24:36PM +0200, Marek Olšák wrote:
> Where is your "attached" patch?
> 
> Marek

It's actually a reply to my message. Sorry if that was unclear.

> 
> On Fri, May 19, 2017 at 5:04 AM, John Brooks <john@fastquake.com> wrote:
> > I'm glad this is being worked on. However, somewhat to my surprise, this patch
> > series didn't help Dying Light's BO eviction problem. For those who don't know,
> > that game performs very badly in certain areas, and it is correlated with
> > increased TTM eviction rates. Relevant screenshots of gallium HUD and sysprof:
> >
> > http://www.fastquake.com/images/screen-dlgalliumhud1-20170513-171241.png
> > http://www.fastquake.com/images/screen-dlsysprof-20170515-225919.png
> >
> > I noticed last week that adding RADEON_DOMAIN_GTT to the domains in radeonsi
> > (patch: http://www.fastquake.com/files/text/radeon-gtt.txt ) greatly improved
> > performance in these areas, to the tune of about a 30fps increase. Obviously,
> > putting GTT in every buffer's domain is not a proper solution. But it lead me
> > to believe that perhaps the problem wasn't just the swapping of resident BOs,
> > but the creation of new ones that only have VRAM in their domain, and they
> > cause existing BOs to be evicted from visible VRAM unconditionally.
> >
> > The attached patch assigns GTT as the busy placement for newly created BOs that
> > have the AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED flag, so that they will go to
> > GTT if visible VRAM is full, instead of evicting established BOs. Since there
> > is no way to know what the usage patterns of a new BO will be, we shouldn't
> > evict established BOs (for which we have hypothetically had the opportunity to
> > gather usage data) from visible VRAM for new, unknown BOs.
> >
> > With this patch I get hugely improved performance in Dying Light just like with
> > the Mesa patch: I observed 30-40fps where I got 14 before, and 60fps where I
> > got 40 before. TTM evictions and bytes moved have dropped to zero where they
> > were exceedingly high before. Buffer evictions no longer dominate the prof
> > trace. Screenshots:
> >
> > http://www.fastquake.com/images/screen-dl-gtt_busy_only-20170518-192602.png
> > http://www.fastquake.com/images/screen-dlsysprof-gttpatch-20170518-223200.png
> >
> > --
> > John Brooks (Frogging101)
> >
> > _______________________________________________
> > amd-gfx mailing list
> > amd-gfx@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/amd-gfx

--
John Brooks
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/3] drm/amdgpu: Tweaks for high pressure on CPU visible VRAM
       [not found]             ` <20170519152732.GA11484-6hIufAJW0g7Gr8qjsLp7YGXnswh1EIUO@public.gmane.org>
@ 2017-05-19 15:47               ` Marek Olšák
       [not found]                 ` <CAAxE2A77Yu9c02Mu-wK3HD_gAKvPbN8q7aEtHcxrcOUhGG4Pfw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 39+ messages in thread
From: Marek Olšák @ 2017-05-19 15:47 UTC (permalink / raw)
  To: John Brooks; +Cc: Michel Dänzer, amd-gfx mailing list

On Fri, May 19, 2017 at 5:27 PM, John Brooks <john@fastquake.com> wrote:
> On Fri, May 19, 2017 at 05:24:36PM +0200, Marek Olšák wrote:
>> Where is your "attached" patch?
>>
>> Marek
>
> It's actually a reply to my message. Sorry if that was unclear.

That's OK, but I don't see any patch from you here.

Marek
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/3] drm/amdgpu: Tweaks for high pressure on CPU visible VRAM
       [not found]                 ` <CAAxE2A77Yu9c02Mu-wK3HD_gAKvPbN8q7aEtHcxrcOUhGG4Pfw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-05-19 15:49                   ` Marek Olšák
  0 siblings, 0 replies; 39+ messages in thread
From: Marek Olšák @ 2017-05-19 15:49 UTC (permalink / raw)
  To: John Brooks; +Cc: Michel Dänzer, amd-gfx mailing list

On Fri, May 19, 2017 at 5:47 PM, Marek Olšák <maraeo@gmail.com> wrote:
> On Fri, May 19, 2017 at 5:27 PM, John Brooks <john@fastquake.com> wrote:
>> On Fri, May 19, 2017 at 05:24:36PM +0200, Marek Olšák wrote:
>>> Where is your "attached" patch?
>>>
>>> Marek
>>
>> It's actually a reply to my message. Sorry if that was unclear.
>
> That's OK, but I don't see any patch from you here.

Nevermind. I guess it's "drm/amdgpu: Place new CPU-accessbile BOs in
GTT if visible VRAM is full", which is a different thread in Gmail.

Marek
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] drm/amdgpu: Place new CPU-accessbile BOs in GTT if visible VRAM is full
       [not found]             ` <b2ecb0f8-cd1b-9a00-6ba6-082674fde776-otUistvHUpPR7s880joybQ@public.gmane.org>
@ 2017-05-19 15:52               ` Marek Olšák
       [not found]                 ` <CAAxE2A7DjWR8x1WYE=Fgqn7vZ7hyRdQHA3wKo_HcNdZ14KdmkA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2017-05-19 19:23               ` John Brooks
  1 sibling, 1 reply; 39+ messages in thread
From: Marek Olšák @ 2017-05-19 15:52 UTC (permalink / raw)
  To: Michel Dänzer; +Cc: John Brooks, amd-gfx mailing list

On Fri, May 19, 2017 at 9:03 AM, Michel Dänzer <michel@daenzer.net> wrote:
> On 19/05/17 12:04 PM, John Brooks wrote:
>> Set GTT as the busy placement for newly created BOs that have the
>> AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED flag, so that they don't cause
>> established BOs to be evicted from visible VRAM.
>
> This is an interesting idea, but there are some issues with this patch.
>
>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>> index 365883d..655718a 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>> @@ -392,6 +392,17 @@ int amdgpu_bo_create_restricted(struct amdgpu_device *adev,
>>  #endif
>>
>>       amdgpu_fill_placement_to_bo(bo, placement);
>> +
>> +     /* This is a new BO that wants to be CPU-visible; set GTT as the busy
>> +      * placement so that it does not evict established BOs from visible VRAM.
>> +      */
>> +     if (domain & (AMDGPU_GEM_DOMAIN_VRAM | AMDGPU_GEM_DOMAIN_GTT) &&
>
> Should be something like
>
>         if (domain == (AMDGPU_GEM_DOMAIN_VRAM | AMDGPU_GEM_DOMAIN_GTT) &&
>
> otherwise it would also match e.g. BOs with domain ==
> AMDGPU_GEM_DOMAIN_GTT and the AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED
> flag set (not that this makes sense, but there's nothing to prevent it).

I think it should be like this, which trivially rules out GTT and
VRAM|GTT cases:
if (domain == AMDGPU_GEM_DOMAIN_VRAM &&

Marek
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/3] drm/amdgpu: Tweaks for high pressure on CPU visible VRAM
       [not found] ` <20170518090809.15916-1-michel-otUistvHUpPR7s880joybQ@public.gmane.org>
                     ` (3 preceding siblings ...)
  2017-05-19  3:04   ` [PATCH 0/3] drm/amdgpu: Tweaks for high pressure on CPU visible VRAM John Brooks
@ 2017-05-19 16:14   ` Marek Olšák
       [not found]     ` <CAAxE2A65deE+hue7oPMQ+=K5s8TEGZE+bJMNVOdp=LQuq_RnMQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  4 siblings, 1 reply; 39+ messages in thread
From: Marek Olšák @ 2017-05-19 16:14 UTC (permalink / raw)
  To: Michel Dänzer; +Cc: amd-gfx mailing list

Hi Michel,

I've applied your series and it doesn't help with low Dirt Rally
performance on Fiji. I see TTM buffer moves at 800MB/s and many VRAM
page faults.

Marek

On Thu, May 18, 2017 at 11:08 AM, Michel Dänzer <michel@daenzer.net> wrote:
> From: Michel Dänzer <michel.daenzer@amd.com>
>
> This series was developed and tested under the following scenario:
>
> Running the PTS dirt-rally benchmark (1920x1080, Ultra) on Tonga with
> 2G, with CPU visible VRAM artificially restricted to 64 MB.
>
> Without this series, there's a lot of stutter during about the first
> minute of a benchmark run. During this time there are significant amounts
> of buffer moves (starting from about 500 MB on the HUD) and evictions,
> gradually declining until the buffer moves settle around 8 MB on the HUD.
>
> With this series, there's only slight stutter during the first seconds
> after the car launches, even though the buffer move volume is about the
> same as without the series. Buffer evictions are eliminated almost
> completely, except for a few at the beginning. Buffer moves still settle
> around 8 MB on the HUD, but with less variance than before.
>
> Note that there's only little if any improvement of the average framerate
> reported, but the minimum framerate as seen on the HUD goes from ~10 fps
> to ~17.
>
>
> Patch 1 is a cleanup that I noticed along the way.
>
> Patch 2 makes the main difference for the above scenario.
>
> Patch 3 doesn't make as much difference, I'm fine with it not landing at
> least for now.
>
> Michel Dänzer (3):
>   drm/amdgpu: Drop useless loops for placement restrictions
>   drm/amdgpu: Don't evict other BOs from VRAM for page faults
>   drm/amdgpu: Try evicting from CPU visible to invisible VRAM first
>
>  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 42 ++++++++++++---------------
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c    | 46 ++++++++++++++++++++----------
>  drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c    |  6 +---
>  3 files changed, 51 insertions(+), 43 deletions(-)
>
> --
> 2.11.0
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] drm/amdgpu: Place new CPU-accessbile BOs in GTT if visible VRAM is full
       [not found]             ` <b2ecb0f8-cd1b-9a00-6ba6-082674fde776-otUistvHUpPR7s880joybQ@public.gmane.org>
  2017-05-19 15:52               ` Marek Olšák
@ 2017-05-19 19:23               ` John Brooks
       [not found]                 ` <20170519192352.GA21642-6hIufAJW0g7Gr8qjsLp7YGXnswh1EIUO@public.gmane.org>
  1 sibling, 1 reply; 39+ messages in thread
From: John Brooks @ 2017-05-19 19:23 UTC (permalink / raw)
  To: Michel Dänzer; +Cc: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On Fri, May 19, 2017 at 04:03:28PM +0900, Michel Dänzer wrote:
> On 19/05/17 12:04 PM, John Brooks wrote:
> > Set GTT as the busy placement for newly created BOs that have the
> > AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED flag, so that they don't cause
> > established BOs to be evicted from visible VRAM.
> 
> This is an interesting idea, but there are some issues with this patch.
> 
> 
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> > index 365883d..655718a 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> > @@ -392,6 +392,17 @@ int amdgpu_bo_create_restricted(struct amdgpu_device *adev,
> >  #endif
> >  
> >  	amdgpu_fill_placement_to_bo(bo, placement);
> > +
> > +	/* This is a new BO that wants to be CPU-visible; set GTT as the busy
> > +	 * placement so that it does not evict established BOs from visible VRAM.
> > +	 */
> > +	if (domain & (AMDGPU_GEM_DOMAIN_VRAM | AMDGPU_GEM_DOMAIN_GTT) &&
> 
> Should be something like
> 
> 	if (domain == (AMDGPU_GEM_DOMAIN_VRAM | AMDGPU_GEM_DOMAIN_GTT) &&
> 
> otherwise it would also match e.g. BOs with domain ==
> AMDGPU_GEM_DOMAIN_GTT and the AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED
> flag set (not that this makes sense, but there's nothing to prevent it).
> 
> 
> > +	    flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED) {
> > +		bo->placement.num_placement = 1;
> > +		bo->placement.num_busy_placement = 1;
> > +		bo->placement.busy_placement = &bo->placement.placement[1];
> > +	}
> 
> The original placements set by amdgpu_fill_placement_to_bo need to be
> restored before returning from this function, otherwise I suspect such
> BOs which end up being created in GTT will stay there permanently.
> 

I'm curious, what makes you think that the BOs cannot move back to VRAM
otherwise? VRAM is still in the placements and prefered/allowed domains, just
not in the busy placements.

> Does the patch still help for Dying Light if you fix this?
> 
> The patch as is doesn't seem to make any difference for my dirt-rally
> test case.
> 
> 
> > @@ -484,6 +495,13 @@ int amdgpu_bo_create(struct amdgpu_device *adev,
> >  	memset(&placements, 0,
> >  	       (AMDGPU_GEM_DOMAIN_MAX + 1) * sizeof(struct ttm_place));
> >  
> > +
> > +	/* New CPU-visible BOs will have GTT set as their busy placement */
> > +	if (domain & AMDGPU_GEM_DOMAIN_VRAM &&
> > +	    flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED) {
> 
> Make this
> 
> 	if ((domain & AMDGPU_GEM_DOMAIN_VRAM) &&
> 	    (flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED)) {
> 
> to match the coding style.
> 
> 
> -- 
> Earthling Michel Dänzer               |               http://www.amd.com
> Libre software enthusiast             |             Mesa and X developer

Thanks very much for your feedback. I'll send an updated patch soon.

--
John Brooks
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] drm/amdgpu: Place new CPU-accessbile BOs in GTT if visible VRAM is full
       [not found]                 ` <CAAxE2A7DjWR8x1WYE=Fgqn7vZ7hyRdQHA3wKo_HcNdZ14KdmkA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-05-20  1:22                   ` Michel Dänzer
  0 siblings, 0 replies; 39+ messages in thread
From: Michel Dänzer @ 2017-05-20  1:22 UTC (permalink / raw)
  To: Marek Olšák; +Cc: John Brooks, amd-gfx mailing list

On 20/05/17 12:52 AM, Marek Olšák wrote:
> On Fri, May 19, 2017 at 9:03 AM, Michel Dänzer <michel@daenzer.net> wrote:
>> On 19/05/17 12:04 PM, John Brooks wrote:
>>> Set GTT as the busy placement for newly created BOs that have the
>>> AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED flag, so that they don't cause
>>> established BOs to be evicted from visible VRAM.
>>
>> This is an interesting idea, but there are some issues with this patch.
>>
>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>> index 365883d..655718a 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>> @@ -392,6 +392,17 @@ int amdgpu_bo_create_restricted(struct amdgpu_device *adev,
>>>  #endif
>>>
>>>       amdgpu_fill_placement_to_bo(bo, placement);
>>> +
>>> +     /* This is a new BO that wants to be CPU-visible; set GTT as the busy
>>> +      * placement so that it does not evict established BOs from visible VRAM.
>>> +      */
>>> +     if (domain & (AMDGPU_GEM_DOMAIN_VRAM | AMDGPU_GEM_DOMAIN_GTT) &&
>>
>> Should be something like
>>
>>         if (domain == (AMDGPU_GEM_DOMAIN_VRAM | AMDGPU_GEM_DOMAIN_GTT) &&
>>
>> otherwise it would also match e.g. BOs with domain ==
>> AMDGPU_GEM_DOMAIN_GTT and the AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED
>> flag set (not that this makes sense, but there's nothing to prevent it).
> 
> I think it should be like this, which trivially rules out GTT and
> VRAM|GTT cases:
> if (domain == AMDGPU_GEM_DOMAIN_VRAM &&

That won't work as is due to the second hunk of the patch, which adds in
AMDGPU_GEM_DOMAIN_GTT before calling this function.


-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/3] drm/amdgpu: Tweaks for high pressure on CPU visible VRAM
       [not found]     ` <CAAxE2A65deE+hue7oPMQ+=K5s8TEGZE+bJMNVOdp=LQuq_RnMQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-05-20  1:26       ` Michel Dänzer
       [not found]         ` <CAAxE2A6ZiNSswhArstqHpYnm30ieoDZj=j_-aFvZz1_ByXLOGA@mail.gmail.com>
  0 siblings, 1 reply; 39+ messages in thread
From: Michel Dänzer @ 2017-05-20  1:26 UTC (permalink / raw)
  To: Marek Olšák; +Cc: amd-gfx mailing list

On 20/05/17 01:14 AM, Marek Olšák wrote:
> Hi Michel,
> 
> I've applied your series

Thanks for testing it.

> and it doesn't help with low Dirt Rally performance on Fiji. I see TTM
> buffer moves at 800MB/s and many VRAM page faults.

Did you see this:

>> Note that there's only little if any improvement of the average framerate
>> reported, but the minimum framerate as seen on the HUD goes from ~10 fps
>> to ~17.

I.e. it mostly affects the minimum framerate and smoothness for me as well.


-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] drm/amdgpu: Place new CPU-accessbile BOs in GTT if visible VRAM is full
       [not found]                 ` <20170519192352.GA21642-6hIufAJW0g7Gr8qjsLp7YGXnswh1EIUO@public.gmane.org>
@ 2017-05-20  1:27                   ` Michel Dänzer
       [not found]                     ` <d0535946-a10d-3b77-0d94-c52098dce021-otUistvHUpPR7s880joybQ@public.gmane.org>
  0 siblings, 1 reply; 39+ messages in thread
From: Michel Dänzer @ 2017-05-20  1:27 UTC (permalink / raw)
  To: John Brooks; +Cc: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On 20/05/17 04:23 AM, John Brooks wrote:
> On Fri, May 19, 2017 at 04:03:28PM +0900, Michel Dänzer wrote:
>> On 19/05/17 12:04 PM, John Brooks wrote:
>>> Set GTT as the busy placement for newly created BOs that have the
>>> AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED flag, so that they don't cause
>>> established BOs to be evicted from visible VRAM.
>>
>> This is an interesting idea, but there are some issues with this patch.

[...]

>>> +	    flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED) {
>>> +		bo->placement.num_placement = 1;
>>> +		bo->placement.num_busy_placement = 1;
>>> +		bo->placement.busy_placement = &bo->placement.placement[1];
>>> +	}
>>
>> The original placements set by amdgpu_fill_placement_to_bo need to be
>> restored before returning from this function, otherwise I suspect such
>> BOs which end up being created in GTT will stay there permanently.
>>
> 
> I'm curious, what makes you think that the BOs cannot move back to VRAM
> otherwise? VRAM is still in the placements and prefered/allowed domains, just
> not in the busy placements.

If there is not enough free space when the BO is created, there probably
won't be either when it's validated for GPU command streams later.


>> Does the patch still help for Dying Light if you fix this?

Please test this. The result should tell us whether the problem with
Dying Light is really pressure on CPU visible VRAM, or something else.


-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] drm/amdgpu: Place new CPU-accessbile BOs in GTT if visible VRAM is full
       [not found]                     ` <d0535946-a10d-3b77-0d94-c52098dce021-otUistvHUpPR7s880joybQ@public.gmane.org>
@ 2017-05-20  3:21                       ` John Brooks
       [not found]                         ` <20170520032101.GA16371-6hIufAJW0g7Gr8qjsLp7YGXnswh1EIUO@public.gmane.org>
  0 siblings, 1 reply; 39+ messages in thread
From: John Brooks @ 2017-05-20  3:21 UTC (permalink / raw)
  To: Michel Dänzer; +Cc: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On Sat, May 20, 2017 at 10:27:14AM +0900, Michel Dänzer wrote:
> On 20/05/17 04:23 AM, John Brooks wrote:
> > On Fri, May 19, 2017 at 04:03:28PM +0900, Michel Dänzer wrote:
> >> On 19/05/17 12:04 PM, John Brooks wrote:
> >>> Set GTT as the busy placement for newly created BOs that have the
> >>> AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED flag, so that they don't cause
> >>> established BOs to be evicted from visible VRAM.
> >>
> >> This is an interesting idea, but there are some issues with this patch.
> 
> [...]
> 
> >>> +	    flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED) {
> >>> +		bo->placement.num_placement = 1;
> >>> +		bo->placement.num_busy_placement = 1;
> >>> +		bo->placement.busy_placement = &bo->placement.placement[1];
> >>> +	}
> >>
> >> The original placements set by amdgpu_fill_placement_to_bo need to be
> >> restored before returning from this function, otherwise I suspect such
> >> BOs which end up being created in GTT will stay there permanently.
> >>
> > 
> > I'm curious, what makes you think that the BOs cannot move back to VRAM
> > otherwise? VRAM is still in the placements and prefered/allowed domains, just
> > not in the busy placements.
> 
> If there is not enough free space when the BO is created, there probably
> won't be either when it's validated for GPU command streams later.
> 
> 
> >> Does the patch still help for Dying Light if you fix this?
> 
> Please test this. The result should tell us whether the problem with
> Dying Light is really pressure on CPU visible VRAM, or something else.
> 

I did some tests. The patch still helps if I restore the old placement values
after ttm_bo_init_reserved. But while doing this, I made another observation
that throws a wrench into things: It *does* kill performance if I remove
AMDGPU_GEM_DOMAIN_GTT from bo->prefered_domains. I think that GTT getting into
prefered_domains was what "fixed" the game this whole time and the BO creation
was irrelevant. Oh well, it was worth a try.

I'll see what else I can find out.

John

> 
> -- 
> Earthling Michel Dänzer               |               http://www.amd.com
> Libre software enthusiast             |             Mesa and X developer
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/3] drm/amdgpu: Tweaks for high pressure on CPU visible VRAM
       [not found]             ` <CAAxE2A4zdgFRh-GDqyDRrXmEWqjSrkfCo=Z41hfo-UNY0E5FhA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-05-20  9:26               ` Marek Olšák
       [not found]                 ` <CAAxE2A67T3Mhw68thNewFtcdua3ufSP-FDX_RN153x-aLKUh1Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 39+ messages in thread
From: Marek Olšák @ 2017-05-20  9:26 UTC (permalink / raw)
  To: Michel Dänzer; +Cc: amd-gfx mailing list


[-- Attachment #1.1: Type: text/plain, Size: 1013 bytes --]

On May 20, 2017 3:26 AM, "Michel Dänzer" <michel-otUistvHUpPR7s880joybQ@public.gmane.org> wrote:

On 20/05/17 01:14 AM, Marek Olšák wrote:
> Hi Michel,
>
> I've applied your series

Thanks for testing it.

> and it doesn't help with low Dirt Rally performance on Fiji. I see TTM
> buffer moves at 800MB/s and many VRAM page faults.

Did you see this:

>> Note that there's only little if any improvement of the average framerate
>> reported, but the minimum framerate as seen on the HUD goes from ~10 fps
>> to ~17.

I.e. it mostly affects the minimum framerate and smoothness for me as well.


Without the series, I get 70 average fps. With the series, I get 30 average
fps. That might just be random bad luck. I don't know. In any case, 30 fps
is really bad, so I don't think the series does what you think it does.

Marek



--
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer

[-- Attachment #1.2: Type: text/html, Size: 1994 bytes --]

[-- Attachment #2: Type: text/plain, Size: 154 bytes --]

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/3] drm/amdgpu: Tweaks for high pressure on CPU visible VRAM
       [not found]                 ` <CAAxE2A67T3Mhw68thNewFtcdua3ufSP-FDX_RN153x-aLKUh1Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-05-20  9:35                   ` Michel Dänzer
  2017-05-22 10:00                   ` Michel Dänzer
  1 sibling, 0 replies; 39+ messages in thread
From: Michel Dänzer @ 2017-05-20  9:35 UTC (permalink / raw)
  To: Marek Olšák; +Cc: amd-gfx mailing list

On 20/05/17 06:26 PM, Marek Olšák wrote:
> On May 20, 2017 3:26 AM, "Michel Dänzer" <michel@daenzer.net
> <mailto:michel@daenzer.net>> wrote:
> 
>     On 20/05/17 01:14 AM, Marek Olšák wrote:
>     > Hi Michel,
>     >
>     > I've applied your series
> 
>     Thanks for testing it.
> 
>     > and it doesn't help with low Dirt Rally performance on Fiji. I see TTM
>     > buffer moves at 800MB/s and many VRAM page faults.
> 
>     Did you see this:
> 
>     >> Note that there's only little if any improvement of the average
>     framerate
>     >> reported, but the minimum framerate as seen on the HUD goes from
>     ~10 fps
>     >> to ~17.
> 
>     I.e. it mostly affects the minimum framerate and smoothness for me
>     as well.
> 
> 
> Without the series, I get 70 average fps. With the series, I get 30
> average fps. That might just be random bad luck. I don't know. In any
> case, 30 fps is really bad, so I don't think the series does what you
> think it does.

Ah, yeah, sure. "Doesn't help" sounded like it just didn't improve
things. :)

How about with only patches 1 & 2?


-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH] drm/amdgpu: Place new CPU-accessbile BOs in GTT if visible VRAM is full
       [not found]                         ` <20170520032101.GA16371-6hIufAJW0g7Gr8qjsLp7YGXnswh1EIUO@public.gmane.org>
@ 2017-05-22  1:30                           ` Michel Dänzer
  0 siblings, 0 replies; 39+ messages in thread
From: Michel Dänzer @ 2017-05-22  1:30 UTC (permalink / raw)
  To: John Brooks; +Cc: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On 20/05/17 12:21 PM, John Brooks wrote:
> On Sat, May 20, 2017 at 10:27:14AM +0900, Michel Dänzer wrote:
>> On 20/05/17 04:23 AM, John Brooks wrote:
>>> On Fri, May 19, 2017 at 04:03:28PM +0900, Michel Dänzer wrote:
>>>> On 19/05/17 12:04 PM, John Brooks wrote:
>>>>> Set GTT as the busy placement for newly created BOs that have the
>>>>> AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED flag, so that they don't cause
>>>>> established BOs to be evicted from visible VRAM.
>>>>
>>>> This is an interesting idea, but there are some issues with this patch.
>>
>> [...]
>>
>>>>> +	    flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED) {
>>>>> +		bo->placement.num_placement = 1;
>>>>> +		bo->placement.num_busy_placement = 1;
>>>>> +		bo->placement.busy_placement = &bo->placement.placement[1];
>>>>> +	}
>>>>
>>>> The original placements set by amdgpu_fill_placement_to_bo need to be
>>>> restored before returning from this function, otherwise I suspect such
>>>> BOs which end up being created in GTT will stay there permanently.
>>>>
>>>
>>> I'm curious, what makes you think that the BOs cannot move back to VRAM
>>> otherwise? VRAM is still in the placements and prefered/allowed domains, just
>>> not in the busy placements.
>>
>> If there is not enough free space when the BO is created, there probably
>> won't be either when it's validated for GPU command streams later.
>>
>>
>>>> Does the patch still help for Dying Light if you fix this?
>>
>> Please test this. The result should tell us whether the problem with
>> Dying Light is really pressure on CPU visible VRAM, or something else.
>>
> 
> I did some tests. The patch still helps if I restore the old placement values
> after ttm_bo_init_reserved. But while doing this, I made another observation
> that throws a wrench into things: It *does* kill performance if I remove
> AMDGPU_GEM_DOMAIN_GTT from bo->prefered_domains. I think that GTT getting into
> prefered_domains was what "fixed" the game this whole time and the BO creation
> was irrelevant.

Ah, right, so the problem I was thinking of was actually caused by the
second hunk, not the first one. :) amdgpu_cs_bo_validate re-generates
the placements for each command stream according to bo->prefered_domains
or bo->allowed_domains.


-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/3] drm/amdgpu: Tweaks for high pressure on CPU visible VRAM
       [not found]                 ` <CAAxE2A67T3Mhw68thNewFtcdua3ufSP-FDX_RN153x-aLKUh1Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2017-05-20  9:35                   ` Michel Dänzer
@ 2017-05-22 10:00                   ` Michel Dänzer
       [not found]                     ` <7649341f-6407-1a12-9674-fb65076125fc-otUistvHUpPR7s880joybQ@public.gmane.org>
  1 sibling, 1 reply; 39+ messages in thread
From: Michel Dänzer @ 2017-05-22 10:00 UTC (permalink / raw)
  To: Marek Olšák; +Cc: amd-gfx mailing list

On 20/05/17 06:26 PM, Marek Olšák wrote:
> On May 20, 2017 3:26 AM, "Michel Dänzer" <michel@daenzer.net
> <mailto:michel@daenzer.net>> wrote:
> 
>     On 20/05/17 01:14 AM, Marek Olšák wrote:
>     > Hi Michel,
>     >
>     > I've applied your series
> 
>     Thanks for testing it.
> 
>     > and it doesn't help with low Dirt Rally performance on Fiji. I see TTM
>     > buffer moves at 800MB/s and many VRAM page faults.
> 
>     Did you see this:
> 
>     >> Note that there's only little if any improvement of the average
>     framerate
>     >> reported, but the minimum framerate as seen on the HUD goes from
>     ~10 fps
>     >> to ~17.
> 
>     I.e. it mostly affects the minimum framerate and smoothness for me
>     as well.
> 
> 
> Without the series, I get 70 average fps. With the series, I get 30
> average fps. That might just be random bad luck. I don't know.

Hmm, yeah, maybe that was just one of the random slowdowns you've been
talking about in other threads and on IRC?

I can't reproduce any slowdown with these patches, even leaving visible
VRAM size at 256 MB.


-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/3] drm/amdgpu: Tweaks for high pressure on CPU visible VRAM
       [not found]                     ` <7649341f-6407-1a12-9674-fb65076125fc-otUistvHUpPR7s880joybQ@public.gmane.org>
@ 2017-05-22 10:09                       ` Marek Olšák
       [not found]                         ` <CAAxE2A446KFb+CiwdZusvdgRSfQS8jLgsCy4K7L0_tc00pvWKA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 39+ messages in thread
From: Marek Olšák @ 2017-05-22 10:09 UTC (permalink / raw)
  To: Michel Dänzer; +Cc: amd-gfx mailing list

On Mon, May 22, 2017 at 12:00 PM, Michel Dänzer <michel@daenzer.net> wrote:
> On 20/05/17 06:26 PM, Marek Olšák wrote:
>> On May 20, 2017 3:26 AM, "Michel Dänzer" <michel@daenzer.net
>> <mailto:michel@daenzer.net>> wrote:
>>
>>     On 20/05/17 01:14 AM, Marek Olšák wrote:
>>     > Hi Michel,
>>     >
>>     > I've applied your series
>>
>>     Thanks for testing it.
>>
>>     > and it doesn't help with low Dirt Rally performance on Fiji. I see TTM
>>     > buffer moves at 800MB/s and many VRAM page faults.
>>
>>     Did you see this:
>>
>>     >> Note that there's only little if any improvement of the average
>>     framerate
>>     >> reported, but the minimum framerate as seen on the HUD goes from
>>     ~10 fps
>>     >> to ~17.
>>
>>     I.e. it mostly affects the minimum framerate and smoothness for me
>>     as well.
>>
>>
>> Without the series, I get 70 average fps. With the series, I get 30
>> average fps. That might just be random bad luck. I don't know.
>
> Hmm, yeah, maybe that was just one of the random slowdowns you've been
> talking about in other threads and on IRC?
>
> I can't reproduce any slowdown with these patches, even leaving visible
> VRAM size at 256 MB.

The random slowdowns with Dirt Rally are only caused by the pressure
on visible VRAM. This whole thread is about those random slowdowns. If
you're saying "maybe it was just one of the random slowdowns", you're
saying "maybe it was just the visible VRAM pressure". It's only
random with Dirt Rally, which makes it difficult to believe statements
such as "I can't reproduce any slowdown". It's not random with Dying
Light.

Marek
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/3] drm/amdgpu: Tweaks for high pressure on CPU visible VRAM
       [not found]                         ` <CAAxE2A446KFb+CiwdZusvdgRSfQS8jLgsCy4K7L0_tc00pvWKA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-05-22 18:20                           ` John Brooks
  2017-05-23  0:45                           ` Michel Dänzer
  1 sibling, 0 replies; 39+ messages in thread
From: John Brooks @ 2017-05-22 18:20 UTC (permalink / raw)
  To: Marek Olšák, Michel Dänzer; +Cc: amd-gfx mailing list

On Mon, May 22, 2017 at 12:09:21PM +0200, Marek Olšák wrote:
> On Mon, May 22, 2017 at 12:00 PM, Michel Dänzer <michel@daenzer.net> wrote:
> > On 20/05/17 06:26 PM, Marek Olšák wrote:
> >> On May 20, 2017 3:26 AM, "Michel Dänzer" <michel@daenzer.net
> >> <mailto:michel@daenzer.net>> wrote:
> >>
> >>     On 20/05/17 01:14 AM, Marek Olšák wrote:
> >>     > Hi Michel,
> >>     >
> >>     > I've applied your series
> >>
> >>     Thanks for testing it.
> >>
> >>     > and it doesn't help with low Dirt Rally performance on Fiji. I see TTM
> >>     > buffer moves at 800MB/s and many VRAM page faults.
> >>
> >>     Did you see this:
> >>
> >>     >> Note that there's only little if any improvement of the average
> >>     framerate
> >>     >> reported, but the minimum framerate as seen on the HUD goes from
> >>     ~10 fps
> >>     >> to ~17.
> >>
> >>     I.e. it mostly affects the minimum framerate and smoothness for me
> >>     as well.
> >>
> >>
> >> Without the series, I get 70 average fps. With the series, I get 30
> >> average fps. That might just be random bad luck. I don't know.
> >
> > Hmm, yeah, maybe that was just one of the random slowdowns you've been
> > talking about in other threads and on IRC?
> >
> > I can't reproduce any slowdown with these patches, even leaving visible
> > VRAM size at 256 MB.
> 
> The random slowdowns with Dirt Rally are only caused by the pressure
> on visible VRAM. This whole thread is about those random slowdowns. If
> you're saying "maybe it was just one of the random slowdowns", you're
> saying "maybe it was just the visible VRAM pressure". It's only
> random with Dirt Rally, which makes it difficult to believe statements
> such as "I can't reproduce any slowdown". It's not random with Dying
> Light.
> 
> Marek

For what it's worth, the best place to reproduce it in Dying Light is the
courtyard where Zere's trailer is. It's the first place you go after leaving
the Tower for the first time at the start of the game. The Tower lobby is
another good place. Outside of these areas the slowdown may not be apparent at
all.

John

> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/3] drm/amdgpu: Tweaks for high pressure on CPU visible VRAM
       [not found]                         ` <CAAxE2A446KFb+CiwdZusvdgRSfQS8jLgsCy4K7L0_tc00pvWKA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2017-05-22 18:20                           ` John Brooks
@ 2017-05-23  0:45                           ` Michel Dänzer
       [not found]                             ` <c3dfa73a-e0c1-c6e9-0264-3c52e444c297-otUistvHUpPR7s880joybQ@public.gmane.org>
  1 sibling, 1 reply; 39+ messages in thread
From: Michel Dänzer @ 2017-05-23  0:45 UTC (permalink / raw)
  To: Marek Olšák; +Cc: amd-gfx mailing list

On 22/05/17 07:09 PM, Marek Olšák wrote:
> On Mon, May 22, 2017 at 12:00 PM, Michel Dänzer <michel@daenzer.net> wrote:
>> On 20/05/17 06:26 PM, Marek Olšák wrote:
>>> On May 20, 2017 3:26 AM, "Michel Dänzer" <michel@daenzer.net
>>> <mailto:michel@daenzer.net>> wrote:
>>>
>>>     On 20/05/17 01:14 AM, Marek Olšák wrote:
>>>     > Hi Michel,
>>>     >
>>>     > I've applied your series
>>>
>>>     Thanks for testing it.
>>>
>>>     > and it doesn't help with low Dirt Rally performance on Fiji. I see TTM
>>>     > buffer moves at 800MB/s and many VRAM page faults.
>>>
>>>     Did you see this:
>>>
>>>     >> Note that there's only little if any improvement of the average
>>>     framerate
>>>     >> reported, but the minimum framerate as seen on the HUD goes from
>>>     ~10 fps
>>>     >> to ~17.
>>>
>>>     I.e. it mostly affects the minimum framerate and smoothness for me
>>>     as well.
>>>
>>>
>>> Without the series, I get 70 average fps. With the series, I get 30
>>> average fps. That might just be random bad luck. I don't know.
>>
>> Hmm, yeah, maybe that was just one of the random slowdowns you've been
>> talking about in other threads and on IRC?
>>
>> I can't reproduce any slowdown with these patches, even leaving visible
>> VRAM size at 256 MB.
> 
> The random slowdowns with Dirt Rally are only caused by the pressure
> on visible VRAM. This whole thread is about those random slowdowns.

No, this thread is about the scenario described in the cover letter of
this patch series.


> If you're saying "maybe it was just one of the random slowdowns", you're
> saying "maybe it was just the visible VRAM pressure". It's only
> random with Dirt Rally, which makes it difficult to believe statements
> such as "I can't reproduce any slowdown".

I could say the same thing about you seeing random slowdowns... I've
never seen that, I had to artificially limit the size of visible VRAM to
64 MB to make it significantly affect the benchmark result.

How many times do you need to run the benchmark on average to hit a
random slowdown? Which desktop environment and other X clients are
running during the benchmark? Which tab is active in the Steam window
while the benchmark runs?

In my case, it's only xfwm4, xterm and steam on the Dirt Rally page in
the library.


-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/3] drm/amdgpu: Tweaks for high pressure on CPU visible VRAM
       [not found]                             ` <c3dfa73a-e0c1-c6e9-0264-3c52e444c297-otUistvHUpPR7s880joybQ@public.gmane.org>
@ 2017-05-23 10:38                               ` Marek Olšák
       [not found]                                 ` <CAAxE2A7tKvvmZ+qf8f06FJODWNLgHcv-XPiJJgLana-PcXBR_g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 39+ messages in thread
From: Marek Olšák @ 2017-05-23 10:38 UTC (permalink / raw)
  To: Michel Dänzer; +Cc: amd-gfx mailing list

On Tue, May 23, 2017 at 2:45 AM, Michel Dänzer <michel@daenzer.net> wrote:
> On 22/05/17 07:09 PM, Marek Olšák wrote:
>> On Mon, May 22, 2017 at 12:00 PM, Michel Dänzer <michel@daenzer.net> wrote:
>>> On 20/05/17 06:26 PM, Marek Olšák wrote:
>>>> On May 20, 2017 3:26 AM, "Michel Dänzer" <michel@daenzer.net
>>>> <mailto:michel@daenzer.net>> wrote:
>>>>
>>>>     On 20/05/17 01:14 AM, Marek Olšák wrote:
>>>>     > Hi Michel,
>>>>     >
>>>>     > I've applied your series
>>>>
>>>>     Thanks for testing it.
>>>>
>>>>     > and it doesn't help with low Dirt Rally performance on Fiji. I see TTM
>>>>     > buffer moves at 800MB/s and many VRAM page faults.
>>>>
>>>>     Did you see this:
>>>>
>>>>     >> Note that there's only little if any improvement of the average
>>>>     framerate
>>>>     >> reported, but the minimum framerate as seen on the HUD goes from
>>>>     ~10 fps
>>>>     >> to ~17.
>>>>
>>>>     I.e. it mostly affects the minimum framerate and smoothness for me
>>>>     as well.
>>>>
>>>>
>>>> Without the series, I get 70 average fps. With the series, I get 30
>>>> average fps. That might just be random bad luck. I don't know.
>>>
>>> Hmm, yeah, maybe that was just one of the random slowdowns you've been
>>> talking about in other threads and on IRC?
>>>
>>> I can't reproduce any slowdown with these patches, even leaving visible
>>> VRAM size at 256 MB.
>>
>> The random slowdowns with Dirt Rally are only caused by the pressure
>> on visible VRAM. This whole thread is about those random slowdowns.
>
> No, this thread is about the scenario described in the cover letter of
> this patch series.
>
>
>> If you're saying "maybe it was just one of the random slowdowns", you're
>> saying "maybe it was just the visible VRAM pressure". It's only
>> random with Dirt Rally, which makes it difficult to believe statements
>> such as "I can't reproduce any slowdown".
>
> I could say the same thing about you seeing random slowdowns... I've
> never seen that, I had to artificially limit the size of visible VRAM to
> 64 MB to make it significantly affect the benchmark result.
>
> How many times do you need to run the benchmark on average to hit a
> random slowdown? Which desktop environment and other X clients are
> running during the benchmark? Which tab is active in the Steam window
> while the benchmark runs?
>
> In my case, it's only xfwm4, xterm and steam on the Dirt Rally page in
> the library.

Ubuntu Unity, Steam small mode (there are no tabs), Ultra settings in
Dirt Rally.

Every single time I run the game with this series, I get 700-1000MB/s
of TTM BO moves. There doesn't seem to be any randomness.

It was better without this series. (meaning it was sometimes OK, sometimes bad)

Marek
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/3] drm/amdgpu: Tweaks for high pressure on CPU visible VRAM
       [not found]                                 ` <CAAxE2A7tKvvmZ+qf8f06FJODWNLgHcv-XPiJJgLana-PcXBR_g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-05-24  7:56                                   ` Michel Dänzer
       [not found]                                     ` <f04e8239-e8cb-1c6c-a0ae-581facadb3cf-otUistvHUpPR7s880joybQ@public.gmane.org>
  0 siblings, 1 reply; 39+ messages in thread
From: Michel Dänzer @ 2017-05-24  7:56 UTC (permalink / raw)
  To: Marek Olšák; +Cc: amd-gfx mailing list

On 23/05/17 07:38 PM, Marek Olšák wrote:
> On Tue, May 23, 2017 at 2:45 AM, Michel Dänzer <michel@daenzer.net> wrote:
>> On 22/05/17 07:09 PM, Marek Olšák wrote:
>>> On Mon, May 22, 2017 at 12:00 PM, Michel Dänzer <michel@daenzer.net> wrote:
>>>> On 20/05/17 06:26 PM, Marek Olšák wrote:
>>>>> On May 20, 2017 3:26 AM, "Michel Dänzer" <michel@daenzer.net
>>>>> <mailto:michel@daenzer.net>> wrote:
>>>>>
>>>>>     On 20/05/17 01:14 AM, Marek Olšák wrote:
>>>>>     > Hi Michel,
>>>>>     >
>>>>>     > I've applied your series
>>>>>
>>>>>     Thanks for testing it.
>>>>>
>>>>>     > and it doesn't help with low Dirt Rally performance on Fiji. I see TTM
>>>>>     > buffer moves at 800MB/s and many VRAM page faults.
>>>>>
>>>>>     Did you see this:
>>>>>
>>>>>     >> Note that there's only little if any improvement of the average
>>>>>     framerate
>>>>>     >> reported, but the minimum framerate as seen on the HUD goes from
>>>>>     ~10 fps
>>>>>     >> to ~17.
>>>>>
>>>>>     I.e. it mostly affects the minimum framerate and smoothness for me
>>>>>     as well.
>>>>>
>>>>>
>>>>> Without the series, I get 70 average fps. With the series, I get 30
>>>>> average fps. That might just be random bad luck. I don't know.
>>>>
>>>> Hmm, yeah, maybe that was just one of the random slowdowns you've been
>>>> talking about in other threads and on IRC?
>>>>
>>>> I can't reproduce any slowdown with these patches, even leaving visible
>>>> VRAM size at 256 MB.
>>>
>>> The random slowdowns with Dirt Rally are only caused by the pressure
>>> on visible VRAM. This whole thread is about those random slowdowns.
>>
>> No, this thread is about the scenario described in the cover letter of
>> this patch series.
>>
>>
>>> If you're saying "maybe it was just one of the random slowdowns", you're
>>> saying "maybe it was just the visible VRAM pressure". It's only
>>> random with Dirt Rally, which makes it difficult to believe statements
>>> such as "I can't reproduce any slowdown".
>>
>> I could say the same thing about you seeing random slowdowns... I've
>> never seen that, I had to artificially limit the size of visible VRAM to
>> 64 MB to make it significantly affect the benchmark result.
>>
>> How many times do you need to run the benchmark on average to hit a
>> random slowdown? Which desktop environment and other X clients are
>> running during the benchmark? Which tab is active in the Steam window
>> while the benchmark runs?
>>
>> In my case, it's only xfwm4, xterm and steam on the Dirt Rally page in
>> the library.
> 
> Ubuntu Unity, Steam small mode (there are no tabs), Ultra settings in
> Dirt Rally.
> 
> Every single time I run the game with this series, I get 700-1000MB/s
> of TTM BO moves. There doesn't seem to be any randomness.
> 
> It was better without this series. (meaning it was sometimes OK, sometimes bad)

Thanks for the additional details. I presume that in the bad case there
are some BOs lying around in visible VRAM (e.g. from Unity), which
causes some of Dirt Rally's BOs to go back and forth between GTT on CPU
page faults and VRAM on GPU usage.

This means at least patch 2 goes out the window. I'll see if I can
salvage something out of patch 3.


-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/3] drm/amdgpu: Tweaks for high pressure on CPU visible VRAM
       [not found]                                     ` <f04e8239-e8cb-1c6c-a0ae-581facadb3cf-otUistvHUpPR7s880joybQ@public.gmane.org>
@ 2017-05-24 11:03                                       ` Marek Olšák
       [not found]                                         ` <CAAxE2A5uha-MPx4B=ZeW9kf2qQXOTNcK7sxMJDtEk54S=us47g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 39+ messages in thread
From: Marek Olšák @ 2017-05-24 11:03 UTC (permalink / raw)
  To: Michel Dänzer; +Cc: amd-gfx mailing list

On Wed, May 24, 2017 at 9:56 AM, Michel Dänzer <michel@daenzer.net> wrote:
> On 23/05/17 07:38 PM, Marek Olšák wrote:
>> On Tue, May 23, 2017 at 2:45 AM, Michel Dänzer <michel@daenzer.net> wrote:
>>> On 22/05/17 07:09 PM, Marek Olšák wrote:
>>>> On Mon, May 22, 2017 at 12:00 PM, Michel Dänzer <michel@daenzer.net> wrote:
>>>>> On 20/05/17 06:26 PM, Marek Olšák wrote:
>>>>>> On May 20, 2017 3:26 AM, "Michel Dänzer" <michel@daenzer.net
>>>>>> <mailto:michel@daenzer.net>> wrote:
>>>>>>
>>>>>>     On 20/05/17 01:14 AM, Marek Olšák wrote:
>>>>>>     > Hi Michel,
>>>>>>     >
>>>>>>     > I've applied your series
>>>>>>
>>>>>>     Thanks for testing it.
>>>>>>
>>>>>>     > and it doesn't help with low Dirt Rally performance on Fiji. I see TTM
>>>>>>     > buffer moves at 800MB/s and many VRAM page faults.
>>>>>>
>>>>>>     Did you see this:
>>>>>>
>>>>>>     >> Note that there's only little if any improvement of the average
>>>>>>     framerate
>>>>>>     >> reported, but the minimum framerate as seen on the HUD goes from
>>>>>>     ~10 fps
>>>>>>     >> to ~17.
>>>>>>
>>>>>>     I.e. it mostly affects the minimum framerate and smoothness for me
>>>>>>     as well.
>>>>>>
>>>>>>
>>>>>> Without the series, I get 70 average fps. With the series, I get 30
>>>>>> average fps. That might just be random bad luck. I don't know.
>>>>>
>>>>> Hmm, yeah, maybe that was just one of the random slowdowns you've been
>>>>> talking about in other threads and on IRC?
>>>>>
>>>>> I can't reproduce any slowdown with these patches, even leaving visible
>>>>> VRAM size at 256 MB.
>>>>
>>>> The random slowdowns with Dirt Rally are only caused by the pressure
>>>> on visible VRAM. This whole thread is about those random slowdowns.
>>>
>>> No, this thread is about the scenario described in the cover letter of
>>> this patch series.
>>>
>>>
>>>> If you're saying "maybe it was just one of the random slowdowns", you're
>>>> saying "maybe it was just the visible VRAM pressure". It's only
>>>> random with Dirt Rally, which makes it difficult to believe statements
>>>> such as "I can't reproduce any slowdown".
>>>
>>> I could say the same thing about you seeing random slowdowns... I've
>>> never seen that, I had to artificially limit the size of visible VRAM to
>>> 64 MB to make it significantly affect the benchmark result.
>>>
>>> How many times do you need to run the benchmark on average to hit a
>>> random slowdown? Which desktop environment and other X clients are
>>> running during the benchmark? Which tab is active in the Steam window
>>> while the benchmark runs?
>>>
>>> In my case, it's only xfwm4, xterm and steam on the Dirt Rally page in
>>> the library.
>>
>> Ubuntu Unity, Steam small mode (there are no tabs), Ultra settings in
>> Dirt Rally.
>>
>> Every single time I run the game with this series, I get 700-1000MB/s
>> of TTM BO moves. There doesn't seem to be any randomness.
>>
>> It was better without this series. (meaning it was sometimes OK, sometimes bad)
>
> Thanks for the additional details. I presume that in the bad case there
> are some BOs lying around in visible VRAM (e.g. from Unity), which
> causes some of Dirt Rally's BOs to go back and forth between GTT on CPU
> page faults and VRAM on GPU usage.
>
> This means at least patch 2 goes out the window. I'll see if I can
> salvage something out of patch 3.

I think the final solution (done in fault_reserve_notify) should be:
if (bo->num_cpu_page_faults++ > 20)
   bo->preferred_domain = GTT_WC;

Otherwise I think we'll be just going in circles and not get anywhere.

Marek
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/3] drm/amdgpu: Tweaks for high pressure on CPU visible VRAM
       [not found]                                         ` <CAAxE2A5uha-MPx4B=ZeW9kf2qQXOTNcK7sxMJDtEk54S=us47g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-05-24 11:27                                           ` Christian König
       [not found]                                             ` <eafd4ca6-fbbf-2bb4-be97-16f54fe402dd-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org>
  0 siblings, 1 reply; 39+ messages in thread
From: Christian König @ 2017-05-24 11:27 UTC (permalink / raw)
  To: Marek Olšák, Michel Dänzer; +Cc: amd-gfx mailing list

Am 24.05.2017 um 13:03 schrieb Marek Olšák:
> On Wed, May 24, 2017 at 9:56 AM, Michel Dänzer <michel@daenzer.net> wrote:
>> On 23/05/17 07:38 PM, Marek Olšák wrote:
>>> On Tue, May 23, 2017 at 2:45 AM, Michel Dänzer <michel@daenzer.net> wrote:
>>>> On 22/05/17 07:09 PM, Marek Olšák wrote:
>>>>> On Mon, May 22, 2017 at 12:00 PM, Michel Dänzer <michel@daenzer.net> wrote:
>>>>>> On 20/05/17 06:26 PM, Marek Olšák wrote:
>>>>>>> On May 20, 2017 3:26 AM, "Michel Dänzer" <michel@daenzer.net
>>>>>>> <mailto:michel@daenzer.net>> wrote:
>>>>>>>
>>>>>>>      On 20/05/17 01:14 AM, Marek Olšák wrote:
>>>>>>>      > Hi Michel,
>>>>>>>      >
>>>>>>>      > I've applied your series
>>>>>>>
>>>>>>>      Thanks for testing it.
>>>>>>>
>>>>>>>      > and it doesn't help with low Dirt Rally performance on Fiji. I see TTM
>>>>>>>      > buffer moves at 800MB/s and many VRAM page faults.
>>>>>>>
>>>>>>>      Did you see this:
>>>>>>>
>>>>>>>      >> Note that there's only little if any improvement of the average
>>>>>>>      framerate
>>>>>>>      >> reported, but the minimum framerate as seen on the HUD goes from
>>>>>>>      ~10 fps
>>>>>>>      >> to ~17.
>>>>>>>
>>>>>>>      I.e. it mostly affects the minimum framerate and smoothness for me
>>>>>>>      as well.
>>>>>>>
>>>>>>>
>>>>>>> Without the series, I get 70 average fps. With the series, I get 30
>>>>>>> average fps. That might just be random bad luck. I don't know.
>>>>>> Hmm, yeah, maybe that was just one of the random slowdowns you've been
>>>>>> talking about in other threads and on IRC?
>>>>>>
>>>>>> I can't reproduce any slowdown with these patches, even leaving visible
>>>>>> VRAM size at 256 MB.
>>>>> The random slowdowns with Dirt Rally are only caused by the pressure
>>>>> on visible VRAM. This whole thread is about those random slowdowns.
>>>> No, this thread is about the scenario described in the cover letter of
>>>> this patch series.
>>>>
>>>>
>>>>> If you're saying "maybe it was just one of the random slowdowns", you're
>>>>> saying "maybe it was just the visible VRAM pressure". It's only
>>>>> random with Dirt Rally, which makes it difficult to believe statements
>>>>> such as "I can't reproduce any slowdown".
>>>> I could say the same thing about you seeing random slowdowns... I've
>>>> never seen that, I had to artificially limit the size of visible VRAM to
>>>> 64 MB to make it significantly affect the benchmark result.
>>>>
>>>> How many times do you need to run the benchmark on average to hit a
>>>> random slowdown? Which desktop environment and other X clients are
>>>> running during the benchmark? Which tab is active in the Steam window
>>>> while the benchmark runs?
>>>>
>>>> In my case, it's only xfwm4, xterm and steam on the Dirt Rally page in
>>>> the library.
>>> Ubuntu Unity, Steam small mode (there are no tabs), Ultra settings in
>>> Dirt Rally.
>>>
>>> Every single time I run the game with this series, I get 700-1000MB/s
>>> of TTM BO moves. There doesn't seem to be any randomness.
>>>
>>> It was better without this series. (meaning it was sometimes OK, sometimes bad)
>> Thanks for the additional details. I presume that in the bad case there
>> are some BOs lying around in visible VRAM (e.g. from Unity), which
>> causes some of Dirt Rally's BOs to go back and forth between GTT on CPU
>> page faults and VRAM on GPU usage.
>>
>> This means at least patch 2 goes out the window. I'll see if I can
>> salvage something out of patch 3.
> I think the final solution (done in fault_reserve_notify) should be:
> if (bo->num_cpu_page_faults++ > 20)
>     bo->preferred_domain = GTT_WC;

I more or less agree on that, but setting preferred_domain permanently 
to GTT_WC is what worries me a bit.

E.g. imagine you alt+tab from a game to your browser and back and the 
game runs way slower now because BOs are never moved back to VRAM.

What we need is a global limit of number of bytes transfered per second 
for swap operations or something like that.

Or maybe a timeout which says when a BO was moved (either by swapping it 
out or by a CPU page fault) only move it back after +n jiffies or 
something like that.

Christian.

>
> Otherwise I think we'll be just going in circles and not get anywhere.
>
> Marek
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx


_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/3] drm/amdgpu: Tweaks for high pressure on CPU visible VRAM
       [not found]                                             ` <eafd4ca6-fbbf-2bb4-be97-16f54fe402dd-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org>
@ 2017-05-25  3:31                                               ` Michel Dänzer
       [not found]                                                 ` <0535f00f-9095-6209-72e9-e6bb46569a63-otUistvHUpPR7s880joybQ@public.gmane.org>
  0 siblings, 1 reply; 39+ messages in thread
From: Michel Dänzer @ 2017-05-25  3:31 UTC (permalink / raw)
  To: Christian König, Marek Olšák; +Cc: amd-gfx mailing list

On 24/05/17 08:27 PM, Christian König wrote:
> Am 24.05.2017 um 13:03 schrieb Marek Olšák:
>>>
>> I think the final solution (done in fault_reserve_notify) should be:
>> if (bo->num_cpu_page_faults++ > 20)
>>     bo->preferred_domain = GTT_WC;

I agree something like that will probably be part of the solution, but I
doubt it's quite that simple or that it's the only thing that can be
improved.


> I more or less agree on that, but setting preferred_domain permanently
> to GTT_WC is what worries me a bit.
> 
> E.g. imagine you alt+tab from a game to your browser and back and the
> game runs way slower now because BOs are never moved back to VRAM.

Right, permanently moving a BO to GTT might itself cause performance to
drop down a cliff in some cases. It's possible that this is irrelevant
compared to excessive buffer migration for CPU access though.


> What we need is a global limit of number of bytes transfered per second
> for swap operations or something like that.
> 
> Or maybe a timeout which says when a BO was moved (either by swapping it
> out or by a CPU page fault) only move it back after +n jiffies or
> something like that.

I also feel like something like this will be more useful than the number
of CPU page faults per se. But I'm curious what Marek comes up with. :)


-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/3] drm/amdgpu: Tweaks for high pressure on CPU visible VRAM
       [not found]                                                 ` <0535f00f-9095-6209-72e9-e6bb46569a63-otUistvHUpPR7s880joybQ@public.gmane.org>
@ 2017-05-25 11:24                                                   ` Marek Olšák
       [not found]                                                     ` <CAAxE2A6YFjK1VPuxW=HROqKX+-ejvVJ=YneULp+cDkj033YAbA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 39+ messages in thread
From: Marek Olšák @ 2017-05-25 11:24 UTC (permalink / raw)
  To: Michel Dänzer; +Cc: Christian König, amd-gfx mailing list

On Thu, May 25, 2017 at 5:31 AM, Michel Dänzer <michel@daenzer.net> wrote:
> On 24/05/17 08:27 PM, Christian König wrote:
>> Am 24.05.2017 um 13:03 schrieb Marek Olšák:
>>>>
>>> I think the final solution (done in fault_reserve_notify) should be:
>>> if (bo->num_cpu_page_faults++ > 20)
>>>     bo->preferred_domain = GTT_WC;
>
> I agree something like that will probably be part of the solution, but I
> doubt it's quite that simple or that it's the only thing that can be
> improved.
>
>
>> I more or less agree on that, but setting preferred_domain permanently
>> to GTT_WC is what worries me a bit.
>>
>> E.g. imagine you alt+tab from a game to your browser and back and the
>> game runs way slower now because BOs are never moved back to VRAM.
>
> Right, permanently moving a BO to GTT might itself cause performance to
> drop down a cliff in some cases. It's possible that this is irrelevant
> compared to excessive buffer migration for CPU access though.
>
>
>> What we need is a global limit of number of bytes transfered per second
>> for swap operations or something like that.
>>
>> Or maybe a timeout which says when a BO was moved (either by swapping it
>> out or by a CPU page fault) only move it back after +n jiffies or
>> something like that.
>
> I also feel like something like this will be more useful than the number
> of CPU page faults per se. But I'm curious what Marek comes up with. :)

I don't have any better idea at the moment. It looks like John Brooks
has already solved this issue based on his IRC comments.

Marek
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/3] drm/amdgpu: Tweaks for high pressure on CPU visible VRAM
       [not found]                                                     ` <CAAxE2A6YFjK1VPuxW=HROqKX+-ejvVJ=YneULp+cDkj033YAbA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-05-26  0:46                                                       ` Michel Dänzer
  0 siblings, 0 replies; 39+ messages in thread
From: Michel Dänzer @ 2017-05-26  0:46 UTC (permalink / raw)
  To: Marek Olšák; +Cc: Christian König, amd-gfx mailing list

On 25/05/17 08:24 PM, Marek Olšák wrote:
> On Thu, May 25, 2017 at 5:31 AM, Michel Dänzer <michel@daenzer.net> wrote:
>> On 24/05/17 08:27 PM, Christian König wrote:
>>> Am 24.05.2017 um 13:03 schrieb Marek Olšák:
>>>>>
>>>> I think the final solution (done in fault_reserve_notify) should be:
>>>> if (bo->num_cpu_page_faults++ > 20)
>>>>     bo->preferred_domain = GTT_WC;
>>
>> I agree something like that will probably be part of the solution, but I
>> doubt it's quite that simple or that it's the only thing that can be
>> improved.
>>
>>
>>> I more or less agree on that, but setting preferred_domain permanently
>>> to GTT_WC is what worries me a bit.
>>>
>>> E.g. imagine you alt+tab from a game to your browser and back and the
>>> game runs way slower now because BOs are never moved back to VRAM.
>>
>> Right, permanently moving a BO to GTT might itself cause performance to
>> drop down a cliff in some cases. It's possible that this is irrelevant
>> compared to excessive buffer migration for CPU access though.
>>
>>
>>> What we need is a global limit of number of bytes transfered per second
>>> for swap operations or something like that.
>>>
>>> Or maybe a timeout which says when a BO was moved (either by swapping it
>>> out or by a CPU page fault) only move it back after +n jiffies or
>>> something like that.
>>
>> I also feel like something like this will be more useful than the number
>> of CPU page faults per se. But I'm curious what Marek comes up with. :)
> 
> I don't have any better idea at the moment. It looks like John Brooks
> has already solved this issue based on his IRC comments.

I don't think there's "the issue" with a single solution. None of John's
patches that I've tried so far help for the scenario described in the
cover letter of this series.


-- 
Earthling Michel Dänzer               |               http://www.amd.com
Libre software enthusiast             |             Mesa and X developer
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 39+ messages in thread

end of thread, other threads:[~2017-05-26  0:46 UTC | newest]

Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-18  9:08 [PATCH 0/3] drm/amdgpu: Tweaks for high pressure on CPU visible VRAM Michel Dänzer
     [not found] ` <20170518090809.15916-1-michel-otUistvHUpPR7s880joybQ@public.gmane.org>
2017-05-18  9:08   ` [PATCH 1/3] drm/amdgpu: Drop useless loops for placement restrictions Michel Dänzer
     [not found]     ` <20170518090809.15916-2-michel-otUistvHUpPR7s880joybQ@public.gmane.org>
2017-05-18  9:17       ` Christian König
     [not found]         ` <6c3e7a55-d3b0-fc3f-39f4-e85934119e29-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org>
2017-05-18  9:47           ` Michel Dänzer
2017-05-18  9:08   ` [PATCH 2/3] drm/amdgpu: Don't evict other BOs from VRAM for page faults Michel Dänzer
2017-05-18  9:08   ` [PATCH 3/3] drm/amdgpu: Try evicting from CPU visible to invisible VRAM first Michel Dänzer
     [not found]     ` <20170518090809.15916-4-michel-otUistvHUpPR7s880joybQ@public.gmane.org>
2017-05-18 15:43       ` John Brooks
     [not found]         ` <20170518154346.GA5730-6hIufAJW0g7Gr8qjsLp7YGXnswh1EIUO@public.gmane.org>
2017-05-19  1:20           ` Michel Dänzer
     [not found]             ` <b65d8565-47e6-39d7-12f7-f7b60cf9787e-otUistvHUpPR7s880joybQ@public.gmane.org>
2017-05-19  1:43               ` John Brooks
     [not found]                 ` <20170519014314.GA29857-6hIufAJW0g7Gr8qjsLp7YGXnswh1EIUO@public.gmane.org>
2017-05-19  1:50                   ` Michel Dänzer
2017-05-19  3:04   ` [PATCH 0/3] drm/amdgpu: Tweaks for high pressure on CPU visible VRAM John Brooks
     [not found]     ` <1495163086-4747-1-git-send-email-john-xq/Ko7C6e2Bl57MIdRCFDg@public.gmane.org>
2017-05-19  3:04       ` [PATCH] drm/amdgpu: Place new CPU-accessbile BOs in GTT if visible VRAM is full John Brooks
     [not found]         ` <1495163086-4747-2-git-send-email-john-xq/Ko7C6e2Bl57MIdRCFDg@public.gmane.org>
2017-05-19  5:37           ` zhoucm1
2017-05-19  7:03           ` Michel Dänzer
     [not found]             ` <b2ecb0f8-cd1b-9a00-6ba6-082674fde776-otUistvHUpPR7s880joybQ@public.gmane.org>
2017-05-19 15:52               ` Marek Olšák
     [not found]                 ` <CAAxE2A7DjWR8x1WYE=Fgqn7vZ7hyRdQHA3wKo_HcNdZ14KdmkA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-05-20  1:22                   ` Michel Dänzer
2017-05-19 19:23               ` John Brooks
     [not found]                 ` <20170519192352.GA21642-6hIufAJW0g7Gr8qjsLp7YGXnswh1EIUO@public.gmane.org>
2017-05-20  1:27                   ` Michel Dänzer
     [not found]                     ` <d0535946-a10d-3b77-0d94-c52098dce021-otUistvHUpPR7s880joybQ@public.gmane.org>
2017-05-20  3:21                       ` John Brooks
     [not found]                         ` <20170520032101.GA16371-6hIufAJW0g7Gr8qjsLp7YGXnswh1EIUO@public.gmane.org>
2017-05-22  1:30                           ` Michel Dänzer
2017-05-19 15:24       ` [PATCH 0/3] drm/amdgpu: Tweaks for high pressure on CPU visible VRAM Marek Olšák
     [not found]         ` <CAAxE2A6pcm8EiXDRO2kO64TWJRR4BVrzbpiVrLdzhh3uERzTUg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-05-19 15:27           ` John Brooks
     [not found]             ` <20170519152732.GA11484-6hIufAJW0g7Gr8qjsLp7YGXnswh1EIUO@public.gmane.org>
2017-05-19 15:47               ` Marek Olšák
     [not found]                 ` <CAAxE2A77Yu9c02Mu-wK3HD_gAKvPbN8q7aEtHcxrcOUhGG4Pfw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-05-19 15:49                   ` Marek Olšák
2017-05-19 16:14   ` Marek Olšák
     [not found]     ` <CAAxE2A65deE+hue7oPMQ+=K5s8TEGZE+bJMNVOdp=LQuq_RnMQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-05-20  1:26       ` Michel Dänzer
     [not found]         ` <CAAxE2A6ZiNSswhArstqHpYnm30ieoDZj=j_-aFvZz1_ByXLOGA@mail.gmail.com>
     [not found]           ` <CAAxE2A4zdgFRh-GDqyDRrXmEWqjSrkfCo=Z41hfo-UNY0E5FhA@mail.gmail.com>
     [not found]             ` <CAAxE2A4zdgFRh-GDqyDRrXmEWqjSrkfCo=Z41hfo-UNY0E5FhA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-05-20  9:26               ` Marek Olšák
     [not found]                 ` <CAAxE2A67T3Mhw68thNewFtcdua3ufSP-FDX_RN153x-aLKUh1Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-05-20  9:35                   ` Michel Dänzer
2017-05-22 10:00                   ` Michel Dänzer
     [not found]                     ` <7649341f-6407-1a12-9674-fb65076125fc-otUistvHUpPR7s880joybQ@public.gmane.org>
2017-05-22 10:09                       ` Marek Olšák
     [not found]                         ` <CAAxE2A446KFb+CiwdZusvdgRSfQS8jLgsCy4K7L0_tc00pvWKA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-05-22 18:20                           ` John Brooks
2017-05-23  0:45                           ` Michel Dänzer
     [not found]                             ` <c3dfa73a-e0c1-c6e9-0264-3c52e444c297-otUistvHUpPR7s880joybQ@public.gmane.org>
2017-05-23 10:38                               ` Marek Olšák
     [not found]                                 ` <CAAxE2A7tKvvmZ+qf8f06FJODWNLgHcv-XPiJJgLana-PcXBR_g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-05-24  7:56                                   ` Michel Dänzer
     [not found]                                     ` <f04e8239-e8cb-1c6c-a0ae-581facadb3cf-otUistvHUpPR7s880joybQ@public.gmane.org>
2017-05-24 11:03                                       ` Marek Olšák
     [not found]                                         ` <CAAxE2A5uha-MPx4B=ZeW9kf2qQXOTNcK7sxMJDtEk54S=us47g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-05-24 11:27                                           ` Christian König
     [not found]                                             ` <eafd4ca6-fbbf-2bb4-be97-16f54fe402dd-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org>
2017-05-25  3:31                                               ` Michel Dänzer
     [not found]                                                 ` <0535f00f-9095-6209-72e9-e6bb46569a63-otUistvHUpPR7s880joybQ@public.gmane.org>
2017-05-25 11:24                                                   ` Marek Olšák
     [not found]                                                     ` <CAAxE2A6YFjK1VPuxW=HROqKX+-ejvVJ=YneULp+cDkj033YAbA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-05-26  0:46                                                       ` Michel Dänzer

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.