amd-gfx.lists.freedesktop.org archive mirror
* [PATCH 1/3] drm/amdgpu: add AMDGPU_GEM_CREATE_DISCARDABLE
@ 2022-05-06 11:23 Christian König
  2022-05-06 11:23 ` [PATCH 2/3] drm/amdgpu: add AMDGPU_VM_NOALLOC Christian König
                   ` (3 more replies)
  0 siblings, 4 replies; 32+ messages in thread
From: Christian König @ 2022-05-06 11:23 UTC (permalink / raw)
  To: Marek.Olsak, amd-gfx

Add an AMDGPU_GEM_CREATE_DISCARDABLE flag to note that the content of a BO
doesn't need to be preserved during eviction.

KFD was already using similar functionality for SVM BOs, so replace the
internal flag with the new UAPI.

Only compile tested!

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c    | 4 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.h | 1 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c    | 2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c       | 2 +-
 include/uapi/drm/amdgpu_drm.h              | 4 ++++
 6 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index 2e16484bf606..bf97d8f07f57 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -302,8 +302,8 @@ int amdgpu_gem_create_ioctl(struct drm_device *dev, void *data,
 		      AMDGPU_GEM_CREATE_VRAM_CLEARED |
 		      AMDGPU_GEM_CREATE_VM_ALWAYS_VALID |
 		      AMDGPU_GEM_CREATE_EXPLICIT_SYNC |
-		      AMDGPU_GEM_CREATE_ENCRYPTED))
-
+		      AMDGPU_GEM_CREATE_ENCRYPTED |
+		      AMDGPU_GEM_CREATE_DISCARDABLE))
 		return -EINVAL;
 
 	/* reject invalid gem domains */
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 8b7ee1142d9a..1944ef37a61e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -567,6 +567,7 @@ int amdgpu_bo_create(struct amdgpu_device *adev,
 		bp->domain;
 	bo->allowed_domains = bo->preferred_domains;
 	if (bp->type != ttm_bo_type_kernel &&
+	    !(bp->flags & AMDGPU_GEM_CREATE_DISCARDABLE) &&
 	    bo->allowed_domains == AMDGPU_GEM_DOMAIN_VRAM)
 		bo->allowed_domains |= AMDGPU_GEM_DOMAIN_GTT;
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
index 4c9cbdc66995..147b79c10cbb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
@@ -41,7 +41,6 @@
 
 /* BO flag to indicate a KFD userptr BO */
 #define AMDGPU_AMDKFD_CREATE_USERPTR_BO	(1ULL << 63)
-#define AMDGPU_AMDKFD_CREATE_SVM_BO	(1ULL << 62)
 
 #define to_amdgpu_bo_user(abo) container_of((abo), struct amdgpu_bo_user, bo)
 #define to_amdgpu_bo_vm(abo) container_of((abo), struct amdgpu_bo_vm, bo)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 41d6f604813d..ba3221a25e75 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -117,7 +117,7 @@ static void amdgpu_evict_flags(struct ttm_buffer_object *bo,
 	}
 
 	abo = ttm_to_amdgpu_bo(bo);
-	if (abo->flags & AMDGPU_AMDKFD_CREATE_SVM_BO) {
+	if (abo->flags & AMDGPU_GEM_CREATE_DISCARDABLE) {
 		placement->num_placement = 0;
 		placement->num_busy_placement = 0;
 		return;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 5ed8d9b549a4..835b5187f0b8 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -531,7 +531,7 @@ svm_range_vram_node_new(struct amdgpu_device *adev, struct svm_range *prange,
 	bp.domain = AMDGPU_GEM_DOMAIN_VRAM;
 	bp.flags = AMDGPU_GEM_CREATE_NO_CPU_ACCESS;
 	bp.flags |= clear ? AMDGPU_GEM_CREATE_VRAM_CLEARED : 0;
-	bp.flags |= AMDGPU_AMDKFD_CREATE_SVM_BO;
+	bp.flags |= AMDGPU_GEM_CREATE_DISCARDABLE;
 	bp.type = ttm_bo_type_device;
 	bp.resv = NULL;
 
diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index 9a1d210d135d..57b9d8f0133a 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -140,6 +140,10 @@ extern "C" {
  * not require GTT memory accounting
  */
 #define AMDGPU_GEM_CREATE_PREEMPTIBLE		(1 << 11)
+/* Flag that BO can be discarded under memory pressure without keeping the
+ * content.
+ */
+#define AMDGPU_GEM_CREATE_DISCARDABLE		(1 << 12)
 
 struct drm_amdgpu_gem_create_in  {
 	/** the requested memory size */
-- 
2.25.1
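As a rough illustration of what the validation hunk above does, here is a standalone model of the flag check. The helper name and the abbreviated mask are inventions of this sketch; the real mask in amdgpu_gem_create_ioctl() lists every accepted flag, of which only the ones relevant to this patch are reproduced here:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag values copied from include/uapi/drm/amdgpu_drm.h */
#define AMDGPU_GEM_CREATE_ENCRYPTED	(1 << 10)
#define AMDGPU_GEM_CREATE_PREEMPTIBLE	(1 << 11)
#define AMDGPU_GEM_CREATE_DISCARDABLE	(1 << 12)

/*
 * Simplified model of the ioctl's flag validation: any bit outside the
 * accepted mask makes the request fail (-EINVAL in the kernel). The
 * mask is abbreviated to the flags relevant to this patch.
 */
bool gem_create_flags_valid(uint64_t flags)
{
	const uint64_t accepted = AMDGPU_GEM_CREATE_ENCRYPTED |
				  AMDGPU_GEM_CREATE_PREEMPTIBLE |
				  AMDGPU_GEM_CREATE_DISCARDABLE;

	return (flags & ~accepted) == 0;
}
```

With this patch applied, a request carrying AMDGPU_GEM_CREATE_DISCARDABLE passes the check; before it, that bit fell outside the mask and the ioctl returned -EINVAL.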



* [PATCH 2/3] drm/amdgpu: add AMDGPU_VM_NOALLOC
  2022-05-06 11:23 [PATCH 1/3] drm/amdgpu: add AMDGPU_GEM_CREATE_DISCARDABLE Christian König
@ 2022-05-06 11:23 ` Christian König
  2022-05-10 21:21   ` Marek Olšák
  2022-05-06 11:23 ` [PATCH 3/3] drm/amdgpu: bump minor version number Christian König
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 32+ messages in thread
From: Christian König @ 2022-05-06 11:23 UTC (permalink / raw)
  To: Marek.Olsak, amd-gfx

Add the AMDGPU_VM_NOALLOC flag to let userspace control MALL allocation.

Only compile tested!

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 2 ++
 drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c  | 3 +++
 drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c  | 3 +++
 include/uapi/drm/amdgpu_drm.h           | 2 ++
 4 files changed, 10 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index bf97d8f07f57..d8129626581f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -650,6 +650,8 @@ uint64_t amdgpu_gem_va_map_flags(struct amdgpu_device *adev, uint32_t flags)
 		pte_flag |= AMDGPU_PTE_WRITEABLE;
 	if (flags & AMDGPU_VM_PAGE_PRT)
 		pte_flag |= AMDGPU_PTE_PRT;
+	if (flags & AMDGPU_VM_PAGE_NOALLOC)
+		pte_flag |= AMDGPU_PTE_NOALLOC;
 
 	if (adev->gmc.gmc_funcs->map_mtype)
 		pte_flag |= amdgpu_gmc_map_mtype(adev,
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
index b8c79789e1e4..9077dfccaf3c 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
@@ -613,6 +613,9 @@ static void gmc_v10_0_get_vm_pte(struct amdgpu_device *adev,
 	*flags &= ~AMDGPU_PTE_MTYPE_NV10_MASK;
 	*flags |= (mapping->flags & AMDGPU_PTE_MTYPE_NV10_MASK);
 
+	*flags &= ~AMDGPU_PTE_NOALLOC;
+	*flags |= (mapping->flags & AMDGPU_PTE_NOALLOC);
+
 	if (mapping->flags & AMDGPU_PTE_PRT) {
 		*flags |= AMDGPU_PTE_PRT;
 		*flags |= AMDGPU_PTE_SNOOPED;
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
index 8d733eeac556..32ee56adb602 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
@@ -508,6 +508,9 @@ static void gmc_v11_0_get_vm_pte(struct amdgpu_device *adev,
 	*flags &= ~AMDGPU_PTE_MTYPE_NV10_MASK;
 	*flags |= (mapping->flags & AMDGPU_PTE_MTYPE_NV10_MASK);
 
+	*flags &= ~AMDGPU_PTE_NOALLOC;
+	*flags |= (mapping->flags & AMDGPU_PTE_NOALLOC);
+
 	if (mapping->flags & AMDGPU_PTE_PRT) {
 		*flags |= AMDGPU_PTE_PRT;
 		*flags |= AMDGPU_PTE_SNOOPED;
diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index 57b9d8f0133a..9d71d6330687 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -533,6 +533,8 @@ struct drm_amdgpu_gem_op {
 #define AMDGPU_VM_MTYPE_UC		(4 << 5)
 /* Use Read Write MTYPE instead of default MTYPE */
 #define AMDGPU_VM_MTYPE_RW		(5 << 5)
+/* don't allocate MALL */
+#define AMDGPU_VM_PAGE_NOALLOC		(1 << 9)
 
 struct drm_amdgpu_gem_va {
 	/** GEM object handle */
-- 
2.25.1
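The translation done by the amdgpu_gem_va_map_flags() hunk can be modelled in isolation. AMDGPU_VM_PAGE_NOALLOC is the real UAPI value from this patch, but the PTE bit position below is a placeholder for this sketch only, not the kernel's actual AMDGPU_PTE_NOALLOC definition:

```c
#include <assert.h>
#include <stdint.h>

/* UAPI mapping flag added by this patch */
#define AMDGPU_VM_PAGE_NOALLOC	(1 << 9)

/*
 * Placeholder PTE bit for this sketch only -- the real AMDGPU_PTE_NOALLOC
 * value is defined in the kernel's amdgpu.h and may differ.
 */
#define PTE_NOALLOC_SKETCH	(1ULL << 58)

/*
 * Simplified model of the amdgpu_gem_va_map_flags() hunk: translate the
 * userspace mapping flag into the corresponding hardware PTE bit.
 */
uint64_t map_flags_sketch(uint32_t vm_flags)
{
	uint64_t pte_flag = 0;

	if (vm_flags & AMDGPU_VM_PAGE_NOALLOC)
		pte_flag |= PTE_NOALLOC_SKETCH;
	return pte_flag;
}
```

The gmc_v10_0/gmc_v11_0 hunks then do the reverse bookkeeping on the mapping side: clear the PTE bit and copy it back in from the stored mapping flags.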



* [PATCH 3/3] drm/amdgpu: bump minor version number
  2022-05-06 11:23 [PATCH 1/3] drm/amdgpu: add AMDGPU_GEM_CREATE_DISCARDABLE Christian König
  2022-05-06 11:23 ` [PATCH 2/3] drm/amdgpu: add AMDGPU_VM_NOALLOC Christian König
@ 2022-05-06 11:23 ` Christian König
  2022-05-06 13:34   ` Alex Deucher
  2022-05-06 15:04 ` [PATCH 1/3] drm/amdgpu: add AMDGPU_GEM_CREATE_DISCARDABLE Felix Kuehling
  2022-05-10 20:13 ` Marek Olšák
  3 siblings, 1 reply; 32+ messages in thread
From: Christian König @ 2022-05-06 11:23 UTC (permalink / raw)
  To: Marek.Olsak, amd-gfx

Increase the minor version number to indicate that the new flags are
avaiable.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 16871baee784..3dbf406b4194 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -99,10 +99,11 @@
  * - 3.43.0 - Add device hot plug/unplug support
  * - 3.44.0 - DCN3 supports DCC independent block settings: !64B && 128B, 64B && 128B
  * - 3.45.0 - Add context ioctl stable pstate interface
- * * 3.46.0 - To enable hot plug amdgpu tests in libdrm
+ * - 3.46.0 - To enable hot plug amdgpu tests in libdrm
+ * * 3.47.0 - Add AMDGPU_GEM_CREATE_DISCARDABLE and AMDGPU_VM_NOALLOC flags
  */
 #define KMS_DRIVER_MAJOR	3
-#define KMS_DRIVER_MINOR	46
+#define KMS_DRIVER_MINOR	47
 #define KMS_DRIVER_PATCHLEVEL	0
 
 int amdgpu_vram_limit;
-- 
2.25.1
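Userspace should only set the new flags once the kernel advertises them. A minimal sketch of the version gate (the helper name is invented; in practice the major/minor numbers would come from drmGetVersion() in libdrm):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * The flags from this series are only usable when the kernel reports
 * DRM version 3.47 or newer, per the bump in this patch.
 */
bool kms_has_discardable_noalloc(int major, int minor)
{
	return major > 3 || (major == 3 && minor >= 47);
}
```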



* Re: [PATCH 3/3] drm/amdgpu: bump minor version number
  2022-05-06 11:23 ` [PATCH 3/3] drm/amdgpu: bump minor version number Christian König
@ 2022-05-06 13:34   ` Alex Deucher
  2022-05-12  2:38     ` Marek Olšák
  0 siblings, 1 reply; 32+ messages in thread
From: Alex Deucher @ 2022-05-06 13:34 UTC (permalink / raw)
  To: Christian König; +Cc: Marek Olšák, amd-gfx list

On Fri, May 6, 2022 at 7:23 AM Christian König
<ckoenig.leichtzumerken@gmail.com> wrote:
>
> Increase the minor version number to indicate that the new flags are
> avaiable.

typo: available.  Other than that the series is:
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Once we get the Mesa patches.

Alex


>
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index 16871baee784..3dbf406b4194 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -99,10 +99,11 @@
>   * - 3.43.0 - Add device hot plug/unplug support
>   * - 3.44.0 - DCN3 supports DCC independent block settings: !64B && 128B, 64B && 128B
>   * - 3.45.0 - Add context ioctl stable pstate interface
> - * * 3.46.0 - To enable hot plug amdgpu tests in libdrm
> + * - 3.46.0 - To enable hot plug amdgpu tests in libdrm
> + * * 3.47.0 - Add AMDGPU_GEM_CREATE_DISCARDABLE and AMDGPU_VM_NOALLOC flags
>   */
>  #define KMS_DRIVER_MAJOR       3
> -#define KMS_DRIVER_MINOR       46
> +#define KMS_DRIVER_MINOR       47
>  #define KMS_DRIVER_PATCHLEVEL  0
>
>  int amdgpu_vram_limit;
> --
> 2.25.1
>


* Re: [PATCH 1/3] drm/amdgpu: add AMDGPU_GEM_CREATE_DISCARDABLE
  2022-05-06 11:23 [PATCH 1/3] drm/amdgpu: add AMDGPU_GEM_CREATE_DISCARDABLE Christian König
  2022-05-06 11:23 ` [PATCH 2/3] drm/amdgpu: add AMDGPU_VM_NOALLOC Christian König
  2022-05-06 11:23 ` [PATCH 3/3] drm/amdgpu: bump minor version number Christian König
@ 2022-05-06 15:04 ` Felix Kuehling
  2022-05-10 20:13 ` Marek Olšák
  3 siblings, 0 replies; 32+ messages in thread
From: Felix Kuehling @ 2022-05-06 15:04 UTC (permalink / raw)
  To: Christian König, Marek.Olsak, amd-gfx


On 2022-05-06 at 07:23, Christian König wrote:
> Add an AMDGPU_GEM_CREATE_DISCARDABLE flag to note that the content of a BO
> doesn't need to be preserved during eviction.
>
> KFD was already using a similar functionality for SVM BOs so replace the
> internal flag with the new UAPI.
>
> Only compile tested!
>
> Signed-off-by: Christian König <christian.koenig@amd.com>

Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>


> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c    | 4 ++--
>   drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 1 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_object.h | 1 -
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c    | 2 +-
>   drivers/gpu/drm/amd/amdkfd/kfd_svm.c       | 2 +-
>   include/uapi/drm/amdgpu_drm.h              | 4 ++++
>   6 files changed, 9 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> index 2e16484bf606..bf97d8f07f57 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> @@ -302,8 +302,8 @@ int amdgpu_gem_create_ioctl(struct drm_device *dev, void *data,
>   		      AMDGPU_GEM_CREATE_VRAM_CLEARED |
>   		      AMDGPU_GEM_CREATE_VM_ALWAYS_VALID |
>   		      AMDGPU_GEM_CREATE_EXPLICIT_SYNC |
> -		      AMDGPU_GEM_CREATE_ENCRYPTED))
> -
> +		      AMDGPU_GEM_CREATE_ENCRYPTED |
> +		      AMDGPU_GEM_CREATE_DISCARDABLE))
>   		return -EINVAL;
>   
>   	/* reject invalid gem domains */
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> index 8b7ee1142d9a..1944ef37a61e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> @@ -567,6 +567,7 @@ int amdgpu_bo_create(struct amdgpu_device *adev,
>   		bp->domain;
>   	bo->allowed_domains = bo->preferred_domains;
>   	if (bp->type != ttm_bo_type_kernel &&
> +	    !(bp->flags & AMDGPU_GEM_CREATE_DISCARDABLE) &&
>   	    bo->allowed_domains == AMDGPU_GEM_DOMAIN_VRAM)
>   		bo->allowed_domains |= AMDGPU_GEM_DOMAIN_GTT;
>   
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
> index 4c9cbdc66995..147b79c10cbb 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
> @@ -41,7 +41,6 @@
>   
>   /* BO flag to indicate a KFD userptr BO */
>   #define AMDGPU_AMDKFD_CREATE_USERPTR_BO	(1ULL << 63)
> -#define AMDGPU_AMDKFD_CREATE_SVM_BO	(1ULL << 62)
>   
>   #define to_amdgpu_bo_user(abo) container_of((abo), struct amdgpu_bo_user, bo)
>   #define to_amdgpu_bo_vm(abo) container_of((abo), struct amdgpu_bo_vm, bo)
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index 41d6f604813d..ba3221a25e75 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -117,7 +117,7 @@ static void amdgpu_evict_flags(struct ttm_buffer_object *bo,
>   	}
>   
>   	abo = ttm_to_amdgpu_bo(bo);
> -	if (abo->flags & AMDGPU_AMDKFD_CREATE_SVM_BO) {
> +	if (abo->flags & AMDGPU_GEM_CREATE_DISCARDABLE) {
>   		placement->num_placement = 0;
>   		placement->num_busy_placement = 0;
>   		return;
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> index 5ed8d9b549a4..835b5187f0b8 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> @@ -531,7 +531,7 @@ svm_range_vram_node_new(struct amdgpu_device *adev, struct svm_range *prange,
>   	bp.domain = AMDGPU_GEM_DOMAIN_VRAM;
>   	bp.flags = AMDGPU_GEM_CREATE_NO_CPU_ACCESS;
>   	bp.flags |= clear ? AMDGPU_GEM_CREATE_VRAM_CLEARED : 0;
> -	bp.flags |= AMDGPU_AMDKFD_CREATE_SVM_BO;
> +	bp.flags |= AMDGPU_GEM_CREATE_DISCARDABLE;
>   	bp.type = ttm_bo_type_device;
>   	bp.resv = NULL;
>   
> diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
> index 9a1d210d135d..57b9d8f0133a 100644
> --- a/include/uapi/drm/amdgpu_drm.h
> +++ b/include/uapi/drm/amdgpu_drm.h
> @@ -140,6 +140,10 @@ extern "C" {
>    * not require GTT memory accounting
>    */
>   #define AMDGPU_GEM_CREATE_PREEMPTIBLE		(1 << 11)
> +/* Flag that BO can be discarded under memory pressure without keeping the
> + * content.
> + */
> +#define AMDGPU_GEM_CREATE_DISCARDABLE		(1 << 12)
>   
>   struct drm_amdgpu_gem_create_in  {
>   	/** the requested memory size */


* Re: [PATCH 1/3] drm/amdgpu: add AMDGPU_GEM_CREATE_DISCARDABLE
  2022-05-06 11:23 [PATCH 1/3] drm/amdgpu: add AMDGPU_GEM_CREATE_DISCARDABLE Christian König
                   ` (2 preceding siblings ...)
  2022-05-06 15:04 ` [PATCH 1/3] drm/amdgpu: add AMDGPU_GEM_CREATE_DISCARDABLE Felix Kuehling
@ 2022-05-10 20:13 ` Marek Olšák
  2022-05-10 20:43   ` Marek Olšák
  3 siblings, 1 reply; 32+ messages in thread
From: Marek Olšák @ 2022-05-10 20:13 UTC (permalink / raw)
  To: Christian König; +Cc: amd-gfx mailing list


Does this really guarantee VRAM placement? The code doesn't say anything
about that.

Marek


On Fri, May 6, 2022 at 7:23 AM Christian König <
ckoenig.leichtzumerken@gmail.com> wrote:

> Add an AMDGPU_GEM_CREATE_DISCARDABLE flag to note that the content of a BO
> doesn't need to be preserved during eviction.
>
> KFD was already using a similar functionality for SVM BOs so replace the
> internal flag with the new UAPI.
>
> Only compile tested!
>
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c    | 4 ++--
>  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 1 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_object.h | 1 -
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c    | 2 +-
>  drivers/gpu/drm/amd/amdkfd/kfd_svm.c       | 2 +-
>  include/uapi/drm/amdgpu_drm.h              | 4 ++++
>  6 files changed, 9 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> index 2e16484bf606..bf97d8f07f57 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> @@ -302,8 +302,8 @@ int amdgpu_gem_create_ioctl(struct drm_device *dev,
> void *data,
>                       AMDGPU_GEM_CREATE_VRAM_CLEARED |
>                       AMDGPU_GEM_CREATE_VM_ALWAYS_VALID |
>                       AMDGPU_GEM_CREATE_EXPLICIT_SYNC |
> -                     AMDGPU_GEM_CREATE_ENCRYPTED))
> -
> +                     AMDGPU_GEM_CREATE_ENCRYPTED |
> +                     AMDGPU_GEM_CREATE_DISCARDABLE))
>                 return -EINVAL;
>
>         /* reject invalid gem domains */
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> index 8b7ee1142d9a..1944ef37a61e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> @@ -567,6 +567,7 @@ int amdgpu_bo_create(struct amdgpu_device *adev,
>                 bp->domain;
>         bo->allowed_domains = bo->preferred_domains;
>         if (bp->type != ttm_bo_type_kernel &&
> +           !(bp->flags & AMDGPU_GEM_CREATE_DISCARDABLE) &&
>             bo->allowed_domains == AMDGPU_GEM_DOMAIN_VRAM)
>                 bo->allowed_domains |= AMDGPU_GEM_DOMAIN_GTT;
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
> index 4c9cbdc66995..147b79c10cbb 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
> @@ -41,7 +41,6 @@
>
>  /* BO flag to indicate a KFD userptr BO */
>  #define AMDGPU_AMDKFD_CREATE_USERPTR_BO        (1ULL << 63)
> -#define AMDGPU_AMDKFD_CREATE_SVM_BO    (1ULL << 62)
>
>  #define to_amdgpu_bo_user(abo) container_of((abo), struct amdgpu_bo_user,
> bo)
>  #define to_amdgpu_bo_vm(abo) container_of((abo), struct amdgpu_bo_vm, bo)
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index 41d6f604813d..ba3221a25e75 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -117,7 +117,7 @@ static void amdgpu_evict_flags(struct
> ttm_buffer_object *bo,
>         }
>
>         abo = ttm_to_amdgpu_bo(bo);
> -       if (abo->flags & AMDGPU_AMDKFD_CREATE_SVM_BO) {
> +       if (abo->flags & AMDGPU_GEM_CREATE_DISCARDABLE) {
>                 placement->num_placement = 0;
>                 placement->num_busy_placement = 0;
>                 return;
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> index 5ed8d9b549a4..835b5187f0b8 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
> @@ -531,7 +531,7 @@ svm_range_vram_node_new(struct amdgpu_device *adev,
> struct svm_range *prange,
>         bp.domain = AMDGPU_GEM_DOMAIN_VRAM;
>         bp.flags = AMDGPU_GEM_CREATE_NO_CPU_ACCESS;
>         bp.flags |= clear ? AMDGPU_GEM_CREATE_VRAM_CLEARED : 0;
> -       bp.flags |= AMDGPU_AMDKFD_CREATE_SVM_BO;
> +       bp.flags |= AMDGPU_GEM_CREATE_DISCARDABLE;
>         bp.type = ttm_bo_type_device;
>         bp.resv = NULL;
>
> diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
> index 9a1d210d135d..57b9d8f0133a 100644
> --- a/include/uapi/drm/amdgpu_drm.h
> +++ b/include/uapi/drm/amdgpu_drm.h
> @@ -140,6 +140,10 @@ extern "C" {
>   * not require GTT memory accounting
>   */
>  #define AMDGPU_GEM_CREATE_PREEMPTIBLE          (1 << 11)
> +/* Flag that BO can be discarded under memory pressure without keeping the
> + * content.
> + */
> +#define AMDGPU_GEM_CREATE_DISCARDABLE          (1 << 12)
>
>  struct drm_amdgpu_gem_create_in  {
>         /** the requested memory size */
> --
> 2.25.1
>
>


* Re: [PATCH 1/3] drm/amdgpu: add AMDGPU_GEM_CREATE_DISCARDABLE
  2022-05-10 20:13 ` Marek Olšák
@ 2022-05-10 20:43   ` Marek Olšák
  2022-05-11  6:04     ` Christian König
  0 siblings, 1 reply; 32+ messages in thread
From: Marek Olšák @ 2022-05-10 20:43 UTC (permalink / raw)
  To: Christian König; +Cc: amd-gfx mailing list


A better flag name would be:
AMDGPU_GEM_CREATE_BEST_PLACEMENT_OR_DISCARD

Marek

On Tue, May 10, 2022 at 4:13 PM Marek Olšák <maraeo@gmail.com> wrote:

> Does this really guarantee VRAM placement? The code doesn't say anything
> about that.
>
> Marek
>
>
> On Fri, May 6, 2022 at 7:23 AM Christian König <
> ckoenig.leichtzumerken@gmail.com> wrote:
>
>> Add an AMDGPU_GEM_CREATE_DISCARDABLE flag to note that the content of a BO
>> doesn't need to be preserved during eviction.
>>
>> KFD was already using a similar functionality for SVM BOs so replace the
>> internal flag with the new UAPI.
>>
>> Only compile tested!
>>
>> Signed-off-by: Christian König <christian.koenig@amd.com>
>> ---
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c    | 4 ++--
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 1 +
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_object.h | 1 -
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c    | 2 +-
>>  drivers/gpu/drm/amd/amdkfd/kfd_svm.c       | 2 +-
>>  include/uapi/drm/amdgpu_drm.h              | 4 ++++
>>  6 files changed, 9 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>> index 2e16484bf606..bf97d8f07f57 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>> @@ -302,8 +302,8 @@ int amdgpu_gem_create_ioctl(struct drm_device *dev,
>> void *data,
>>                       AMDGPU_GEM_CREATE_VRAM_CLEARED |
>>                       AMDGPU_GEM_CREATE_VM_ALWAYS_VALID |
>>                       AMDGPU_GEM_CREATE_EXPLICIT_SYNC |
>> -                     AMDGPU_GEM_CREATE_ENCRYPTED))
>> -
>> +                     AMDGPU_GEM_CREATE_ENCRYPTED |
>> +                     AMDGPU_GEM_CREATE_DISCARDABLE))
>>                 return -EINVAL;
>>
>>         /* reject invalid gem domains */
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>> index 8b7ee1142d9a..1944ef37a61e 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>> @@ -567,6 +567,7 @@ int amdgpu_bo_create(struct amdgpu_device *adev,
>>                 bp->domain;
>>         bo->allowed_domains = bo->preferred_domains;
>>         if (bp->type != ttm_bo_type_kernel &&
>> +           !(bp->flags & AMDGPU_GEM_CREATE_DISCARDABLE) &&
>>             bo->allowed_domains == AMDGPU_GEM_DOMAIN_VRAM)
>>                 bo->allowed_domains |= AMDGPU_GEM_DOMAIN_GTT;
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
>> index 4c9cbdc66995..147b79c10cbb 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
>> @@ -41,7 +41,6 @@
>>
>>  /* BO flag to indicate a KFD userptr BO */
>>  #define AMDGPU_AMDKFD_CREATE_USERPTR_BO        (1ULL << 63)
>> -#define AMDGPU_AMDKFD_CREATE_SVM_BO    (1ULL << 62)
>>
>>  #define to_amdgpu_bo_user(abo) container_of((abo), struct
>> amdgpu_bo_user, bo)
>>  #define to_amdgpu_bo_vm(abo) container_of((abo), struct amdgpu_bo_vm, bo)
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>> index 41d6f604813d..ba3221a25e75 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>> @@ -117,7 +117,7 @@ static void amdgpu_evict_flags(struct
>> ttm_buffer_object *bo,
>>         }
>>
>>         abo = ttm_to_amdgpu_bo(bo);
>> -       if (abo->flags & AMDGPU_AMDKFD_CREATE_SVM_BO) {
>> +       if (abo->flags & AMDGPU_GEM_CREATE_DISCARDABLE) {
>>                 placement->num_placement = 0;
>>                 placement->num_busy_placement = 0;
>>                 return;
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>> b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>> index 5ed8d9b549a4..835b5187f0b8 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
>> @@ -531,7 +531,7 @@ svm_range_vram_node_new(struct amdgpu_device *adev,
>> struct svm_range *prange,
>>         bp.domain = AMDGPU_GEM_DOMAIN_VRAM;
>>         bp.flags = AMDGPU_GEM_CREATE_NO_CPU_ACCESS;
>>         bp.flags |= clear ? AMDGPU_GEM_CREATE_VRAM_CLEARED : 0;
>> -       bp.flags |= AMDGPU_AMDKFD_CREATE_SVM_BO;
>> +       bp.flags |= AMDGPU_GEM_CREATE_DISCARDABLE;
>>         bp.type = ttm_bo_type_device;
>>         bp.resv = NULL;
>>
>> diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
>> index 9a1d210d135d..57b9d8f0133a 100644
>> --- a/include/uapi/drm/amdgpu_drm.h
>> +++ b/include/uapi/drm/amdgpu_drm.h
>> @@ -140,6 +140,10 @@ extern "C" {
>>   * not require GTT memory accounting
>>   */
>>  #define AMDGPU_GEM_CREATE_PREEMPTIBLE          (1 << 11)
>> +/* Flag that BO can be discarded under memory pressure without keeping
>> the
>> + * content.
>> + */
>> +#define AMDGPU_GEM_CREATE_DISCARDABLE          (1 << 12)
>>
>>  struct drm_amdgpu_gem_create_in  {
>>         /** the requested memory size */
>> --
>> 2.25.1
>>
>>


* Re: [PATCH 2/3] drm/amdgpu: add AMDGPU_VM_NOALLOC
  2022-05-06 11:23 ` [PATCH 2/3] drm/amdgpu: add AMDGPU_VM_NOALLOC Christian König
@ 2022-05-10 21:21   ` Marek Olšák
  2022-05-11  6:06     ` Christian König
  0 siblings, 1 reply; 32+ messages in thread
From: Marek Olšák @ 2022-05-10 21:21 UTC (permalink / raw)
  To: Christian König; +Cc: amd-gfx mailing list


A better name would be:
AMDGPU_VM_PAGE_BYPASS_MALL

Marek

On Fri, May 6, 2022 at 7:23 AM Christian König <
ckoenig.leichtzumerken@gmail.com> wrote:

> Add the AMDGPU_VM_NOALLOC flag to let userspace control MALL allocation.
>
> Only compile tested!
>
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 2 ++
>  drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c  | 3 +++
>  drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c  | 3 +++
>  include/uapi/drm/amdgpu_drm.h           | 2 ++
>  4 files changed, 10 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> index bf97d8f07f57..d8129626581f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> @@ -650,6 +650,8 @@ uint64_t amdgpu_gem_va_map_flags(struct amdgpu_device
> *adev, uint32_t flags)
>                 pte_flag |= AMDGPU_PTE_WRITEABLE;
>         if (flags & AMDGPU_VM_PAGE_PRT)
>                 pte_flag |= AMDGPU_PTE_PRT;
> +       if (flags & AMDGPU_VM_PAGE_NOALLOC)
> +               pte_flag |= AMDGPU_PTE_NOALLOC;
>
>         if (adev->gmc.gmc_funcs->map_mtype)
>                 pte_flag |= amdgpu_gmc_map_mtype(adev,
> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> index b8c79789e1e4..9077dfccaf3c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> @@ -613,6 +613,9 @@ static void gmc_v10_0_get_vm_pte(struct amdgpu_device
> *adev,
>         *flags &= ~AMDGPU_PTE_MTYPE_NV10_MASK;
>         *flags |= (mapping->flags & AMDGPU_PTE_MTYPE_NV10_MASK);
>
> +       *flags &= ~AMDGPU_PTE_NOALLOC;
> +       *flags |= (mapping->flags & AMDGPU_PTE_NOALLOC);
> +
>         if (mapping->flags & AMDGPU_PTE_PRT) {
>                 *flags |= AMDGPU_PTE_PRT;
>                 *flags |= AMDGPU_PTE_SNOOPED;
> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
> b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
> index 8d733eeac556..32ee56adb602 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
> @@ -508,6 +508,9 @@ static void gmc_v11_0_get_vm_pte(struct amdgpu_device
> *adev,
>         *flags &= ~AMDGPU_PTE_MTYPE_NV10_MASK;
>         *flags |= (mapping->flags & AMDGPU_PTE_MTYPE_NV10_MASK);
>
> +       *flags &= ~AMDGPU_PTE_NOALLOC;
> +       *flags |= (mapping->flags & AMDGPU_PTE_NOALLOC);
> +
>         if (mapping->flags & AMDGPU_PTE_PRT) {
>                 *flags |= AMDGPU_PTE_PRT;
>                 *flags |= AMDGPU_PTE_SNOOPED;
> diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
> index 57b9d8f0133a..9d71d6330687 100644
> --- a/include/uapi/drm/amdgpu_drm.h
> +++ b/include/uapi/drm/amdgpu_drm.h
> @@ -533,6 +533,8 @@ struct drm_amdgpu_gem_op {
>  #define AMDGPU_VM_MTYPE_UC             (4 << 5)
>  /* Use Read Write MTYPE instead of default MTYPE */
>  #define AMDGPU_VM_MTYPE_RW             (5 << 5)
> +/* don't allocate MALL */
> +#define AMDGPU_VM_PAGE_NOALLOC         (1 << 9)
>
>  struct drm_amdgpu_gem_va {
>         /** GEM object handle */
> --
> 2.25.1
>
>


* Re: [PATCH 1/3] drm/amdgpu: add AMDGPU_GEM_CREATE_DISCARDABLE
  2022-05-10 20:43   ` Marek Olšák
@ 2022-05-11  6:04     ` Christian König
  2022-05-11  7:08       ` Marek Olšák
  0 siblings, 1 reply; 32+ messages in thread
From: Christian König @ 2022-05-11  6:04 UTC (permalink / raw)
  To: Marek Olšák; +Cc: amd-gfx mailing list


Hi Marek,

On 10.05.22 at 22:43, Marek Olšák wrote:
> A better flag name would be:
> AMDGPU_GEM_CREATE_BEST_PLACEMENT_OR_DISCARD

A bit long for my taste and I think the best placement is just a side 
effect.

>
> Marek
>
> On Tue, May 10, 2022 at 4:13 PM Marek Olšák <maraeo@gmail.com> wrote:
>
>     Does this really guarantee VRAM placement? The code doesn't say
>     anything about that.
>

Yes, see the code here:

>
>         diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>         b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>         index 8b7ee1142d9a..1944ef37a61e 100644
>         --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>         +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>         @@ -567,6 +567,7 @@ int amdgpu_bo_create(struct amdgpu_device
>         *adev,
>                         bp->domain;
>                 bo->allowed_domains = bo->preferred_domains;
>                 if (bp->type != ttm_bo_type_kernel &&
>         +           !(bp->flags & AMDGPU_GEM_CREATE_DISCARDABLE) &&
>                     bo->allowed_domains == AMDGPU_GEM_DOMAIN_VRAM)
>                         bo->allowed_domains |= AMDGPU_GEM_DOMAIN_GTT;
>

The only case where this could be circumvented is when you try to 
allocate more VRAM than is physically available on an APU.

E.g. you only have something like 32 MiB of VRAM and request 64 MiB; then 
the GEM code will catch the error and fall back to GTT (IIRC).

Regards,
Christian.


* Re: [PATCH 2/3] drm/amdgpu: add AMDGPU_VM_NOALLOC
  2022-05-10 21:21   ` Marek Olšák
@ 2022-05-11  6:06     ` Christian König
  2022-05-11  6:15       ` Lazar, Lijo
  0 siblings, 1 reply; 32+ messages in thread
From: Christian König @ 2022-05-11  6:06 UTC (permalink / raw)
  To: Marek Olšák; +Cc: amd-gfx mailing list


Mhm, it doesn't really bypass MALL. It just doesn't allocate any MALL 
entries on write.

How about AMDGPU_VM_PAGE_NO_MALL ?

Christian.

Am 10.05.22 um 23:21 schrieb Marek Olšák:
> A better name would be:
> AMDGPU_VM_PAGE_BYPASS_MALL
>
> Marek
>
> On Fri, May 6, 2022 at 7:23 AM Christian König 
> <ckoenig.leichtzumerken@gmail.com> wrote:
>
>     Add the AMDGPU_VM_NOALLOC flag to let userspace control MALL
>     allocation.
>
>     Only compile tested!
>
>     Signed-off-by: Christian König <christian.koenig@amd.com>
>     ---
>      drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 2 ++
>      drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c  | 3 +++
>      drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c  | 3 +++
>      include/uapi/drm/amdgpu_drm.h           | 2 ++
>      4 files changed, 10 insertions(+)
>
>     diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>     b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>     index bf97d8f07f57..d8129626581f 100644
>     --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>     +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>     @@ -650,6 +650,8 @@ uint64_t amdgpu_gem_va_map_flags(struct
>     amdgpu_device *adev, uint32_t flags)
>                     pte_flag |= AMDGPU_PTE_WRITEABLE;
>             if (flags & AMDGPU_VM_PAGE_PRT)
>                     pte_flag |= AMDGPU_PTE_PRT;
>     +       if (flags & AMDGPU_VM_PAGE_NOALLOC)
>     +               pte_flag |= AMDGPU_PTE_NOALLOC;
>
>             if (adev->gmc.gmc_funcs->map_mtype)
>                     pte_flag |= amdgpu_gmc_map_mtype(adev,
>     diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>     b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>     index b8c79789e1e4..9077dfccaf3c 100644
>     --- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>     +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>     @@ -613,6 +613,9 @@ static void gmc_v10_0_get_vm_pte(struct
>     amdgpu_device *adev,
>             *flags &= ~AMDGPU_PTE_MTYPE_NV10_MASK;
>             *flags |= (mapping->flags & AMDGPU_PTE_MTYPE_NV10_MASK);
>
>     +       *flags &= ~AMDGPU_PTE_NOALLOC;
>     +       *flags |= (mapping->flags & AMDGPU_PTE_NOALLOC);
>     +
>             if (mapping->flags & AMDGPU_PTE_PRT) {
>                     *flags |= AMDGPU_PTE_PRT;
>                     *flags |= AMDGPU_PTE_SNOOPED;
>     diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
>     b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
>     index 8d733eeac556..32ee56adb602 100644
>     --- a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
>     +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
>     @@ -508,6 +508,9 @@ static void gmc_v11_0_get_vm_pte(struct
>     amdgpu_device *adev,
>             *flags &= ~AMDGPU_PTE_MTYPE_NV10_MASK;
>             *flags |= (mapping->flags & AMDGPU_PTE_MTYPE_NV10_MASK);
>
>     +       *flags &= ~AMDGPU_PTE_NOALLOC;
>     +       *flags |= (mapping->flags & AMDGPU_PTE_NOALLOC);
>     +
>             if (mapping->flags & AMDGPU_PTE_PRT) {
>                     *flags |= AMDGPU_PTE_PRT;
>                     *flags |= AMDGPU_PTE_SNOOPED;
>     diff --git a/include/uapi/drm/amdgpu_drm.h
>     b/include/uapi/drm/amdgpu_drm.h
>     index 57b9d8f0133a..9d71d6330687 100644
>     --- a/include/uapi/drm/amdgpu_drm.h
>     +++ b/include/uapi/drm/amdgpu_drm.h
>     @@ -533,6 +533,8 @@ struct drm_amdgpu_gem_op {
>      #define AMDGPU_VM_MTYPE_UC             (4 << 5)
>      /* Use Read Write MTYPE instead of default MTYPE */
>      #define AMDGPU_VM_MTYPE_RW             (5 << 5)
>     +/* don't allocate MALL */
>     +#define AMDGPU_VM_PAGE_NOALLOC         (1 << 9)
>
>      struct drm_amdgpu_gem_va {
>             /** GEM object handle */
>     -- 
>     2.25.1
>


* Re: [PATCH 2/3] drm/amdgpu: add AMDGPU_VM_NOALLOC
  2022-05-11  6:06     ` Christian König
@ 2022-05-11  6:15       ` Lazar, Lijo
  2022-05-11  7:22         ` Marek Olšák
  0 siblings, 1 reply; 32+ messages in thread
From: Lazar, Lijo @ 2022-05-11  6:15 UTC (permalink / raw)
  To: Christian König, Marek Olšák; +Cc: amd-gfx mailing list



On 5/11/2022 11:36 AM, Christian König wrote:
> Mhm, it doesn't really bypass MALL. It just doesn't allocate any MALL 
> entries on write.
> 
> How about AMDGPU_VM_PAGE_NO_MALL ?

One more - AMDGPU_VM_PAGE_LLC_* [ LLC = last level cache, * = some sort 
of attribute which decides LLC behaviour]

Thanks,
Lijo

> 
> Christian.
> 
> Am 10.05.22 um 23:21 schrieb Marek Olšák:
>> A better name would be:
>> AMDGPU_VM_PAGE_BYPASS_MALL
>>
>> Marek
>>
>> On Fri, May 6, 2022 at 7:23 AM Christian König 
>> <ckoenig.leichtzumerken@gmail.com> wrote:
>>
>>     Add the AMDGPU_VM_NOALLOC flag to let userspace control MALL
>>     allocation.
>>
>>     Only compile tested!
>>
>>     Signed-off-by: Christian König <christian.koenig@amd.com>
>>     ---
>>      drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 2 ++
>>      drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c  | 3 +++
>>      drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c  | 3 +++
>>      include/uapi/drm/amdgpu_drm.h           | 2 ++
>>      4 files changed, 10 insertions(+)
>>
>>     diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>     b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>     index bf97d8f07f57..d8129626581f 100644
>>     --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>     +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>     @@ -650,6 +650,8 @@ uint64_t amdgpu_gem_va_map_flags(struct
>>     amdgpu_device *adev, uint32_t flags)
>>                     pte_flag |= AMDGPU_PTE_WRITEABLE;
>>             if (flags & AMDGPU_VM_PAGE_PRT)
>>                     pte_flag |= AMDGPU_PTE_PRT;
>>     +       if (flags & AMDGPU_VM_PAGE_NOALLOC)
>>     +               pte_flag |= AMDGPU_PTE_NOALLOC;
>>
>>             if (adev->gmc.gmc_funcs->map_mtype)
>>                     pte_flag |= amdgpu_gmc_map_mtype(adev,
>>     diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>>     b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>>     index b8c79789e1e4..9077dfccaf3c 100644
>>     --- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>>     +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>>     @@ -613,6 +613,9 @@ static void gmc_v10_0_get_vm_pte(struct
>>     amdgpu_device *adev,
>>             *flags &= ~AMDGPU_PTE_MTYPE_NV10_MASK;
>>             *flags |= (mapping->flags & AMDGPU_PTE_MTYPE_NV10_MASK);
>>
>>     +       *flags &= ~AMDGPU_PTE_NOALLOC;
>>     +       *flags |= (mapping->flags & AMDGPU_PTE_NOALLOC);
>>     +
>>             if (mapping->flags & AMDGPU_PTE_PRT) {
>>                     *flags |= AMDGPU_PTE_PRT;
>>                     *flags |= AMDGPU_PTE_SNOOPED;
>>     diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
>>     b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
>>     index 8d733eeac556..32ee56adb602 100644
>>     --- a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
>>     +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
>>     @@ -508,6 +508,9 @@ static void gmc_v11_0_get_vm_pte(struct
>>     amdgpu_device *adev,
>>             *flags &= ~AMDGPU_PTE_MTYPE_NV10_MASK;
>>             *flags |= (mapping->flags & AMDGPU_PTE_MTYPE_NV10_MASK);
>>
>>     +       *flags &= ~AMDGPU_PTE_NOALLOC;
>>     +       *flags |= (mapping->flags & AMDGPU_PTE_NOALLOC);
>>     +
>>             if (mapping->flags & AMDGPU_PTE_PRT) {
>>                     *flags |= AMDGPU_PTE_PRT;
>>                     *flags |= AMDGPU_PTE_SNOOPED;
>>     diff --git a/include/uapi/drm/amdgpu_drm.h
>>     b/include/uapi/drm/amdgpu_drm.h
>>     index 57b9d8f0133a..9d71d6330687 100644
>>     --- a/include/uapi/drm/amdgpu_drm.h
>>     +++ b/include/uapi/drm/amdgpu_drm.h
>>     @@ -533,6 +533,8 @@ struct drm_amdgpu_gem_op {
>>      #define AMDGPU_VM_MTYPE_UC             (4 << 5)
>>      /* Use Read Write MTYPE instead of default MTYPE */
>>      #define AMDGPU_VM_MTYPE_RW             (5 << 5)
>>     +/* don't allocate MALL */
>>     +#define AMDGPU_VM_PAGE_NOALLOC         (1 << 9)
>>
>>      struct drm_amdgpu_gem_va {
>>             /** GEM object handle */
>>     -- 
>>     2.25.1
>>
> 


* Re: [PATCH 1/3] drm/amdgpu: add AMDGPU_GEM_CREATE_DISCARDABLE
  2022-05-11  6:04     ` Christian König
@ 2022-05-11  7:08       ` Marek Olšák
  2022-05-11 21:58         ` Marek Olšák
  0 siblings, 1 reply; 32+ messages in thread
From: Marek Olšák @ 2022-05-11  7:08 UTC (permalink / raw)
  To: Christian König; +Cc: amd-gfx mailing list


OK that sounds good.

Marek

On Wed, May 11, 2022 at 2:04 AM Christian König <
ckoenig.leichtzumerken@gmail.com> wrote:

> Hi Marek,
>
> Am 10.05.22 um 22:43 schrieb Marek Olšák:
>
> A better flag name would be:
> AMDGPU_GEM_CREATE_BEST_PLACEMENT_OR_DISCARD
>
>
> A bit long for my taste and I think the best placement is just a side
> effect.
>
>
> Marek
>
> On Tue, May 10, 2022 at 4:13 PM Marek Olšák <maraeo@gmail.com> wrote:
>
>> Does this really guarantee VRAM placement? The code doesn't say anything
>> about that.
>>
>
> Yes, see the code here:
>
>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>> index 8b7ee1142d9a..1944ef37a61e 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>> @@ -567,6 +567,7 @@ int amdgpu_bo_create(struct amdgpu_device *adev,
>>>                 bp->domain;
>>>         bo->allowed_domains = bo->preferred_domains;
>>>         if (bp->type != ttm_bo_type_kernel &&
>>> +           !(bp->flags & AMDGPU_GEM_CREATE_DISCARDABLE) &&
>>>             bo->allowed_domains == AMDGPU_GEM_DOMAIN_VRAM)
>>>                 bo->allowed_domains |= AMDGPU_GEM_DOMAIN_GTT;
>>>
>>
> The only case where this could be circumvented is when you try to allocate
> more than physically available on an APU.
>
> E.g. you only have something like 32 MiB VRAM and request 64 MiB, then the
> GEM code will catch the error and fallback to GTT (IIRC).
>
> Regards,
> Christian.
>


* Re: [PATCH 2/3] drm/amdgpu: add AMDGPU_VM_NOALLOC
  2022-05-11  6:15       ` Lazar, Lijo
@ 2022-05-11  7:22         ` Marek Olšák
  2022-05-11  7:43           ` Christian König
  0 siblings, 1 reply; 32+ messages in thread
From: Marek Olšák @ 2022-05-11  7:22 UTC (permalink / raw)
  To: Lazar, Lijo; +Cc: Christian König, amd-gfx mailing list


Bypass means that the contents of the cache are ignored, which decreases
latency at the cost of no coherency between bypassed and normal memory
requests. NOA (noalloc) means that the cache is checked and can give you
cache hits, but misses are not cached, so the overall latency is higher.
I don't know what the hw does, but I hope it was misnamed and really
means bypass: there is no point in doing a cache lookup on every memory
request if the driver wants to disable caching to *decrease* latency in
situations where the cache isn't helping.

Marek

On Wed, May 11, 2022 at 2:15 AM Lazar, Lijo <lijo.lazar@amd.com> wrote:

>
>
> On 5/11/2022 11:36 AM, Christian König wrote:
> > Mhm, it doesn't really bypass MALL. It just doesn't allocate any MALL
> > entries on write.
> >
> > How about AMDGPU_VM_PAGE_NO_MALL ?
>
> One more - AMDGPU_VM_PAGE_LLC_* [ LLC = last level cache, * = some sort
> of attribute which decides LLC behaviour]
>
> Thanks,
> Lijo
>
> >
> > Christian.
> >
> > Am 10.05.22 um 23:21 schrieb Marek Olšák:
> >> A better name would be:
> >> AMDGPU_VM_PAGE_BYPASS_MALL
> >>
> >> Marek
> >>
> >> On Fri, May 6, 2022 at 7:23 AM Christian König
> >> <ckoenig.leichtzumerken@gmail.com> wrote:
> >>
> >>     Add the AMDGPU_VM_NOALLOC flag to let userspace control MALL
> >>     allocation.
> >>
> >>     Only compile tested!
> >>
> >>     Signed-off-by: Christian König <christian.koenig@amd.com>
> >>     ---
> >>      drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 2 ++
> >>      drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c  | 3 +++
> >>      drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c  | 3 +++
> >>      include/uapi/drm/amdgpu_drm.h           | 2 ++
> >>      4 files changed, 10 insertions(+)
> >>
> >>     diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> >>     b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> >>     index bf97d8f07f57..d8129626581f 100644
> >>     --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> >>     +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> >>     @@ -650,6 +650,8 @@ uint64_t amdgpu_gem_va_map_flags(struct
> >>     amdgpu_device *adev, uint32_t flags)
> >>                     pte_flag |= AMDGPU_PTE_WRITEABLE;
> >>             if (flags & AMDGPU_VM_PAGE_PRT)
> >>                     pte_flag |= AMDGPU_PTE_PRT;
> >>     +       if (flags & AMDGPU_VM_PAGE_NOALLOC)
> >>     +               pte_flag |= AMDGPU_PTE_NOALLOC;
> >>
> >>             if (adev->gmc.gmc_funcs->map_mtype)
> >>                     pte_flag |= amdgpu_gmc_map_mtype(adev,
> >>     diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> >>     b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> >>     index b8c79789e1e4..9077dfccaf3c 100644
> >>     --- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> >>     +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
> >>     @@ -613,6 +613,9 @@ static void gmc_v10_0_get_vm_pte(struct
> >>     amdgpu_device *adev,
> >>             *flags &= ~AMDGPU_PTE_MTYPE_NV10_MASK;
> >>             *flags |= (mapping->flags & AMDGPU_PTE_MTYPE_NV10_MASK);
> >>
> >>     +       *flags &= ~AMDGPU_PTE_NOALLOC;
> >>     +       *flags |= (mapping->flags & AMDGPU_PTE_NOALLOC);
> >>     +
> >>             if (mapping->flags & AMDGPU_PTE_PRT) {
> >>                     *flags |= AMDGPU_PTE_PRT;
> >>                     *flags |= AMDGPU_PTE_SNOOPED;
> >>     diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
> >>     b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
> >>     index 8d733eeac556..32ee56adb602 100644
> >>     --- a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
> >>     +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
> >>     @@ -508,6 +508,9 @@ static void gmc_v11_0_get_vm_pte(struct
> >>     amdgpu_device *adev,
> >>             *flags &= ~AMDGPU_PTE_MTYPE_NV10_MASK;
> >>             *flags |= (mapping->flags & AMDGPU_PTE_MTYPE_NV10_MASK);
> >>
> >>     +       *flags &= ~AMDGPU_PTE_NOALLOC;
> >>     +       *flags |= (mapping->flags & AMDGPU_PTE_NOALLOC);
> >>     +
> >>             if (mapping->flags & AMDGPU_PTE_PRT) {
> >>                     *flags |= AMDGPU_PTE_PRT;
> >>                     *flags |= AMDGPU_PTE_SNOOPED;
> >>     diff --git a/include/uapi/drm/amdgpu_drm.h
> >>     b/include/uapi/drm/amdgpu_drm.h
> >>     index 57b9d8f0133a..9d71d6330687 100644
> >>     --- a/include/uapi/drm/amdgpu_drm.h
> >>     +++ b/include/uapi/drm/amdgpu_drm.h
> >>     @@ -533,6 +533,8 @@ struct drm_amdgpu_gem_op {
> >>      #define AMDGPU_VM_MTYPE_UC             (4 << 5)
> >>      /* Use Read Write MTYPE instead of default MTYPE */
> >>      #define AMDGPU_VM_MTYPE_RW             (5 << 5)
> >>     +/* don't allocate MALL */
> >>     +#define AMDGPU_VM_PAGE_NOALLOC         (1 << 9)
> >>
> >>      struct drm_amdgpu_gem_va {
> >>             /** GEM object handle */
> >>     --
> >>     2.25.1
> >>
> >
>


* Re: [PATCH 2/3] drm/amdgpu: add AMDGPU_VM_NOALLOC
  2022-05-11  7:22         ` Marek Olšák
@ 2022-05-11  7:43           ` Christian König
  2022-05-11 18:55             ` Marek Olšák
  0 siblings, 1 reply; 32+ messages in thread
From: Christian König @ 2022-05-11  7:43 UTC (permalink / raw)
  To: Marek Olšák, Lazar, Lijo; +Cc: amd-gfx mailing list


It really *is* a NOALLOC feature. In other words there is no latency 
improvement on reads because the cache is always checked, even with the 
noalloc flag set.

The only thing it affects is that misses don't enter the cache and so 
don't cause any additional eviction pressure on cache lines.

You might want to double check with the hardware guys, but I'm something 
like 95% sure that it works this way.

Christian.

Am 11.05.22 um 09:22 schrieb Marek Olšák:
> Bypass means that the contents of the cache are ignored, which 
> decreases latency at the cost of no coherency between bypassed and 
> normal memory requests. NOA (noalloc) means that the cache is checked 
> and can give you cache hits, but misses are not cached and the overall 
> latency is higher. I don't know what the hw does, but I hope it was 
> misnamed and it really means bypass because there is no point in doing 
> cache lookups on every memory request if the driver wants to disable 
> caching to *decrease* latency in the situations when the cache isn't 
> helping.
>
> Marek
>
> On Wed, May 11, 2022 at 2:15 AM Lazar, Lijo <lijo.lazar@amd.com> wrote:
>
>
>
>     On 5/11/2022 11:36 AM, Christian König wrote:
>     > Mhm, it doesn't really bypass MALL. It just doesn't allocate any
>     MALL
>     > entries on write.
>     >
>     > How about AMDGPU_VM_PAGE_NO_MALL ?
>
>     One more - AMDGPU_VM_PAGE_LLC_* [ LLC = last level cache, * = some
>     sort
>     of attribute which decides LLC behaviour]
>
>     Thanks,
>     Lijo
>
>     >
>     > Christian.
>     >
>     > Am 10.05.22 um 23:21 schrieb Marek Olšák:
>     >> A better name would be:
>     >> AMDGPU_VM_PAGE_BYPASS_MALL
>     >>
>     >> Marek
>     >>
>     >> On Fri, May 6, 2022 at 7:23 AM Christian König
>     >> <ckoenig.leichtzumerken@gmail.com> wrote:
>     >>
>     >>     Add the AMDGPU_VM_NOALLOC flag to let userspace control MALL
>     >>     allocation.
>     >>
>     >>     Only compile tested!
>     >>
>     >>     Signed-off-by: Christian König <christian.koenig@amd.com>
>     >>     ---
>     >>      drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 2 ++
>     >>      drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c  | 3 +++
>     >>      drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c  | 3 +++
>     >>      include/uapi/drm/amdgpu_drm.h           | 2 ++
>     >>      4 files changed, 10 insertions(+)
>     >>
>     >>     diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>     >>     b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>     >>     index bf97d8f07f57..d8129626581f 100644
>     >>     --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>     >>     +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>     >>     @@ -650,6 +650,8 @@ uint64_t amdgpu_gem_va_map_flags(struct
>     >>     amdgpu_device *adev, uint32_t flags)
>     >>                     pte_flag |= AMDGPU_PTE_WRITEABLE;
>     >>             if (flags & AMDGPU_VM_PAGE_PRT)
>     >>                     pte_flag |= AMDGPU_PTE_PRT;
>     >>     +       if (flags & AMDGPU_VM_PAGE_NOALLOC)
>     >>     +               pte_flag |= AMDGPU_PTE_NOALLOC;
>     >>
>     >>             if (adev->gmc.gmc_funcs->map_mtype)
>     >>                     pte_flag |= amdgpu_gmc_map_mtype(adev,
>     >>     diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>     >>     b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>     >>     index b8c79789e1e4..9077dfccaf3c 100644
>     >>     --- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>     >>     +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>     >>     @@ -613,6 +613,9 @@ static void gmc_v10_0_get_vm_pte(struct
>     >>     amdgpu_device *adev,
>     >>             *flags &= ~AMDGPU_PTE_MTYPE_NV10_MASK;
>     >>             *flags |= (mapping->flags &
>     AMDGPU_PTE_MTYPE_NV10_MASK);
>     >>
>     >>     +       *flags &= ~AMDGPU_PTE_NOALLOC;
>     >>     +       *flags |= (mapping->flags & AMDGPU_PTE_NOALLOC);
>     >>     +
>     >>             if (mapping->flags & AMDGPU_PTE_PRT) {
>     >>                     *flags |= AMDGPU_PTE_PRT;
>     >>                     *flags |= AMDGPU_PTE_SNOOPED;
>     >>     diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
>     >>     b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
>     >>     index 8d733eeac556..32ee56adb602 100644
>     >>     --- a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
>     >>     +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
>     >>     @@ -508,6 +508,9 @@ static void gmc_v11_0_get_vm_pte(struct
>     >>     amdgpu_device *adev,
>     >>             *flags &= ~AMDGPU_PTE_MTYPE_NV10_MASK;
>     >>             *flags |= (mapping->flags &
>     AMDGPU_PTE_MTYPE_NV10_MASK);
>     >>
>     >>     +       *flags &= ~AMDGPU_PTE_NOALLOC;
>     >>     +       *flags |= (mapping->flags & AMDGPU_PTE_NOALLOC);
>     >>     +
>     >>             if (mapping->flags & AMDGPU_PTE_PRT) {
>     >>                     *flags |= AMDGPU_PTE_PRT;
>     >>                     *flags |= AMDGPU_PTE_SNOOPED;
>     >>     diff --git a/include/uapi/drm/amdgpu_drm.h
>     >>     b/include/uapi/drm/amdgpu_drm.h
>     >>     index 57b9d8f0133a..9d71d6330687 100644
>     >>     --- a/include/uapi/drm/amdgpu_drm.h
>     >>     +++ b/include/uapi/drm/amdgpu_drm.h
>     >>     @@ -533,6 +533,8 @@ struct drm_amdgpu_gem_op {
>     >>      #define AMDGPU_VM_MTYPE_UC             (4 << 5)
>     >>      /* Use Read Write MTYPE instead of default MTYPE */
>     >>      #define AMDGPU_VM_MTYPE_RW             (5 << 5)
>     >>     +/* don't allocate MALL */
>     >>     +#define AMDGPU_VM_PAGE_NOALLOC         (1 << 9)
>     >>
>     >>      struct drm_amdgpu_gem_va {
>     >>             /** GEM object handle */
>     >>     --
>     >>     2.25.1
>     >>
>     >
>


* Re: [PATCH 2/3] drm/amdgpu: add AMDGPU_VM_NOALLOC
  2022-05-11  7:43           ` Christian König
@ 2022-05-11 18:55             ` Marek Olšák
  2022-05-16 11:06               ` Marek Olšák
  0 siblings, 1 reply; 32+ messages in thread
From: Marek Olšák @ 2022-05-11 18:55 UTC (permalink / raw)
  To: Christian König; +Cc: Lazar, Lijo, amd-gfx mailing list


Ok sounds good.

Marek

On Wed., May 11, 2022, 03:43 Christian König, <
ckoenig.leichtzumerken@gmail.com> wrote:

> It really *is* a NOALLOC feature. In other words there is no latency
> improvement on reads because the cache is always checked, even with the
> noalloc flag set.
>
> The only thing it affects is that misses don't enter the cache and so don't
> cause any additional eviction pressure on cache lines.
>
> You might want to double check with the hardware guys, but I'm something
> like 95% sure that it works this way.
>
> Christian.
>
> Am 11.05.22 um 09:22 schrieb Marek Olšák:
>
> Bypass means that the contents of the cache are ignored, which decreases
> latency at the cost of no coherency between bypassed and normal memory
> requests. NOA (noalloc) means that the cache is checked and can give you
> cache hits, but misses are not cached and the overall latency is higher. I
> don't know what the hw does, but I hope it was misnamed and it really means
> bypass because there is no point in doing cache lookups on every memory
> request if the driver wants to disable caching to *decrease* latency in the
> situations when the cache isn't helping.
>
> Marek
>
> On Wed, May 11, 2022 at 2:15 AM Lazar, Lijo <lijo.lazar@amd.com> wrote:
>
>>
>>
>> On 5/11/2022 11:36 AM, Christian König wrote:
>> > Mhm, it doesn't really bypass MALL. It just doesn't allocate any MALL
>> > entries on write.
>> >
>> > How about AMDGPU_VM_PAGE_NO_MALL ?
>>
>> One more - AMDGPU_VM_PAGE_LLC_* [ LLC = last level cache, * = some sort
>> of attribute which decides LLC behaviour]
>>
>> Thanks,
>> Lijo
>>
>> >
>> > Christian.
>> >
>> > Am 10.05.22 um 23:21 schrieb Marek Olšák:
>> >> A better name would be:
>> >> AMDGPU_VM_PAGE_BYPASS_MALL
>> >>
>> >> Marek
>> >>
>> >> On Fri, May 6, 2022 at 7:23 AM Christian König
>> >> <ckoenig.leichtzumerken@gmail.com> wrote:
>> >>
>> >>     Add the AMDGPU_VM_NOALLOC flag to let userspace control MALL
>> >>     allocation.
>> >>
>> >>     Only compile tested!
>> >>
>> >>     Signed-off-by: Christian König <christian.koenig@amd.com>
>> >>     ---
>> >>      drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 2 ++
>> >>      drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c  | 3 +++
>> >>      drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c  | 3 +++
>> >>      include/uapi/drm/amdgpu_drm.h           | 2 ++
>> >>      4 files changed, 10 insertions(+)
>> >>
>> >>     diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>> >>     b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>> >>     index bf97d8f07f57..d8129626581f 100644
>> >>     --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>> >>     +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>> >>     @@ -650,6 +650,8 @@ uint64_t amdgpu_gem_va_map_flags(struct
>> >>     amdgpu_device *adev, uint32_t flags)
>> >>                     pte_flag |= AMDGPU_PTE_WRITEABLE;
>> >>             if (flags & AMDGPU_VM_PAGE_PRT)
>> >>                     pte_flag |= AMDGPU_PTE_PRT;
>> >>     +       if (flags & AMDGPU_VM_PAGE_NOALLOC)
>> >>     +               pte_flag |= AMDGPU_PTE_NOALLOC;
>> >>
>> >>             if (adev->gmc.gmc_funcs->map_mtype)
>> >>                     pte_flag |= amdgpu_gmc_map_mtype(adev,
>> >>     diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>> >>     b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>> >>     index b8c79789e1e4..9077dfccaf3c 100644
>> >>     --- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>> >>     +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>> >>     @@ -613,6 +613,9 @@ static void gmc_v10_0_get_vm_pte(struct
>> >>     amdgpu_device *adev,
>> >>             *flags &= ~AMDGPU_PTE_MTYPE_NV10_MASK;
>> >>             *flags |= (mapping->flags & AMDGPU_PTE_MTYPE_NV10_MASK);
>> >>
>> >>     +       *flags &= ~AMDGPU_PTE_NOALLOC;
>> >>     +       *flags |= (mapping->flags & AMDGPU_PTE_NOALLOC);
>> >>     +
>> >>             if (mapping->flags & AMDGPU_PTE_PRT) {
>> >>                     *flags |= AMDGPU_PTE_PRT;
>> >>                     *flags |= AMDGPU_PTE_SNOOPED;
>> >>     diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
>> >>     b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
>> >>     index 8d733eeac556..32ee56adb602 100644
>> >>     --- a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
>> >>     +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
>> >>     @@ -508,6 +508,9 @@ static void gmc_v11_0_get_vm_pte(struct
>> >>     amdgpu_device *adev,
>> >>             *flags &= ~AMDGPU_PTE_MTYPE_NV10_MASK;
>> >>             *flags |= (mapping->flags & AMDGPU_PTE_MTYPE_NV10_MASK);
>> >>
>> >>     +       *flags &= ~AMDGPU_PTE_NOALLOC;
>> >>     +       *flags |= (mapping->flags & AMDGPU_PTE_NOALLOC);
>> >>     +
>> >>             if (mapping->flags & AMDGPU_PTE_PRT) {
>> >>                     *flags |= AMDGPU_PTE_PRT;
>> >>                     *flags |= AMDGPU_PTE_SNOOPED;
>> >>     diff --git a/include/uapi/drm/amdgpu_drm.h
>> >>     b/include/uapi/drm/amdgpu_drm.h
>> >>     index 57b9d8f0133a..9d71d6330687 100644
>> >>     --- a/include/uapi/drm/amdgpu_drm.h
>> >>     +++ b/include/uapi/drm/amdgpu_drm.h
>> >>     @@ -533,6 +533,8 @@ struct drm_amdgpu_gem_op {
>> >>      #define AMDGPU_VM_MTYPE_UC             (4 << 5)
>> >>      /* Use Read Write MTYPE instead of default MTYPE */
>> >>      #define AMDGPU_VM_MTYPE_RW             (5 << 5)
>> >>     +/* don't allocate MALL */
>> >>     +#define AMDGPU_VM_PAGE_NOALLOC         (1 << 9)
>> >>
>> >>      struct drm_amdgpu_gem_va {
>> >>             /** GEM object handle */
>> >>     --
>> >>     2.25.1
>> >>
>> >
>>
>
>


* Re: [PATCH 1/3] drm/amdgpu: add AMDGPU_GEM_CREATE_DISCARDABLE
  2022-05-11  7:08       ` Marek Olšák
@ 2022-05-11 21:58         ` Marek Olšák
  2022-05-11 22:06           ` Marek Olšák
  0 siblings, 1 reply; 32+ messages in thread
From: Marek Olšák @ 2022-05-11 21:58 UTC (permalink / raw)
  To: Christian König; +Cc: amd-gfx mailing list


Will the kernel keep all discardable buffers in VRAM if VRAM is not
overcommitted by discardable buffers, or will other buffers also affect the
placement of discardable buffers?

Do evictions deallocate the buffer, or do they keep an allocation in GTT
and only skip the copy?

Thanks,
Marek

On Wed, May 11, 2022 at 3:08 AM Marek Olšák <maraeo@gmail.com> wrote:

> OK that sounds good.
>
> Marek
>
> On Wed, May 11, 2022 at 2:04 AM Christian König <
> ckoenig.leichtzumerken@gmail.com> wrote:
>
>> Hi Marek,
>>
>> Am 10.05.22 um 22:43 schrieb Marek Olšák:
>>
>> A better flag name would be:
>> AMDGPU_GEM_CREATE_BEST_PLACEMENT_OR_DISCARD
>>
>>
>> A bit long for my taste and I think the best placement is just a side
>> effect.
>>
>>
>> Marek
>>
>> On Tue, May 10, 2022 at 4:13 PM Marek Olšák <maraeo@gmail.com> wrote:
>>
>>> Does this really guarantee VRAM placement? The code doesn't say anything
>>> about that.
>>>
>>
>> Yes, see the code here:
>>
>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>>> index 8b7ee1142d9a..1944ef37a61e 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>>> @@ -567,6 +567,7 @@ int amdgpu_bo_create(struct amdgpu_device *adev,
>>>>                 bp->domain;
>>>>         bo->allowed_domains = bo->preferred_domains;
>>>>         if (bp->type != ttm_bo_type_kernel &&
>>>> +           !(bp->flags & AMDGPU_GEM_CREATE_DISCARDABLE) &&
>>>>             bo->allowed_domains == AMDGPU_GEM_DOMAIN_VRAM)
>>>>                 bo->allowed_domains |= AMDGPU_GEM_DOMAIN_GTT;
>>>>
>>>
>> The only case where this could be circumvented is when you try to
>> allocate more than physically available on an APU.
>>
>> E.g. you only have something like 32 MiB VRAM and request 64 MiB, then
>> the GEM code will catch the error and fallback to GTT (IIRC).
>>
>> Regards,
>> Christian.
>>
>


* Re: [PATCH 1/3] drm/amdgpu: add AMDGPU_GEM_CREATE_DISCARDABLE
  2022-05-11 21:58         ` Marek Olšák
@ 2022-05-11 22:06           ` Marek Olšák
  2022-05-12  7:25             ` Christian König
  0 siblings, 1 reply; 32+ messages in thread
From: Marek Olšák @ 2022-05-11 22:06 UTC (permalink / raw)
  To: Christian König; +Cc: amd-gfx mailing list


3rd question: Is it worth using this on APUs?

Thanks,
Marek

On Wed, May 11, 2022 at 5:58 PM Marek Olšák <maraeo@gmail.com> wrote:

> Will the kernel keep all discardable buffers in VRAM if VRAM is not
> overcommitted by discardable buffers, or will other buffers also affect the
> placement of discardable buffers?
>
> Do evictions deallocate the buffer, or do they keep an allocation in GTT
> and only the copy is skipped?
>
> Thanks,
> Marek
>
> On Wed, May 11, 2022 at 3:08 AM Marek Olšák <maraeo@gmail.com> wrote:
>
>> OK that sounds good.
>>
>> Marek
>>
>> On Wed, May 11, 2022 at 2:04 AM Christian König <
>> ckoenig.leichtzumerken@gmail.com> wrote:
>>
>>> Hi Marek,
>>>
>>> Am 10.05.22 um 22:43 schrieb Marek Olšák:
>>>
>>> A better flag name would be:
>>> AMDGPU_GEM_CREATE_BEST_PLACEMENT_OR_DISCARD
>>>
>>>
>>> A bit long for my taste and I think the best placement is just a side
>>> effect.
>>>
>>>
>>> Marek
>>>
>>> On Tue, May 10, 2022 at 4:13 PM Marek Olšák <maraeo@gmail.com> wrote:
>>>
>>>> Does this really guarantee VRAM placement? The code doesn't say
>>>> anything about that.
>>>>
>>>
>>> Yes, see the code here:
>>>
>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>>>> index 8b7ee1142d9a..1944ef37a61e 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>>>> @@ -567,6 +567,7 @@ int amdgpu_bo_create(struct amdgpu_device *adev,
>>>>>                 bp->domain;
>>>>>         bo->allowed_domains = bo->preferred_domains;
>>>>>         if (bp->type != ttm_bo_type_kernel &&
>>>>> +           !(bp->flags & AMDGPU_GEM_CREATE_DISCARDABLE) &&
>>>>>             bo->allowed_domains == AMDGPU_GEM_DOMAIN_VRAM)
>>>>>                 bo->allowed_domains |= AMDGPU_GEM_DOMAIN_GTT;
>>>>>
>>>>
>>> The only case where this could be circumvented is when you try to
>>> allocate more than physically available on an APU.
>>>
>>> E.g. you only have something like 32 MiB VRAM and request 64 MiB, then
>>> the GEM code will catch the error and fallback to GTT (IIRC).
>>>
>>> Regards,
>>> Christian.
>>>
>>

[-- Attachment #2: Type: text/html, Size: 4737 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 3/3] drm/amdgpu: bump minor version number
  2022-05-06 13:34   ` Alex Deucher
@ 2022-05-12  2:38     ` Marek Olšák
  0 siblings, 0 replies; 32+ messages in thread
From: Marek Olšák @ 2022-05-12  2:38 UTC (permalink / raw)
  To: Alex Deucher; +Cc: Christian König, amd-gfx list

[-- Attachment #1: Type: text/plain, Size: 1625 bytes --]

https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16466

Marek

On Fri, May 6, 2022 at 9:35 AM Alex Deucher <alexdeucher@gmail.com> wrote:

> On Fri, May 6, 2022 at 7:23 AM Christian König
> <ckoenig.leichtzumerken@gmail.com> wrote:
> >
> > Increase the minor version number to indicate that the new flags are
> > avaiable.
>
> typo: available.  Other than that the series is:
> Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
> Once we get the Mesa patches.
>
> Alex
>
>
> >
> > Signed-off-by: Christian König <christian.koenig@amd.com>
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > index 16871baee784..3dbf406b4194 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > @@ -99,10 +99,11 @@
> >   * - 3.43.0 - Add device hot plug/unplug support
> >   * - 3.44.0 - DCN3 supports DCC independent block settings: !64B &&
> 128B, 64B && 128B
> >   * - 3.45.0 - Add context ioctl stable pstate interface
> > - * * 3.46.0 - To enable hot plug amdgpu tests in libdrm
> > + * - 3.46.0 - To enable hot plug amdgpu tests in libdrm
> > + * * 3.47.0 - Add AMDGPU_GEM_CREATE_DISCARDABLE and AMDGPU_VM_NOALLOC
> flags
> >   */
> >  #define KMS_DRIVER_MAJOR       3
> > -#define KMS_DRIVER_MINOR       46
> > +#define KMS_DRIVER_MINOR       47
> >  #define KMS_DRIVER_PATCHLEVEL  0
> >
> >  int amdgpu_vram_limit;
> > --
> > 2.25.1
> >
>

[-- Attachment #2: Type: text/html, Size: 2435 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/3] drm/amdgpu: add AMDGPU_GEM_CREATE_DISCARDABLE
  2022-05-11 22:06           ` Marek Olšák
@ 2022-05-12  7:25             ` Christian König
  2022-05-12 22:17               ` Marek Olšák
  0 siblings, 1 reply; 32+ messages in thread
From: Christian König @ 2022-05-12  7:25 UTC (permalink / raw)
  To: Marek Olšák; +Cc: amd-gfx mailing list

[-- Attachment #1: Type: text/plain, Size: 3645 bytes --]

Am 12.05.22 um 00:06 schrieb Marek Olšák:
> 3rd question: Is it worth using this on APUs?

It makes memory management somewhat easier when we are really OOM.

E.g. it should also work for GTT allocations and when the core kernel 
says "Hey please free something up or I will start the OOM-killer" it's 
something we can easily throw away.

Not sure how many of those buffers we have, but marking everything which 
is temporary with that flag is probably a good idea.

>
> Thanks,
> Marek
>
> On Wed, May 11, 2022 at 5:58 PM Marek Olšák <maraeo@gmail.com> wrote:
>
>     Will the kernel keep all discardable buffers in VRAM if VRAM is
>     not overcommitted by discardable buffers, or will other buffers
>     also affect the placement of discardable buffers?
>

Regarding eviction pressure, the buffers will be handled like any other 
buffer, but instead of the content being preserved it is simply discarded 
on eviction.

>
>     Do evictions deallocate the buffer, or do they keep an allocation
>     in GTT and only the copy is skipped?
>

It really deallocates the backing store of the buffer, just keeps a 
dummy page array around where all entries are NULL.

There is a patch set on the mailing list to make this a little bit more 
efficient, but even using the dummy page array should only have a few 
bytes overhead.

Regards,
Christian.

>
>     Thanks,
>     Marek
>
>     On Wed, May 11, 2022 at 3:08 AM Marek Olšák <maraeo@gmail.com> wrote:
>
>         OK that sounds good.
>
>         Marek
>
>         On Wed, May 11, 2022 at 2:04 AM Christian König
>         <ckoenig.leichtzumerken@gmail.com> wrote:
>
>             Hi Marek,
>
>             Am 10.05.22 um 22:43 schrieb Marek Olšák:
>>             A better flag name would be:
>>             AMDGPU_GEM_CREATE_BEST_PLACEMENT_OR_DISCARD
>
>             A bit long for my taste and I think the best placement is
>             just a side effect.
>
>>
>>             Marek
>>
>>             On Tue, May 10, 2022 at 4:13 PM Marek Olšák
>>             <maraeo@gmail.com> wrote:
>>
>>                 Does this really guarantee VRAM placement? The code
>>                 doesn't say anything about that.
>>
>
>             Yes, see the code here:
>
>>
>>                     diff --git
>>                     a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>                     b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>                     index 8b7ee1142d9a..1944ef37a61e 100644
>>                     --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>                     +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>                     @@ -567,6 +567,7 @@ int amdgpu_bo_create(struct
>>                     amdgpu_device *adev,
>>                                     bp->domain;
>>                             bo->allowed_domains = bo->preferred_domains;
>>                             if (bp->type != ttm_bo_type_kernel &&
>>                     +           !(bp->flags &
>>                     AMDGPU_GEM_CREATE_DISCARDABLE) &&
>>                                 bo->allowed_domains ==
>>                     AMDGPU_GEM_DOMAIN_VRAM)
>>                                     bo->allowed_domains |=
>>                     AMDGPU_GEM_DOMAIN_GTT;
>>
>
>             The only case where this could be circumvented is when you
>             try to allocate more than physically available on an APU.
>
>             E.g. you only have something like 32 MiB VRAM and request
>             64 MiB, then the GEM code will catch the error and
>             fallback to GTT (IIRC).
>
>             Regards,
>             Christian.
>

[-- Attachment #2: Type: text/html, Size: 9761 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/3] drm/amdgpu: add AMDGPU_GEM_CREATE_DISCARDABLE
  2022-05-12  7:25             ` Christian König
@ 2022-05-12 22:17               ` Marek Olšák
  2022-05-13 11:24                 ` Christian König
       [not found]                 ` <62165c7c-892a-5b35-79dd-b90414ccb5cd@damsy.net>
  0 siblings, 2 replies; 32+ messages in thread
From: Marek Olšák @ 2022-05-12 22:17 UTC (permalink / raw)
  To: Christian König; +Cc: amd-gfx mailing list

[-- Attachment #1: Type: text/plain, Size: 3788 bytes --]

Would it be better to set the VM_ALWAYS_VALID flag to have a greater
guarantee that the best placement will be chosen?

See, the main feature is getting the best placement, not being discardable.
The best placement is a hw design requirement, because the memory is used
for things that are expected to perform like on-chip SRAMs. We need to
make sure the best placement is guaranteed if it's VRAM.

Marek

On Thu., May 12, 2022, 03:26 Christian König, <
ckoenig.leichtzumerken@gmail.com> wrote:

> Am 12.05.22 um 00:06 schrieb Marek Olšák:
>
> 3rd question: Is it worth using this on APUs?
>
>
> It makes memory management somewhat easier when we are really OOM.
>
> E.g. it should also work for GTT allocations and when the core kernel says
> "Hey please free something up or I will start the OOM-killer" it's
> something we can easily throw away.
>
> Not sure how many of those buffers we have, but marking everything which
> is temporary with that flag is probably a good idea.
>
>
> Thanks,
> Marek
>
> On Wed, May 11, 2022 at 5:58 PM Marek Olšák <maraeo@gmail.com> wrote:
>
>> Will the kernel keep all discardable buffers in VRAM if VRAM is not
>> overcommitted by discardable buffers, or will other buffers also affect the
>> placement of discardable buffers?
>>
>
> Regarding the eviction pressure the buffers will be handled like any other
> buffer, but instead of preserving the content it is just discarded on
> eviction.
>
>
>> Do evictions deallocate the buffer, or do they keep an allocation in GTT
>> and only the copy is skipped?
>>
>
> It really deallocates the backing store of the buffer, just keeps a dummy
> page array around where all entries are NULL.
>
> There is a patch set on the mailing list to make this a little bit more
> efficient, but even using the dummy page array should only have a few bytes
> overhead.
>
> Regards,
> Christian.
>
>
>> Thanks,
>> Marek
>>
>> On Wed, May 11, 2022 at 3:08 AM Marek Olšák <maraeo@gmail.com> wrote:
>>
>>> OK that sounds good.
>>>
>>> Marek
>>>
>>> On Wed, May 11, 2022 at 2:04 AM Christian König <
>>> ckoenig.leichtzumerken@gmail.com> wrote:
>>>
>>>> Hi Marek,
>>>>
>>>> Am 10.05.22 um 22:43 schrieb Marek Olšák:
>>>>
>>>> A better flag name would be:
>>>> AMDGPU_GEM_CREATE_BEST_PLACEMENT_OR_DISCARD
>>>>
>>>>
>>>> A bit long for my taste and I think the best placement is just a side
>>>> effect.
>>>>
>>>>
>>>> Marek
>>>>
>>>> On Tue, May 10, 2022 at 4:13 PM Marek Olšák <maraeo@gmail.com> wrote:
>>>>
>>>>> Does this really guarantee VRAM placement? The code doesn't say
>>>>> anything about that.
>>>>>
>>>>
>>>> Yes, see the code here:
>>>>
>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>>>>> index 8b7ee1142d9a..1944ef37a61e 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>>>>> @@ -567,6 +567,7 @@ int amdgpu_bo_create(struct amdgpu_device *adev,
>>>>>>                 bp->domain;
>>>>>>         bo->allowed_domains = bo->preferred_domains;
>>>>>>         if (bp->type != ttm_bo_type_kernel &&
>>>>>> +           !(bp->flags & AMDGPU_GEM_CREATE_DISCARDABLE) &&
>>>>>>             bo->allowed_domains == AMDGPU_GEM_DOMAIN_VRAM)
>>>>>>                 bo->allowed_domains |= AMDGPU_GEM_DOMAIN_GTT;
>>>>>>
>>>>>
>>>> The only case where this could be circumvented is when you try to
>>>> allocate more than physically available on an APU.
>>>>
>>>> E.g. you only have something like 32 MiB VRAM and request 64 MiB, then
>>>> the GEM code will catch the error and fallback to GTT (IIRC).
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>
>

[-- Attachment #2: Type: text/html, Size: 9717 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/3] drm/amdgpu: add AMDGPU_GEM_CREATE_DISCARDABLE
  2022-05-12 22:17               ` Marek Olšák
@ 2022-05-13 11:24                 ` Christian König
       [not found]                 ` <62165c7c-892a-5b35-79dd-b90414ccb5cd@damsy.net>
  1 sibling, 0 replies; 32+ messages in thread
From: Christian König @ 2022-05-13 11:24 UTC (permalink / raw)
  To: Marek Olšák; +Cc: amd-gfx mailing list

[-- Attachment #1: Type: text/plain, Size: 4857 bytes --]

Well, the best placement is guaranteed as long as the application doesn't 
do any nonsense (e.g. trying to allocate a buffer larger than available 
VRAM).

The VM_ALWAYS_VALID flag doesn't affect any of that handling.

Regards,
Christian.

Am 13.05.22 um 00:17 schrieb Marek Olšák:
> Would it be better to set the VM_ALWAYS_VALID flag to have a greater 
> guarantee that the best placement will be chosen?
>
> See, the main feature is getting the best placement, not being 
> discardable. The best placement is a hw design requirement due to 
> using memory for uses that are expected to have performance similar to 
> onchip SRAMs. We need to make sure the best placement is guaranteed if 
> it's VRAM.
>
> Marek
>
> On Thu., May 12, 2022, 03:26 Christian König, 
> <ckoenig.leichtzumerken@gmail.com> wrote:
>
>     Am 12.05.22 um 00:06 schrieb Marek Olšák:
>>     3rd question: Is it worth using this on APUs?
>
>     It makes memory management somewhat easier when we are really OOM.
>
>     E.g. it should also work for GTT allocations and when the core
>     kernel says "Hey please free something up or I will start the
>     OOM-killer" it's something we can easily throw away.
>
>     Not sure how many of those buffers we have, but marking everything
>     which is temporary with that flag is probably a good idea.
>
>>
>>     Thanks,
>>     Marek
>>
>>     On Wed, May 11, 2022 at 5:58 PM Marek Olšák <maraeo@gmail.com> wrote:
>>
>>         Will the kernel keep all discardable buffers in VRAM if VRAM
>>         is not overcommitted by discardable buffers, or will other
>>         buffers also affect the placement of discardable buffers?
>>
>
>     Regarding the eviction pressure the buffers will be handled like
>     any other buffer, but instead of preserving the content it is just
>     discarded on eviction.
>
>>
>>         Do evictions deallocate the buffer, or do they keep an
>>         allocation in GTT and only the copy is skipped?
>>
>
>     It really deallocates the backing store of the buffer, just keeps
>     a dummy page array around where all entries are NULL.
>
>     There is a patch set on the mailing list to make this a little bit
>     more efficient, but even using the dummy page array should only
>     have a few bytes overhead.
>
>     Regards,
>     Christian.
>
>>
>>         Thanks,
>>         Marek
>>
>>         On Wed, May 11, 2022 at 3:08 AM Marek Olšák
>>         <maraeo@gmail.com> wrote:
>>
>>             OK that sounds good.
>>
>>             Marek
>>
>>             On Wed, May 11, 2022 at 2:04 AM Christian König
>>             <ckoenig.leichtzumerken@gmail.com> wrote:
>>
>>                 Hi Marek,
>>
>>                 Am 10.05.22 um 22:43 schrieb Marek Olšák:
>>>                 A better flag name would be:
>>>                 AMDGPU_GEM_CREATE_BEST_PLACEMENT_OR_DISCARD
>>
>>                 A bit long for my taste and I think the best
>>                 placement is just a side effect.
>>
>>>
>>>                 Marek
>>>
>>>                 On Tue, May 10, 2022 at 4:13 PM Marek Olšák
>>>                 <maraeo@gmail.com> wrote:
>>>
>>>                     Does this really guarantee VRAM placement? The
>>>                     code doesn't say anything about that.
>>>
>>
>>                 Yes, see the code here:
>>
>>>
>>>                         diff --git
>>>                         a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>>                         b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>>                         index 8b7ee1142d9a..1944ef37a61e 100644
>>>                         --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>>                         +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>>                         @@ -567,6 +567,7 @@ int
>>>                         amdgpu_bo_create(struct amdgpu_device *adev,
>>>                                         bp->domain;
>>>                                 bo->allowed_domains =
>>>                         bo->preferred_domains;
>>>                                 if (bp->type != ttm_bo_type_kernel &&
>>>                         +           !(bp->flags &
>>>                         AMDGPU_GEM_CREATE_DISCARDABLE) &&
>>>                                     bo->allowed_domains ==
>>>                         AMDGPU_GEM_DOMAIN_VRAM)
>>>                         bo->allowed_domains |= AMDGPU_GEM_DOMAIN_GTT;
>>>
>>
>>                 The only case where this could be circumvented is
>>                 when you try to allocate more than physically
>>                 available on an APU.
>>
>>                 E.g. you only have something like 32 MiB VRAM and
>>                 request 64 MiB, then the GEM code will catch the
>>                 error and fallback to GTT (IIRC).
>>
>>                 Regards,
>>                 Christian.
>>
>

[-- Attachment #2: Type: text/html, Size: 13423 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/3] drm/amdgpu: add AMDGPU_GEM_CREATE_DISCARDABLE
       [not found]                 ` <62165c7c-892a-5b35-79dd-b90414ccb5cd@damsy.net>
@ 2022-05-13 11:26                   ` Christian König
  2022-07-08 14:58                     ` Marek Olšák
  0 siblings, 1 reply; 32+ messages in thread
From: Christian König @ 2022-05-13 11:26 UTC (permalink / raw)
  To: Pierre-Eric Pelloux-Prayer, Marek Olšák; +Cc: amd-gfx mailing list

That's exactly what we can't do.

See, the kernel must always be able to either move things to GTT or 
discard them. So when you want to guarantee that something is in VRAM, 
you must at the same time allow it to be discarded if that placement 
can't be satisfied.

Christian.

Am 13.05.22 um 10:43 schrieb Pierre-Eric Pelloux-Prayer:
> Hi Marek, Christian,
>
> If the main feature for Mesa of AMDGPU_GEM_CREATE_DISCARDABLE is 
> getting the best placement, maybe we should have 2 separate flags:
>   * AMDGPU_GEM_CREATE_DISCARDABLE: indicates to the kernel that it can 
> discard the content on eviction instead of preserving it
>   * AMDGPU_GEM_CREATE_FORCE_BEST_PLACEMENT (or 
> AMDGPU_GEM_CREATE_NO_GTT_FALLBACK ? or AMDGPU_CREATE_GEM_AVOID_GTT?): 
> tells the kernel that this bo really needs to be in VRAM
>
>
> Pierre-Eric
>
> On 13/05/2022 00:17, Marek Olšák wrote:
>> Would it be better to set the VM_ALWAYS_VALID flag to have a greater 
>> guarantee that the best placement will be chosen?
>>
>> See, the main feature is getting the best placement, not being 
>> discardable. The best placement is a hw design requirement due to 
>> using memory for uses that are expected to have performance similar 
>> to onchip SRAMs. We need to make sure the best placement is 
>> guaranteed if it's VRAM.
>>
>> Marek
>>
>> On Thu., May 12, 2022, 03:26 Christian König, 
>> <ckoenig.leichtzumerken@gmail.com 
>> <mailto:ckoenig.leichtzumerken@gmail.com>> wrote:
>>
>>     Am 12.05.22 um 00:06 schrieb Marek Olšák:
>>>     3rd question: Is it worth using this on APUs?
>>
>>     It makes memory management somewhat easier when we are really OOM.
>>
>>     E.g. it should also work for GTT allocations and when the core 
>> kernel says "Hey please free something up or I will start the 
>> OOM-killer" it's something we can easily throw away.
>>
>>     Not sure how many of those buffers we have, but marking 
>> everything which is temporary with that flag is probably a good idea.
>>
>>>
>>>     Thanks,
>>>     Marek
>>>
>>>     On Wed, May 11, 2022 at 5:58 PM Marek Olšák <maraeo@gmail.com 
>>> <mailto:maraeo@gmail.com>> wrote:
>>>
>>>         Will the kernel keep all discardable buffers in VRAM if VRAM 
>>> is not overcommitted by discardable buffers, or will other buffers 
>>> also affect the placement of discardable buffers?
>>>
>>
>>     Regarding the eviction pressure the buffers will be handled like 
>> any other buffer, but instead of preserving the content it is just 
>> discarded on eviction.
>>
>>>
>>>         Do evictions deallocate the buffer, or do they keep an 
>>> allocation in GTT and only the copy is skipped?
>>>
>>
>>     It really deallocates the backing store of the buffer, just keeps 
>> a dummy page array around where all entries are NULL.
>>
>>     There is a patch set on the mailing list to make this a little 
>> bit more efficient, but even using the dummy page array should only 
>> have a few bytes overhead.
>>
>>     Regards,
>>     Christian.
>>
>>>
>>>         Thanks,
>>>         Marek
>>>
>>>         On Wed, May 11, 2022 at 3:08 AM Marek Olšák 
>>> <maraeo@gmail.com <mailto:maraeo@gmail.com>> wrote:
>>>
>>>             OK that sounds good.
>>>
>>>             Marek
>>>
>>>             On Wed, May 11, 2022 at 2:04 AM Christian König 
>>> <ckoenig.leichtzumerken@gmail.com 
>>> <mailto:ckoenig.leichtzumerken@gmail.com>> wrote:
>>>
>>>                 Hi Marek,
>>>
>>>                 Am 10.05.22 um 22:43 schrieb Marek Olšák:
>>>>                 A better flag name would be:
>>>>                 AMDGPU_GEM_CREATE_BEST_PLACEMENT_OR_DISCARD
>>>
>>>                 A bit long for my taste and I think the best 
>>> placement is just a side effect.
>>>
>>>>
>>>>                 Marek
>>>>
>>>>                 On Tue, May 10, 2022 at 4:13 PM Marek Olšák 
>>>> <maraeo@gmail.com <mailto:maraeo@gmail.com>> wrote:
>>>>
>>>>                     Does this really guarantee VRAM placement? The 
>>>> code doesn't say anything about that.
>>>>
>>>
>>>                 Yes, see the code here:
>>>
>>>>
>>>>                         diff --git 
>>>> a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>>>                         index 8b7ee1142d9a..1944ef37a61e 100644
>>>>                         --- 
>>>> a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>>>                         +++ 
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>>>                         @@ -567,6 +567,7 @@ int 
>>>> amdgpu_bo_create(struct amdgpu_device *adev,
>>>>                                         bp->domain;
>>>>                                 bo->allowed_domains = 
>>>> bo->preferred_domains;
>>>>                                 if (bp->type != ttm_bo_type_kernel &&
>>>>                         +           !(bp->flags & 
>>>> AMDGPU_GEM_CREATE_DISCARDABLE) &&
>>>>                                     bo->allowed_domains == 
>>>> AMDGPU_GEM_DOMAIN_VRAM)
>>>> bo->allowed_domains |= AMDGPU_GEM_DOMAIN_GTT;
>>>>
>>>
>>>                 The only case where this could be circumvented is 
>>> when you try to allocate more than physically available on an APU.
>>>
>>>                 E.g. you only have something like 32 MiB VRAM and 
>>> request 64 MiB, then the GEM code will catch the error and fallback 
>>> to GTT (IIRC).
>>>
>>>                 Regards,
>>>                 Christian.
>>>
>>


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/3] drm/amdgpu: add AMDGPU_VM_NOALLOC
  2022-05-11 18:55             ` Marek Olšák
@ 2022-05-16 11:06               ` Marek Olšák
  2022-05-16 11:10                 ` Marek Olšák
  0 siblings, 1 reply; 32+ messages in thread
From: Marek Olšák @ 2022-05-16 11:06 UTC (permalink / raw)
  To: Christian König; +Cc: Lazar, Lijo, amd-gfx mailing list

[-- Attachment #1: Type: text/plain, Size: 6028 bytes --]

FYI, I think it's time to merge this because the Mesa commits are going to
be merged in ~30 minutes if Gitlab CI is green, and that includes updated
amdgpu_drm.h.

Marek

On Wed, May 11, 2022 at 2:55 PM Marek Olšák <maraeo@gmail.com> wrote:

> Ok sounds good.
>
> Marek
>
> On Wed., May 11, 2022, 03:43 Christian König, <
> ckoenig.leichtzumerken@gmail.com> wrote:
>
>> It really *is* a NOALLOC feature. In other words there is no latency
>> improvement on reads because the cache is always checked, even with the
>> noalloc flag set.
>>
>> The only thing it affects is that misses do not enter the cache and so
>> don't cause any additional pressure on evicting cache lines.
>>
>> You might want to double check with the hardware guys, but I'm something
>> like 95% sure that it works this way.
>>
>> Christian.
>>
>> Am 11.05.22 um 09:22 schrieb Marek Olšák:
>>
>> Bypass means that the contents of the cache are ignored, which decreases
>> latency at the cost of no coherency between bypassed and normal memory
>> requests. NOA (noalloc) means that the cache is checked and can give you
>> cache hits, but misses are not cached and the overall latency is higher. I
>> don't know what the hw does, but I hope it was misnamed and it really means
>> bypass because there is no point in doing cache lookups on every memory
>> request if the driver wants to disable caching to *decrease* latency in the
>> situations when the cache isn't helping.
>>
>> Marek
>>
>> On Wed, May 11, 2022 at 2:15 AM Lazar, Lijo <lijo.lazar@amd.com> wrote:
>>
>>>
>>>
>>> On 5/11/2022 11:36 AM, Christian König wrote:
>>> > Mhm, it doesn't really bypass MALL. It just doesn't allocate any MALL
>>> > entries on write.
>>> >
>>> > How about AMDGPU_VM_PAGE_NO_MALL ?
>>>
>>> One more - AMDGPU_VM_PAGE_LLC_* [ LLC = last level cache, * = some sort
>>> of attribute which decides LLC behaviour]
>>>
>>> Thanks,
>>> Lijo
>>>
>>> >
>>> > Christian.
>>> >
>>> > Am 10.05.22 um 23:21 schrieb Marek Olšák:
>>> >> A better name would be:
>>> >> AMDGPU_VM_PAGE_BYPASS_MALL
>>> >>
>>> >> Marek
>>> >>
>>> >> On Fri, May 6, 2022 at 7:23 AM Christian König
>>> >> <ckoenig.leichtzumerken@gmail.com> wrote:
>>> >>
>>> >>     Add the AMDGPU_VM_NOALLOC flag to let userspace control MALL
>>> >>     allocation.
>>> >>
>>> >>     Only compile tested!
>>> >>
>>> >>     Signed-off-by: Christian König <christian.koenig@amd.com>
>>> >>     ---
>>> >>      drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 2 ++
>>> >>      drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c  | 3 +++
>>> >>      drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c  | 3 +++
>>> >>      include/uapi/drm/amdgpu_drm.h           | 2 ++
>>> >>      4 files changed, 10 insertions(+)
>>> >>
>>> >>     diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>> >>     b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>> >>     index bf97d8f07f57..d8129626581f 100644
>>> >>     --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>> >>     +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>> >>     @@ -650,6 +650,8 @@ uint64_t amdgpu_gem_va_map_flags(struct
>>> >>     amdgpu_device *adev, uint32_t flags)
>>> >>                     pte_flag |= AMDGPU_PTE_WRITEABLE;
>>> >>             if (flags & AMDGPU_VM_PAGE_PRT)
>>> >>                     pte_flag |= AMDGPU_PTE_PRT;
>>> >>     +       if (flags & AMDGPU_VM_PAGE_NOALLOC)
>>> >>     +               pte_flag |= AMDGPU_PTE_NOALLOC;
>>> >>
>>> >>             if (adev->gmc.gmc_funcs->map_mtype)
>>> >>                     pte_flag |= amdgpu_gmc_map_mtype(adev,
>>> >>     diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>>> >>     b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>>> >>     index b8c79789e1e4..9077dfccaf3c 100644
>>> >>     --- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>>> >>     +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>>> >>     @@ -613,6 +613,9 @@ static void gmc_v10_0_get_vm_pte(struct
>>> >>     amdgpu_device *adev,
>>> >>             *flags &= ~AMDGPU_PTE_MTYPE_NV10_MASK;
>>> >>             *flags |= (mapping->flags & AMDGPU_PTE_MTYPE_NV10_MASK);
>>> >>
>>> >>     +       *flags &= ~AMDGPU_PTE_NOALLOC;
>>> >>     +       *flags |= (mapping->flags & AMDGPU_PTE_NOALLOC);
>>> >>     +
>>> >>             if (mapping->flags & AMDGPU_PTE_PRT) {
>>> >>                     *flags |= AMDGPU_PTE_PRT;
>>> >>                     *flags |= AMDGPU_PTE_SNOOPED;
>>> >>     diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
>>> >>     b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
>>> >>     index 8d733eeac556..32ee56adb602 100644
>>> >>     --- a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
>>> >>     +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
>>> >>     @@ -508,6 +508,9 @@ static void gmc_v11_0_get_vm_pte(struct
>>> >>     amdgpu_device *adev,
>>> >>             *flags &= ~AMDGPU_PTE_MTYPE_NV10_MASK;
>>> >>             *flags |= (mapping->flags & AMDGPU_PTE_MTYPE_NV10_MASK);
>>> >>
>>> >>     +       *flags &= ~AMDGPU_PTE_NOALLOC;
>>> >>     +       *flags |= (mapping->flags & AMDGPU_PTE_NOALLOC);
>>> >>     +
>>> >>             if (mapping->flags & AMDGPU_PTE_PRT) {
>>> >>                     *flags |= AMDGPU_PTE_PRT;
>>> >>                     *flags |= AMDGPU_PTE_SNOOPED;
>>> >>     diff --git a/include/uapi/drm/amdgpu_drm.h
>>> >>     b/include/uapi/drm/amdgpu_drm.h
>>> >>     index 57b9d8f0133a..9d71d6330687 100644
>>> >>     --- a/include/uapi/drm/amdgpu_drm.h
>>> >>     +++ b/include/uapi/drm/amdgpu_drm.h
>>> >>     @@ -533,6 +533,8 @@ struct drm_amdgpu_gem_op {
>>> >>      #define AMDGPU_VM_MTYPE_UC             (4 << 5)
>>> >>      /* Use Read Write MTYPE instead of default MTYPE */
>>> >>      #define AMDGPU_VM_MTYPE_RW             (5 << 5)
>>> >>     +/* don't allocate MALL */
>>> >>     +#define AMDGPU_VM_PAGE_NOALLOC         (1 << 9)
>>> >>
>>> >>      struct drm_amdgpu_gem_va {
>>> >>             /** GEM object handle */
>>> >>     --
>>> >>     2.25.1
>>> >>
>>> >
>>>
>>
>>

[-- Attachment #2: Type: text/html, Size: 10235 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/3] drm/amdgpu: add AMDGPU_VM_NOALLOC
  2022-05-16 11:06               ` Marek Olšák
@ 2022-05-16 11:10                 ` Marek Olšák
  2022-05-16 11:53                   ` Christian König
  0 siblings, 1 reply; 32+ messages in thread
From: Marek Olšák @ 2022-05-16 11:10 UTC (permalink / raw)
  To: Christian König; +Cc: Lazar, Lijo, amd-gfx mailing list

[-- Attachment #1: Type: text/plain, Size: 6370 bytes --]

I forgot to say: The NOALLOC flag causes an allocation failure, so there is
a kernel bug somewhere.

Marek

On Mon, May 16, 2022 at 7:06 AM Marek Olšák <maraeo@gmail.com> wrote:

> FYI, I think it's time to merge this because the Mesa commits are going to
> be merged in ~30 minutes if Gitlab CI is green, and that includes updated
> amdgpu_drm.h.
>
> Marek
>
> On Wed, May 11, 2022 at 2:55 PM Marek Olšák <maraeo@gmail.com> wrote:
>
>> Ok sounds good.
>>
>> Marek
>>
>> On Wed., May 11, 2022, 03:43 Christian König, <
>> ckoenig.leichtzumerken@gmail.com> wrote:
>>
>>> It really *is* a NOALLOC feature. In other words there is no latency
>>> improvement on reads because the cache is always checked, even with the
>>> noalloc flag set.
>>>
>>> The only thing it affects is that misses do not enter the cache and so
>>> don't cause any additional pressure on evicting cache lines.
>>>
>>> You might want to double check with the hardware guys, but I'm something
>>> like 95% sure that it works this way.
>>>
>>> Christian.
>>>
>>> Am 11.05.22 um 09:22 schrieb Marek Olšák:
>>>
>>> Bypass means that the contents of the cache are ignored, which decreases
>>> latency at the cost of no coherency between bypassed and normal memory
>>> requests. NOA (noalloc) means that the cache is checked and can give you
>>> cache hits, but misses are not cached and the overall latency is higher. I
>>> don't know what the hw does, but I hope it was misnamed and it really means
>>> bypass because there is no point in doing cache lookups on every memory
>>> request if the driver wants to disable caching to *decrease* latency in the
>>> situations when the cache isn't helping.
>>>
>>> Marek
>>>
>>> On Wed, May 11, 2022 at 2:15 AM Lazar, Lijo <lijo.lazar@amd.com> wrote:
>>>
>>>>
>>>>
>>>> On 5/11/2022 11:36 AM, Christian König wrote:
>>>> > Mhm, it doesn't really bypass MALL. It just doesn't allocate any MALL
>>>> > entries on write.
>>>> >
>>>> > How about AMDGPU_VM_PAGE_NO_MALL ?
>>>>
>>>> One more - AMDGPU_VM_PAGE_LLC_* [ LLC = last level cache, * = some sort
>>>> of attribute which decides LLC behaviour]
>>>>
>>>> Thanks,
>>>> Lijo
>>>>
>>>> >
>>>> > Christian.
>>>> >
>>>> > Am 10.05.22 um 23:21 schrieb Marek Olšák:
>>>> >> A better name would be:
>>>> >> AMDGPU_VM_PAGE_BYPASS_MALL
>>>> >>
>>>> >> Marek
>>>> >>
>>>> >> On Fri, May 6, 2022 at 7:23 AM Christian König
>>>> >> <ckoenig.leichtzumerken@gmail.com> wrote:
>>>> >>
>>>> >>     Add the AMDGPU_VM_NOALLOC flag to let userspace control MALL
>>>> >>     allocation.
>>>> >>
>>>> >>     Only compile tested!
>>>> >>
>>>> >>     Signed-off-by: Christian König <christian.koenig@amd.com>
>>>> >>     ---
>>>> >>      drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 2 ++
>>>> >>      drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c  | 3 +++
>>>> >>      drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c  | 3 +++
>>>> >>      include/uapi/drm/amdgpu_drm.h           | 2 ++
>>>> >>      4 files changed, 10 insertions(+)
>>>> >>
>>>> >>     diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>>> >>     b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>>> >>     index bf97d8f07f57..d8129626581f 100644
>>>> >>     --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>>> >>     +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>>> >>     @@ -650,6 +650,8 @@ uint64_t amdgpu_gem_va_map_flags(struct
>>>> >>     amdgpu_device *adev, uint32_t flags)
>>>> >>                     pte_flag |= AMDGPU_PTE_WRITEABLE;
>>>> >>             if (flags & AMDGPU_VM_PAGE_PRT)
>>>> >>                     pte_flag |= AMDGPU_PTE_PRT;
>>>> >>     +       if (flags & AMDGPU_VM_PAGE_NOALLOC)
>>>> >>     +               pte_flag |= AMDGPU_PTE_NOALLOC;
>>>> >>
>>>> >>             if (adev->gmc.gmc_funcs->map_mtype)
>>>> >>                     pte_flag |= amdgpu_gmc_map_mtype(adev,
>>>> >>     diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>>>> >>     b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>>>> >>     index b8c79789e1e4..9077dfccaf3c 100644
>>>> >>     --- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>>>> >>     +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>>>> >>     @@ -613,6 +613,9 @@ static void gmc_v10_0_get_vm_pte(struct
>>>> >>     amdgpu_device *adev,
>>>> >>             *flags &= ~AMDGPU_PTE_MTYPE_NV10_MASK;
>>>> >>             *flags |= (mapping->flags & AMDGPU_PTE_MTYPE_NV10_MASK);
>>>> >>
>>>> >>     +       *flags &= ~AMDGPU_PTE_NOALLOC;
>>>> >>     +       *flags |= (mapping->flags & AMDGPU_PTE_NOALLOC);
>>>> >>     +
>>>> >>             if (mapping->flags & AMDGPU_PTE_PRT) {
>>>> >>                     *flags |= AMDGPU_PTE_PRT;
>>>> >>                     *flags |= AMDGPU_PTE_SNOOPED;
>>>> >>     diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
>>>> >>     b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
>>>> >>     index 8d733eeac556..32ee56adb602 100644
>>>> >>     --- a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
>>>> >>     +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
>>>> >>     @@ -508,6 +508,9 @@ static void gmc_v11_0_get_vm_pte(struct
>>>> >>     amdgpu_device *adev,
>>>> >>             *flags &= ~AMDGPU_PTE_MTYPE_NV10_MASK;
>>>> >>             *flags |= (mapping->flags & AMDGPU_PTE_MTYPE_NV10_MASK);
>>>> >>
>>>> >>     +       *flags &= ~AMDGPU_PTE_NOALLOC;
>>>> >>     +       *flags |= (mapping->flags & AMDGPU_PTE_NOALLOC);
>>>> >>     +
>>>> >>             if (mapping->flags & AMDGPU_PTE_PRT) {
>>>> >>                     *flags |= AMDGPU_PTE_PRT;
>>>> >>                     *flags |= AMDGPU_PTE_SNOOPED;
>>>> >>     diff --git a/include/uapi/drm/amdgpu_drm.h
>>>> >>     b/include/uapi/drm/amdgpu_drm.h
>>>> >>     index 57b9d8f0133a..9d71d6330687 100644
>>>> >>     --- a/include/uapi/drm/amdgpu_drm.h
>>>> >>     +++ b/include/uapi/drm/amdgpu_drm.h
>>>> >>     @@ -533,6 +533,8 @@ struct drm_amdgpu_gem_op {
>>>> >>      #define AMDGPU_VM_MTYPE_UC             (4 << 5)
>>>> >>      /* Use Read Write MTYPE instead of default MTYPE */
>>>> >>      #define AMDGPU_VM_MTYPE_RW             (5 << 5)
>>>> >>     +/* don't allocate MALL */
>>>> >>     +#define AMDGPU_VM_PAGE_NOALLOC         (1 << 9)
>>>> >>
>>>> >>      struct drm_amdgpu_gem_va {
>>>> >>             /** GEM object handle */
>>>> >>     --
>>>> >>     2.25.1
>>>> >>
>>>> >
>>>>
>>>
>>>


* Re: [PATCH 2/3] drm/amdgpu: add AMDGPU_VM_NOALLOC
  2022-05-16 11:10                 ` Marek Olšák
@ 2022-05-16 11:53                   ` Christian König
  2022-05-16 12:56                     ` Marek Olšák
  0 siblings, 1 reply; 32+ messages in thread
From: Christian König @ 2022-05-16 11:53 UTC (permalink / raw)
  To: Marek Olšák; +Cc: Lazar, Lijo, amd-gfx mailing list

[-- Attachment #1: Type: text/plain, Size: 8608 bytes --]

Crap, do you have a link to the failure?

Am 16.05.22 um 13:10 schrieb Marek Olšák:
> I forgot to say: The NOALLOC flag causes an allocation failure, so 
> there is a kernel bug somewhere.
>
> Marek
>
> On Mon, May 16, 2022 at 7:06 AM Marek Olšák <maraeo@gmail.com> wrote:
>
>     FYI, I think it's time to merge this because the Mesa commits are
>     going to be merged in ~30 minutes if Gitlab CI is green, and that
>     includes updated amdgpu_drm.h.
>
>     Marek
>
>     On Wed, May 11, 2022 at 2:55 PM Marek Olšák <maraeo@gmail.com> wrote:
>
>         Ok sounds good.
>
>         Marek
>
>         On Wed., May 11, 2022, 03:43 Christian König,
>         <ckoenig.leichtzumerken@gmail.com> wrote:
>
>             It really *is* a NOALLOC feature. In other words there is
>             no latency improvement on reads because the cache is
>             always checked, even with the noalloc flag set.
>
>             The only thing it affects is that misses do not enter the
>             cache and so don't cause any additional pressure on
>             evicting cache lines.
>
>             You might want to double check with the hardware guys, but
>             I'm something like 95% sure that it works this way.
>
>             Christian.
>
>             Am 11.05.22 um 09:22 schrieb Marek Olšák:
>>             Bypass means that the contents of the cache are ignored,
>>             which decreases latency at the cost of no coherency
>>             between bypassed and normal memory requests. NOA
>>             (noalloc) means that the cache is checked and can give
>>             you cache hits, but misses are not cached and the overall
>>             latency is higher. I don't know what the hw does, but I
>>             hope it was misnamed and it really means bypass because
>>             there is no point in doing cache lookups on every memory
>>             request if the driver wants to disable caching to
>>             *decrease* latency in the situations when the cache isn't
>>             helping.
>>
>>             Marek
>>
>>             On Wed, May 11, 2022 at 2:15 AM Lazar, Lijo
>>             <lijo.lazar@amd.com> wrote:
>>
>>
>>
>>                 On 5/11/2022 11:36 AM, Christian König wrote:
>>                 > Mhm, it doesn't really bypass MALL. It just doesn't
>>                 allocate any MALL
>>                 > entries on write.
>>                 >
>>                 > How about AMDGPU_VM_PAGE_NO_MALL ?
>>
>>                 One more - AMDGPU_VM_PAGE_LLC_* [ LLC = last level
>>                 cache, * = some sort
>>                 of attribute which decides LLC behaviour]
>>
>>                 Thanks,
>>                 Lijo
>>
>>                 >
>>                 > Christian.
>>                 >
>>                 > Am 10.05.22 um 23:21 schrieb Marek Olšák:
>>                 >> A better name would be:
>>                 >> AMDGPU_VM_PAGE_BYPASS_MALL
>>                 >>
>>                 >> Marek
>>                 >>
>>                 >> On Fri, May 6, 2022 at 7:23 AM Christian König
>>                 >> <ckoenig.leichtzumerken@gmail.com> wrote:
>>                 >>
>>                 >>     Add the AMDGPU_VM_NOALLOC flag to let
>>                 >>     userspace control MALL allocation.
>>                 >>
>>                 >>     Only compile tested!
>>                 >>
>>                 >>     Signed-off-by: Christian König <christian.koenig@amd.com>
>>                 >>     ---
>>                 >>      drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 2 ++
>>                 >>      drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c  | 3 +++
>>                 >>      drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c  | 3 +++
>>                 >>      include/uapi/drm/amdgpu_drm.h           | 2 ++
>>                 >>      4 files changed, 10 insertions(+)
>>                 >>
>>                 >>     diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>                 >>     b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>                 >>     index bf97d8f07f57..d8129626581f 100644
>>                 >>     --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>                 >>     +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>                 >>     @@ -650,6 +650,8 @@ uint64_t amdgpu_gem_va_map_flags(struct
>>                 >>     amdgpu_device *adev, uint32_t flags)
>>                 >>                     pte_flag |= AMDGPU_PTE_WRITEABLE;
>>                 >>             if (flags & AMDGPU_VM_PAGE_PRT)
>>                 >>                     pte_flag |= AMDGPU_PTE_PRT;
>>                 >>     +       if (flags & AMDGPU_VM_PAGE_NOALLOC)
>>                 >>     +               pte_flag |= AMDGPU_PTE_NOALLOC;
>>                 >>
>>                 >>             if (adev->gmc.gmc_funcs->map_mtype)
>>                 >>                     pte_flag |= amdgpu_gmc_map_mtype(adev,
>>                 >>     diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>>                 >>     b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>>                 >>     index b8c79789e1e4..9077dfccaf3c 100644
>>                 >>     --- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>>                 >>     +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>>                 >>     @@ -613,6 +613,9 @@ static void gmc_v10_0_get_vm_pte(struct
>>                 >>     amdgpu_device *adev,
>>                 >>             *flags &= ~AMDGPU_PTE_MTYPE_NV10_MASK;
>>                 >>             *flags |= (mapping->flags & AMDGPU_PTE_MTYPE_NV10_MASK);
>>                 >>
>>                 >>     +       *flags &= ~AMDGPU_PTE_NOALLOC;
>>                 >>     +       *flags |= (mapping->flags & AMDGPU_PTE_NOALLOC);
>>                 >>     +
>>                 >>             if (mapping->flags & AMDGPU_PTE_PRT) {
>>                 >>                     *flags |= AMDGPU_PTE_PRT;
>>                 >>                     *flags |= AMDGPU_PTE_SNOOPED;
>>                 >>     diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
>>                 >>     b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
>>                 >>     index 8d733eeac556..32ee56adb602 100644
>>                 >>     --- a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
>>                 >>     +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
>>                 >>     @@ -508,6 +508,9 @@ static void gmc_v11_0_get_vm_pte(struct
>>                 >>     amdgpu_device *adev,
>>                 >>             *flags &= ~AMDGPU_PTE_MTYPE_NV10_MASK;
>>                 >>             *flags |= (mapping->flags & AMDGPU_PTE_MTYPE_NV10_MASK);
>>                 >>
>>                 >>     +       *flags &= ~AMDGPU_PTE_NOALLOC;
>>                 >>     +       *flags |= (mapping->flags & AMDGPU_PTE_NOALLOC);
>>                 >>     +
>>                 >>             if (mapping->flags & AMDGPU_PTE_PRT) {
>>                 >>                     *flags |= AMDGPU_PTE_PRT;
>>                 >>                     *flags |= AMDGPU_PTE_SNOOPED;
>>                 >>     diff --git a/include/uapi/drm/amdgpu_drm.h
>>                 >>     b/include/uapi/drm/amdgpu_drm.h
>>                 >>     index 57b9d8f0133a..9d71d6330687 100644
>>                 >>     --- a/include/uapi/drm/amdgpu_drm.h
>>                 >>     +++ b/include/uapi/drm/amdgpu_drm.h
>>                 >>     @@ -533,6 +533,8 @@ struct drm_amdgpu_gem_op {
>>                 >>      #define AMDGPU_VM_MTYPE_UC             (4 << 5)
>>                 >>      /* Use Read Write MTYPE instead of default MTYPE */
>>                 >>      #define AMDGPU_VM_MTYPE_RW             (5 << 5)
>>                 >>     +/* don't allocate MALL */
>>                 >>     +#define AMDGPU_VM_PAGE_NOALLOC         (1 << 9)
>>                 >>
>>                 >>      struct drm_amdgpu_gem_va {
>>                 >>             /** GEM object handle */
>>                 >>     --
>>                 >>     2.25.1
>>                 >>
>>                 >
>>
>


* Re: [PATCH 2/3] drm/amdgpu: add AMDGPU_VM_NOALLOC
  2022-05-16 11:53                   ` Christian König
@ 2022-05-16 12:56                     ` Marek Olšák
  2022-05-16 16:13                       ` Christian König
  0 siblings, 1 reply; 32+ messages in thread
From: Marek Olšák @ 2022-05-16 12:56 UTC (permalink / raw)
  To: Christian König; +Cc: Lazar, Lijo, amd-gfx mailing list

[-- Attachment #1: Type: text/plain, Size: 6874 bytes --]

Reproduction steps:
- use mesa/main on gfx10.3 (not sure what other GPUs do)
- run: radeonsi_mall_noalloc=true glxgears

Marek

On Mon, May 16, 2022 at 7:53 AM Christian König <
ckoenig.leichtzumerken@gmail.com> wrote:

> Crap, do you have a link to the failure?
>
> Am 16.05.22 um 13:10 schrieb Marek Olšák:
>
> I forgot to say: The NOALLOC flag causes an allocation failure, so there
> is a kernel bug somewhere.
>
> Marek
>
> On Mon, May 16, 2022 at 7:06 AM Marek Olšák <maraeo@gmail.com> wrote:
>
>> FYI, I think it's time to merge this because the Mesa commits are going
>> to be merged in ~30 minutes if Gitlab CI is green, and that includes
>> updated amdgpu_drm.h.
>>
>> Marek
>>
>> On Wed, May 11, 2022 at 2:55 PM Marek Olšák <maraeo@gmail.com> wrote:
>>
>>> Ok sounds good.
>>>
>>> Marek
>>>
>>> On Wed., May 11, 2022, 03:43 Christian König, <
>>> ckoenig.leichtzumerken@gmail.com> wrote:
>>>
>>>> It really *is* a NOALLOC feature. In other words there is no latency
>>>> improvement on reads because the cache is always checked, even with the
>>>> noalloc flag set.
>>>>
>>>> The only thing it affects is that misses do not enter the cache and so
>>>> don't cause any additional pressure on evicting cache lines.
>>>>
>>>> You might want to double check with the hardware guys, but I'm
>>>> something like 95% sure that it works this way.
>>>>
>>>> Christian.
>>>>
>>>> Am 11.05.22 um 09:22 schrieb Marek Olšák:
>>>>
>>>> Bypass means that the contents of the cache are ignored, which
>>>> decreases latency at the cost of no coherency between bypassed and normal
>>>> memory requests. NOA (noalloc) means that the cache is checked and can give
>>>> you cache hits, but misses are not cached and the overall latency is
>>>> higher. I don't know what the hw does, but I hope it was misnamed and it
>>>> really means bypass because there is no point in doing cache lookups on
>>>> every memory request if the driver wants to disable caching to *decrease*
>>>> latency in the situations when the cache isn't helping.
>>>>
>>>> Marek
>>>>
>>>> On Wed, May 11, 2022 at 2:15 AM Lazar, Lijo <lijo.lazar@amd.com> wrote:
>>>>
>>>>>
>>>>>
>>>>> On 5/11/2022 11:36 AM, Christian König wrote:
>>>>> > Mhm, it doesn't really bypass MALL. It just doesn't allocate any
>>>>> MALL
>>>>> > entries on write.
>>>>> >
>>>>> > How about AMDGPU_VM_PAGE_NO_MALL ?
>>>>>
>>>>> One more - AMDGPU_VM_PAGE_LLC_* [ LLC = last level cache, * = some
>>>>> sort
>>>>> of attribute which decides LLC behaviour]
>>>>>
>>>>> Thanks,
>>>>> Lijo
>>>>>
>>>>> >
>>>>> > Christian.
>>>>> >
>>>>> > Am 10.05.22 um 23:21 schrieb Marek Olšák:
>>>>> >> A better name would be:
>>>>> >> AMDGPU_VM_PAGE_BYPASS_MALL
>>>>> >>
>>>>> >> Marek
>>>>> >>
>>>>> >> On Fri, May 6, 2022 at 7:23 AM Christian König
>>>>> >> <ckoenig.leichtzumerken@gmail.com> wrote:
>>>>> >>
>>>>> >>     Add the AMDGPU_VM_NOALLOC flag to let userspace control MALL
>>>>> >>     allocation.
>>>>> >>
>>>>> >>     Only compile tested!
>>>>> >>
>>>>> >>     Signed-off-by: Christian König <christian.koenig@amd.com>
>>>>> >>     ---
>>>>> >>      drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 2 ++
>>>>> >>      drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c  | 3 +++
>>>>> >>      drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c  | 3 +++
>>>>> >>      include/uapi/drm/amdgpu_drm.h           | 2 ++
>>>>> >>      4 files changed, 10 insertions(+)
>>>>> >>
>>>>> >>     diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>>>> >>     b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>>>> >>     index bf97d8f07f57..d8129626581f 100644
>>>>> >>     --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>>>> >>     +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>>>> >>     @@ -650,6 +650,8 @@ uint64_t amdgpu_gem_va_map_flags(struct
>>>>> >>     amdgpu_device *adev, uint32_t flags)
>>>>> >>                     pte_flag |= AMDGPU_PTE_WRITEABLE;
>>>>> >>             if (flags & AMDGPU_VM_PAGE_PRT)
>>>>> >>                     pte_flag |= AMDGPU_PTE_PRT;
>>>>> >>     +       if (flags & AMDGPU_VM_PAGE_NOALLOC)
>>>>> >>     +               pte_flag |= AMDGPU_PTE_NOALLOC;
>>>>> >>
>>>>> >>             if (adev->gmc.gmc_funcs->map_mtype)
>>>>> >>                     pte_flag |= amdgpu_gmc_map_mtype(adev,
>>>>> >>     diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>>>>> >>     b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>>>>> >>     index b8c79789e1e4..9077dfccaf3c 100644
>>>>> >>     --- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>>>>> >>     +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>>>>> >>     @@ -613,6 +613,9 @@ static void gmc_v10_0_get_vm_pte(struct
>>>>> >>     amdgpu_device *adev,
>>>>> >>             *flags &= ~AMDGPU_PTE_MTYPE_NV10_MASK;
>>>>> >>             *flags |= (mapping->flags & AMDGPU_PTE_MTYPE_NV10_MASK);
>>>>> >>
>>>>> >>     +       *flags &= ~AMDGPU_PTE_NOALLOC;
>>>>> >>     +       *flags |= (mapping->flags & AMDGPU_PTE_NOALLOC);
>>>>> >>     +
>>>>> >>             if (mapping->flags & AMDGPU_PTE_PRT) {
>>>>> >>                     *flags |= AMDGPU_PTE_PRT;
>>>>> >>                     *flags |= AMDGPU_PTE_SNOOPED;
>>>>> >>     diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
>>>>> >>     b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
>>>>> >>     index 8d733eeac556..32ee56adb602 100644
>>>>> >>     --- a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
>>>>> >>     +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
>>>>> >>     @@ -508,6 +508,9 @@ static void gmc_v11_0_get_vm_pte(struct
>>>>> >>     amdgpu_device *adev,
>>>>> >>             *flags &= ~AMDGPU_PTE_MTYPE_NV10_MASK;
>>>>> >>             *flags |= (mapping->flags & AMDGPU_PTE_MTYPE_NV10_MASK);
>>>>> >>
>>>>> >>     +       *flags &= ~AMDGPU_PTE_NOALLOC;
>>>>> >>     +       *flags |= (mapping->flags & AMDGPU_PTE_NOALLOC);
>>>>> >>     +
>>>>> >>             if (mapping->flags & AMDGPU_PTE_PRT) {
>>>>> >>                     *flags |= AMDGPU_PTE_PRT;
>>>>> >>                     *flags |= AMDGPU_PTE_SNOOPED;
>>>>> >>     diff --git a/include/uapi/drm/amdgpu_drm.h
>>>>> >>     b/include/uapi/drm/amdgpu_drm.h
>>>>> >>     index 57b9d8f0133a..9d71d6330687 100644
>>>>> >>     --- a/include/uapi/drm/amdgpu_drm.h
>>>>> >>     +++ b/include/uapi/drm/amdgpu_drm.h
>>>>> >>     @@ -533,6 +533,8 @@ struct drm_amdgpu_gem_op {
>>>>> >>      #define AMDGPU_VM_MTYPE_UC             (4 << 5)
>>>>> >>      /* Use Read Write MTYPE instead of default MTYPE */
>>>>> >>      #define AMDGPU_VM_MTYPE_RW             (5 << 5)
>>>>> >>     +/* don't allocate MALL */
>>>>> >>     +#define AMDGPU_VM_PAGE_NOALLOC         (1 << 9)
>>>>> >>
>>>>> >>      struct drm_amdgpu_gem_va {
>>>>> >>             /** GEM object handle */
>>>>> >>     --
>>>>> >>     2.25.1
>>>>> >>
>>>>> >
>>>>>
>>>>
>>>>
>


* Re: [PATCH 2/3] drm/amdgpu: add AMDGPU_VM_NOALLOC
  2022-05-16 12:56                     ` Marek Olšák
@ 2022-05-16 16:13                       ` Christian König
  2022-05-17  0:12                         ` Marek Olšák
  0 siblings, 1 reply; 32+ messages in thread
From: Christian König @ 2022-05-16 16:13 UTC (permalink / raw)
  To: Marek Olšák; +Cc: Lazar, Lijo, amd-gfx mailing list

[-- Attachment #1: Type: text/plain, Size: 9814 bytes --]

I don't have access to any gfx10 hardware.

Can you give me a dmesg and/or backtrace, etc..?

I can't push this unless it's working properly.

Christian.

Am 16.05.22 um 14:56 schrieb Marek Olšák:
> Reproduction steps:
> - use mesa/main on gfx10.3 (not sure what other GPUs do)
> - run: radeonsi_mall_noalloc=true glxgears
>
> Marek
>
> On Mon, May 16, 2022 at 7:53 AM Christian König 
> <ckoenig.leichtzumerken@gmail.com> wrote:
>
>     Crap, do you have a link to the failure?
>
>     Am 16.05.22 um 13:10 schrieb Marek Olšák:
>>     I forgot to say: The NOALLOC flag causes an allocation failure,
>>     so there is a kernel bug somewhere.
>>
>>     Marek
>>
>>     On Mon, May 16, 2022 at 7:06 AM Marek Olšák <maraeo@gmail.com> wrote:
>>
>>         FYI, I think it's time to merge this because the Mesa commits
>>         are going to be merged in ~30 minutes if Gitlab CI is green,
>>         and that includes updated amdgpu_drm.h.
>>
>>         Marek
>>
>>         On Wed, May 11, 2022 at 2:55 PM Marek Olšák
>>         <maraeo@gmail.com> wrote:
>>
>>             Ok sounds good.
>>
>>             Marek
>>
>>             On Wed., May 11, 2022, 03:43 Christian König,
>>             <ckoenig.leichtzumerken@gmail.com> wrote:
>>
>>                 It really *is* a NOALLOC feature. In other words
>>                 there is no latency improvement on reads because the
>>                 cache is always checked, even with the noalloc flag set.
>>
>>                 The only thing it affects is that misses do not enter
>>                 the cache and so don't cause any additional pressure
>>                 on evicting cache lines.
>>
>>                 You might want to double check with the hardware
>>                 guys, but I'm something like 95% sure that it works
>>                 this way.
>>
>>                 Christian.
>>
>>                 Am 11.05.22 um 09:22 schrieb Marek Olšák:
>>>                 Bypass means that the contents of the cache are
>>>                 ignored, which decreases latency at the cost of no
>>>                 coherency between bypassed and normal memory
>>>                 requests. NOA (noalloc) means that the cache is
>>>                 checked and can give you cache hits, but misses are
>>>                 not cached and the overall latency is higher. I
>>>                 don't know what the hw does, but I hope it was
>>>                 misnamed and it really means bypass because there is
>>>                 no point in doing cache lookups on every memory
>>>                 request if the driver wants to disable caching to
>>>                 *decrease* latency in the situations when the cache
>>>                 isn't helping.
>>>
>>>                 Marek
>>>
>>>                 On Wed, May 11, 2022 at 2:15 AM Lazar, Lijo
>>>                 <lijo.lazar@amd.com> wrote:
>>>
>>>
>>>
>>>                     On 5/11/2022 11:36 AM, Christian König wrote:
>>>                     > Mhm, it doesn't really bypass MALL. It just
>>>                     doesn't allocate any MALL
>>>                     > entries on write.
>>>                     >
>>>                     > How about AMDGPU_VM_PAGE_NO_MALL ?
>>>
>>>                     One more - AMDGPU_VM_PAGE_LLC_* [ LLC = last
>>>                     level cache, * = some sort
>>>                     of attribute which decides LLC behaviour]
>>>
>>>                     Thanks,
>>>                     Lijo
>>>
>>>                     >
>>>                     > Christian.
>>>                     >
>>>                     > Am 10.05.22 um 23:21 schrieb Marek Olšák:
>>>                     >> A better name would be:
>>>                     >> AMDGPU_VM_PAGE_BYPASS_MALL
>>>                     >>
>>>                     >> Marek
>>>                     >>
>>>                     >> On Fri, May 6, 2022 at 7:23 AM Christian König
>>>                     >> <ckoenig.leichtzumerken@gmail.com> wrote:
>>>                     >>
>>>                     >>     Add the AMDGPU_VM_NOALLOC flag to let
>>>                     >>     userspace control MALL allocation.
>>>                     >>
>>>                     >>     Only compile tested!
>>>                     >>
>>>                     >>     Signed-off-by: Christian König <christian.koenig@amd.com>
>>>                     >>     ---
>>>                     >>      drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 2 ++
>>>                     >>      drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c  | 3 +++
>>>                     >>      drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c  | 3 +++
>>>                     >>      include/uapi/drm/amdgpu_drm.h           | 2 ++
>>>                     >>      4 files changed, 10 insertions(+)
>>>                     >>
>>>                     >>     diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>>                     >>     b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>>                     >>     index bf97d8f07f57..d8129626581f 100644
>>>                     >>     --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>>                     >>     +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>>                     >>     @@ -650,6 +650,8 @@ uint64_t amdgpu_gem_va_map_flags(struct
>>>                     >>     amdgpu_device *adev, uint32_t flags)
>>>                     >>                     pte_flag |= AMDGPU_PTE_WRITEABLE;
>>>                     >>             if (flags & AMDGPU_VM_PAGE_PRT)
>>>                     >>                     pte_flag |= AMDGPU_PTE_PRT;
>>>                     >>     +       if (flags & AMDGPU_VM_PAGE_NOALLOC)
>>>                     >>     +               pte_flag |= AMDGPU_PTE_NOALLOC;
>>>                     >>
>>>                     >>             if (adev->gmc.gmc_funcs->map_mtype)
>>>                     >>                     pte_flag |= amdgpu_gmc_map_mtype(adev,
>>>                     >>     diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>>>                     >>     b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>>>                     >>     index b8c79789e1e4..9077dfccaf3c 100644
>>>                     >>     --- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>>>                     >>     +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>>>                     >>     @@ -613,6 +613,9 @@ static void gmc_v10_0_get_vm_pte(struct
>>>                     >>     amdgpu_device *adev,
>>>                     >>             *flags &= ~AMDGPU_PTE_MTYPE_NV10_MASK;
>>>                     >>             *flags |= (mapping->flags & AMDGPU_PTE_MTYPE_NV10_MASK);
>>>                     >>
>>>                     >>     +       *flags &= ~AMDGPU_PTE_NOALLOC;
>>>                     >>     +       *flags |= (mapping->flags & AMDGPU_PTE_NOALLOC);
>>>                     >>     +
>>>                     >>             if (mapping->flags & AMDGPU_PTE_PRT) {
>>>                     >>                     *flags |= AMDGPU_PTE_PRT;
>>>                     >>                     *flags |= AMDGPU_PTE_SNOOPED;
>>>                     >>     diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
>>>                     >>     b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
>>>                     >>     index 8d733eeac556..32ee56adb602 100644
>>>                     >>     --- a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
>>>                     >>     +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
>>>                     >>     @@ -508,6 +508,9 @@ static void gmc_v11_0_get_vm_pte(struct
>>>                     >>     amdgpu_device *adev,
>>>                     >>             *flags &= ~AMDGPU_PTE_MTYPE_NV10_MASK;
>>>                     >>             *flags |= (mapping->flags & AMDGPU_PTE_MTYPE_NV10_MASK);
>>>                     >>
>>>                     >>     +       *flags &= ~AMDGPU_PTE_NOALLOC;
>>>                     >>     +       *flags |= (mapping->flags & AMDGPU_PTE_NOALLOC);
>>>                     >>     +
>>>                     >>             if (mapping->flags & AMDGPU_PTE_PRT) {
>>>                     >>                     *flags |= AMDGPU_PTE_PRT;
>>>                     >>                     *flags |= AMDGPU_PTE_SNOOPED;
>>>                     >>     diff --git a/include/uapi/drm/amdgpu_drm.h
>>>                     >>     b/include/uapi/drm/amdgpu_drm.h
>>>                     >>     index 57b9d8f0133a..9d71d6330687 100644
>>>                     >>     --- a/include/uapi/drm/amdgpu_drm.h
>>>                     >>     +++ b/include/uapi/drm/amdgpu_drm.h
>>>                     >>     @@ -533,6 +533,8 @@ struct drm_amdgpu_gem_op {
>>>                     >>      #define AMDGPU_VM_MTYPE_UC             (4 << 5)
>>>                     >>      /* Use Read Write MTYPE instead of default MTYPE */
>>>                     >>      #define AMDGPU_VM_MTYPE_RW             (5 << 5)
>>>                     >>     +/* don't allocate MALL */
>>>                     >>     +#define AMDGPU_VM_PAGE_NOALLOC         (1 << 9)
>>>                     >>
>>>                     >>      struct drm_amdgpu_gem_va {
>>>                     >>             /** GEM object handle */
>>>                     >>     --
>>>                     >>     2.25.1
>>>                     >>
>>>                     >
>>>
>>
>


* Re: [PATCH 2/3] drm/amdgpu: add AMDGPU_VM_NOALLOC
  2022-05-16 16:13                       ` Christian König
@ 2022-05-17  0:12                         ` Marek Olšák
  2022-05-17  6:33                           ` Christian König
  0 siblings, 1 reply; 32+ messages in thread
From: Marek Olšák @ 2022-05-17  0:12 UTC (permalink / raw)
  To: Christian König; +Cc: Lazar, Lijo, amd-gfx mailing list

[-- Attachment #1: Type: text/plain, Size: 7584 bytes --]

Dmesg doesn't contain anything. There is no backtrace because it's not a
crash. The VA map ioctl just fails with the new flag. It looks like the
flag is considered invalid.

Marek

On Mon., May 16, 2022, 12:13 Christian König, <
ckoenig.leichtzumerken@gmail.com> wrote:

> I don't have access to any gfx10 hardware.
>
> Can you give me a dmesg and/or backtrace, etc..?
>
> I can't push this unless it's working properly.
>
> Christian.
>
> Am 16.05.22 um 14:56 schrieb Marek Olšák:
>
> Reproduction steps:
> - use mesa/main on gfx10.3 (not sure what other GPUs do)
> - run: radeonsi_mall_noalloc=true glxgears
>
> Marek
>
> On Mon, May 16, 2022 at 7:53 AM Christian König <
> ckoenig.leichtzumerken@gmail.com> wrote:
>
>> Crap, do you have a link to the failure?
>>
>> Am 16.05.22 um 13:10 schrieb Marek Olšák:
>>
>> I forgot to say: The NOALLOC flag causes an allocation failure, so there
>> is a kernel bug somewhere.
>>
>> Marek
>>
>> On Mon, May 16, 2022 at 7:06 AM Marek Olšák <maraeo@gmail.com> wrote:
>>
>>> FYI, I think it's time to merge this because the Mesa commits are going
>>> to be merged in ~30 minutes if Gitlab CI is green, and that includes
>>> updated amdgpu_drm.h.
>>>
>>> Marek
>>>
>>> On Wed, May 11, 2022 at 2:55 PM Marek Olšák <maraeo@gmail.com> wrote:
>>>
>>>> Ok sounds good.
>>>>
>>>> Marek
>>>>
>>>> On Wed., May 11, 2022, 03:43 Christian König, <
>>>> ckoenig.leichtzumerken@gmail.com> wrote:
>>>>
>>>>> It really *is* a NOALLOC feature. In other words there is no latency
>>>>> improvement on reads because the cache is always checked, even with the
>>>>> noalloc flag set.
>>>>>
>>>>> The only thing it affects is that misses do not enter the cache and so
>>>>> don't cause any additional pressure on evicting cache lines.
>>>>>
>>>>> You might want to double check with the hardware guys, but I'm
>>>>> something like 95% sure that it works this way.
>>>>>
>>>>> Christian.
>>>>>
>>>>> Am 11.05.22 um 09:22 schrieb Marek Olšák:
>>>>>
>>>>> Bypass means that the contents of the cache are ignored, which
>>>>> decreases latency at the cost of no coherency between bypassed and normal
>>>>> memory requests. NOA (noalloc) means that the cache is checked and can give
>>>>> you cache hits, but misses are not cached and the overall latency is
>>>>> higher. I don't know what the hw does, but I hope it was misnamed and it
>>>>> really means bypass because there is no point in doing cache lookups on
>>>>> every memory request if the driver wants to disable caching to *decrease*
>>>>> latency in the situations when the cache isn't helping.
>>>>>
>>>>> Marek
>>>>>
>>>>> On Wed, May 11, 2022 at 2:15 AM Lazar, Lijo <lijo.lazar@amd.com>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On 5/11/2022 11:36 AM, Christian König wrote:
>>>>>> > Mhm, it doesn't really bypass MALL. It just doesn't allocate any
>>>>>> MALL
>>>>>> > entries on write.
>>>>>> >
>>>>>> > How about AMDGPU_VM_PAGE_NO_MALL ?
>>>>>>
>>>>>> One more - AMDGPU_VM_PAGE_LLC_* [ LLC = last level cache, * = some
>>>>>> sort
>>>>>> of attribute which decides LLC behaviour]
>>>>>>
>>>>>> Thanks,
>>>>>> Lijo
>>>>>>
>>>>>> >
>>>>>> > Christian.
>>>>>> >
>>>>>> > Am 10.05.22 um 23:21 schrieb Marek Olšák:
>>>>>> >> A better name would be:
>>>>>> >> AMDGPU_VM_PAGE_BYPASS_MALL
>>>>>> >>
>>>>>> >> Marek
>>>>>> >>
>>>>>> >> On Fri, May 6, 2022 at 7:23 AM Christian König
>>>>>> >> <ckoenig.leichtzumerken@gmail.com> wrote:
>>>>>> >>
>>>>>> >>     Add the AMDGPU_VM_NOALLOC flag to let userspace control MALL
>>>>>> >>     allocation.
>>>>>> >>
>>>>>> >>     Only compile tested!
>>>>>> >>
>>>>>> >>     Signed-off-by: Christian König <christian.koenig@amd.com>
>>>>>> >>     ---
>>>>>> >>      drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 2 ++
>>>>>> >>      drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c  | 3 +++
>>>>>> >>      drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c  | 3 +++
>>>>>> >>      include/uapi/drm/amdgpu_drm.h           | 2 ++
>>>>>> >>      4 files changed, 10 insertions(+)
>>>>>> >>
>>>>>> >>     diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>>>>> >>     index bf97d8f07f57..d8129626581f 100644
>>>>>> >>     --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>>>>> >>     +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>>>>> >>     @@ -650,6 +650,8 @@ uint64_t amdgpu_gem_va_map_flags(struct amdgpu_device *adev, uint32_t flags)
>>>>>> >>                     pte_flag |= AMDGPU_PTE_WRITEABLE;
>>>>>> >>             if (flags & AMDGPU_VM_PAGE_PRT)
>>>>>> >>                     pte_flag |= AMDGPU_PTE_PRT;
>>>>>> >>     +       if (flags & AMDGPU_VM_PAGE_NOALLOC)
>>>>>> >>     +               pte_flag |= AMDGPU_PTE_NOALLOC;
>>>>>> >>
>>>>>> >>             if (adev->gmc.gmc_funcs->map_mtype)
>>>>>> >>                     pte_flag |= amdgpu_gmc_map_mtype(adev,
>>>>>> >>     diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>>>>>> >>     index b8c79789e1e4..9077dfccaf3c 100644
>>>>>> >>     --- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>>>>>> >>     +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>>>>>> >>     @@ -613,6 +613,9 @@ static void gmc_v10_0_get_vm_pte(struct amdgpu_device *adev,
>>>>>> >>             *flags &= ~AMDGPU_PTE_MTYPE_NV10_MASK;
>>>>>> >>             *flags |= (mapping->flags & AMDGPU_PTE_MTYPE_NV10_MASK);
>>>>>> >>
>>>>>> >>     +       *flags &= ~AMDGPU_PTE_NOALLOC;
>>>>>> >>     +       *flags |= (mapping->flags & AMDGPU_PTE_NOALLOC);
>>>>>> >>     +
>>>>>> >>             if (mapping->flags & AMDGPU_PTE_PRT) {
>>>>>> >>                     *flags |= AMDGPU_PTE_PRT;
>>>>>> >>                     *flags |= AMDGPU_PTE_SNOOPED;
>>>>>> >>     diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
>>>>>> >>     index 8d733eeac556..32ee56adb602 100644
>>>>>> >>     --- a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
>>>>>> >>     +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
>>>>>> >>     @@ -508,6 +508,9 @@ static void gmc_v11_0_get_vm_pte(struct amdgpu_device *adev,
>>>>>> >>             *flags &= ~AMDGPU_PTE_MTYPE_NV10_MASK;
>>>>>> >>             *flags |= (mapping->flags & AMDGPU_PTE_MTYPE_NV10_MASK);
>>>>>> >>
>>>>>> >>     +       *flags &= ~AMDGPU_PTE_NOALLOC;
>>>>>> >>     +       *flags |= (mapping->flags & AMDGPU_PTE_NOALLOC);
>>>>>> >>     +
>>>>>> >>             if (mapping->flags & AMDGPU_PTE_PRT) {
>>>>>> >>                     *flags |= AMDGPU_PTE_PRT;
>>>>>> >>                     *flags |= AMDGPU_PTE_SNOOPED;
>>>>>> >>     diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
>>>>>> >>     index 57b9d8f0133a..9d71d6330687 100644
>>>>>> >>     --- a/include/uapi/drm/amdgpu_drm.h
>>>>>> >>     +++ b/include/uapi/drm/amdgpu_drm.h
>>>>>> >>     @@ -533,6 +533,8 @@ struct drm_amdgpu_gem_op {
>>>>>> >>      #define AMDGPU_VM_MTYPE_UC             (4 << 5)
>>>>>> >>      /* Use Read Write MTYPE instead of default MTYPE */
>>>>>> >>      #define AMDGPU_VM_MTYPE_RW             (5 << 5)
>>>>>> >>     +/* don't allocate MALL */
>>>>>> >>     +#define AMDGPU_VM_PAGE_NOALLOC         (1 << 9)
>>>>>> >>
>>>>>> >>      struct drm_amdgpu_gem_va {
>>>>>> >>             /** GEM object handle */
>>>>>> >>     --
>>>>>> >>     2.25.1
>>>>>> >>
>>>>>> >
>>>>>>
>>>>>
>>>>>
>>
>

[-- Attachment #2: Type: text/html, Size: 21538 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/3] drm/amdgpu: add AMDGPU_VM_NOALLOC
  2022-05-17  0:12                         ` Marek Olšák
@ 2022-05-17  6:33                           ` Christian König
  2022-05-18  8:28                             ` Christian König
  0 siblings, 1 reply; 32+ messages in thread
From: Christian König @ 2022-05-17  6:33 UTC (permalink / raw)
  To: Marek Olšák; +Cc: Lazar, Lijo, amd-gfx mailing list

[-- Attachment #1: Type: text/plain, Size: 11449 bytes --]

Ok that sounds like a rather simple bug. I will try to take a look.

Thanks,
Christian.

Am 17.05.22 um 02:12 schrieb Marek Olšák:
> Dmesg doesn't contain anything. There is no backtrace because it's not 
> a crash. The VA map ioctl just fails with the new flag. It looks like 
> the flag is considered invalid.
>
> Marek
>
> On Mon., May 16, 2022, 12:13 Christian König, 
> <ckoenig.leichtzumerken@gmail.com> wrote:
>
>     I don't have access to any gfx10 hardware.
>
>     Can you give me a dmesg and/or backtrace, etc..?
>
>     I can't push this unless it's working properly.
>
>     Christian.
>
>     Am 16.05.22 um 14:56 schrieb Marek Olšák:
>>     Reproduction steps:
>>     - use mesa/main on gfx10.3 (not sure what other GPUs do)
>>     - run: radeonsi_mall_noalloc=true glxgears
>>
>>     Marek
>>
>>     On Mon, May 16, 2022 at 7:53 AM Christian König
>>     <ckoenig.leichtzumerken@gmail.com> wrote:
>>
>>         Crap, do you have a link to the failure?
>>
>>         Am 16.05.22 um 13:10 schrieb Marek Olšák:
>>>         I forgot to say: The NOALLOC flag causes an allocation
>>>         failure, so there is a kernel bug somewhere.
>>>
>>>         Marek
>>>
>>>         On Mon, May 16, 2022 at 7:06 AM Marek Olšák
>>>         <maraeo@gmail.com> wrote:
>>>
>>>             FYI, I think it's time to merge this because the Mesa
>>>             commits are going to be merged in ~30 minutes if Gitlab
>>>             CI is green, and that includes updated amdgpu_drm.h.
>>>
>>>             Marek
>>>
>>>             On Wed, May 11, 2022 at 2:55 PM Marek Olšák
>>>             <maraeo@gmail.com> wrote:
>>>
>>>                 Ok sounds good.
>>>
>>>                 Marek
>>>
>>>                 On Wed., May 11, 2022, 03:43 Christian König,
>>>                 <ckoenig.leichtzumerken@gmail.com> wrote:
>>>
>>>                     It really *is* a NOALLOC feature. In other words
>>>                     there is no latency improvement on reads because
>>>                     the cache is always checked, even with the
>>>                     noalloc flag set.
>>>
>>>                     The only thing it affects is that misses do not
>>>                     enter the cache and so don't cause any
>>>                     additional pressure on evicting cache lines.
>>>
>>>                     You might want to double check with the hardware
>>>                     guys, but I'm something like 95% sure that it
>>>                     works this way.
>>>
>>>                     Christian.
>>>
>>>                     Am 11.05.22 um 09:22 schrieb Marek Olšák:
>>>>                     Bypass means that the contents of the cache are
>>>>                     ignored, which decreases latency at the cost of
>>>>                     no coherency between bypassed and normal memory
>>>>                     requests. NOA (noalloc) means that the cache is
>>>>                     checked and can give you cache hits, but misses
>>>>                     are not cached and the overall latency is
>>>>                     higher. I don't know what the hw does, but I
>>>>                     hope it was misnamed and it really means bypass
>>>>                     because there is no point in doing cache
>>>>                     lookups on every memory request if the driver
>>>>                     wants to disable caching to *decrease* latency
>>>>                     in the situations when the cache isn't helping.
>>>>
>>>>                     Marek
>>>>
>>>>                     On Wed, May 11, 2022 at 2:15 AM Lazar, Lijo
>>>>                     <lijo.lazar@amd.com> wrote:
>>>>
>>>>
>>>>
>>>>                         On 5/11/2022 11:36 AM, Christian König wrote:
>>>>                         > Mhm, it doesn't really bypass MALL. It
>>>>                         just doesn't allocate any MALL
>>>>                         > entries on write.
>>>>                         >
>>>>                         > How about AMDGPU_VM_PAGE_NO_MALL ?
>>>>
>>>>                         One more - AMDGPU_VM_PAGE_LLC_* [ LLC =
>>>>                         last level cache, * = some sort
>>>>                         of attribute which decides LLC behaviour]
>>>>
>>>>                         Thanks,
>>>>                         Lijo
>>>>
>>>>                         >
>>>>                         > Christian.
>>>>                         >
>>>>                         > Am 10.05.22 um 23:21 schrieb Marek Olšák:
>>>>                         >> A better name would be:
>>>>                         >> AMDGPU_VM_PAGE_BYPASS_MALL
>>>>                         >>
>>>>                         >> Marek
>>>>                         >>
>>>>                         >> On Fri, May 6, 2022 at 7:23 AM Christian
>>>>                         König
>>>>                         >> <ckoenig.leichtzumerken@gmail.com> wrote:
>>>>                         >>
>>>>                         >>     Add the AMDGPU_VM_NOALLOC flag to let userspace control MALL allocation.
>>>>                         >>
>>>>                         >>     Only compile tested!
>>>>                         >>
>>>>                         >>     Signed-off-by: Christian König <christian.koenig@amd.com>
>>>>                         >>     ---
>>>>                         >>      drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 2 ++
>>>>                         >>      drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c  | 3 +++
>>>>                         >>      drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c  | 3 +++
>>>>                         >>      include/uapi/drm/amdgpu_drm.h           | 2 ++
>>>>                         >>      4 files changed, 10 insertions(+)
>>>>                         >>
>>>>                         >>     diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>>>                         >>     index bf97d8f07f57..d8129626581f 100644
>>>>                         >>     --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>>>                         >>     +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>>>                         >>     @@ -650,6 +650,8 @@ uint64_t amdgpu_gem_va_map_flags(struct amdgpu_device *adev, uint32_t flags)
>>>>                         >>                     pte_flag |= AMDGPU_PTE_WRITEABLE;
>>>>                         >>             if (flags & AMDGPU_VM_PAGE_PRT)
>>>>                         >>                     pte_flag |= AMDGPU_PTE_PRT;
>>>>                         >>     +       if (flags & AMDGPU_VM_PAGE_NOALLOC)
>>>>                         >>     +               pte_flag |= AMDGPU_PTE_NOALLOC;
>>>>                         >>
>>>>                         >>             if (adev->gmc.gmc_funcs->map_mtype)
>>>>                         >>                     pte_flag |= amdgpu_gmc_map_mtype(adev,
>>>>                         >>     diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>>>>                         >>     index b8c79789e1e4..9077dfccaf3c 100644
>>>>                         >>     --- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>>>>                         >>     +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>>>>                         >>     @@ -613,6 +613,9 @@ static void gmc_v10_0_get_vm_pte(struct amdgpu_device *adev,
>>>>                         >>             *flags &= ~AMDGPU_PTE_MTYPE_NV10_MASK;
>>>>                         >>             *flags |= (mapping->flags & AMDGPU_PTE_MTYPE_NV10_MASK);
>>>>                         >>
>>>>                         >>     +       *flags &= ~AMDGPU_PTE_NOALLOC;
>>>>                         >>     +       *flags |= (mapping->flags & AMDGPU_PTE_NOALLOC);
>>>>                         >>     +
>>>>                         >>             if (mapping->flags & AMDGPU_PTE_PRT) {
>>>>                         >>                     *flags |= AMDGPU_PTE_PRT;
>>>>                         >>                     *flags |= AMDGPU_PTE_SNOOPED;
>>>>                         >>     diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
>>>>                         >>     index 8d733eeac556..32ee56adb602 100644
>>>>                         >>     --- a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
>>>>                         >>     +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
>>>>                         >>     @@ -508,6 +508,9 @@ static void gmc_v11_0_get_vm_pte(struct amdgpu_device *adev,
>>>>                         >>             *flags &= ~AMDGPU_PTE_MTYPE_NV10_MASK;
>>>>                         >>             *flags |= (mapping->flags & AMDGPU_PTE_MTYPE_NV10_MASK);
>>>>                         >>
>>>>                         >>     +       *flags &= ~AMDGPU_PTE_NOALLOC;
>>>>                         >>     +       *flags |= (mapping->flags & AMDGPU_PTE_NOALLOC);
>>>>                         >>     +
>>>>                         >>             if (mapping->flags & AMDGPU_PTE_PRT) {
>>>>                         >>                     *flags |= AMDGPU_PTE_PRT;
>>>>                         >>                     *flags |= AMDGPU_PTE_SNOOPED;
>>>>                         >>     diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
>>>>                         >>     index 57b9d8f0133a..9d71d6330687 100644
>>>>                         >>     --- a/include/uapi/drm/amdgpu_drm.h
>>>>                         >>     +++ b/include/uapi/drm/amdgpu_drm.h
>>>>                         >>     @@ -533,6 +533,8 @@ struct drm_amdgpu_gem_op {
>>>>                         >>      #define AMDGPU_VM_MTYPE_UC             (4 << 5)
>>>>                         >>      /* Use Read Write MTYPE instead of default MTYPE */
>>>>                         >>      #define AMDGPU_VM_MTYPE_RW             (5 << 5)
>>>>                         >>     +/* don't allocate MALL */
>>>>                         >>     +#define AMDGPU_VM_PAGE_NOALLOC         (1 << 9)
>>>>                         >>
>>>>                         >>      struct drm_amdgpu_gem_va {
>>>>                         >>             /** GEM object handle */
>>>>                         >>     --
>>>>                         >>     2.25.1
>>>>                         >>
>>>>                         >
>>>>
>>>
>>
>

[-- Attachment #2: Type: text/html, Size: 28764 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 2/3] drm/amdgpu: add AMDGPU_VM_NOALLOC
  2022-05-17  6:33                           ` Christian König
@ 2022-05-18  8:28                             ` Christian König
  0 siblings, 0 replies; 32+ messages in thread
From: Christian König @ 2022-05-18  8:28 UTC (permalink / raw)
  To: Marek Olšák; +Cc: Lazar, Lijo, amd-gfx mailing list

[-- Attachment #1: Type: text/plain, Size: 12009 bytes --]

Found and fixed. If nobody has any more objections or better names for
the new flags, I'm going to push this to amd-staging-drm-next tomorrow.

Thanks,
Christian.

Am 17.05.22 um 08:33 schrieb Christian König:
> Ok that sounds like a rather simple bug. I will try to take a look.
>
> Thanks,
> Christian.
>
> Am 17.05.22 um 02:12 schrieb Marek Olšák:
>> Dmesg doesn't contain anything. There is no backtrace because it's 
>> not a crash. The VA map ioctl just fails with the new flag. It looks 
>> like the flag is considered invalid.
>>
>> Marek
>>
>> On Mon., May 16, 2022, 12:13 Christian König, 
>> <ckoenig.leichtzumerken@gmail.com> wrote:
>>
>>     I don't have access to any gfx10 hardware.
>>
>>     Can you give me a dmesg and/or backtrace, etc..?
>>
>>     I can't push this unless it's working properly.
>>
>>     Christian.
>>
>>     Am 16.05.22 um 14:56 schrieb Marek Olšák:
>>>     Reproduction steps:
>>>     - use mesa/main on gfx10.3 (not sure what other GPUs do)
>>>     - run: radeonsi_mall_noalloc=true glxgears
>>>
>>>     Marek
>>>
>>>     On Mon, May 16, 2022 at 7:53 AM Christian König
>>>     <ckoenig.leichtzumerken@gmail.com> wrote:
>>>
>>>         Crap, do you have a link to the failure?
>>>
>>>         Am 16.05.22 um 13:10 schrieb Marek Olšák:
>>>>         I forgot to say: The NOALLOC flag causes an allocation
>>>>         failure, so there is a kernel bug somewhere.
>>>>
>>>>         Marek
>>>>
>>>>         On Mon, May 16, 2022 at 7:06 AM Marek Olšák
>>>>         <maraeo@gmail.com> wrote:
>>>>
>>>>             FYI, I think it's time to merge this because the Mesa
>>>>             commits are going to be merged in ~30 minutes if Gitlab
>>>>             CI is green, and that includes updated amdgpu_drm.h.
>>>>
>>>>             Marek
>>>>
>>>>             On Wed, May 11, 2022 at 2:55 PM Marek Olšák
>>>>             <maraeo@gmail.com> wrote:
>>>>
>>>>                 Ok sounds good.
>>>>
>>>>                 Marek
>>>>
>>>>                 On Wed., May 11, 2022, 03:43 Christian König,
>>>>                 <ckoenig.leichtzumerken@gmail.com> wrote:
>>>>
>>>>                     It really *is* a NOALLOC feature. In other
>>>>                     words there is no latency improvement on reads
>>>>                     because the cache is always checked, even with
>>>>                     the noalloc flag set.
>>>>
>>>>                     The only thing it affects is that misses do not
>>>>                     enter the cache and so don't cause any
>>>>                     additional pressure on evicting cache lines.
>>>>
>>>>                     You might want to double check with the
>>>>                     hardware guys, but I'm something like 95% sure
>>>>                     that it works this way.
>>>>
>>>>                     Christian.
>>>>
>>>>                     Am 11.05.22 um 09:22 schrieb Marek Olšák:
>>>>>                     Bypass means that the contents of the cache
>>>>>                     are ignored, which decreases latency at the
>>>>>                     cost of no coherency between bypassed and
>>>>>                     normal memory requests. NOA (noalloc) means
>>>>>                     that the cache is checked and can give you
>>>>>                     cache hits, but misses are not cached and the
>>>>>                     overall latency is higher. I don't know what
>>>>>                     the hw does, but I hope it was misnamed and it
>>>>>                     really means bypass because there is no point
>>>>>                     in doing cache lookups on every memory request
>>>>>                     if the driver wants to disable caching to
>>>>>                     *decrease* latency in the situations when the
>>>>>                     cache isn't helping.
>>>>>
>>>>>                     Marek
>>>>>
>>>>>                     On Wed, May 11, 2022 at 2:15 AM Lazar, Lijo
>>>>>                     <lijo.lazar@amd.com> wrote:
>>>>>
>>>>>
>>>>>
>>>>>                         On 5/11/2022 11:36 AM, Christian König wrote:
>>>>>                         > Mhm, it doesn't really bypass MALL. It
>>>>>                         just doesn't allocate any MALL
>>>>>                         > entries on write.
>>>>>                         >
>>>>>                         > How about AMDGPU_VM_PAGE_NO_MALL ?
>>>>>
>>>>>                         One more - AMDGPU_VM_PAGE_LLC_* [ LLC =
>>>>>                         last level cache, * = some sort
>>>>>                         of attribute which decides LLC behaviour]
>>>>>
>>>>>                         Thanks,
>>>>>                         Lijo
>>>>>
>>>>>                         >
>>>>>                         > Christian.
>>>>>                         >
>>>>>                         > Am 10.05.22 um 23:21 schrieb Marek Olšák:
>>>>>                         >> A better name would be:
>>>>>                         >> AMDGPU_VM_PAGE_BYPASS_MALL
>>>>>                         >>
>>>>>                         >> Marek
>>>>>                         >>
>>>>>                         >> On Fri, May 6, 2022 at 7:23 AM
>>>>>                         Christian König
>>>>>                         >> <ckoenig.leichtzumerken@gmail.com> wrote:
>>>>>                         >>
>>>>>                         >>     Add the AMDGPU_VM_NOALLOC flag to let userspace control MALL allocation.
>>>>>                         >>
>>>>>                         >>     Only compile tested!
>>>>>                         >>
>>>>>                         >>     Signed-off-by: Christian König <christian.koenig@amd.com>
>>>>>                         >>     ---
>>>>>                         >>      drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 2 ++
>>>>>                         >>      drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c  | 3 +++
>>>>>                         >>      drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c  | 3 +++
>>>>>                         >>      include/uapi/drm/amdgpu_drm.h           | 2 ++
>>>>>                         >>      4 files changed, 10 insertions(+)
>>>>>                         >>
>>>>>                         >>     diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>>>>                         >>     index bf97d8f07f57..d8129626581f 100644
>>>>>                         >>     --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>>>>                         >>     +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>>>>                         >>     @@ -650,6 +650,8 @@ uint64_t amdgpu_gem_va_map_flags(struct amdgpu_device *adev, uint32_t flags)
>>>>>                         >>                     pte_flag |= AMDGPU_PTE_WRITEABLE;
>>>>>                         >>             if (flags & AMDGPU_VM_PAGE_PRT)
>>>>>                         >>                     pte_flag |= AMDGPU_PTE_PRT;
>>>>>                         >>     +       if (flags & AMDGPU_VM_PAGE_NOALLOC)
>>>>>                         >>     +               pte_flag |= AMDGPU_PTE_NOALLOC;
>>>>>                         >>
>>>>>                         >>             if (adev->gmc.gmc_funcs->map_mtype)
>>>>>                         >>                     pte_flag |= amdgpu_gmc_map_mtype(adev,
>>>>>                         >>     diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>>>>>                         >>     index b8c79789e1e4..9077dfccaf3c 100644
>>>>>                         >>     --- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>>>>>                         >>     +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>>>>>                         >>     @@ -613,6 +613,9 @@ static void gmc_v10_0_get_vm_pte(struct amdgpu_device *adev,
>>>>>                         >>             *flags &= ~AMDGPU_PTE_MTYPE_NV10_MASK;
>>>>>                         >>             *flags |= (mapping->flags & AMDGPU_PTE_MTYPE_NV10_MASK);
>>>>>                         >>
>>>>>                         >>     +       *flags &= ~AMDGPU_PTE_NOALLOC;
>>>>>                         >>     +       *flags |= (mapping->flags & AMDGPU_PTE_NOALLOC);
>>>>>                         >>     +
>>>>>                         >>             if (mapping->flags & AMDGPU_PTE_PRT) {
>>>>>                         >>                     *flags |= AMDGPU_PTE_PRT;
>>>>>                         >>                     *flags |= AMDGPU_PTE_SNOOPED;
>>>>>                         >>     diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
>>>>>                         >>     index 8d733eeac556..32ee56adb602 100644
>>>>>                         >>     --- a/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
>>>>>                         >>     +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v11_0.c
>>>>>                         >>     @@ -508,6 +508,9 @@ static void gmc_v11_0_get_vm_pte(struct amdgpu_device *adev,
>>>>>                         >>             *flags &= ~AMDGPU_PTE_MTYPE_NV10_MASK;
>>>>>                         >>             *flags |= (mapping->flags & AMDGPU_PTE_MTYPE_NV10_MASK);
>>>>>                         >>
>>>>>                         >>     +       *flags &= ~AMDGPU_PTE_NOALLOC;
>>>>>                         >>     +       *flags |= (mapping->flags & AMDGPU_PTE_NOALLOC);
>>>>>                         >>     +
>>>>>                         >>             if (mapping->flags & AMDGPU_PTE_PRT) {
>>>>>                         >>                     *flags |= AMDGPU_PTE_PRT;
>>>>>                         >>                     *flags |= AMDGPU_PTE_SNOOPED;
>>>>>                         >>     diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
>>>>>                         >>     index 57b9d8f0133a..9d71d6330687 100644
>>>>>                         >>     --- a/include/uapi/drm/amdgpu_drm.h
>>>>>                         >>     +++ b/include/uapi/drm/amdgpu_drm.h
>>>>>                         >>     @@ -533,6 +533,8 @@ struct drm_amdgpu_gem_op {
>>>>>                         >>      #define AMDGPU_VM_MTYPE_UC             (4 << 5)
>>>>>                         >>      /* Use Read Write MTYPE instead of default MTYPE */
>>>>>                         >>      #define AMDGPU_VM_MTYPE_RW             (5 << 5)
>>>>>                         >>     +/* don't allocate MALL */
>>>>>                         >>     +#define AMDGPU_VM_PAGE_NOALLOC         (1 << 9)
>>>>>                         >>
>>>>>                         >>      struct drm_amdgpu_gem_va {
>>>>>                         >>             /** GEM object handle */
>>>>>                         >>     --
>>>>>                         >>     2.25.1
>>>>>                         >>
>>>>>                         >
>>>>>
>>>>
>>>
>>
>

[-- Attachment #2: Type: text/html, Size: 29903 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/3] drm/amdgpu: add AMDGPU_GEM_CREATE_DISCARDABLE
  2022-05-13 11:26                   ` Christian König
@ 2022-07-08 14:58                     ` Marek Olšák
  2022-07-11 10:15                       ` Christian König
  0 siblings, 1 reply; 32+ messages in thread
From: Marek Olšák @ 2022-07-08 14:58 UTC (permalink / raw)
  To: Christian König; +Cc: Pierre-Eric Pelloux-Prayer, amd-gfx mailing list

[-- Attachment #1: Type: text/plain, Size: 5955 bytes --]

Christian, should we set this flag for GDS too? Will it help with GDS OOM
failures?

Marek

On Fri., May 13, 2022, 07:26 Christian König, <
ckoenig.leichtzumerken@gmail.com> wrote:

> Exactly that's what we can't do.
>
> See the kernel must always be able to move things to GTT or discard. So
> when you want to guarantee that something is in VRAM you must at the
> same time say you can discard it if it can't.
>
> Christian.
>
> Am 13.05.22 um 10:43 schrieb Pierre-Eric Pelloux-Prayer:
> > Hi Marek, Christian,
> >
> > If the main feature for Mesa of AMDGPU_GEM_CREATE_DISCARDABLE is
> > getting the best placement, maybe we should have 2 separate flags:
> >   * AMDGPU_GEM_CREATE_DISCARDABLE: indicates to the kernel that it can
> > discard the content on eviction instead of preserving it
> >   * AMDGPU_GEM_CREATE_FORCE_BEST_PLACEMENT (or
> > AMDGPU_GEM_CREATE_NO_GTT_FALLBACK ? or AMDGPU_CREATE_GEM_AVOID_GTT?):
> > tells the kernel that this bo really needs to be in VRAM
> >
> >
> > Pierre-Eric
> >
> > On 13/05/2022 00:17, Marek Olšák wrote:
> >> Would it be better to set the VM_ALWAYS_VALID flag to have a greater
> >> guarantee that the best placement will be chosen?
> >>
> >> See, the main feature is getting the best placement, not being
> >> discardable. The best placement is a hw design requirement due to
> >> using memory for uses that are expected to have performance similar
> >> to on-chip SRAMs. We need to make sure the best placement is
> >> guaranteed if it's VRAM.
> >>
> >> Marek
> >>
> >> On Thu., May 12, 2022, 03:26 Christian König,
> >> <ckoenig.leichtzumerken@gmail.com
> >> <mailto:ckoenig.leichtzumerken@gmail.com>> wrote:
> >>
> >>     Am 12.05.22 um 00:06 schrieb Marek Olšák:
> >>>     3rd question: Is it worth using this on APUs?
> >>
> >>     It makes memory management somewhat easier when we are really OOM.
> >>
> >>     E.g. it should also work for GTT allocations and when the core
> >> kernel says "Hey please free something up or I will start the
> >> OOM-killer" it's something we can easily throw away.
> >>
> >>     Not sure how many of those buffers we have, but marking
> >> everything which is temporary with that flag is probably a good idea.
> >>
> >>>
> >>>     Thanks,
> >>>     Marek
> >>>
> >>>     On Wed, May 11, 2022 at 5:58 PM Marek Olšák <maraeo@gmail.com
> >>> <mailto:maraeo@gmail.com>> wrote:
> >>>
> >>>         Will the kernel keep all discardable buffers in VRAM if VRAM
> >>> is not overcommitted by discardable buffers, or will other buffers
> >>> also affect the placement of discardable buffers?
> >>>
> >>
> >>     Regarding the eviction pressure the buffers will be handled like
> >> any other buffer, but instead of preserving the content it is just
> >> discarded on eviction.
> >>
> >>>
> >>>         Do evictions deallocate the buffer, or do they keep an
> >>> allocation in GTT and only the copy is skipped?
> >>>
> >>
> >>     It really deallocates the backing store of the buffer, just keeps
> >> a dummy page array around where all entries are NULL.
> >>
> >>     There is a patch set on the mailing list to make this a little
> >> bit more efficient, but even using the dummy page array should only
> >> have a few bytes overhead.
> >>
> >>     Regards,
> >>     Christian.
> >>
> >>>
> >>>         Thanks,
> >>>         Marek
> >>>
> >>>         On Wed, May 11, 2022 at 3:08 AM Marek Olšák
> >>> <maraeo@gmail.com <mailto:maraeo@gmail.com>> wrote:
> >>>
> >>>             OK that sounds good.
> >>>
> >>>             Marek
> >>>
> >>>             On Wed, May 11, 2022 at 2:04 AM Christian König
> >>> <ckoenig.leichtzumerken@gmail.com
> >>> <mailto:ckoenig.leichtzumerken@gmail.com>> wrote:
> >>>
> >>>                 Hi Marek,
> >>>
> >>>                 Am 10.05.22 um 22:43 schrieb Marek Olšák:
> >>>>                 A better flag name would be:
> >>>>                 AMDGPU_GEM_CREATE_BEST_PLACEMENT_OR_DISCARD
> >>>
> >>>                 A bit long for my taste and I think the best
> >>> placement is just a side effect.
> >>>
> >>>>
> >>>>                 Marek
> >>>>
> >>>>                 On Tue, May 10, 2022 at 4:13 PM Marek Olšák
> >>>> <maraeo@gmail.com <mailto:maraeo@gmail.com>> wrote:
> >>>>
> >>>>                     Does this really guarantee VRAM placement? The
> >>>> code doesn't say anything about that.
> >>>>
> >>>
> >>>                 Yes, see the code here:
> >>>
> >>>>
> >>>>                         diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> >>>>                         index 8b7ee1142d9a..1944ef37a61e 100644
> >>>>                         --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> >>>>                         +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> >>>>                         @@ -567,6 +567,7 @@ int amdgpu_bo_create(struct amdgpu_device *adev,
> >>>>                                         bp->domain;
> >>>>                                 bo->allowed_domains = bo->preferred_domains;
> >>>>                                 if (bp->type != ttm_bo_type_kernel &&
> >>>>                         +           !(bp->flags & AMDGPU_GEM_CREATE_DISCARDABLE) &&
> >>>>                                     bo->allowed_domains == AMDGPU_GEM_DOMAIN_VRAM)
> >>>>                                         bo->allowed_domains |= AMDGPU_GEM_DOMAIN_GTT;
> >>>>
> >>>
> >>>                 The only case where this could be circumvented is
> >>> when you try to allocate more than physically available on an APU.
> >>>
> >>>                 E.g. you only have something like 32 MiB of VRAM and
> >>> request 64 MiB; then the GEM code will catch the error and fall back
> >>> to GTT (IIRC).
> >>>
> >>>                 Regards,
> >>>                 Christian.
> >>>
> >>
>
>

[-- Attachment #2: Type: text/html, Size: 9238 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/3] drm/amdgpu: add AMDGPU_GEM_CREATE_DISCARDABLE
  2022-07-08 14:58                     ` Marek Olšák
@ 2022-07-11 10:15                       ` Christian König
  0 siblings, 0 replies; 32+ messages in thread
From: Christian König @ 2022-07-11 10:15 UTC (permalink / raw)
  To: Marek Olšák; +Cc: Pierre-Eric Pelloux-Prayer, amd-gfx mailing list

[-- Attachment #1: Type: text/plain, Size: 7124 bytes --]

That would be redundant. GDS handling has always worked such that the
storage is thrown away after an IB completes.

My LRU patch set should have helped with GDS out of memory errors, but 
I'm not sure how far along we are with rebasing amd-staging-drm-next.

Christian.

Am 08.07.22 um 16:58 schrieb Marek Olšák:
> Christian, should we set this flag for GDS too? Will it help with GDS 
> OOM failures?
>
> Marek
>

[-- Attachment #2: Type: text/html, Size: 12877 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2022-07-11 10:15 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-06 11:23 [PATCH 1/3] drm/amdgpu: add AMDGPU_GEM_CREATE_DISCARDABLE Christian König
2022-05-06 11:23 ` [PATCH 2/3] drm/amdgpu: add AMDGPU_VM_NOALLOC Christian König
2022-05-10 21:21   ` Marek Olšák
2022-05-11  6:06     ` Christian König
2022-05-11  6:15       ` Lazar, Lijo
2022-05-11  7:22         ` Marek Olšák
2022-05-11  7:43           ` Christian König
2022-05-11 18:55             ` Marek Olšák
2022-05-16 11:06               ` Marek Olšák
2022-05-16 11:10                 ` Marek Olšák
2022-05-16 11:53                   ` Christian König
2022-05-16 12:56                     ` Marek Olšák
2022-05-16 16:13                       ` Christian König
2022-05-17  0:12                         ` Marek Olšák
2022-05-17  6:33                           ` Christian König
2022-05-18  8:28                             ` Christian König
2022-05-06 11:23 ` [PATCH 3/3] drm/amdgpu: bump minor version number Christian König
2022-05-06 13:34   ` Alex Deucher
2022-05-12  2:38     ` Marek Olšák
2022-05-06 15:04 ` [PATCH 1/3] drm/amdgpu: add AMDGPU_GEM_CREATE_DISCARDABLE Felix Kuehling
2022-05-10 20:13 ` Marek Olšák
2022-05-10 20:43   ` Marek Olšák
2022-05-11  6:04     ` Christian König
2022-05-11  7:08       ` Marek Olšák
2022-05-11 21:58         ` Marek Olšák
2022-05-11 22:06           ` Marek Olšák
2022-05-12  7:25             ` Christian König
2022-05-12 22:17               ` Marek Olšák
2022-05-13 11:24                 ` Christian König
     [not found]                 ` <62165c7c-892a-5b35-79dd-b90414ccb5cd@damsy.net>
2022-05-13 11:26                   ` Christian König
2022-07-08 14:58                     ` Marek Olšák
2022-07-11 10:15                       ` Christian König

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).