AMD-GFX Archive on lore.kernel.org
* [PATCH] drm/amdkfd: set memory limit to avoid OOM with HMM enabled
@ 2021-04-15 21:43 Philip Yang
  2021-04-15 21:51 ` Felix Kuehling
  0 siblings, 1 reply; 2+ messages in thread
From: Philip Yang @ 2021-04-15 21:43 UTC
  To: amd-gfx; +Cc: Philip Yang

HMM migration allocates sizeof(struct page) of system memory for each
VRAM page, which amounts to 1GB of system memory reserved for 64GB of
VRAM. To avoid application OOM, increase the system memory used size
based on the VRAM size of all GPUs, so that application memory
allocation fails once system memory usage reaches the limit.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Oak Zeng <Oak.Zeng@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h       | 1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 5 +++++
 drivers/gpu/drm/amd/amdkfd/kfd_migrate.c         | 8 ++++++++
 3 files changed, 14 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index 025e8bade8c8..2cb7f8c30b9f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -272,6 +272,7 @@ void amdgpu_amdkfd_gpuvm_init_mem_limits(void);
 void amdgpu_amdkfd_gpuvm_destroy_cb(struct amdgpu_device *adev,
 				struct amdgpu_vm *vm);
 void amdgpu_amdkfd_unreserve_memory_limit(struct amdgpu_bo *bo);
+void amdgpu_amdkfd_reserve_system_mem(uint64_t size);
 #else
 static inline
 void amdgpu_amdkfd_gpuvm_init_mem_limits(void)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index d70a21ea576d..6ea1039b08a0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -105,6 +105,11 @@ void amdgpu_amdkfd_gpuvm_init_mem_limits(void)
 		(kfd_mem_limit.max_ttm_mem_limit >> 20));
 }
 
+void amdgpu_amdkfd_reserve_system_mem(uint64_t size)
+{
+	kfd_mem_limit.system_mem_used += size;
+}
+
 /* Estimate page table size needed to represent a given memory size
  *
  * With 4KB pages, we need one 8 byte PTE for each 4KB of memory
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
index 7d8659517447..1373ce9af890 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
@@ -876,6 +876,9 @@ static const struct dev_pagemap_ops svm_migrate_pgmap_ops = {
 	.migrate_to_ram		= svm_migrate_to_ram,
 };
 
+/* Each VRAM page uses sizeof(struct page) on system memory */
+#define SVM_HMM_PAGE_STRUCT_SIZE(size) ((size)/PAGE_SIZE * sizeof(struct page))
+
 int svm_migrate_init(struct amdgpu_device *adev)
 {
 	struct kfd_dev *kfddev = adev->kfd.dev;
@@ -912,6 +915,11 @@ int svm_migrate_init(struct amdgpu_device *adev)
 		return PTR_ERR(r);
 	}
 
+	pr_debug("reserve %ldMB system memory for VRAM pages struct\n",
+		 SVM_HMM_PAGE_STRUCT_SIZE(size) >> 20);
+
+	amdgpu_amdkfd_reserve_system_mem(SVM_HMM_PAGE_STRUCT_SIZE(size));
+
 	pr_info("HMM registered %ldMB device memory\n", size >> 20);
 
 	return 0;
-- 
2.17.1

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

* Re: [PATCH] drm/amdkfd: set memory limit to avoid OOM with HMM enabled
  2021-04-15 21:43 [PATCH] drm/amdkfd: set memory limit to avoid OOM with HMM enabled Philip Yang
@ 2021-04-15 21:51 ` Felix Kuehling
  0 siblings, 0 replies; 2+ messages in thread
From: Felix Kuehling @ 2021-04-15 21:51 UTC
  To: Philip Yang, amd-gfx

On 2021-04-15 at 5:43 p.m., Philip Yang wrote:
> HMM migration allocates sizeof(struct page) of system memory for each
> VRAM page, which amounts to 1GB of system memory reserved for 64GB of
> VRAM. To avoid application OOM, increase the system memory used size
> based on the VRAM size of all GPUs, so that application memory
> allocation fails once system memory usage reaches the limit.
>
> Signed-off-by: Philip Yang <Philip.Yang@amd.com>
> Reviewed-by: Oak Zeng <Oak.Zeng@amd.com>

Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>

Thanks. I'll apply this to amd-staging-drm-next together with the big
HMM patch series.

Regards,
  Felix

