From: "Christian König" <ckoenig.leichtzumerken@gmail.com>
To: Felix Kuehling <felix.kuehling@amd.com>,
	Intel Graphics Development <intel-gfx@lists.freedesktop.org>,
	dri-devel@lists.freedesktop.org, amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 1/9] drm/amdgpu: generally allow over-commit during BO allocation
Date: Sat, 10 Dec 2022 15:12:26 +0100
Message-ID: <c9243d99-2a02-2e95-82f6-c70db9a08641@gmail.com>
In-Reply-To: <5ad09c47-1f50-07ce-7b8b-f8e4195f2256@amd.com>

On 10.12.22 at 07:15, Felix Kuehling wrote:
> On 2022-11-25 05:21, Christian König wrote:
>> We already fall back to a dummy BO with no backing store when we
>> allocate GDS, GWS and OA resources and to GTT when we allocate VRAM.
>>
>> Drop all those workarounds and generalize this for GTT as well. This
>> fixes ENOMEM issues with runaway applications which try to allocate/free
>> GTT in a loop and are otherwise only limited by the CPU speed.
>>
>> The CS will wait for the cleanup of freed-up BOs to satisfy the
>> various domain-specific limits and so effectively throttle those
>> buggy applications down to a sane allocation behavior again.
>>
>> Signed-off-by: Christian König <christian.koenig@amd.com>
>
> This patch causes some regressions in KFDTest. KFDMemoryTest.MMBench 
> sees a huge VRAM allocation slow-down. And 
> KFDMemoryTest.LargestVramBufferTest can only allocate half the 
> available memory.

Mhm, I wasn't expecting that we use this for the KFD as well.

>
> This seems to be caused by initially validating VRAM BOs in the CPU 
> domain, which allocates a ttm_tt. A subsequent validation in the VRAM 
> domain involves a copy from GTT to VRAM.

The idea was to initially create the BOs without any backing store.

>
> After that, freeing of BOs can get delayed by the ghost object of a 
> previous migration, which delays calling release notifiers and causes 
> problems for KFD's available memory accounting.
>
> I experimented with a workaround that validates BOs immediately after 
> allocation, but that only moves around the delays and doesn't solve 
> the problem. During those experiments I may also have stumbled over a 
> bug in ttm_buffer_object_transfer: It calls ttm_bo_set_bulk_move 
> before initializing and locking fbo->base.base._resv. This results in 
> a flood of warnings because ttm_bo_set_bulk_move expects the 
> reservation to be locked.
>
> Right now I'd like to remove the bp.domain = initial_domain | 
> AMDGPU_GEM_DOMAIN_CPU change in amdgpu_gem_object_create to fix this.

Yeah, let's revert and investigate this first.

Thanks,
Christian.

>
> Regards,
>   Felix
>
>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c    | 16 +++-------------
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |  6 +-----
>>   2 files changed, 4 insertions(+), 18 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>> index a0780a4e3e61..62e98f1ad770 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>> @@ -113,7 +113,7 @@ int amdgpu_gem_object_create(struct amdgpu_device *adev, unsigned long size,
>>       bp.resv = resv;
>>       bp.preferred_domain = initial_domain;
>>       bp.flags = flags;
>> -    bp.domain = initial_domain;
>> +    bp.domain = initial_domain | AMDGPU_GEM_DOMAIN_CPU;
>>       bp.bo_ptr_size = sizeof(struct amdgpu_bo);
>>         r = amdgpu_bo_create_user(adev, &bp, &ubo);
>> @@ -332,20 +332,10 @@ int amdgpu_gem_create_ioctl(struct drm_device *dev, void *data,
>>       }
>>         initial_domain = (u32)(0xffffffff & args->in.domains);
>> -retry:
>>       r = amdgpu_gem_object_create(adev, size, args->in.alignment,
>> -                     initial_domain,
>> -                     flags, ttm_bo_type_device, resv, &gobj);
>> +                     initial_domain, flags, ttm_bo_type_device,
>> +                     resv, &gobj);
>>       if (r && r != -ERESTARTSYS) {
>> -        if (flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED) {
>> -            flags &= ~AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED;
>> -            goto retry;
>> -        }
>> -
>> -        if (initial_domain == AMDGPU_GEM_DOMAIN_VRAM) {
>> -            initial_domain |= AMDGPU_GEM_DOMAIN_GTT;
>> -            goto retry;
>> -        }
>>           DRM_DEBUG("Failed to allocate GEM object (%llu, %d, %llu, %d)\n",
>>                   size, initial_domain, args->in.alignment, r);
>>       }
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>> index 974e85d8b6cc..919bbea2e3ac 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>> @@ -581,11 +581,7 @@ int amdgpu_bo_create(struct amdgpu_device *adev,
>>           bo->flags |= AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE;
>>         bo->tbo.bdev = &adev->mman.bdev;
>> -    if (bp->domain & (AMDGPU_GEM_DOMAIN_GWS | AMDGPU_GEM_DOMAIN_OA |
>> -              AMDGPU_GEM_DOMAIN_GDS))
>> -        amdgpu_bo_placement_from_domain(bo, AMDGPU_GEM_DOMAIN_CPU);
>> -    else
>> -        amdgpu_bo_placement_from_domain(bo, bp->domain);
>> +    amdgpu_bo_placement_from_domain(bo, bp->domain);
>>       if (bp->type == ttm_bo_type_kernel)
>>           bo->tbo.priority = 1;
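
As a rough illustration of the two allocation paths discussed in this thread, the following userspace sketch models the removed VRAM-to-GTT retry next to the patched behaviour that always ORs in AMDGPU_GEM_DOMAIN_CPU. It is not kernel code: struct model_pool, model_create(), model_create_old() and model_create_new() are hypothetical names invented for this sketch, the domain bit values are assumed to match include/uapi/drm/amdgpu_drm.h, and treating a CPU-domain request as always succeeding is a simplification of the "no backing store" idea described above. The AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED retry is left out for brevity.

```c
#include <errno.h>
#include <stdint.h>

/* Domain bits, assumed to match include/uapi/drm/amdgpu_drm.h. */
#define AMDGPU_GEM_DOMAIN_CPU  0x1
#define AMDGPU_GEM_DOMAIN_GTT  0x2
#define AMDGPU_GEM_DOMAIN_VRAM 0x4

/* Hypothetical model of the free space in each memory domain. */
struct model_pool {
	uint64_t vram_free;
	uint64_t gtt_free;
};

/*
 * Hypothetical stand-in for amdgpu_gem_object_create(): succeeds if any
 * domain allowed by the mask can hold the object. The CPU domain is
 * modelled as needing no backing store, so it never fails.
 */
int model_create(const struct model_pool *p, uint64_t size, uint32_t domain)
{
	if ((domain & AMDGPU_GEM_DOMAIN_VRAM) && size <= p->vram_free)
		return 0;
	if ((domain & AMDGPU_GEM_DOMAIN_GTT) && size <= p->gtt_free)
		return 0;
	if (domain & AMDGPU_GEM_DOMAIN_CPU)
		return 0;
	return -ENOMEM;
}

/* Old ioctl behaviour: a failed VRAM-only request is retried with GTT added. */
int model_create_old(const struct model_pool *p, uint64_t size,
		     uint32_t initial_domain, uint32_t *used)
{
	int r;

retry:
	r = model_create(p, size, initial_domain);
	if (r && initial_domain == AMDGPU_GEM_DOMAIN_VRAM) {
		initial_domain |= AMDGPU_GEM_DOMAIN_GTT;
		goto retry;
	}
	*used = initial_domain;
	return r;
}

/* Patched behaviour: no retry, the CPU domain is always allowed at creation. */
int model_create_new(const struct model_pool *p, uint64_t size,
		     uint32_t initial_domain, uint32_t *used)
{
	*used = initial_domain | AMDGPU_GEM_DOMAIN_CPU;
	return model_create(p, size, *used);
}
```

In this model, a VRAM-only request larger than both pools returns -ENOMEM on the old path but succeeds on the patched path, which is the over-commit the commit message describes. The model does not capture the later copy on validation, which is what Felix identifies as the source of the regression.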


