From: Felix Kuehling <felix.kuehling@amd.com>
To: "Christian König" <ckoenig.leichtzumerken@gmail.com>,
	dri-devel@lists.freedesktop.org, amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 1/2] drm/ttm: Don't evict SG BOs
Date: Wed, 28 Apr 2021 13:02:31 -0400
Message-ID: <55742179-98d9-d68a-30b7-331885fd91e0@amd.com>
In-Reply-To: <6946e644-0a16-30fe-e987-861bec610762@gmail.com>

On 2021-04-28 at 12:58 p.m., Christian König wrote:
> On 28.04.21 at 18:49, Felix Kuehling wrote:
>> On 2021-04-28 at 12:33 p.m., Christian König wrote:
>>> On 28.04.21 at 17:19, Felix Kuehling wrote:
>>> [SNIP]
>>>>>> Failing that, I'd probably have to abandon userptr BOs altogether
>>>>>> and switch system memory mappings over to using the new SVM API
>>>>>> on systems where it is available.
>>>>> Well, as long as that provides the necessary functionality through
>>>>> HMM it would be an option.
>>>> Just another way of circumventing "It should limit the amount of
>>>> system memory the GPU can access at the same time," a premise I
>>>> disagree with in the case of userptrs and HMM. Both use pageable,
>>>> unpinned memory. Both can cause the GPU to be preempted in case of
>>>> MMU interval notifiers.
>>> Well, that's the key point. GFX userptrs and DMA-buf imports can't
>>> be preempted.
>> But they don't need to be. They don't use any resources on the
>> importing GPU or system memory, so why do we limit them?
>
> Yeah, but at least user pointers effectively pin their backing store
> as long as the GPU operation is running.
>
>> With dynamic attachment, the exported BOs can be evicted and that
>> affects the imports as well. I don't see why the import needs to be
>> evicted as if there was some resource limitation on the importing
>> GPU.
>
> It prevents multiple DMA-buf imports from being active at the same
> time.
>
> See the following example: GTT space is 1 GiB and we have two DMA-buf
> imports of 600 MiB each.
>
> When userspace wants to submit work using both at the same time, we
> return -ENOSPC (or -ENOMEM, not 100% sure).
>
> When one is in use and a submission is made with the other, we block
> until that submission is completed.
>
> This way there is never more than 1 GiB of memory in use or "pinned"
> by the GPU using it.

Is this reasonable for imports of VRAM in a multi-GPU system? E.g. you
allocate 600 MB on GPU A and 600 MB on GPU B. You export both and
import them on the other GPU because you want both GPUs to access each
other's memory. This is a common use case for KFD, and something we
want to implement for upstreamable PCIe P2P support.

With your limitation, I will never be able to validate both BOs and
run KFD user mode queues in the above scenario.

Regards,
  Felix

>
>>> So they basically lock the backing memory until the last submission
>>> is completed, and that is causing problems if it happens for too
>>> much memory at the same time.
>>>
>>> What we could do is to figure out in the eviction_valuable callback
>>> whether the BO is preemptible or not.
>> Then we should also not count them in mgr->available. Otherwise not
>> evicting these BOs can block other GTT allocations. Again, maybe it's
>> easier to use a different domain for preemptible BOs.
>
> Good point. That would also be valuable when we get user queues at
> some point.
>
> Regards,
> Christian.
>
>>
>> Regards,
>>   Felix
>>
>>
>>> Regards,
>>> Christian.
>>>
>>>> Statically limiting the amount of pageable memory accessible to
>>>> GTT is redundant and overly limiting.
>>>>
>>>> Regards,
>>>>   Felix
>>>>
>>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>>> Regards,
>>>>>>   Felix
>>>>>>
>>>>>>
>>>>>>> Christian.
>>>>>>>
>>>>>>>> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
>>>>>>>> ---
>>>>>>>>   drivers/gpu/drm/ttm/ttm_bo.c | 4 ++++
>>>>>>>>   1 file changed, 4 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
>>>>>>>> index de1ec838cf8b..0b953654fdbf 100644
>>>>>>>> --- a/drivers/gpu/drm/ttm/ttm_bo.c
>>>>>>>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>>>>>>>> @@ -655,6 +655,10 @@ int ttm_mem_evict_first(struct ttm_device *bdev,
>>>>>>>>   		list_for_each_entry(bo, &man->lru[i], lru) {
>>>>>>>>   			bool busy;
>>>>>>>>   
>>>>>>>> +			/* Don't evict SG BOs */
>>>>>>>> +			if (bo->ttm && bo->ttm->sg)
>>>>>>>> +				continue;
>>>>>>>> +
>>>>>>>>   			if (!ttm_bo_evict_swapout_allowable(bo, ctx, &locked,
>>>>>>>>   							    &busy)) {
>>>>>>>>   				if (busy && !busy_bo && ticket !=

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel