From: "Christian König" <christian.koenig@amd.com>
To: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: ML dri-devel <dri-devel@lists.freedesktop.org>
Subject: Re: [RFC PATCH 3/5] drm/amdgpu: Allow explicit sync for VM ops.
Date: Fri, 3 Jun 2022 14:49:17 +0200
Message-ID: <6c7e8167-fd72-ef7f-c390-8750c61bc411@amd.com>
In-Reply-To: <CAP+8YyEMDNR_5=uGf8BEV5DCovr-Z_ZDWS2E7-7zqSFGG7bdKg@mail.gmail.com>

On 03.06.22 at 14:39, Bas Nieuwenhuizen wrote:
> On Fri, Jun 3, 2022 at 2:08 PM Christian König <christian.koenig@amd.com> wrote:
>> On 03.06.22 at 13:07, Bas Nieuwenhuizen wrote:
>>> On Fri, Jun 3, 2022 at 12:16 PM Christian König
>>> <christian.koenig@amd.com> wrote:
>>>> On 03.06.22 at 12:08, Bas Nieuwenhuizen wrote:
>>>>> [SNIP]
>>>>>>> I do have to fix some stuff indeed, especially for the GEM close but
>>>>>>> with that we should be able to keep the same basic approach?
>>>>>> Nope, not even remotely.
>>>>>>
>>>>>> What we need is the following:
>>>>>> 1. Rolling out my drm_exec patch set, so that we can lock buffers as needed.
>>>>>> 2. When we get a VM operation we not only lock the VM page tables, but
>>>>>> also all buffers we potentially need to unmap.
>>>>>> 3. Nuking the freed list in the amdgpu_vm structure by updating freed
>>>>>> areas directly when they are unmapped.
>>>>>> 4. Tracking those updates inside the bo_va structure for the BO+VM
>>>>>> combination.
>>>>>> 5. When the bo_va structure is destroyed because of closing the handle
>>>>>> move the last clear operation over to the VM as implicit sync.
>>>>>>
>>>>> Hi Christian, isn't that a different problem though (that we're also
>>>>> trying to solve, but in your series)?
>>>>>
>>>>> What this patch tries to achieve:
>>>>>
>>>>> (t+0) CS submission setting BOOKKEEP fences (i.e. no implicit sync)
>>>>> (t+1) a VM operation on a BO/VM accessed by the CS.
>>>>>
>>>>> to run concurrently. What it *doesn't* try is
>>>>>
>>>>> (t+0) a VM operation on a BO/VM accessed by the CS.
>>>>> (t+1) CS submission setting BOOKKEEP fences (i.e. no implicit sync)
>>>>>
>>>>> to run concurrently. When you write
>>>>>
>>>>>> Only when all this is done we then can resolve the dependency that the
>>>>>> CS currently must wait for any clear operation on the VM.
>>>>> isn't that all about the second problem?
>>>> No, it's the same.
>>>>
>>>> See what we do in the VM code is to artificially insert a bubble so that
>>>> all VM clear operations wait for all CS operations and then use the
>>>> clear fence to indicate when the backing store of the BO can be freed.
>>> Isn't that remediated with something like the code below? At least the
>>> gem_close case should be handled with this, and the move case was
>>> already handled by the copy operation.
>> That is one necessary puzzle piece, yes. But you need more than that.
>>
>> Especially the explicit unmap operation needs to be converted into an
>> implicit unmap to get the TLB flush right.
> This doesn't change anything about the TLB flush though? Since all
> unmap -> later jobs dependencies are still implicit.
>
> So the worst that could happen (if, e.g., userspace gets the
> waits/dependencies wrong) is
>
> 1) non-implicit CS gets submitted that touches a BO
> 2)  VM unmap on that BO happens
> 2.5) the CS from 1 is still active due to missing dependencies
> 2.6) but any CS submission after 2 will trigger a TLB flush

Yeah, but that's exactly the bubble we are trying to avoid, isn't it?

When we want to do a TLB flush the unmap operation must already be
completed. Otherwise the flush is rather pointless since any access
could reload the not-yet-updated PTEs.

And this means that we need to artificially add a dependency on every 
command submission after 2 to wait until the unmap operation is completed.
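
To make the ordering argument concrete, here is a minimal toy model in
plain C. It is not kernel code and does not use any amdgpu API: a single
page with one PTE and one TLB entry, and every name in it (toy_vm,
toy_submit_cs, and so on) is hypothetical and exists only for this
illustration. It assumes an access that misses the TLB simply reloads
whatever the PTE currently says.

```c
#include <stdbool.h>

/* Toy model only: one page, one PTE, one TLB entry. All names are hypothetical. */
struct toy_vm {
	bool pte_valid;     /* page table entry as stored in memory */
	bool tlb_valid;     /* translation currently cached in the TLB */
	bool unmap_pending; /* unmap job queued but not yet executed */
};

/* Queue an unmap; the PTE is only cleared once the job completes. */
static void toy_queue_unmap(struct toy_vm *vm)
{
	vm->unmap_pending = true;
}

/* Execute a pending unmap job, clearing the PTE. */
static void toy_complete_unmap(struct toy_vm *vm)
{
	if (vm->unmap_pending) {
		vm->pte_valid = false;
		vm->unmap_pending = false;
	}
}

static void toy_tlb_flush(struct toy_vm *vm)
{
	vm->tlb_valid = false;
}

/* An access hits the TLB, or walks the page table and refills the TLB.
 * Returns false on a (toy) VM fault. */
static bool toy_access(struct toy_vm *vm)
{
	if (vm->tlb_valid)
		return true;
	if (vm->pte_valid) {
		vm->tlb_valid = true;
		return true;
	}
	return false;
}

/* A command submission after the unmap: flush the TLB, then touch the page.
 * wait_for_unmap models the extra dependency on the unmap job. */
static bool toy_submit_cs(struct toy_vm *vm, bool wait_for_unmap)
{
	if (wait_for_unmap)
		toy_complete_unmap(vm);
	toy_tlb_flush(vm);
	return toy_access(vm);
}

/* Returns true if a stale translation survives in the TLB after the
 * unmap has finally completed. */
static bool toy_stale_after_unmap(bool wait_for_unmap)
{
	struct toy_vm vm = { .pte_valid = true, .tlb_valid = true };

	toy_queue_unmap(&vm);
	toy_submit_cs(&vm, wait_for_unmap);
	toy_complete_unmap(&vm); /* no-op if the CS already waited */
	return vm.tlb_valid && !vm.pte_valid;
}
```

In this sketch toy_stale_after_unmap(false) returns true: the flush ran
before the PTE was cleared, so the access reloaded the old translation.
toy_stale_after_unmap(true) returns false, at the cost of the submission
waiting for the unmap.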

Christian.

> 3) A TLB flush happens for a new CS
> 4) All CS submissions here see the TLB flush and hence the unmap
>
> So the main problem would be the CS from step 1, but (a) if that
> VM-faults, that is the app's own fault and (b) because we don't free the
> memory until (1) finishes it is not a security issue kernel-wise.
>
>> I think I know all the necessary steps now, it's just tons of work to do.
>>
>> Regards,
>> Christian.
>>
>>>
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>> @@ -187,6 +187,39 @@ static int amdgpu_gem_object_open(struct drm_gem_object *obj,
>>>          return 0;
>>> }
>>>
>>> +static void dma_resv_copy(struct dma_resv *src, struct dma_resv *dst)
>>> +{
>>> +       struct dma_resv_iter cursor;
>>> +       struct dma_fence *f;
>>> +       int r;
>>> +       unsigned num_fences = 0;
>>> +
>>> +       if (src == dst)
>>> +               return;
>>> +
>>> +       /* We assume the later loops get the same fences as the caller should
>>> +        * lock the resv. */
>>> +       dma_resv_for_each_fence(&cursor, src, DMA_RESV_USAGE_BOOKKEEP, f) {
>>> +               ++num_fences;
>>> +               dma_fence_put(f);
>>> +       }
>>> +
>>> +       r = dma_resv_reserve_fences(dst, num_fences);
>>> +       if (r) {
>>> +               /* As last resort on OOM we block for the fence */
>>> +               dma_resv_for_each_fence(&cursor, src, DMA_RESV_USAGE_BOOKKEEP, f) {
>>> +                       dma_fence_wait(f, false);
>>> +                       dma_fence_put(f);
>>> +               }
>>> +       }
>>> +
>>> +       dma_resv_for_each_fence(&cursor, src, DMA_RESV_USAGE_BOOKKEEP, f) {
>>> +               dma_resv_add_fence(dst, f, dma_resv_iter_usage(&cursor));
>>> +               dma_fence_put(f);
>>> +       }
>>> +}
>>> +
>>> +
>>> static void amdgpu_gem_object_close(struct drm_gem_object *obj,
>>>                                      struct drm_file *file_priv)
>>> {
>>> @@ -233,6 +266,8 @@ static void amdgpu_gem_object_close(struct drm_gem_object *obj,
>>>          amdgpu_bo_fence(bo, fence, true);
>>>          dma_fence_put(fence);
>>>
>>> +       dma_resv_copy(vm->root.bo->tbo.base.resv, bo->tbo.base.resv);
>>> +
>>> out_unlock:
>>>          if (unlikely(r < 0))
>>>                  dev_err(adev->dev, "failed to clear page "
>>>
>>>> When you want to remove this bubble (which is certainly a good idea) you
>>>> need to first come up with a different approach to handle the clear
>>>> operations.
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>>> Regards,
>>>>>> Christian.
>>>>>>
>>>>>>

