From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 6B5CCC433F5 for ; Wed, 1 Jun 2022 08:59:26 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id B611910E1D3; Wed, 1 Jun 2022 08:59:25 +0000 (UTC) Received: from mail-yb1-xb35.google.com (mail-yb1-xb35.google.com [IPv6:2607:f8b0:4864:20::b35]) by gabe.freedesktop.org (Postfix) with ESMTPS id 351B410E092 for ; Wed, 1 Jun 2022 08:59:25 +0000 (UTC) Received: by mail-yb1-xb35.google.com with SMTP id f34so1736210ybj.6 for ; Wed, 01 Jun 2022 01:59:25 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=basnieuwenhuizen.nl; s=google; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=niJHAkU1vNEQQEduQzg9ScF8FBt/3ulFhyJs7r+jqTY=; b=g8eQgKFfHy0B6R9nViTKdPwZIDIT+BKAlJYiA7Ja8rkZ+Gb1nECntmVAydwARHkU9z X/EzrCVdIWAVdsKP4xZiriO8Fn9pwHh1OvK62LEsjHPQrjtU5Rbc75pVHDUYlx5shddW vzUobMtc0MCiS0AeHjy5pROK3mJxtupXzbK8GAtEKgEtX1B62PoX21z2ld9t6POFcKNX coFQKwUW/O1uFUb/Mn70z6xNndKoZMuRFPUhsjw+iN6UwyOEZb1WvVFNj25FdTn9foHC WbVlF8vHK+FlLAKUnXJPfx3CRNdTkFZirzpl0ijovnJ3FPCoforZAt46afxyeFU41FcN WDGg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=niJHAkU1vNEQQEduQzg9ScF8FBt/3ulFhyJs7r+jqTY=; b=q6YAGbCULAIAvS5OSWIwwpq24kwqMuHrfhNnh5AaFsP9vuhOjR+/tlz5uiJbQDR/Jb evhAE5m/sMV2NO7oI7FL9BkO5CPs1zZnmV+o7LlHDjxX2ac3oGIQ9RIc4/GvmSExvatr c2b2KzXS4f/x+w0NTyxEvpk3v09vsQ4QEXHhWIjZYoeL79SY8swAADX+6XB9g+TvEffs U1as1THWcQEE1YrsguFsy0Ys4/p/uWmssXs/l2x58+uqlk/fkJ7FkBumk+8JjYYzpqSG Jvj1L6m9qGg86M4tQCm61TNH3OuPTEJqxyJS30UuFzfzJ3sB+1tEhmPR0Mi4nk/aeZIV GizQ== X-Gm-Message-State: AOAM530LDH9JztJpj4sGyMuwG2nzX2AxppBeRlxijSWsXKiWPR27As56 2OxmMHa7MHqNrSVxnrGysXyE8UpTao9QJjwTfJtfTg== X-Google-Smtp-Source: ABdhPJz8tB6d4Fa7GYLH2czZiXFhCnuw6JM66CBC2g0uT/EP0IegfzR1pQHThOQbziFJABimx/Z83KvyB6+97BWpymI= X-Received: by 2002:a25:d044:0:b0:65c:b987:d884 with SMTP id h65-20020a25d044000000b0065cb987d884mr17729926ybg.399.1654073964390; Wed, 01 Jun 2022 01:59:24 -0700 (PDT) MIME-Version: 1.0 References: <20220601004014.158247-1-bas@basnieuwenhuizen.nl> <20220601004014.158247-4-bas@basnieuwenhuizen.nl> In-Reply-To: From: Bas Nieuwenhuizen Date: Wed, 1 Jun 2022 10:59:13 +0200 Message-ID: Subject: Re: [RFC PATCH 3/5] drm/amdgpu: Allow explicit sync for VM ops. To: =?UTF-8?Q?Christian_K=C3=B6nig?= Content-Type: multipart/alternative; boundary="000000000000147d4405e05f1b94" X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: ML dri-devel Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" --000000000000147d4405e05f1b94 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Wed, Jun 1, 2022, 10:48 Bas Nieuwenhuizen wrote: > On Wed, Jun 1, 2022 at 10:40 AM Christian K=C3=B6nig > wrote: > > > > Am 01.06.22 um 10:16 schrieb Bas Nieuwenhuizen: > > > On Wed, Jun 1, 2022 at 10:03 AM Christian K=C3=B6nig > > > wrote: > > >> Am 01.06.22 um 02:40 schrieb Bas Nieuwenhuizen: > > >>> This should be okay because moves themselves use KERNEL usage and > > >>> hence still sync with BOOKKEEP usage. Then any later submits still > > >>> wait on any pending VM operations. > > >>> > > >>> (i.e. we only made VM ops not wait on BOOKKEEP submits, not the oth= er > > >>> way around) > > >> Well NAK again. This allows access to freed up memory and is a > complete > > >> no-go. > > > How does this allow access to freed memory? Worst I can see is that > > > the unmap happens earlier if the app/drivers gets the waits wrong, > > > which wouldn't give access after the underlying BO is freed? > > > > To free up memory we need to update the PTEs and then flush those out b= y > > invalidating the TLB. > > > > On gfx6, gfx7 and gfx8 and some broken gfx10 hw invalidating the TLB ca= n > > only be done while the VMID is idle. > > > > Only gfx9 can reliable invalidate the TLB while it is in use and even > > there it comes with quite some performance penalty (at TLB invalidation > > can take multiple seconds). > > > > Because of this what we do in the kernel driver is to sync to everythin= g > > when we unmap entries: > > > > if (!(flags & AMDGPU_PTE_VALID)) > > sync_mode =3D AMDGPU_SYNC_EQ_OWNER; > > else > > sync_mode =3D AMDGPU_SYNC_EXPLICIT; > > > > This acts as a barrier for freeing the memory. In other words we > > intentionally add a bubble which syncs everything. > > > > I'm working for month on a concept how to do all this without causing > > the stalls you observer, but so far didn't came to much of a conclusion= . > > That might cause an unmap operation too early, but for freeing up the > actual backing memory we still wait for all fences on the BO to finish > first, no? In that case, since BOOKKEEP fences are still added for > explicit sync, that should not be a problem, no? > > (If not, that sounds like the obvious fix for making this work?) > As an aside this is the same hole/issue as when an app forgets a bo in the bo list on submission. > > > > Regards, > > Christian. > > > > > > > >> Regards, > > >> Christian. > > >> > > >>> Signed-off-by: Bas Nieuwenhuizen > > >>> --- > > >>> drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c | 2 +- > > >>> drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c | 2 +- > > >>> 2 files changed, 2 insertions(+), 2 deletions(-) > > >>> > > >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c > > >>> index f10332e1c6c0..31bc73fd1fae 100644 > > >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c > > >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c > > >>> @@ -51,7 +51,7 @@ static int amdgpu_vm_cpu_prepare(struct > amdgpu_vm_update_params *p, > > >>> if (!resv) > > >>> return 0; > > >>> > > >>> - return amdgpu_bo_sync_wait_resv(p->adev, resv, sync_mode, > sync_mode, p->vm, true); > > >>> + return amdgpu_bo_sync_wait_resv(p->adev, resv, sync_mode, > AMDGPU_SYNC_EXPLICIT, p->vm, true); > > >>> } > > >>> > > >>> /** > > >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c > > >>> index 63b484dc76c5..c8d5898bea11 100644 > > >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c > > >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c > > >>> @@ -75,7 +75,7 @@ static int amdgpu_vm_sdma_prepare(struct > amdgpu_vm_update_params *p, > > >>> if (!resv) > > >>> return 0; > > >>> > > >>> - return amdgpu_sync_resv(p->adev, &p->job->sync, resv, > sync_mode, sync_mode, p->vm); > > >>> + return amdgpu_sync_resv(p->adev, &p->job->sync, resv, > sync_mode, AMDGPU_SYNC_EXPLICIT, p->vm); > > >>> } > > >>> > > >>> /** > > > --000000000000147d4405e05f1b94 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable


On Wed, Jun 1, 2022, 10:48 Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> wrote= :
On Wed, Jun 1, 2022 at 10:40 AM C= hristian K=C3=B6nig
<christian.koenig@amd.com> wrote:
>
> Am 01.06.22 um 10:16 schrieb Bas Nieuwenhuizen:
> > On Wed, Jun 1, 2022 at 10:03 AM Christian K=C3=B6nig
> > <christian.koenig@amd.com> wrote:
> >> Am 01.06.22 um 02:40 schrieb Bas Nieuwenhuizen:
> >>> This should be okay because moves themselves use KERNEL u= sage and
> >>> hence still sync with BOOKKEEP usage. Then any later subm= its still
> >>> wait on any pending VM operations.
> >>>
> >>> (i.e. we only made VM ops not wait on BOOKKEEP submits, n= ot the other
> >>>=C2=A0 =C2=A0 way around)
> >> Well NAK again. This allows access to freed up memory and is = a complete
> >> no-go.
> > How does this allow access to freed memory? Worst I can see is th= at
> > the unmap happens earlier if the app/drivers gets the waits wrong= ,
> > which wouldn't give access after the underlying BO is freed?<= br> >
> To free up memory we need to update the PTEs and then flush those out = by
> invalidating the TLB.
>
> On gfx6, gfx7 and gfx8 and some broken gfx10 hw invalidating the TLB c= an
> only be done while the VMID is idle.
>
> Only gfx9 can reliable invalidate the TLB while it is in use and even<= br> > there it comes with quite some performance penalty (at TLB invalidatio= n
> can take multiple seconds).
>
> Because of this what we do in the kernel driver is to sync to everythi= ng
> when we unmap entries:
>
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 if (!(flags & AMDGPU_PTE_VALID))=
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 sync_mod= e =3D AMDGPU_SYNC_EQ_OWNER;
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 else
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 sync_mod= e =3D AMDGPU_SYNC_EXPLICIT;
>
> This acts as a barrier for freeing the memory. In other words we
> intentionally add a bubble which syncs everything.
>
> I'm working for month on a concept how to do all this without caus= ing
> the stalls you observer, but so far didn't came to much of a concl= usion.

That might cause an unmap operation too early, but for freeing up the
actual backing memory we still wait for all fences on the BO to finish
first, no? In that case, since BOOKKEEP fences are still added for
explicit sync, that should not be a problem, no?

(If not, that sounds like the obvious fix for making this work?)

As an aside= this is the same hole/issue as when an app forgets a bo in the bo list on = submission.
>
> Regards,
> Christian.
>
> >
> >> Regards,
> >> Christian.
> >>
> >>> Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenh= uizen.nl>
> >>> ---
> >>>=C2=A0 =C2=A0 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c= =C2=A0 | 2 +-
> >>>=C2=A0 =C2=A0 drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c = | 2 +-
> >>>=C2=A0 =C2=A0 2 files changed, 2 insertions(+), 2 deletion= s(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c b= /drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c
> >>> index f10332e1c6c0..31bc73fd1fae 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_cpu.c
> >>> @@ -51,7 +51,7 @@ static int amdgpu_vm_cpu_prepare(struct= amdgpu_vm_update_params *p,
> >>>=C2=A0 =C2=A0 =C2=A0 =C2=A0 if (!resv)
> >>>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 re= turn 0;
> >>>
> >>> -=C2=A0 =C2=A0 =C2=A0return amdgpu_bo_sync_wait_resv(p-&g= t;adev, resv, sync_mode, sync_mode, p->vm, true);
> >>> +=C2=A0 =C2=A0 =C2=A0return amdgpu_bo_sync_wait_resv(p-&g= t;adev, resv, sync_mode, AMDGPU_SYNC_EXPLICIT, p->vm, true);
> >>>=C2=A0 =C2=A0 }
> >>>
> >>>=C2=A0 =C2=A0 /**
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c = b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
> >>> index 63b484dc76c5..c8d5898bea11 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_sdma.c
> >>> @@ -75,7 +75,7 @@ static int amdgpu_vm_sdma_prepare(struc= t amdgpu_vm_update_params *p,
> >>>=C2=A0 =C2=A0 =C2=A0 =C2=A0 if (!resv)
> >>>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 re= turn 0;
> >>>
> >>> -=C2=A0 =C2=A0 =C2=A0return amdgpu_sync_resv(p->adev, = &p->job->sync, resv, sync_mode, sync_mode, p->vm);
> >>> +=C2=A0 =C2=A0 =C2=A0return amdgpu_sync_resv(p->adev, = &p->job->sync, resv, sync_mode, AMDGPU_SYNC_EXPLICIT, p->vm);<= br> > >>>=C2=A0 =C2=A0 }
> >>>
> >>>=C2=A0 =C2=A0 /**
>
--000000000000147d4405e05f1b94--