* [PATCH] drm/amdgpu:guarantee 128dws between vm flush and IB(v3) @ 2017-01-18 11:42 Monk Liu [not found] ` <1484739739-30080-1-git-send-email-Monk.Liu-5C7GfCeVMHo@public.gmane.org> 0 siblings, 1 reply; 11+ messages in thread From: Monk Liu @ 2017-01-18 11:42 UTC (permalink / raw) To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Monk Liu previously we always inserted 128 NOPs behind vm_flush, which may push the DMA frame size above 256 dw, which then gets automatically aligned to 512 dw. now we calculate how many DWs were already inserted after vm_flush and pad the rest up to 128 dw before emit_ib. that way we only take 256 dw per submit. v2: drop insert_nops in vm_flush re-calculate the estimated frame size for the gfx8 ring v3: calculate the gap between vm_flush and IB in cntx_cntl based on a member field of the ring structure v4: set last_vm_flush_pos even for the case of no vm flush. Change-Id: I0a1845de07c0b86ac1b77ac1a007e893144144fd Signed-off-by: Monk Liu <Monk.Liu@amd.com> --- drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 7 +++++++ drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 16 ++++++++++++---- 3 files changed, 20 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h index c813cbe..332969f 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h @@ -173,6 +173,7 @@ struct amdgpu_ring { #if defined(CONFIG_DEBUG_FS) struct dentry *ent; #endif + u32 last_vm_flush_pos; }; int amdgpu_ring_alloc(struct amdgpu_ring *ring, unsigned ndw); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c index d05546e..53fc5e0 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c @@ -396,6 +396,13 @@ int amdgpu_vm_flush(struct amdgpu_ring *ring, struct amdgpu_job *job) amdgpu_vm_ring_has_compute_vm_bug(ring))) amdgpu_ring_emit_pipeline_sync(ring); + /* when no vm-flush 
this frame, we still need to mark down + * the position of the tail of hdp_flush, because we still + * need to make sure there are 128 DWs between the last SWITCH_BUFFER and + * the emit_ib this frame. otherwise a VM fault can still occur on the + * constant engine. + */ + ring->last_vm_flush_pos = ring->wptr; if (ring->funcs->emit_vm_flush && (job->vm_needs_flush || amdgpu_vm_is_gpu_reset(adev, id))) { struct fence *fence; diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c index 5f37313..dde4177 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c @@ -6572,7 +6572,6 @@ static void gfx_v8_0_ring_emit_fence_gfx(struct amdgpu_ring *ring, u64 addr, DATA_SEL(write64bit ? 2 : 1) | INT_SEL(int_sel ? 2 : 0)); amdgpu_ring_write(ring, lower_32_bits(seq)); amdgpu_ring_write(ring, upper_32_bits(seq)); - } static void gfx_v8_0_ring_emit_pipeline_sync(struct amdgpu_ring *ring) @@ -6636,9 +6635,8 @@ static void gfx_v8_0_ring_emit_vm_flush(struct amdgpu_ring *ring, /* sync PFP to ME, otherwise we might get invalid PFP reads */ amdgpu_ring_write(ring, PACKET3(PACKET3_PFP_SYNC_ME, 0)); amdgpu_ring_write(ring, 0x0); - /* GFX8 emits 128 dw nop to prevent CE access VM before vm_flush finish */ - amdgpu_ring_insert_nop(ring, 128); } + ring->last_vm_flush_pos = ring->wptr; } static u32 gfx_v8_0_ring_get_wptr_compute(struct amdgpu_ring *ring) @@ -6743,6 +6741,15 @@ static void gfx_v8_ring_emit_cntxcntl(struct amdgpu_ring *ring, uint32_t flags) if (amdgpu_sriov_vf(ring->adev)) gfx_v8_0_ring_emit_de_meta_init(ring, (flags & AMDGPU_VM_DOMAIN) ? AMDGPU_CSA_VADDR : ring->adev->virt.csa_vmid0_addr); + + /* We need to pad some NOPs before emit_ib to prevent the CE from running ahead of + * vm_flush, which may trigger a VM fault. 
*/ + if (ring->wptr > ring->last_vm_flush_pos) /* no wptr wrapping to RB head */ + amdgpu_ring_insert_nop(ring, 128 - (ring->wptr - ring->last_vm_flush_pos)); + else + if (ring->ptr_mask + 1 - ring->last_vm_flush_pos + ring->wptr < 128) + amdgpu_ring_insert_nop(ring, + 128 - (ring->ptr_mask + 1 - ring->last_vm_flush_pos + ring->wptr)); } static void gfx_v8_0_ring_emit_rreg(struct amdgpu_ring *ring, uint32_t reg) @@ -7018,7 +7025,8 @@ static const struct amdgpu_ring_funcs gfx_v8_0_ring_funcs_gfx = { 7 + /* gfx_v8_0_ring_emit_pipeline_sync */ 128 + 19 + /* gfx_v8_0_ring_emit_vm_flush */ 2 + /* gfx_v8_ring_emit_sb */ - 3 + 4 + 29, /* gfx_v8_ring_emit_cntxcntl including vgt flush/meta-data */ + 3 + 4 + 29 - /* gfx_v8_ring_emit_cntxcntl including vgt flush/meta-data */ + 20 - 7 - 6 - 3 - 4 - 29, /* no need to count gds/hdp_flush/vm_flush fence/cntx_cntl/vgt_flush/meta-data anymore */ .emit_ib_size = 4, /* gfx_v8_0_ring_emit_ib_gfx */ .emit_ib = gfx_v8_0_ring_emit_ib_gfx, .emit_fence = gfx_v8_0_ring_emit_fence_gfx, -- 2.7.4 _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply related [flat|nested] 11+ messages in thread
[parent not found: <1484739739-30080-1-git-send-email-Monk.Liu-5C7GfCeVMHo@public.gmane.org>]
* Re: [PATCH] drm/amdgpu:guarantee 128dws between vm flush and IB(v3) [not found] ` <1484739739-30080-1-git-send-email-Monk.Liu-5C7GfCeVMHo@public.gmane.org> @ 2017-01-19 9:10 ` Christian König [not found] ` <7ff8505a-a6c2-8258-5cfa-45a5344ba2df-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org> 2017-01-19 14:49 ` Liu, Monk 0 siblings, 2 replies; 11+ messages in thread From: Christian König @ 2017-01-19 9:10 UTC (permalink / raw) To: Monk Liu, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW Am 18.01.2017 um 12:42 schrieb Monk Liu: > previously we always insert 128nops behind vm_flush, which > may lead to DAMframe size above 256 dw and automatially aligned > to 512 dw. > > now we calculate how many DWs already inserted after vm_flush > and make up for the reset to pad up to 128dws before emit_ib. > that way we only take 256 dw per submit. > > v2: > drop insert_nops in vm_flush > re-calculate the estimate frame size for gfx8 ring > v3: > calculate the gap between vm_flush and IB in cntx_cntl > on an member field of ring structure > v4: > set last_vm_flush_pos even for case of no vm flush. 
> > Change-Id: I0a1845de07c0b86ac1b77ac1a007e893144144fd > Signed-off-by: Monk Liu <Monk.Liu@amd.com> > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 1 + > drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 7 +++++++ > drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 16 ++++++++++++---- > 3 files changed, 20 insertions(+), 4 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h > index c813cbe..332969f 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h > @@ -173,6 +173,7 @@ struct amdgpu_ring { > #if defined(CONFIG_DEBUG_FS) > struct dentry *ent; > #endif > + u32 last_vm_flush_pos; > }; > > int amdgpu_ring_alloc(struct amdgpu_ring *ring, unsigned ndw); > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > index d05546e..53fc5e0 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > @@ -396,6 +396,13 @@ int amdgpu_vm_flush(struct amdgpu_ring *ring, struct amdgpu_job *job) > amdgpu_vm_ring_has_compute_vm_bug(ring))) > amdgpu_ring_emit_pipeline_sync(ring); > > + /* when no vm-flush this frame, we still need to mark down > + * the position of the tail of hdp_flush, because we still > + * need to make sure there are 128 DWs between last SWITCH_BUFFER and > + * the emit_ib this frame. otherwise there is still VM fault occured on > + * constant engine. > + */ > + ring->last_vm_flush_pos = ring->wptr; > if (ring->funcs->emit_vm_flush && (job->vm_needs_flush || > amdgpu_vm_is_gpu_reset(adev, id))) { > struct fence *fence; > diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c > index 5f37313..dde4177 100644 > --- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c > +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c > @@ -6572,7 +6572,6 @@ static void gfx_v8_0_ring_emit_fence_gfx(struct amdgpu_ring *ring, u64 addr, > DATA_SEL(write64bit ? 2 : 1) | INT_SEL(int_sel ? 
2 : 0)); > amdgpu_ring_write(ring, lower_32_bits(seq)); > amdgpu_ring_write(ring, upper_32_bits(seq)); > - > } > > static void gfx_v8_0_ring_emit_pipeline_sync(struct amdgpu_ring *ring) > @@ -6636,9 +6635,8 @@ static void gfx_v8_0_ring_emit_vm_flush(struct amdgpu_ring *ring, > /* sync PFP to ME, otherwise we might get invalid PFP reads */ > amdgpu_ring_write(ring, PACKET3(PACKET3_PFP_SYNC_ME, 0)); > amdgpu_ring_write(ring, 0x0); > - /* GFX8 emits 128 dw nop to prevent CE access VM before vm_flush finish */ > - amdgpu_ring_insert_nop(ring, 128); > } > + ring->last_vm_flush_pos = ring->wptr; > } > > static u32 gfx_v8_0_ring_get_wptr_compute(struct amdgpu_ring *ring) > @@ -6743,6 +6741,15 @@ static void gfx_v8_ring_emit_cntxcntl(struct amdgpu_ring *ring, uint32_t flags) > if (amdgpu_sriov_vf(ring->adev)) > gfx_v8_0_ring_emit_de_meta_init(ring, > (flags & AMDGPU_VM_DOMAIN) ? AMDGPU_CSA_VADDR : ring->adev->virt.csa_vmid0_addr); > + > + /* We need to pad some NOPs before emit_ib to prevent CE run ahead of > + * vm_flush, which may trigger VM fault. */ > + if (ring->wptr > ring->last_vm_flush_pos) /* no wptr wrapping to RB head */ > + amdgpu_ring_insert_nop(ring, 128 - (ring->wptr - ring->last_vm_flush_pos)); This can easily result in a negative number, couldn't it? > + else > + if (ring->ptr_mask + 1 - ring->last_vm_flush_pos + ring->wptr < 128) > + amdgpu_ring_insert_nop(ring, > + 128 - (ring->ptr_mask + 1 - ring->last_vm_flush_pos + ring->wptr)); I think it would be cleaner if you calculate the number of NOPs needed first for both cases and then check if the number isn't negative for both cases. 
> } > > static void gfx_v8_0_ring_emit_rreg(struct amdgpu_ring *ring, uint32_t reg) > @@ -7018,7 +7025,8 @@ static const struct amdgpu_ring_funcs gfx_v8_0_ring_funcs_gfx = { > 7 + /* gfx_v8_0_ring_emit_pipeline_sync */ > 128 + 19 + /* gfx_v8_0_ring_emit_vm_flush */ > 2 + /* gfx_v8_ring_emit_sb */ > - 3 + 4 + 29, /* gfx_v8_ring_emit_cntxcntl including vgt flush/meta-data */ > + 3 + 4 + 29 - /* gfx_v8_ring_emit_cntxcntl including vgt flush/meta-data */ Please put the - on the next line. That is easy to miss and I asked myself for a moment why you want to add 20 here. Apart from that the patch looks good to me, Christian. > + 20 - 7 - 6 - 3 - 4 - 29, /* no need to count gds/hdp_flush/vm_flush fence/cntx_cntl/vgt_flush/meta-data anymore */ > .emit_ib_size = 4, /* gfx_v8_0_ring_emit_ib_gfx */ > .emit_ib = gfx_v8_0_ring_emit_ib_gfx, > .emit_fence = gfx_v8_0_ring_emit_fence_gfx, _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply [flat|nested] 11+ messages in thread
[parent not found: <7ff8505a-a6c2-8258-5cfa-45a5344ba2df-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org>]
* Re: [PATCH] drm/amdgpu:guarantee 128dws between vm flush and IB(v3) [not found] ` <7ff8505a-a6c2-8258-5cfa-45a5344ba2df-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org> @ 2017-01-19 13:51 ` Grazvydas Ignotas [not found] ` <CANOLnOOwu2Fd0Svx5tcp7kqb2b28Ar30gTd-FeY4xQwm32J5YQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 2017-01-19 14:39 ` Liu, Monk 1 sibling, 1 reply; 11+ messages in thread From: Grazvydas Ignotas @ 2017-01-19 13:51 UTC (permalink / raw) To: Christian König; +Cc: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Monk Liu On Thu, Jan 19, 2017 at 11:10 AM, Christian König <deathsimple@vodafone.de> wrote: > Am 18.01.2017 um 12:42 schrieb Monk Liu: >> @@ -6743,6 +6741,15 @@ static void gfx_v8_ring_emit_cntxcntl(struct >> amdgpu_ring *ring, uint32_t flags) >> if (amdgpu_sriov_vf(ring->adev)) >> gfx_v8_0_ring_emit_de_meta_init(ring, >> (flags & AMDGPU_VM_DOMAIN) ? AMDGPU_CSA_VADDR : >> ring->adev->virt.csa_vmid0_addr); >> + >> + /* We need to pad some NOPs before emit_ib to prevent CE run ahead >> of >> + * vm_flush, which may trigger VM fault. */ >> + if (ring->wptr > ring->last_vm_flush_pos) /* no wptr wrapping to >> RB head */ >> + amdgpu_ring_insert_nop(ring, 128 - (ring->wptr - >> ring->last_vm_flush_pos)); > > > This can easily result in a negative number, couldn't it? > >> + else >> + if (ring->ptr_mask + 1 - ring->last_vm_flush_pos + >> ring->wptr < 128) >> + amdgpu_ring_insert_nop(ring, >> + 128 - (ring->ptr_mask + 1 - >> ring->last_vm_flush_pos + ring->wptr)); > > > I think it would be cleaner if you calculate the number of NOPs needed first > for both cases and then check if the number isn't negative for both cases. What about this: 128 - ((ring->wptr - ring->last_vm_flush_pos) & 127) Gražvydas _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply [flat|nested] 11+ messages in thread
[parent not found: <CANOLnOOwu2Fd0Svx5tcp7kqb2b28Ar30gTd-FeY4xQwm32J5YQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: [PATCH] drm/amdgpu:guarantee 128dws between vm flush and IB(v3) [not found] ` <CANOLnOOwu2Fd0Svx5tcp7kqb2b28Ar30gTd-FeY4xQwm32J5YQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2017-01-19 14:32 ` Christian König [not found] ` <fd28a4f9-80ce-49d4-d6e7-9765988e0181-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org> 0 siblings, 1 reply; 11+ messages in thread From: Christian König @ 2017-01-19 14:32 UTC (permalink / raw) To: Grazvydas Ignotas; +Cc: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Monk Liu Am 19.01.2017 um 14:51 schrieb Grazvydas Ignotas: > On Thu, Jan 19, 2017 at 11:10 AM, Christian König > <deathsimple@vodafone.de> wrote: >> Am 18.01.2017 um 12:42 schrieb Monk Liu: >>> @@ -6743,6 +6741,15 @@ static void gfx_v8_ring_emit_cntxcntl(struct >>> amdgpu_ring *ring, uint32_t flags) >>> if (amdgpu_sriov_vf(ring->adev)) >>> gfx_v8_0_ring_emit_de_meta_init(ring, >>> (flags & AMDGPU_VM_DOMAIN) ? AMDGPU_CSA_VADDR : >>> ring->adev->virt.csa_vmid0_addr); >>> + >>> + /* We need to pad some NOPs before emit_ib to prevent CE run ahead >>> of >>> + * vm_flush, which may trigger VM fault. */ >>> + if (ring->wptr > ring->last_vm_flush_pos) /* no wptr wrapping to >>> RB head */ >>> + amdgpu_ring_insert_nop(ring, 128 - (ring->wptr - >>> ring->last_vm_flush_pos)); >> >> This can easily result in a negative number, couldn't it? >> >>> + else >>> + if (ring->ptr_mask + 1 - ring->last_vm_flush_pos + >>> ring->wptr < 128) >>> + amdgpu_ring_insert_nop(ring, >>> + 128 - (ring->ptr_mask + 1 - >>> ring->last_vm_flush_pos + ring->wptr)); >> >> I think it would be cleaner if you calculate the number of NOPs needed first >> for both cases and then check if the number isn't negative for both cases. > What about this: > 128 - ((ring->wptr - ring->last_vm_flush_pos) & 127) That won't handle the case for negative nop count correctly either. See when we already emitted more than 128 dw we don't want to add some more. Christian. 
> > Gražvydas _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply [flat|nested] 11+ messages in thread
[parent not found: <fd28a4f9-80ce-49d4-d6e7-9765988e0181-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org>]
* Re: [PATCH] drm/amdgpu:guarantee 128dws between vm flush and IB(v3) [not found] ` <fd28a4f9-80ce-49d4-d6e7-9765988e0181-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org> @ 2017-01-19 15:16 ` Grazvydas Ignotas 0 siblings, 0 replies; 11+ messages in thread From: Grazvydas Ignotas @ 2017-01-19 15:16 UTC (permalink / raw) To: Christian König; +Cc: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Monk Liu On Thu, Jan 19, 2017 at 4:32 PM, Christian König <deathsimple@vodafone.de> wrote: > Am 19.01.2017 um 14:51 schrieb Grazvydas Ignotas: >> >> On Thu, Jan 19, 2017 at 11:10 AM, Christian König >> <deathsimple@vodafone.de> wrote: >>> >>> Am 18.01.2017 um 12:42 schrieb Monk Liu: >>>> >>>> @@ -6743,6 +6741,15 @@ static void gfx_v8_ring_emit_cntxcntl(struct >>>> amdgpu_ring *ring, uint32_t flags) >>>> if (amdgpu_sriov_vf(ring->adev)) >>>> gfx_v8_0_ring_emit_de_meta_init(ring, >>>> (flags & AMDGPU_VM_DOMAIN) ? AMDGPU_CSA_VADDR : >>>> ring->adev->virt.csa_vmid0_addr); >>>> + >>>> + /* We need to pad some NOPs before emit_ib to prevent CE run >>>> ahead >>>> of >>>> + * vm_flush, which may trigger VM fault. */ >>>> + if (ring->wptr > ring->last_vm_flush_pos) /* no wptr wrapping to >>>> RB head */ >>>> + amdgpu_ring_insert_nop(ring, 128 - (ring->wptr - >>>> ring->last_vm_flush_pos)); >>> >>> >>> This can easily result in a negative number, couldn't it? >>> >>>> + else >>>> + if (ring->ptr_mask + 1 - ring->last_vm_flush_pos + >>>> ring->wptr < 128) >>>> + amdgpu_ring_insert_nop(ring, >>>> + 128 - (ring->ptr_mask + 1 - >>>> ring->last_vm_flush_pos + ring->wptr)); >>> >>> >>> I think it would be cleaner if you calculate the number of NOPs needed >>> first >>> for both cases and then check if the number isn't negative for both >>> cases. >> >> What about this: >> 128 - ((ring->wptr - ring->last_vm_flush_pos) & 127) > > > That won't handle the case for negative nop count correctly either. > > See when we already emitted more than 128 dw we don't want to add some more. 
Let me try again then: count_added = (ring->wptr - ring->last_vm_flush_pos) & ring->ptr_mask; if (count_added < 128) amdgpu_ring_insert_nop(ring, 128 - count_added); Gražvydas _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply [flat|nested] 11+ messages in thread
* RE: [PATCH] drm/amdgpu:guarantee 128dws between vm flush and IB(v3) [not found] ` <7ff8505a-a6c2-8258-5cfa-45a5344ba2df-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org> 2017-01-19 13:51 ` Grazvydas Ignotas @ 2017-01-19 14:39 ` Liu, Monk 1 sibling, 0 replies; 11+ messages in thread From: Liu, Monk @ 2017-01-19 14:39 UTC (permalink / raw) To: Christian König, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW Christian I just found one issue in your previous patch (the one where you removed inserting 128 NOPs before vm_flush and meanwhile set the align_mask of the GFX ring to 0xff). Because I checked amdgpu_ring_commit() again: this function does not guarantee that the current submit size is aligned to 256 dw, it only guarantees that ring->wptr is 256 dw aligned after amdgpu_ring_commit(). So that means your approach cannot make sure there are 128 NOPs before vm_flush at all! e.g. this frame starts at position 200 of the RB and submits 50 dw, so it ends at position 249; amdgpu_ring_commit() then only pads 6 more NOPs into the ring buffer, so wptr finally ends at 255. That is clearly not what we want originally ~ we want to insert 128 NOPs after SWITCH_BUFFER (or say, before vm_flush), but only 6 NOPs get added .... your approach only works when wptr == 0 for the first UMD submit BR Monk -----Original Message----- From: amd-gfx [mailto:amd-gfx-bounces@lists.freedesktop.org] On Behalf Of Christian König Sent: Thursday, January 19, 2017 5:11 PM To: Liu, Monk <Monk.Liu@amd.com>; amd-gfx@lists.freedesktop.org Subject: Re: [PATCH] drm/amdgpu:guarantee 128dws between vm flush and IB(v3) Am 18.01.2017 um 12:42 schrieb Monk Liu: > previously we always insert 128nops behind vm_flush, which may lead to > DAMframe size above 256 dw and automatially aligned to 512 dw. > > now we calculate how many DWs already inserted after vm_flush and make > up for the reset to pad up to 128dws before emit_ib. > that way we only take 256 dw per submit. 
> > v2: > drop insert_nops in vm_flush > re-calculate the estimate frame size for gfx8 ring > v3: > calculate the gap between vm_flush and IB in cntx_cntl on an member > field of ring structure > v4: > set last_vm_flush_pos even for case of no vm flush. > > Change-Id: I0a1845de07c0b86ac1b77ac1a007e893144144fd > Signed-off-by: Monk Liu <Monk.Liu@amd.com> > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 1 + > drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 7 +++++++ > drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 16 ++++++++++++---- > 3 files changed, 20 insertions(+), 4 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h > b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h > index c813cbe..332969f 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h > @@ -173,6 +173,7 @@ struct amdgpu_ring { > #if defined(CONFIG_DEBUG_FS) > struct dentry *ent; > #endif > + u32 last_vm_flush_pos; > }; > > int amdgpu_ring_alloc(struct amdgpu_ring *ring, unsigned ndw); diff > --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > index d05546e..53fc5e0 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > @@ -396,6 +396,13 @@ int amdgpu_vm_flush(struct amdgpu_ring *ring, struct amdgpu_job *job) > amdgpu_vm_ring_has_compute_vm_bug(ring))) > amdgpu_ring_emit_pipeline_sync(ring); > > + /* when no vm-flush this frame, we still need to mark down > + * the position of the tail of hdp_flush, because we still > + * need to make sure there are 128 DWs between last SWITCH_BUFFER and > + * the emit_ib this frame. otherwise there is still VM fault occured on > + * constant engine. 
> + */ > + ring->last_vm_flush_pos = ring->wptr; > if (ring->funcs->emit_vm_flush && (job->vm_needs_flush || > amdgpu_vm_is_gpu_reset(adev, id))) { > struct fence *fence; > diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c > b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c > index 5f37313..dde4177 100644 > --- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c > +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c > @@ -6572,7 +6572,6 @@ static void gfx_v8_0_ring_emit_fence_gfx(struct amdgpu_ring *ring, u64 addr, > DATA_SEL(write64bit ? 2 : 1) | INT_SEL(int_sel ? 2 : 0)); > amdgpu_ring_write(ring, lower_32_bits(seq)); > amdgpu_ring_write(ring, upper_32_bits(seq)); > - > } > > static void gfx_v8_0_ring_emit_pipeline_sync(struct amdgpu_ring > *ring) @@ -6636,9 +6635,8 @@ static void gfx_v8_0_ring_emit_vm_flush(struct amdgpu_ring *ring, > /* sync PFP to ME, otherwise we might get invalid PFP reads */ > amdgpu_ring_write(ring, PACKET3(PACKET3_PFP_SYNC_ME, 0)); > amdgpu_ring_write(ring, 0x0); > - /* GFX8 emits 128 dw nop to prevent CE access VM before vm_flush finish */ > - amdgpu_ring_insert_nop(ring, 128); > } > + ring->last_vm_flush_pos = ring->wptr; > } > > static u32 gfx_v8_0_ring_get_wptr_compute(struct amdgpu_ring *ring) > @@ -6743,6 +6741,15 @@ static void gfx_v8_ring_emit_cntxcntl(struct amdgpu_ring *ring, uint32_t flags) > if (amdgpu_sriov_vf(ring->adev)) > gfx_v8_0_ring_emit_de_meta_init(ring, > (flags & AMDGPU_VM_DOMAIN) ? AMDGPU_CSA_VADDR : > ring->adev->virt.csa_vmid0_addr); > + > + /* We need to pad some NOPs before emit_ib to prevent CE run ahead of > + * vm_flush, which may trigger VM fault. */ > + if (ring->wptr > ring->last_vm_flush_pos) /* no wptr wrapping to RB head */ > + amdgpu_ring_insert_nop(ring, 128 - (ring->wptr - > +ring->last_vm_flush_pos)); This can easily result in a negative number, couldn't it? 
> + else > + if (ring->ptr_mask + 1 - ring->last_vm_flush_pos + ring->wptr < 128) > + amdgpu_ring_insert_nop(ring, > + 128 - (ring->ptr_mask + 1 - ring->last_vm_flush_pos + > +ring->wptr)); I think it would be cleaner if you calculate the number of NOPs needed first for both cases and then check if the number isn't negative for both cases. > } > > static void gfx_v8_0_ring_emit_rreg(struct amdgpu_ring *ring, > uint32_t reg) @@ -7018,7 +7025,8 @@ static const struct amdgpu_ring_funcs gfx_v8_0_ring_funcs_gfx = { > 7 + /* gfx_v8_0_ring_emit_pipeline_sync */ > 128 + 19 + /* gfx_v8_0_ring_emit_vm_flush */ > 2 + /* gfx_v8_ring_emit_sb */ > - 3 + 4 + 29, /* gfx_v8_ring_emit_cntxcntl including vgt flush/meta-data */ > + 3 + 4 + 29 - /* gfx_v8_ring_emit_cntxcntl including vgt > +flush/meta-data */ Please put the - on the next line. That is easy to miss and I asked myself for a moment why you want to add 20 here. Apart from that the patch looks good to me, Christian. > + 20 - 7 - 6 - 3 - 4 - 29, /* no need to count gds/hdp_flush/vm_flush > +fence/cntx_cntl/vgt_flush/meta-data anymore */ > .emit_ib_size = 4, /* gfx_v8_0_ring_emit_ib_gfx */ > .emit_ib = gfx_v8_0_ring_emit_ib_gfx, > .emit_fence = gfx_v8_0_ring_emit_fence_gfx, _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply [flat|nested] 11+ messages in thread
* RE: [PATCH] drm/amdgpu:guarantee 128dws between vm flush and IB(v3) 2017-01-19 9:10 ` Christian König [not found] ` <7ff8505a-a6c2-8258-5cfa-45a5344ba2df-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org> @ 2017-01-19 14:49 ` Liu, Monk [not found] ` <BY2PR1201MB1110457767084D1B300C67D8847E0-O28G1zQ8oGliQkyLPkmea2rFom/aUZj6nBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org> 1 sibling, 1 reply; 11+ messages in thread From: Liu, Monk @ 2017-01-19 14:49 UTC (permalink / raw) To: Christian König, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW And thinking about it more: even if the very first UMD command starts with wptr at a 256/512 position (because the IB test ends at a 255-aligned position), your approach still cannot satisfy 128 DW between SWITCH_BUFFER and the preamble IB (if no vm_flush), or 128 NOPs before vm_flush (if needed), for the next frame. e.g. we insert 180 dw this frame, starting from 0 and ending at 179; ring_commit() pads NOPs so that wptr ends at 255, leaving only 75 NOPs between SWITCH_BUFFER and the next frame (assuming we don't have a vm-flush next frame) BR Monk -----Original Message----- From: Liu, Monk Sent: Thursday, January 19, 2017 10:39 PM To: 'Christian König' <deathsimple@vodafone.de>; amd-gfx@lists.freedesktop.org Subject: RE: [PATCH] drm/amdgpu:guarantee 128dws between vm flush and IB(v3) Christian I just found one issue in your previous patch ( the one you remove inserting 128nop before vm_fush and meanwhile set align_mask of GFX ring to 0xff) Because I checked amdgpu_ring_commit() again, this function will not guarantee current submit size aligned with 256 dw, instead it only guarantee the ring->wptr will be 256 dw aligned after amdgpu_ring_commit() So that means your approach can not make sure there will be 128 NOP before vm_flush at all ! e.g. 
this frame start at 200 position of RB, and it submitted 50 dw so it ends at 249 position, you call amdgpu_ring_commit() now and it only padding 6 more NOPs to ring buffer, so wptr finnaly end at 255, that clearly not what we want originally ~ we want to insert 128 nop after SWITCH_BUFFER (or say, before vm_flush), but you only add 6 NOPs .... your approach only works with wptr == 0 for the first UMD submit BR Monk -----Original Message----- From: amd-gfx [mailto:amd-gfx-bounces@lists.freedesktop.org] On Behalf Of Christian K?nig Sent: Thursday, January 19, 2017 5:11 PM To: Liu, Monk <Monk.Liu@amd.com>; amd-gfx@lists.freedesktop.org Subject: Re: [PATCH] drm/amdgpu:guarantee 128dws between vm flush and IB(v3) Am 18.01.2017 um 12:42 schrieb Monk Liu: > previously we always insert 128nops behind vm_flush, which may lead to > DAMframe size above 256 dw and automatially aligned to 512 dw. > > now we calculate how many DWs already inserted after vm_flush and make > up for the reset to pad up to 128dws before emit_ib. > that way we only take 256 dw per submit. > > v2: > drop insert_nops in vm_flush > re-calculate the estimate frame size for gfx8 ring > v3: > calculate the gap between vm_flush and IB in cntx_cntl on an member > field of ring structure > v4: > set last_vm_flush_pos even for case of no vm flush. 
> > Change-Id: I0a1845de07c0b86ac1b77ac1a007e893144144fd > Signed-off-by: Monk Liu <Monk.Liu@amd.com> > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 1 + > drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 7 +++++++ > drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 16 ++++++++++++---- > 3 files changed, 20 insertions(+), 4 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h > b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h > index c813cbe..332969f 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h > @@ -173,6 +173,7 @@ struct amdgpu_ring { > #if defined(CONFIG_DEBUG_FS) > struct dentry *ent; > #endif > + u32 last_vm_flush_pos; > }; > > int amdgpu_ring_alloc(struct amdgpu_ring *ring, unsigned ndw); diff > --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > index d05546e..53fc5e0 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > @@ -396,6 +396,13 @@ int amdgpu_vm_flush(struct amdgpu_ring *ring, struct amdgpu_job *job) > amdgpu_vm_ring_has_compute_vm_bug(ring))) > amdgpu_ring_emit_pipeline_sync(ring); > > + /* when no vm-flush this frame, we still need to mark down > + * the position of the tail of hdp_flush, because we still > + * need to make sure there are 128 DWs between last SWITCH_BUFFER and > + * the emit_ib this frame. otherwise there is still VM fault occured on > + * constant engine. > + */ > + ring->last_vm_flush_pos = ring->wptr; > if (ring->funcs->emit_vm_flush && (job->vm_needs_flush || > amdgpu_vm_is_gpu_reset(adev, id))) { > struct fence *fence; > diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c > b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c > index 5f37313..dde4177 100644 > --- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c > +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c > @@ -6572,7 +6572,6 @@ static void gfx_v8_0_ring_emit_fence_gfx(struct amdgpu_ring *ring, u64 addr, > DATA_SEL(write64bit ? 
2 : 1) | INT_SEL(int_sel ? 2 : 0)); > amdgpu_ring_write(ring, lower_32_bits(seq)); > amdgpu_ring_write(ring, upper_32_bits(seq)); > - > } > > static void gfx_v8_0_ring_emit_pipeline_sync(struct amdgpu_ring > *ring) @@ -6636,9 +6635,8 @@ static void gfx_v8_0_ring_emit_vm_flush(struct amdgpu_ring *ring, > /* sync PFP to ME, otherwise we might get invalid PFP reads */ > amdgpu_ring_write(ring, PACKET3(PACKET3_PFP_SYNC_ME, 0)); > amdgpu_ring_write(ring, 0x0); > - /* GFX8 emits 128 dw nop to prevent CE access VM before vm_flush finish */ > - amdgpu_ring_insert_nop(ring, 128); > } > + ring->last_vm_flush_pos = ring->wptr; > } > > static u32 gfx_v8_0_ring_get_wptr_compute(struct amdgpu_ring *ring) > @@ -6743,6 +6741,15 @@ static void gfx_v8_ring_emit_cntxcntl(struct amdgpu_ring *ring, uint32_t flags) > if (amdgpu_sriov_vf(ring->adev)) > gfx_v8_0_ring_emit_de_meta_init(ring, > (flags & AMDGPU_VM_DOMAIN) ? AMDGPU_CSA_VADDR : > ring->adev->virt.csa_vmid0_addr); > + > + /* We need to pad some NOPs before emit_ib to prevent CE run ahead of > + * vm_flush, which may trigger VM fault. */ > + if (ring->wptr > ring->last_vm_flush_pos) /* no wptr wrapping to RB head */ > + amdgpu_ring_insert_nop(ring, 128 - (ring->wptr - > +ring->last_vm_flush_pos)); This can easily result in a negative number, couldn't it? > + else > + if (ring->ptr_mask + 1 - ring->last_vm_flush_pos + ring->wptr < 128) > + amdgpu_ring_insert_nop(ring, > + 128 - (ring->ptr_mask + 1 - ring->last_vm_flush_pos + > +ring->wptr)); I think it would be cleaner if you calculate the number of NOPs needed first for both cases and then check if the number isn't negative for both cases. 
> } > > static void gfx_v8_0_ring_emit_rreg(struct amdgpu_ring *ring, > uint32_t reg) @@ -7018,7 +7025,8 @@ static const struct amdgpu_ring_funcs gfx_v8_0_ring_funcs_gfx = { > 7 + /* gfx_v8_0_ring_emit_pipeline_sync */ > 128 + 19 + /* gfx_v8_0_ring_emit_vm_flush */ > 2 + /* gfx_v8_ring_emit_sb */ > - 3 + 4 + 29, /* gfx_v8_ring_emit_cntxcntl including vgt flush/meta-data */ > + 3 + 4 + 29 - /* gfx_v8_ring_emit_cntxcntl including vgt > +flush/meta-data */ Please put the - on the next line. That is easy to miss and I asked myself for a moment why you want to add 20 here. Apart from that the patch looks good to me, Christian. > + 20 - 7 - 6 - 3 - 4 - 29, /* no need to count gds/hdp_flush/vm_flush > +fence/cntx_cntl/vgt_flush/meta-data anymore */ > .emit_ib_size = 4, /* gfx_v8_0_ring_emit_ib_gfx */ > .emit_ib = gfx_v8_0_ring_emit_ib_gfx, > .emit_fence = gfx_v8_0_ring_emit_fence_gfx, _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply [flat|nested] 11+ messages in thread
[parent not found: <BY2PR1201MB1110457767084D1B300C67D8847E0-O28G1zQ8oGliQkyLPkmea2rFom/aUZj6nBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>]
* Re: [PATCH] drm/amdgpu:guarantee 128dws between vm flush and IB(v3) [not found] ` <BY2PR1201MB1110457767084D1B300C67D8847E0-O28G1zQ8oGliQkyLPkmea2rFom/aUZj6nBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org> @ 2017-01-20 8:48 ` Christian König [not found] ` <72ce673f-a232-290d-62c0-082e6ede3bbd-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org> 0 siblings, 1 reply; 11+ messages in thread From: Christian König @ 2017-01-20 8:48 UTC (permalink / raw) To: Liu, Monk, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Yeah, I thought about that possibility as well when we changed the padding to 256. This happens every time our command submission uses more than 128 dw; in that case you obviously have less than 128 dw of padding.

But why doesn't this happen on Windows then?

Regards,
Christian.

On 19.01.2017 at 15:49, Liu, Monk wrote:
> And thinking about it more: even if the very first UMD cmd starts with wptr at a 256/512-aligned position (because the IB test ends at a 255-aligned position), your approach still cannot guarantee 128 DW between SWITCH_BUFFER and the preamble IB (if there is no vm_flush), or 128 NOPs before the vm_flush (if one is needed), for the next frame.
>
> e.g. we insert 180 dw this frame, starting from 0 and ending at 179, and ring_commit() pads NOPs so that wptr ends up at 255, so only 75 NOPs sit between SWITCH_BUFFER and the next frame (assuming we don't have a vm-flush next frame).
>
> BR Monk
>
> -----Original Message-----
> From: Liu, Monk
> Sent: Thursday, January 19, 2017 10:39 PM
> To: 'Christian König' <deathsimple@vodafone.de>; amd-gfx@lists.freedesktop.org
> Subject: RE: [PATCH] drm/amdgpu:guarantee 128dws between vm flush and IB(v3)
>
> Christian,
>
> I just found one issue in your previous patch (the one where you removed inserting 128 NOPs before vm_flush and at the same time set the align_mask of the GFX ring to 0xff).
>
> Looking at amdgpu_ring_commit() again: this function does not guarantee that the current submit size is aligned to 256 dw; it only guarantees that ring->wptr is 256-dw aligned after amdgpu_ring_commit().
>
> So that means your approach cannot make sure there are 128 NOPs before vm_flush at all!
>
> e.g. this frame starts at position 200 of the RB and submits 50 dw, so it ends at position 249; calling amdgpu_ring_commit() now only pads 6 more NOPs into the ring buffer, so wptr finally ends up at 255. That is clearly not what we originally wanted: we want 128 NOPs inserted after SWITCH_BUFFER (or, put differently, before vm_flush), but only 6 NOPs get added....
>
> Your approach only works when wptr == 0 for the first UMD submit.
>
> BR Monk
>
> -----Original Message-----
> From: amd-gfx [mailto:amd-gfx-bounces@lists.freedesktop.org] On Behalf Of Christian König
> Sent: Thursday, January 19, 2017 5:11 PM
> To: Liu, Monk <Monk.Liu@amd.com>; amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH] drm/amdgpu:guarantee 128dws between vm flush and IB(v3)
>
> On 18.01.2017 at 12:42, Monk Liu wrote:
>> previously we always inserted 128 NOPs behind vm_flush, which may push the
>> DMA frame size above 256 dw, after which it is automatically aligned to 512 dw.
>> >> now we calculate how many DWs already inserted after vm_flush and make >> up for the reset to pad up to 128dws before emit_ib. >> that way we only take 256 dw per submit. >> >> v2: >> drop insert_nops in vm_flush >> re-calculate the estimate frame size for gfx8 ring >> v3: >> calculate the gap between vm_flush and IB in cntx_cntl on an member >> field of ring structure >> v4: >> set last_vm_flush_pos even for case of no vm flush. >> >> Change-Id: I0a1845de07c0b86ac1b77ac1a007e893144144fd >> Signed-off-by: Monk Liu <Monk.Liu@amd.com> >> --- >> drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 1 + >> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 7 +++++++ >> drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 16 ++++++++++++---- >> 3 files changed, 20 insertions(+), 4 deletions(-) >> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h >> index c813cbe..332969f 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h >> @@ -173,6 +173,7 @@ struct amdgpu_ring { >> #if defined(CONFIG_DEBUG_FS) >> struct dentry *ent; >> #endif >> + u32 last_vm_flush_pos; >> }; >> >> int amdgpu_ring_alloc(struct amdgpu_ring *ring, unsigned ndw); diff >> --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c >> index d05546e..53fc5e0 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c >> @@ -396,6 +396,13 @@ int amdgpu_vm_flush(struct amdgpu_ring *ring, struct amdgpu_job *job) >> amdgpu_vm_ring_has_compute_vm_bug(ring))) >> amdgpu_ring_emit_pipeline_sync(ring); >> >> + /* when no vm-flush this frame, we still need to mark down >> + * the position of the tail of hdp_flush, because we still >> + * need to make sure there are 128 DWs between last SWITCH_BUFFER and >> + * the emit_ib this frame. otherwise there is still VM fault occured on >> + * constant engine. 
>> + */ >> + ring->last_vm_flush_pos = ring->wptr; >> if (ring->funcs->emit_vm_flush && (job->vm_needs_flush || >> amdgpu_vm_is_gpu_reset(adev, id))) { >> struct fence *fence; >> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c >> b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c >> index 5f37313..dde4177 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c >> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c >> @@ -6572,7 +6572,6 @@ static void gfx_v8_0_ring_emit_fence_gfx(struct amdgpu_ring *ring, u64 addr, >> DATA_SEL(write64bit ? 2 : 1) | INT_SEL(int_sel ? 2 : 0)); >> amdgpu_ring_write(ring, lower_32_bits(seq)); >> amdgpu_ring_write(ring, upper_32_bits(seq)); >> - >> } >> >> static void gfx_v8_0_ring_emit_pipeline_sync(struct amdgpu_ring >> *ring) @@ -6636,9 +6635,8 @@ static void gfx_v8_0_ring_emit_vm_flush(struct amdgpu_ring *ring, >> /* sync PFP to ME, otherwise we might get invalid PFP reads */ >> amdgpu_ring_write(ring, PACKET3(PACKET3_PFP_SYNC_ME, 0)); >> amdgpu_ring_write(ring, 0x0); >> - /* GFX8 emits 128 dw nop to prevent CE access VM before vm_flush finish */ >> - amdgpu_ring_insert_nop(ring, 128); >> } >> + ring->last_vm_flush_pos = ring->wptr; >> } >> >> static u32 gfx_v8_0_ring_get_wptr_compute(struct amdgpu_ring *ring) >> @@ -6743,6 +6741,15 @@ static void gfx_v8_ring_emit_cntxcntl(struct amdgpu_ring *ring, uint32_t flags) >> if (amdgpu_sriov_vf(ring->adev)) >> gfx_v8_0_ring_emit_de_meta_init(ring, >> (flags & AMDGPU_VM_DOMAIN) ? AMDGPU_CSA_VADDR : >> ring->adev->virt.csa_vmid0_addr); >> + >> + /* We need to pad some NOPs before emit_ib to prevent CE run ahead of >> + * vm_flush, which may trigger VM fault. */ >> + if (ring->wptr > ring->last_vm_flush_pos) /* no wptr wrapping to RB head */ >> + amdgpu_ring_insert_nop(ring, 128 - (ring->wptr - >> +ring->last_vm_flush_pos)); > This can easily result in a negative number, couldn't it? 
> >> + else >> + if (ring->ptr_mask + 1 - ring->last_vm_flush_pos + ring->wptr < 128) >> + amdgpu_ring_insert_nop(ring, >> + 128 - (ring->ptr_mask + 1 - ring->last_vm_flush_pos + >> +ring->wptr)); > I think it would be cleaner if you calculate the number of NOPs needed first for both cases and then check if the number isn't negative for both cases. > >> } >> >> static void gfx_v8_0_ring_emit_rreg(struct amdgpu_ring *ring, >> uint32_t reg) @@ -7018,7 +7025,8 @@ static const struct amdgpu_ring_funcs gfx_v8_0_ring_funcs_gfx = { >> 7 + /* gfx_v8_0_ring_emit_pipeline_sync */ >> 128 + 19 + /* gfx_v8_0_ring_emit_vm_flush */ >> 2 + /* gfx_v8_ring_emit_sb */ >> - 3 + 4 + 29, /* gfx_v8_ring_emit_cntxcntl including vgt flush/meta-data */ >> + 3 + 4 + 29 - /* gfx_v8_ring_emit_cntxcntl including vgt >> +flush/meta-data */ > Please put the - on the next line. That is easy to miss and I asked myself for a moment why you want to add 20 here. > > Apart from that the patch looks good to me, Christian. > >> + 20 - 7 - 6 - 3 - 4 - 29, /* no need to count gds/hdp_flush/vm_flush >> +fence/cntx_cntl/vgt_flush/meta-data anymore */ >> .emit_ib_size = 4, /* gfx_v8_0_ring_emit_ib_gfx */ >> .emit_ib = gfx_v8_0_ring_emit_ib_gfx, >> .emit_fence = gfx_v8_0_ring_emit_fence_gfx, > > _______________________________________________ > amd-gfx mailing list > amd-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/amd-gfx _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply [flat|nested] 11+ messages in thread
[parent not found: <72ce673f-a232-290d-62c0-082e6ede3bbd-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org>]
* Re: [PATCH] drm/amdgpu:guarantee 128dws between vm flush and IB(v3) [not found] ` <72ce673f-a232-290d-62c0-082e6ede3bbd-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org> @ 2017-01-22 2:47 ` Liu, Monk [not found] ` <BY2PR1201MB11108334E6D9C3AAE1DF30D384730-O28G1zQ8oGliQkyLPkmea2rFom/aUZj6nBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org> 0 siblings, 1 reply; 11+ messages in thread From: Liu, Monk @ 2017-01-22 2:47 UTC (permalink / raw) To: Christian König, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW [-- Attachment #1.1: Type: text/plain, Size: 9871 bytes --]

1. Windows separates the vm flush from the gfx submit, so with a vm flush Windows actually submits 256 dw in total (128 for the vm flush and 128 for the gfx submit).
2. For a gfx submit frame: Windows always emits a pipeline sync (named "wait_3d_idle" on Windows).
3. For a vm flush frame: Windows always emits a pipeline sync (named "drain_pfp_to_EOP" on Windows) prior to the PD update and the following vm invalidate.

>But why doesn't this happen on Windows then?

[ML] Good question. I guess the answer is that Windows always inserts pipeline syncs.

For the case without a vm flush: a pipeline sync in the gfx submit can prevent the ME from ruining SH/3D registers before the previous pipeline has finished (the DE IB will program lots of 3D/SH registers; and the CE IB will also change many resource descriptors if there is no context switch this frame).

For the case with a vm flush: a pipeline sync before the vm invalidate can likewise prevent the ME from invalidating the VM before the previous pipeline has finished. I think a vm invalidate will trigger a VM fault if the 3D pipeline is still working on the invalidated VMID *or* the invalidated PD address.

So I think, as a next step, we should move the pipeline sync out of the VM flush and always put it prior to the IB as well as the vm invalidate.

BR Monk

________________________________
From: Christian König <deathsimple@vodafone.de>
Sent: January 20, 2017 16:48:17
To: Liu, Monk; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/amdgpu:guarantee 128dws between vm flush and IB(v3)

Yeah, I thought about that possibility as well when we changed the padding to 256. This happens every time our command submission uses more than 128 dw; in that case you obviously have less than 128 dw of padding.

But why doesn't this happen on Windows then?

Regards,
Christian.

On 19.01.2017 at 15:49, Liu, Monk wrote:
> And thinking about it more: even if the very first UMD cmd starts with wptr at a 256/512-aligned position (because the IB test ends at a 255-aligned position), your approach still cannot guarantee 128 DW between SWITCH_BUFFER and the preamble IB (if there is no vm_flush), or 128 NOPs before the vm_flush (if one is needed), for the next frame.
>
> e.g. we insert 180 dw this frame, starting from 0 and ending at 179, and ring_commit() pads NOPs so that wptr ends up at 255, so only 75 NOPs sit between SWITCH_BUFFER and the next frame (assuming we don't have a vm-flush next frame).
>
> BR Monk
>
> -----Original Message-----
> From: Liu, Monk
> Sent: Thursday, January 19, 2017 10:39 PM
> To: 'Christian König' <deathsimple@vodafone.de>; amd-gfx@lists.freedesktop.org
> Subject: RE: [PATCH] drm/amdgpu:guarantee 128dws between vm flush and IB(v3)
>
> Christian,
>
> I just found one issue in your previous patch (the one where you removed inserting 128 NOPs before vm_flush and at the same time set the align_mask of the GFX ring to 0xff).
>
> Looking at amdgpu_ring_commit() again: this function does not guarantee that the current submit size is aligned to 256 dw; it only guarantees that ring->wptr is 256-dw aligned after amdgpu_ring_commit().
>
> So that means your approach cannot make sure there are 128 NOPs before vm_flush at all!
>
> e.g.
this frame start at 200 position of RB, and it submitted 50 dw so it ends at 249 position, you call amdgpu_ring_commit() now and it only padding 6 more NOPs to ring buffer, so wptr finnaly end at 255, that clearly not what we want originally ~ we want to insert 128 nop after SWITCH_BUFFER (or say, before vm_flush), but you only add 6 NOPs .... > > your approach only works with wptr == 0 for the first UMD submit > > BR Monk > > -----Original Message----- > From: amd-gfx [mailto:amd-gfx-bounces@lists.freedesktop.org] On Behalf Of Christian K?nig > Sent: Thursday, January 19, 2017 5:11 PM > To: Liu, Monk <Monk.Liu@amd.com>; amd-gfx@lists.freedesktop.org > Subject: Re: [PATCH] drm/amdgpu:guarantee 128dws between vm flush and IB(v3) > > Am 18.01.2017 um 12:42 schrieb Monk Liu: >> previously we always insert 128nops behind vm_flush, which may lead to >> DAMframe size above 256 dw and automatially aligned to 512 dw. >> >> now we calculate how many DWs already inserted after vm_flush and make >> up for the reset to pad up to 128dws before emit_ib. >> that way we only take 256 dw per submit. >> >> v2: >> drop insert_nops in vm_flush >> re-calculate the estimate frame size for gfx8 ring >> v3: >> calculate the gap between vm_flush and IB in cntx_cntl on an member >> field of ring structure >> v4: >> set last_vm_flush_pos even for case of no vm flush. 
>> >> Change-Id: I0a1845de07c0b86ac1b77ac1a007e893144144fd >> Signed-off-by: Monk Liu <Monk.Liu@amd.com> >> --- >> drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 1 + >> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 7 +++++++ >> drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 16 ++++++++++++---- >> 3 files changed, 20 insertions(+), 4 deletions(-) >> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h >> index c813cbe..332969f 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h >> @@ -173,6 +173,7 @@ struct amdgpu_ring { >> #if defined(CONFIG_DEBUG_FS) >> struct dentry *ent; >> #endif >> + u32 last_vm_flush_pos; >> }; >> >> int amdgpu_ring_alloc(struct amdgpu_ring *ring, unsigned ndw); diff >> --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c >> index d05546e..53fc5e0 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c >> @@ -396,6 +396,13 @@ int amdgpu_vm_flush(struct amdgpu_ring *ring, struct amdgpu_job *job) >> amdgpu_vm_ring_has_compute_vm_bug(ring))) >> amdgpu_ring_emit_pipeline_sync(ring); >> >> + /* when no vm-flush this frame, we still need to mark down >> + * the position of the tail of hdp_flush, because we still >> + * need to make sure there are 128 DWs between last SWITCH_BUFFER and >> + * the emit_ib this frame. otherwise there is still VM fault occured on >> + * constant engine. 
>> + */ >> + ring->last_vm_flush_pos = ring->wptr; >> if (ring->funcs->emit_vm_flush && (job->vm_needs_flush || >> amdgpu_vm_is_gpu_reset(adev, id))) { >> struct fence *fence; >> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c >> b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c >> index 5f37313..dde4177 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c >> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c >> @@ -6572,7 +6572,6 @@ static void gfx_v8_0_ring_emit_fence_gfx(struct amdgpu_ring *ring, u64 addr, >> DATA_SEL(write64bit ? 2 : 1) | INT_SEL(int_sel ? 2 : 0)); >> amdgpu_ring_write(ring, lower_32_bits(seq)); >> amdgpu_ring_write(ring, upper_32_bits(seq)); >> - >> } >> >> static void gfx_v8_0_ring_emit_pipeline_sync(struct amdgpu_ring >> *ring) @@ -6636,9 +6635,8 @@ static void gfx_v8_0_ring_emit_vm_flush(struct amdgpu_ring *ring, >> /* sync PFP to ME, otherwise we might get invalid PFP reads */ >> amdgpu_ring_write(ring, PACKET3(PACKET3_PFP_SYNC_ME, 0)); >> amdgpu_ring_write(ring, 0x0); >> - /* GFX8 emits 128 dw nop to prevent CE access VM before vm_flush finish */ >> - amdgpu_ring_insert_nop(ring, 128); >> } >> + ring->last_vm_flush_pos = ring->wptr; >> } >> >> static u32 gfx_v8_0_ring_get_wptr_compute(struct amdgpu_ring *ring) >> @@ -6743,6 +6741,15 @@ static void gfx_v8_ring_emit_cntxcntl(struct amdgpu_ring *ring, uint32_t flags) >> if (amdgpu_sriov_vf(ring->adev)) >> gfx_v8_0_ring_emit_de_meta_init(ring, >> (flags & AMDGPU_VM_DOMAIN) ? AMDGPU_CSA_VADDR : >> ring->adev->virt.csa_vmid0_addr); >> + >> + /* We need to pad some NOPs before emit_ib to prevent CE run ahead of >> + * vm_flush, which may trigger VM fault. */ >> + if (ring->wptr > ring->last_vm_flush_pos) /* no wptr wrapping to RB head */ >> + amdgpu_ring_insert_nop(ring, 128 - (ring->wptr - >> +ring->last_vm_flush_pos)); > This can easily result in a negative number, couldn't it? 
> >> + else >> + if (ring->ptr_mask + 1 - ring->last_vm_flush_pos + ring->wptr < 128) >> + amdgpu_ring_insert_nop(ring, >> + 128 - (ring->ptr_mask + 1 - ring->last_vm_flush_pos + >> +ring->wptr)); > I think it would be cleaner if you calculate the number of NOPs needed first for both cases and then check if the number isn't negative for both cases. > >> } >> >> static void gfx_v8_0_ring_emit_rreg(struct amdgpu_ring *ring, >> uint32_t reg) @@ -7018,7 +7025,8 @@ static const struct amdgpu_ring_funcs gfx_v8_0_ring_funcs_gfx = { >> 7 + /* gfx_v8_0_ring_emit_pipeline_sync */ >> 128 + 19 + /* gfx_v8_0_ring_emit_vm_flush */ >> 2 + /* gfx_v8_ring_emit_sb */ >> - 3 + 4 + 29, /* gfx_v8_ring_emit_cntxcntl including vgt flush/meta-data */ >> + 3 + 4 + 29 - /* gfx_v8_ring_emit_cntxcntl including vgt >> +flush/meta-data */ > Please put the - on the next line. That is easy to miss and I asked myself for a moment why you want to add 20 here. > > Apart from that the patch looks good to me, Christian. > >> + 20 - 7 - 6 - 3 - 4 - 29, /* no need to count gds/hdp_flush/vm_flush >> +fence/cntx_cntl/vgt_flush/meta-data anymore */ >> .emit_ib_size = 4, /* gfx_v8_0_ring_emit_ib_gfx */ >> .emit_ib = gfx_v8_0_ring_emit_ib_gfx, >> .emit_fence = gfx_v8_0_ring_emit_fence_gfx, > > _______________________________________________ > amd-gfx mailing list > amd-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/amd-gfx [-- Attachment #1.2: Type: text/html, Size: 16174 bytes --] [-- Attachment #2: Type: text/plain, Size: 154 bytes --] _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply [flat|nested] 11+ messages in thread
[parent not found: <BY2PR1201MB11108334E6D9C3AAE1DF30D384730-O28G1zQ8oGliQkyLPkmea2rFom/aUZj6nBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org>]
* Re: [PATCH] drm/amdgpu:guarantee 128dws between vm flush and IB(v3) [not found] ` <BY2PR1201MB11108334E6D9C3AAE1DF30D384730-O28G1zQ8oGliQkyLPkmea2rFom/aUZj6nBOFsp37pqbUKgpGm//BTAC/G2K4zDHf@public.gmane.org> @ 2017-01-22 3:53 ` Liu, Monk 0 siblings, 0 replies; 11+ messages in thread From: Liu, Monk @ 2017-01-22 3:53 UTC (permalink / raw) To: Christian König, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW [-- Attachment #1.1: Type: text/plain, Size: 10703 bytes --]

Fix typo:

[ML] Good question. I guess the answer is that Windows always inserts pipeline syncs.

For the case without a vm flush: a pipeline sync in the gfx submit can prevent the ME from ruining SH/3D registers before the previous pipeline has finished (the DE IB will program lots of 3D/SH registers, and the CE IB will also change many resource descriptors if there is a context switch).

For the case with a vm flush: a pipeline sync before the vm invalidate can likewise prevent the ME from invalidating the VM before the previous pipeline has finished. (Windows uses drain_pfp_to_EOP.)

________________________________
From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> on behalf of Liu, Monk <Monk.Liu@amd.com>
Sent: January 22, 2017 10:47:03
To: Christian König; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/amdgpu:guarantee 128dws between vm flush and IB(v3)

1. Windows separates the vm flush from the gfx submit, so with a vm flush Windows actually submits 256 dw in total (128 for the vm flush and 128 for the gfx submit).
2. For a gfx submit frame: Windows always emits a pipeline sync (named "wait_3d_idle" on Windows).
3. For a vm flush frame: Windows always emits a pipeline sync (named "drain_pfp_to_EOP" on Windows) prior to the PD update and the following vm invalidate.

>But why doesn't this happen on Windows then?

[ML] Good question. I guess the answer is that Windows always inserts pipeline syncs.

For the case without a vm flush: a pipeline sync in the gfx submit can prevent the ME from ruining SH/3D registers before the previous pipeline has finished (the DE IB will program lots of 3D/SH registers; and the CE IB will also change many resource descriptors if there is no context switch this frame).

For the case with a vm flush: a pipeline sync before the vm invalidate can likewise prevent the ME from invalidating the VM before the previous pipeline has finished. I think a vm invalidate will trigger a VM fault if the 3D pipeline is still working on the invalidated VMID *or* the invalidated PD address.

So I think, as a next step, we should move the pipeline sync out of the VM flush and always put it prior to the IB as well as the vm invalidate.

BR Monk

________________________________
From: Christian König <deathsimple@vodafone.de>
Sent: January 20, 2017 16:48:17
To: Liu, Monk; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/amdgpu:guarantee 128dws between vm flush and IB(v3)

Yeah, I thought about that possibility as well when we changed the padding to 256. This happens every time our command submission uses more than 128 dw; in that case you obviously have less than 128 dw of padding.

But why doesn't this happen on Windows then?

Regards,
Christian.

On 19.01.2017 at 15:49, Liu, Monk wrote:
> And thinking about it more: even if the very first UMD cmd starts with wptr at a 256/512-aligned position (because the IB test ends at a 255-aligned position), your approach still cannot guarantee 128 DW between SWITCH_BUFFER and the preamble IB (if there is no vm_flush), or 128 NOPs before the vm_flush (if one is needed), for the next frame.
>
> e.g.
we insert 180 dw this frame, start from0, end up at 179, and ring_commit() padding NOPs and make wptr end up at 255, so only 75 NOPs is between SWITCH_BUFFER and next frame (assume we don't have vm-flush next frame) > > BR Monk > > -----Original Message----- > From: Liu, Monk > Sent: Thursday, January 19, 2017 10:39 PM > To: 'Christian König' <deathsimple@vodafone.de>; amd-gfx@lists.freedesktop.org > Subject: RE: [PATCH] drm/amdgpu:guarantee 128dws between vm flush and IB(v3) > > Christian > > I just found one issue in your previous patch ( the one you remove inserting 128nop before vm_fush and meanwhile set align_mask of GFX ring to 0xff) > > Because I checked amdgpu_ring_commit() again, this function will not guarantee current submit size aligned with 256 dw, instead it only guarantee the ring->wptr will be 256 dw aligned after amdgpu_ring_commit() > > So that means your approach can not make sure there will be 128 NOP before vm_flush at all ! > > e.g. this frame start at 200 position of RB, and it submitted 50 dw so it ends at 249 position, you call amdgpu_ring_commit() now and it only padding 6 more NOPs to ring buffer, so wptr finnaly end at 255, that clearly not what we want originally ~ we want to insert 128 nop after SWITCH_BUFFER (or say, before vm_flush), but you only add 6 NOPs .... > > your approach only works with wptr == 0 for the first UMD submit > > BR Monk > > -----Original Message----- > From: amd-gfx [mailto:amd-gfx-bounces@lists.freedesktop.org] On Behalf Of Christian K?nig > Sent: Thursday, January 19, 2017 5:11 PM > To: Liu, Monk <Monk.Liu@amd.com>; amd-gfx@lists.freedesktop.org > Subject: Re: [PATCH] drm/amdgpu:guarantee 128dws between vm flush and IB(v3) > > Am 18.01.2017 um 12:42 schrieb Monk Liu: >> previously we always insert 128nops behind vm_flush, which may lead to >> DAMframe size above 256 dw and automatially aligned to 512 dw. 
>> >> now we calculate how many DWs already inserted after vm_flush and make >> up for the reset to pad up to 128dws before emit_ib. >> that way we only take 256 dw per submit. >> >> v2: >> drop insert_nops in vm_flush >> re-calculate the estimate frame size for gfx8 ring >> v3: >> calculate the gap between vm_flush and IB in cntx_cntl on an member >> field of ring structure >> v4: >> set last_vm_flush_pos even for case of no vm flush. >> >> Change-Id: I0a1845de07c0b86ac1b77ac1a007e893144144fd >> Signed-off-by: Monk Liu <Monk.Liu@amd.com> >> --- >> drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 1 + >> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 7 +++++++ >> drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 16 ++++++++++++---- >> 3 files changed, 20 insertions(+), 4 deletions(-) >> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h >> index c813cbe..332969f 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h >> @@ -173,6 +173,7 @@ struct amdgpu_ring { >> #if defined(CONFIG_DEBUG_FS) >> struct dentry *ent; >> #endif >> + u32 last_vm_flush_pos; >> }; >> >> int amdgpu_ring_alloc(struct amdgpu_ring *ring, unsigned ndw); diff >> --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c >> index d05546e..53fc5e0 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c >> @@ -396,6 +396,13 @@ int amdgpu_vm_flush(struct amdgpu_ring *ring, struct amdgpu_job *job) >> amdgpu_vm_ring_has_compute_vm_bug(ring))) >> amdgpu_ring_emit_pipeline_sync(ring); >> >> + /* when no vm-flush this frame, we still need to mark down >> + * the position of the tail of hdp_flush, because we still >> + * need to make sure there are 128 DWs between last SWITCH_BUFFER and >> + * the emit_ib this frame. otherwise there is still VM fault occured on >> + * constant engine. 
>> + */ >> + ring->last_vm_flush_pos = ring->wptr; >> if (ring->funcs->emit_vm_flush && (job->vm_needs_flush || >> amdgpu_vm_is_gpu_reset(adev, id))) { >> struct fence *fence; >> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c >> b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c >> index 5f37313..dde4177 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c >> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c >> @@ -6572,7 +6572,6 @@ static void gfx_v8_0_ring_emit_fence_gfx(struct amdgpu_ring *ring, u64 addr, >> DATA_SEL(write64bit ? 2 : 1) | INT_SEL(int_sel ? 2 : 0)); >> amdgpu_ring_write(ring, lower_32_bits(seq)); >> amdgpu_ring_write(ring, upper_32_bits(seq)); >> - >> } >> >> static void gfx_v8_0_ring_emit_pipeline_sync(struct amdgpu_ring >> *ring) @@ -6636,9 +6635,8 @@ static void gfx_v8_0_ring_emit_vm_flush(struct amdgpu_ring *ring, >> /* sync PFP to ME, otherwise we might get invalid PFP reads */ >> amdgpu_ring_write(ring, PACKET3(PACKET3_PFP_SYNC_ME, 0)); >> amdgpu_ring_write(ring, 0x0); >> - /* GFX8 emits 128 dw nop to prevent CE access VM before vm_flush finish */ >> - amdgpu_ring_insert_nop(ring, 128); >> } >> + ring->last_vm_flush_pos = ring->wptr; >> } >> >> static u32 gfx_v8_0_ring_get_wptr_compute(struct amdgpu_ring *ring) >> @@ -6743,6 +6741,15 @@ static void gfx_v8_ring_emit_cntxcntl(struct amdgpu_ring *ring, uint32_t flags) >> if (amdgpu_sriov_vf(ring->adev)) >> gfx_v8_0_ring_emit_de_meta_init(ring, >> (flags & AMDGPU_VM_DOMAIN) ? AMDGPU_CSA_VADDR : >> ring->adev->virt.csa_vmid0_addr); >> + >> + /* We need to pad some NOPs before emit_ib to prevent CE run ahead of >> + * vm_flush, which may trigger VM fault. */ >> + if (ring->wptr > ring->last_vm_flush_pos) /* no wptr wrapping to RB head */ >> + amdgpu_ring_insert_nop(ring, 128 - (ring->wptr - >> +ring->last_vm_flush_pos)); > This can easily result in a negative number, couldn't it? 
> >> + else >> + if (ring->ptr_mask + 1 - ring->last_vm_flush_pos + ring->wptr < 128) >> + amdgpu_ring_insert_nop(ring, >> + 128 - (ring->ptr_mask + 1 - ring->last_vm_flush_pos + >> +ring->wptr)); > I think it would be cleaner if you calculate the number of NOPs needed first for both cases and then check if the number isn't negative for both cases. > >> } >> >> static void gfx_v8_0_ring_emit_rreg(struct amdgpu_ring *ring, >> uint32_t reg) @@ -7018,7 +7025,8 @@ static const struct amdgpu_ring_funcs gfx_v8_0_ring_funcs_gfx = { >> 7 + /* gfx_v8_0_ring_emit_pipeline_sync */ >> 128 + 19 + /* gfx_v8_0_ring_emit_vm_flush */ >> 2 + /* gfx_v8_ring_emit_sb */ >> - 3 + 4 + 29, /* gfx_v8_ring_emit_cntxcntl including vgt flush/meta-data */ >> + 3 + 4 + 29 - /* gfx_v8_ring_emit_cntxcntl including vgt >> +flush/meta-data */ > Please put the - on the next line. That is easy to miss and I asked myself for a moment why you want to add 20 here. > > Apart from that the patch looks good to me, Christian. > >> + 20 - 7 - 6 - 3 - 4 - 29, /* no need to count gds/hdp_flush/vm_flush >> +fence/cntx_cntl/vgt_flush/meta-data anymore */ >> .emit_ib_size = 4, /* gfx_v8_0_ring_emit_ib_gfx */ >> .emit_ib = gfx_v8_0_ring_emit_ib_gfx, >> .emit_fence = gfx_v8_0_ring_emit_fence_gfx, > > _______________________________________________ > amd-gfx mailing list > amd-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/amd-gfx [-- Attachment #1.2: Type: text/html, Size: 18224 bytes --] [-- Attachment #2: Type: text/plain, Size: 154 bytes --] _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH] drm/amdgpu:guarantee 128dws between vm flush and IB(v3) @ 2017-01-18 10:33 Monk Liu 0 siblings, 0 replies; 11+ messages in thread From: Monk Liu @ 2017-01-18 10:33 UTC (permalink / raw) To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW; +Cc: Monk Liu

Previously we always inserted 128 NOPs behind vm_flush, which may push the DMA frame size above 256 dw, after which it is automatically aligned to 512 dw.

Now we calculate how many DWs were already inserted after vm_flush and make up the rest, padding up to 128 dws before emit_ib. That way we only take 256 dw per submit.

v2: drop insert_nops in vm_flush; re-calculate the estimated frame size for the gfx8 ring
v3: calculate the gap between vm_flush and IB in cntx_cntl via a member field of the ring structure

Change-Id: I0a1845de07c0b86ac1b77ac1a007e893144144fd Signed-off-by: Monk Liu <Monk.Liu@amd.com> --- drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 1 + drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 16 ++++++++++++---- 2 files changed, 13 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h index c813cbe..332969f 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h @@ -173,6 +173,7 @@ struct amdgpu_ring { #if defined(CONFIG_DEBUG_FS) struct dentry *ent; #endif + u32 last_vm_flush_pos; }; int amdgpu_ring_alloc(struct amdgpu_ring *ring, unsigned ndw); diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c index 5f37313..dde4177 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c @@ -6572,7 +6572,6 @@ static void gfx_v8_0_ring_emit_fence_gfx(struct amdgpu_ring *ring, u64 addr, DATA_SEL(write64bit ? 2 : 1) | INT_SEL(int_sel ?
2 : 0)); amdgpu_ring_write(ring, lower_32_bits(seq)); amdgpu_ring_write(ring, upper_32_bits(seq)); - } static void gfx_v8_0_ring_emit_pipeline_sync(struct amdgpu_ring *ring) @@ -6636,9 +6635,8 @@ static void gfx_v8_0_ring_emit_vm_flush(struct amdgpu_ring *ring, /* sync PFP to ME, otherwise we might get invalid PFP reads */ amdgpu_ring_write(ring, PACKET3(PACKET3_PFP_SYNC_ME, 0)); amdgpu_ring_write(ring, 0x0); - /* GFX8 emits 128 dw nop to prevent CE access VM before vm_flush finish */ - amdgpu_ring_insert_nop(ring, 128); } + ring->last_vm_flush_pos = ring->wptr; } static u32 gfx_v8_0_ring_get_wptr_compute(struct amdgpu_ring *ring) @@ -6743,6 +6741,15 @@ static void gfx_v8_ring_emit_cntxcntl(struct amdgpu_ring *ring, uint32_t flags) if (amdgpu_sriov_vf(ring->adev)) gfx_v8_0_ring_emit_de_meta_init(ring, (flags & AMDGPU_VM_DOMAIN) ? AMDGPU_CSA_VADDR : ring->adev->virt.csa_vmid0_addr); + + /* We need to pad some NOPs before emit_ib to prevent CE run ahead of + * vm_flush, which may trigger VM fault. 
*/ + if (ring->wptr > ring->last_vm_flush_pos) /* no wptr wrapping to RB head */ + amdgpu_ring_insert_nop(ring, 128 - (ring->wptr - ring->last_vm_flush_pos)); + else + if (ring->ptr_mask + 1 - ring->last_vm_flush_pos + ring->wptr < 128) + amdgpu_ring_insert_nop(ring, + 128 - (ring->ptr_mask + 1 - ring->last_vm_flush_pos + ring->wptr)); } static void gfx_v8_0_ring_emit_rreg(struct amdgpu_ring *ring, uint32_t reg) @@ -7018,7 +7025,8 @@ static const struct amdgpu_ring_funcs gfx_v8_0_ring_funcs_gfx = { 7 + /* gfx_v8_0_ring_emit_pipeline_sync */ 128 + 19 + /* gfx_v8_0_ring_emit_vm_flush */ 2 + /* gfx_v8_ring_emit_sb */ - 3 + 4 + 29, /* gfx_v8_ring_emit_cntxcntl including vgt flush/meta-data */ + 3 + 4 + 29 - /* gfx_v8_ring_emit_cntxcntl including vgt flush/meta-data */ + 20 - 7 - 6 - 3 - 4 - 29, /* no need to count gds/hdp_flush/vm_flush fence/cntx_cntl/vgt_flush/meta-data anymore */ .emit_ib_size = 4, /* gfx_v8_0_ring_emit_ib_gfx */ .emit_ib = gfx_v8_0_ring_emit_ib_gfx, .emit_fence = gfx_v8_0_ring_emit_fence_gfx, -- 2.7.4 _______________________________________________ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ^ permalink raw reply related [flat|nested] 11+ messages in thread
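The NOP-padding math in gfx_v8_ring_emit_cntxcntl has to account for the write pointer wrapping past the end of the ring buffer. As a standalone sketch (the helper name and signature here are hypothetical, not the kernel function): given the current write pointer, the position recorded right after the last vm_flush, and the ring's ptr_mask (ring size in dwords minus 1), compute how many NOP dwords are still needed so that a full 128 dwords separate vm_flush from the IB. This sketch folds the patch's two branches into one distance calculation; unlike the patch, it treats wptr == last_vm_flush_pos as zero dwords emitted and pads the full 128.

```c
#include <stdint.h>

/* Hypothetical helper mirroring the patch's padding logic.
 * ptr_mask is the ring size in dwords minus one (size is a power of two). */
static uint32_t pad_dws_after_vm_flush(uint32_t wptr,
				       uint32_t last_vm_flush_pos,
				       uint32_t ptr_mask)
{
	uint32_t emitted;

	if (wptr >= last_vm_flush_pos)
		/* no wrap since the flush: plain difference */
		emitted = wptr - last_vm_flush_pos;
	else
		/* wptr wrapped past the ring end back to the head */
		emitted = ptr_mask + 1 - last_vm_flush_pos + wptr;

	/* pad only the remainder up to the required 128-dword gap */
	return emitted < 128 ? 128 - emitted : 0;
}
```

For a 1024-dword ring (ptr_mask 0x3FF): 50 dwords already emitted needs 78 NOPs; a wrapped case with last_vm_flush_pos at 1000 and wptr at 20 has 44 dwords emitted and needs 84; once 128 or more dwords separate the two positions, no padding is inserted.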
end of thread, other threads: [~2017-01-22 3:53 UTC | newest]

Thread overview: 11+ messages
2017-01-18 11:42 [PATCH] drm/amdgpu:guarantee 128dws between vm flush and IB(v3) Monk Liu
2017-01-19  9:10  ` Christian König
2017-01-19 13:51    ` Grazvydas Ignotas
2017-01-19 14:32      ` Christian König
2017-01-19 15:16        ` Grazvydas Ignotas
2017-01-19 14:39    ` Liu, Monk
2017-01-19 14:49    ` Liu, Monk
2017-01-20  8:48      ` Christian König
2017-01-22  2:47        ` Re: Liu, Monk
2017-01-22  3:53          ` Liu, Monk
2017-01-18 10:33 Monk Liu (earlier posting of the same patch)