Re: [PATCH 1/1] drm/amdgpu: Fix per-IB secure flag GFX hang

From: Alex Deucher <alexdeucher@gmail.com>
To: Luben Tuikov <luben.tuikov@amd.com>
Cc: "Pelloux-prayer,
	Pierre-eric" <Pierre-eric.Pelloux-prayer@amd.com>,
	"Liu, Aaron" <Aaron.Liu@amd.com>,
	"amd-gfx@lists.freedesktop.org" <amd-gfx@lists.freedesktop.org>,
	Huang Rui <ray.huang@amd.com>,
	"Deucher, Alexander" <Alexander.Deucher@amd.com>,
	"Koenig, Christian" <Christian.Koenig@amd.com>
Subject: Re: [PATCH 1/1] drm/amdgpu: Fix per-IB secure flag GFX hang
Date: Thu, 27 Feb 2020 16:30:09 -0500	[thread overview]
Message-ID: <CADnq5_MEiEkXcif17pTbRys+O6RPrXf=ksTy9SdH3gFwULyrmQ@mail.gmail.com> (raw)
In-Reply-To: <bb5c3ef4-81ce-d129-bbed-99133c8c602e@amd.com>

On Thu, Feb 27, 2020 at 4:15 PM Luben Tuikov <luben.tuikov@amd.com> wrote:
>
> On 2020-02-27 6:56 a.m., Huang Rui wrote:
> > On Thu, Feb 27, 2020 at 06:39:03AM +0800, Tuikov, Luben wrote:
> >> Since commit "Move to a per-IB secure flag (TMZ)",
> >> we've been seeing hangs in GFX. Ray H. pointed out
> >> by sending a patch that we need to send FRAME
> >> CONTROL stop/start back-to-back, every time we
> >> flip the TMZ flag as per each IB we submit. That
> >> is, when we transition from TMZ to non-TMZ we have
> >> to send a stop with TMZ followed by a start with
> >> non-TMZ, and similarly for transitioning from
> >> non-TMZ into TMZ.
> >>
> >> This patch implements this, thus fixing the GFX
> >> hang.
> >>
> >> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
> >> ---
> >>  drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c   | 87 +++++++++++++++++-------
> >>  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h |  5 +-
> >>  drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c   | 15 ++--
> >>  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c    | 13 ++--
> >>  4 files changed, 79 insertions(+), 41 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> >> index 4b2342d11520..16d6df3304d3 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> >> @@ -216,40 +216,75 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs,
> >>              amdgpu_ring_emit_cntxcntl(ring, status);
> >>      }
> >>
> >> -    secure = false;
> >> +    /* Find the first non-preamble IB.
> >> +     */
> >>      for (i = 0; i < num_ibs; ++i) {
> >>              ib = &ibs[i];
> >>
> >>              /* drop preamble IBs if we don't have a context switch */
> >> -            if ((ib->flags & AMDGPU_IB_FLAG_PREAMBLE) &&
> >> -                skip_preamble &&
> >> -                !(status & AMDGPU_PREAMBLE_IB_PRESENT_FIRST) &&
> >> -                !amdgpu_mcbp &&
> >> -                !amdgpu_sriov_vf(adev)) /* for SRIOV preemption, Preamble CE ib must be inserted anyway */
> >> -                    continue;
> >> -
> >> -            /* If this IB is TMZ, add frame TMZ start packet,
> >> -             * else, turn off TMZ.
> >> -             */
> >> -            if (ib->flags & AMDGPU_IB_FLAGS_SECURE && ring->funcs->emit_tmz) {
> >> -                    if (!secure) {
> >> -                            secure = true;
> >> -                            amdgpu_ring_emit_tmz(ring, true);
> >> -                    }
> >> -            } else if (secure) {
> >> +            if (!(ib->flags & AMDGPU_IB_FLAG_PREAMBLE) ||
> >> +                !skip_preamble ||
> >> +                (status & AMDGPU_PREAMBLE_IB_PRESENT_FIRST) ||
> >> +                amdgpu_mcbp ||
> >> +                amdgpu_sriov_vf(adev)) /* for SRIOV preemption, Preamble CE ib must be inserted anyway */
> >> +                    break;
> >> +    }
> >> +    if (i >= num_ibs)
> >> +            goto Done;
> >> +    /* Setup initial TMZiness and send it off.
> >> +     */
> >> +    secure = false;
> >> +    if (job && ring->funcs->emit_frame_cntl) {
> >> +            if (ib->flags & AMDGPU_IB_FLAGS_SECURE)
> >> +                    secure = true;
> >> +            else
> >>                      secure = false;
> >> -                    amdgpu_ring_emit_tmz(ring, false);
> >> -            }
> >> -
> >> -            amdgpu_ring_emit_ib(ring, job, ib, status);
> >> -            status &= ~AMDGPU_HAVE_CTX_SWITCH;
> >> +            amdgpu_ring_emit_frame_cntl(ring, true, secure);
> >>      }
> >> +    amdgpu_ring_emit_ib(ring, job, ib, status);
> >> +    status &= ~AMDGPU_HAVE_CTX_SWITCH;
> >> +    i += 1;
> >> +    /* Send the rest of the IBs.
> >> +     */
> >> +    if (job && ring->funcs->emit_frame_cntl) {
> >> +            for ( ; i < num_ibs; ++i) {
> >> +                    ib = &ibs[i];
> >> +
> >> +                    /* drop preamble IBs if we don't have a context switch */
> >> +                    if ((ib->flags & AMDGPU_IB_FLAG_PREAMBLE) &&
> >> +                        skip_preamble &&
> >> +                        !(status & AMDGPU_PREAMBLE_IB_PRESENT_FIRST) &&
> >> +                        !amdgpu_mcbp &&
> >> +                        !amdgpu_sriov_vf(adev)) /* for SRIOV preemption, Preamble CE ib must be inserted anyway */
> >> +                            continue;
> >
> > Snip.
> >
> >> +
> >> +                    if (!!secure ^ !!(ib->flags & AMDGPU_IB_FLAGS_SECURE)) {
> >> +                            amdgpu_ring_emit_frame_cntl(ring, false, secure);
> >> +                            secure = !secure;
> >> +                            amdgpu_ring_emit_frame_cntl(ring, true, secure);
> >> +                    }
> >
> > That's pretty good optimization! I spend quit a few time to understand this.
>
> I know. I know you did. It's called experience.
>
> When I saw you v1, it was a cringe. Seriously?

It may be a good optimization but if it's hard to understand it makes
the code harder to read and comprehend, that can lead to
misinterpretation and maintenance burdens.  It took me a while to
understand what was intended here.

Alex

>
> >
> >>
> >> -    if (secure) {
> >> -            secure = false;
> >> -            amdgpu_ring_emit_tmz(ring, false);
> >> +                    amdgpu_ring_emit_ib(ring, job, ib, status);
> >> +                    status &= ~AMDGPU_HAVE_CTX_SWITCH;
> >> +            }
> >> +            amdgpu_ring_emit_frame_cntl(ring, false, secure);
> >> +    } else {
> >> +            for ( ; i < num_ibs; ++i) {
> >> +                    ib = &ibs[i];
> >> +
> >> +                    /* drop preamble IBs if we don't have a context switch */
> >> +                    if ((ib->flags & AMDGPU_IB_FLAG_PREAMBLE) &&
> >> +                        skip_preamble &&
> >> +                        !(status & AMDGPU_PREAMBLE_IB_PRESENT_FIRST) &&
> >> +                        !amdgpu_mcbp &&
> >> +                        !amdgpu_sriov_vf(adev)) /* for SRIOV preemption, Preamble CE ib must be inserted anyway */
> >> +                            continue;
> >> +
> >> +                    amdgpu_ring_emit_ib(ring, job, ib, status);
> >> +                    status &= ~AMDGPU_HAVE_CTX_SWITCH;
> >
> > To pull the checking "job && ring->funcs->emit_frame_cntl" out of the loop,
> > that make the implemenation more duplicated like below, we have to write
>
> Yes, that is exactly what we want.
>
> As I explained before and will explain again, and again, and again,
> "job && ring->funcs->emit_frame_cntl" is static to the body of the loop,
> and as such can be pulled OUT of the loop and it should.
>
> This is a *formulaic* optimization exercise. Compilers and optimizers do it first.
>
> To extrapolate, consider that what you'd eventually have is a HUGE long long loop
> and everything inside it. Not good.
>
> Regards,
> Luben
>
>
> > the same codes multiple times. I hope the implementation is more simple and
> > readable. Please see my V2 patch that I just send out. We can try to use
> > minimum changes to fix the issue.
> >
> >               for ( ; i < num_ibs; ++i) {
> >                       ib = &ibs[i];
> >
> >                       /* drop preamble IBs if we don't have a context switch */
> >                       if ((ib->flags & AMDGPU_IB_FLAG_PREAMBLE) &&
> >                           skip_preamble &&
> >                           !(status & AMDGPU_PREAMBLE_IB_PRESENT_FIRST) &&
> >                           !amdgpu_mcbp &&
> >                           !amdgpu_sriov_vf(adev)) /* for SRIOV preemption, Preamble CE ib must be inserted anyway */
> >                               continue;
> >
> >                         ...
> >                       amdgpu_ring_emit_ib(ring, job, ib, status);
> >
> > Thanks,
> > Ray
> >
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx