From: Alex Deucher <alexdeucher@gmail.com>
To: "Christian König" <ckoenig.leichtzumerken@gmail.com>
Cc: "Zhang, Jack \(Jian\)" <Jack.Zhang1@amd.com>,
	Daniel Vetter <daniel.vetter@ffwll.ch>,
	Maling list - DRI developers <dri-devel@lists.freedesktop.org>,
	amd-gfx list <amd-gfx@lists.freedesktop.org>,
	Alex Deucher <alexander.deucher@amd.com>
Subject: Re: [pull] amdgpu, radeon, ttm, sched drm-next-5.13
Date: Thu, 8 Apr 2021 09:03:40 -0400
Message-ID: <CADnq5_OLhO_En84yEeRsBDtMhJ4OY+7XJtgrjqUDrs-8_x7x0g@mail.gmail.com>
In-Reply-To: <18a67a9f-4199-ba39-d2a7-419d7993aac4@gmail.com>

On Thu, Apr 8, 2021 at 6:28 AM Christian König
<ckoenig.leichtzumerken@gmail.com> wrote:
>
> On 08.04.21 at 09:13, Christian König wrote:
> > On 07.04.21 at 21:04, Alex Deucher wrote:
> >> On Wed, Apr 7, 2021 at 3:23 AM Dave Airlie <airlied@gmail.com> wrote:
> >>> On Wed, 7 Apr 2021 at 06:54, Alex Deucher <alexdeucher@gmail.com>
> >>> wrote:
> >>>> On Fri, Apr 2, 2021 at 12:22 PM Christian König
> >>>> <ckoenig.leichtzumerken@gmail.com> wrote:
> >>>>> Hey Alex,
> >>>>>
> >>>>> the TTM and scheduler changes should already be in the drm-misc-next
> >>>>> branch (not 100% sure about the TTM patch, need to double check
> >>>>> next week).
> >>>>>
> >>>> The TTM change is not in drm-misc yet.
> >>>>
> >>>>> Could that cause problems when both are merged into drm-next?
> >>>> Dave, Daniel, how do you want to handle this?  The duplicated patch
> >>>> is this one:
> >>>> https://cgit.freedesktop.org/drm/drm-misc/commit/?id=ac4eb83ab255de9c31184df51fd1534ba36fd212
> >>>>
> >>>> amdgpu has changes which depend on it.  The same patch is included
> >>>> in this PR.
> >>> Ouch not sure how best to sync up here, maybe get misc-next into my
> >>> tree then rebase your tree on top of it?
> >> I can do that.
> >
> > Please let me double check later today that we have everything we need
> > in drm-misc-next.
>
> There were two patches for TTM (one from Felix and one from Oak) which
> still needed to be pushed to drm-misc-next. I've done that just a minute
> ago.
>

They were included in this PR.

>
> Then we have this patch which fixes a bug in code removed on
> drm-misc-next. I think it should be dropped when amd-staging-drm-next is
> based on drm-next/drm-misc-next.
>
> Author: xinhui pan <xinhui.pan@amd.com>
> Date:   Wed Feb 24 11:28:08 2021 +0800
>
>      drm/ttm: Do not add non-system domain BO into swap list
>

Ok.

>
> I've also found the following patch which is problematic as well:
>
> commit c8a921d49443025e10794342d4433b3f29616409
> Author: Jack Zhang <Jack.Zhang1@amd.com>
> Date:   Mon Mar 8 12:41:27 2021 +0800
>
>      drm/amd/amdgpu implement tdr advanced mode
>
>      [Why]
>      Previous tdr design treats the first job in job_timeout as the bad job.
>      But sometimes a later bad compute job can block a good gfx job and
>      cause an unexpected gfx job timeout because the gfx and compute rings
>      share internal GC HW.
>
>      [How]
>      This patch implements an advanced tdr mode. It involves an additional
>      synchronous pre-resubmit step (Step0 Resubmit) before the normal resubmit
>      step in order to find the real bad job.
>
>      1. At the Step0 Resubmit stage, it synchronously submits the first job
>      and waits for it to be signaled. If it times out, we identify it as
>      guilty and do a hw reset. After that, we do the normal resubmit step
>      to resubmit the remaining jobs.
>
>      2. For a whole gpu reset (vram lost), resubmit the old way.
>
>      Signed-off-by: Jack Zhang <Jack.Zhang1@amd.com>
>      Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
>
> That one modifies both amdgpu and the scheduler code. IIRC I
> actually requested that the patch be split into two, but that was
> somehow not done.
>
> How should we proceed here? Should I separate the patch, push the
> changes to drm-misc-next and then we merge with drm-next and rebase
> amd-staging-drm-next on top of that?
>
> That's most likely the cleanest approach as far as I can see.

That's fine with me.  We could have included them in my PR.  Now we
have to wait for drm-misc-next to be merged again before we can merge
the amdgpu code.  Is anyone planning to do another drm-misc merge at
this point?

Alex

>
> Thanks,
> Christian.
>
> >
> > Regards,
> > Christian.
> >
> >>
> >> Alex
> >>
> >>
> >>> Dave.
> >
>
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
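
For readers following the "tdr advanced mode" commit message quoted above,
its [How] section can be modelled roughly as below.  This is only a toy
simulation, not the amdgpu or scheduler code: the Job class, the hangs flag,
submit_and_wait() and advanced_tdr() are all hypothetical names, and testing
the first pending job of every ring is an assumption about how Step0 is
applied.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    hangs: bool = False   # hypothetical flag: True means the fence never signals
    guilty: bool = False


def submit_and_wait(job):
    """Stand-in for a synchronous submit plus a fence wait with timeout."""
    return not job.hangs


def advanced_tdr(rings, vram_lost=False):
    """Toy model of the two-stage resubmit.

    rings maps a ring name to its list of pending jobs (assumed layout).
    Returns (guilty job names, normally resubmitted job names, hw resets).
    """
    guilty, resets = [], 0
    if not vram_lost:
        # Step0 Resubmit: run the first pending job of each ring on its own.
        for queue in rings.values():
            if not queue:
                continue
            first = queue.pop(0)
            if not submit_and_wait(first):
                first.guilty = True      # timed out alone, so it is the bad job
                guilty.append(first.name)
                resets += 1              # hw reset after a confirmed hang
    # Normal resubmit of the remaining jobs (the only step when vram is lost).
    resubmitted = [job.name for queue in rings.values() for job in queue]
    return guilty, resubmitted, resets


rings = {
    "gfx": [Job("gfx0"), Job("gfx1")],
    "compute": [Job("comp0", hangs=True), Job("comp1")],
}
print(advanced_tdr(rings))
```

In this model the hanging compute job is flagged as guilty even when the
timeout was first reported against a gfx job, which is the case the commit
message's [Why] section describes.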
