dri-devel.lists.freedesktop.org archive mirror
 help / color / mirror / Atom feed
From: Daniel Vetter <daniel@ffwll.ch>
To: Rob Clark <robdclark@gmail.com>
Cc: "Rob Clark" <robdclark@chromium.org>,
	"Sharma, Shashank" <shashank.sharma@amd.com>,
	"Christian König" <ckoenig.leichtzumerken@gmail.com>,
	"Amaranath Somalapuram" <amaranath.somalapuram@amd.com>,
	"Abhinav Kumar" <quic_abhinavk@quicinc.com>,
	dri-devel <dri-devel@lists.freedesktop.org>,
	"Alexandar Deucher" <alexander.deucher@amd.com>,
	"amd-gfx list" <amd-gfx@lists.freedesktop.org>,
	"Shashank Sharma" <contactshashanksharma@gmail.com>,
	"Christian Koenig" <christian.koenig@amd.com>
Subject: Re: [PATCH v2 1/2] drm: Add GPU reset sysfs event
Date: Thu, 17 Mar 2022 18:23:42 +0100	[thread overview]
Message-ID: <YjNunvEn0EGjQY1W@phenom.ffwll.local> (raw)
In-Reply-To: <CAF6AEGtUasyC1e0Fz2cFhSMEtUJCJTsFQs7+4mg_FP45LwX=4A@mail.gmail.com>

On Thu, Mar 17, 2022 at 08:34:21AM -0700, Rob Clark wrote:
> On Thu, Mar 17, 2022 at 2:29 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> >
> > On Thu, Mar 17, 2022 at 08:03:27AM +0100, Christian König wrote:
> > > Am 16.03.22 um 16:36 schrieb Rob Clark:
> > > > [SNIP]
> > > > just one point of clarification.. in the msm and i915 case it is
> > > > purely for debugging and telemetry (ie. sending crash logs back to
> > > > distro for analysis if user has crash reporting enabled).. it isn't
> > > > used for triggering any action like killing app or compositor.
> > >
> > > By the way, how does msm it's memory management for the devcoredumps?
> >
> > GFP_NORECLAIM all the way. It's purely best effort.
> 
> We do one GEM obj allocation in the snapshot path (the hw has a
> mechanism to snapshot it's own state into a gpu buffer.. not sure if
> nice debugging functionality like that is a commentary on the blob
> driver quality, but I'm not complaining)
> 
> I suppose we could pre-allocate this buffer up-front.. but it doesn't
> seem like a problem, ie. if allocation fails we just skip snapshotting
> stuff that needs the hw crashdumper.  I guess since vram is not
> involved, perhaps that makes the situation a bit more straightforward.

The problem is that you need to allocate with GFP_ATOMIC, instead of
GFP_KERNEL, or things go very bad.

The scheduler dma-fence annotations I've had (well still have them here)
would catch this stuff, but thus far they got nowhere.

> > Note that the fancy new plan for i915 discrete gpu is to only support gpu
> > crash dumps on non-recoverable gpu contexts, i.e. those that do not
> > continue to the next batch when something bad happens. This is what vk
> > wants and also what iris now uses (we do context recovery in userspace in
> > all cases), and non-recoverable contexts greatly simplify the crash dump
> > gather: Only thing you need to gather is the register state from hw
> > (before you reset it), all the batchbuffer bo and indirect state bo (in
> > i915 you can mark which bo to capture in the CS ioctl) can be captured in
> > a worker later on. Which for non-recoverable context is no issue, since
> > subsequent batchbuffers won't trample over any of these things.
> >
> > And that way you can record the crashdump (or at least the big pieces like
> > all the indirect state stuff) with GFP_KERNEL.
> >
> > msm probably gets it wrong since embedded drivers have much less shrinker
> > and generally no mmu notifiers going on :-)
> 
> Note that the bo's associated with the batch are still pinned at this
> point, from the bo lifecycle the batch is still active.  So from the
> point of view of shrinker, there should be no interaction.  We aren't
> doing anything with mmu notifiers (yet), so not entirely sure offhand
> the concern there.
> 
> Currently we just use GFP_KERNEL and bail if allocation fails.

Yeah you have a simple enough shrinker for this not to be a problem. The
issue is that sooner or later things tend to not stay like that, and we're
trying to have common rules for dma_fence to make sure everyone follows
the same rules.
-Daniel

> 
> BR,
> -R
> 
> > > I mean it is strictly forbidden to allocate any memory in the GPU reset
> > > path.
> > >
> > > > I would however *strongly* recommend devcoredump support in other GPU
> > > > drivers (i915's thing pre-dates devcoredump by a lot).. I've used it
> > > > to debug and fix a couple obscure issues that I was not able to
> > > > reproduce by myself.
> > >
> > > Yes, completely agree as well.
> >
> > +1
> >
> > Cheers, Daniel
> > --
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > http://blog.ffwll.ch

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

  reply	other threads:[~2022-03-17 17:23 UTC|newest]

Thread overview: 63+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-08 18:04 [PATCH v2 1/2] drm: Add GPU reset sysfs event Shashank Sharma
2022-03-08 18:04 ` [PATCH v2 2/2] drm/amdgpu: add work function for GPU reset event Shashank Sharma
2022-03-09  7:47 ` [PATCH v2 1/2] drm: Add GPU reset sysfs event Simon Ser
2022-03-09 11:18   ` Sharma, Shashank
2022-03-09  8:09 ` Christian König
2022-03-09  9:56 ` Pierre-Eric Pelloux-Prayer
2022-03-09 10:10   ` Simon Ser
2022-03-09 10:24     ` Christian König
2022-03-09 10:28       ` Simon Ser
2022-03-09 10:28       ` Pierre-Eric Pelloux-Prayer
2022-03-09 18:12 ` Rob Clark
2022-03-10  9:55   ` Christian König
2022-03-10 15:24     ` Rob Clark
2022-03-10 16:21       ` Sharma, Shashank
2022-03-10 16:27         ` Andrey Grodzovsky
2022-03-10 17:16           ` Rob Clark
2022-03-10 17:10         ` Rob Clark
2022-03-10 17:19           ` Sharma, Shashank
2022-03-10 17:40             ` Rob Clark
2022-03-10 18:33               ` Abhinav Kumar
2022-03-10 19:14                 ` Sharma, Shashank
2022-03-10 19:35                   ` Rob Clark
2022-03-10 19:44                     ` Sharma, Shashank
2022-03-10 19:56                       ` Rob Clark
2022-03-10 20:17                         ` Sharma, Shashank
2022-03-11  8:30                         ` Pekka Paalanen
2022-03-14 14:23                           ` Alex Deucher
2022-03-14 15:26                             ` Pekka Paalanen
2022-03-15 14:54                               ` Alex Deucher
2022-03-16  8:48                                 ` Pekka Paalanen
2022-03-16 14:12                                   ` Alex Deucher
2022-03-16 15:36                                     ` Rob Clark
2022-03-16 15:48                                       ` Alex Deucher
2022-03-16 16:30                                         ` Rob Clark
2022-03-17  7:03                                       ` Christian König
2022-03-17  9:29                                         ` Daniel Vetter
2022-03-17  9:46                                           ` Christian König
2022-03-17 15:34                                           ` Rob Clark
2022-03-17 17:23                                             ` Daniel Vetter [this message]
2022-03-17 15:40                                           ` Rob Clark
2022-03-17 17:26                                             ` Daniel Vetter
2022-03-17 17:31                                               ` Rob Clark
2022-03-18  7:42                                                 ` Christian König
2022-03-18 15:12                                                   ` Rob Clark
2022-03-21  9:30                                                     ` Christian König
2022-03-21 16:03                                                       ` Rob Clark
2022-03-23 14:07                                                         ` Daniel Stone
2022-03-23 15:14                                                           ` Daniel Vetter
2022-03-23 15:25                                                             ` Christian König
2022-03-26  0:53                                                               ` Olsak, Marek
2022-03-29 12:14                                                                 ` Christian König
2022-03-29 16:25                                                                   ` Marek Olšák
2022-03-30  9:49                                                                     ` Daniel Vetter
2022-03-23 17:30                                                             ` Rob Clark
2022-03-21 14:15                                                     ` Daniel Vetter
2022-03-15  7:13                             ` Dave Airlie
2022-03-15  7:25                               ` Simon Ser
2022-03-15  7:25                               ` Christian König
2022-03-17  9:25                             ` Daniel Vetter
2022-03-16 21:50 ` Rob Clark
2022-03-17  8:42   ` Sharma, Shashank
2022-03-17  9:21     ` Christian König
2022-03-17 10:31       ` Daniel Stone

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YjNunvEn0EGjQY1W@phenom.ffwll.local \
    --to=daniel@ffwll.ch \
    --cc=alexander.deucher@amd.com \
    --cc=amaranath.somalapuram@amd.com \
    --cc=amd-gfx@lists.freedesktop.org \
    --cc=christian.koenig@amd.com \
    --cc=ckoenig.leichtzumerken@gmail.com \
    --cc=contactshashanksharma@gmail.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=quic_abhinavk@quicinc.com \
    --cc=robdclark@chromium.org \
    --cc=robdclark@gmail.com \
    --cc=shashank.sharma@amd.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).