From: Daniel Vetter <daniel@ffwll.ch>
To: Rob Clark <robdclark@gmail.com>
Cc: "Rob Clark" <robdclark@chromium.org>,
"Sharma, Shashank" <shashank.sharma@amd.com>,
"Christian König" <ckoenig.leichtzumerken@gmail.com>,
"Amaranath Somalapuram" <amaranath.somalapuram@amd.com>,
"Abhinav Kumar" <quic_abhinavk@quicinc.com>,
dri-devel <dri-devel@lists.freedesktop.org>,
"Alexandar Deucher" <alexander.deucher@amd.com>,
"amd-gfx list" <amd-gfx@lists.freedesktop.org>,
"Shashank Sharma" <contactshashanksharma@gmail.com>,
"Christian Koenig" <christian.koenig@amd.com>
Subject: Re: [PATCH v2 1/2] drm: Add GPU reset sysfs event
Date: Thu, 17 Mar 2022 18:23:42 +0100
Message-ID: <YjNunvEn0EGjQY1W@phenom.ffwll.local>
In-Reply-To: <CAF6AEGtUasyC1e0Fz2cFhSMEtUJCJTsFQs7+4mg_FP45LwX=4A@mail.gmail.com>

On Thu, Mar 17, 2022 at 08:34:21AM -0700, Rob Clark wrote:
> On Thu, Mar 17, 2022 at 2:29 AM Daniel Vetter <daniel@ffwll.ch> wrote:
> >
> > On Thu, Mar 17, 2022 at 08:03:27AM +0100, Christian König wrote:
> > > Am 16.03.22 um 16:36 schrieb Rob Clark:
> > > > [SNIP]
> > > > just one point of clarification.. in the msm and i915 case it is
> > > > purely for debugging and telemetry (ie. sending crash logs back to
> > > > distro for analysis if user has crash reporting enabled).. it isn't
> > > > used for triggering any action like killing app or compositor.
> > >
> > > By the way, how does msm do its memory management for the devcoredumps?
> >
> > GFP_NORECLAIM all the way. It's purely best effort.
>
> We do one GEM obj allocation in the snapshot path (the hw has a
> mechanism to snapshot its own state into a gpu buffer.. not sure if
> nice debugging functionality like that is a commentary on the blob
> driver quality, but I'm not complaining)
>
> I suppose we could pre-allocate this buffer up-front.. but it doesn't
> seem like a problem, ie. if allocation fails we just skip snapshotting
> stuff that needs the hw crashdumper. I guess since vram is not
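> involved, perhaps that makes the situation a bit more straightforward.

The problem is that you need to allocate with GFP_ATOMIC instead of
GFP_KERNEL, or things go very badly wrong. GFP_KERNEL can go into direct
reclaim, and reclaim (shrinkers, mmu notifiers) can end up waiting on a
dma_fence that will not signal until the reset has finished.

The dma-fence annotations I wrote for the scheduler (I still have the
patches here) would catch this kind of thing, but so far they have gone
nowhere.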
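
To make the rule concrete, here is a toy userspace model of the kind of
check such annotations perform. It is only a sketch and not kernel code:
every name in it (model_alloc(), model_begin_signalling(), MODEL_GFP_*) is
made up for illustration. In the kernel the corresponding hooks are
dma_fence_begin_signalling()/dma_fence_end_signalling(), which teach
lockdep about the critical section; the model just counts violations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace model only: none of these names are real kernel APIs. */
enum model_gfp {
	MODEL_GFP_ATOMIC,	/* never blocks, may fail */
	MODEL_GFP_KERNEL	/* may block on reclaim */
};

static int signalling_depth;	/* > 0 while inside a fence-signalling section */
static int violations;		/* blocking allocations seen inside a section */

static void model_begin_signalling(void)
{
	signalling_depth++;
}

static void model_end_signalling(void)
{
	assert(signalling_depth > 0);
	signalling_depth--;
}

/*
 * Allocate memory. A request that may block while a fence-signalling
 * section is open is recorded as a violation and refused, which is the
 * condition the annotations are meant to flag.
 */
static void *model_alloc(size_t size, enum model_gfp gfp)
{
	if (gfp == MODEL_GFP_KERNEL && signalling_depth > 0) {
		violations++;
		return NULL;
	}
	return malloc(size);
}

/* Best-effort snapshot buffer in the reset path; NULL means "skip it". */
static void *model_reset_snapshot(size_t size, enum model_gfp gfp)
{
	void *buf;

	model_begin_signalling();
	buf = model_alloc(size, gfp);
	model_end_signalling();

	return buf;
}
```

Calling model_reset_snapshot(64, MODEL_GFP_ATOMIC) gets a buffer (or NULL
under memory pressure, in which case the snapshot is skipped), while
model_reset_snapshot(64, MODEL_GFP_KERNEL) is refused and bumps the
violation counter.
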
> > Note that the fancy new plan for i915 discrete gpu is to only support gpu
> > crash dumps on non-recoverable gpu contexts, i.e. those that do not
> > continue to the next batch when something bad happens. This is what vk
> > wants and also what iris now uses (we do context recovery in userspace in
> > all cases), and non-recoverable contexts greatly simplify the crash dump
> > gather: Only thing you need to gather is the register state from hw
> > (before you reset it), all the batchbuffer bo and indirect state bo (in
> > i915 you can mark which bo to capture in the CS ioctl) can be captured in
> > a worker later on. Which for non-recoverable context is no issue, since
> > subsequent batchbuffers won't trample over any of these things.
> >
> > And that way you can record the crashdump (or at least the big pieces like
> > all the indirect state stuff) with GFP_KERNEL.
> >
> > msm probably gets it wrong since embedded drivers have much less shrinker
> > and generally no mmu notifiers going on :-)
>
> Note that the bo's associated with the batch are still pinned at this
> point, from the bo lifecycle the batch is still active. So from the
> point of view of shrinker, there should be no interaction. We aren't
> doing anything with mmu notifiers (yet), so not entirely sure offhand
> the concern there.
>
> Currently we just use GFP_KERNEL and bail if allocation fails.

Yeah, your shrinker is simple enough for this not to be a problem. The
issue is that sooner or later things tend not to stay that way, and we're
trying to establish common dma_fence rules so that every driver follows
the same ones.
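
For reference, the split capture from the i915 plan quoted above could be
modelled roughly like this. Again a userspace sketch under my own
assumptions, with hypothetical names throughout (struct model_crash_state,
model_capture_early(), model_capture_late()), not what any driver actually
does: the early phase stands in for the reset path and only fills
preallocated storage, the late phase stands in for the worker and is the
only place that allocates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_NUM_REGS 4
#define MODEL_MAX_BOS  2

/* Stand-in for a buffer object that was marked for capture. */
struct model_bo {
	const uint8_t *data;
	size_t size;
};

struct model_crash_state {
	uint32_t regs[MODEL_NUM_REGS];			/* preallocated register dump */
	const struct model_bo *bos[MODEL_MAX_BOS];	/* references only, no copies */
	uint8_t *copies[MODEL_MAX_BOS];			/* filled in by the late phase */
	size_t num_bos;
};

/* Phase 1 (reset path): copy registers, remember the BOs, allocate nothing. */
static void model_capture_early(struct model_crash_state *st,
				const uint32_t *hw_regs,
				const struct model_bo *bos, size_t num_bos)
{
	size_t i;

	memcpy(st->regs, hw_regs, sizeof(st->regs));
	st->num_bos = num_bos < MODEL_MAX_BOS ? num_bos : MODEL_MAX_BOS;
	for (i = 0; i < st->num_bos; i++) {
		st->bos[i] = &bos[i];
		st->copies[i] = NULL;
	}
}

/* Phase 2 (worker): free to allocate; returns the number of BOs copied. */
static size_t model_capture_late(struct model_crash_state *st)
{
	size_t i, copied = 0;

	for (i = 0; i < st->num_bos; i++) {
		st->copies[i] = malloc(st->bos[i]->size);
		if (!st->copies[i])
			continue;	/* best effort: skip this BO */
		memcpy(st->copies[i], st->bos[i]->data, st->bos[i]->size);
		copied++;
	}
	return copied;
}

static void model_crash_state_free(struct model_crash_state *st)
{
	size_t i;

	for (i = 0; i < st->num_bos; i++) {
		free(st->copies[i]);
		st->copies[i] = NULL;
	}
}
```

The late phase is only safe because, for a non-recoverable context, nothing
runs afterwards that could overwrite the buffers the early phase merely
pointed at.
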
-Daniel
>
> BR,
> -R
>
> > > I mean it is strictly forbidden to allocate any memory in the GPU reset
> > > path.
> > >
> > > > I would however *strongly* recommend devcoredump support in other GPU
> > > > drivers (i915's thing pre-dates devcoredump by a lot).. I've used it
> > > > to debug and fix a couple obscure issues that I was not able to
> > > > reproduce by myself.
> > >
> > > Yes, completely agree as well.
> >
> > +1
> >
> > Cheers, Daniel
> > --
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > http://blog.ffwll.ch
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch