dri-devel.lists.freedesktop.org archive mirror
 help / color / mirror / Atom feed
From: Rob Clark <robdclark@gmail.com>
To: Alex Deucher <alexdeucher@gmail.com>
Cc: "Rob Clark" <robdclark@chromium.org>,
	"Sharma, Shashank" <shashank.sharma@amd.com>,
	"Christian König" <ckoenig.leichtzumerken@gmail.com>,
	"Amaranath Somalapuram" <amaranath.somalapuram@amd.com>,
	"Abhinav Kumar" <quic_abhinavk@quicinc.com>,
	dri-devel <dri-devel@lists.freedesktop.org>,
	"amd-gfx list" <amd-gfx@lists.freedesktop.org>,
	"Alexandar Deucher" <alexander.deucher@amd.com>,
	"Shashank Sharma" <contactshashanksharma@gmail.com>,
	"Christian Koenig" <christian.koenig@amd.com>
Subject: Re: [PATCH v2 1/2] drm: Add GPU reset sysfs event
Date: Wed, 16 Mar 2022 09:30:35 -0700	[thread overview]
Message-ID: <CAF6AEGvbuPd6YtX3kRofOoEpLj9KG1mrqyQwaGpPmQ=X-VM8CA@mail.gmail.com> (raw)
In-Reply-To: <CADnq5_O2CVSLxShPFCzMxiXmnHXYbW1vFLJ5K=oigtXshULvqw@mail.gmail.com>

On Wed, Mar 16, 2022 at 8:48 AM Alex Deucher <alexdeucher@gmail.com> wrote:
>
> On Wed, Mar 16, 2022 at 11:35 AM Rob Clark <robdclark@gmail.com> wrote:
> >
> > On Wed, Mar 16, 2022 at 7:12 AM Alex Deucher <alexdeucher@gmail.com> wrote:
> > >
> > > On Wed, Mar 16, 2022 at 4:48 AM Pekka Paalanen <ppaalanen@gmail.com> wrote:
> > > >
> > [snip]
> > > > With new UAPI comes the demand of userspace proof, not hand-waving. You
> > > > would not be proposing this new interface if you didn't have use cases
> > > > in mind, even just one. You have to document what you imagine the new
> > > > thing to be used for, so that the appropriateness can be evaluated. If
> > > > the use case is deemed inappropriate for the proposed UAPI, you need to
> > > > find another use case to justify adding the new UAPI. If there is no
> > > > use for the UAPI, it shouldn't be added, right? Adding UAPI and hoping
> > > > someone finds use for it seems backwards to me.
> > >
> > > We do have a use case.  It's what I described originally.  There is a
> > > user space daemon (could be a compositor, could be something else)
> > > that runs and listens for GPU reset notifications.  When it receives a
> > > notification, it takes action and kills the guilty app and restarts
> > > the compositer and gathers any relevant data related to the GPU hang
> > > (if possible).  We can revisit this discussion once we have the whole
> > > implementation complete.  Other drivers seem to do similar things
> > > already today via different means (msm using devcoredump, i915 seems
> > > to have its own GPU reset notification mechanism, etc.).  It just
> > > seemed like there was value in having a generic drm GPU reset
> > > notification, but maybe not yet.
> >
> > just one point of clarification.. in the msm and i915 case it is
> > purely for debugging and telemetry (ie. sending crash logs back to
> > distro for analysis if user has crash reporting enabled).. it isn't
> > used for triggering any action like killing app or compositor.
>
> Sure.  you could use this proposed event for telemetry gathering as
> well.  The important part is the event.  What you do with it is up to
> the user.

True, but for that purpose (telemetry) let's leverage something that
userspace already has support for

> > I would however *strongly* recommend devcoredump support in other GPU
> > drivers (i915's thing pre-dates devcoredump by a lot).. I've used it
> > to debug and fix a couple obscure issues that I was not able to
> > reproduce by myself.
>
> Agreed.  I think devcoredump makes sense and we'll ultimately enable
> support for that in amdgpu.  I think there is value in a GPU specific
> reset event as well for the use case I outlined, but maybe the
> devcoredump one is enough.  We'd just need to rely on the userspace
> side to filter as appropriate for events it cares about.  I'm not sure
> how many other devcore dump events are commonly seen.

They are, like CPU coredumps, not expected to be a frequent thing.
There are a number of other non-GPU drivers which also use devcoredump
(mostly wifi and remoteproc, ie. things like DSPs).  We did also
recently add msm devcoredump support for the display side of things,
in addition to gpu, to dump display state if we hit underruns, link
training fail, etc.  Each "crash" creates a new transient devcore
device so crashes from different devices can be reported in parallel.
But we do rate-limit the GPU crash reports by not reporting another
one until the previous one is read out and cleared by userspace.  (You
don't strictly need to do that, but it is probably a good idea..
usually the first crash is the interesting one.)

The ChromeOS crash-reporter does have an allow-list for which devcore
dumps are sent back, mainly because in the case of wifi drivers it is
hard to ensure that the dump does not include PII (like network names,
etc).  It would be pretty trivial to add amdgpu to the allow-list and
setup a "magic signature" so that queries could be run on amdgpu
devcore dumps.  I can help out with that, since I went through the
same process already for msm.

I'm not entirely sure what other distros do in this area.

BR,
-R

  reply	other threads:[~2022-03-16 16:29 UTC|newest]

Thread overview: 63+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-08 18:04 [PATCH v2 1/2] drm: Add GPU reset sysfs event Shashank Sharma
2022-03-08 18:04 ` [PATCH v2 2/2] drm/amdgpu: add work function for GPU reset event Shashank Sharma
2022-03-09  7:47 ` [PATCH v2 1/2] drm: Add GPU reset sysfs event Simon Ser
2022-03-09 11:18   ` Sharma, Shashank
2022-03-09  8:09 ` Christian König
2022-03-09  9:56 ` Pierre-Eric Pelloux-Prayer
2022-03-09 10:10   ` Simon Ser
2022-03-09 10:24     ` Christian König
2022-03-09 10:28       ` Simon Ser
2022-03-09 10:28       ` Pierre-Eric Pelloux-Prayer
2022-03-09 18:12 ` Rob Clark
2022-03-10  9:55   ` Christian König
2022-03-10 15:24     ` Rob Clark
2022-03-10 16:21       ` Sharma, Shashank
2022-03-10 16:27         ` Andrey Grodzovsky
2022-03-10 17:16           ` Rob Clark
2022-03-10 17:10         ` Rob Clark
2022-03-10 17:19           ` Sharma, Shashank
2022-03-10 17:40             ` Rob Clark
2022-03-10 18:33               ` Abhinav Kumar
2022-03-10 19:14                 ` Sharma, Shashank
2022-03-10 19:35                   ` Rob Clark
2022-03-10 19:44                     ` Sharma, Shashank
2022-03-10 19:56                       ` Rob Clark
2022-03-10 20:17                         ` Sharma, Shashank
2022-03-11  8:30                         ` Pekka Paalanen
2022-03-14 14:23                           ` Alex Deucher
2022-03-14 15:26                             ` Pekka Paalanen
2022-03-15 14:54                               ` Alex Deucher
2022-03-16  8:48                                 ` Pekka Paalanen
2022-03-16 14:12                                   ` Alex Deucher
2022-03-16 15:36                                     ` Rob Clark
2022-03-16 15:48                                       ` Alex Deucher
2022-03-16 16:30                                         ` Rob Clark [this message]
2022-03-17  7:03                                       ` Christian König
2022-03-17  9:29                                         ` Daniel Vetter
2022-03-17  9:46                                           ` Christian König
2022-03-17 15:34                                           ` Rob Clark
2022-03-17 17:23                                             ` Daniel Vetter
2022-03-17 15:40                                           ` Rob Clark
2022-03-17 17:26                                             ` Daniel Vetter
2022-03-17 17:31                                               ` Rob Clark
2022-03-18  7:42                                                 ` Christian König
2022-03-18 15:12                                                   ` Rob Clark
2022-03-21  9:30                                                     ` Christian König
2022-03-21 16:03                                                       ` Rob Clark
2022-03-23 14:07                                                         ` Daniel Stone
2022-03-23 15:14                                                           ` Daniel Vetter
2022-03-23 15:25                                                             ` Christian König
2022-03-26  0:53                                                               ` Olsak, Marek
2022-03-29 12:14                                                                 ` Christian König
2022-03-29 16:25                                                                   ` Marek Olšák
2022-03-30  9:49                                                                     ` Daniel Vetter
2022-03-23 17:30                                                             ` Rob Clark
2022-03-21 14:15                                                     ` Daniel Vetter
2022-03-15  7:13                             ` Dave Airlie
2022-03-15  7:25                               ` Simon Ser
2022-03-15  7:25                               ` Christian König
2022-03-17  9:25                             ` Daniel Vetter
2022-03-16 21:50 ` Rob Clark
2022-03-17  8:42   ` Sharma, Shashank
2022-03-17  9:21     ` Christian König
2022-03-17 10:31       ` Daniel Stone

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAF6AEGvbuPd6YtX3kRofOoEpLj9KG1mrqyQwaGpPmQ=X-VM8CA@mail.gmail.com' \
    --to=robdclark@gmail.com \
    --cc=alexander.deucher@amd.com \
    --cc=alexdeucher@gmail.com \
    --cc=amaranath.somalapuram@amd.com \
    --cc=amd-gfx@lists.freedesktop.org \
    --cc=christian.koenig@amd.com \
    --cc=ckoenig.leichtzumerken@gmail.com \
    --cc=contactshashanksharma@gmail.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=quic_abhinavk@quicinc.com \
    --cc=robdclark@chromium.org \
    --cc=shashank.sharma@amd.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).