dri-devel.lists.freedesktop.org archive mirror
From: Rob Clark <robdclark@gmail.com>
To: "Sharma, Shashank" <shashank.sharma@amd.com>
Cc: "Rob Clark" <robdclark@chromium.org>,
	"Christian König" <ckoenig.leichtzumerken@gmail.com>,
	"Amaranath Somalapuram" <amaranath.somalapuram@amd.com>,
	"Abhinav Kumar" <quic_abhinavk@quicinc.com>,
	dri-devel <dri-devel@lists.freedesktop.org>,
	"amd-gfx list" <amd-gfx@lists.freedesktop.org>,
	"Alexandar Deucher" <alexander.deucher@amd.com>,
	"Shashank Sharma" <contactshashanksharma@gmail.com>,
	"Christian Koenig" <christian.koenig@amd.com>
Subject: Re: [PATCH v2 1/2] drm: Add GPU reset sysfs event
Date: Thu, 10 Mar 2022 11:56:41 -0800
Message-ID: <CAF6AEGv3Wv+p1j2B-t22eeK+8rx-qrQHCGoXeV1-XPYp2Om7zg@mail.gmail.com>
In-Reply-To: <cda15a47-f469-2a7e-87b6-adf00e631ef0@amd.com>

On Thu, Mar 10, 2022 at 11:44 AM Sharma, Shashank
<shashank.sharma@amd.com> wrote:
>
>
>
> On 3/10/2022 8:35 PM, Rob Clark wrote:
> > On Thu, Mar 10, 2022 at 11:14 AM Sharma, Shashank
> > <shashank.sharma@amd.com> wrote:
> >>
> >>
> >>
> >> On 3/10/2022 7:33 PM, Abhinav Kumar wrote:
> >>>
> >>>
> >>> On 3/10/2022 9:40 AM, Rob Clark wrote:
> >>>> On Thu, Mar 10, 2022 at 9:19 AM Sharma, Shashank
> >>>> <shashank.sharma@amd.com> wrote:
> >>>>>
> >>>>>
> >>>>>
> >>>>> On 3/10/2022 6:10 PM, Rob Clark wrote:
> >>>>>> On Thu, Mar 10, 2022 at 8:21 AM Sharma, Shashank
> >>>>>> <shashank.sharma@amd.com> wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On 3/10/2022 4:24 PM, Rob Clark wrote:
> >>>>>>>> On Thu, Mar 10, 2022 at 1:55 AM Christian König
> >>>>>>>> <ckoenig.leichtzumerken@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> Am 09.03.22 um 19:12 schrieb Rob Clark:
> >>>>>>>>>> On Tue, Mar 8, 2022 at 11:40 PM Shashank Sharma
> >>>>>>>>>> <contactshashanksharma@gmail.com> wrote:
> >>>>>>>>>>> From: Shashank Sharma <shashank.sharma@amd.com>
> >>>>>>>>>>>
> >>>>>>>>>>> This patch adds a new sysfs event, which will notify
> >>>>>>>>>>> userland about a GPU reset, and can also provide
> >>>>>>>>>>> some information like:
> >>>>>>>>>>> - process ID of the process involved with the GPU reset
> >>>>>>>>>>> - process name of the involved process
> >>>>>>>>>>> - the GPU status info (using flags)
> >>>>>>>>>>>
> >>>>>>>>>>> This patch also introduces the first flag of the flags
> >>>>>>>>>>> bitmap, which can be extended as and when required.
> >>>>>>>>>> Why invent something new, rather than using the already existing
> >>>>>>>>>> devcoredump?
> >>>>>>>>>
> >>>>>>>>> Yeah, that's a really valid question.
> >>>>>>>>>
> >>>>>>>>>> I don't think we need (or should encourage/allow) something drm
> >>>>>>>>>> specific when there is already an existing solution used by both
> >>>>>>>>>> drm and non-drm drivers.  Userspace should not have to learn to
> >>>>>>>>>> support yet another mechanism to do the same thing.
> >>>>>>>>>
> >>>>>>>>> The question is: how is userspace notified about newly available
> >>>>>>>>> core dumps?
> >>>>>>>>
> >>>>>>>> I haven't looked into it too closely, as the CrOS userspace
> >>>>>>>> crash-reporter already had support for devcoredump, so it "just
> >>>>>>>> worked" out of the box[1].  I believe a udev event is what triggers
> >>>>>>>> the crash-reporter to go read the devcoredump out of sysfs.
> >>>>>>>
> >>>>>>> I had a quick look at the devcoredump code, and it doesn't look like
> >>>>>>> that is sending an event to the user, so we still need an event to
> >>>>>>> indicate a GPU reset.
> >>>>>>
> >>>>>> There definitely is an event to userspace, I suspect somewhere down
> >>>>>> the device_add() path?
> >>>>>>
> >>>>>
> >>>>> Let me check that out as well, hope that is not due to a driver-private
> >>>>> event for GPU reset, coz I think I have seen some of those in a few DRM
> >>>>> drivers.
> >>>>>
> >>>>
> >>>> Definitely no driver private event for drm/msm .. I haven't dug
> >>>> through it all but this is the collector for devcoredump, triggered
> >>>> somehow via udev.  Most likely from event triggered by device_add()
> >>>>
> >>>> https://chromium.googlesource.com/chromiumos/platform2/+/HEAD/crash-reporter/udev_collector.cc
> >>>>
> >>>
> >>> Yes, that is correct. The uevent for devcoredump is from device_add().
> >>>
> >> Yes, I could confirm in the code that device_add() sends a uevent.
> >>
> >> kobject_uevent(&dev->kobj, KOBJ_ADD);
> >>
> >> I was trying to map ChromiumOS's udev event rules to the event
> >> being sent from device_add(). What I could see is that there is only one
> >> udev rule for any DRM subsystem events in ChromiumOS's
> >> 99-crash-reporter.rules:
> >>
> >> ACTION=="change", SUBSYSTEM=="drm", KERNEL=="card0", ENV{ERROR}=="1", \
> >>     RUN+="/sbin/crash_reporter --udev=KERNEL=card0:SUBSYSTEM=drm:ACTION=change"
> >>
> >> Can someone confirm that this is the rule which gets triggered when a
> >> devcoredump is generated? I could not find an ERROR=1 string in the
> >> env[] while sending this event from device_add().
> >
> > I think it is actually this rule:
> >
> > ACTION=="add", SUBSYSTEM=="devcoredump", \
> >    RUN+="/sbin/crash_reporter --udev=SUBSYSTEM=devcoredump:ACTION=add:KERNEL_NUMBER=%n"
> >
> > It is something non-drm specific because it supports devcoredumps
> > from non-drm drivers.  I know at least some of the wifi and remoteproc
> > drivers use it.
> >
>
> Ah, this seems like a problem for me. I understand it will work well for
> a reset/recovery app, but if a DRM client (like a compositor) wants to
> listen only to DRM events (like a GPU reset), wouldn't this create a lot
> of noise for it? Every time any subsystem produces a coredump, there
> will be a change in the devcoredump subsystem, and the client will have
> to parse the core file and then decide whether to react to it or ignore
> it.
>
> Wouldn't a GPU reset event specific to the DRM subsystem serve better in
> such a case?
>

So, I suppose there are two different use-cases.  For something like a
distro which has generic crash telemetry (i.e. when users opt in to
automated crash reporting), and in general for debugging GPU crashes,
you want devcoredump, preferably with plenty of information about GPU
state, etc., so you actually have a chance of debugging problems you
can't necessarily reproduce locally.  Note also that mesa CI has some
limited support for collecting devcoredumps if a CI run triggers a
GPU fault.
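
To make the devcoredump path concrete, here is a minimal userspace
sketch (not code from this thread).  It assumes a listener that receives
raw kernel uevents, which arrive as a run of NUL-terminated "KEY=VALUE"
strings, and only wants to know whether one of them announces a new
devcoredump device.  The helper names uevent_get() and
is_devcoredump_add() are made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/*
 * Look up KEY in a uevent payload: a sequence of NUL-terminated
 * "KEY=VALUE" strings.  Returns a pointer to VALUE inside buf, or
 * NULL if KEY is absent.  Strings without '=' (such as the leading
 * "add@/devices/..." header) are skipped.
 */
static const char *uevent_get(const char *buf, size_t len, const char *key)
{
    size_t klen = strlen(key);
    size_t off = 0;

    while (off < len) {
        const char *s = buf + off;
        const char *end = memchr(s, '\0', len - off);
        size_t slen;

        if (!end)
            break;              /* unterminated tail: stop parsing */
        slen = (size_t)(end - s);

        if (slen > klen && s[klen] == '=' && strncmp(s, key, klen) == 0)
            return s + klen + 1;

        off += slen + 1;        /* step over the string and its NUL */
    }
    return NULL;
}

/*
 * Mirror of the udev match ACTION=="add", SUBSYSTEM=="devcoredump".
 * Returns 1 when the event announces a new devcoredump device.
 */
static int is_devcoredump_add(const char *buf, size_t len)
{
    const char *action = uevent_get(buf, len, "ACTION");
    const char *subsys = uevent_get(buf, len, "SUBSYSTEM");

    return action && subsys &&
           strcmp(action, "add") == 0 &&
           strcmp(subsys, "devcoredump") == 0;
}
```

A crash collector would react to such an event by reading the dump from
the new device's sysfs directory, while a listener that only cares about
DRM could ignore it without opening the dump at all.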

For something like just notifying a compositor that a gpu crash
happened, perhaps drm_event is more suitable.  See
virtio_gpu_fence_event_create() for an example of adding new event
types.  Although maybe you want it to be an event which is not device
specific.  This isn't so much of a debugging use-case as simply
notification.
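
For the notification path, a rough sketch of what the userspace side
could look like (again, not code from this thread).  DRM events read
from the device fd are framed by a small header carrying a type and a
total length, so a client walks the buffer one event at a time.  The
struct below is a local mirror of that header so the example stands
alone, and EXAMPLE_EVENT_GPU_RESET is a purely hypothetical event type:
no such event exists today, and the value is only a placeholder.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the UAPI struct drm_event header. */
struct example_drm_event {
    uint32_t type;
    uint32_t length;    /* header plus payload, in bytes */
};

/* HYPOTHETICAL event type, invented for this sketch only. */
#define EXAMPLE_EVENT_GPU_RESET 0x80000001u

/*
 * Walk a buffer of back-to-back events, as returned by read() on a
 * DRM fd, and count the (hypothetical) GPU reset notifications.
 * Parsing stops at the first truncated or malformed event.
 */
static size_t count_reset_events(const unsigned char *buf, size_t len)
{
    size_t off = 0;
    size_t n = 0;

    while (len - off >= sizeof(struct example_drm_event)) {
        struct example_drm_event ev;

        /* memcpy avoids unaligned access into the byte buffer */
        memcpy(&ev, buf + off, sizeof(ev));

        if (ev.length < sizeof(ev) || ev.length > len - off)
            break;      /* malformed or truncated event */

        if (ev.type == EXAMPLE_EVENT_GPU_RESET)
            n++;

        off += ev.length;
    }
    return n;
}
```

The point of the sketch is that a compositor would see only events
delivered on its own DRM fd, so there is no unrelated traffic to filter
out and no dump to parse.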

BR,
-R
