dri-devel.lists.freedesktop.org archive mirror
 help / color / mirror / Atom feed
From: Rob Clark <robdclark@gmail.com>
To: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Cc: "Rob Clark" <robdclark@chromium.org>,
	"Sharma, Shashank" <shashank.sharma@amd.com>,
	"Christian König" <ckoenig.leichtzumerken@gmail.com>,
	"Amaranath Somalapuram" <amaranath.somalapuram@amd.com>,
	dri-devel <dri-devel@lists.freedesktop.org>,
	"amd-gfx list" <amd-gfx@lists.freedesktop.org>,
	"Alexandar Deucher" <alexander.deucher@amd.com>,
	"Shashank Sharma" <contactshashanksharma@gmail.com>,
	"Christian Koenig" <christian.koenig@amd.com>
Subject: Re: [PATCH v2 1/2] drm: Add GPU reset sysfs event
Date: Thu, 10 Mar 2022 09:16:05 -0800	[thread overview]
Message-ID: <CAF6AEGtrNOdgH7wEpM7vrFxq8088Pv+H_F+3qrBtixsdVTROCA@mail.gmail.com> (raw)
In-Reply-To: <5f6f6d7a-240f-f82d-b1dd-d3328ed75c07@amd.com>

On Thu, Mar 10, 2022 at 8:27 AM Andrey Grodzovsky
<andrey.grodzovsky@amd.com> wrote:
>
>
> On 2022-03-10 11:21, Sharma, Shashank wrote:
> >
> >
> > On 3/10/2022 4:24 PM, Rob Clark wrote:
> >> On Thu, Mar 10, 2022 at 1:55 AM Christian König
> >> <ckoenig.leichtzumerken@gmail.com> wrote:
> >>>
> >>>
> >>>
> >>> Am 09.03.22 um 19:12 schrieb Rob Clark:
> >>>> On Tue, Mar 8, 2022 at 11:40 PM Shashank Sharma
> >>>> <contactshashanksharma@gmail.com> wrote:
> >>>>> From: Shashank Sharma <shashank.sharma@amd.com>
> >>>>>
> >>>>> This patch adds a new sysfs event, which will indicate
> >>>>> the userland about a GPU reset, and can also provide
> >>>>> some information like:
> >>>>> - process ID of the process involved with the GPU reset
> >>>>> - process name of the involved process
> >>>>> - the GPU status info (using flags)
> >>>>>
> >>>>> This patch also introduces the first flag of the flags
> >>>>> bitmap, which can be appended as and when required.
> >>>> Why invent something new, rather than using the already existing
> >>>> devcoredump?
> >>>
> >>> Yeah, that's a really valid question.
> >>>
> >>>> I don't think we need (or should encourage/allow) something drm
> >>>> specific when there is already an existing solution used by both drm
> >>>> and non-drm drivers.  Userspace should not have to learn to support
> >>>> yet another mechanism to do the same thing.
> >>>
> >>> Question is how is userspace notified about new available core dumps?
> >>
> >> I haven't looked into it too closely, as the CrOS userspace
> >> crash-reporter already had support for devcoredump, so it "just
> >> worked" out of the box[1].  I believe a udev event is what triggers
> >> the crash-reporter to go read the devcore dump out of sysfs.
> >
> > I had a quick look at the devcoredump code, and it doesn't look like
> > that is sending an event to the user, so we still need an event to
> > indicate a GPU reset.
> >
> > - Shashank
>
>
> Another point I raised in another thread is that it looks like we might
> want to use devcoredump during ASIC reset to dump more HW related data
> which is useful
> for debugging. It means the use client will have to extract the pid and
> process name out from a bigger data set - Is that ok ? We can probably
> put it at the beginning
> for easiest parsing.
>

Yes, this is what we do for drm/msm.. the start of the devcore file
has something like:

----
kernel: 5.14.0-rc3-debug+
module: msm
time: 1632763923.453207637
comm: deqp-gles3:sq0
cmdline: ./deqp-gles31 --deqp-surface-width=256
--deqp-surface-height=256 --deqp-gl-config-name=rgba8888d24s8ms0
--deqp-visibility=hidden
--deqp-caselist-file=/home/robclark/src/deqp/build/modules/gles31/new-run/c33.r1.caselist.txt
--deqp-log-filename=/home/robclark/src/deqp/build/modules/gles31/new-run/c33.r1.qpa
--deqp-log-flush=disable
--deqp-shadercache-filename=/home/robclark/src/deqp/build/modules/gles31/new-run/t499826814672.shader_cache
--deqp-shadercache-truncate=disable
revision: 618 (6.1.8.0)
----

We capture quite a lot of state, cmdstream that triggered the hang,
register/state dumps, microcontroller state, etc.  But we do go out of
our way to not capture textures or caches that might contain texture
data by default (for privacy reasons)

It has been hugely useful for debugging a few issues that happen
rarely enough that they are difficult to reproduce.  I guess that is
crowd-sourced debugging ;-)

BR,
-R

  reply	other threads:[~2022-03-10 17:15 UTC|newest]

Thread overview: 63+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-08 18:04 [PATCH v2 1/2] drm: Add GPU reset sysfs event Shashank Sharma
2022-03-08 18:04 ` [PATCH v2 2/2] drm/amdgpu: add work function for GPU reset event Shashank Sharma
2022-03-09  7:47 ` [PATCH v2 1/2] drm: Add GPU reset sysfs event Simon Ser
2022-03-09 11:18   ` Sharma, Shashank
2022-03-09  8:09 ` Christian König
2022-03-09  9:56 ` Pierre-Eric Pelloux-Prayer
2022-03-09 10:10   ` Simon Ser
2022-03-09 10:24     ` Christian König
2022-03-09 10:28       ` Simon Ser
2022-03-09 10:28       ` Pierre-Eric Pelloux-Prayer
2022-03-09 18:12 ` Rob Clark
2022-03-10  9:55   ` Christian König
2022-03-10 15:24     ` Rob Clark
2022-03-10 16:21       ` Sharma, Shashank
2022-03-10 16:27         ` Andrey Grodzovsky
2022-03-10 17:16           ` Rob Clark [this message]
2022-03-10 17:10         ` Rob Clark
2022-03-10 17:19           ` Sharma, Shashank
2022-03-10 17:40             ` Rob Clark
2022-03-10 18:33               ` Abhinav Kumar
2022-03-10 19:14                 ` Sharma, Shashank
2022-03-10 19:35                   ` Rob Clark
2022-03-10 19:44                     ` Sharma, Shashank
2022-03-10 19:56                       ` Rob Clark
2022-03-10 20:17                         ` Sharma, Shashank
2022-03-11  8:30                         ` Pekka Paalanen
2022-03-14 14:23                           ` Alex Deucher
2022-03-14 15:26                             ` Pekka Paalanen
2022-03-15 14:54                               ` Alex Deucher
2022-03-16  8:48                                 ` Pekka Paalanen
2022-03-16 14:12                                   ` Alex Deucher
2022-03-16 15:36                                     ` Rob Clark
2022-03-16 15:48                                       ` Alex Deucher
2022-03-16 16:30                                         ` Rob Clark
2022-03-17  7:03                                       ` Christian König
2022-03-17  9:29                                         ` Daniel Vetter
2022-03-17  9:46                                           ` Christian König
2022-03-17 15:34                                           ` Rob Clark
2022-03-17 17:23                                             ` Daniel Vetter
2022-03-17 15:40                                           ` Rob Clark
2022-03-17 17:26                                             ` Daniel Vetter
2022-03-17 17:31                                               ` Rob Clark
2022-03-18  7:42                                                 ` Christian König
2022-03-18 15:12                                                   ` Rob Clark
2022-03-21  9:30                                                     ` Christian König
2022-03-21 16:03                                                       ` Rob Clark
2022-03-23 14:07                                                         ` Daniel Stone
2022-03-23 15:14                                                           ` Daniel Vetter
2022-03-23 15:25                                                             ` Christian König
2022-03-26  0:53                                                               ` Olsak, Marek
2022-03-29 12:14                                                                 ` Christian König
2022-03-29 16:25                                                                   ` Marek Olšák
2022-03-30  9:49                                                                     ` Daniel Vetter
2022-03-23 17:30                                                             ` Rob Clark
2022-03-21 14:15                                                     ` Daniel Vetter
2022-03-15  7:13                             ` Dave Airlie
2022-03-15  7:25                               ` Simon Ser
2022-03-15  7:25                               ` Christian König
2022-03-17  9:25                             ` Daniel Vetter
2022-03-16 21:50 ` Rob Clark
2022-03-17  8:42   ` Sharma, Shashank
2022-03-17  9:21     ` Christian König
2022-03-17 10:31       ` Daniel Stone

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAF6AEGtrNOdgH7wEpM7vrFxq8088Pv+H_F+3qrBtixsdVTROCA@mail.gmail.com \
    --to=robdclark@gmail.com \
    --cc=alexander.deucher@amd.com \
    --cc=amaranath.somalapuram@amd.com \
    --cc=amd-gfx@lists.freedesktop.org \
    --cc=andrey.grodzovsky@amd.com \
    --cc=christian.koenig@amd.com \
    --cc=ckoenig.leichtzumerken@gmail.com \
    --cc=contactshashanksharma@gmail.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=robdclark@chromium.org \
    --cc=shashank.sharma@amd.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).