From: "Christian König" <christian.koenig@amd.com>
To: "Daniel Vetter" <daniel@ffwll.ch>, "Christian König" <ckoenig.leichtzumerken@gmail.com>
Cc: Rob Clark <robdclark@chromium.org>, "Sharma, Shashank" <shashank.sharma@amd.com>, Amaranath Somalapuram <amaranath.somalapuram@amd.com>, Abhinav Kumar <quic_abhinavk@quicinc.com>, dri-devel <dri-devel@lists.freedesktop.org>, Alexandar Deucher <alexander.deucher@amd.com>, amd-gfx list <amd-gfx@lists.freedesktop.org>, Shashank Sharma <contactshashanksharma@gmail.com>
Subject: Re: [PATCH v2 1/2] drm: Add GPU reset sysfs event
Date: Thu, 17 Mar 2022 10:46:27 +0100
Message-ID: <303f0fda-485c-2f0b-4ae2-d0e5a7c349c1@amd.com>
In-Reply-To: <YjL/k6kh+5RihGIV@phenom.ffwll.local>

On 17.03.22 at 10:29, Daniel Vetter wrote:
> On Thu, Mar 17, 2022 at 08:03:27AM +0100, Christian König wrote:
>> On 16.03.22 at 16:36, Rob Clark wrote:
>>> [SNIP]
>>> just one point of clarification.. in the msm and i915 case it is
>>> purely for debugging and telemetry (ie. sending crash logs back to
>>> distro for analysis if user has crash reporting enabled).. it isn't
>>> used for triggering any action like killing app or compositor.
>> By the way, how does msm do its memory management for the devcoredumps?
> GFP_NORECLAIM all the way. It's purely best effort.

Ok, good to know that it's as simple as that.

> Note that the fancy new plan for i915 discrete gpu is to only support gpu
> crash dumps on non-recoverable gpu contexts, i.e. those that do not
> continue to the next batch when something bad happens.
> This is what vk wants

That's exactly what I have been telling an internal team for a couple of
years now as well. Good to know that this is not totally crazy.

> and also what iris now uses (we do context recovery in userspace in
> all cases), and non-recoverable contexts greatly simplify the crash dump
> gather: Only thing you need to gather is the register state from hw
> (before you reset it), all the batchbuffer bo and indirect state bo (in
> i915 you can mark which bo to capture in the CS ioctl) can be captured in
> a worker later on. Which for non-recoverable context is no issue, since
> subsequent batchbuffers won't trample over any of these things.
>
> And that way you can record the crashdump (or at least the big pieces like
> all the indirect state stuff) with GFP_KERNEL.

Interesting idea. So basically we initially capture only the state we need
before the reset, and grab a reference on the killed application so that we
can gather the rest before we clean it up.

Going to keep that in mind as well.

Thanks,
Christian.

>
> msm probably gets it wrong since embedded drivers have much less shrinker
> and generally no mmu notifiers going on :-)
>
>> I mean it is strictly forbidden to allocate any memory in the GPU reset
>> path.
>>
>>> I would however *strongly* recommend devcoredump support in other GPU
>>> drivers (i915's thing pre-dates devcoredump by a lot).. I've used it
>>> to debug and fix a couple obscure issues that I was not able to
>>> reproduce by myself.
>> Yes, completely agree as well.
> +1
>
> Cheers, Daniel
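The two-phase capture discussed above can be sketched roughly as follows. This is a userspace model only, not code from i915, msm, or amdgpu: every name here (`struct bo`, `struct crash_state`, `capture_in_reset_path`, `capture_in_worker`, `NUM_REGS`) is hypothetical, and `malloc` merely stands in for an allocation that a kernel worker could make with GFP_KERNEL. The assumption is that the context is non-recoverable, so nothing overwrites the buffer between the two phases.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define NUM_REGS 4 /* hypothetical number of hardware registers to save */

/* Hypothetical refcounted buffer object owned by the crashed context. */
struct bo {
    int refcount;
    size_t size;
    uint8_t *data;
};

/* Preallocated crash record: phase 1 must not allocate anything. */
struct crash_state {
    uint32_t regs[NUM_REGS]; /* filled in the reset path */
    struct bo *pending;      /* reference held until the worker runs */
    uint8_t *bo_copy;        /* filled later by the worker */
    size_t bo_copy_size;
};

static struct bo *bo_create(const void *src, size_t size)
{
    struct bo *bo = malloc(sizeof(*bo));

    if (!bo)
        return NULL;
    bo->data = malloc(size);
    if (!bo->data) {
        free(bo);
        return NULL;
    }
    memcpy(bo->data, src, size);
    bo->size = size;
    bo->refcount = 1;
    return bo;
}

static void bo_get(struct bo *bo)
{
    bo->refcount++;
}

static void bo_put(struct bo *bo)
{
    if (--bo->refcount == 0) {
        free(bo->data);
        free(bo);
    }
}

/*
 * Phase 1, reset path: copy the registers into the preallocated record and
 * take a reference on the buffer. No memory is allocated here.
 */
static void capture_in_reset_path(struct crash_state *cs,
                                  const uint32_t *hw_regs, struct bo *batch)
{
    memcpy(cs->regs, hw_regs, sizeof(cs->regs));
    bo_get(batch);
    cs->pending = batch;
}

/*
 * Phase 2, worker: allocating is allowed here, so copy the buffer contents
 * and drop the reference. Returns 0 on success, -1 if the copy could not
 * be allocated (the dump is best effort).
 */
static int capture_in_worker(struct crash_state *cs)
{
    struct bo *bo = cs->pending;
    int ret = -1;

    if (!bo)
        return 0;
    cs->bo_copy = malloc(bo->size);
    if (cs->bo_copy) {
        memcpy(cs->bo_copy, bo->data, bo->size);
        cs->bo_copy_size = bo->size;
        ret = 0;
    }
    cs->pending = NULL;
    bo_put(bo);
    return ret;
}
```

A caller would invoke `capture_in_reset_path()` before resetting the hardware and schedule `capture_in_worker()` afterwards; the held reference is what keeps the buffer alive after the owning application has been killed.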