From: Chris Wilson <chris@chris-wilson.co.uk>
To: Matt Turner <mattst88@gmail.com>
Cc: "intel-gfx@lists.freedesktop.org"
	<intel-gfx@lists.freedesktop.org>,
	Ben Widawsky <ben@bwidawsk.net>
Subject: Re: [PATCH 02/13] drm/i915: Copy user requested buffers into the error state
Date: Sun, 2 Apr 2017 09:51:57 +0100
Message-ID: <20170402085157.GA17134@nuc-i3427.alporthouse.com>
In-Reply-To: <CAEdQ38EibMWcVJs0mJLiLON1cJfp-m70kZ2uPX09dhEoC_-G6Q@mail.gmail.com>

On Sat, Apr 01, 2017 at 05:48:55PM -0700, Matt Turner wrote:
> On Wed, Mar 29, 2017 at 8:56 AM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > Introduce a new execobject.flag (EXEC_OBJECT_CAPTURE) that userspace may
> > use to indicate that it wants the contents of this buffer preserved in
> > the error state (/sys/class/drm/cardN/error) following a GPU hang
> > involving this batch.
> >
> > Use this at your discretion: the contents of the error state, although
> > compressed, are allocated with GFP_ATOMIC (i.e. limited) and kept for all
> > eternity (until the error state is destroyed).
> >
> > Based on an earlier patch by Ben Widawsky <ben@bwidawsk.net>
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Ben Widawsky <ben@bwidawsk.net>
> > Cc: Matt Turner <mattst88@gmail.com>
> > Acked-by: Ben Widawsky <ben@bwidawsk.net>
> > Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> > ---
> 
> Thank you, Chris. With this in place (and a few patches from Ben
> rebased for libdrm and Mesa) I can disassemble the shader program from
> an error state.
> 
> In this case, I turned off the end-of-thread bit on the sendc in order
> to cause a hang:
> 
> render ring --- user = 0x00000000 fff75000
> pln(8)          g124<1>F        g4<0,1,0>F      g2<8,8,1>F      { align1 1Q compacted };
> pln(8)          g125<1>F        g4.4<0,1,0>F    g2<8,8,1>F      { align1 1Q compacted };
> pln(8)          g126<1>F        g5<0,1,0>F      g2<8,8,1>F      { align1 1Q compacted };
> pln(8)          g127<1>F        g5.4<0,1,0>F    g2<8,8,1>F      { align1 1Q compacted };
> sendc(8)        null<1>UW       g124<8,8,1>F
>                             render RT write SIMD8 LastRT Surface = 0 mlen 4 rlen 0 { align1 1Q };
> nop                                                             ;
> pln(16)         g120<1>F        g6<0,1,0>F      g2<8,8,1>F      { align1 1H compacted };
> pln(16)         g122<1>F        g6.4<0,1,0>F    g2<8,8,1>F      { align1 1H compacted };
> pln(16)         g124<1>F        g7<0,1,0>F      g2<8,8,1>F      { align1 1H compacted };
> pln(16)         g126<1>F        g7.4<0,1,0>F    g2<8,8,1>F      { align1 1H compacted };
> sendc(16)       null<1>UW       g120<8,8,1>F
>                             render RT write SIMD16 LastRT Surface = 0 mlen 8 rlen 0 { align1 1H };
> illegal(1)                                                      { align1 1N };
> 
> Presumably we would like to save more than just instruction buffers.
> Do we have a good way of discerning what each blob of data in the
> error state is?

The prechosen set are named (batch, ring, HW context, HW status,
semaphore). The user ones just have a nondescript 'user'. My thinking
was that either there would be an additional debug-only (aub-esque)
buffer added to the execbuf that contained all the useful info to index
the other buffers captured, or userspace puts a header/footer into its
captured batches. I did consider the possibility of adding a tag through
the execobject, maybe 8 bits inside flags, but I prefer the approach
of embedding information into the buffers (much more flexible).
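
As a rough illustration of the header idea, here is a minimal sketch of
a self-describing header that userspace could write at the start of a
buffer it marks for capture, plus the decoder-side check. Everything in
it (CAPTURE_MAGIC, enum capture_kind, struct capture_header,
capture_tag(), capture_identify()) is hypothetical and not part of i915
or libdrm; it also assumes userspace offsets its real start pointer past
the header so the GPU never executes it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: none of these names exist in i915 or libdrm. */
#define CAPTURE_MAGIC 0x43415054u /* arbitrary marker, ASCII "CAPT" */

enum capture_kind {
	CAPTURE_KIND_UNKNOWN = 0,
	CAPTURE_KIND_SHADER  = 1,
	CAPTURE_KIND_STATE   = 2,
};

struct capture_header {
	uint32_t magic;       /* CAPTURE_MAGIC when the header is valid */
	uint32_t kind;        /* one of enum capture_kind */
	uint32_t payload_len; /* bytes of payload following the header */
	char     label[20];   /* NUL-terminated free-form description */
};

/*
 * Producer side: write a header at the start of buf and return a pointer
 * to where the payload should go, or NULL if it would not fit.
 */
static void *capture_tag(void *buf, size_t buf_len, uint32_t kind,
			 uint32_t payload_len, const char *label)
{
	struct capture_header hdr;

	if (buf_len < sizeof(hdr) || payload_len > buf_len - sizeof(hdr))
		return NULL;

	memset(&hdr, 0, sizeof(hdr));
	hdr.magic = CAPTURE_MAGIC;
	hdr.kind = kind;
	hdr.payload_len = payload_len;
	strncpy(hdr.label, label, sizeof(hdr.label) - 1);

	memcpy(buf, &hdr, sizeof(hdr));
	return (char *)buf + sizeof(hdr);
}

/*
 * Decoder side: given one 'user' blob from the error state, copy out the
 * header if present. Returns 0 on success, -1 if the blob is untagged or
 * the recorded length does not fit inside the blob.
 */
static int capture_identify(const void *blob, size_t blob_len,
			    struct capture_header *out)
{
	if (blob_len < sizeof(*out))
		return -1;

	memcpy(out, blob, sizeof(*out));
	if (out->magic != CAPTURE_MAGIC)
		return -1;
	if (out->payload_len > blob_len - sizeof(*out))
		return -1;

	out->label[sizeof(out->label) - 1] = '\0';
	return 0;
}
```

A decoder would call capture_identify() on each 'user' blob and fall
back to a raw hex dump when it returns -1.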

It is also possible to take the simulator route and decode the buffers
according to the current GPU state. The link between relocation
addresses and buffer addresses should be sufficient?
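
To make the address-matching step concrete, a minimal sketch follows. It
assumes the decoder has already parsed each captured buffer's name, GPU
address and size out of the error state; struct captured_buffer and
find_buffer() are hypothetical names for illustration, not existing
tooling.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record for one buffer parsed out of the error state. */
struct captured_buffer {
	const char *name;  /* e.g. "batch", "ring", "user" */
	uint64_t gpu_addr; /* GPU address the buffer was bound at */
	uint64_t size;     /* captured size in bytes */
};

/*
 * Resolve a GPU address (for instance a relocation target found while
 * decoding the batch) to the captured buffer containing it. On success,
 * optionally store the byte offset into that buffer in *offset.
 * Returns NULL if no captured buffer covers the address.
 */
static const struct captured_buffer *
find_buffer(const struct captured_buffer *bufs, size_t count,
	    uint64_t addr, uint64_t *offset)
{
	for (size_t i = 0; i < count; i++) {
		/* Comparing the difference avoids overflow in gpu_addr + size. */
		if (addr >= bufs[i].gpu_addr &&
		    addr - bufs[i].gpu_addr < bufs[i].size) {
			if (offset)
				*offset = addr - bufs[i].gpu_addr;
			return &bufs[i];
		}
	}
	return NULL;
}
```

With the 'user' buffer from the dump above taken to sit at 0xfff75000, a
pointer in the batch to 0xfff75040 would resolve to that buffer at
offset 0x40.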
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
