All of lore.kernel.org
 help / color / mirror / Atom feed
From: Chris Wilson <chris@chris-wilson.co.uk>
To: Ian Romanick <idr@freedesktop.org>,
	"intel-gfx@lists.freedesktop.org"
	<intel-gfx@lists.freedesktop.org>
Subject: Re: [RFC] GPU reset notification interface
Date: Wed, 18 Jul 2012 08:06:10 +0100	[thread overview]
Message-ID: <1342595191_10959@CP5-2952> (raw)
In-Reply-To: <5005E433.40200@freedesktop.org>

On Tue, 17 Jul 2012 15:16:19 -0700, Ian Romanick <idr@freedesktop.org> wrote:
> I'm getting ready to implement the reset notification part of 
> GL_ARB_robustness in the i965 driver.  There are a bunch of quirky bits 
> of the extension that are causing some grief in designing the kernel / 
> user interface.  I think I've settled on an interface that should meet 
> all of the requirements, but I want to bounce it off people before I 
> start writing code.

What's the rationale for reset-in-progress? The actual delay between a
hang being noticed and the reset worker kicked off is going to be
very, very small -- in theory there is no window of opportunity for
another client to sneek and perform more work.

If we replace 'index' with breadcrumb, I'm much happier. Then you
have

struct drm_i915_reset_counts {
	u32 num_resets;
	u32 last_reset; /* using the seqno breadcrumb */
	u32 flags; /* 0x1 if in progress */

	u32 num_contexts;
	struct drm_i915_context_reset_counts contexts[0];
};

struct drm_i915_context_reset_counts {
	u32 ctx_id; /* per-fd namespace */

	u32 num_resets;
	u32 last_reset; // seqno

	u32 num_guilty;
	u32 last_guilty; // seqno
/*
	u32 num_innocent;
	u32 last_innocent;

	u32 num_unproven;
	u32 last_unproven;
*/
};;

So what do the terms mean as far as we are aware?

guilty - this context owned the batch we determine the GPU is executing

innocent - this context owns buffer that were on the active list at the
           time of execution and presumed trashed with the reset, i..e
           the context itself was on the active list.

unproven - a hang occurred in the driver without any indication of a
           guilty party (e.g. a semaphore hang due to bad olr), and
           this context was on the active list.

If we only increment the reset counter for a context if it is active at
the time of reset, then
  num_resets = num_guiltly + num_innocent + num_unproven;
and we can drop num_unproven. Since there is then no difference in
effect between unproven/innocence, we need only report the last time it
was found guilty and the total number of reset events involving this
context.

Similarly, the per-fd global counter is only incremented if one of its
contexts was active at the time of an event. The presumption being that
userspace does the broadcast across shared buffers/contexts to other
processes and updates the shared buffers as necessary.

We only keep 'monotonically' increasing state, so userspace is
responsible for computing deltas to catch events.

For the breadcrumb I'd just to use the seqno. It matters not to
userspace that the event breadcrumb jumps by more than one between
events, as it only sees a compressed view anyway (i.e. there may be
external resets that it knows nothing about).

The largest uncertainity for me is the guilty/innocence/unknown
counters and the value of providing both innocent and unproven. Then
there is the question whether we track/report resets that are nothing to
do with the fd. And whether contexts or buffers is more meaningful. From
a tracking perspective, contexts wins both for the kernel, and
presumably for userspace. And given a queriable last_write_seqno for
userspace, it can easily convert a context reset seqno into a trash
list.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

  parent reply	other threads:[~2012-07-18  7:06 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-07-17 22:16 [RFC] GPU reset notification interface Ian Romanick
2012-07-18  1:57 ` Ian Romanick
2012-07-18  7:06 ` Chris Wilson [this message]
2012-07-18  9:20 ` Daniel Vetter
2012-07-18 16:23   ` Ian Romanick
2012-07-18 16:36     ` Adam Jackson
2012-07-18 16:42       ` Daniel Vetter
2012-07-18 16:55     ` Daniel Vetter
2012-07-19  3:47       ` Ben Widawsky
2012-07-19 15:58       ` Ian Romanick

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1342595191_10959@CP5-2952 \
    --to=chris@chris-wilson.co.uk \
    --cc=idr@freedesktop.org \
    --cc=intel-gfx@lists.freedesktop.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.