From: Ben Widawsky <ben@bwidawsk.net>
To: Andi Kleen <andi@firstfloor.org>
Cc: intel-gfx@lists.freedesktop.org
Subject: Re: [RFC] algorithm for handling bad cachelines
Date: Wed, 28 Mar 2012 14:15:27 -0700
Message-ID: <20120328141527.55deb6aa@bwidawsk.net>
In-Reply-To: <m2k4253v29.fsf@firstfloor.org>

On Wed, 28 Mar 2012 02:59:26 -0700
Andi Kleen <andi@firstfloor.org> wrote:

> Ben Widawsky <ben@bwidawsk.net> writes:
> >
> > 1. Handle cache line going bad interrupt.
> > <After n number of these interrupts to the same line,>
> 
> Never use global n without timeout for corrected errors, you would 
> need a leaky bucket with a suitable timeout.

As I understand electrons (which is not very well), parity errors happen
all the time and are transparently corrected by our HW. So I suppose
'n' is still interesting information, but your point is noted. It is
probably better to let userspace decide the value of n.

Take this with a grain of salt because the number of interrupts we get
is speculative as I haven't actually tried to enable this.
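
For reference, the leaky bucket suggested above could look roughly like
the sketch below. Nothing here comes from the i915 driver: the struct
names, the function name, and the two tunables (threshold, i.e. 'n', and
the leak period) are all made up for illustration, and it assumes a
monotonic nanosecond clock and a nonzero leak period.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-cacheline error state; not taken from any driver. */
struct leaky_bucket {
	uint32_t level;   /* errors currently "in the bucket" */
	uint64_t last_ns; /* time up to which leaking has been accounted */
};

/* Assumed tunables, which userspace could set. */
struct leaky_bucket_cfg {
	uint32_t threshold;      /* the 'n' discussed in this thread */
	uint64_t leak_period_ns; /* one error drains per period; must be > 0 */
};

/*
 * Record one corrected-error event at time now_ns (assumed monotonic).
 * Returns true when the decayed count has reached the threshold, i.e.
 * the line saw 'n' errors close enough together to be called bad.
 */
static bool leaky_bucket_event(struct leaky_bucket *b,
			       const struct leaky_bucket_cfg *cfg,
			       uint64_t now_ns)
{
	uint64_t elapsed = now_ns - b->last_ns;
	uint64_t leaked = elapsed / cfg->leak_period_ns;

	if (leaked >= b->level) {
		/* Bucket fully drained; restart accounting from now. */
		b->level = 0;
		b->last_ns = now_ns;
	} else {
		/* Drain whole periods only, keeping the remainder. */
		b->level -= (uint32_t)leaked;
		b->last_ns += leaked * cfg->leak_period_ns;
	}

	b->level++;
	return b->level >= cfg->threshold;
}
```

With a threshold of 3, three errors in quick succession trip the bucket,
while the same three errors spread over many leak periods never do.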

> 
> > 2. send a uevent
> > 2.5 reset the GPU (docs tell us to)
> > <On module load>
> 
> Persistent lists on disk usually suffer from all kinds of problems,
> e.g. you need to detect when the board or CPU has changed.
> Also when the problem is temporary you do not really want
> to save such information permanent.
> 
> Usually it's better to rediscover such state each time and handle
> it again. Then you also don't need the uevent or complicated
> user interfaces.

It seems nice to have the information stored in non-volatile storage. It
doesn't have to be used by the user, but assuming they want to enable the
option to actually detect these events, it's probably also beneficial to
give the driver the known bad cachelines, since detecting one requires a
GPU reset. The reset both takes time and may do more damage (that is
based on past experience/products only, and I hope IVB can magnificently
recover from our bad GPU programming).

> 
> > Any feedback is highly appreciated. I couldn't really find much
> > precedent for doing this in other drivers, so pointers to similar
> > things would also be highly welcome.
> 
> http://mcelog.org
> 
> -Andi

Thanks.


Thread overview: 8+ messages
2012-03-27 14:19 [RFC] algorithm for handling bad cachelines Ben Widawsky
2012-03-27 14:34 ` Chris Wilson
2012-03-27 14:50 ` Daniel Vetter
2012-03-27 15:09   ` Ben Widawsky
2012-03-27 15:33     ` Daniel Vetter
2012-03-28 17:26 ` Jesse Barnes
2012-03-28 18:04   ` Ben Widawsky
     [not found] ` <m2k4253v29.fsf@firstfloor.org>
2012-03-28 21:15   ` Ben Widawsky [this message]
