From: Daniel Vetter <daniel@ffwll.ch>
To: Ben Widawsky <ben@bwidawsk.net>
Cc: intel-gfx@lists.freedesktop.org
Subject: Re: [RFC] algorithm for handling bad cachelines
Date: Tue, 27 Mar 2012 17:33:01 +0200
Message-ID: <20120327153300.GC16018@phenom.ffwll.local>
In-Reply-To: <20120327080931.0c511ede@bwidawsk.net>

On Tue, Mar 27, 2012 at 08:09:31AM -0700, Ben Widawsky wrote:
> On Tue, 27 Mar 2012 16:50:39 +0200
> Daniel Vetter <daniel@ffwll.ch> wrote:
> 
> > On Tue, Mar 27, 2012 at 07:19:43AM -0700, Ben Widawsky wrote:
> > > I wanted to run this by folks before I start doing any actual work.
> > > 
> > > This is primarily for GPGPU, or perhaps *really* accurate rendering
> > > requirements.
> > > 
> > > IVB+ has an interrupt to tell us when a cacheline seems to be going
> > > bad. There is also a mechanism to remap the bad cachelines. The
> > > implementation details aren't quite clear to me yet, but I'd like to
> > > enable this feature for userspace.
> > > 
> > > Here is my current plan, but it involves filesystem access, so it's
> > > probably going to get a lot of flames.
> > > 
> > > 1. Handle cache line going bad interrupt.
> > > <After n number of these interrupts to the same line,>
> > > 2. send a uevent
> > > 2.5 reset the GPU (docs tell us to)
> > > <On module load>
> > > 3. Read a module parameter with a path in the filesystem
> > > of the list of bad lines. It's not clear to me yet exactly what I
> > > need to store, but it should be a relatively simple list.
> > 
> > .... path in filesystem is no-go for kernel interface. So bad
> > cachelines need to go into the module parameter itself. Or we add a
> > sysfs interface and reset the gpu (because if my understanding is
> > right, we can't disable cachelines once the gpu has used them).
> 
> I think we have to assume the list could get quite long. So long in
> fact, I imagine the user may often want to reset it and try his/her
> luck again with some lines.
> 
> Could you elaborate more on why it's a no-go? The module parameter
> setting itself is limited to root. I was trying to clearly understand
> exactly why this can't be done, and some of the lore behind why file
> access in the kernel is such a bad thing (assuming the files being
> accessed are set at module load time). I wouldn't want to go the route
> of loading an arbitrary path - which seems like a terrible idea;
> though it works for firmware blobs, and I half thought we could load
> this like a firmware blob.
> 
> Anyway, assuming a gpu reset is sufficient to remap (docs only clearly
> state reset works for disabling, iirc) then I would like to do that.
> What is the appropriate interface for that? The dev node? Sysfs?

I personally prefer sysfs for this, although you might run into issues
with the one-value-per-file rule ... I guess a list of hex values is ok
though.
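
To make the "list of hex values" idea a bit more concrete, here is a rough
userspace sketch of the parsing such a sysfs file would need on write. The
function name parse_bad_lines() and the whitespace-separated format are
assumptions for illustration only, not an existing i915 interface, and a
real store handler would use the kernel's own string helpers rather than
strtoul().

```c
#include <ctype.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse a whitespace-separated list of hex values
 * (for example "0x1f 2a0\n0x3c\n") into out[].
 *
 * Returns the number of entries parsed, or -1 if the input is malformed
 * or holds more than max entries.
 */
int parse_bad_lines(const char *buf, unsigned long *out, size_t max)
{
	size_t n = 0;

	for (;;) {
		char *end;
		unsigned long val;

		/* Skip separators (spaces, tabs, newlines). */
		while (isspace((unsigned char)*buf))
			buf++;
		if (*buf == '\0')
			break;

		/* Refuse to overflow the caller's array. */
		if (n == max)
			return -1;

		/* strtoul() would accept a sign; insist on a hex digit. */
		if (!isxdigit((unsigned char)*buf))
			return -1;

		errno = 0;
		val = strtoul(buf, &end, 16); /* base 16 allows a 0x prefix */
		if (end == buf || errno == ERANGE)
			return -1;

		/* Each value must be followed by a separator or the end. */
		if (*end != '\0' && !isspace((unsigned char)*end))
			return -1;

		out[n++] = val;
		buf = end;
	}

	return (int)n;
}
```

Rejecting the whole write on the first malformed entry keeps the semantics
simple: either the full list is accepted or nothing changes.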

> > > 4. Parse list on driver load, and handle as necessary.
> > > 5. goto 1.
> > > 
> > > Probably the biggest unanswered question is exactly when in the HW
> > > loading do we have to finish remapping. If it can happen at any time
> > > while the card is running, I don't need the filesystem stuff, but I
> > > believe I need to remap the lines quite early in the device
> > > bootstrap.
> > 
> > I believe so, too ;-)
> > 
> > > The only alternative I have is a huge comma separated string for a
> > > module parameter, but I kind of like reading the file better.
> > 
> > Well, you can't read a file from the kernel because we might init the
> > driver without any userspace present (when the driver is built-in).
> 
> Userspace should still be present in this case, right? The kernel
> command line should suffice, I think.

At some point later on, yes, but only after the hw is initialized. But if
you're going the runtime interface route anyway, it doesn't matter.
-Daniel
-- 
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48
