From: Daniel Vetter
Subject: Re: [RFC] algorithm for handling bad cachelines
Date: Tue, 27 Mar 2012 16:50:39 +0200
Message-ID: <20120327145039.GB16018@phenom.ffwll.local>
In-Reply-To: <20120327071943.061bba40@bwidawsk.net>
To: Ben Widawsky
Cc: intel-gfx@lists.freedesktop.org

On Tue, Mar 27, 2012 at 07:19:43AM -0700, Ben Widawsky wrote:
> I wanted to run this by folks before I start doing any actual work.
>
> This is primarily for GPGPU, or perhaps *really* accurate rendering
> requirements.
>
> IVB+ has an interrupt to tell us when a cacheline seems to be going bad.
> There is also a mechanism to remap the bad cachelines. The
> implementation details aren't quite clear to me yet, but I'd like to
> enable this feature for userspace.
>
> Here is my current plan, but it involves filesystem access, so it's
> probably going to get a lot of flames.
>
> 1. Handle cache line going bad interrupt.
>
> 2. send a uevent
> 2.5 reset the GPU (docs tell us to)
>
> 3. Read a module parameter with a path in the filesystem
> of the list of bad lines. It's not clear to me yet exactly what I need
> to store, but it should be a relatively simple list.

A path in the filesystem is a no-go for a kernel interface.
So the bad cachelines need to go into the module parameter itself. Or we
add a sysfs interface and reset the GPU (because, if my understanding is
right, we can't disable cachelines once the GPU has used them).

> 4. Parse list on driver load, and handle as necessary.
> 5. goto 1.
>
> Probably the biggest unanswered question is exactly when in the HW
> loading do we have to finish remapping. If it can happen at any time
> while the card is running, I don't need the filesystem stuff, but I
> believe I need to remap the lines quite early in the device bootstrap.

I believe so, too ;-)

> The only alternative I have is a huge comma separated string for a
> module parameter, but I kind of like reading the file better.

Well, you can't read a file from the kernel, because we might init the
driver without any userspace present (when the driver is built-in).

> Any feedback is highly appreciated. I couldn't really find much
> precedent for doing this in other drivers, so pointers to similar
> things would also be highly welcome.

-Daniel
--
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48
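A rough userspace sketch of the "comma separated string for a module parameter" option, assuming each bad cacheline can be identified by a single integer index. The thread leaves open what actually needs to be stored, so the parameter format, the `parse_bad_lines()` helper and the `MAX_BAD_LINES` limit are all hypothetical. Note that inside the kernel, `module_param_array()` already parses comma-separated values into a fixed-size array, which may make hand-written parsing like this unnecessary.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical upper bound on how many bad lines are accepted. */
#define MAX_BAD_LINES 64

/*
 * Parse a hypothetical bad-cacheline list such as "12,0x40,300" into out[].
 * Each entry must start with a digit; strtoul() with base 0 accepts decimal,
 * 0x-prefixed hex and leading-0 octal.
 * Returns the number of entries parsed (0 for an empty string), or -1 on
 * malformed input (empty entry, trailing comma, sign, out-of-range value)
 * or when more than max entries are given.
 */
static int parse_bad_lines(const char *s, unsigned long *out, int max)
{
	int n = 0;

	assert(s != NULL && out != NULL && max >= 0);

	if (*s == '\0')
		return 0;	/* nothing recorded yet */

	for (;;) {
		char *end;

		/* Rejects "", ",", "-1", " 5": strtoul would silently accept some. */
		if (!isdigit((unsigned char)*s))
			return -1;
		if (n == max)
			return -1;	/* too many entries for the caller's array */

		errno = 0;
		out[n] = strtoul(s, &end, 0);
		if (errno == ERANGE)
			return -1;	/* value does not fit in unsigned long */
		n++;

		if (*end == '\0')
			return n;	/* consumed the whole string */
		if (*end != ',')
			return -1;	/* junk after a number */
		s = end + 1;	/* skip the comma, parse the next entry */
	}
}
```

For example, `parse_bad_lines("12,0x40,300", lines, MAX_BAD_LINES)` returns 3 and fills `lines` with 12, 64 and 300. Rejecting malformed input outright, rather than skipping bad entries, seems the safer choice here: a silently dropped entry would leave a known-bad cacheline in use.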