Date: Thu, 28 Feb 2013 10:15:31 -0500
Subject: Re: [git pull] drm merge for 3.9-rc1
From: Josh Boyer
To: Alex Deucher
Cc: Dave Airlie, Alex Deucher, Jerome Glisse, torvalds@linux-foundation.org, linux-kernel@vger.kernel.org, DRI mailing list

On Thu, Feb 28, 2013 at 10:09 AM, Alex Deucher wrote:
> On Thu, Feb 28, 2013 at 8:44 AM, Josh Boyer wrote:
>> On Thu, Feb 28, 2013 at 8:38 AM, Alex Deucher wrote:
>>>>>>> ca57802e521de54341efc8a56f70571f79ffac72 is the first bad commit
>>>>>>
>>>>>> So I don't think that's actually the cause of the problem. Or at least
>>>>>> not that alone. I reverted it on top of Linus' latest tree and I still
>>>>>> get the lockups.
>>>>>
>>>>> Actually, git bisect does seem to have gotten it correct. Once I
>>>>> actually tested the revert of just that on top of Linus' tree (commit
>>>>> d895cb1af1), things seem to be working much better. I've rebooted a
>>>>> dozen times without a lockup. The most I've seen it take on a kernel
>>>>> with that commit included is 3 reboots, so that's definitely at least an
>>>>> improvement.
>>>>
>>>> I give up. GPU issues are not my thing. 2 reboots after I sent that it
>>>> gave me pretty rainbow static again. So it might have been an
>>>> improvement, but reverting it is not a solution.
>>>>
>>>> Looking at the rest of the commits, the whole GPU rework might be
>>>> suspect, but I clearly have no clue.
>>>
>>> GPUs are tricky beasts :)
>>
>> Understatement ;).
>>
>>> ca57802e521de54341efc8a56f70571f79ffac72 most likely wasn't the
>>> problem anyway since it only affects 6xx/7xx and your card is handled
>>> by the evergreen code. I'll put together some patches to help narrow
>>> down the problem.
>>
>> Yeah, that's the biggest problem I have, not knowing which functions are
>> actually being executed for this card. It looks like a combination of
>> stuff in evergreen.c and ni.c, but I have no idea.
>>
>> Patches would be great. If nothing else, I'm really good at building
>> kernels and rebooting by now.
>
> Two possible fixes attached. The first attempts a full reset of all
> blocks if the MC (memory controller) is hung. That may work better
> than just resetting the MC. The second just disables MC reset. I'm
> not sure we can reliably tell if it's busy due to display requests
> hitting the MC periodically, which would lead to needlessly resetting
> it, possibly leading to failures like you are seeing.

OK. I'll test them individually. It will probably take a bit because
I'll want to do numerous reboots if things seem "fixed" with one or the
other. I'll let you know how things go.

josh