From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
    aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-3.7 required=3.0 tests=BAYES_00,
    HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,
    URIBL_BLOCKED autolearn=no autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
    by smtp.lore.kernel.org (Postfix) with ESMTP id C46ECC433E6
    for ; Tue, 2 Feb 2021 20:30:21 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
    by mail.kernel.org (Postfix) with ESMTP id 8785A64F61
    for ; Tue, 2 Feb 2021 20:30:21 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
    id S232626AbhBBU3v (ORCPT ); Tue, 2 Feb 2021 15:29:51 -0500
Received: from audible.transient.net ([24.143.126.66]:45780 "HELO
    audible.transient.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
    with SMTP id S233785AbhBBU1h (ORCPT ); Tue, 2 Feb 2021 15:27:37 -0500
Received: (qmail 31260 invoked from network); 2 Feb 2021 20:26:48 -0000
Received: from cucamonga.audible.transient.net (192.168.2.5)
    by canarsie.audible.transient.net with QMQP; 2 Feb 2021 20:26:48 -0000
Received: (nullmailer pid 3418 invoked by uid 1000);
    Tue, 02 Feb 2021 20:26:48 -0000
Date: Tue, 2 Feb 2021 20:26:48 +0000
From: Jamie Heilman
To: Karol Herbst, Ben Skeggs, LKML, nouveau
Subject: Re: [Nouveau] nouveau regression post v5.8, still present in v5.10
Message-ID:
Mail-Followup-To: Karol Herbst, Ben Skeggs, LKML, nouveau
References:
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Jamie Heilman wrote:
> Jamie Heilman wrote:
> > Karol Herbst wrote:
> > > fyi, there is a patch which solves a maybe related issue on your GPU,
> > > mind giving it a try before we dig further?
> > > https://gitlab.freedesktop.org/drm/nouveau/-/issues/14#note_767791
> >
> > So, I tried that.  Turns out, I can still trigger a problem.  Is it
> > the same problem?  Maybe?  I also tried applying the patch from
> >
> > ca386aa7155a ("drm/nouveau/kms/nv50-gp1xx: add WAR for EVO push buffer HW bug")
> > to 5.8.0-rc6-01516-g0a96099691c8 and very interestingly, it changed
> > the mode of failure to the same thing I saw with 5.10.9 patched with
> > the patch from that bug report.  In both cases I get this in the log:
> >
> > kern.err: nouveau 0000:01:00.0: Xorg[2243]: nv50cal_space: -16
> > kern.err: nouveau 0000:01:00.0: Xorg[2243]: nv50cal_space: -16
> > kern.err: nouveau 0000:01:00.0: Xorg[2243]: nv50cal_space: -16
> > kern.err: nouveau 0000:01:00.0: Xorg[2243]: nv50cal_space: -16
> > ...
> > and so on
> >
> > In one incident my monitor wouldn't even wake up anymore after this.
> >
> >
> > I'm trying to repro it on an unpatched 5.8.0-rc6-01515-gae09163ac27c
> > right now, as running glxgears does seem to help reproduce problems
> > faster which is nice, I'm just not entirely sure it's the same set of
> > problems; hopefully that version is free from issues, but we'll
> > see...
>
> Ugh, well I can crash 5.8.0-rc6-01515-gae09163ac27c and 5.8.18 in
> basically the same way running glxgears and an xset dpms force off
> loop.  So I'm starting to think it's not the same thing, and that
> problem has been latent from before I started having periodic issues.
>
> I should note that my exact testing technique for the above was to run
> 4 copies of glxgears and the xset dpms force off loop at the same
> time.  Really looks more like it triggers a resource starvation issue
> maybe.
> The crash is also worse, particularly if I don't do anything
> about it right away as my workstation eventually falls off the network
> and I'm forced to power cycle it; the crashes I was chasing after
> wouldn't do quite that much violence, normally I could still log in,
> rebuild a kernel, and shut things down cleanly.
>
> More than one bug here I suspect.

OK, I went back and bisected again while patching known issues to get
a better idea of what was causing the problem I've been having, and I'm
confident it was the bug which Bastian Beranek's patch (now in
mainline) addressed.  My original bisection got confused by the EVO
push buffer HW bug which was fixed in ca386aa7155a54.  Once I bisected
with the patch from ca386aa7155a54 applied, my bisection landed on
f844eb485eb05 and Bastian Beranek's patch fixed that right up.

'course I remain mildly concerned I can crash the kernel with little
more than glxgears and xset ... but the original stability problem I
reported I can safely say has been fixed.  If I can figure out the
nature of what I suspect is unrecoverable resource starvation, I'll
start a new thread for that.
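
For anyone who wants to try the stress test described in the quoted
text (4 copies of glxgears plus an xset dpms force off loop), here is a
minimal sketch in Python.  It is not the script that was actually used:
the function names, the sleep interval, and the iteration count are all
assumptions, since the thread does not say how the loop was timed.  It
assumes glxgears and xset are on PATH and that DISPLAY points at the X
session under test.

```python
import subprocess
import time


def repro_commands(gears=4):
    """Return the glxgears command lines and the xset command to loop."""
    gear_cmds = [["glxgears"] for _ in range(gears)]
    dpms_cmd = ["xset", "dpms", "force", "off"]
    return gear_cmds, dpms_cmd


def run_repro(gears=4, interval=10.0, iterations=100):
    """Start the glxgears copies, then repeatedly force DPMS off.

    interval and iterations are guesses; the timing of the original
    loop was not given in the thread.
    """
    gear_cmds, dpms_cmd = repro_commands(gears)
    # Launch the GL load first so it is running while DPMS is toggled.
    procs = [subprocess.Popen(cmd, stdout=subprocess.DEVNULL)
             for cmd in gear_cmds]
    try:
        for _ in range(iterations):
            # Ask the X server to blank the display immediately.
            subprocess.run(dpms_cmd, check=False)
            time.sleep(interval)
    finally:
        # Clean up the glxgears processes even if interrupted.
        for proc in procs:
            proc.terminate()
        for proc in procs:
            proc.wait()
```

Calling run_repro() from inside the X session starts the load; watch the
kernel log for nv50cal_space errors while it runs, and expect that a
hard hang may need a power cycle.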

-- 
Jamie Heilman                     http://audible.transient.net/~jamie/