From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.0 required=3.0 tests=MAILING_LIST_MULTI, MENTIONS_GIT_HOSTING,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 19959C433DF for ; Thu, 25 Jun 2020 09:58:50 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id DEFB42076E for ; Thu, 25 Jun 2020 09:58:49 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org DEFB42076E Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=bugzilla.kernel.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=dri-devel-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 599BF6EA1E; Thu, 25 Jun 2020 09:58:49 +0000 (UTC) Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by gabe.freedesktop.org (Postfix) with ESMTPS id 7766E6EA1E for ; Thu, 25 Jun 2020 09:58:47 +0000 (UTC) From: bugzilla-daemon@bugzilla.kernel.org To: dri-devel@lists.freedesktop.org Subject: [Bug 206475] amdgpu under load drop signal to monitor until hard reset Date: Thu, 25 Jun 2020 09:58:46 +0000 X-Bugzilla-Reason: None X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: AssignedTo drivers_video-dri@kernel-bugs.osdl.org X-Bugzilla-Product: Drivers X-Bugzilla-Component: Video(DRI - non Intel) X-Bugzilla-Version: 2.5 X-Bugzilla-Keywords: X-Bugzilla-Severity: normal X-Bugzilla-Who: andrewammerlaan@riseup.net X-Bugzilla-Status: REOPENED X-Bugzilla-Resolution: X-Bugzilla-Priority: P1 X-Bugzilla-Assigned-To: drivers_video-dri@kernel-bugs.osdl.org X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: X-Bugzilla-URL: https://bugzilla.kernel.org/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" https://bugzilla.kernel.org/show_bug.cgi?id=206475 --- Comment #17 from Andrew Ammerlaan (andrewammerlaan@riseup.net) --- (In reply to Alex Deucher from comment #16) > When the GPU is in reset all reads to the MMIO BAR return 1s so you are just > getting all ones until the reset succeeds. 511 is just all ones. This > patch will fix that issue: > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/ > ?id=9271dfd9e0f79e2969dcbe28568bce0fdc4f8f73 Well there goes my hypotheses of the broken thermal sensor xD. I did discover yesterday that the fan of my GPU spins relatively slow under high load. When the GPU reached ~80 degrees Celsius, the fan didn't even spin at half the maximum RPM! I used the pwmconfig script and the fancontrol service from lm_sensors to force the fan to go to the maximum RPM just before reaching 80 degrees Celsius. It's very noisy, *but* the GPU stays well below 70 degrees Celsius now, even under heavy load. As this issue seems to occur only when the GPU is hotter then ~75 degrees Celsius, I'm hoping that this will help in preventing the problem. I'm still confused as to why this is at all necessary, the critical temperature is 91, so why do I encounter these issues at ~80? -- You are receiving this mail because: You are watching the assignee of the bug. _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel