From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-3.8 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6AD8FC433DB for ; Mon, 22 Mar 2021 09:36:49 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id E4C8B6186A for ; Mon, 22 Mar 2021 09:36:48 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org E4C8B6186A Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=bugzilla.kernel.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=dri-devel-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 03D6689A88; Mon, 22 Mar 2021 09:36:48 +0000 (UTC) Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by gabe.freedesktop.org (Postfix) with ESMTPS id CDB2B89A88 for ; Mon, 22 Mar 2021 09:36:46 +0000 (UTC) Received: by mail.kernel.org (Postfix) with ESMTPS id 90DDE6192A for ; Mon, 22 Mar 2021 09:36:45 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1616405805; bh=HydhpkgmhLllmxzVmuAshmlFLOh0VumgOTDHzk9DKNU=; h=From:To:Subject:Date:In-Reply-To:References:From; b=f+q7TUuqByUCGHpevpDn+LEOtWR20V/ZQqxV4trrDnLf7ctxKUyYMir4iUhkKzBFX HMVqQrfIfAc9/UdA1zAA1t+O2xU69q13e05YzfnrBxhW/+GmxvxPc1Xqxztoddizly vB4AYl57O8zI/Dmtvceh4sydMVWSjKYxr1gTX/w09KoHUcIWToTHqXogy8OKNy9sx+ 3EaajK0Ysl67v/LVvgraJg2zk5rkyT5urkMC/2UlpWJWsmqINofSWy4vpQKss/yA+d 
6eDctNPuKPOm0itPaxvWlE+1/6AY3Xg63RJYRtpwl/XaegdB5cexVN3AjscslDvaTJ Wdz7E3aPPssJg== From: bugzilla-daemon@bugzilla.kernel.org To: dri-devel@lists.freedesktop.org Subject: [Bug 206475] amdgpu under load drop signal to monitor until hard reset Date: Mon, 22 Mar 2021 09:36:45 +0000 X-Bugzilla-Reason: None X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: AssignedTo drivers_video-dri@kernel-bugs.osdl.org X-Bugzilla-Product: Drivers X-Bugzilla-Component: Video(DRI - non Intel) X-Bugzilla-Version: 2.5 X-Bugzilla-Keywords: X-Bugzilla-Severity: normal X-Bugzilla-Who: rodomar705@protonmail.com X-Bugzilla-Status: RESOLVED X-Bugzilla-Resolution: ANSWERED X-Bugzilla-Priority: P1 X-Bugzilla-Assigned-To: drivers_video-dri@kernel-bugs.osdl.org X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: bug_status resolution Message-ID: In-Reply-To: References: X-Bugzilla-URL: https://bugzilla.kernel.org/ Auto-Submitted: auto-generated MIME-Version: 1.0 X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel"

https://bugzilla.kernel.org/show_bug.cgi?id=206475

Marco (rodomar705@protonmail.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|REOPENED                    |RESOLVED
         Resolution|---                         |ANSWERED

--- Comment #20 from Marco (rodomar705@protonmail.com) ---
I finally found where the problem was and completely fixed it. It was hardware. The heatsink was not making full contact with a section of the MOSFETs that feed power to the core of the card. Under full load they were thermal tripping from overheating and completely stalling the card to avoid damaging themselves.
The problem was that this card wasn't reporting the MOSFET temperatures to software, even if the actual VRM controller was measuring them (or perhaps it was shutting down only when the MOSFETs triggered a bare signal asserting the thermal runaway condition). This was hell to debug and fix, as always with hardware problems, but after a stress test on both Windows and Linux at full clocks, the issue is no longer present. I'll keep my optimized clocks for lower temperatures and less fan noise, but for me the issue wasn't software.

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel