From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [PATCH] drm/ttm: fix error handling in ttm_bo_handle_move_mem()
To: Dan Carpenter
Cc: Thomas Hellström, Huang Rui, David Airlie, Daniel Vetter,
 dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org
References: <03d0b798-d1ab-5b6f-2c27-8140d923d445@gmail.com>
 <20210616083758.GC1901@kadam>
 <520a9d1f-8841-8d5e-595d-23783de8333d@gmail.com>
 <20210616093604.GD1901@kadam>
From: Christian König
Message-ID: <7354cd94-06bf-ec36-4539-c3570c1775ae@gmail.com>
Date: Wed, 16 Jun 2021 13:00:38 +0200
In-Reply-To: <20210616093604.GD1901@kadam>
X-Mailing-List: linux-kernel@vger.kernel.org

On 16.06.21 at 11:36, Dan Carpenter wrote:
> On Wed, Jun 16, 2021 at 10:47:14AM +0200, Christian König wrote:
>>
>> On 16.06.21 at 10:37, Dan Carpenter wrote:
>>> On Wed, Jun 16, 2021 at 08:46:33AM +0200, Christian König wrote:
>>>> Sending the first message didn't work, so let's try again.
>>>>
>>>> On 16.06.21 at 08:30, Dan Carpenter wrote:
>>>>> There are three bugs here:
>>>>> 1) We need to call unpopulate() if ttm_tt_populate() succeeds.
>>>>> 2) The "new_man = ttm_manager_type(bdev, bo->mem.mem_type);" assignment
>>>>>    was wrong and it was really assigning "new_mem = old_mem;". There
>>>>>    is no need for this assignment anyway as we already have the value
>>>>>    for "new_mem".
>>>>> 3) The (!new_man->use_tt) condition is reversed.
>>>>>
>>>>> Fixes: ba4e7d973dd0 ("drm: Add the TTM GPU memory manager subsystem.")
>>>>> Signed-off-by: Dan Carpenter
>>>>> ---
>>>>> This is from reading the code and I can't swear that I have understood
>>>>> it correctly. My nouveau driver is currently unusable and this patch
>>>>> has not helped. But hopefully if I fix enough bugs eventually it will
>>>>> start to work.
>>>> Well NAK, the code previously looked quite good and you are breaking it now.
>>>>
>>>> What's the problem with nouveau?
>>>>
>>> The new Firefox seems to exercise nouveau more than the old one so
>>> when I start 10 firefox windows it just hangs the graphics.
>>>
>>> I've added debug code and it seems like the problem is that
>>> nv50_mem_new() is failing.
>> Sounds like it is running out of memory to me.
>>
>> Do you have a dmesg?
>>
> At first there was a very straightforward use-after-free bug which I
> fixed.
> https://lore.kernel.org/nouveau/YMinJwpIei9n1Pn1@mwanda/T/#u
>
> But now that the use-after-free is gone, the only thing in dmesg is:
> "[TTM] Buffer eviction failed". And I have some firmware missing.
>
> [ 205.489763] rfkill: input handler disabled
> [ 205.678292] nouveau 0000:01:00.0: Direct firmware load for nouveau/nva8_fuc084 failed with error -2
> [ 205.678300] nouveau 0000:01:00.0: Direct firmware load for nouveau/nva8_fuc084d failed with error -2
> [ 205.678302] nouveau 0000:01:00.0: msvld: unable to load firmware data
> [ 205.678304] nouveau 0000:01:00.0: msvld: init failed, -19
> [ 296.150632] [TTM] Buffer eviction failed
> [ 417.084265] [TTM] Buffer eviction failed
> [ 447.295961] [TTM] Buffer eviction failed
> [ 510.800231] [TTM] Buffer eviction failed
> [ 556.101384] [TTM] Buffer eviction failed
> [ 616.495790] [TTM] Buffer eviction failed
> [ 692.014007] [TTM] Buffer eviction failed
>
> The eviction failed message only shows up a minute after the hang so it
> seems more like a symptom than a root cause.

Yeah, look at the timing.
What happens is that the buffer eviction timed out because the hardware
is locked up.

No idea what that could be. It might not even be kernel related at all.

Regards,
Christian.

>
> regards,
> dan carpenter
>
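
To make the timing argument concrete, here is a minimal Python sketch that reads dmesg-style lines like the ones quoted in this thread and reports the gaps between successive "[TTM] Buffer eviction failed" messages. It is not from the thread: the helper names `parse_dmesg` and `eviction_gaps` are hypothetical, and the sample uses only a few of the quoted lines.

```python
import re

# Matches the "[  205.489763] message" shape of the dmesg lines quoted above.
DMESG_LINE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def parse_dmesg(text):
    """Return (timestamp_seconds, message) pairs for dmesg-looking lines."""
    events = []
    for line in text.splitlines():
        # Drop mail quoting ("> ") and trailing whitespace before matching.
        match = DMESG_LINE.match(line.lstrip("> ").rstrip())
        if match:
            events.append((float(match.group(1)), match.group(2)))
    return events


def eviction_gaps(events, needle="Buffer eviction failed"):
    """Seconds between consecutive messages containing `needle`."""
    times = [t for t, msg in events if needle in msg]
    return [round(later - earlier, 3) for earlier, later in zip(times, times[1:])]


# A few lines copied from the log quoted in this thread.
LOG = """\
> [ 205.678304] nouveau 0000:01:00.0: msvld: init failed, -19
> [ 296.150632] [TTM] Buffer eviction failed
> [ 417.084265] [TTM] Buffer eviction failed
> [ 447.295961] [TTM] Buffer eviction failed
"""

if __name__ == "__main__":
    for gap in eviction_gaps(parse_dmesg(LOG)):
        print(f"{gap:.3f} s since previous eviction failure")
```

Gaps of tens of seconds or more between failures would fit an eviction that times out while waiting on locked-up hardware, but the sketch only measures the spacing and does not identify the cause.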