From: "Christian König" <ckoenig.leichtzumerken@gmail.com>
To: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>,
	Huang Rui <ray.huang@amd.com>, David Airlie <airlied@linux.ie>,
	Daniel Vetter <daniel@ffwll.ch>,
	dri-devel@lists.freedesktop.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] drm/ttm: fix error handling in ttm_bo_handle_move_mem()
Date: Wed, 16 Jun 2021 13:00:38 +0200
Message-ID: <7354cd94-06bf-ec36-4539-c3570c1775ae@gmail.com>
In-Reply-To: <20210616093604.GD1901@kadam>



On 16.06.21 at 11:36, Dan Carpenter wrote:
> On Wed, Jun 16, 2021 at 10:47:14AM +0200, Christian König wrote:
>>
>> On 16.06.21 at 10:37, Dan Carpenter wrote:
>>> On Wed, Jun 16, 2021 at 08:46:33AM +0200, Christian König wrote:
>>>> Sending the first message didn't work, so let's try again.
>>>>
>>>> On 16.06.21 at 08:30, Dan Carpenter wrote:
>>>>> There are three bugs here:
>>>>> 1) We need to call unpopulate() if ttm_tt_populate() succeeds.
>>>>> 2) The "new_man = ttm_manager_type(bdev, bo->mem.mem_type);" assignment
>>>>>       was wrong and it was really assigning "new_man = old_man;".  There
>>>>>       is no need for this assignment anyway as we already have the value
>>>>>       for "new_man".
>>>>> 3) The (!new_man->use_tt) condition is reversed.
>>>>>
>>>>> Fixes: ba4e7d973dd0 ("drm: Add the TTM GPU memory manager subsystem.")
>>>>> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
>>>>> ---
>>>>> This is from reading the code and I can't swear that I have understood
>>>>> it correctly.  My nouveau driver is currently unusable and this patch
>>>>> has not helped.  But hopefully if I fix enough bugs eventually it will
>>>>> start to work.
>>>> Well, NAK. The code looked fine before and you are breaking it now.
>>>>
>>>> What's the problem with nouveau?
>>>>
>>> The new Firefox seems to exercise nouveau more than the old one so
>>> when I start 10 firefox windows it just hangs the graphics.
>>>
>>> I've added debug code and it seems like the problem is that
>>> nv50_mem_new() is failing.
>> Sounds like it is running out of memory to me.
>>
>> Do you have a dmesg?
>>
> At first there was a very straightforward use-after-free bug which I
> fixed.
> https://lore.kernel.org/nouveau/YMinJwpIei9n1Pn1@mwanda/T/#u
>
> But now that the use-after-free is gone, the only thing in dmesg is:
> "[TTM] Buffer eviction failed".  And I have some firmware missing.
>
> [  205.489763] rfkill: input handler disabled
> [  205.678292] nouveau 0000:01:00.0: Direct firmware load for nouveau/nva8_fuc084 failed with error -2
> [  205.678300] nouveau 0000:01:00.0: Direct firmware load for nouveau/nva8_fuc084d failed with error -2
> [  205.678302] nouveau 0000:01:00.0: msvld: unable to load firmware data
> [  205.678304] nouveau 0000:01:00.0: msvld: init failed, -19
> [  296.150632] [TTM] Buffer eviction failed
> [  417.084265] [TTM] Buffer eviction failed
> [  447.295961] [TTM] Buffer eviction failed
> [  510.800231] [TTM] Buffer eviction failed
> [  556.101384] [TTM] Buffer eviction failed
> [  616.495790] [TTM] Buffer eviction failed
> [  692.014007] [TTM] Buffer eviction failed
>
> The eviction failed message only shows up a minute after the hang so it
> seems more like a symptom than a root cause.

Yeah, look at the timing. What happens is that the buffer eviction timed 
out because the hardware is locked up.
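
To make the timing point concrete, here is a minimal sketch (not part of
the original exchange) that parses the dmesg excerpt quoted above and
lists the gaps between successive "Buffer eviction failed" messages. The
helper names parse() and eviction_gaps() are hypothetical, chosen only
for illustration.

```python
import re

# dmesg excerpt copied verbatim from the quoted message above
DMESG = """\
[  205.489763] rfkill: input handler disabled
[  205.678292] nouveau 0000:01:00.0: Direct firmware load for nouveau/nva8_fuc084 failed with error -2
[  205.678300] nouveau 0000:01:00.0: Direct firmware load for nouveau/nva8_fuc084d failed with error -2
[  205.678302] nouveau 0000:01:00.0: msvld: unable to load firmware data
[  205.678304] nouveau 0000:01:00.0: msvld: init failed, -19
[  296.150632] [TTM] Buffer eviction failed
[  417.084265] [TTM] Buffer eviction failed
[  447.295961] [TTM] Buffer eviction failed
[  510.800231] [TTM] Buffer eviction failed
[  556.101384] [TTM] Buffer eviction failed
[  616.495790] [TTM] Buffer eviction failed
[  692.014007] [TTM] Buffer eviction failed
"""

# "[  205.489763] message" -> (seconds since boot, message)
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def parse(text):
    """Return a list of (timestamp_seconds, message) tuples."""
    entries = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            entries.append((float(match.group(1)), match.group(2)))
    return entries


def eviction_gaps(entries):
    """Seconds between consecutive eviction failures, rounded to 0.1 s."""
    times = [t for t, msg in entries if "Buffer eviction failed" in msg]
    return [round(later - earlier, 1) for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    for gap in eviction_gaps(parse(DMESG)):
        print(f"{gap:6.1f} s since previous eviction failure")
```

The failures arrive tens of seconds to two minutes apart rather than in
a burst, which fits a timeout on locked-up hardware better than an
immediate allocation failure.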

No idea what that could be. It might not even be kernel-related at all.

Regards,
Christian.

>
> regards,
> dan carpenter
>

