* [Bug 202537] New: amdgpu/DC failed to reserve new abo buffer before flip
@ 2019-02-09 14:17 bugzilla-daemon
  2019-02-09 14:27 ` [Bug 202537] " bugzilla-daemon
                   ` (22 more replies)
  0 siblings, 23 replies; 24+ messages in thread
From: bugzilla-daemon @ 2019-02-09 14:17 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=202537

            Bug ID: 202537
           Summary: amdgpu/DC failed to reserve new abo buffer before flip
           Product: Drivers
           Version: 2.5
    Kernel Version: 4.20
          Hardware: All
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: Video(DRI - non Intel)
          Assignee: drivers_video-dri@kernel-bugs.osdl.org
          Reporter: linux@bernd-steinhauser.de
        Regression: No

I've been using amdgpu for a long time on my Kaveri (A10-7800) now and it works
fine.
In the recent kernel versions (I think since 4.15), I've been trying it with
the DC activated and apart from some initial issue with the HDMI connection it
works fine and on 4.19 it's rock-stable.

However, when I tried 4.20 (all versions from 4.20.1 to 4.20.6), I'm
experiencing a regression.
Initially, everything works fine, but at some point especially video-related
things stop working properly.
vaapi seems more affected than vdpau, but at some point they both fail to set up
the hw decoding.
(btw, vdpau for some reason can only run for one video at a time, while vaapi
can do multiple ones, but that's an unrelated issue. It used to be different a
year ago or even earlier on)

I think the problems start when I see a lot of messages like this:
[drm:amdgpu_display_crtc_page_flip_target] *ERROR* failed to reserve new abo
buffer before flip

However, after that I can continue for a bit, possibly with the restriction of
not being able to use vaapi but vdpau.
At some point the system fails, apparently due to a memory leak: the OOM killer
starts killing processes until it ends up killing the window manager and X11.
Before that I get these messages:
[drm:amdgpu_cs_ioctl] *ERROR* amdgpu_vm_validate_pt_bos() failed.
[drm:amdgpu_cs_ioctl] *ERROR* Not enough memory for command submission!
[drm:amdgpu_cs_ioctl] *ERROR* amdgpu_cs_list_validate(validated) failed.
[drm:amdgpu_cs_ioctl] *ERROR* Not enough memory for command submission!
[drm:amdgpu_cs_ioctl] *ERROR* amdgpu_vm_validate_pt_bos() failed.
[drm:amdgpu_cs_ioctl] *ERROR* Not enough memory for command submission!

--- snip --- (lots of oom activity)

and finally:
[TTM] Out of kernel memory
[TTM] Out of kernel memory
[TTM] Out of kernel memory
[TTM] Out of kernel memory
[TTM] Out of kernel memory
[TTM] Out of kernel memory
[TTM] Out of kernel memory
amdgpu 0000:00:01.0: (-12) failed to allocate kernel bo
[drm:amdgpu_uvd_free_handles] *ERROR* Error destroying UVD -12!

The latter one – I think – actually being the solution to the OOM problem, but
I'm certainly not an expert.

CPU/GPU is:
vendor_id       : AuthenticAMD
cpu family      : 21
model           : 48
model name      : AMD A10-7800 Radeon R7, 12 Compute Cores 4C+8G
stepping        : 1
microcode       : 0x6003106
cpu MHz         : 1592.730
cache size      : 2048 KB

Back to 4.19 for now since that runs beautifully.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

* [Bug 202537] amdgpu/DC failed to reserve new abo buffer before flip
From: bugzilla-daemon @ 2019-02-09 14:27 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=202537

--- Comment #1 from Bernd Steinhauser (linux@bernd-steinhauser.de) ---
Created attachment 281077
  --> https://bugzilla.kernel.org/attachment.cgi?id=281077&action=edit
kernel messages

* [Bug 202537] amdgpu/DC failed to reserve new abo buffer before flip
From: bugzilla-daemon @ 2019-02-11 10:47 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=202537

Michel Dänzer (michel@daenzer.net) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |harry.wentland@amd.com,
                   |                            |nicholas.kazlauskas@amd.com

--- Comment #2 from Michel Dänzer (michel@daenzer.net) ---
Yeah, looks like a memory leak.

Please bisect and/or provide kmemleak output, otherwise it might be difficult
to make progress on this issue.

* [Bug 202537] amdgpu/DC failed to reserve new abo buffer before flip
From: bugzilla-daemon @ 2019-02-11 17:09 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=202537

--- Comment #3 from Bernd Steinhauser (linux@bernd-steinhauser.de) ---
Sure, I can try to bisect it, but it would help if I could narrow down the
number of commits, because the problem usually doesn't show up right away, so
each step takes some time to test.
E.g. restricting to commits made in drivers/gpu/drm/amd would result in about 8
steps instead of 13.
It would really help if I could narrow it down even further, e.g. to a subset
of files?
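
The step counts above follow from bisection halving the candidate range on every round: about 13 steps corresponds to roughly 2^13 = 8192 commits, about 8 steps to roughly 2^8 = 256. A minimal sketch of that estimate, assuming a hypothetical helper named `bisect_steps`; the commit counts are illustrative and were not measured from the tree, and git's own estimate may differ slightly:

```shell
#!/bin/sh
# Hypothetical helper: estimate how many bisect rounds are needed to
# isolate one commit among N candidates, i.e. ceil(log2(N)).
bisect_steps() {
    n=$1
    steps=0
    while [ "$n" -gt 1 ]; do
        n=$(( (n + 1) / 2 ))   # each round halves the remaining range
        steps=$(( steps + 1 ))
    done
    echo "$steps"
}

bisect_steps 8192   # prints 13
bisect_steps 256    # prints 8
```

On a real tree, `git rev-list --count v4.19..v4.20 -- drivers/gpu/drm/amd` gives the actual number of candidates, and `git bisect start v4.20 v4.19 -- drivers/gpu/drm/amd` limits the bisect to commits touching that path.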

* [Bug 202537] amdgpu/DC failed to reserve new abo buffer before flip
From: bugzilla-daemon @ 2019-02-11 17:20 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=202537

--- Comment #4 from Michel Dänzer (michel@daenzer.net) ---
drivers/gpu/drm/amd/display/ seems likely. Even if the result from that doesn't
make sense, it should at least narrow down the other commits you need to test.

Or maybe just start with kmemleak?

* [Bug 202537] amdgpu/DC failed to reserve new abo buffer before flip
From: bugzilla-daemon @ 2019-02-11 17:50 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=202537

--- Comment #5 from Bernd Steinhauser (linux@bernd-steinhauser.de) ---
I'll have a look at kmemleak, but I've never worked with it, so it would be
nice to have a backup in case I don't get along with it. ;)

* [Bug 202537] amdgpu/DC failed to reserve new abo buffer before flip
From: bugzilla-daemon @ 2019-02-11 18:59 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=202537

--- Comment #6 from Bernd Steinhauser (linux@bernd-steinhauser.de) ---
Created attachment 281105
  --> https://bugzilla.kernel.org/attachment.cgi?id=281105&action=edit
kmemleak output with 4.20.6

So I let kmemleak do a scan and this is the output.
In case it matters, I let mpv render a video with hwdec vaapi, since that is
how I first noticed that something's going wrong.
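
For reference, a kmemleak scan of this kind is normally driven through debugfs. A minimal sketch, assuming a kernel built with CONFIG_DEBUG_KMEMLEAK=y and debugfs mounted at /sys/kernel/debug; the function name and its optional path argument are hypothetical and are not taken from this report:

```shell
#!/bin/sh
# Sketch only: needs root on a real system. The optional argument
# overrides the kmemleak node, which is handy for dry runs.
kmemleak_scan() {
    node=${1:-/sys/kernel/debug/kmemleak}
    if [ ! -w "$node" ]; then
        echo "kmemleak interface not writable: $node" >&2
        return 1
    fi
    echo scan > "$node"   # trigger an immediate memory scan
    cat "$node"           # print the suspected leaks recorded so far
}

# Typical use as root, after reproducing the problem:
#   kmemleak_scan > "kmemleak-$(uname -r).txt"
```

Writing `clear` to the same node before the test discards older reports, so the output only covers the workload under test.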

* [Bug 202537] amdgpu/DC failed to reserve new abo buffer before flip
From: bugzilla-daemon--- via dri-devel @ 2019-02-12  9:07 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=202537

--- Comment #7 from Michel Dänzer (michel@daenzer.net) ---
kmemleak is claiming there are leaks all over the place. That's weird, since
other people (including myself) aren't seeing any such leaks, also with 4.20
based kernels.

So, I'm afraid this indicates some lower level issue, and you'll have to bisect
without making any assumptions about where the problem lies.

* [Bug 202537] amdgpu/DC failed to reserve new abo buffer before flip
From: bugzilla-daemon--- via dri-devel @ 2019-02-12 15:15 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=202537

Paul Menzel (pmenzel+bugzilla.kernel.org@molgen.mpg.de) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |pmenzel+bugzilla.kernel.org
                   |                            |@molgen.mpg.de

--- Comment #8 from Paul Menzel (pmenzel+bugzilla.kernel.org@molgen.mpg.de) ---
I hit kmemleak problems, and reported those at freedesktop.org [1].
Unfortunately, I have not had access to the system after the report, and won’t
have until the end of next week.

[1]: https://bugs.freedesktop.org/show_bug.cgi?id=109389
     "[Bug 109389] memory leak in `amdgpu_bo_create()`"

* [Bug 202537] amdgpu/DC failed to reserve new abo buffer before flip
From: bugzilla-daemon--- via dri-devel @ 2019-02-12 17:09 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=202537

--- Comment #9 from Bernd Steinhauser (linux@bernd-steinhauser.de) ---
(In reply to Michel Dänzer from comment #7)
> kmemleak is claiming there are leaks all over the place. That's weird, since
> other people (including myself) aren't seeing any such leaks, also with 4.20
> based kernels.
> 
> So, I'm afraid this indicates some lower level issue, and you'll have to
> bisect without making any assumptions about where the problem lies.

Well, after my test above I crosschecked this on 4.19.20 and I definitely don't
see any memleaks there.

So now that I know how to perform a quick test for each kernel version,
bisecting this shouldn't be a big deal anymore and I'll try to do that later
on.
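
One possible shape for such a quick per-kernel test is to save the kmemleak report for each kernel and compare the number of records. A minimal sketch, assuming the reports were saved to files; the helper name `count_leaks` and the file names in the usage comment are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: count the records in a saved kmemleak report.
# Every record in the kmemleak output starts with "unreferenced object".
count_leaks() {
    # grep -c exits non-zero when the count is 0, so mask that status.
    grep -c '^unreferenced object' "$1" || true
}

# Example comparison of two saved reports:
#   count_leaks kmemleak-4.19.20.txt
#   count_leaks kmemleak-4.20.6.txt
```

A raw count is only a first indicator; the backtraces in each record still need to be compared by hand to see whether the same code path is leaking.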

* [Bug 202537] amdgpu/DC failed to reserve new abo buffer before flip
From: bugzilla-daemon--- via dri-devel @ 2019-02-14  7:34 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=202537

--- Comment #10 from Bernd Steinhauser (linux@bernd-steinhauser.de) ---
Unfortunately this turns out to be much harder than expected, because about 1/3
of the revs to test just won't boot at all (like instant kernel panic and not
responding).
This problem was fixed somewhere in the release candidates of 4.19, but I first
need to track down the fix so I can properly continue with the bisect.

Of the rest, another 1/3 of the revs do boot, but only with a black screen.
While I can ssh into the system and check for memleaks, I don't think it's a
proper test, because it seems to me as if amdgpu failed to initialize properly.
So I need to track down the fix for this (again somewhere in the release
candidates of 4.19) as well.

So far, all I can be sure of is that the responsible commit was before
v4.19-rc5 was backmerged into drm-next and drm-misc-next
(7b76d0588477d4b6097a9048b42835a45caf5c48).
But that still leaves quite a few commits to test.
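
Revisions that panic at boot or come up with a black screen cannot give a trustworthy result, and `git bisect skip` exists for that case: it tells git the revision is untestable so that it picks a nearby commit instead. A minimal sketch of recording each outcome, assuming hypothetical outcome names and a hypothetical `record_result` helper:

```shell
#!/bin/sh
# Sketch only. Set DRY_RUN=1 to print the git command instead of running it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Map the outcome of testing one revision to the matching bisect command.
record_result() {
    case "$1" in
        no-boot|black-screen) run git bisect skip ;;  # revision cannot be tested
        leaks)                run git bisect bad  ;;
        clean)                run git bisect good ;;
        *) echo "unknown outcome: $1" >&2; return 1 ;;
    esac
}
```

Skipping many revisions can leave git unable to name a single first bad commit; in that case it reports the remaining range of candidates.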

* [Bug 202537] amdgpu/DC failed to reserve new abo buffer before flip
From: bugzilla-daemon--- via dri-devel @ 2019-02-15 22:37 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=202537

--- Comment #11 from Bernd Steinhauser (linux@bernd-steinhauser.de) ---
Ok, so finally I think I've been able to track this down.
Not 100% sure, because for the final test versions I had to apply a few patches
to fix bugs that otherwise would've prevented tests.
In any case, this was the first version that showed this massive amount of
memleaks, before there were only 6 (of which 4 were related to HID and ACPI).
5d35ed4832dab334e076a24c18a52776c2f24911 is the first bad commit
commit 5d35ed4832dab334e076a24c18a52776c2f24911
Author: Christian König <christian.koenig@amd.com>
Date:   Fri Aug 31 11:08:06 2018 +0200

    drm/amdgpu: fix idle state and bulk_moveable flag

    Add BOs to the idle state again and correctly clear the flag when
    new BOs are added.

    Signed-off-by: Christian König <christian.koenig@amd.com>
    Tested-by: Michel Dänzer <michel.daenzer@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

:040000 040000 28e778e55b368e605e6f2df4efea4be5f324d4ae
371220da179e31b7d2c97741dd984cb896fcb4c4 M      drivers

* [Bug 202537] amdgpu/DC failed to reserve new abo buffer before flip
From: bugzilla-daemon--- via dri-devel @ 2019-02-16 19:17 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=202537

Christian König (christian.koenig@amd.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |christian.koenig@amd.com

--- Comment #12 from Christian König (christian.koenig@amd.com) ---
Well that was a known issue, but it should be fixed with 4.20.

Sorry to note that, but you most likely have a bisect result of a patch causing
a memory leak which is already fixed.

* [Bug 202537] amdgpu/DC failed to reserve new abo buffer before flip
From: bugzilla-daemon--- via dri-devel @ 2019-02-17  8:07 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=202537

--- Comment #13 from Bernd Steinhauser (linux@bernd-steinhauser.de) ---
meh …

well, I just tested 4.20.10 and I do still see a lot of memory leaks there.
kmemleak: 93 new suspected memory leaks (see /sys/kernel/debug/kmemleak)

And it looks an awful lot the same as when I checked the commit id above.
So if you say it was fixed, it might be helpful to point me to the commit id
when it was fixed, so I can check that and use it as a starting/reference
point.

* [Bug 202537] amdgpu/DC failed to reserve new abo buffer before flip
From: bugzilla-daemon--- via dri-devel @ 2019-02-17  8:36 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=202537

--- Comment #14 from Paul Menzel (pmenzel+bugzilla.kernel.org@molgen.mpg.de) ---
Hi. Testing Linux 5.0-rc6 for some minutes, I am not seeing these kmemleak
messages anymore on the MSI Mortar B350M. Bernd, could you test this?

Unfortunately, the commit supposedly fixing the leak introduced by the commit
you bisected does not have a Fixes tag. At least the command below returns
nothing.

    git log --grep "5d35ed4832d" origin/master

* [Bug 202537] amdgpu/DC failed to reserve new abo buffer before flip
From: bugzilla-daemon--- via dri-devel @ 2019-02-17  8:57 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=202537

--- Comment #15 from Bernd Steinhauser (linux@bernd-steinhauser.de) ---
Sure, can test that one.

* [Bug 202537] amdgpu/DC failed to reserve new abo buffer before flip
From: bugzilla-daemon--- via dri-devel @ 2019-02-17 10:00 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=202537

--- Comment #16 from Bernd Steinhauser (linux@bernd-steinhauser.de) ---
Created attachment 281177
  --> https://bugzilla.kernel.org/attachment.cgi?id=281177&action=edit
kmemleak output with 5.0-rc6

Nope sorry, I see the same with kernel 5.0-rc6.

btw, those first two leaks which start with acpi functions, I've seen those in
every version I tested, including the later 4.19 versions.
Don't know if I should open a bug report about that one (or maybe there is
already one).

I'll wait with further testing for the commit id of the fix.

* [Bug 202537] amdgpu/DC failed to reserve new abo buffer before flip
From: bugzilla-daemon @ 2019-02-18  7:45 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=202537

--- Comment #17 from Christian König (christian.koenig@amd.com) ---
We completely disabled the feature added in "5d35ed4832d" for upstreaming later
on.

Can you guys please test amd-staging-drm-next as well and check if the problem
occurs there as well. If not then please bisect what fixed it.
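
Bisecting for the commit that fixed a problem inverts the usual good/bad meaning, which git supports through custom terms. A minimal sketch, assuming a hypothetical `start_fix_bisect` helper; both revisions are placeholders to be supplied by whoever runs it:

```shell
#!/bin/sh
# Sketch only. Set DRY_RUN=1 to print the git commands instead of running them.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Start a bisect that looks for the first FIXED commit.
start_fix_bisect() {
    broken_rev=$1   # a revision that still shows the leaks
    fixed_rev=$2    # a revision that no longer shows them
    run git bisect start --term-old=broken --term-new=fixed
    run git bisect broken "$broken_rev"
    run git bisect fixed "$fixed_rev"
}
```

After each test, `git bisect broken` or `git bisect fixed` records the result. If the broken revision is not an ancestor of the fixed one, git first asks for the merge base to be tested.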

* [Bug 202537] amdgpu/DC failed to reserve new abo buffer before flip
From: bugzilla-daemon @ 2019-02-18  8:08 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=202537

--- Comment #18 from Paul Menzel (pmenzel+bugzilla.kernel.org@molgen.mpg.de) ---
(In reply to Christian König from comment #17)
> We completely disabled the feature added in "5d35ed4832d" for upstreaming
> later on.

Sorry, I do not understand your reply at all. Could you please rephrase? What
commit does that, what you describe?

> Can you guys please test amd-staging-drm-next as well and check if the
> problem occurs there as well. If not then please bisect what fixed it.

Bernd and I seem to have different problems – or I updated user space not
triggering the problematic path anymore or did not do the steps to reproduce it
(although starting GDM should have been enough).

Anyway, why should the fix be bisected? To apply it to stable?

Bernd, if you have time, it’d be great, if you listed the commits here, which
you needed to apply on top to fix the other regressions.

* [Bug 202537] amdgpu/DC failed to reserve new abo buffer before flip
From: bugzilla-daemon @ 2019-02-18  9:02 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=202537

--- Comment #19 from Christian König (christian.koenig@amd.com) ---
(In reply to Paul Menzel from comment #18)
> (In reply to Christian König from comment #17)
> > We completely disabled the feature added in "5d35ed4832d" for upstreaming
> > later on.
> 
> Sorry, I do not understand your reply at all. Could you please rephrase?
> What commit does that, what you describe?

Commit 5d35ed4832d is a bug fix for bulk moves, which is a feature which should
be completely disabled in 4.20. So your bisecting is most likely incorrect.

> > Can you guys please test amd-staging-drm-next as well and check if the
> > problem occurs there as well. If not then please bisect what fixed it.
> 
> Bernd and I seem to have different problems – or I updated user space not
> triggering the problematic path anymore or did not do the steps to reproduce
> it (although starting GDM should have been enough).
> 
> Anyway, why should the fix be bisected? To apply it to stable?

Yes, exactly.

It looks like 4.20 is either using bulk moves (which it shouldn't) or we
have introduced another problem which also caused memory leaks.

* [Bug 202537] amdgpu/DC failed to reserve new abo buffer before flip
From: bugzilla-daemon @ 2019-02-18 22:26 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=202537

--- Comment #20 from Bernd Steinhauser (linux@bernd-steinhauser.de) ---
(In reply to Christian König from comment #17)
> We completely disabled the feature added in "5d35ed4832d" for upstreaming
> later on.
> 
> Can you guys please test amd-staging-drm-next as well and check if the
> problem occurs there as well. If not then please bisect what fixed it.
Would've been nice to point me to the corresponding repo as well.
Don't worry, I've figured it out, but still would've been nice.
In any case, current HEAD of amd-staging-drm-next looks good to me, I can't
reproduce the memleaks with that one.

I'll try to find the fix, but that'll take me 2-3 days.

(In reply to Paul Menzel from comment #18)
> Bernd, if you have time, it’d be great, if you listed the commits here,
> which you needed to apply on top to fix the other regressions.
Most importantly 9d27e39d309c93025ae6aa97236af15bef2a5f1f, which says it's for
Carrizo, but it seems to affect my Kaveri as well, which wouldn't be surprising
since the two are related.
On your Ryzen(?) system, though, this one might not be necessary.
I also applied 03651735fbded39f608163718f816ab9cf14fba7 on top for a wider
range of commits after 972a21f94631642d6714bb2a1983b7b15a77526d, since otherwise
the system would freeze very quickly.
But even with that one applied, the commit mentioned above is very unstable and
I have only about 1 min or so to do my tests.
Still, that was enough time to run the tests at least twice and show that there
is the same flood of memory leaks with pretty much the same function sequences.

(In reply to Christian König from comment #19)
> 
> Commit 5d35ed4832d is a bug fix for bulk moves, which is a feature which
> should be completely disabled in 4.20. So your bisecting is most likely
> incorrect.
> 
Well, as I said, I'm not 100% sure, because I had to apply two patches to be
able to test at all.
But I had repeated my tests with those two versions earlier on and came to the
same result.
b995795bf09b6bb7847a2a9fc8e6b5b4ab0ce20c shows exactly 6 memleaks for me: the 2
ACPI ones I mentioned above and 4 showing HID function sequences, but nothing
with drm or similar.
One commit later (5d35ed4832d), with the same two patches applied, it's a
different story: I get 60 or more memleaks listed, which you have to admit
look awfully similar to what I've posted for 5.0-rc1 above (I'll upload
the log in a minute).
Now that could be pure coincidence, but I would be surprised if it was.

* [Bug 202537] amdgpu/DC failed to reserve new abo buffer before flip
From: bugzilla-daemon @ 2019-02-18 22:28 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=202537

--- Comment #21 from Bernd Steinhauser (linux@bernd-steinhauser.de) ---
Created attachment 281201
  --> https://bugzilla.kernel.org/attachment.cgi?id=281201&action=edit
kmemleak output with 4.19.0-rc1 5d35ed4832da

* [Bug 202537] amdgpu/DC failed to reserve new abo buffer before flip
From: bugzilla-daemon @ 2019-02-19 19:27 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=202537

--- Comment #22 from Paul Menzel (pmenzel+bugzilla.kernel.org@molgen.mpg.de) ---
Ok, being back at the system after some days, I see the kmemleaks are still
present with Linux 5.0-rc6+.

Bernd, what triggers this on your system? What is your test case? Start some
program?

* [Bug 202537] amdgpu/DC failed to reserve new abo buffer before flip
From: bugzilla-daemon @ 2019-02-19 21:00 UTC (permalink / raw)
  To: dri-devel

https://bugzilla.kernel.org/show_bug.cgi?id=202537

--- Comment #23 from Bernd Steinhauser (linux@bernd-steinhauser.de) ---
(In reply to Paul Menzel from comment #22)
> Bernd, what triggers this on your system? What is your test case? Start some
> program?
Basically: start the system, log in, ensure that /sys/kernel/debug/kmemleak is
empty, then initiate the scan and wait for the result.
I found that testing without the login (starting sddm in my case) can be enough
to spot the memleaks, but you can't be sure.
Also, I think that giving the GPU some more work (e.g. playing a video) helps
to spot more memleaks sooner and thus gives a reliable result more quickly, but
it doesn't seem necessary.

In case I don't find memleaks, I still repeat the scan routine a few times, do
something else in the meantime (like preparing the next test version) and then
do the scan once more before rebooting, just to be sure.
So in total – on my rather slow system – every version is tested for about
30 min, although for a bad version about 5 min is enough.
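
The scan routine described above could be sketched as a small shell helper.
This is only a sketch under assumptions: the names count_leaks, scan_once and
KMEMLEAK are made up for illustration, the kernel is assumed to be built with
CONFIG_DEBUG_KMEMLEAK, debugfs is assumed to be mounted at /sys/kernel/debug,
and writing to the file needs root.

```shell
#!/bin/sh
# Hypothetical helper; KMEMLEAK defaults to the debugfs file named above.
KMEMLEAK="${KMEMLEAK:-/sys/kernel/debug/kmemleak}"

# Count the leak records in a kmemleak report. Each record starts with a
# line of the form "unreferenced object 0x... (size N):".
count_leaks() {
    grep -c '^unreferenced object' "$1"
}

# One test round: trigger a scan, then print how many records are listed.
# Old records can be discarded beforehand with: echo clear > "$KMEMLEAK"
scan_once() {
    echo scan > "$KMEMLEAK"
    count_leaks "$KMEMLEAK"
}
```

Calling scan_once a few times over the test period, as described, gives one
count per round; a count that stays at 0 until the final scan before rebooting
would mark the version as good.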

Anyway, back to the original topic. Bisecting went much more smoothly and
quickly this time, and I can already present the result, see below.
I tried to apply the fix on top of 4.20.10, but that doesn't compile, most
likely because it depends on other commits.
So unfortunately I can't cross-check this at the moment.
@Paul: it might be a good idea if you checked this as well, i.e. tested
b61857b5e and its parent.

git bisect start '--term-old' 'unfixed' '--term-new' 'fixed'
# unfixed: [8fe28cb58bcb235034b64cbbb7550a8a43fd88be] Linux 4.20
git bisect unfixed 8fe28cb58bcb235034b64cbbb7550a8a43fd88be
# fixed: [256445aee13f4de36cb47c13a9560b5d74faacd2] drm/amdgpu: remove some old unused dpm helpers
git bisect fixed 256445aee13f4de36cb47c13a9560b5d74faacd2
# unfixed: [e0c38a4d1f196a4b17d2eba36afff8f656a4f1de] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
git bisect unfixed e0c38a4d1f196a4b17d2eba36afff8f656a4f1de
# unfixed: [9ef10340749e1da0c7fde609cedd5360f8484a0b] Merge tag 'xtensa-20181228' of git://github.com/jcmvbkbc/linux-xtensa
git bisect unfixed 9ef10340749e1da0c7fde609cedd5360f8484a0b
# unfixed: [fcf010449ebe1db0cb68b2c6410972a782f2bd14] Merge tag 'kgdb-4.21-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/danielt/linux
git bisect unfixed fcf010449ebe1db0cb68b2c6410972a782f2bd14
# unfixed: [9b286efeb5eb5aaa2712873fc1f928b2f879dbde] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
git bisect unfixed 9b286efeb5eb5aaa2712873fc1f928b2f879dbde
# unfixed: [ac5eed2b41776b05cf03aac761d3bb5e64eea24c] Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect unfixed ac5eed2b41776b05cf03aac761d3bb5e64eea24c
# unfixed: [5dc3fc5a7835f6b98184d2b8df909c5230c37a2c] drm/amd/display: Check if registers are available before accessing
git bisect unfixed 5dc3fc5a7835f6b98184d2b8df909c5230c37a2c
# fixed: [87076c8829465b8ae71225f7e639e0e28ab4b4a2] drm/amd/display: Re-enable CRC capture following modeset
git bisect fixed 87076c8829465b8ae71225f7e639e0e28ab4b4a2
# fixed: [84d3245599f527138c4d4b87deed14a7e85cd81b] drm/amdgpu: Add missing power attribute to APU check
git bisect fixed 84d3245599f527138c4d4b87deed14a7e85cd81b
# unfixed: [ae6d343541bb75958e9535d056adaf4ff6a66d6a] drm/ttm: add lru notify to bo driver v2
git bisect unfixed ae6d343541bb75958e9535d056adaf4ff6a66d6a
# fixed: [5d50fcbda7b0acd301bb1fc3d828df0aa29237b8] drm/ttm: stop always moving BOs on the LRU on page fault
git bisect fixed 5d50fcbda7b0acd301bb1fc3d828df0aa29237b8
# fixed: [d7337ca2640cde21ff178bd78f01d94cd5ea2e08] drm/amd/powerplay: support retrieving and adjusting SOC clock power levels V2
git bisect fixed d7337ca2640cde21ff178bd78f01d94cd5ea2e08
# fixed: [b61857b5e365889d67a6296c413df396032d374d] drm/amdgpu: set bulk_moveable to false when lru changed v2
git bisect fixed b61857b5e365889d67a6296c413df396032d374d
# first fixed commit: [b61857b5e365889d67a6296c413df396032d374d] drm/amdgpu: set bulk_moveable to false when lru changed v2
commit b61857b5e365889d67a6296c413df396032d374d
Author: Chunming Zhou <david1.zhou@amd.com>
Date:   Thu Jan 10 15:49:54 2019 +0800

    drm/amdgpu: set bulk_moveable to false when lru changed v2

    if lru is changed, we cannot do bulk moving.
    v2:
    root bo isn't in bulk moving, skip its change.

    Signed-off-by: Chunming Zhou <david1.zhou@amd.com>
    Reviewed-by: Christian König <christian.koenig@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

:040000 040000 3544338af6c797a518386198369dc4766961d151 392a4c14309bd108b20046609138f7bc2859f3f7 M      drivers
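
To cross-check the result, the first fixed commit and its parent can be tested
in turn. A minimal sketch, assuming the log above was saved to a file
(bisect.log is a hypothetical name); the helper name first_fixed is likewise
made up for illustration, and "fixed" is the custom --term-new used in this
bisect:

```shell
#!/bin/sh
# Print the commit hash from the "# first fixed commit: [<hash>] ..." line
# of a saved bisect log. Only the start of the line is matched, so a
# subject that wraps onto the next line does not matter.
first_fixed() {
    sed -n 's/^# first fixed commit: \[\([0-9a-f]\{40\}\)\].*/\1/p' "$1"
}

# Usage, inside a kernel tree that contains the commit:
#   fix=$(first_fixed bisect.log)
#   git checkout "$fix"     # build, boot, scan: leaks should be gone
#   git checkout "$fix^"    # parent: leaks should show up again
```

If the parent leaks and the commit itself does not, the bisect result is
confirmed on that system as well.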
