* Questions about KMS flip
@ 2021-11-04 12:51 Christian König
2021-11-04 16:44 ` Harry Wentland
0 siblings, 1 reply; 28+ messages in thread
From: Christian König @ 2021-11-04 12:51 UTC (permalink / raw)
To: Michel Dänzer, Wentland, Harry, Daniel Vetter, dri-devel
Hi guys,
adding the usual suspects which might know that of hand: When we do a
KMS page flip, who keeps the reference to the BO while it is scanned out?
We are running into warning backtraces from TTM which look more than odd.
Thanks,
Christian.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Questions about KMS flip
2021-11-04 12:51 Questions about KMS flip Christian König
@ 2021-11-04 16:44 ` Harry Wentland
2021-11-05 12:20 ` Ville Syrjälä
2021-11-05 18:13 ` Daniel Vetter
0 siblings, 2 replies; 28+ messages in thread
From: Harry Wentland @ 2021-11-04 16:44 UTC (permalink / raw)
To: Christian König, Michel Dänzer, Daniel Vetter,
dri-devel, Kazlauskas, Nicholas
+Nick
It looks to be the old drm_plane_state->fb holds that reference. See dm_plane_helper_cleanup_fb() in amdgpu_dm.c.
Harry
On 2021-11-04 08:51, Christian König wrote:
> Hi guys,
>
> adding the usual suspects which might know that of hand: When we do a KMS page flip, who keeps the reference to the BO while it is scanned out?
>
> We are running into warning backtraces from TTM which look more than odd.
>
> Thanks,
> Christian.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Questions about KMS flip
2021-11-04 16:44 ` Harry Wentland
@ 2021-11-05 12:20 ` Ville Syrjälä
2021-11-05 18:13 ` Daniel Vetter
1 sibling, 0 replies; 28+ messages in thread
From: Ville Syrjälä @ 2021-11-05 12:20 UTC (permalink / raw)
To: Harry Wentland
Cc: dri-devel, Michel Dänzer, Christian König, Kazlauskas,
Nicholas
On Thu, Nov 04, 2021 at 12:44:34PM -0400, Harry Wentland wrote:
> +Nick
>
> It looks to be the old drm_plane_state->fb holds that reference. See dm_plane_helper_cleanup_fb() in amdgpu_dm.c.
BTW looks like you have a possible leak during fb init;
amdgpu_display_framebuffer_init() grabs the refs to the BOs,
but drm_framebuffer_init() might still fail (at least
theoretically) which will then leak those BO refs.
>
> Harry
>
> On 2021-11-04 08:51, Christian König wrote:
> > Hi guys,
> >
> > adding the usual suspects which might know that of hand: When we do a KMS page flip, who keeps the reference to the BO while it is scanned out?
> >
> > We are running into warning backtraces from TTM which look more than odd.
> >
> > Thanks,
> > Christian.
--
Ville Syrjälä
Intel
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Questions about KMS flip
2021-11-04 16:44 ` Harry Wentland
2021-11-05 12:20 ` Ville Syrjälä
@ 2021-11-05 18:13 ` Daniel Vetter
2021-11-08 7:44 ` Christian König
1 sibling, 1 reply; 28+ messages in thread
From: Daniel Vetter @ 2021-11-05 18:13 UTC (permalink / raw)
To: Harry Wentland
Cc: dri-devel, Michel Dänzer, Christian König, Kazlauskas,
Nicholas
On Thu, Nov 04, 2021 at 12:44:34PM -0400, Harry Wentland wrote:
> +Nick
>
> It looks to be the old drm_plane_state->fb holds that reference. See dm_plane_helper_cleanup_fb() in amdgpu_dm.c.
Yup plane state holds reference for its entire existing (well this holds
in general as design principle for atomic state structs, just makes life
easier). And the plane state is guaranteed to exist from when we first pin
(prepare_fb plane hook) to when it's getting unpinned (cleanup_fb plane
hook).
Out of curiosity, what's blowing up?
-Daniel
>
> Harry
>
> On 2021-11-04 08:51, Christian König wrote:
> > Hi guys,
> >
> > adding the usual suspects which might know that of hand: When we do a KMS page flip, who keeps the reference to the BO while it is scanned out?
> >
> > We are running into warning backtraces from TTM which look more than odd.
> >
> > Thanks,
> > Christian.
>
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Questions about KMS flip
2021-11-05 18:13 ` Daniel Vetter
@ 2021-11-08 7:44 ` Christian König
2021-11-08 14:59 ` Daniel Vetter
0 siblings, 1 reply; 28+ messages in thread
From: Christian König @ 2021-11-08 7:44 UTC (permalink / raw)
To: Daniel Vetter, Harry Wentland
Cc: Michel Dänzer, Kazlauskas, Nicholas, dri-devel
Am 05.11.21 um 19:13 schrieb Daniel Vetter:
> On Thu, Nov 04, 2021 at 12:44:34PM -0400, Harry Wentland wrote:
>> +Nick
>>
>> It looks to be the old drm_plane_state->fb holds that reference. See dm_plane_helper_cleanup_fb() in amdgpu_dm.c.
> Yup plane state holds reference for its entire existing (well this holds
> in general as design principle for atomic state structs, just makes life
> easier). And the plane state is guaranteed to exist from when we first pin
> (prepare_fb plane hook) to when it's getting unpinned (cleanup_fb plane
> hook).
>
> Out of curiosity, what's blowing up?
The TTM pin count warning. What happens is that we try to free up a BO
while it is still being pinned.
My best guess is that some DMA-buf client is doing something wrong, but
it could of course also be that the BO was pinned for scanout.
Christian.
> -Daniel
>
>> Harry
>>
>> On 2021-11-04 08:51, Christian König wrote:
>>> Hi guys,
>>>
>>> adding the usual suspects which might know that of hand: When we do a KMS page flip, who keeps the reference to the BO while it is scanned out?
>>>
>>> We are running into warning backtraces from TTM which look more than odd.
>>>
>>> Thanks,
>>> Christian.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Questions about KMS flip
2021-11-08 7:44 ` Christian König
@ 2021-11-08 14:59 ` Daniel Vetter
2021-11-10 10:18 ` Christian König
0 siblings, 1 reply; 28+ messages in thread
From: Daniel Vetter @ 2021-11-08 14:59 UTC (permalink / raw)
To: Christian König; +Cc: Michel Dänzer, dri-devel, Kazlauskas, Nicholas
On Mon, Nov 08, 2021 at 08:44:24AM +0100, Christian König wrote:
> Am 05.11.21 um 19:13 schrieb Daniel Vetter:
> > On Thu, Nov 04, 2021 at 12:44:34PM -0400, Harry Wentland wrote:
> > > +Nick
> > >
> > > It looks to be the old drm_plane_state->fb holds that reference. See dm_plane_helper_cleanup_fb() in amdgpu_dm.c.
> > Yup plane state holds reference for its entire existing (well this holds
> > in general as design principle for atomic state structs, just makes life
> > easier). And the plane state is guaranteed to exist from when we first pin
> > (prepare_fb plane hook) to when it's getting unpinned (cleanup_fb plane
> > hook).
> >
> > Out of curiosity, what's blowing up?
>
> The TTM pin count warning. What happens is that we try to free up a BO while
> it is still being pinned.
>
> My best guess is that some DMA-buf client is doing something wrong, but it
> could of course also be that the BO was pinned for scanout.
We check in dma_buf_release whether there's anything left over, so I think
the dma-buf scenario is rather unlikely.
I guess worst case we could add a cookie struct to dma_buf_pin that you
need to pass to dma_buf_unpin, and wherein we can capture a backtrace. Or
maybe implement that in ttm even. Otherwise I don't have good ideas.
-Daniel
>
> Christian.
>
> > -Daniel
> >
> > > Harry
> > >
> > > On 2021-11-04 08:51, Christian König wrote:
> > > > Hi guys,
> > > >
> > > > adding the usual suspects which might know that of hand: When we do a KMS page flip, who keeps the reference to the BO while it is scanned out?
> > > >
> > > > We are running into warning backtraces from TTM which look more than odd.
> > > >
> > > > Thanks,
> > > > Christian.
>
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Questions about KMS flip
2021-11-08 14:59 ` Daniel Vetter
@ 2021-11-10 10:18 ` Christian König
2021-11-10 13:26 ` Daniel Vetter
0 siblings, 1 reply; 28+ messages in thread
From: Christian König @ 2021-11-10 10:18 UTC (permalink / raw)
To: Daniel Vetter; +Cc: Michel Dänzer, Kazlauskas, Nicholas, dri-devel
Am 08.11.21 um 15:59 schrieb Daniel Vetter:
> On Mon, Nov 08, 2021 at 08:44:24AM +0100, Christian König wrote:
>> Am 05.11.21 um 19:13 schrieb Daniel Vetter:
>>> On Thu, Nov 04, 2021 at 12:44:34PM -0400, Harry Wentland wrote:
>>>> +Nick
>>>>
>>>> It looks to be the old drm_plane_state->fb holds that reference. See dm_plane_helper_cleanup_fb() in amdgpu_dm.c.
>>> Yup plane state holds reference for its entire existing (well this holds
>>> in general as design principle for atomic state structs, just makes life
>>> easier). And the plane state is guaranteed to exist from when we first pin
>>> (prepare_fb plane hook) to when it's getting unpinned (cleanup_fb plane
>>> hook).
>>>
>>> Out of curiosity, what's blowing up?
>> The TTM pin count warning. What happens is that we try to free up a BO while
>> it is still being pinned.
>>
>> My best guess is that some DMA-buf client is doing something wrong, but it
>> could of course also be that the BO was pinned for scanout.
> We check in dma_buf_release whether there's anything left over, so I think
> the dma-buf scenario is rather unlikely.
That was unfortunately only added quite recently and is currently
backported to older kernels.
> I guess worst case we could add a cookie struct to dma_buf_pin that you
> need to pass to dma_buf_unpin, and wherein we can capture a backtrace. Or
> maybe implement that in ttm even. Otherwise I don't have good ideas.
I was thinking about something similar for ttm_bo_pin().
BTW: How would I do that? I know that dump_stack() prints the current
stack trace into the system log, but how would I get this as string?
Thanks,
Christian.
> -Daniel
>
>> Christian.
>>
>>> -Daniel
>>>
>>>> Harry
>>>>
>>>> On 2021-11-04 08:51, Christian König wrote:
>>>>> Hi guys,
>>>>>
>>>>> adding the usual suspects which might know that of hand: When we do a KMS page flip, who keeps the reference to the BO while it is scanned out?
>>>>>
>>>>> We are running into warning backtraces from TTM which look more than odd.
>>>>>
>>>>> Thanks,
>>>>> Christian.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Questions about KMS flip
2021-11-10 10:18 ` Christian König
@ 2021-11-10 13:26 ` Daniel Vetter
2021-11-12 12:47 ` Christian König
0 siblings, 1 reply; 28+ messages in thread
From: Daniel Vetter @ 2021-11-10 13:26 UTC (permalink / raw)
To: Christian König; +Cc: Michel Dänzer, dri-devel, Kazlauskas, Nicholas
On Wed, Nov 10, 2021 at 11:18:02AM +0100, Christian König wrote:
> Am 08.11.21 um 15:59 schrieb Daniel Vetter:
> > On Mon, Nov 08, 2021 at 08:44:24AM +0100, Christian König wrote:
> > > Am 05.11.21 um 19:13 schrieb Daniel Vetter:
> > > > On Thu, Nov 04, 2021 at 12:44:34PM -0400, Harry Wentland wrote:
> > > > > +Nick
> > > > >
> > > > > It looks to be the old drm_plane_state->fb holds that reference. See dm_plane_helper_cleanup_fb() in amdgpu_dm.c.
> > > > Yup plane state holds reference for its entire existing (well this holds
> > > > in general as design principle for atomic state structs, just makes life
> > > > easier). And the plane state is guaranteed to exist from when we first pin
> > > > (prepare_fb plane hook) to when it's getting unpinned (cleanup_fb plane
> > > > hook).
> > > >
> > > > Out of curiosity, what's blowing up?
> > > The TTM pin count warning. What happens is that we try to free up a BO while
> > > it is still being pinned.
> > >
> > > My best guess is that some DMA-buf client is doing something wrong, but it
> > > could of course also be that the BO was pinned for scanout.
> > We check in dma_buf_release whether there's anything left over, so I think
> > the dma-buf scenario is rather unlikely.
>
> That was unfortunately only added quite recently and is currently backported
> to older kernels.
>
> > I guess worst case we could add a cookie struct to dma_buf_pin that you
> > need to pass to dma_buf_unpin, and wherein we can capture a backtrace. Or
> > maybe implement that in ttm even. Otherwise I don't have good ideas.
>
> I was thinking about something similar for ttm_bo_pin().
>
> BTW: How would I do that? I know that dump_stack() prints the current stack
> trace into the system log, but how would I get this as string?
stack depot, not stuff it into a string. stack depot is a lot more
efficient and capturing backtraces, since it de-dupes and hashes and all
you need is a pointer iirc. linux/stackdepot.h for interface, I think we
recently gained a few more users of it in drm. It's _really_ nifty.
-Daniel
>
> Thanks,
> Christian.
>
> > -Daniel
> >
> > > Christian.
> > >
> > > > -Daniel
> > > >
> > > > > Harry
> > > > >
> > > > > On 2021-11-04 08:51, Christian König wrote:
> > > > > > Hi guys,
> > > > > >
> > > > > > adding the usual suspects which might know that of hand: When we do a KMS page flip, who keeps the reference to the BO while it is scanned out?
> > > > > >
> > > > > > We are running into warning backtraces from TTM which look more than odd.
> > > > > >
> > > > > > Thanks,
> > > > > > Christian.
>
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Questions about KMS flip
2021-11-10 13:26 ` Daniel Vetter
@ 2021-11-12 12:47 ` Christian König
0 siblings, 0 replies; 28+ messages in thread
From: Christian König @ 2021-11-12 12:47 UTC (permalink / raw)
To: Harry Wentland, Kazlauskas, Nicholas, Yu, Lang
Cc: Michel Dänzer, amd-gfx list, dri-devel
Hi guys,
Am 10.11.21 um 14:26 schrieb Daniel Vetter:
> [SNIP]
> stack depot, not stuff it into a string. stack depot is a lot more
> efficient and capturing backtraces, since it de-dupes and hashes and all
> you need is a pointer iirc. linux/stackdepot.h for interface, I think we
> recently gained a few more users of it in drm. It's _really_ nifty.
> -Daniel
thanks for the pointer Daniel, that framework indeed looks like it can
become handy.
Anyway this unfortunately turned out to be work for Harray and Nicholas.
In detail it's about this bug report here:
https://bugzilla.kernel.org/show_bug.cgi?id=214621
Lang was able to reproduce the issue and narrow it down to the pin in
amdgpu_display_crtc_page_flip_target().
In other words we somehow have an unbalanced pinning of the scanout
buffer in DC.
What should we do about that?
Regards,
Christian.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Questions about KMS flip
@ 2021-11-12 12:47 ` Christian König
0 siblings, 0 replies; 28+ messages in thread
From: Christian König @ 2021-11-12 12:47 UTC (permalink / raw)
To: Harry Wentland, Kazlauskas, Nicholas, Yu, Lang
Cc: Michel Dänzer, amd-gfx list, Daniel Vetter, dri-devel
Hi guys,
Am 10.11.21 um 14:26 schrieb Daniel Vetter:
> [SNIP]
> stack depot, not stuff it into a string. stack depot is a lot more
> efficient and capturing backtraces, since it de-dupes and hashes and all
> you need is a pointer iirc. linux/stackdepot.h for interface, I think we
> recently gained a few more users of it in drm. It's _really_ nifty.
> -Daniel
thanks for the pointer Daniel, that framework indeed looks like it can
become handy.
Anyway this unfortunately turned out to be work for Harray and Nicholas.
In detail it's about this bug report here:
https://bugzilla.kernel.org/show_bug.cgi?id=214621
Lang was able to reproduce the issue and narrow it down to the pin in
amdgpu_display_crtc_page_flip_target().
In other words we somehow have an unbalanced pinning of the scanout
buffer in DC.
What should we do about that?
Regards,
Christian.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Questions about KMS flip
2021-11-12 12:47 ` Christian König
(?)
@ 2021-11-12 14:29 ` Michel Dänzer
2021-11-12 14:30 ` Michel Dänzer
-1 siblings, 1 reply; 28+ messages in thread
From: Michel Dänzer @ 2021-11-12 14:29 UTC (permalink / raw)
To: Christian König, Harry Wentland, Kazlauskas, Nicholas, Yu, Lang
Cc: dri-devel, amd-gfx list
On 2021-11-12 13:47, Christian König wrote:
>
> Anyway this unfortunately turned out to be work for Harray and Nicholas. In detail it's about this bug report here: https://bugzilla.kernel.org/show_bug.cgi?id=214621
>
> Lang was able to reproduce the issue and narrow it down to the pin in amdgpu_display_crtc_page_flip_target().
>
> In other words we somehow have an unbalanced pinning of the scanout buffer in DC.
DC doesn't use amdgpu_display_crtc_page_flip_target AFAICT. The corresponding pin with DC would be in dm_plane_helper_prepare_fb, paired with the unpin in
dm_plane_helper_cleanup_fb.
With non-DC, the pin in amdgpu_display_crtc_page_flip_target is paired with the unpin in dm_plane_helper_cleanup_fb & dce_v*_crtc_disable. One thing I notice is that the pin is guarded by if (!adev->enable_virtual_display), but the unpins seem unconditional. So could this be about virtual display, and the problem is actually trying to unpin a BO that was never pinned?
--
Earthling Michel Dänzer | https://redhat.com
Libre software enthusiast | Mesa and Xwayland developer
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Questions about KMS flip
2021-11-12 14:29 ` Michel Dänzer
@ 2021-11-12 14:30 ` Michel Dänzer
2021-11-12 15:03 ` Christian König
0 siblings, 1 reply; 28+ messages in thread
From: Michel Dänzer @ 2021-11-12 14:30 UTC (permalink / raw)
To: Christian König, Harry Wentland, Kazlauskas, Nicholas, Yu, Lang
Cc: amd-gfx list, dri-devel
On 2021-11-12 15:29, Michel Dänzer wrote:
> On 2021-11-12 13:47, Christian König wrote:
>>
>> Anyway this unfortunately turned out to be work for Harray and Nicholas. In detail it's about this bug report here: https://bugzilla.kernel.org/show_bug.cgi?id=214621
>>
>> Lang was able to reproduce the issue and narrow it down to the pin in amdgpu_display_crtc_page_flip_target().
>>
>> In other words we somehow have an unbalanced pinning of the scanout buffer in DC.
>
> DC doesn't use amdgpu_display_crtc_page_flip_target AFAICT. The corresponding pin with DC would be in dm_plane_helper_prepare_fb, paired with the unpin in
> dm_plane_helper_cleanup_fb.
>
>
> With non-DC, the pin in amdgpu_display_crtc_page_flip_target is paired with the unpin in dm_plane_helper_cleanup_fb
This should say amdgpu_display_unpin_work_func.
> & dce_v*_crtc_disable. One thing I notice is that the pin is guarded by if (!adev->enable_virtual_display), but the unpins seem unconditional. So could this be about virtual display, and the problem is actually trying to unpin a BO that was never pinned?
>
>
--
Earthling Michel Dänzer | https://redhat.com
Libre software enthusiast | Mesa and Xwayland developer
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Questions about KMS flip
2021-11-12 14:30 ` Michel Dänzer
@ 2021-11-12 15:03 ` Christian König
2021-11-12 16:10 ` Michel Dänzer
0 siblings, 1 reply; 28+ messages in thread
From: Christian König @ 2021-11-12 15:03 UTC (permalink / raw)
To: Michel Dänzer, Christian König, Harry Wentland,
Kazlauskas, Nicholas, Yu, Lang
Cc: dri-devel, amd-gfx list
Am 12.11.21 um 15:30 schrieb Michel Dänzer:
> On 2021-11-12 15:29, Michel Dänzer wrote:
>> On 2021-11-12 13:47, Christian König wrote:
>>> Anyway this unfortunately turned out to be work for Harray and Nicholas. In detail it's about this bug report here: https://bugzilla.kernel.org/show_bug.cgi?id=214621
>>>
>>> Lang was able to reproduce the issue and narrow it down to the pin in amdgpu_display_crtc_page_flip_target().
>>>
>>> In other words we somehow have an unbalanced pinning of the scanout buffer in DC.
>> DC doesn't use amdgpu_display_crtc_page_flip_target AFAICT. The corresponding pin with DC would be in dm_plane_helper_prepare_fb, paired with the unpin in
>> dm_plane_helper_cleanup_fb.
>>
>>
>> With non-DC, the pin in amdgpu_display_crtc_page_flip_target is paired with the unpin in dm_plane_helper_cleanup_fb
> This should say amdgpu_display_unpin_work_func.
Ah! So that is the classic (e.g. non atomic) path?
>> & dce_v*_crtc_disable. One thing I notice is that the pin is guarded by if (!adev->enable_virtual_display), but the unpins seem unconditional. So could this be about virtual display, and the problem is actually trying to unpin a BO that was never pinned?
Nope, my educated guess is rather that we free up the BO before
amdgpu_display_unpin_work_func is called.
E.g. not pin unbalance, but rather use after free.
Regards,
Christian.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Questions about KMS flip
2021-11-12 15:03 ` Christian König
@ 2021-11-12 16:10 ` Michel Dänzer
2021-11-15 6:41 ` Lang Yu
2021-11-15 7:25 ` Christian König
0 siblings, 2 replies; 28+ messages in thread
From: Michel Dänzer @ 2021-11-12 16:10 UTC (permalink / raw)
To: Christian König, Christian König, Harry Wentland,
Kazlauskas, Nicholas, Yu, Lang
Cc: amd-gfx list, dri-devel
On 2021-11-12 16:03, Christian König wrote:
> Am 12.11.21 um 15:30 schrieb Michel Dänzer:
>> On 2021-11-12 15:29, Michel Dänzer wrote:
>>> On 2021-11-12 13:47, Christian König wrote:
>>>> Anyway this unfortunately turned out to be work for Harray and Nicholas. In detail it's about this bug report here: https://bugzilla.kernel.org/show_bug.cgi?id=214621
>>>>
>>>> Lang was able to reproduce the issue and narrow it down to the pin in amdgpu_display_crtc_page_flip_target().
>>>>
>>>> In other words we somehow have an unbalanced pinning of the scanout buffer in DC.
>>> DC doesn't use amdgpu_display_crtc_page_flip_target AFAICT. The corresponding pin with DC would be in dm_plane_helper_prepare_fb, paired with the unpin in
>>> dm_plane_helper_cleanup_fb.
>>>
>>>
>>> With non-DC, the pin in amdgpu_display_crtc_page_flip_target is paired with the unpin in dm_plane_helper_cleanup_fb
>> This should say amdgpu_display_unpin_work_func.
>
> Ah! So that is the classic (e.g. non atomic) path?
Presumably.
>>> & dce_v*_crtc_disable. One thing I notice is that the pin is guarded by if (!adev->enable_virtual_display), but the unpins seem unconditional. So could this be about virtual display, and the problem is actually trying to unpin a BO that was never pinned?
>
> Nope, my educated guess is rather that we free up the BO before amdgpu_display_unpin_work_func is called.
>
> E.g. not pin unbalance, but rather use after free.
amdgpu_display_crtc_page_flip_target calls amdgpu_bo_ref(work->old_abo), and amdgpu_display_unpin_work_func calls amdgpu_bo_unref(&work->old_abo) only after amdgpu_bo_unpin. So what you describe could only happen if there's an imbalance elsewhere such that amdgpu_bo_unref is called more often than amdgpu_bo_ref, or maybe if amdgpu_bo_reserve fails in amdgpu_display_unpin_work_func (in which case the "failed to reserve buffer after flip" error message should appear in dmesg).
--
Earthling Michel Dänzer | https://redhat.com
Libre software enthusiast | Mesa and Xwayland developer
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Questions about KMS flip
2021-11-12 16:10 ` Michel Dänzer
@ 2021-11-15 6:41 ` Lang Yu
2021-11-15 7:25 ` Christian König
1 sibling, 0 replies; 28+ messages in thread
From: Lang Yu @ 2021-11-15 6:41 UTC (permalink / raw)
To: Michel DDDnzer
Cc: Christian KKKnig, amd-gfx list, Kazlauskas, Nicholas, dri-devel,
Christian KKKnig
On Fri, Nov 12, 2021 at 05:10:27PM +0100, Michel DDDnzer wrote:
> On 2021-11-12 16:03, Christian König wrote:
> > Am 12.11.21 um 15:30 schrieb Michel Dänzer:
> >> On 2021-11-12 15:29, Michel Dänzer wrote:
> >>> On 2021-11-12 13:47, Christian König wrote:
> >>>> Anyway this unfortunately turned out to be work for Harray and Nicholas. In detail it's about this bug report here: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.kernel.org%2Fshow_bug.cgi%3Fid%3D214621&data=04%7C01%7CLang.Yu%40amd.com%7Cca557eab16864ab544a108d9a5f6f288%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637723302338811983%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=9oG6k22BVp%2BkSvRX6uMlGXc6G%2BA5UL0Qy3W5iDTpvzs%3D&reserved=0
> >>>>
> >>>> Lang was able to reproduce the issue and narrow it down to the pin in amdgpu_display_crtc_page_flip_target().
> >>>>
> >>>> In other words we somehow have an unbalanced pinning of the scanout buffer in DC.
> >>> DC doesn't use amdgpu_display_crtc_page_flip_target AFAICT. The corresponding pin with DC would be in dm_plane_helper_prepare_fb, paired with the unpin in
> >>> dm_plane_helper_cleanup_fb.
> >>>
> >>>
> >>> With non-DC, the pin in amdgpu_display_crtc_page_flip_target is paired with the unpin in dm_plane_helper_cleanup_fb
> >> This should say amdgpu_display_unpin_work_func.
> >
> > Ah! So that is the classic (e.g. non atomic) path?
>
> Presumably.
>
>
> >>> & dce_v*_crtc_disable. One thing I notice is that the pin is guarded by if (!adev->enable_virtual_display), but the unpins seem unconditional. So could this be about virtual display, and the problem is actually trying to unpin a BO that was never pinned?
> >
> > Nope, my educated guess is rather that we free up the BO before amdgpu_display_unpin_work_func is called.
> >
> > E.g. not pin unbalance, but rather use after free.
>
> amdgpu_display_crtc_page_flip_target calls amdgpu_bo_ref(work->old_abo), and amdgpu_display_unpin_work_func calls amdgpu_bo_unref(&work->old_abo) only after amdgpu_bo_unpin. So what you describe could only happen if there's an imbalance elsewhere such that amdgpu_bo_unref is called more often than amdgpu_bo_ref, or maybe if amdgpu_bo_reserve fails in amdgpu_display_unpin_work_func (in which case the "failed to reserve buffer after flip" error message should appear in dmesg).
Actually, each call to amdgpu_display_crtc_page_flip_target() will
1, init a work(amdgpu_display_unpin_work_func) to unpin an old buffer
(crtc->primary->fb), the work will be queued in dce_vX_0_pageflip_irq().
2, pin a new buffer, assign it to crtc->primary->fb. But how to unpin it?
Next call.
The problem is the pinned buffer of last call to
amdgpu_display_crtc_page_flip_target() is not unpinned.
It should be an imbalance call to pin/unpin.
>
> --
> Earthling Michel Dänzer | https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fredhat.com%2F&data=04%7C01%7CLang.Yu%40amd.com%7Cca557eab16864ab544a108d9a5f6f288%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637723302338811983%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=fJroRJc4SYuUXX1U52X%2FktYpAKaADg3kM9PsONnKu8c%3D&reserved=0
> Libre software enthusiast | Mesa and Xwayland developer
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Questions about KMS flip
@ 2021-11-15 6:41 ` Lang Yu
0 siblings, 0 replies; 28+ messages in thread
From: Lang Yu @ 2021-11-15 6:41 UTC (permalink / raw)
To: Michel DDDnzer
Cc: Christian KKKnig, amd-gfx list, Kazlauskas, Nicholas, dri-devel,
Harry Wentland, Christian KKKnig
On Fri, Nov 12, 2021 at 05:10:27PM +0100, Michel DDDnzer wrote:
> On 2021-11-12 16:03, Christian König wrote:
> > Am 12.11.21 um 15:30 schrieb Michel Dänzer:
> >> On 2021-11-12 15:29, Michel Dänzer wrote:
> >>> On 2021-11-12 13:47, Christian König wrote:
> >>>> Anyway this unfortunately turned out to be work for Harray and Nicholas. In detail it's about this bug report here: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.kernel.org%2Fshow_bug.cgi%3Fid%3D214621&data=04%7C01%7CLang.Yu%40amd.com%7Cca557eab16864ab544a108d9a5f6f288%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637723302338811983%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=9oG6k22BVp%2BkSvRX6uMlGXc6G%2BA5UL0Qy3W5iDTpvzs%3D&reserved=0
> >>>>
> >>>> Lang was able to reproduce the issue and narrow it down to the pin in amdgpu_display_crtc_page_flip_target().
> >>>>
> >>>> In other words we somehow have an unbalanced pinning of the scanout buffer in DC.
> >>> DC doesn't use amdgpu_display_crtc_page_flip_target AFAICT. The corresponding pin with DC would be in dm_plane_helper_prepare_fb, paired with the unpin in
> >>> dm_plane_helper_cleanup_fb.
> >>>
> >>>
> >>> With non-DC, the pin in amdgpu_display_crtc_page_flip_target is paired with the unpin in dm_plane_helper_cleanup_fb
> >> This should say amdgpu_display_unpin_work_func.
> >
> > Ah! So that is the classic (e.g. non atomic) path?
>
> Presumably.
>
>
> >>> & dce_v*_crtc_disable. One thing I notice is that the pin is guarded by if (!adev->enable_virtual_display), but the unpins seem unconditional. So could this be about virtual display, and the problem is actually trying to unpin a BO that was never pinned?
> >
> > Nope, my educated guess is rather that we free up the BO before amdgpu_display_unpin_work_func is called.
> >
> > E.g. not pin unbalance, but rather use after free.
>
> amdgpu_display_crtc_page_flip_target calls amdgpu_bo_ref(work->old_abo), and amdgpu_display_unpin_work_func calls amdgpu_bo_unref(&work->old_abo) only after amdgpu_bo_unpin. So what you describe could only happen if there's an imbalance elsewhere such that amdgpu_bo_unref is called more often than amdgpu_bo_ref, or maybe if amdgpu_bo_reserve fails in amdgpu_display_unpin_work_func (in which case the "failed to reserve buffer after flip" error message should appear in dmesg).
Actually, each call to amdgpu_display_crtc_page_flip_target() will
1, init a work(amdgpu_display_unpin_work_func) to unpin an old buffer
(crtc->primary->fb), the work will be queued in dce_vX_0_pageflip_irq().
2, pin a new buffer, assign it to crtc->primary->fb. But how to unpin it?
Next call.
The problem is the pinned buffer of last call to
amdgpu_display_crtc_page_flip_target() is not unpinned.
It should be an imbalance call to pin/unpin.
>
> --
> Earthling Michel Dänzer | https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fredhat.com%2F&data=04%7C01%7CLang.Yu%40amd.com%7Cca557eab16864ab544a108d9a5f6f288%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637723302338811983%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=fJroRJc4SYuUXX1U52X%2FktYpAKaADg3kM9PsONnKu8c%3D&reserved=0
> Libre software enthusiast | Mesa and Xwayland developer
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Questions about KMS flip
2021-11-12 16:10 ` Michel Dänzer
2021-11-15 6:41 ` Lang Yu
@ 2021-11-15 7:25 ` Christian König
1 sibling, 0 replies; 28+ messages in thread
From: Christian König @ 2021-11-15 7:25 UTC (permalink / raw)
To: Michel Dänzer, Christian König, Harry Wentland,
Kazlauskas, Nicholas, Yu, Lang
Cc: amd-gfx list, dri-devel
Am 12.11.21 um 17:10 schrieb Michel Dänzer:
> On 2021-11-12 16:03, Christian König wrote:
>> Am 12.11.21 um 15:30 schrieb Michel Dänzer:
>>> On 2021-11-12 15:29, Michel Dänzer wrote:
>>>> On 2021-11-12 13:47, Christian König wrote:
>>>>> Anyway this unfortunately turned out to be work for Harray and Nicholas. In detail it's about this bug report here: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.kernel.org%2Fshow_bug.cgi%3Fid%3D214621&data=04%7C01%7Cchristian.koenig%40amd.com%7Cca557eab16864ab544a108d9a5f6f288%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637723302340621335%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=pvLGq%2FJRvVy0k5GMGF2UPotCSdbiQNfndtjI14luAUg%3D&reserved=0
>>>>>
>>>>> Lang was able to reproduce the issue and narrow it down to the pin in amdgpu_display_crtc_page_flip_target().
>>>>>
>>>>> In other words we somehow have an unbalanced pinning of the scanout buffer in DC.
>>>> DC doesn't use amdgpu_display_crtc_page_flip_target AFAICT. The corresponding pin with DC would be in dm_plane_helper_prepare_fb, paired with the unpin in
>>>> dm_plane_helper_cleanup_fb.
>>>>
>>>>
>>>> With non-DC, the pin in amdgpu_display_crtc_page_flip_target is paired with the unpin in dm_plane_helper_cleanup_fb
>>> This should say amdgpu_display_unpin_work_func.
>> Ah! So that is the classic (e.g. non atomic) path?
> Presumably.
>
>
>>>> & dce_v*_crtc_disable. One thing I notice is that the pin is guarded by if (!adev->enable_virtual_display), but the unpins seem unconditional. So could this be about virtual display, and the problem is actually trying to unpin a BO that was never pinned?
>> Nope, my educated guess is rather that we free up the BO before amdgpu_display_unpin_work_func is called.
>>
>> E.g. not pin unbalance, but rather use after free.
> amdgpu_display_crtc_page_flip_target calls amdgpu_bo_ref(work->old_abo), and amdgpu_display_unpin_work_func calls amdgpu_bo_unref(&work->old_abo) only after amdgpu_bo_unpin. So what you describe could only happen if there's an imbalance elsewhere such that amdgpu_bo_unref is called more often than amdgpu_bo_ref, or maybe if amdgpu_bo_reserve fails in amdgpu_display_unpin_work_func (in which case the "failed to reserve buffer after flip" error message should appear in dmesg).
Yeah, seen that in the meantime as well.
But we also have a WARN_ON() when the pincount overruns, so that can't
be it either.
Long story short I have no idea what's going on here.
Regards,
Christian.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Questions about KMS flip
2021-11-15 6:41 ` Lang Yu
(?)
@ 2021-11-15 8:38 ` Michel Dänzer
2021-11-15 9:04 ` Lang Yu
-1 siblings, 1 reply; 28+ messages in thread
From: Michel Dänzer @ 2021-11-15 8:38 UTC (permalink / raw)
To: Lang Yu
Cc: Christian König, dri-devel, Kazlauskas, Nicholas,
amd-gfx list, Christian König
On 2021-11-15 07:41, Lang Yu wrote:
> On Fri, Nov 12, 2021 at 05:10:27PM +0100, Michel DDDnzer wrote:
>> On 2021-11-12 16:03, Christian König wrote:
>>> Am 12.11.21 um 15:30 schrieb Michel Dänzer:
>>>> On 2021-11-12 15:29, Michel Dänzer wrote:
>>>>> On 2021-11-12 13:47, Christian König wrote:
>>>>>> Anyway this unfortunately turned out to be work for Harray and Nicholas. In detail it's about this bug report here: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.kernel.org%2Fshow_bug.cgi%3Fid%3D214621&data=04%7C01%7CLang.Yu%40amd.com%7Cca557eab16864ab544a108d9a5f6f288%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637723302338811983%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=9oG6k22BVp%2BkSvRX6uMlGXc6G%2BA5UL0Qy3W5iDTpvzs%3D&reserved=0
>>>>>>
>>>>>> Lang was able to reproduce the issue and narrow it down to the pin in amdgpu_display_crtc_page_flip_target().
>>>>>>
>>>>>> In other words we somehow have an unbalanced pinning of the scanout buffer in DC.
>>>>> DC doesn't use amdgpu_display_crtc_page_flip_target AFAICT. The corresponding pin with DC would be in dm_plane_helper_prepare_fb, paired with the unpin in
>>>>> dm_plane_helper_cleanup_fb.
>>>>>
>>>>>
>>>>> With non-DC, the pin in amdgpu_display_crtc_page_flip_target is paired with the unpin in dm_plane_helper_cleanup_fb
>>>> This should say amdgpu_display_unpin_work_func.
>>>
>>> Ah! So that is the classic (e.g. non atomic) path?
>>
>> Presumably.
>>
>>
>>>>> & dce_v*_crtc_disable. One thing I notice is that the pin is guarded by if (!adev->enable_virtual_display), but the unpins seem unconditional. So could this be about virtual display, and the problem is actually trying to unpin a BO that was never pinned?
>>>
>>> Nope, my educated guess is rather that we free up the BO before amdgpu_display_unpin_work_func is called.
>>>
>>> E.g. not pin unbalance, but rather use after free.
>>
>> amdgpu_display_crtc_page_flip_target calls amdgpu_bo_ref(work->old_abo), and amdgpu_display_unpin_work_func calls amdgpu_bo_unref(&work->old_abo) only after amdgpu_bo_unpin. So what you describe could only happen if there's an imbalance elsewhere such that amdgpu_bo_unref is called more often than amdgpu_bo_ref, or maybe if amdgpu_bo_reserve fails in amdgpu_display_unpin_work_func (in which case the "failed to reserve buffer after flip" error message should appear in dmesg).
>
>
> Actually, each call to amdgpu_display_crtc_page_flip_target() will
>
> 1, init a work(amdgpu_display_unpin_work_func) to unpin an old buffer
> (crtc->primary->fb), the work will be queued in dce_vX_0_pageflip_irq().
>
> 2, pin a new buffer, assign it to crtc->primary->fb. But how to unpin it?
> Next call.
>
> The problem is the pinned buffer of last call to
> amdgpu_display_crtc_page_flip_target() is not unpinned.
It's unpinned in dce_v*_0_crtc_disable.
I think I've found the problem though: dce_v*_0_crtc_do_set_base pin the BO from target_fb unconditionally, but unpin the BO from the fb parameter only if it's different from the former. So if they're the same, the BO's pin count is incremented by 1.
--
Earthling Michel Dänzer | https://redhat.com
Libre software enthusiast | Mesa and Xwayland developer
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Questions about KMS flip
2021-11-15 8:38 ` Michel Dänzer
@ 2021-11-15 9:04 ` Lang Yu
2021-11-15 9:49 ` Michel Dänzer
0 siblings, 1 reply; 28+ messages in thread
From: Lang Yu @ 2021-11-15 9:04 UTC (permalink / raw)
To: Michel DDDnzer
Cc: Christian KKKnig, dri-devel, Kazlauskas, Nicholas, amd-gfx list,
Christian KKKnig
On Mon, Nov 15, 2021 at 09:38:47AM +0100, Michel DDDnzer wrote:
> On 2021-11-15 07:41, Lang Yu wrote:
> > On Fri, Nov 12, 2021 at 05:10:27PM +0100, Michel DDDnzer wrote:
> >> On 2021-11-12 16:03, Christian König wrote:
> >>> Am 12.11.21 um 15:30 schrieb Michel Dänzer:
> >>>> On 2021-11-12 15:29, Michel Dänzer wrote:
> >>>>> On 2021-11-12 13:47, Christian König wrote:
> >>>>>> Anyway this unfortunately turned out to be work for Harray and Nicholas. In detail it's about this bug report here: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.kernel.org%2Fshow_bug.cgi%3Fid%3D214621&data=04%7C01%7CLang.Yu%40amd.com%7C766e41a1c30544442b6b08d9a81358b0%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637725623316543180%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=od%2BNksWOff%2FtsuAYZLX7lIJFQJMY2OScVqhLclPYWAQ%3D&reserved=0
> >>>>>>
> >>>>>> Lang was able to reproduce the issue and narrow it down to the pin in amdgpu_display_crtc_page_flip_target().
> >>>>>>
> >>>>>> In other words we somehow have an unbalanced pinning of the scanout buffer in DC.
> >>>>> DC doesn't use amdgpu_display_crtc_page_flip_target AFAICT. The corresponding pin with DC would be in dm_plane_helper_prepare_fb, paired with the unpin in
> >>>>> dm_plane_helper_cleanup_fb.
> >>>>>
> >>>>>
> >>>>> With non-DC, the pin in amdgpu_display_crtc_page_flip_target is paired with the unpin in dm_plane_helper_cleanup_fb
> >>>> This should say amdgpu_display_unpin_work_func.
> >>>
> >>> Ah! So that is the classic (e.g. non atomic) path?
> >>
> >> Presumably.
> >>
> >>
> >>>>> & dce_v*_crtc_disable. One thing I notice is that the pin is guarded by if (!adev->enable_virtual_display), but the unpins seem unconditional. So could this be about virtual display, and the problem is actually trying to unpin a BO that was never pinned?
> >>>
> >>> Nope, my educated guess is rather that we free up the BO before amdgpu_display_unpin_work_func is called.
> >>>
> >>> E.g. not pin unbalance, but rather use after free.
> >>
> >> amdgpu_display_crtc_page_flip_target calls amdgpu_bo_ref(work->old_abo), and amdgpu_display_unpin_work_func calls amdgpu_bo_unref(&work->old_abo) only after amdgpu_bo_unpin. So what you describe could only happen if there's an imbalance elsewhere such that amdgpu_bo_unref is called more often than amdgpu_bo_ref, or maybe if amdgpu_bo_reserve fails in amdgpu_display_unpin_work_func (in which case the "failed to reserve buffer after flip" error message should appear in dmesg).
> >
> >
> > Actually, each call to amdgpu_display_crtc_page_flip_target() will
> >
> > 1, init a work(amdgpu_display_unpin_work_func) to unpin an old buffer
> > (crtc->primary->fb), the work will be queued in dce_vX_0_pageflip_irq().
> >
> > 2, pin a new buffer, assign it to crtc->primary->fb. But how to unpin it?
> > Next call.
> >
> > The problem is the pinned buffer of last call to
> > amdgpu_display_crtc_page_flip_target() is not unpinned.
>
> It's unpinned in dce_v*_0_crtc_disable.
I just found crtc->primary->fb is NULL when came in dce_v*_0_crtc_disable().
So it's not unpinned...
>
> I think I've found the problem though: dce_v*_0_crtc_do_set_base pin the BO from target_fb unconditionally, but unpin the BO from the fb parameter only if it's different from the former. So if they're the same, the BO's pin count is incremented by 1.
>
>
> --
> Earthling Michel Dänzer | https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fredhat.com%2F&data=04%7C01%7CLang.Yu%40amd.com%7C766e41a1c30544442b6b08d9a81358b0%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637725623316543180%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=o2m1p98zVr16n58wdOWBWyEzETMAfEGqMYAtaAZozgo%3D&reserved=0
> Libre software enthusiast | Mesa and Xwayland developer
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Questions about KMS flip
2021-11-15 9:04 ` Lang Yu
@ 2021-11-15 9:49 ` Michel Dänzer
2021-11-15 11:31 ` Lang Yu
0 siblings, 1 reply; 28+ messages in thread
From: Michel Dänzer @ 2021-11-15 9:49 UTC (permalink / raw)
To: Lang Yu
Cc: Christian KKKnig, amd-gfx list, Kazlauskas, Nicholas, dri-devel,
Christian KKKnig
On 2021-11-15 10:04, Lang Yu wrote:
> On Mon, Nov 15, 2021 at 09:38:47AM +0100, Michel DDDnzer wrote:
>> On 2021-11-15 07:41, Lang Yu wrote:
>>> On Fri, Nov 12, 2021 at 05:10:27PM +0100, Michel DDDnzer wrote:
>>>> On 2021-11-12 16:03, Christian König wrote:
>>>>> Am 12.11.21 um 15:30 schrieb Michel Dänzer:
>>>>>> On 2021-11-12 15:29, Michel Dänzer wrote:
>>>>>>> On 2021-11-12 13:47, Christian König wrote:
>>>>>>>> Anyway this unfortunately turned out to be work for Harray and Nicholas. In detail it's about this bug report here: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.kernel.org%2Fshow_bug.cgi%3Fid%3D214621&data=04%7C01%7CLang.Yu%40amd.com%7C766e41a1c30544442b6b08d9a81358b0%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637725623316543180%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=od%2BNksWOff%2FtsuAYZLX7lIJFQJMY2OScVqhLclPYWAQ%3D&reserved=0
>>>>>>>>
>>>>>>>> Lang was able to reproduce the issue and narrow it down to the pin in amdgpu_display_crtc_page_flip_target().
>>>>>>>>
>>>>>>>> In other words we somehow have an unbalanced pinning of the scanout buffer in DC.
>>>>>>> DC doesn't use amdgpu_display_crtc_page_flip_target AFAICT. The corresponding pin with DC would be in dm_plane_helper_prepare_fb, paired with the unpin in
>>>>>>> dm_plane_helper_cleanup_fb.
>>>>>>>
>>>>>>>
>>>>>>> With non-DC, the pin in amdgpu_display_crtc_page_flip_target is paired with the unpin in dm_plane_helper_cleanup_fb
>>>>>> This should say amdgpu_display_unpin_work_func.
>>>>>
>>>>> Ah! So that is the classic (e.g. non atomic) path?
>>>>
>>>> Presumably.
>>>>
>>>>
>>>>>>> & dce_v*_crtc_disable. One thing I notice is that the pin is guarded by if (!adev->enable_virtual_display), but the unpins seem unconditional. So could this be about virtual display, and the problem is actually trying to unpin a BO that was never pinned?
>>>>>
>>>>> Nope, my educated guess is rather that we free up the BO before amdgpu_display_unpin_work_func is called.
>>>>>
>>>>> E.g. not pin unbalance, but rather use after free.
>>>>
>>>> amdgpu_display_crtc_page_flip_target calls amdgpu_bo_ref(work->old_abo), and amdgpu_display_unpin_work_func calls amdgpu_bo_unref(&work->old_abo) only after amdgpu_bo_unpin. So what you describe could only happen if there's an imbalance elsewhere such that amdgpu_bo_unref is called more often than amdgpu_bo_ref, or maybe if amdgpu_bo_reserve fails in amdgpu_display_unpin_work_func (in which case the "failed to reserve buffer after flip" error message should appear in dmesg).
>>>
>>>
>>> Actually, each call to amdgpu_display_crtc_page_flip_target() will
>>>
>>> 1, init a work(amdgpu_display_unpin_work_func) to unpin an old buffer
>>> (crtc->primary->fb), the work will be queued in dce_vX_0_pageflip_irq().
>>>
>>> 2, pin a new buffer, assign it to crtc->primary->fb. But how to unpin it?
>>> Next call.
>>>
>>> The problem is the pinned buffer of last call to
>>> amdgpu_display_crtc_page_flip_target() is not unpinned.
>>
>> It's unpinned in dce_v*_0_crtc_disable.
>
> I just found crtc->primary->fb is NULL when came in dce_v*_0_crtc_disable().
> So it's not unpinned...
__drm_helper_disable_unused_functions sets crtc->primary->fb = NULL only after calling crtc_funcs->disable. Maybe this path can get hit for a CRTC which was already disabled, in which case crtc->primary->fb == NULL in dce_v*_0_crtc_disable is harmless.
Have you checked for the issue I described below? Should be pretty easy to catch.
>> I think I've found the problem though: dce_v*_0_crtc_do_set_base pin the BO from target_fb unconditionally, but unpin the BO from the fb parameter only if it's different from the former. So if they're the same, the BO's pin count is incremented by 1.
--
Earthling Michel Dänzer | https://redhat.com
Libre software enthusiast | Mesa and Xwayland developer
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Questions about KMS flip
2021-11-15 9:49 ` Michel Dänzer
@ 2021-11-15 11:31 ` Lang Yu
2021-11-15 12:04 ` Michel Dänzer
0 siblings, 1 reply; 28+ messages in thread
From: Lang Yu @ 2021-11-15 11:31 UTC (permalink / raw)
To: Michel DDDnzer
Cc: Christian KKKnig, amd-gfx list, Kazlauskas, Nicholas, dri-devel,
Christian KKKnig
On Mon, Nov 15, 2021 at 10:49:39AM +0100, Michel DDDnzer wrote:
> On 2021-11-15 10:04, Lang Yu wrote:
> > On Mon, Nov 15, 2021 at 09:38:47AM +0100, Michel DDDnzer wrote:
> >> On 2021-11-15 07:41, Lang Yu wrote:
> >>> On Fri, Nov 12, 2021 at 05:10:27PM +0100, Michel DDDnzer wrote:
> >>>> On 2021-11-12 16:03, Christian König wrote:
> >>>>> Am 12.11.21 um 15:30 schrieb Michel Dänzer:
> >>>>>> On 2021-11-12 15:29, Michel Dänzer wrote:
> >>>>>>> On 2021-11-12 13:47, Christian König wrote:
> >>>>>>>> Anyway this unfortunately turned out to be work for Harray and Nicholas. In detail it's about this bug report here: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.kernel.org%2Fshow_bug.cgi%3Fid%3D214621&data=04%7C01%7CLang.Yu%40amd.com%7Cee54c4d055d040ef9f8b08d9a81d3eb9%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637725665833112900%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=7nwIYd1um420XHVpOzeIvz37%2FLQqHF%2F6aRKfzgxUTnM%3D&reserved=0
> >>>>>>>>
> >>>>>>>> Lang was able to reproduce the issue and narrow it down to the pin in amdgpu_display_crtc_page_flip_target().
> >>>>>>>>
> >>>>>>>> In other words we somehow have an unbalanced pinning of the scanout buffer in DC.
> >>>>>>> DC doesn't use amdgpu_display_crtc_page_flip_target AFAICT. The corresponding pin with DC would be in dm_plane_helper_prepare_fb, paired with the unpin in
> >>>>>>> dm_plane_helper_cleanup_fb.
> >>>>>>>
> >>>>>>>
> >>>>>>> With non-DC, the pin in amdgpu_display_crtc_page_flip_target is paired with the unpin in dm_plane_helper_cleanup_fb
> >>>>>> This should say amdgpu_display_unpin_work_func.
> >>>>>
> >>>>> Ah! So that is the classic (e.g. non atomic) path?
> >>>>
> >>>> Presumably.
> >>>>
> >>>>
> >>>>>>> & dce_v*_crtc_disable. One thing I notice is that the pin is guarded by if (!adev->enable_virtual_display), but the unpins seem unconditional. So could this be about virtual display, and the problem is actually trying to unpin a BO that was never pinned?
> >>>>>
> >>>>> Nope, my educated guess is rather that we free up the BO before amdgpu_display_unpin_work_func is called.
> >>>>>
> >>>>> E.g. not pin unbalance, but rather use after free.
> >>>>
> >>>> amdgpu_display_crtc_page_flip_target calls amdgpu_bo_ref(work->old_abo), and amdgpu_display_unpin_work_func calls amdgpu_bo_unref(&work->old_abo) only after amdgpu_bo_unpin. So what you describe could only happen if there's an imbalance elsewhere such that amdgpu_bo_unref is called more often than amdgpu_bo_ref, or maybe if amdgpu_bo_reserve fails in amdgpu_display_unpin_work_func (in which case the "failed to reserve buffer after flip" error message should appear in dmesg).
> >>>
> >>>
> >>> Actually, each call to amdgpu_display_crtc_page_flip_target() will
> >>>
> >>> 1, init a work(amdgpu_display_unpin_work_func) to unpin an old buffer
> >>> (crtc->primary->fb), the work will be queued in dce_vX_0_pageflip_irq().
> >>>
> >>> 2, pin a new buffer, assign it to crtc->primary->fb. But how to unpin it?
> >>> Next call.
> >>>
> >>> The problem is the pinned buffer of last call to
> >>> amdgpu_display_crtc_page_flip_target() is not unpinned.
> >>
> >> It's unpinned in dce_v*_0_crtc_disable.
> >
> > I just found crtc->primary->fb is NULL when came in dce_v*_0_crtc_disable().
> > So it's not unpinned...
>
> __drm_helper_disable_unused_functions sets crtc->primary->fb = NULL only after calling crtc_funcs->disable. Maybe this path can get hit for a CRTC which was already disabled, in which case crtc->primary->fb == NULL in dce_v*_0_crtc_disable is harmless.
>
> Have you checked for the issue I described below? Should be pretty easy to catch.
>
>
> >> I think I've found the problem though: dce_v*_0_crtc_do_set_base pin the BO from target_fb unconditionally, but unpin the BO from the fb parameter only if it's different from the former. So if they're the same, the BO's pin count is incremented by 1.
Form my observations, amdgpu_bo_unpin() in dce_v*_0_crtc_disable() is
never called. Though a single call to dce_v*_0_crtc_do_set_base() will
only pin the BO, I found it will be unpinned in next call to
dce_v*_0_crtc_do_set_base(). Anyway, these pin/unpin operations looks
error-prone.
Regards,
Lang
>
> --
> Earthling Michel Dänzer | https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fredhat.com%2F&data=04%7C01%7CLang.Yu%40amd.com%7Cee54c4d055d040ef9f8b08d9a81d3eb9%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637725665833112900%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=EppaFKsy78tecP6oIVVuBh3enb%2Fu8jurugqEwUxDCYk%3D&reserved=0
> Libre software enthusiast | Mesa and Xwayland developer
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Questions about KMS flip
2021-11-15 11:31 ` Lang Yu
@ 2021-11-15 12:04 ` Michel Dänzer
2021-11-16 3:27 ` Lang Yu
0 siblings, 1 reply; 28+ messages in thread
From: Michel Dänzer @ 2021-11-15 12:04 UTC (permalink / raw)
To: Lang Yu
Cc: Christian KKKnig, dri-devel, Kazlauskas, Nicholas, amd-gfx list,
Christian KKKnig
On 2021-11-15 12:31, Lang Yu wrote:
> On Mon, Nov 15, 2021 at 10:49:39AM +0100, Michel DDDnzer wrote:
>> On 2021-11-15 10:04, Lang Yu wrote:
>>> On Mon, Nov 15, 2021 at 09:38:47AM +0100, Michel DDDnzer wrote:
>>>> On 2021-11-15 07:41, Lang Yu wrote:
>>>>> On Fri, Nov 12, 2021 at 05:10:27PM +0100, Michel DDDnzer wrote:
>>>>>> On 2021-11-12 16:03, Christian König wrote:
>>>>>>> Am 12.11.21 um 15:30 schrieb Michel Dänzer:
>>>>>>>> On 2021-11-12 15:29, Michel Dänzer wrote:
>>>>>>>>> On 2021-11-12 13:47, Christian König wrote:
>>>>>>>>>> Anyway this unfortunately turned out to be work for Harray and Nicholas. In detail it's about this bug report here: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.kernel.org%2Fshow_bug.cgi%3Fid%3D214621&data=04%7C01%7CLang.Yu%40amd.com%7Cee54c4d055d040ef9f8b08d9a81d3eb9%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637725665833112900%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=7nwIYd1um420XHVpOzeIvz37%2FLQqHF%2F6aRKfzgxUTnM%3D&reserved=0
>>>>>>>>>>
>>>>>>>>>> Lang was able to reproduce the issue and narrow it down to the pin in amdgpu_display_crtc_page_flip_target().
>>>>>>>>>>
>>>>>>>>>> In other words we somehow have an unbalanced pinning of the scanout buffer in DC.
>>>>>>>>> DC doesn't use amdgpu_display_crtc_page_flip_target AFAICT. The corresponding pin with DC would be in dm_plane_helper_prepare_fb, paired with the unpin in
>>>>>>>>> dm_plane_helper_cleanup_fb.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> With non-DC, the pin in amdgpu_display_crtc_page_flip_target is paired with the unpin in dm_plane_helper_cleanup_fb
>>>>>>>> This should say amdgpu_display_unpin_work_func.
>>>>>>>
>>>>>>> Ah! So that is the classic (e.g. non atomic) path?
>>>>>>
>>>>>> Presumably.
>>>>>>
>>>>>>
>>>>>>>>> & dce_v*_crtc_disable. One thing I notice is that the pin is guarded by if (!adev->enable_virtual_display), but the unpins seem unconditional. So could this be about virtual display, and the problem is actually trying to unpin a BO that was never pinned?
>>>>>>>
>>>>>>> Nope, my educated guess is rather that we free up the BO before amdgpu_display_unpin_work_func is called.
>>>>>>>
>>>>>>> E.g. not pin unbalance, but rather use after free.
>>>>>>
>>>>>> amdgpu_display_crtc_page_flip_target calls amdgpu_bo_ref(work->old_abo), and amdgpu_display_unpin_work_func calls amdgpu_bo_unref(&work->old_abo) only after amdgpu_bo_unpin. So what you describe could only happen if there's an imbalance elsewhere such that amdgpu_bo_unref is called more often than amdgpu_bo_ref, or maybe if amdgpu_bo_reserve fails in amdgpu_display_unpin_work_func (in which case the "failed to reserve buffer after flip" error message should appear in dmesg).
>>>>>
>>>>>
>>>>> Actually, each call to amdgpu_display_crtc_page_flip_target() will
>>>>>
>>>>> 1, init a work(amdgpu_display_unpin_work_func) to unpin an old buffer
>>>>> (crtc->primary->fb), the work will be queued in dce_vX_0_pageflip_irq().
>>>>>
>>>>> 2, pin a new buffer, assign it to crtc->primary->fb. But how to unpin it?
>>>>> Next call.
>>>>>
>>>>> The problem is the pinned buffer of last call to
>>>>> amdgpu_display_crtc_page_flip_target() is not unpinned.
>>>>
>>>> It's unpinned in dce_v*_0_crtc_disable.
>>>
>>> I just found crtc->primary->fb is NULL when came in dce_v*_0_crtc_disable().
>>> So it's not unpinned...
>>
>> __drm_helper_disable_unused_functions sets crtc->primary->fb = NULL only after calling crtc_funcs->disable. Maybe this path can get hit for a CRTC which was already disabled, in which case crtc->primary->fb == NULL in dce_v*_0_crtc_disable is harmless.
>>
>> Have you checked for the issue I described below? Should be pretty easy to catch.
>>
>>
>>>> I think I've found the problem though: dce_v*_0_crtc_do_set_base pin the BO from target_fb unconditionally, but unpin the BO from the fb parameter only if it's different from the former. So if they're the same, the BO's pin count is incremented by 1.
>
> Form my observations, amdgpu_bo_unpin() in dce_v*_0_crtc_disable() is
> never called.
It would be expected to happen when the screen turns off, e.g. due to DPMS.
> Though a single call to dce_v*_0_crtc_do_set_base() will
> only pin the BO, I found it will be unpinned in next call to
> dce_v*_0_crtc_do_set_base().
Yeah, that's the normal case when the new BO is different from the old one.
To catch the case I described, try something like
diff --git a/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c b/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c
index 18a7b3bd633b..5726bd87a355 100644
--- a/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c
@@ -1926,6 +1926,7 @@ static int dce_v11_0_crtc_do_set_base(struct drm_crtc *crtc,
return r;
if (!atomic) {
+ WARN_ON_ONCE(target_fb == fb);
r = amdgpu_bo_pin(abo, AMDGPU_GEM_DOMAIN_VRAM);
if (unlikely(r != 0)) {
amdgpu_bo_unreserve(abo);
--
Earthling Michel Dänzer | https://redhat.com
Libre software enthusiast | Mesa and Xwayland developer
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Questions about KMS flip
2021-11-15 12:04 ` Michel Dänzer
@ 2021-11-16 3:27 ` Lang Yu
2021-11-16 7:14 ` Christian König
0 siblings, 1 reply; 28+ messages in thread
From: Lang Yu @ 2021-11-16 3:27 UTC (permalink / raw)
To: Michel DDDnzer
Cc: Christian KKKnig, dri-devel, Kazlauskas, Nicholas, amd-gfx list,
Christian KKKnig
On Mon, Nov 15, 2021 at 01:04:15PM +0100, Michel DDDnzer wrote:
> On 2021-11-15 12:31, Lang Yu wrote:
> > On Mon, Nov 15, 2021 at 10:49:39AM +0100, Michel DDDnzer wrote:
> >> On 2021-11-15 10:04, Lang Yu wrote:
> >>> On Mon, Nov 15, 2021 at 09:38:47AM +0100, Michel DDDnzer wrote:
> >>>> On 2021-11-15 07:41, Lang Yu wrote:
> >>>>> On Fri, Nov 12, 2021 at 05:10:27PM +0100, Michel DDDnzer wrote:
> >>>>>> On 2021-11-12 16:03, Christian König wrote:
> >>>>>>> Am 12.11.21 um 15:30 schrieb Michel Dänzer:
> >>>>>>>> On 2021-11-12 15:29, Michel Dänzer wrote:
> >>>>>>>>> On 2021-11-12 13:47, Christian König wrote:
> >>>>>>>>>> Anyway this unfortunately turned out to be work for Harray and Nicholas. In detail it's about this bug report here: https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugzilla.kernel.org%2Fshow_bug.cgi%3Fid%3D214621&data=04%7C01%7CLang.Yu%40amd.com%7C07b2a4c13d2f4eba9ebb08d9a8300cd3%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637725746610767105%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=Qz2VKDE0%2BcIuXZMy0IL3NLZAjeXLimwl8PPZh4Cp4iE%3D&reserved=0
> >>>>>>>>>>
> >>>>>>>>>> Lang was able to reproduce the issue and narrow it down to the pin in amdgpu_display_crtc_page_flip_target().
> >>>>>>>>>>
> >>>>>>>>>> In other words we somehow have an unbalanced pinning of the scanout buffer in DC.
> >>>>>>>>> DC doesn't use amdgpu_display_crtc_page_flip_target AFAICT. The corresponding pin with DC would be in dm_plane_helper_prepare_fb, paired with the unpin in
> >>>>>>>>> dm_plane_helper_cleanup_fb.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> With non-DC, the pin in amdgpu_display_crtc_page_flip_target is paired with the unpin in dm_plane_helper_cleanup_fb
> >>>>>>>> This should say amdgpu_display_unpin_work_func.
> >>>>>>>
> >>>>>>> Ah! So that is the classic (e.g. non atomic) path?
> >>>>>>
> >>>>>> Presumably.
> >>>>>>
> >>>>>>
> >>>>>>>>> & dce_v*_crtc_disable. One thing I notice is that the pin is guarded by if (!adev->enable_virtual_display), but the unpins seem unconditional. So could this be about virtual display, and the problem is actually trying to unpin a BO that was never pinned?
> >>>>>>>
> >>>>>>> Nope, my educated guess is rather that we free up the BO before amdgpu_display_unpin_work_func is called.
> >>>>>>>
> >>>>>>> E.g. not pin unbalance, but rather use after free.
> >>>>>>
> >>>>>> amdgpu_display_crtc_page_flip_target calls amdgpu_bo_ref(work->old_abo), and amdgpu_display_unpin_work_func calls amdgpu_bo_unref(&work->old_abo) only after amdgpu_bo_unpin. So what you describe could only happen if there's an imbalance elsewhere such that amdgpu_bo_unref is called more often than amdgpu_bo_ref, or maybe if amdgpu_bo_reserve fails in amdgpu_display_unpin_work_func (in which case the "failed to reserve buffer after flip" error message should appear in dmesg).
> >>>>>
> >>>>>
> >>>>> Actually, each call to amdgpu_display_crtc_page_flip_target() will
> >>>>>
> >>>>> 1, init a work(amdgpu_display_unpin_work_func) to unpin an old buffer
> >>>>> (crtc->primary->fb), the work will be queued in dce_vX_0_pageflip_irq().
> >>>>>
> >>>>> 2, pin a new buffer, assign it to crtc->primary->fb. But how to unpin it?
> >>>>> Next call.
> >>>>>
> >>>>> The problem is the pinned buffer of last call to
> >>>>> amdgpu_display_crtc_page_flip_target() is not unpinned.
> >>>>
> >>>> It's unpinned in dce_v*_0_crtc_disable.
> >>>
> >>> I just found crtc->primary->fb is NULL when came in dce_v*_0_crtc_disable().
> >>> So it's not unpinned...
> >>
> >> __drm_helper_disable_unused_functions sets crtc->primary->fb = NULL only after calling crtc_funcs->disable. Maybe this path can get hit for a CRTC which was already disabled, in which case crtc->primary->fb == NULL in dce_v*_0_crtc_disable is harmless.
> >>
> >> Have you checked for the issue I described below? Should be pretty easy to catch.
> >>
> >>
> >>>> I think I've found the problem though: dce_v*_0_crtc_do_set_base pin the BO from target_fb unconditionally, but unpin the BO from the fb parameter only if it's different from the former. So if they're the same, the BO's pin count is incremented by 1.
> >
> > Form my observations, amdgpu_bo_unpin() in dce_v*_0_crtc_disable() is
> > never called.
>
> It would be expected to happen when the screen turns off, e.g. due to DPMS.
>
>
> > Though a single call to dce_v*_0_crtc_do_set_base() will
> > only pin the BO, I found it will be unpinned in next call to
> > dce_v*_0_crtc_do_set_base().
>
> Yeah, that's the normal case when the new BO is different from the old one.
>
> To catch the case I described, try something like
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c b/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c
>
> index 18a7b3bd633b..5726bd87a355 100644
>
> --- a/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c
>
> +++ b/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c
>
> @@ -1926,6 +1926,7 @@ static int dce_v11_0_crtc_do_set_base(struct drm_crtc *crtc,
>
> return r;
>
>
>
> if (!atomic) {
>
> + WARN_ON_ONCE(target_fb == fb);
>
> r = amdgpu_bo_pin(abo, AMDGPU_GEM_DOMAIN_VRAM);
>
> if (unlikely(r != 0)) {
>
> amdgpu_bo_unreserve(abo);
>
I did some tests, the warning can be triggered.
pin/unpin operations in *_crtc_do_set_base() and
amdgpu_display_crtc_page_flip_target() are mixed.
Regards,
Lang
>
> --
> Earthling Michel Dänzer | https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fredhat.com%2F&data=04%7C01%7CLang.Yu%40amd.com%7C07b2a4c13d2f4eba9ebb08d9a8300cd3%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637725746610767105%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&sdata=jct%2F2NG7LamVRzzpkSy9XjQIDZRBIutSeK3VQ2jh524%3D&reserved=0
> Libre software enthusiast | Mesa and Xwayland developer
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Questions about KMS flip
2021-11-16 3:27 ` Lang Yu
@ 2021-11-16 7:14 ` Christian König
2021-11-16 8:00 ` Lang Yu
0 siblings, 1 reply; 28+ messages in thread
From: Christian König @ 2021-11-16 7:14 UTC (permalink / raw)
To: Lang Yu, Michel DDDnzer
Cc: dri-devel, Kazlauskas, Nicholas, amd-gfx list, Christian KKKnig
Am 16.11.21 um 04:27 schrieb Lang Yu:
> On Mon, Nov 15, 2021 at 01:04:15PM +0100, Michel DDDnzer wrote:
>> [SNIP]
>>> Though a single call to dce_v*_0_crtc_do_set_base() will
>>> only pin the BO, I found it will be unpinned in next call to
>>> dce_v*_0_crtc_do_set_base().
>> Yeah, that's the normal case when the new BO is different from the old one.
>>
>> To catch the case I described, try something like
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c b/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c
>>
>> index 18a7b3bd633b..5726bd87a355 100644
>>
>> --- a/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c
>>
>> +++ b/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c
>>
>> @@ -1926,6 +1926,7 @@ static int dce_v11_0_crtc_do_set_base(struct drm_crtc *crtc,
>>
>> return r;
>>
>>
>>
>> if (!atomic) {
>>
>> + WARN_ON_ONCE(target_fb == fb);
>>
>> r = amdgpu_bo_pin(abo, AMDGPU_GEM_DOMAIN_VRAM);
>>
>> if (unlikely(r != 0)) {
>>
>> amdgpu_bo_unreserve(abo);
>>
> I did some tests, the warning can be triggered.
>
> pin/unpin operations in *_crtc_do_set_base() and
> amdgpu_display_crtc_page_flip_target() are mixed.
Ok sounds like we narrowed down the root cause pretty well.
Question is now how can we fix this? Just not pin the BO when target_fb
== fb?
Thanks,
Christian.
>
> Regards,
> Lang
>
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Questions about KMS flip
2021-11-16 7:14 ` Christian König
@ 2021-11-16 8:00 ` Lang Yu
2021-11-16 8:09 ` Christian König
0 siblings, 1 reply; 28+ messages in thread
From: Lang Yu @ 2021-11-16 8:00 UTC (permalink / raw)
To: Christian Koenig
Cc: dri-devel, Michel DDDnzer, Kazlauskas, Nicholas, amd-gfx list,
Christian KKKnig
On Tue, Nov 16, 2021 at 08:14:08AM +0100, Christian KKKnig wrote:
> Am 16.11.21 um 04:27 schrieb Lang Yu:
> > On Mon, Nov 15, 2021 at 01:04:15PM +0100, Michel DDDnzer wrote:
> > > [SNIP]
> > > > Though a single call to dce_v*_0_crtc_do_set_base() will
> > > > only pin the BO, I found it will be unpinned in next call to
> > > > dce_v*_0_crtc_do_set_base().
> > > Yeah, that's the normal case when the new BO is different from the old one.
> > >
> > > To catch the case I described, try something like
> > >
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c b/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c
> > >
> > > index 18a7b3bd633b..5726bd87a355 100644
> > >
> > > --- a/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c
> > >
> > > +++ b/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c
> > >
> > > @@ -1926,6 +1926,7 @@ static int dce_v11_0_crtc_do_set_base(struct drm_crtc *crtc,
> > >
> > > return r;
> > >
> > >
> > >
> > > if (!atomic) {
> > >
> > > + WARN_ON_ONCE(target_fb == fb);
> > >
> > > r = amdgpu_bo_pin(abo, AMDGPU_GEM_DOMAIN_VRAM);
> > >
> > > if (unlikely(r != 0)) {
> > >
> > > amdgpu_bo_unreserve(abo);
> > >
> > I did some tests, the warning can be triggered.
> >
> > pin/unpin operations in *_crtc_do_set_base() and
> > amdgpu_display_crtc_page_flip_target() are mixed.
>
> Ok sounds like we narrowed down the root cause pretty well.
>
> Question is now how can we fix this? Just not pin the BO when target_fb ==
> fb?
That worked. I did a few simple tests and didn't observe ttm_bo_release warnings
any more.
The pin/unpin logic,
1, fist crtc_mode_set, dce_v*_0_crtc_do_set_base() pins crtc->primary->fb(new),
old_fb is NULL.
2, second crtc_mode_set, dce_v*_0_crtc_do_set_base() pins crtc->primary->fb(new),
unpins old fb.
3, amdgpu_display_crtc_page_flip_target() pin/unpin operations.
4, third crtc_mode_set, dce_v*_0_crtc_do_set_base() pins crtc->primary->fb(new),
unpins old fb (it is pinned in last call to amdgpu_display_crtc_page_flip_target)
5, amdgpu_display_crtc_page_flip_target() pin/unpin operations.
.....
x, reboot, amdgpu_display_suspend_helper() is called, the last pinned fb was unpinned.
And I didn't observe amdgpu_bo_unpin() in dce_v*_0_crtc_disable() is called.
If the logic is wrong, please correct me.
Regards,
Lang
> Thanks,
> Christian.
>
> >
> > Regards,
> > Lang
> >
>
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Questions about KMS flip
2021-11-16 8:00 ` Lang Yu
@ 2021-11-16 8:09 ` Christian König
2021-11-16 14:10 ` Alex Deucher
0 siblings, 1 reply; 28+ messages in thread
From: Christian König @ 2021-11-16 8:09 UTC (permalink / raw)
To: Lang Yu
Cc: dri-devel, Michel DDDnzer, Kazlauskas, Nicholas, amd-gfx list,
Christian KKKnig
Am 16.11.21 um 09:00 schrieb Lang Yu:
> On Tue, Nov 16, 2021 at 08:14:08AM +0100, Christian KKKnig wrote:
>> Am 16.11.21 um 04:27 schrieb Lang Yu:
>>> On Mon, Nov 15, 2021 at 01:04:15PM +0100, Michel DDDnzer wrote:
>>>> [SNIP]
>>>>> Though a single call to dce_v*_0_crtc_do_set_base() will
>>>>> only pin the BO, I found it will be unpinned in next call to
>>>>> dce_v*_0_crtc_do_set_base().
>>>> Yeah, that's the normal case when the new BO is different from the old one.
>>>>
>>>> To catch the case I described, try something like
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c b/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c
>>>>
>>>> index 18a7b3bd633b..5726bd87a355 100644
>>>>
>>>> --- a/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c
>>>>
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c
>>>>
>>>> @@ -1926,6 +1926,7 @@ static int dce_v11_0_crtc_do_set_base(struct drm_crtc *crtc,
>>>>
>>>> return r;
>>>>
>>>>
>>>>
>>>> if (!atomic) {
>>>>
>>>> + WARN_ON_ONCE(target_fb == fb);
>>>>
>>>> r = amdgpu_bo_pin(abo, AMDGPU_GEM_DOMAIN_VRAM);
>>>>
>>>> if (unlikely(r != 0)) {
>>>>
>>>> amdgpu_bo_unreserve(abo);
>>>>
>>> I did some tests, the warning can be triggered.
>>>
>>> pin/unpin operations in *_crtc_do_set_base() and
>>> amdgpu_display_crtc_page_flip_target() are mixed.
>> Ok sounds like we narrowed down the root cause pretty well.
>>
>> Question is now how can we fix this? Just not pin the BO when target_fb ==
>> fb?
> That worked. I did a few simple tests and didn't observe ttm_bo_release warnings
> any more.
>
> The pin/unpin logic,
>
> 1, fist crtc_mode_set, dce_v*_0_crtc_do_set_base() pins crtc->primary->fb(new),
> old_fb is NULL.
>
> 2, second crtc_mode_set, dce_v*_0_crtc_do_set_base() pins crtc->primary->fb(new),
> unpins old fb.
>
> 3, amdgpu_display_crtc_page_flip_target() pin/unpin operations.
>
> 4, third crtc_mode_set, dce_v*_0_crtc_do_set_base() pins crtc->primary->fb(new),
> unpins old fb (it is pinned in last call to amdgpu_display_crtc_page_flip_target)
>
> 5, amdgpu_display_crtc_page_flip_target() pin/unpin operations.
>
> .....
>
> x, reboot, amdgpu_display_suspend_helper() is called, the last pinned fb was unpinned.
>
> And I didn't observe amdgpu_bo_unpin() in dce_v*_0_crtc_disable() is called.
>
> If the logic is wrong, please correct me.
I can't fully judge because I'm not that deep with my head in the old
display code, but from a ten mile high view it sounds sane to me. Michel
what do you think?
BTW: Nicholas are there any plans to get rid of all that stuff? It would
be a really nice cleanup of rather flaky code I think.
Thanks,
Christian.
>
> Regards,
> Lang
>
>> Thanks,
>> Christian.
>>
>>> Regards,
>>> Lang
>>>
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Questions about KMS flip
2021-11-16 8:09 ` Christian König
@ 2021-11-16 14:10 ` Alex Deucher
2021-11-16 14:50 ` Michel Dänzer
0 siblings, 1 reply; 28+ messages in thread
From: Alex Deucher @ 2021-11-16 14:10 UTC (permalink / raw)
To: Christian König
Cc: Michel DDDnzer, amd-gfx list, Christian KKKnig, dri-devel,
Lang Yu, Kazlauskas, Nicholas
On Tue, Nov 16, 2021 at 3:09 AM Christian König
<ckoenig.leichtzumerken@gmail.com> wrote:
>
> Am 16.11.21 um 09:00 schrieb Lang Yu:
> > On Tue, Nov 16, 2021 at 08:14:08AM +0100, Christian KKKnig wrote:
> >> Am 16.11.21 um 04:27 schrieb Lang Yu:
> >>> On Mon, Nov 15, 2021 at 01:04:15PM +0100, Michel DDDnzer wrote:
> >>>> [SNIP]
> >>>>> Though a single call to dce_v*_0_crtc_do_set_base() will
> >>>>> only pin the BO, I found it will be unpinned in next call to
> >>>>> dce_v*_0_crtc_do_set_base().
> >>>> Yeah, that's the normal case when the new BO is different from the old one.
> >>>>
> >>>> To catch the case I described, try something like
> >>>>
> >>>> diff --git a/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c b/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c
> >>>>
> >>>> index 18a7b3bd633b..5726bd87a355 100644
> >>>>
> >>>> --- a/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c
> >>>>
> >>>> +++ b/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c
> >>>>
> >>>> @@ -1926,6 +1926,7 @@ static int dce_v11_0_crtc_do_set_base(struct drm_crtc *crtc,
> >>>>
> >>>> return r;
> >>>>
> >>>>
> >>>>
> >>>> if (!atomic) {
> >>>>
> >>>> + WARN_ON_ONCE(target_fb == fb);
> >>>>
> >>>> r = amdgpu_bo_pin(abo, AMDGPU_GEM_DOMAIN_VRAM);
> >>>>
> >>>> if (unlikely(r != 0)) {
> >>>>
> >>>> amdgpu_bo_unreserve(abo);
> >>>>
> >>> I did some tests, the warning can be triggered.
> >>>
> >>> pin/unpin operations in *_crtc_do_set_base() and
> >>> amdgpu_display_crtc_page_flip_target() are mixed.
> >> Ok sounds like we narrowed down the root cause pretty well.
> >>
> >> Question is now how can we fix this? Just not pin the BO when target_fb ==
> >> fb?
> > That worked. I did a few simple tests and didn't observe ttm_bo_release warnings
> > any more.
> >
> > The pin/unpin logic,
> >
> > 1, fist crtc_mode_set, dce_v*_0_crtc_do_set_base() pins crtc->primary->fb(new),
> > old_fb is NULL.
> >
> > 2, second crtc_mode_set, dce_v*_0_crtc_do_set_base() pins crtc->primary->fb(new),
> > unpins old fb.
> >
> > 3, amdgpu_display_crtc_page_flip_target() pin/unpin operations.
> >
> > 4, third crtc_mode_set, dce_v*_0_crtc_do_set_base() pins crtc->primary->fb(new),
> > unpins old fb (it is pinned in last call to amdgpu_display_crtc_page_flip_target)
> >
> > 5, amdgpu_display_crtc_page_flip_target() pin/unpin operations.
> >
> > .....
> >
> > x, reboot, amdgpu_display_suspend_helper() is called, the last pinned fb was unpinned.
> >
> > And I didn't observe amdgpu_bo_unpin() in dce_v*_0_crtc_disable() is called.
> >
> > If the logic is wrong, please correct me.
>
> I can't fully judge because I'm not that deep with my head in the old
> display code, but from a ten mile high view it sounds sane to me. Michel
> what do you think?
>
> BTW: Nicholas are there any plans to get rid of all that stuff? It would
> be a really nice cleanup of rather flaky code I think.
We just need to add analog support to the DC code. Darren was looking into it.
Alex
>
> Thanks,
> Christian.
>
> >
> > Regards,
> > Lang
> >
> >> Thanks,
> >> Christian.
> >>
> >>> Regards,
> >>> Lang
> >>>
>
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: Questions about KMS flip
2021-11-16 14:10 ` Alex Deucher
@ 2021-11-16 14:50 ` Michel Dänzer
0 siblings, 0 replies; 28+ messages in thread
From: Michel Dänzer @ 2021-11-16 14:50 UTC (permalink / raw)
To: Alex Deucher, Christian König
Cc: dri-devel, Lang Yu, Christian KKKnig, amd-gfx list, Kazlauskas, Nicholas
On 2021-11-16 15:10, Alex Deucher wrote:
> On Tue, Nov 16, 2021 at 3:09 AM Christian König
> <ckoenig.leichtzumerken@gmail.com> wrote:
>>
>> Am 16.11.21 um 09:00 schrieb Lang Yu:
>>> On Tue, Nov 16, 2021 at 08:14:08AM +0100, Christian KKKnig wrote:
>>>> Am 16.11.21 um 04:27 schrieb Lang Yu:
>>>>> On Mon, Nov 15, 2021 at 01:04:15PM +0100, Michel DDDnzer wrote:
>>>>>> [SNIP]
>>>>>>> Though a single call to dce_v*_0_crtc_do_set_base() will
>>>>>>> only pin the BO, I found it will be unpinned in next call to
>>>>>>> dce_v*_0_crtc_do_set_base().
>>>>>> Yeah, that's the normal case when the new BO is different from the old one.
>>>>>>
>>>>>> To catch the case I described, try something like
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c b/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c
>>>>>>
>>>>>> index 18a7b3bd633b..5726bd87a355 100644
>>>>>>
>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c
>>>>>>
>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/dce_v11_0.c
>>>>>>
>>>>>> @@ -1926,6 +1926,7 @@ static int dce_v11_0_crtc_do_set_base(struct drm_crtc *crtc,
>>>>>>
>>>>>> return r;
>>>>>>
>>>>>>
>>>>>>
>>>>>> if (!atomic) {
>>>>>>
>>>>>> + WARN_ON_ONCE(target_fb == fb);
>>>>>>
>>>>>> r = amdgpu_bo_pin(abo, AMDGPU_GEM_DOMAIN_VRAM);
>>>>>>
>>>>>> if (unlikely(r != 0)) {
>>>>>>
>>>>>> amdgpu_bo_unreserve(abo);
>>>>>>
>>>>> I did some tests, the warning can be triggered.
>>>>>
>>>>> pin/unpin operations in *_crtc_do_set_base() and
>>>>> amdgpu_display_crtc_page_flip_target() are mixed.
>>>> Ok sounds like we narrowed down the root cause pretty well.
>>>>
>>>> Question is now how can we fix this? Just not pin the BO when target_fb ==
>>>> fb?
>>> That worked. I did a few simple tests and didn't observe ttm_bo_release warnings
>>> any more.
>>>
>>> The pin/unpin logic,
>>>
>>> 1, fist crtc_mode_set, dce_v*_0_crtc_do_set_base() pins crtc->primary->fb(new),
>>> old_fb is NULL.
>>>
>>> 2, second crtc_mode_set, dce_v*_0_crtc_do_set_base() pins crtc->primary->fb(new),
>>> unpins old fb.
>>>
>>> 3, amdgpu_display_crtc_page_flip_target() pin/unpin operations.
>>>
>>> 4, third crtc_mode_set, dce_v*_0_crtc_do_set_base() pins crtc->primary->fb(new),
>>> unpins old fb (it is pinned in last call to amdgpu_display_crtc_page_flip_target)
>>>
>>> 5, amdgpu_display_crtc_page_flip_target() pin/unpin operations.
>>>
>>> .....
>>>
>>> x, reboot, amdgpu_display_suspend_helper() is called, the last pinned fb was unpinned.
>>>
>>> And I didn't observe amdgpu_bo_unpin() in dce_v*_0_crtc_disable() is called.
>>>
>>> If the logic is wrong, please correct me.
>>
>> I can't fully judge because I'm not that deep with my head in the old
>> display code, but from a ten mile high view it sounds sane to me. Michel
>> what do you think?
Sounds good to me.
Would be nice to address the other issue identified in this thread as well:
The pin in amdgpu_display_crtc_page_flip_target is guarded by if (!adev->enable_virtual_display), but the corresponding unpins in
amdgpu_display_unpin_work_func & dce_v*_0_crtc_disable aren't. This probably results in pin count underflows with virtual display.
--
Earthling Michel Dänzer | https://redhat.com
Libre software enthusiast | Mesa and Xwayland developer
^ permalink raw reply [flat|nested] 28+ messages in thread
end of thread, other threads:[~2021-11-16 14:50 UTC | newest]
Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-04 12:51 Questions about KMS flip Christian König
2021-11-04 16:44 ` Harry Wentland
2021-11-05 12:20 ` Ville Syrjälä
2021-11-05 18:13 ` Daniel Vetter
2021-11-08 7:44 ` Christian König
2021-11-08 14:59 ` Daniel Vetter
2021-11-10 10:18 ` Christian König
2021-11-10 13:26 ` Daniel Vetter
2021-11-12 12:47 ` Christian König
2021-11-12 12:47 ` Christian König
2021-11-12 14:29 ` Michel Dänzer
2021-11-12 14:30 ` Michel Dänzer
2021-11-12 15:03 ` Christian König
2021-11-12 16:10 ` Michel Dänzer
2021-11-15 6:41 ` Lang Yu
2021-11-15 6:41 ` Lang Yu
2021-11-15 8:38 ` Michel Dänzer
2021-11-15 9:04 ` Lang Yu
2021-11-15 9:49 ` Michel Dänzer
2021-11-15 11:31 ` Lang Yu
2021-11-15 12:04 ` Michel Dänzer
2021-11-16 3:27 ` Lang Yu
2021-11-16 7:14 ` Christian König
2021-11-16 8:00 ` Lang Yu
2021-11-16 8:09 ` Christian König
2021-11-16 14:10 ` Alex Deucher
2021-11-16 14:50 ` Michel Dänzer
2021-11-15 7:25 ` Christian König
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.